Functional equivalent of GCG Findpatterns?

Tim Cutts timc at chiark.greenend.org.uk
Wed Jan 19 07:17:11 EST 2000


In article <sub1z7e4cf8.fsf at europa.sanger.ac.uk>,
Peter Rice  <pmr at sanger.ac.uk> wrote:
>Keith James <kdj at fes1.sanger.ac.uk> writes:
>
>> Yes, you are right. Having only used it for searching using IUB
>> ambiguity codes I assumed that when it was prompting for 'search
>> pattern' it would accept some sort of regular expression whose syntax
>> was described elsewhere in the docs.
>> 
>> I've just tried it with GCG and Unix regexp type patterns and it
>> doesn't accept them. Now I know!
>
>As already posted, fuzznuc and fuzzpro accept prosite patterns.
>
>However, as EMBOSS includes the Henry Spencer regular expression
>library (both versions) it is very easy to implement simple regular
>expression searches. They are complicated by ambiguity codes in the
>pattern and in the sequence(s), which is why Alan's implementation
>of prosite patterns is currently preferred to making modifications
>to the regexp code.
>
>If there is interest, we could very easily implement a simple regular
>expression search (add regular expressions to the ACD types for the
>user interface, simple loop, about 30 minutes should do it. Then a
>little longer to find a catchy name :-)

I wrote a program, called tpatterns, that does both regexp matching of
nucleotide patterns and regexp matching of protein patterns against
nucleotide databases translated on the fly in all reading frames.
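
For anyone who wants the general idea without the source, here is a
minimal sketch in Python. It is not the tpatterns code: the function
names are made up for illustration, it assumes the standard genetic
code and plain A/C/G/T input (anything else translates to X), and it
uses Python's built-in re module rather than PCRE. It translates each
of the six reading frames in turn and runs the protein regexp over the
result.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# no alternative translation tables are needed).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def search_six_frames(pattern, dna):
    """Match a protein regexp against all six translated frames.

    Returns (strand, frame, protein_offset, matched_text) tuples, where
    frame is 1..3 and protein_offset counts residues within that frame.
    """
    regex = re.compile(pattern)
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for m in regex.finditer(protein):
                hits.append((strand, offset + 1, m.start(), m.group()))
    return hits


if __name__ == "__main__":
    for hit in search_six_frames("M[KR]W", "ATGAAATGGTAA"):
        print(hit)
```

Translating each frame as it is searched means no translated copy of
the database has to be stored, at the cost of redoing the translation
for every search.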

I initially used Henry Spencer's regexp engine, but switched to Phil
Hazel's PCRE package, for the following reasons:

1)  It's faster than Spencer's.
2)  It stands for Perl Compatible Regular Expressions; i.e. it supports
all those funky extras that Perl has.
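
As one illustration of such an extra, lookahead assertions are a
Perl-style feature that PCRE supports, and they make it possible to
report overlapping occurrences of a motif. The sketch below uses
Python's re module as a stand-in, since it accepts the same (?=...)
syntax; the function names are hypothetical.

```python
import re


def plain_starts(motif, seq):
    """Start positions of non-overlapping matches of motif."""
    return [m.start() for m in re.finditer(motif, seq)]


def overlapping_starts(motif, seq):
    """Start positions of all matches, including overlapping ones.

    The motif is wrapped in a zero-width lookahead, so each match
    consumes no characters and the scan can restart one base later.
    """
    return [m.start() for m in re.finditer("(?=(" + motif + "))", seq)]


if __name__ == "__main__":
    print(plain_starts("ATA", "ATATATA"))
    print(overlapping_starts("ATA", "ATATATA"))
```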

Anyone interested in a copy of the tpatterns source should mail me, and
if there's sufficient interest I'll put the source up on the WWW
somewhere.

Tim.