Dear colleagues,
We have designed and implemented a new algorithm that searches a sequence
database (such as the EMBL Data Library, GenBank or Swiss-Prot) for all
sequences that contain a block (a subsequence) very similar to a given
pattern sequence, in the sense that the difference between the block and
the pattern (according to some distance metric, such as the edit distance)
is no more than a given constant.
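To make the search problem precise, here is a minimal sketch of the classical dynamic-programming check (not our new algorithm), assuming unit-cost edit distance (insertions, deletions and substitutions each cost 1). The names contains_approx_block and search are hypothetical, chosen only for illustration, and the database is assumed to be a simple mapping from entry name to sequence string.

```python
def contains_approx_block(text: str, pattern: str, k: int) -> bool:
    """Return True if some block (substring) of `text` is within
    unit-cost edit distance k of `pattern`.

    Classical O(len(text) * len(pattern)) dynamic programming; shown only
    to define the problem, not as an efficient method.
    """
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any block of
    # text ending at the current text position. col[0] is always 0, so a
    # block may start anywhere in the text.
    col = list(range(m + 1))
    if col[m] <= k:          # the empty block already matches (k >= m)
        return True
    for ch in text:
        prev_diag = col[0]   # col[i-1] from the previous text position
        col[0] = 0
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(old + 1,             # extra character in the block
                         col[i - 1] + 1,      # pattern character left unmatched
                         prev_diag + cost)    # match or substitution
            prev_diag = old
        if col[m] <= k:      # a block ending here is close enough
            return True
    return False


def search(database: dict, pattern: str, k: int) -> list:
    """Names of all database entries containing a block within distance k."""
    return [name for name, seq in database.items()
            if contains_approx_block(seq, pattern, k)]
```

For example, a pattern of 6 nucleotides with k = 1 would also report an entry containing the same block with one extra inserted base. Any practical algorithm must give the same answers as this check while avoiding its full scan of every database entry.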
Our algorithm should be very efficient in theory, but we want to know
whether it is also efficient and useful in biological practice. For this
purpose we need a number of real (DNA, RNA or protein) sequences that
would make meaningful pattern sequences for database searches. Presumably
the pattern sequences for searching a nucleotide sequence database should
differ from those for searching a protein sequence database?
Could you please send us, or point us to, some such pattern sequences
(separately for searching nucleotide databases and protein databases)?
We are pure computer scientists, so we need your help, particularly the
help of biologists and chemists!
Thank you very much in advance!
--------------------------------------------------------
Shi Fei
Institut fuer Theoretische Informatik
ETH-Zentrum
CH-8092 Zurich
Switzerland
e-mail: shi at inf.ethz.ch
Fax: 0041-1-262-3973
Phone: 0041-1-254-7403
--------------------------------------------------------------