Repetitive sequence database

Pankaj Agarwal agarwal at ibc.wustl.edu
Fri Aug 26 13:18:03 EST 1994


In article <33khkk$4kj at mserv1.dl.ac.uk> LAVOLPE at IIGBNA.IIGB.NA.CNR.IT (Adriana) writes:

 >> I have welcomed with much interest the database of repetitive DNA sequences

Thank you for your interest. 

 >> - There is no indexing of the classified sequences against the already
 >> described ones (easy to get from the literature and GenBank, and in most
 >> cases also mapped on the C. elegans physical map).

Each family can be identified from the blast results included with it:
a comparison of the family's consensus sequence against the C. elegans
portion of GenBank.

 >> - I tried myself to identify which family in the database corresponds to
 >> each of those already known. In most cases (e.g. Rc35), 14 families in the
 >> database correspond to one described family. What are these 14? Are they
 >> different loci of the same family, or what? If so, is the true number of
 >> families much less than the 7629 total?

In fact there are only 3212 families in the database (the ISMB-94
paper is slightly out of date), and the number of "true" families is
considerably smaller. In this classification, we DO NOT combine
sequences with vastly different lengths into the same family, and
families may also be split up due to insertion or deletion mutations.
Thus, there is significant overlap between families. However, we will
soon be able to list all the families related to any given family on
the WWW server.
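
To make the idea of "all the families related to a given family"
concrete, here is a minimal sketch in Python. It is not the server's
code. It assumes relatedness is available as a list of family pairs and
that it may be followed transitively; both are assumptions made for
illustration, and the family names (F1, F2, ...) and the function name
related_families are hypothetical.

```python
from collections import defaultdict, deque


def related_families(pairs, start):
    """Return a sorted list of every family reachable from `start`.

    `pairs` is an iterable of (family_a, family_b) tuples, each meaning
    the two families overlap or are otherwise related (hypothetical input).
    """
    # Build an undirected graph: relatedness is treated as symmetric.
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    # Breadth-first search collects the whole connected group.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    seen.discard(start)  # report only the *other* families
    return sorted(seen)


# Hypothetical example: F1-F2 and F2-F3 overlap, F4-F5 are a separate group.
example_pairs = [("F1", "F2"), ("F2", "F3"), ("F4", "F5")]
print(related_families(example_pairs, "F1"))
```

Under these assumptions, counting the connected groups rather than the
individual entries would give one rough estimate of the number of
"true" families.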

 >> - Once an interesting family is identified, how can I get more information
 >> about it?

It would be useful if you could specify what other information you
are interested in, and we will make it available if possible. A .ace
file for acedb is also available (currently by sending me email),
which will help you locate the family on the physical map.

 >> - Why, if I enter a 12bp DNA query sequence, does the program only search
 >> for 11bp?

There was a bug in the search program; it should now be fixed.

 >> - I failed to pick up transposons with this program, is that correct?

That is correct. However, there are families related to the various
transposons. The problem is with the consensus sequences. We are
using an extended code for specifying bases in the consensus sequence,
and blastn does not score those as matches. We plan to use a modified
version of blastp to achieve better sensitivity. If you have any
suggestions for a different identification scheme, we would like to
hear about it.
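
To illustrate why an extended base code hurts a plain base-by-base
comparison, here is a small Python sketch. It assumes the extended
code is the standard IUPAC nucleotide ambiguity alphabet (R = A or G,
Y = C or T, and so on), which may differ from the code actually used
in the database, and it compares two equal-length, ungapped strings
only. The function names are hypothetical; this is neither blastn's
scoring nor the server's.

```python
# IUPAC nucleotide ambiguity codes and the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def exact_matches(consensus, query):
    """Count positions where the two symbols are literally identical."""
    return sum(c.upper() == q.upper() for c, q in zip(consensus, query))


def ambiguity_aware_matches(consensus, query):
    """Count positions where the query base is allowed by the consensus symbol."""
    if len(consensus) != len(query):
        raise ValueError("sequences must have the same length")
    return sum(
        q.upper() in IUPAC[c.upper()] for c, q in zip(consensus, query)
    )


consensus = "ACRYN"  # hypothetical consensus with three ambiguity symbols
query = "ACGTT"      # hypothetical query fragment
print(exact_matches(consensus, query))            # 2
print(ambiguity_aware_matches(consensus, query))  # 5
```

A literal comparison sees only two matching positions out of five,
while an ambiguity-aware one accepts all five, which is the kind of
sensitivity loss described above.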

 >> Adriana La Volpe

Thanks for the feedback. This is extremely useful. We are constantly
examining ways to improve the server.

Pankaj
--
Pankaj Agarwal
Institute for Biomedical Computing
Washington University, St. Louis, MO, USA
agarwal at ibc.wustl.edu


