jwoodget at oci.utoronto.ca
Wed May 31 09:30:22 EST 1995
In article <obrien-300595130948 at brass2.med.upenn.edu>,
obrien at pharm.med.upenn.edu (PJOB) writes:
> Is there a FAQ regarding the Expressed Sequence Tag database and
> the clones described therein?
> If not, can anyone tell me if the sequences in the database represent
> the entire insert of a given clone, or just a single sequencing run
> from one of the two universal primers that flank the insert?
Most of the ESTs are single primer (usually UP or RP) runs and are submitted
to the database as independent files (so the same clone will have two entries,
typically). Since the 5' ends are variable, one cDNA can have multiple
entries. The actual databases make no allowance for this (i.e. no attempt has
been made to contig the individual sequences). In addition, several files
I've checked out (as we had the full length clones) have been disappointingly
error-prone. One sequence of 278 bp contained over 15 errors (not including
unassigned bases). The files run from about 75 bp to over 400 bp in size but
commonly are around 250-300. There's a lot of garbage in there as well as
some really useful stuff and its rate of growth is phenomenal (which should
have a positive effect on error reduction).
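
As a rough illustration of the two points above (several independent entries per clone, and errors counted with unassigned bases left out), here is a minimal Python sketch. It is not how any EST database works. The record layout (clone ID, primer, sequence), the function names, and the assumption that an EST read has already been aligned base-for-base against the full-length clone with no insertions or deletions are all hypothetical simplifications.

```python
from collections import defaultdict


def group_by_clone(entries):
    """Collect independent EST entries under their clone ID.

    `entries` is an iterable of (clone_id, primer, sequence) tuples;
    this layout is an assumption made for the example only.
    """
    groups = defaultdict(list)
    for clone_id, primer, seq in entries:
        groups[clone_id].append((primer, seq))
    return dict(groups)


def mismatch_rate(est, reference):
    """Fraction of compared positions where the EST differs from the reference.

    Assumes the two strings are already aligned position by position
    (no indels). Unassigned bases ('N') in the EST are skipped, so they
    count neither as errors nor as compared positions.
    """
    compared = 0
    mismatches = 0
    for est_base, ref_base in zip(est.upper(), reference.upper()):
        if est_base == "N":
            continue
        compared += 1
        if est_base != ref_base:
            mismatches += 1
    return mismatches / compared if compared else 0.0
```

For scale, 15 errors in 278 bp works out to a little over 5% of bases. A real comparison would need a proper alignment that allows for insertions and deletions, which this sketch deliberately leaves out.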
What's missing from the public domain are the means to pan the gold from the