EST clones

Jim Woodgett jwoodget at oci.utoronto.ca
Wed May 31 09:30:22 EST 1995


In article <obrien-300595130948 at brass2.med.upenn.edu>, 
obrien at pharm.med.upenn.edu (PJOB) writes:
 
> Is there a FAQ regarding the Expressed Sequence Tag database and 
> the clones described therein? 
>  
>  If not, can anyone tell me if the sequences in the database represent 
> the entire insert of a given clone, or just a single sequencing run 
> from one of the two universal primers that flank the insert?  

Most ESTs are single-pass runs from one primer (usually the universal (UP) or
reverse (RP) primer) and are submitted to the database as independent entries,
so the same clone will typically have two entries.  Because the 5' ends of
independent clones vary, one cDNA species can also be represented by multiple
entries.  The databases themselves make no allowance for this (i.e. no attempt
has been made to assemble the individual sequences into contigs).  In addition,
several entries I've checked (for which we had the full-length clones) have
been disappointingly error-prone.  One sequence of 278 bp contained over 15
errors (not counting unassigned bases).  Entries range from about 75 bp to over
400 bp but are commonly around 250-300 bp.  There's a lot of garbage in there
as well as some really useful stuff, and the database's rate of growth is
phenomenal (which should help with error reduction).
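To make the bookkeeping concrete, here is a minimal Python sketch, assuming
each entry carries a clone identifier (the record layout, accessions, clone
IDs and sequences below are all hypothetical).  It only groups the separate
single-pass entries that come from one clone; it does not assemble them into
contigs.  It also works out the error figure quoted above: 15 errors in 278 bp
is about 5.4%, roughly one wrong base every 19, and since there were "over 15"
errors the true rate is at least that.

```python
from collections import defaultdict

# Hypothetical EST records: (accession, clone_id, primer, sequence).
# The layout and values are illustrative only, not a real database schema.
records = [
    ("EST0001", "cloneA", "UP", "ACGTACGTTA"),
    ("EST0002", "cloneA", "RP", "TTGACCGTAA"),
    ("EST0003", "cloneB", "UP", "GGCATTACGA"),
]

def group_by_clone(recs):
    """Collect the independent single-pass entries derived from each clone."""
    groups = defaultdict(list)
    for accession, clone_id, primer, _seq in recs:
        groups[clone_id].append((accession, primer))
    return dict(groups)

def error_rate(errors, length):
    """Fraction of called bases that are wrong (unassigned bases excluded)."""
    return errors / length

groups = group_by_clone(records)
rate = error_rate(15, 278)

print(groups)
print(f"{rate:.1%} errors, about one every {278 / 15:.0f} bases")
```

Grouping by clone ID only removes the two-entries-per-clone redundancy; the
separate problem of many clones covering one cDNA would need actual sequence
comparison.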

What's missing from the public domain are the means to pan the gold from the 
gravel.

Jim

More information about the Methods mailing list