EST-insert size

Kevin J. Laddison laddisonkj at pfizer.com
Thu Oct 24 08:28:43 EST 1996

> Discrepancy between actual EST-insert size and length of the database
> sequence
> Dear colleagues,
> I recently posted a message that EST clones obtained from a Resource
> centre did not contain any insert. Somebody suggested ordering clones
> from another centre, which worked nicely because these clones
> have inserts. But only one of five different ESTs carries an insert of the
> expected size (0.5 kb), whereas the other inserts are larger (around
> one kb). I regret if the question is stupid, but do the EST people only enter
> the size of the SEQUENCED DNA into the database and ignore the rest
> of the clone?

That is correct.

> If so why do they not give the insert size?

Based on my experience with Merck/WashU and Incyte EST clones, the
people who do the sequencing don't know the insert size.  If I
understand the process properly, robots pick colonies, which are then
grown and mini-prepped by another robot.  The resulting DNA is then
sequenced by single-pass automated sequencing.  The resulting data are
then annotated and dumped into a database.  There is very little human
time spent on EST clones, and since sizing an insert does take
hands-on effort, most groups don't bother with it until a clone is
determined to be interesting for some other reason.

> If not should I dump the 1 kb clones or are they worth further
> characterization and sequencing.

As I said above, the sequence for the EST is only single pass, which
is often about 500 bp.  I have received plasmids corresponding to
published ESTs of that length that had inserts of up to 2 kb.  I would
at least end-sequence your clones to be certain about what they contain.

Another way to get at the size of the insert which corresponds to an
EST is to check whether both 5-prime and 3-prime sequences of the
plasmid are available.  That usually depends on which group was
doing the sequencing, of course.  For example, look at the GenBank
definition field for R18795.  If you do a search for yf66e08 you
should get two sequences.  If they overlap, then you know how large
the insert is; if they don't, then you know that the insert is larger
than the combined length of the two sequences.  Either that or one of
the sequences is misannotated; you'll have to determine that yourself.
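
The overlap check above can be sketched in a few lines of Python.  This
is only an illustration under some loud assumptions: both reads are
already trimmed of vector sequence, the 3-prime read is reported on the
opposite strand (so it is reverse-complemented first), and the overlap
is found by exact string matching.  Real single-pass reads contain
errors, so in practice an alignment program would be the better tool.
The function names and the min_overlap parameter are made up for this
example.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def estimate_insert_size(five_prime, three_prime, min_overlap=20):
    """Return (size, is_exact) for a pair of end reads from one clone.

    If the end of the 5' read matches the start of the reverse-complemented
    3' read over at least min_overlap bases, the reads overlap and the
    insert size is known exactly.  Otherwise the combined read length is
    only a lower bound on the insert size.
    """
    fwd = five_prime.upper()
    rev = reverse_complement(three_prime)  # put both reads on one strand
    longest = min(len(fwd), len(rev))
    # Try the longest candidate overlap first, then shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if fwd[-k:] == rev[:k]:
            return len(fwd) + len(rev) - k, True
    return len(fwd) + len(rev), False
```

A result of (size, True) means the two reads overlap and size is the
full insert length; (size, False) means no overlap was found, so the
insert is at least size bases long (or one of the reads is
misannotated).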

Kevin J. Laddison
Pfizer Central Research
Groton, Ct	USA
