EST clustering and assembling

Phillip San Miguel pmiguel at
Tue Jan 29 08:26:10 EST 2002

Andrea Hansen wrote:

> Hi,
> is there anybody with experience in clustering and assembling of
> EST data? What do you think about CAP3?
> How many sequences can I assemble with this program?
> [...]

I use phrap and CAP3. The problem with phrap is that
it will typically create nonsense contigs where every read in the
contig has 75% of its bases disagreeing with the consensus. I
typically use the following parameters:

phredPhrap -shatter_greedy -penalty -9 -minscore 50

and they help somewhat. But generally I end up with at least a few
nonsense contigs. So I use the .fasta.screen and .fasta.screen.qual
files generated by phredPhrap to do a CAP3 assembly. I've assembled
thousands of ESTs this way. Tens of thousands should work too. CAP3 is
slower than phrap. It might take 24 hours to run with 7000 ESTs, whereas
phrap will only take a couple of hours. This is on a Sun E450 server
with 4 GB of RAM.
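
For anyone who wants to screen for the nonsense contigs mentioned above, here is a rough Python sketch of the check. It is not part of phrap or CAP3. It assumes you have already pulled the padded consensus and the padded, positioned reads out of the assembly output yourself (the parsing is not shown). The AlignedRead class, the function names, and the 0.5 cutoff are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedRead:
    """Hypothetical container for one read placed against a consensus."""
    name: str
    start: int   # 0-based consensus column where the read begins
    bases: str   # padded read sequence; pads are compared literally


def disagreement(read: AlignedRead, consensus: str) -> float:
    """Fraction of the read's aligned columns that differ from the consensus."""
    compared = 0
    mismatches = 0
    for offset, base in enumerate(read.bases):
        col = read.start + offset
        if col < 0 or col >= len(consensus):
            continue  # read overhangs the consensus; nothing to compare
        compared += 1
        if base.upper() != consensus[col].upper():
            mismatches += 1
    return mismatches / compared if compared else 0.0


def is_nonsense_contig(consensus: str, reads: List[AlignedRead],
                       threshold: float = 0.5) -> bool:
    """True when every read disagrees with the consensus above the threshold.

    The 0.5 default is an arbitrary assumption, not a phrap or CAP3 setting.
    """
    return bool(reads) and all(
        disagreement(r, consensus) > threshold for r in reads
    )


if __name__ == "__main__":
    consensus = "ACGTACGTACGT"
    good = [AlignedRead("r1", 0, "ACGTACGT"), AlignedRead("r2", 4, "ACGTACGT")]
    bad = [AlignedRead("r3", 0, "TTTTTTTT"), AlignedRead("r4", 4, "GGGGGGGG")]
    print(is_nonsense_contig(consensus, good))
    print(is_nonsense_contig(consensus, bad))
```

In the toy data, each read in the second contig mismatches 6 of its 8 aligned columns, so that contig is flagged, while the first contig is not. Real data would need a decision on how to treat N calls and pads, which this sketch simply compares as ordinary characters.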

Phillip SanMiguel
Purdue Genomics Core Facility
