More on CAP3
Phillip San Miguel
pmiguel at purdue.edu
Tue Mar 13 12:24:21 EST 2001
After the fairly negative review I gave CAP3 in my last post on this subject, I thought I should
post an update.
We are sequencing barley, wheat, rice, maize and sorghum BACs. Frequently plant BACs have large
numbers of retrotransposons in them. These retrotransposons can cause phrap to misassemble
contigs--especially at standard stringencies. CAP3 takes about 10x longer to run than phrap, but
its assemblies seem correct. I attribute this to its using forward/reverse constraints to aid in the
This is an important consideration for finishing. If a BAC is mis-assembled, then the gaps
present may not be the real gaps. Hence, any naive attempt to close them will meet with failure.
Basically you are just chasing an assembly artifact. So we are finding CAP3 very valuable.
I'm not slamming phrap here--I am completely amazed at the sequences it can correctly assemble.
I never would have predicted that it would be able to correctly assemble a shotgun of a 100 kb
stretch of maize genomic DNA. Too bad it doesn't use forward/reverse constraints.
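
To make the idea of a forward/reverse constraint concrete, here is a minimal Python sketch of the kind of check one could run on a finished assembly. It is not CAP3's actual algorithm. The Placement record, the check_pair function and the 2000-4000 bp insert-size window are all hypothetical, chosen only for illustration. The sketch assumes the two reads of a subclone should land on the same contig, on opposite strands, facing each other, and roughly one insert length apart.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Where one read landed in an assembly (hypothetical record for this sketch)."""
    contig: str
    start: int   # leftmost contig coordinate covered by the read
    end: int     # rightmost contig coordinate covered by the read (exclusive)
    strand: str  # "+" or "-"


def check_pair(a, b, min_insert, max_insert):
    """Classify a forward/reverse read pair from one subclone.

    Returns "ok", "violation", or "different_contigs". A pair on two
    different contigs is not an error in itself; it may simply span a gap.
    """
    if a.contig != b.contig:
        return "different_contigs"
    if a.strand == b.strand:
        # Both reads pointing the same way cannot come from one insert.
        return "violation"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    span = minus.end - plus.start  # implied insert length
    if plus.start <= minus.start and min_insert <= span <= max_insert:
        return "ok"
    return "violation"


# Assumed insert-size window for the subclone library (illustrative values only).
MIN_INSERT, MAX_INSERT = 2000, 4000

fwd = Placement("contig1", 100, 600, "+")
good_rev = Placement("contig1", 2500, 3000, "-")
far_rev = Placement("contig1", 9500, 10000, "-")

good_result = check_pair(fwd, good_rev, MIN_INSERT, MAX_INSERT)
far_result = check_pair(fwd, far_rev, MIN_INSERT, MAX_INSERT)
print(good_result, far_result)
```

A pair like the second one, whose mates sit much too far apart, is the sort of signal that points to a collapsed or mis-joined repeat rather than a real gap.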
Phillip San Miguel
Purdue Genomics Core Facility
Retrotransposons are usually 5-12 kb in length. Further, they are flanked by 0.3 - 3 kb long
terminal repeats (LTRs). LTRs are direct repeats, usually highly similar in sequence. (In fact, one
expects them to be identical upon insertion.) Also one frequently finds retrotransposons in
clusters--the result of nested insertions. That is, one retro inserts in another, and another in the
second. And so on. To give you an idea, the first long stretch of maize genomic DNA closely studied
revealed 23 retrotransposons of 11 families in a 250 kb interval flanking the adh1 gene. One 70 kb
cluster comprised 8 retrotransposons.
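
A toy Python sketch shows why identical LTRs are a problem for an overlap-based assembler. Everything in it is made up for illustration: the sequences are random, the 500 bp LTR and 5000 bp internal region are arbitrary values chosen so the element falls within the size ranges given above, and the 300 bp read length is an assumption. It builds one element as LTR + internal + LTR and shows that a read lying wholly inside the LTR matches the region at two places.

```python
import random


def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def find_all(haystack, needle):
    """Return every start position at which needle occurs in haystack."""
    positions = []
    i = haystack.find(needle)
    while i != -1:
        positions.append(i)
        i = haystack.find(needle, i + 1)
    return positions


rng = random.Random(0)  # fixed seed so the toy example is repeatable

ltr = random_dna(500, rng)        # assumed LTR length: 0.5 kb
internal = random_dna(5000, rng)  # assumed internal region: 5 kb
left_flank = random_dna(1000, rng)
right_flank = random_dna(1000, rng)

# Identical direct repeats at both ends, as expected right after insertion.
element = ltr + internal + ltr
region = left_flank + element + right_flank

# A 300 bp read that falls entirely inside the LTR (read length is an assumption).
read = ltr[100:400]
hits = find_all(region, read)
print(len(element), hits)
```

The read places equally well in the left and the right LTR, so sequence overlap alone cannot say which copy it came from. Nested insertions multiply the number of such ambiguous placements.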
Bruce Roe emailed me in response to my last post on this topic. He advised using higher assembly
stringencies for phrap: -minmatch 30 -maxmatch 55 -minscore 55. This helps a great deal, but there
are still some misassemblies.
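
For anyone scripting their assemblies, the stringencies above can be passed along these lines. This is only a sketch: the phrap_command helper and the file name bac_reads.fasta are hypothetical, and it assumes phrap is invoked in the usual way, with the input file followed by options. It only builds and prints the command line; it does not run phrap.

```python
import shlex


def phrap_command(fasta_path, minmatch=30, maxmatch=55, minscore=55):
    """Build a phrap argument list using the higher stringencies quoted above."""
    return [
        "phrap", fasta_path,
        "-minmatch", str(minmatch),
        "-maxmatch", str(maxmatch),
        "-minscore", str(minscore),
    ]


# "bac_reads.fasta" is a placeholder name for the shotgun reads of one BAC.
cmd = phrap_command("bac_reads.fasta")
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list can be handed to subprocess.run if phrap is on the PATH.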