I am working on a tiny piece of the Drosophila genome project.
I am trying to assemble regions
starting from the base of the chromosome arms,
where the Celera assemblies stop, and progressing towards the centromeres.
The tools I have been provided are Phred, Phrap, Consed, and BLAST.
My computers are Solaris 7 on Sun hardware, Linux 2.2.18 on an iBook,
and IRIX 6.5 on an SGI Octane.
My project consists of BACs subcloned into 3 kb and 10 kb chunks.
One of the problems I have been struggling with is that the region
contains some repetitive DNA elements. Some are large repeats that
are highly identical, and some of the repeats are larger than my clones.
I was thinking about attacking the problem with a divide-and-conquer
tactic, but I am new to thinking about how to assemble this gnarly
DNA. What I was thinking of is this:
1. Make a fake project for each suspected group of overlapping clones.
2. Copy the files from all the clones I know or strongly
suspect overlap into that fake project.
3. Assemble the fake project without interference from highly identical
sequences some large distance away.
4. Export the consensus as a high quality 'phd' file
to serve as a guide for Phrap.
5. Reassemble the BAC with these guides in place.
I was thinking of using a similar tactic for 3 and 10 kb clones that
are obviously placed incorrectly.
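
Steps 1, 2 and 4 above could be sketched roughly as follows. This is only
a minimal illustration under several assumptions: that the BAC project uses
the usual chromat_dir / phd_dir / edit_dir layout, that each read's file
name begins with its clone name (a hypothetical naming convention), and that
the guide is written as a FASTA entry plus a matching quality entry rather
than as a 'phd' file. The function names, the clone names and the quality
value of 90 are made up for the example.

```python
import shutil
from pathlib import Path

# Subdirectories assumed to exist in a phred/phrap/consed style project.
PROJECT_SUBDIRS = ("chromat_dir", "phd_dir", "edit_dir")


def make_subproject(source_project, sub_project, clone_prefixes):
    """Create a small 'fake' project holding only reads from chosen clones.

    Assumption: a read's file name starts with the name of its clone.
    Returns the relative names of the files that were copied.
    """
    source_project = Path(source_project)
    sub_project = Path(sub_project)
    for sub in PROJECT_SUBDIRS:
        (sub_project / sub).mkdir(parents=True, exist_ok=True)

    prefixes = tuple(clone_prefixes)
    copied = []
    for sub in ("chromat_dir", "phd_dir"):
        src_dir = source_project / sub
        if not src_dir.is_dir():
            continue
        for path in sorted(src_dir.iterdir()):
            if path.is_file() and path.name.startswith(prefixes):
                shutil.copy2(path, sub_project / sub / path.name)
                copied.append(f"{sub}/{path.name}")
    return copied


def write_guide_read(name, consensus, fasta_path, qual_path,
                     quality=90, width=50):
    """Append a consensus as a uniformly high-quality 'read'.

    Writes one FASTA entry and one quality entry with the same name.
    The uniform quality value is an arbitrary choice for illustration.
    """
    with open(fasta_path, "a") as fasta, open(qual_path, "a") as qual:
        fasta.write(f">{name}\n")
        qual.write(f">{name}\n")
        for start in range(0, len(consensus), width):
            chunk = consensus[start:start + width]
            fasta.write(chunk + "\n")
            qual.write(" ".join([str(quality)] * len(chunk)) + "\n")


if __name__ == "__main__":
    # Hypothetical paths and clone names, for illustration only.
    files = make_subproject("bac_project", "fake_project", ["cloneA", "cloneB"])
    print(f"copied {len(files)} files")
```

The sub-project would then be assembled on its own (step 3), and its
consensus passed to write_guide_read before reassembling the whole BAC
(step 5). Whether a guide entry with uniform quality behaves well in the
full assembly is something I have not verified.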
Some questions I have:
Am I approaching this entirely wrong?
Does anyone have any suggestions or hints for working with repeats?
Are there traditional ways to approach this problem?
Are there any new assemblers that were designed to address
repetitive DNA?
Does anyone know of other tools (websites, programs, books) I may
find useful?
Should I drop this project until tools are available to deal with
repetitive DNA?
Thank you very much for any help or suggestions. I have exhausted
nearly all of my ideas, and have nobody from whom to receive peer
review or with whom to discuss this project. Curt responses pointing me to FAQs,
websites, journals, etc. are fine.