I am working on a part of the Drosophila genome project, trying to assemble repetitive
DNA sequences. I have been using Phred, Phrap, and Consed for my
projects. My project is a BAC, subcloned into 10 kb and 3 kb clones.
I was planning to start at the base of the arms, where the Celera data left off,
and progress toward the centromere. Unfortunately, dealing with nearly identical
repeats has proved extremely challenging when the repeat is longer than the clone insert.
I was thinking of using a divide-and-conquer strategy to get around the problem:
create artificial projects composed of clones that I know, or strongly suspect,
overlap. Clones whose position I don't know can be assembled separately
from the main project. I could then use these consensus sequences as guides
for reassembling the original BAC. Unfortunately, this requires that the
coverage be complete enough to form a single contig, or at least to identify
the gaps that must be closed to complete it.
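A minimal sketch of the grouping step, assuming the overlap evidence has already been reduced to pairs of clone names (the clone names and the `group_clones` helper are hypothetical, not part of Phrap or Consed): clones linked by any chain of suspected overlaps land in the same artificial project, and clones with no overlap evidence are set aside to be assembled separately.

```python
from collections import defaultdict


def group_clones(clones, overlaps):
    """Partition clones into artificial sub-projects (hypothetical helper).

    clones:   iterable of clone names.
    overlaps: iterable of (clone_a, clone_b) pairs known or suspected to overlap.
    Returns (groups, unplaced): groups are sorted lists of 2+ clones connected
    by overlap evidence; unplaced lists clones with no overlap evidence.
    """
    parent = {c: c for c in clones}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in overlaps:
        # Tolerate clones that appear only in the overlap list.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    members = defaultdict(list)
    for c in list(parent):
        members[find(c)].append(c)

    groups = sorted(sorted(m) for m in members.values() if len(m) > 1)
    unplaced = sorted(m[0] for m in members.values() if len(m) == 1)
    return groups, unplaced


if __name__ == "__main__":
    clones = ["c01", "c02", "c03", "c04", "c05"]
    overlaps = [("c01", "c02"), ("c02", "c03")]
    groups, unplaced = group_clones(clones, overlaps)
    print(groups)    # [['c01', 'c02', 'c03']]
    print(unplaced)  # ['c04', 'c05']
```

Each group's reads could then go into its own project, and the resulting consensus sequences used as guides for the full BAC reassembly.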
Does anyone have ideas on how to approach such a project?
Am I approaching this completely wrongly, and if so, how should I approach it?
Can anyone recommend other assemblers and tools they have found
useful? Are there any new assemblers that need beta testers?
I would prefer software with source code available, since I prefer working on
a Macintosh iBook running Debian Linux 2.2.18, and binary distributions for it
often aren't available. I also have access to a Sun Ultra 10 and an SGI Octane.
Can anybody point me toward resources I may find useful?
Should I just give up on this project until the available resources can handle
this sort of project?
I have exhausted all of my ideas and have nobody with whom to discuss the
project or from whom to receive peer review. Any help at all would be appreciated.