[Staden] Benchmarking reassembly/Loading fasta files

N.E.Whiteford N.E.Whiteford at soton.ac.uk
Thu Jul 13 12:04:36 EST 2006


Hi All,

As part of my PhD project I'm working on a tool to benchmark reassembly
algorithms. To do this I'm planning on doing the following:

1. Take a sequence file and break it into reads of a specified
   length, adding errors in the process.

2. Reassemble these simulated reads with the reassembly programs
   available in GAP4.

3. Align contigs of a useful size to the original sequence, and note
   those that align within a given edit distance.

4. Calculate the percentage of the sequence that is covered by contigs.
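
As a rough illustration of steps 1 and 4, here is a minimal Python
sketch. It is not the actual tool: the function names, the fixed step
between read start positions, the substitution-only error model and the
(start, end) interval format for aligned contigs are all assumptions
made for the example.

```python
import random

def simulate_reads(sequence, read_length, error_rate, step=1, seed=0):
    """Cut `sequence` into overlapping reads and add substitution errors.

    Assumption: errors are substitutions only, applied independently to
    each base with probability `error_rate`.
    """
    rng = random.Random(seed)
    alphabet = "ACGT"
    reads = []
    for start in range(0, len(sequence) - read_length + 1, step):
        read = list(sequence[start:start + read_length])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Replace the base with a different one.
                read[i] = rng.choice([b for b in alphabet if b != base])
        reads.append((start, "".join(read)))
    return reads

def percent_covered(sequence_length, intervals):
    """Percentage of positions covered by at least one interval.

    Assumption: each aligned contig is given as (start, end) on the
    original sequence, with `end` exclusive.
    """
    covered = [False] * sequence_length
    for start, end in intervals:
        for pos in range(max(0, start), min(end, sequence_length)):
            covered[pos] = True
    return 100.0 * sum(covered) / sequence_length

# Hypothetical 23-base reference, cut into error-free 20-base reads.
reference = "CCAATTAGTCCTATTAAGACTGT"
reads = simulate_reads(reference, read_length=20, error_rate=0.0)
coverage = percent_covered(len(reference), [(0, 20), (3, 23)])
```

A per-position boolean list keeps the coverage calculation simple; for
long sequences, merging sorted intervals would use less memory.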

I have just completed the edit-distance alignment tool and am now
beginning the process of benchmarking reassembly algorithms. Does
anybody have any thoughts or suggestions? I should say that my main
interest is short read reassembly.
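
For readers unfamiliar with the idea behind step 3, one common way to
align a contig against a longer sequence under an edit-distance
threshold is approximate matching by dynamic programming, where the
match may start anywhere in the reference at no cost. The sketch below
is only an illustration under that assumption, not the tool described
above; the function name and the unit costs are hypothetical.

```python
def best_match_distance(contig, reference):
    """Smallest edit distance between `contig` and any substring of
    `reference`.

    Returns (distance, end), where `end` is the exclusive end position
    of the first best match in the reference.
    """
    # previous[j]: cost of aligning the contig prefix seen so far to
    # some substring of the reference that ends at position j.
    previous = [0] * (len(reference) + 1)  # free start anywhere
    for i, c in enumerate(contig, start=1):
        current = [i]  # i contig bases against an empty substring
        for j, r in enumerate(reference, start=1):
            current.append(min(
                previous[j] + 1,             # extra base in the contig
                current[j - 1] + 1,          # base missing from the contig
                previous[j - 1] + (c != r),  # match or substitution
            ))
        previous = current
    distance = min(previous)
    return distance, previous.index(distance)

reference = "CCAATTAGTCCTATTAAGACTGT"
exact = best_match_distance("TTAGTCC", reference)
one_error = best_match_distance("TTAGACC", reference)

# A contig "aligns" if its distance is within a chosen threshold.
max_distance = 1  # hypothetical threshold
aligns = one_error[0] <= max_distance
```

This runs in time proportional to the product of the two lengths, which
is fine for small tests but slow for long references.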

Secondly, I'm having a problem with GAP4. It only seems to load 
19 sequences from my fasta file. My fasta file looks like this:

>R0
CCAATTAGTCCTATTAAGAC

>R1
CAATTAGTCCTATTAAGACT

>R2
AATTAGTCCTATTAAGACTG

>R3
ATTAGTCCTATTAAGACTGT

However, if I include more than 19 sequences in my fasta file, I
get the following error:

Failed files:
    /home/new/A1.fasta (UNK) 'init: Unknown file type'

Is this a bug? Or am I doing something wrong?
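
To check how many records a fasta file of this layout contains,
independently of GAP4, a small parser can be used. This is a sketch
only: the function name is hypothetical, and it says nothing about how
GAP4 itself detects file types.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

example = """>R0
CCAATTAGTCCTATTAAGAC

>R1
CAATTAGTCCTATTAAGACT
"""
records = read_fasta(example.splitlines())
record_count = len(records)

# For a file on disk (the path here is the one from the error message):
#     with open("/home/new/A1.fasta") as handle:
#         records = read_fasta(handle)
```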

Many Thanks for Reading,

Nava Whiteford