new zebrafish whole genome shotgun

Kerstin Jekosch kj2 at
Thu Apr 3 12:25:48 EST 2003

Please visit our web site at
to find the  

Second assembly Zv2 of the zebrafish genome released

Please note that this is a *preliminary* assembly and there are a
number of points to remember:

There is a high level of misassembly.  This is because the source DNA 
came from ~1000 5-day-old embryos and the polymorphism rate is at least 
1 in 200 bp, with additional significant indels.  Regions of the 
genome that are highly variable therefore do not form clusters for 
assembly, since the sequences that originate from a given region are 
quite likely to come from different haplotypes.  This causes assembly 
dropouts for some regions and false duplications in other regions where 
phrap splits different haplotypes into multiple paths.  We are working 
on the assembly code, Phusion, to address these issues.  However, there 
is an enormous amount of useful sequence in this assembly, and we hope 
this outweighs the problems in the assembly.  

We tried to use the fingerprint information from our fpc database to 
merge assembly supercontigs.  Where this was possible, the new contigs 
were named after the fpc contig that led to the merge (e.g. ctg123).  
However, please note that this assembly is not tied to a map, and 
mapping information derived from the contig names is therefore to be 
treated with care.  We will soon offer a search tool that makes all 
mapping information for a given supercontig available.  

Although the assembly is being made available as early as possible to 
the research community, an Ensembl gene build has NOT yet been 
performed.  An Ensembl pre-release, however, is available.  

Assembly Statistics:

We started with 11737560 reads comprising 7.64 Gbp (average read 
length 651 bp).  Of these, 9953938 unique reads, 84.8% of the total 
reads, were placed in the assembly.  
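
As a quick sanity check, the read figures quoted above can be recomputed 
from one another.  The short Python sketch below does only that; the 
variable names are illustrative and are not part of the assembly 
pipeline.

```python
# Figures quoted in the announcement.
total_reads = 11_737_560
avg_read_length_bp = 651
placed_reads = 9_953_938

# Total sequence in Gbp and the fraction of reads placed in the assembly.
total_gbp = total_reads * avg_read_length_bp / 1e9
placed_pct = 100 * placed_reads / total_reads

print(f"Total sequence: {total_gbp:.2f} Gbp")   # 7.64 Gbp
print(f"Placed reads:   {placed_pct:.1f} %")    # 84.8 %
```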

Phusion was used to cluster the reads, and phrap was used for cluster 
assembly and consensus generation.  

Small supercontigs with fewer than 3 reads or smaller than 1 kb were 

For the supercontigs (bp measures include estimated gap sizes):

Total bases          = 1306256104
contig number        =    
Average length       =       3030
Largest              =      44497
bases / contigs: N50 = 4451, n =  

Supercontig stats (bp measures include estimated gap sizes):
Total bases          = 1452210772
Supercontigs         =     
Average length       =      17398
Largest              =    3581975
bases / contigs: N50 = 296896, n =  
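
For readers unfamiliar with the N50 figures above, here is a minimal 
Python sketch of how such a value is commonly computed.  It assumes the 
usual definition (the length L such that sequences of length >= L 
together hold at least half of all bases) and that "n" is the number of 
sequences needed to reach that half; whether the assembly statistics 
used exactly this convention is an assumption.  The function name and 
the toy lengths are made up for illustration.

```python
def n50(lengths):
    """Return (N50, n) for a list of sequence lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of all bases;
    n is how many sequences were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example with made-up contig lengths (not data from this assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # (8, 3): 10 + 9 + 8 = 27 is half of 54
```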

Estimated coverage based on 93 Mbp of 656 finished clones
Supercontig coverage:
Contig coverage:      77%

Dr. Kerstin Jekosch                email kj2 at     
Project Leader                     tel   +44 (0)1223 494971
Zebrafish Genome Analysis          fax   +44 (0)1223 494919
Wellcome Trust Sanger Institute           
Hinxton, Cambridge CB10 1SA, UK
