new zebrafish whole genome shotgun
Kerstin Jekosch
kj2 at sanger.ac.uk
Thu Apr 3 12:25:48 EST 2003
Please visit our web site at http://www.sanger.ac.uk/Projects/D_rerio/
to find the newly released second assembly, Zv2, of the zebrafish genome.
Please note that this is a *preliminary* assembly and there are a
number of points to remember:
There is a high level of misassembly. This is because the source DNA
came from ~1000 5-day-old embryos and the polymorphism rate is at least
1 per 200 bp, with additional significant indels. Thus regions of the
genome which are highly variable do not form clusters for assembly
since the sequences that originate from a given region are quite likely
from different haplotypes. This causes assembly dropouts for some
regions and false duplications in other regions where phrap splits
different haplotypes into multiple paths. We are working on the
assembly code, Phusion, to address these issues. However, there is
an enormous amount of useful sequence in this assembly and we hope
this outweighs the problems in the assembly.
We tried to include the fingerprint information from our fpc database to
merge assembly supercontigs. Where this could be done, the new contigs
were named after the fpc contig that led to the merge (e.g. ctg123).
However, please note that this assembly is not tied to a map, and
mapping information derived from the contig names is therefore to be
treated with care. We will soon offer a search tool that makes all
mapping information for a given supercontig available.
Although the assembly is being made available as early as possible to
the research community, an Ensembl gene build has NOT yet been
performed. An Ensembl pre-release, however, is available.
Assembly Statistics:
We started with 11737560 reads comprising 7.64 Gbp (average read
length 651 bp). There are 9953938 unique reads, 84.8 % of the total
reads, placed in the assembly.
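As a quick consistency check on the read figures above, the following sketch recomputes the placed fraction and the average read length from the quoted totals. The variable names are illustrative only, and 7.64 Gbp is taken as 7.64e9 bp, which is itself a rounded value.

```python
# Figures quoted in the announcement.
total_reads = 11_737_560
placed_reads = 9_953_938
total_bp = 7.64e9  # 7.64 Gbp, rounded in the source

# Fraction of reads placed in the assembly.
placed_fraction = placed_reads / total_reads

# Mean read length in bp, derived from the rounded base total.
mean_read_length = total_bp / total_reads

print(f"placed: {placed_fraction:.1%}")
print(f"mean read length: {mean_read_length:.0f} bp")
```

Both values agree with the 84.8 % and 651 bp stated above.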
Phusion was used to cluster the reads and phrap was used for cluster
assembly and consensus generation.
Small supercontigs with fewer than 3 reads or smaller than 1 kb were
rejected.
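The rejection rule above can be written as a simple predicate. This is only a sketch of the stated rule, not the code used in the assembly pipeline: the `Supercontig` record and its field names are hypothetical, and 1 kb is assumed to mean 1000 bp.

```python
from dataclasses import dataclass

MIN_READS = 3         # supercontigs with fewer reads are rejected
MIN_LENGTH_BP = 1000  # assumption: 1 kb taken as 1000 bp


@dataclass
class Supercontig:
    """Hypothetical record; field names are illustrative only."""
    name: str
    n_reads: int
    length_bp: int


def keep(sc: Supercontig) -> bool:
    """Return True unless the supercontig has too few reads or is too short."""
    return sc.n_reads >= MIN_READS and sc.length_bp >= MIN_LENGTH_BP


candidates = [
    Supercontig("a", 3, 1000),   # kept: meets both thresholds
    Supercontig("b", 2, 5000),   # rejected: fewer than 3 reads
    Supercontig("c", 10, 999),   # rejected: shorter than 1 kb
]
kept = [sc.name for sc in candidates if keep(sc)]
print(kept)
```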
For the supercontigs (bp measures include estimated gap sizes):
Contig stats:
Total bases = 1306256104 bps
Contig number = 430985
Average length = 3030 bps
Largest = 44497 bps
bases / contigs: N50 = 4451, n = 87069
Supercontig stats (bp measures include estimated gap sizes):
Total bases = 1452210772 bps
Supercontigs = 83470
Average length = 17398 bps
Largest = 3581975 bps
bases / contigs: N50 = 296896, n = 1397
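For readers unfamiliar with the N50 figures above, here is a minimal sketch of how such a value is commonly computed. It assumes the usual definition: sort the sequences from longest to shortest and accumulate lengths until at least half of the total bases are covered; N50 is the length of the last sequence added and n is how many sequences were needed. The function name is illustrative, and the programs that produced the statistics above may differ in detail.

```python
def n50(lengths):
    """Return (N50, n) for a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of all bases;
    n is the number of sequences accumulated up to that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no lengths given")


# Toy example: total is 30 bases, so half is 15.
# 10 alone is not enough; 10 + 8 = 18 reaches it.
print(n50([10, 8, 6, 4, 2]))
```

Read this way, the supercontig line says that the 1397 largest supercontigs, each at least 296896 bp long, together hold half of the assembled bases.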
Estimated coverage based on 93 Mbp of 656 finished clones gives:
Supercontig coverage: 95%
Contig coverage: 77%
--
Dr. Kerstin Jekosch email kj2 at sanger.ac.uk
Project Leader tel +44 (0)1223 494971
Zebrafish Genome Analysis fax +44 (0)1223 494919
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK