Evidential Genes for Nasonia vitripennis (OGS2)
We are pleased to announce a new official gene set for Nasonia vitripennis
(OGS version 2), incorporating extensive new expression evidence and
Hymenoptera/Arthropoda protein homology.
Nasonia now has one of the most complete hymenopteran gene sets by two
measures: it has the fewest missing orthology gene groups, and the most
species-unique genes supported by gene transcript evidence. Among gene sets
of 6 hymenopterans and 4 other arthropods, Nasonia is missing only 375 of the
8147 gene groups common to 6 or more species, compared to 429 missing in
Camponotus (carpenter ant), 481 missing in Bombus (bumble bee), and 632
missing in Apis (honey bee).
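As a quick check on these figures, the missing-group counts can be expressed as a share of the 8147 common gene groups. This is only an illustrative sketch: the counts are the ones quoted above, while the names `TOTAL_GROUPS`, `missing`, and `missing_percent` are made up for this example and are not part of any EvidentialGene tool.

```python
# Missing orthology gene groups per species, as quoted in this announcement.
# All variable and function names here are hypothetical, for illustration only.
TOTAL_GROUPS = 8147  # gene groups common to 6 or more species

missing = {
    "Nasonia": 375,
    "Camponotus": 429,
    "Bombus": 481,
    "Apis": 632,
}


def missing_percent(n_missing, total=TOTAL_GROUPS):
    """Return the percentage of common gene groups that are missing."""
    return 100.0 * n_missing / total


percent_missing = {sp: missing_percent(n) for sp, n in missing.items()}

# Print species in order of increasing missing share.
for species, pct in sorted(percent_missing.items(), key=lambda kv: kv[1]):
    print(f"{species:<12}{missing[species]:>5}{pct:>7.1f}%")
```

On these numbers, Nasonia is missing roughly 4.6% of the common groups, against roughly 7.8% for Apis.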
Gene data files, in several formats including detailed evidence annotations,
are available here: http://arthropods.eugenes.org/EvidentialGene/nasonia/
A web genome map with GBrowse, BLAST sequence search of these genes, and gene
orthology assessment are available through the link above.
Gene homology evidence is collected from 220,000 proteins of 2 ants, 3 bees,
Drosophila, pea aphid, Tribolium, Daphnia, and human. Gene expression
evidence includes 164,000 ESTs, 188 million RNA-Seq reads, and genome
tiling-array expression data. Intron splice junctions, transcript assemblies
and alternative splice forms are derived from these.
This combined evidence supports 24,525 good genes with 7836 alternate
transcripts. This new N. vitripennis annotation uncovers twice as many
duplicated genes as in Tribolium and Drosophila, yet fewer than in pea
aphid, Daphnia or human. A small increase in single copy genes in Nasonia is
similar to ants, but 1.4 times more than in fruit fly. The average gene
coding size is 265 amino acids, the average transcript size is 1.4 Kb, and
97% of gene models now have UTRs, versus 37% in OGS1.
RNA evidence supports 7836 alternate transcripts from 4248 genes. One gene
(lola) stands out with 86 annotated alternate transcripts. The next largest
set is 17 alternates, for the fruitless gene, which is related to lola. There are
3395 genes that are both expressed and transposon associated, while 1777
others are expressed but from noncoding or aberrant gene models.
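For orientation, the alternate-transcript figures above can be reduced to two simple ratios. The sketch below uses only the counts quoted in this announcement; the constant names are hypothetical and chosen for this example.

```python
# Counts quoted in this announcement; names are illustrative only.
TOTAL_GENES = 24525      # "good" genes in OGS2
ALT_TRANSCRIPTS = 7836   # alternate transcripts supported by RNA evidence
GENES_WITH_ALTS = 4248   # genes carrying those alternate transcripts

# Mean number of alternate transcripts among genes that have any.
alts_per_gene = ALT_TRANSCRIPTS / GENES_WITH_ALTS

# Share of all OGS2 genes that have at least one alternate transcript.
share_with_alts = 100.0 * GENES_WITH_ALTS / TOTAL_GENES

print(f"alternates per alternatively spliced gene: {alts_per_gene:.2f}")
print(f"genes with alternates: {share_with_alts:.1f}% of {TOTAL_GENES}")
```

That works out to a little under two alternates per alternatively spliced gene, which puts the 86 alternates of lola in perspective.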
This OGS2 gene set is built on the first Nasonia genome assembly, rather than
the current NCBI Nvit_2.0 assembly, which has only minor assembly
improvements. OGS2 also provides information that can be used to improve
the genome assembly further. There are 550 genes curated from transcript
assemblies that improve on genome sequence gaps and resolve putative
frame-shifts, and 833 genes are an expert's choice, including genes split
over scaffolds, odorant genes, and others.
Comparison of this OGS2 with the Nasonia gene sets OGS1.2 (2009) and NCBI
RefSeq2 (Sept 2011) shows substantial overlap among their gene models. Of
the 12,989 NCBI RefSeq2 genes, 10,362 are the same loci as in OGS2 and 1655
mostly overlap. Whereas 88 NCBI RefSeq2 loci are missing from OGS2, 12,588
OGS2 loci are not found in NCBI RefSeq2. Of the 18,941 OGS1.2 loci, 10,583
are the same loci as in OGS2, 4226 mostly overlap, and 412 OGS1.2 loci are
missing from OGS2. There are 7495 OGS2 loci that are not found in OGS1.2. The
table below summarizes gene evidence recovered in these gene sets.
Gene evidence summary

                 OGS2   RefSeq2   OGS1
Introns           97%       90%    85%
EST coverage      72%       67%    51%
RNA assembly      63%       36%    29%
Homolog found    100%       89%    89%
Homolog score     679       635     --

Introns : match to EST/RNA spliced introns
EST coverage : overlap with EST exons
RNA assembly : >=66% equivalence with 28016 RNA/EST assemblies
Homolog found : n found for 13772 protein loci common to these gene sets
Homolog score : blastp bitscore average for found of 13772 homologs
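
The locus counts quoted in the gene set comparison above can also be read as shares of each older gene set. The following is a minimal sketch using only those quoted counts; the dictionary layout and the `shares` helper are hypothetical. Note that the three categories do not add up to the full total for either gene set, and the announcement does not break down the remainder, so the sketch reports it as "unaccounted" without interpreting it.

```python
# Locus counts quoted in the comparison paragraph of this announcement.
# The structure and names below are illustrative, not from any released tool.
comparisons = {
    "NCBI RefSeq2": {"total": 12989, "same": 10362, "mostly_overlap": 1655, "missing": 88},
    "OGS1.2": {"total": 18941, "same": 10583, "mostly_overlap": 4226, "missing": 412},
}


def shares(counts):
    """Return each category as a percentage of the gene set's total loci."""
    total = counts["total"]
    result = {k: 100.0 * v / total for k, v in counts.items() if k != "total"}
    # Loci not covered by the three quoted categories (not explained in the text).
    accounted = sum(v for k, v in counts.items() if k != "total")
    result["unaccounted"] = 100.0 * (total - accounted) / total
    return result


summary = {name: shares(counts) for name, counts in comparisons.items()}

for name, pct in summary.items():
    line = ", ".join(f"{k} {v:.1f}%" for k, v in pct.items())
    print(f"{name}: {line}")
```

By this reading, about 80% of NCBI RefSeq2 loci and about 56% of OGS1.2 loci are the same loci as in OGS2.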
Further details on the Nasonia vitripennis OGS2 will be presented in a
forthcoming publication. Access to OGS2 is available at the link above, with
further information on arthropod EvidentialGene sets.
Please contact us by email with any questions. Best wishes,
John (Jack) Werren
-- gilbertd from indiana.edu--http://marmot.bio.indiana.edu/