20600 Drosophila Genes/NEW annotation/Search

webmaster webmaster at softberry.com
Tue Nov 14 21:07:43 EST 2000


    New ab initio annotation of sequences in the Drosophila genome
                 (20622 genes and 75935 exons)
is available for public use at
        http://www.softberry.com/inf/infodb.shtml
Search in Drosophila proteins:
        http://www.softberry.com/scan/scan.shtml

Recently, the nucleotide sequence of nearly all of the euchromatic portion of the
Drosophila genome (~120 Mb) was determined (Adams et al., 2000).
We annotated these sequences (at http://genomic.sanger.ac.uk/inf/infodb.shtml)
by predicting genes with the Fgenesh program and checking the similarity of each exon
to the EST and protein databases with the BLAST program (Altschul et al., 1997).
Since then, additional sequencing and sequence improvements have been released.

We repeated the ab initio prediction on the improved sequences and annotated the exons
with PfamA domains. The results of this analysis are presented in Table 1
and can be viewed in the InfoGene database at http://www.softberry.com/inf/infodb.shtml.
In addition to the full set of computer-predicted genes, the table presents filtered
sets of genes and exons from which the most unreliable predictions have been removed.
We use two criteria:
  1) remove genes whose total protein-coding region is shorter than 30 amino acids;
  2) remove genes whose total exon score is less than 15.
Such filtering has been shown to improve prediction accuracy
(Salamov and Solovyev, 2000, Genome Res. 10:516-522). Note that the 20622
genes include some pseudogenes and genes of mobile elements.
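
The two filtering criteria can be sketched in Python as below. This is a minimal
illustration under stated assumptions, not Softberry's code: the Gene/Exon classes
and their fields (cds_length_nt, score) are hypothetical, the per-exon score is taken
as given (how Fgenesh computes it is not described here), and protein length is
approximated as the number of coding nucleotides divided by three.

```python
from dataclasses import dataclass, field
from typing import List

# Thresholds from the two criteria above.
MIN_PROTEIN_LENGTH_AA = 30    # criterion 1: drop genes coding < 30 amino acids
MIN_TOTAL_EXON_SCORE = 15.0   # criterion 2: drop genes with total exon score < 15


@dataclass
class Exon:
    cds_length_nt: int  # hypothetical: coding nucleotides in this exon
    score: float        # hypothetical: per-exon score reported by the predictor


@dataclass
class Gene:
    gene_id: str
    exons: List[Exon] = field(default_factory=list)

    def protein_length_aa(self) -> int:
        # Approximate protein length as coding length / 3 (one codon per residue).
        return sum(e.cds_length_nt for e in self.exons) // 3

    def total_exon_score(self) -> float:
        # Sum of the scores of all exons in the gene.
        return sum(e.score for e in self.exons)


def passes_filter(gene: Gene) -> bool:
    """Keep a gene only if it fails neither removal criterion."""
    return (gene.protein_length_aa() >= MIN_PROTEIN_LENGTH_AA
            and gene.total_exon_score() >= MIN_TOTAL_EXON_SCORE)


def filter_genes(genes: List[Gene]) -> List[Gene]:
    """Return the subset of genes that survive both criteria."""
    return [g for g in genes if passes_filter(g)]


# Usage with made-up genes (illustrative values only):
genes = [
    Gene("g1", [Exon(120, 10.0), Exon(60, 8.5)]),  # 60 aa, score 18.5: kept
    Gene("g2", [Exon(45, 20.0)]),                  # 15 aa: removed (too short)
    Gene("g3", [Exon(300, 6.0), Exon(150, 4.0)]),  # 150 aa, score 10: removed
]
kept = filter_genes(genes)
print([g.gene_id for g in kept])  # ['g1']
```

Both thresholds are inclusive on the keep side: a gene of exactly 30 amino acids, or
with a total exon score of exactly 15, is retained, because the criteria remove only
genes below those values.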

A Blast/DBscan search against the predicted Drosophila proteins is available at
http://www.softberry.com/scan/scan.shtml.

Exon sequences and gene annotation data can be downloaded from
http://www.softberry.com/inf/dro_ann.shtml, for example to design
microarray oligos.

Table 1. Summary of predicted genes and proteins in Drosophila genome sequences
                              X     2L     2R     3L     3R     4     Y  Unknown   Total
Size (Mb)                  22.2   23.0   21.4   24.1   28.3   1.2  0.02      4.6   124.8
Genes predicted            4071   4610   4573   4851   4962   133     1      691   24884
Genes after filtering      3349   3768   3915   4017   4962   105     1      504   20622
Genes with PfamA domains   1138   1193   1287   1216   1654    58     0       76    6622

Interestingly, these ab initio FGENESH predictions yield about 7,000 more genes
(after filtering) than were annotated by Celera scientists and their academic coauthors.
Note, however, that we did not remove genes of mobile elements. Because no gene
prediction approach is perfect, it is useful to analyze all available predictions
to identify new genes.
   
   


---
More information about the Bio-soft mailing list