GENES in Human genome draft:

webmaster webmaster at softberry.com
Thu Dec 14 04:36:15 EST 2000


GENES in Human genome draft: 

    (49171 Genes and 282378 exons) 

We present, for free public use, human genes predicted by FGENESH, one of the most
accurate gene-finding programs, in the draft human genome assembled by the UCSC Human Genome Project Team.

Thanks to Domenick Venezia, who pointed us to a bug in the file hgd.exo that
resulted in missing predicted exons for chromosome 21.

The missing exons from chromosome 21 were added on Dec. 13, 2000. 
If you downloaded the hgd.exo file prior to that date, please download the fixed version:

at http://www.softberry.com/inf/humd_an.html
 
(This did not affect the exon amino acid sequences in the file hgd.exp below.)

The complete results of this analysis are presented in Table 1 and can be seen 
  in the InfoGene database at: 
              http://www.softberry.com/inf/infodb.html 

where the InfoGene Java viewer can be used to visualize the predictions along 
the chromosomes. Use the Action menu and Obtain Locus to get the prediction data. 


  The exon sequences and gene annotation data can be copied 
for local use or to design microarray oligos: 

>Human genome predicted genes/exons 
>Predicted amino acid sequences of exons with PfamA annotation 

Table 1. Summary of predicted genes and proteins in Human genome sequences 

           GENES    EXONS        BASES     MASKED+N  %N  %N+M  GENE_PER  EXON_PER 
   Total:  49171   282378   3374262130   1755813225  19    52     68623     11949 
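
The derived columns of Table 1 can be checked with a little arithmetic. The sketch below assumes (this is not stated in the post) that GENE_PER and EXON_PER are the average number of bases per predicted gene and per predicted exon, and that %N+M is the percentage of bases that are masked or N. The variable names are illustrative only.

```python
# Values copied from the "Total" row of Table 1.
genes = 49171
exons = 282378
bases = 3374262130
masked_plus_n = 1755813225

# Assumed meaning: average bases per predicted gene and per predicted exon.
gene_per = bases // genes
exon_per = bases // exons

# Assumed meaning: percentage of bases that are masked or N.
pct_n_plus_m = round(100 * masked_plus_n / bases)

print(gene_per, exon_per, pct_n_plus_m)  # prints: 68623 11949 52
```

Under these assumptions the computed values match the table, i.e. roughly one predicted gene per 68.6 kb and one predicted exon per 11.9 kb of draft sequence.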

Predicted Genes annotated using Pfam similarity search

We also plan to annotate the cellular location of the predicted proteins later
(using our ProtComp program, available at: http://www.softberry.com/protein.html)


 Total number of distinct PfamA domain types: 1154 
   (the same domain found in neighboring exons is counted only once) 

467 pkinase Eukaryotic protein kinase domain 
372 7tm_1 7 transmembrane receptor (rhodopsin family) 
308 Myc_N_term Myc amino-terminal region 
256 Topoisomerase_I Eukaryotic DNA topoisomerase I 
224 ig Immunoglobulin domain 
183 rrm RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain) 
182 PH PH domain 
180 Myosin_tail Myosin tail 
166 EGF EGF-like domain 
159 filament Intermediate filament proteins 
154 Syndecan Syndecan domain 
143 ras Ras family 
138 RNA_pol_A2 RNA polymerase A/beta'/A" subunit 
123 BTB BTB/POZ domain 
etc.
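
The counting rule noted above (the same domain in neighboring exons counts once) could be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the procedure actually used for the list: each exon is assumed to carry at most one domain hit, and the input layout, the function name count_domains, and the example data are all hypothetical.

```python
from collections import Counter
from itertools import groupby


def count_domains(genes):
    """Count domain occurrences, collapsing runs in neighboring exons.

    genes: list of genes; each gene is a list with one entry per exon,
    holding a domain name or None when the exon has no hit (assumed layout).
    """
    counts = Counter()
    for exon_domains in genes:
        # groupby merges consecutive identical entries into a single run,
        # so a domain spanning adjacent exons is counted once.
        for domain, _run in groupby(exon_domains):
            if domain is not None:
                counts[domain] += 1
    return counts


# Made-up example data, not taken from the actual predictions.
example = [
    ["pkinase", "pkinase", None, "PH"],
    ["ig", "ig", "ig", "EGF", "ig"],
]
counts = count_domains(example)
for name, n in counts.most_common():
    print(n, name)
```

In the second example gene, "ig" is counted twice because the two runs are separated by an exon with a different domain.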

