GENES in Human genome draft:

webmaster webmaster at softberry.com
Thu Dec 14 00:17:44 EST 2000


GENES in Human genome draft: 

    (49171 Genes and 282378 exons) 

We present, for free public use, human genes predicted by FGENESH, one of the most accurate 
gene-finding programs, in the draft human genome assembled by the UCSC Human Genome Project Team. 

Thanks to Domenick Venezia, who pointed us to a bug in the file hgd.exo that 
resulted in missing predicted exons for chromosome 21. 

The missing exons from chromosome 21 were added on Dec. 13, 2000. 
If you downloaded the hgd.exo file before that date, please download the fixed version at: 

              http://www.softberry.com/inf/humd_an.html 

(This did not affect the exon amino acid sequences in the file hgd.exp below.) 

The complete results of this analysis are presented in Table 1 and can be seen 
  in the InfoGene database at: 
              http://www.softberry.com/inf/infodb.html 

where the InfoGene Java viewer can be used to visualize the predictions along 
the chromosomes. Use the Action menu and Obtain Locus to get the prediction data. 


  The exon sequences and gene annotation data can be copied 
for local use or to design microarray oligos: 

>Human genome predicted genes/exons 
>Predicted amino acid sequences of exons with PfamA annotation 

Table 1. Summary of predicted genes and proteins in Human genome sequences 

           GENES    EXONS     BASES       MASKED+N  %N %N+M  GENE_PER EXON_PER 
   Total:  49171   282378   3374262130   1755813225  19  52    68623    11949 
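
The derived columns in the Total row can be checked with a little arithmetic. The sketch below assumes that GENE_PER and EXON_PER mean bases per predicted gene and per predicted exon, and that %N+M is MASKED+N as a percentage of BASES; the post does not define these columns, so this is an interpretation that happens to reproduce the printed values. %N cannot be recomputed because the count of N bases alone is not given.

```python
# Figures copied from the "Total" row of Table 1.
genes = 49171
exons = 282378
bases = 3374262130
masked_plus_n = 1755813225

# Assumed meaning of the derived columns (not stated in the post):
# GENE_PER = bases per predicted gene, EXON_PER = bases per predicted exon,
# %N+M = masked-plus-N bases as a percentage of all bases.
bases_per_gene = round(bases / genes)
bases_per_exon = round(bases / exons)
pct_masked_plus_n = round(100 * masked_plus_n / bases)

# Not a column in the table, but follows directly from GENES and EXONS.
exons_per_gene = exons / genes

print("GENE_PER:", bases_per_gene)
print("EXON_PER:", bases_per_exon)
print("%N+M:", pct_masked_plus_n)
print("exons per gene: %.2f" % exons_per_gene)
```

Under this reading, the table implies roughly one predicted gene per 69 kb of draft sequence and a little under six predicted exons per gene.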

Predicted Genes annotated using Pfam similarity search 

We also plan to annotate the CELL LOCATION of the predicted proteins later. 

Total number of different PfamA domain types: 1154 
    (the same domain found in neighboring exons is counted only once) 
 
 467 pkinase Eukaryotic protein kinase domain 
 372 7tm_1 7 transmembrane receptor (rhodopsin family) 
 308 Myc_N_term Myc amino-terminal region 
 256 Topoisomerase_I Eukaryotic DNA topoisomerase I 
 224 ig Immunoglobulin domain 
 183 rrm RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain) 
 182 PH PH domain 
 180 Myosin_tail Myosin tail 
 166 EGF EGF-like domain 
 159 filament Intermediate filament proteins 
 154 Syndecan Syndecan domain 
 143 ras Ras family 
 138 RNA_pol_A2 RNA polymerase A/beta'/A" subunit 
 123 BTB BTB/POZ domain 
 etc.
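
The counting rule used for the list above (the same domain in neighboring exons counted once) can be illustrated with a minimal sketch. The input layout, the gene names, and the function `count_domains` are hypothetical and not taken from the Softberry files. The sketch also assumes that two hits to the same domain separated by an exon without a hit are counted separately, which the post does not specify.

```python
from collections import Counter
from itertools import groupby

def count_domains(genes):
    """Count Pfam domain hits, merging runs of the same domain in adjacent exons.

    genes: dict mapping a gene id to a list with one entry per exon, in exon
    order; each entry is a domain name or None when the exon has no hit.
    (Hypothetical layout, chosen only for this illustration.)
    """
    counts = Counter()
    for exon_domains in genes.values():
        # groupby yields one group per run of identical consecutive values,
        # so a domain spanning neighboring exons contributes a single count.
        for domain, _run in groupby(exon_domains):
            if domain is not None:
                counts[domain] += 1
    return counts

# Made-up example data.
example = {
    "gene_A": ["pkinase", "pkinase", None, "PH"],
    "gene_B": ["ig", "ig", "ig"],
    "gene_C": ["pkinase", None, "pkinase"],
}

counts = count_domains(example)
for domain, n in counts.most_common():
    print(n, domain)
print("different domain types:", len(counts))
```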

