Annotated 20000 Promoters in Human Genome

webmaster webmaster at
Wed Jul 11 16:50:48 EST 2001

Annotated 20000 Promoters in Human Genome in Genome

Promoters predicted in the December draft of the human genome are
presented in Genome Explorer along with known and predicted genes.
Right-clicking a promoter in Genome Explorer reveals the promoter
sequence, presented in two blocks for TATA+ promoters and in one block
for TATA-less promoters. The first block of a TATA+ promoter is the
TATA-box, and the second is the stretch from the predicted transcription
start site (TSS) to the known 5'-end of the mRNA or the translation
start site.
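
The block layout described above can be sketched as a small data
structure. This is only an illustration: the class name, field names and
the blocks() helper are hypothetical and are not taken from Genome
Explorer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PromoterRecord:
    """Hypothetical container for a displayed promoter sequence."""
    # Stretch from the predicted TSS to the known 5'-mRNA end
    # or the translation start site.
    tss_to_start: str
    # TATA-box sequence; None for TATA-less promoters.
    tata_box: Optional[str] = None

    def blocks(self) -> List[str]:
        """Return the sequence blocks in display order."""
        if self.tata_box is None:
            return [self.tss_to_start]            # TATA-less: one block
        return [self.tata_box, self.tss_to_start]  # TATA+: TATA-box first
```

A TATA+ record yields two blocks and a TATA-less record yields one, which
matches the display described in the announcement.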

Promoters were predicted by the Softberry promoter prediction program
TSSW in regions up to 3000 bp from known starts of coding regions (ATG
codon) or known mapped 5'-mRNA ends. We found that limiting the promoter
search to such regions drastically reduces false positive predictions.
We also apply very stringent thresholds for the prediction of TATA-less
promoters to minimize false positive predictions.
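
As a rough illustration of restricting the search region, the sketch
below computes a window next to a known anchor (a coding start or a
mapped 5'-mRNA end). It assumes the window lies upstream of the anchor,
that distances are in base pairs, and that coordinates are 1-based and
inclusive; none of these details are stated in the announcement, and
search_window is a hypothetical helper, not part of TSSW.

```python
from typing import Optional, Tuple


def search_window(anchor: int, strand: str, max_dist: int = 3000,
                  seq_len: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) of the assumed upstream search region.

    anchor   -- position of the ATG codon or mapped 5'-mRNA end
    strand   -- "+" or "-"
    max_dist -- maximum distance from the anchor (assumed to be bp)
    seq_len  -- sequence length, used to clip minus-strand windows
    """
    if strand == "+":
        # Upstream on the plus strand means lower coordinates.
        return max(1, anchor - max_dist), anchor
    if strand == "-":
        # Upstream on the minus strand means higher coordinates.
        end = anchor + max_dist
        if seq_len is not None:
            end = min(end, seq_len)
        return anchor, end
    raise ValueError("strand must be '+' or '-'")
```

For example, search_window(10000, "+") gives (7000, 10000), and a window
near the start of a sequence is clipped at position 1.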

Our promoter prediction software predicts about 50% of promoters
accurately, with a small average deviation from the true start site.
Such accuracy makes experimental work with the promoters found feasible.

For 20 experimentally verified promoters on chromosome 22, TSSW
predicted 15, placing 12 of them within the (-150,+150) region around
the true TSS and 6 (30% of all promoters) within the (-8,+2) region.
These results are significantly better than those obtained with the
PromoterInspector program (Scherf M., Klingenhoff A., Frech K. et al.
(2001) First pass annotation of promoters on human chromosome 22.
Genome Res., 11, 333-340), where only 50% of promoters from the same
sample were found, with deviations from the true TSS ranging from 200
to 1000 bp.
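
The percentages implied by the chromosome 22 test can be reproduced with
simple arithmetic. The sketch below uses only the counts quoted above;
fraction_within is a hypothetical helper showing how such a figure could
be computed from per-promoter deviations, which are not given in the
announcement.

```python
from typing import Iterable, Optional


def fraction_within(deviations: Iterable[Optional[int]],
                    lo: int, hi: int) -> float:
    """Share of all true promoters predicted within lo..hi of the TSS.

    Each entry is predicted minus true TSS position; None marks a
    promoter that was not predicted at all.
    """
    devs = list(deviations)
    hits = sum(1 for d in devs if d is not None and lo <= d <= hi)
    return hits / len(devs)


# Counts quoted in the text for 20 verified promoters on chromosome 22.
total = 20
counts = {"predicted": 15, "within -150..+150": 12, "within -8..+2": 6}
rates = {name: n / total for name, n in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

This gives 75% of promoters predicted, 60% within (-150,+150) and 30%
within (-8,+2), all relative to the 20 verified promoters.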

We predicted 17632 TATA+ promoters and 2383 TATA-less promoters overall
in the human genome draft. For chromosome 22, we predicted 350 TATA+
promoters and 85 TATA-less promoters.
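
As a quick check of these figures, the totals and the share of TATA-less
promoters can be computed from the quoted counts. The summarize helper
and the dictionary layout are illustrative only.

```python
from typing import Dict, Tuple

# Promoter counts quoted in the announcement.
counts: Dict[str, Dict[str, int]] = {
    "genome draft": {"TATA+": 17632, "TATA-less": 2383},
    "chromosome 22": {"TATA+": 350, "TATA-less": 85},
}


def summarize(c: Dict[str, int]) -> Tuple[int, float]:
    """Return (total promoters, fraction that are TATA-less)."""
    total = c["TATA+"] + c["TATA-less"]
    return total, c["TATA-less"] / total


for name, c in counts.items():
    total, share = summarize(c)
    print(f"{name}: {total} promoters, {share:.1%} TATA-less")
```

The genome-wide total is 20015 promoters, consistent with the "20000
promoters" in the subject line, with about 11.9% TATA-less; for
chromosome 22 the total is 435, with about 19.5% TATA-less.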

New Fgenesh++ gene predictions for the December draft of the human
genome are presented by Softberry Inc. and will be presented in
Softberry Genome Explorer with some expression data.

The 44409 genes include 5883 genes corresponding to RefSeq mRNAs, 3592
genes corresponding to GenBank mRNAs, 2047 known genes and 302
pseudogenes.

The prediction methods are described in:
Solovyev V.V. (2001) Statistical approaches in eukaryotic gene
prediction. In: Handbook of Statistical Genetics (eds. Balding D. et
al.), John Wiley & Sons, Ltd., p. 83-127.



