Annotated 20000 Promoters in Human Genome in Genome Explorer
webmaster at softberry.com
Tue Jul 17 13:28:43 EST 2001
Promoters predicted in the December draft of the human genome are presented
in Genome Explorer along with known and predicted genes:
A right mouse click on a promoter in Genome Explorer reveals the promoter
sequence, presented in two blocks for TATA+ promoters and in one block for
TATA-less promoters. The first block of a TATA+ promoter is the TATA-box, and
the second is the stretch from the predicted transcription start site (TSS)
to the known 5'-end of the mRNA or the translation start site.
Promoters were predicted by the Softberry promoter prediction program TSSW in
regions up to 3000 bp from known starts of coding regions (ATG codon) or known
mapped 5'-mRNA ends. We found that limiting the promoter search to such
regions drastically reduces false positive predictions. We also use very
strict thresholds for prediction of TATA-less promoters to minimize false
positive predictions.
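The restriction of the search region can be sketched as a small coordinate calculation. This is only an illustration, not the TSSW implementation: the function name `upstream_window`, the 1-based inclusive coordinates, and the assumption that the window lies entirely upstream of the anchor (the ATG or mapped 5'-mRNA end) are all hypothetical.

```python
def upstream_window(anchor, strand, max_dist=3000, chrom_length=None):
    """Return (start, end), 1-based inclusive, of the region searched
    for a promoter, or None if the region is empty.

    anchor       -- position of the ATG or mapped 5'-mRNA end (assumed input)
    strand       -- "+" or "-"
    max_dist     -- maximum distance from the anchor (3000 in the text)
    chrom_length -- optional sequence length used to clip the window
    """
    if strand == "+":
        # Upstream means lower coordinates on the forward strand.
        start = max(1, anchor - max_dist)
        end = anchor - 1
    elif strand == "-":
        # Upstream means higher coordinates on the reverse strand.
        start = anchor + 1
        end = anchor + max_dist
        if chrom_length is not None:
            end = min(end, chrom_length)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (start, end) if start <= end else None


print(upstream_window(5000, "+"))
print(upstream_window(5000, "-", chrom_length=6000))
```

Predictions falling outside the returned window would simply not be considered, which is one way to realize the reduction in false positives described above.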
Our promoter prediction software predicts about 50% of promoters with a
small average deviation from the true start site. Such accuracy makes
experimental work with the predicted promoter candidates feasible.
For 20 experimentally verified promoters on Chromosome 22, TSSW predicted
15, placing 12 of them within the (-150,+150) region around the true TSS and
6 (30% of all promoters) within the (-8,+2) region around the true TSS.
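The Chromosome 22 figures can be turned into percentages of the 20 verified promoters with a few lines of arithmetic. The sketch below uses only the counts quoted above; the variable and function names are illustrative.

```python
# Counts quoted in the text for Chromosome 22.
TOTAL_VERIFIED = 20

counts = {
    "predicted": 15,
    "within (-150,+150) of true TSS": 12,
    "within (-8,+2) of true TSS": 6,
}


def percent_of_verified(count, total=TOTAL_VERIFIED):
    """Percentage of the verified promoters represented by `count`."""
    return 100.0 * count / total


for label, count in counts.items():
    pct = percent_of_verified(count)
    print(f"{label}: {count}/{TOTAL_VERIFIED} = {pct:.0f}%")
```

This gives 75% predicted, 60% within 150 bp, and 30% within the (-8,+2) window.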
These results are significantly better than those obtained with the
PromoterInspector program (Scherf M., Klingenhoff A., Frech K. et al. (2001)
First Pass Annotation of promoters of human chromosome 22. Genome Res., 11,
333-340), where only 50% of promoters from the same sample were found, with
deviations from the true TSS ranging from 200 to 1000 bp.
We predicted 17632 TATA+ promoters and 2383 TATA-less promoters overall
in the human genome draft. For Chromosome 22, we predicted 350 TATA+
promoters and 85 TATA-less promoters.
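As a quick consistency check on these counts, the totals and the share of TATA-less promoters can be computed directly. The sketch uses only the numbers stated above; the names `genome`, `chr22`, and `summarize` are illustrative.

```python
# Promoter counts quoted in the text.
genome = {"TATA+": 17632, "TATA-less": 2383}
chr22 = {"TATA+": 350, "TATA-less": 85}


def summarize(counts):
    """Return (total promoters, percent TATA-less rounded to 0.1)."""
    total = sum(counts.values())
    share = 100.0 * counts["TATA-less"] / total
    return total, round(share, 1)


print("whole genome:", summarize(genome))
print("chromosome 22:", summarize(chr22))
```

The genome-wide total is 20015, which matches the roughly 20000 promoters in the title; TATA-less promoters make up about 11.9% genome-wide and about 19.5% on Chromosome 22.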
New Fgenesh++ gene predictions for the December draft of the human genome
are presented by Softberry Inc. (www.softberry.com) at
http://genome.cse.ucsc.edu/goldenPath/decTracks.html and will soon be
presented in Softberry Genome Explorer with some expression data.

The 44409 genes include 5883 genes corresponding to RefSeq mRNAs, 3592 genes
corresponding to GenBank mRNAs, 2047 known genes and 302 pseudogenes.
Methods of prediction are described in:
Solovyev V.V. (2001) Statistical approaches in Eukaryotic gene prediction. In:
Handbook of Statistical Genetics (eds. Balding D. et al.), John Wiley & Sons,
Ltd., p. 83-127.