estimated number of ORFs in total?

Keith Bradnam keith at
Thu Jul 27 08:29:39 EST 2000

On 26 Jul 2000, H Hieronymus wrote:

>  Basic question: what are the estimates for total number of ORFs in the
>  Arabidopsis genome? References would be most appreciated.

Popular estimates for the total number of ORFs/genes are in the region of
20,000-25,000.  The lower figure is perhaps more common in older reports, e.g.

but as some of the recently sequenced eukaryotic genomes have shown, e.g.
yeast, many genes are only properly identified following completion of the

The Arabidopsis Information Resource (TAIR) holds details on about 1500
clones which provide coverage of the genome.  Nearly all of these clones
have been sequenced (the genome sequence - according to TAIR's AGI
statistics as of today - is 92.8% complete) and released to the public
databases.  The remainder are currently being sequenced, and will shortly
be released to the databases.

The Arabidopsis Genome Resource (AGR) is updated daily from EMBL.  As of
this morning we have 1,124 sequences associated with the genome (i.e. AGI
sequences).  The number of predicted open
reading frames in these sequences is 15,880.  However, many of the AGI
sequences have not yet been subjected to ORF predictions and so about 300
of the 1,124 have (as yet) no predicted ORFs.


In about 800 AGI sequences (those which have had gene/ORF predictions)
there are 15,880 predicted ORFs, or roughly 20 per sequence.  This suggests
that the number of genes in the 1,500 or so AGI sequences might be quite a
bit higher than 25,000 (nearer 30,000 at that rate).
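
The extrapolation above is simple enough to redo whenever the counts
change.  Here is a minimal Python sketch of it; the function name is made
up for illustration, and the inputs are just the round figures quoted in
this message (15,880 ORFs, about 800 annotated sequences, about 1,500
sequences in total), so treat the result as a rough guide only.

```python
def extrapolate_orfs(predicted_orfs, annotated_seqs, total_seqs):
    """Scale the ORF count seen so far up to the full set of sequences.

    Assumes every sequence carries the same average number of ORFs,
    which ignores differences in clone length.
    """
    orfs_per_seq = predicted_orfs / annotated_seqs
    return orfs_per_seq * total_seqs


# Round figures from this message (assumed, not exact counts)
estimate = extrapolate_orfs(15880, 800, 1500)
print(round(estimate))  # a little under 30,000
```

Because the function takes a flat per-sequence average, it inherits the
clone-length problem: shorter remaining clones would pull the true total
down.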

However, these back-of-napkin calculations don't consider the different
lengths of the AGI clones, i.e. if the remaining clones are shorter (on
average) than those sequenced so far, then my figure is an overestimate.

Furthermore, ORF prediction programs produce just that, *ORF* predictions -
which don't always correspond to real, functional genes.  A better idea of
the real gene complement will be achieved when the 100,000 or so
Arabidopsis ESTs are all matched to the genome sequence.

Even then, just because something is transcribed doesn't mean it will
produce a functional protein...but I'm probably getting a bit pedantic


P.S. Of course you could just take the two completed chromosomes (see
Nature, 1999, 402, 761-777) and say that if the (combined) length of these
chromosomes is ~37 Mb and the combined number of genes is ~7,750 then this
corresponds to about 27,000 genes in a 130 Mb genome.
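
The same sum in Python, as a minimal sketch.  The function name is
hypothetical and the inputs are the approximate figures given above
(~7,750 genes in ~37 Mb, 130 Mb genome); it assumes gene density on the
two finished chromosomes is representative of the rest of the genome.

```python
def genes_from_density(genes_seen, sequenced_mb, genome_mb):
    """Extrapolate a genome-wide gene count from observed gene density."""
    genes_per_mb = genes_seen / sequenced_mb
    return genes_per_mb * genome_mb


# Approximate figures from the P.S. above
total = genes_from_density(7750, 37, 130)
print(round(total, -3))  # nearest thousand: about 27,000
```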

