The informatics group at TIGR is pleased to
announce Release 4.0 of the
Arabidopsis genome annotation. Here are some statistics for the new release:
5 chromosomes now totaling 119.0 Mbp
An updated tiling path now consists of 1,611 BACs (includes PCR
products and MIPS-contigs) with gaps only in centromeric regions. New BACs
have been added, primarily in centromeric regions. The current tiling path
and associated information can be viewed at:
Contains 29,388 annotated genes
27,170 protein-coding genes
In addition to re-evaluating all gene models supported by full-length cDNAs,
all available EST evidence has now been used to validate and update the
annotation, including the construction of gene models that produce alternate
1,267 genes now have alternate splice isoforms that generate a total of 2,678
A new set of web pages
provides access to the alignment data supporting these alternate splicing
There are now 12,053 gene models supported by FL-cDNAs, the majority of
which (11,244) also have EST support.
An additional 4,785 genes have EST support only.
17,069 genes have annotated untranslated regions (UTRs).
6,732 genes have a total of 12,454 GO-assignments, contributed by both TIGR
Changes since release 3.0:
444 new protein-coding genes of which 233 are annotated as pseudogenes
2,197 gene model updates produced different protein sequences (not including
snRNA gene annotations have been added
Other value additions include documentation of non-consensus splice sites,
updating of the genome segmental duplication pages,
and an inventory of missing genes (cDNAs not in the current tiling path) at
The annotation data are provided on the TIGR FTP site in XML format.
FASTA sequences are provided for proteins, CDS sequences, and the unspliced
transcript sequences for each gene, as well as the BACs and the newly
constructed chromosome sequences. Tentative cDNA sequences are made
available as well, with UTR regions represented in lower case so that they
are easily differentiated from the CDS sequence.
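
For readers who want to work with the cDNA FASTA files programmatically, the
following is a minimal Python sketch of how the case convention could be used
to separate UTRs from the CDS. It assumes that the UTRs are the lower-case
portions and the CDS is a single upper-case block between them; the function
names and the example record are hypothetical and are not part of the TIGR
distribution.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple


class SplitTranscript(NamedTuple):
    header: str
    utr5: str  # lower-case run before the CDS (may be empty)
    cds: str   # upper-case coding sequence
    utr3: str  # lower-case run after the CDS (may be empty)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Assumed layout: optional lower-case 5' UTR, upper-case CDS,
# optional lower-case 3' UTR.
_LAYOUT = re.compile(r"^([a-z]*)([A-Z]+)([a-z]*)$")


def split_utrs(header: str, seq: str) -> SplitTranscript:
    """Split a cDNA sequence into 5' UTR, CDS and 3' UTR by letter case."""
    match = _LAYOUT.match(seq)
    if match is None:
        raise ValueError(f"unexpected case layout in record {header!r}")
    return SplitTranscript(header, *match.groups())


# Hypothetical record, for illustration only.
example = """>hypothetical_gene.1
acgtATGGCC
TAAttga
"""
for name, sequence in read_fasta(example.splitlines()):
    print(split_utrs(name, sequence))
```

Records that do not follow the assumed layout (for example, mixed case inside
the CDS) raise an error rather than being split silently, so any departure
from the convention shows up immediately.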
The data can be found at the following locations:
Chromosome XML and tiling path information:
BAC annotations in XML
FASTA sequences <ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/SEQUENCES/>
I would like to personally thank everyone in the annotation and data
management group for their hard work.
The Institute for Genomic Research
9712 Medical Center Drive
Rockville, MD 20850.
Office phone: 301-838-3523
To page me at TIGR: 301-838-0200
Home phone: 301-990-0878