TIGR Annotation Release 4.0

Town, Christopher D. cdtown at tigr.org
Fri Apr 18 16:17:00 EST 2003


The informatics group at TIGR is pleased to announce Release 4.0 of the
Arabidopsis genome annotation. Here are some statistics for the new release:

5 chromosomes, now totaling 119.0 Mbp
An updated tiling path, now consisting of 1,611 BACs (including PCR
products and MIPS contigs), with gaps only in centromeric regions. New BACs
have been added, primarily in centromeric regions. The current tiling path
and associated information can be viewed at:
http://www.tigr.org/tdb/e2k1/ath1/ath1.shtml

Contains 29,388 annotated genes
	27,170 protein-coding genes
	2,218 pseudogenes

In addition to re-evaluating all gene models supported by full-length cDNAs,
all available EST evidence has now been used to validate and update the
annotation, including the construction of gene models that produce alternative
splice isoforms.
1,267 genes now have alternative splice isoforms, which together encode a
total of 2,678 proteins.
A new set of web pages
(http://www.tigr.org/tdb/e2k1/ath1/altsplicing/splicing_variations.shtml)
provides access to the alignment data supporting these alternate splicing
patterns.

There are now 12,053 gene models supported by FL-cDNAs, the majority of
which (11,244) also have EST support.
An additional 4,785 genes have EST support only.

17,069 genes have annotated untranslated regions (UTRs).

6,732 genes have a total of 12,454 GO assignments, contributed by both TIGR
and TAIR.

Changes since release 3.0:
444 new genes, of which 233 are annotated as pseudogenes
2,197 gene model updates that produced different protein sequences (not
including splice isoforms)

snRNA gene annotations have been added

Other value additions include documentation of non-consensus splice sites
(<http://www.tigr.org/tdb/e2k1/ath1/Arabidopsis_nonconsensus_splice_sites.shtml>),
updated genome segmental duplication pages
(<http://www.tigr.org/tdb/e2k1/ath1/Arabidopsis_genome_duplication.shtml>),
and an inventory of missing genes (cDNAs not in the current tiling path) at
<http://www.tigr.org/tdb/e2k1/ath1/missing_genes.shtml>.

The annotation data are provided on the TIGR FTP site in XML format.
FASTA sequences are provided for proteins, CDS sequences, and the unspliced
transcript sequences for each gene, as well as for the BACs and the newly
constructed chromosome sequences. Tentative cDNA sequences are also
available, with the UTR regions represented in lower case so that they are
easily distinguished from the CDS sequence.
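The lower-case convention for UTRs in the tentative cDNA files makes it straightforward to separate UTRs from the CDS programmatically. Below is a minimal Python sketch, assuming each FASTA record is a single cDNA in which a lowercase 5' UTR and/or 3' UTR flank one contiguous uppercase CDS. The function names read_fasta and split_utrs are hypothetical, and the actual file names and header layout on the FTP site are not described in this announcement, so check them before relying on this.

```python
import re
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Sequence lines belonging to one record are concatenated; blank
    lines are ignored.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Assumed layout: optional lowercase 5' UTR, uppercase CDS,
# optional lowercase 3' UTR, and nothing else.
_CASE_PARTS = re.compile(r"^([a-z]*)([A-Z]*)([a-z]*)$")


def split_utrs(seq: str) -> Dict[str, str]:
    """Split a case-encoded cDNA into 5' UTR, CDS and 3' UTR.

    Raises ValueError if the sequence does not follow the assumed
    lower/UPPER/lower pattern (e.g. lowercase stretches inside the CDS).
    """
    match = _CASE_PARTS.match(seq)
    if match is None:
        raise ValueError("sequence is not lower/UPPER/lower case-encoded")
    utr5, cds, utr3 = match.groups()
    return {"utr5": utr5, "cds": cds, "utr3": utr3}
```

For example, a hypothetical record whose sequence is acgtATGAAACCCTGAttgca would split into the 5' UTR "acgt", the CDS "ATGAAACCCTGA" and the 3' UTR "ttgca". A single anchored regular expression is used so that sequences with lowercase stretches inside the CDS are rejected rather than silently mis-split.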

The data can be found at the following locations:

Chromosome XML and tiling path information:
<ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/PSEUDOCHROMOSOMES/>

BAC annotations in XML:
<ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/BACS/>

FASTA sequences:
<ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/SEQUENCES/>

I would like to personally thank all members of the annotation and data
management group for their hard work.

_____________________________________
Chris Town
Associate Investigator
The Institute for Genomic Research
9712 Medical Center Drive
Rockville, MD 20850.

Office phone: 301-838-3523
To page me at TIGR: 301-838-0200
Fax: 301-838-0208

Home phone: 301-990-0878

---


More information about the Arab-gen mailing list