TIGR releases latest Arabidopsis Annotation (Version 5.0)

Town, Christopher cdtown at tigr.org
Wed Jan 28 22:45:32 EST 2004


We are pleased to announce to the plant community that TIGR has 
released its final contribution to the Arabidopsis genome 
reannotation, ATH1 version 5.0. This has been provided to NCBI and to 
TAIR. Both the sequence and the annotation data are available from 
the TIGR ftp site at: 
ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/

The PSEUDOMOLECULES directory contains the fully annotated chromosome 
sequences in XML format along with associated data. The SEQUENCES 
directory contains just the sequence files for chromosomes, CDS, 
proteins, etc. The BACS directory contains TIGR annotation for 
individual BACs in XML format.

Release 5.0 provides the annotation for:

26,207 protein-coding genes and 3,786 pseudogenes. Nearly 15,000 of 
the protein-coding gene structures are supported by full-length cDNA 
alignments. This release also provides our most comprehensive 
annotation of alternatively spliced isoforms in the Arabidopsis 
genome: just over 2,000 genes are annotated with alternatively 
spliced transcripts based on EST and full-length cDNA alignments.

During the course of this three-year process, we have revisited both 
the structural and functional annotations of essentially every gene 
in the genome. Over 18,000 transcripts are annotated with upstream 
and/or downstream untranslated regions (UTRs). In the SEQUENCES 
directory, the ATH1.cdna FASTA file contains the tentative cDNA 
sequences for protein-coding genes, with UTRs in lowercase and the 
protein-coding region (CDS) in uppercase.
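
Because letter case marks the boundary between UTR and CDS, each 
record in ATH1.cdna can in principle be split into its parts by case 
alone. The Python sketch below is illustrative only. It assumes 
standard FASTA formatting and that each sequence consists of an 
optional lowercase 5' UTR, one contiguous uppercase CDS, and an 
optional lowercase 3' UTR. The functions read_fasta and split_utr_cds 
are hypothetical helpers, not TIGR tools, and the demo record is made 
up rather than taken from the release.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Assumed layout: lowercase 5' UTR, one uppercase CDS block, lowercase 3' UTR.
_CASE_LAYOUT = re.compile(r"^([a-z]*)([A-Z]+)([a-z]*)$")


def split_utr_cds(seq: str) -> Dict[str, str]:
    """Split a case-annotated cDNA into 5' UTR, CDS and 3' UTR by case."""
    m = _CASE_LAYOUT.match(seq)
    if m is None:
        raise ValueError("sequence does not follow the assumed UTR/CDS case layout")
    utr5, cds, utr3 = m.groups()
    return {"utr5": utr5, "cds": cds, "utr3": utr3}


if __name__ == "__main__":
    # Hypothetical toy record; real ATH1.cdna headers are not described here.
    demo = [">example_transcript", "aacccATGG", "AGTAAggtt"]
    for name, seq in read_fasta(demo):
        print(name, split_utr_cds(seq))
```

To process the real file, opening ATH1.cdna and passing the file 
handle to read_fasta should work under the same assumptions. Records 
that do not follow the assumed case layout raise ValueError rather 
than being silently mis-split.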

The following TIGR web resources have recently been updated using the 
release 5.0 annotation data:

-Genes found within segmentally duplicated regions of the Arabidopsis 
genome: 
http://www.tigr.org/tdb/e2k1/ath1/Arabidopsis_genome_duplication.shtml

-Genes found within tandem gene duplications: 
http://www.tigr.org/tdb/e2k1/ath1/TandemDups/TandemGenes.html

-User-customizable interface that permits retrieval of sets of 
Arabidopsis genes based on the type of physical evidence support, 
including matches to other Arabidopsis proteins, non-Arabidopsis 
proteins, Arabidopsis ESTs, other plant ESTs, and matches to 
full-length cDNAs: 
http://www.tigr.org/tigr-scripts/e2k1/arab_gene_phys_ev_classification.cgi

Additional resources and links to the pages above can be found on our 
TIGR Arabidopsis Annotation Database (ATH1) page at: 
http://www.tigr.org/tdb/e2k1/ath1/ath1.shtml

We thank TAIR, MIPS and individual community members for their input, 
assistance and feedback over the past three years. We believe that 
Arabidopsis is one of the most completely sequenced and best-annotated 
genomes at this time, and we hope that these latest annotation 
improvements and updated resources will be useful to the community.

On behalf of all the users of the Arabidopsis genome sequence and 
annotation, we would like to thank NSF for its continuous and 
generous support of this project.

From this point on, TAIR will assume responsibility for future public 
releases of updated annotation. The ATH1 database, more or less in 
its present form, will remain accessible at TIGR for the foreseeable 
future.

Please send comments or questions to arabhelp at tigr.org