TIGR releases latest Arabidopsis Annotation (Version 5.0)
Town, Christopher
cdtown at tigr.org
Wed Jan 28 22:45:32 EST 2004
We are pleased to announce to the plant community that TIGR has
released its final contribution to the Arabidopsis genome
reannotation, ATH1 version 5.0. This has been provided to NCBI and to
TAIR. Both the sequence and the annotation data are available from
the TIGR ftp site at:
ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/
The PSEUDOMOLECULES directory contains the fully annotated chromosome
sequences in XML format along with associated data. The SEQUENCES
directory contains just the sequence files for chromosomes, CDS,
proteins, etc. TIGR annotation for individual BACs, in XML format, is
in the BACS directory.
Release 5.0 provides annotation for 26,207 protein-coding genes and
3,786 pseudogenes. Nearly 15,000 of
the protein-coding gene structures are supported by full-length cDNA
alignments. Also, this release provides our most comprehensive
annotation for alternative splicing isoforms in the Arabidopsis
genome: just over 2,000 genes are annotated with alternatively
spliced transcripts based on EST and full-length cDNA alignments.
Over the course of this three-year process, we have revisited both the
structural and functional annotations of essentially every gene in
the genome. Over 18,000 transcripts are annotated with upstream
and/or downstream untranslated region (UTR) annotations. In the
SEQUENCES directory, the ATH1.cdna FASTA file contains the tentative
cDNA sequences for protein-coding genes, with UTRs provided in
lowercase and the protein-coding region (CDS) provided in uppercase.
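The case convention described above makes it straightforward to recover the UTRs and the CDS from each cDNA record. The following is only a minimal Python sketch, not a TIGR tool: the function names read_fasta and split_utr_cds are hypothetical, and it assumes each sequence consists of an optional lowercase 5' UTR, an uppercase CDS, and an optional lowercase 3' UTR, with no other characters. The header layout of the real ATH1.cdna file is not assumed; headers are passed through unchanged.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_utr_cds(seq: str) -> Dict[str, str]:
    """Split a cDNA into 5' UTR, CDS, and 3' UTR using letter case.

    Assumes lowercase UTRs flanking a single uppercase CDS; raises
    ValueError for any sequence that does not fit that pattern.
    """
    match = re.fullmatch(r"([a-z]*)([A-Z]+)([a-z]*)", seq)
    if match is None:
        raise ValueError("sequence is not lowercase-UPPERCASE-lowercase")
    return {"utr5": match.group(1), "cds": match.group(2), "utr3": match.group(3)}


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from ATH1.cdna.
    example = [">example_transcript", "acgtATGGCC", "TAAggc"]
    for name, sequence in read_fasta(example):
        print(name, split_utr_cds(sequence))
```

To run this against the real file, one would pass an open file handle, e.g. read_fasta(open("ATH1.cdna")), assuming the file has been downloaded locally under that name.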
The following TIGR web resources have recently been updated with the
release 5.0 annotation data:
-Genes found within segmentally duplicated regions of the Arabidopsis
genome:
http://www.tigr.org/tdb/e2k1/ath1/Arabidopsis_genome_duplication.shtml
-Genes found within tandem gene duplications:
http://www.tigr.org/tdb/e2k1/ath1/TandemDups/TandemGenes.html
-User-customizable interface that permits retrieval of sets of
Arabidopsis genes based on the type of physical evidence support,
including matches to other Arabidopsis proteins, non-Arabidopsis
proteins, Arabidopsis ESTs, other plant ESTs, and full-length
cDNAs:
http://www.tigr.org/tigr-scripts/e2k1/arab_gene_phys_ev_classification.cgi
Additional resources and links to the pages above can be found on our
TIGR Arabidopsis Annotation Database (ATH1) page at:
http://www.tigr.org/tdb/e2k1/ath1/ath1.shtml
We thank TAIR, MIPS and individual community members for their input,
assistance and feedback over the past three years. We believe that
Arabidopsis is one of the most completely sequenced and best
annotated genomes at this time and hope that these latest annotation
improvements and updated resources will be useful to the community.
On behalf of all the users of the Arabidopsis genome sequence and
annotation, we would like to thank NSF for their continuous and
generous support of this project.
At this point, TAIR will assume responsibility for future public
releases of updated annotation. The ATH1 database, more or less in
its present form, will remain accessible at TIGR for the foreseeable
future.
Please send comments or questions to
arabhelp at tigr.org