TIGR Annotation 5.0 - Pseudogenes and transposons - clarification

Town, Christopher cdtown at tigr.org
Wed Feb 18 16:41:40 EST 2004


Dear Colleagues



In our previous announcement about the gene content of TIGR annotation 
5.0, we failed to point out a change in our approach to 
annotating transposons. We also inadvertently lumped transposons and 
other pseudogenes together, with the result that the apparent number 
of protein-coding genes in this latest annotation is lower than in 
previous releases. The following paragraph attempts to clarify the 
situation.



"Transposons and pseudogenes were the last categories of genes to be 
addressed by the re-annotation process. Many ORFs were originally 
annotated as being similar to transposons or transposon-related 
proteins. However, the majority of these regions are degenerate so 
that it is difficult or impossible to model ORFs across their entire 
extent although shorter ORFs may be contained within the boundaries 
of transposon similarity. Thus the legacy annotation for 
transposon-related sequences consisted of a mixture of genes and 
pseudogenes, only some of which were annotated as transposon-related. 
In the latest release (5.0), we have provided a uniform annotation of 
all transposon-related sequences by searching the entire genome 
against a curated database of transposon sequences and automatically 
applying the corresponding Tn family annotation. The majority of such 
sequences are degenerate and clearly pseudogenes. We have not 
attempted to discriminate between possible complete ORFs and 
pseudogenes in this (transposon) category. There are 2,424 loci 
annotated as transposons in the current release and (in contrast to 
all previous releases) these are no longer included in the count of 
"protein coding genes" nor in that dataset.

In addition to transposon-related sequences, there are approximately 
500 "genuine" pseudogenes that are clearly related to genes of 
identifiable function. Finally, there are ~850 pseudogenes that are 
similar to proteins of no known function from Arabidopsis or other 
species; these may represent degenerate ORFs of hypothetical 
proteins yet to be characterized, or of proteins from either of the 
above categories. Users should note also that the naming of these 
pseudogenes has not been subjected to the same set of uniform 
curation standards that we applied to the full set of non-pseudogenes 
and thus contains a mixture of TIGR and legacy annotation."
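
For readers reconciling gene counts between releases, the figures quoted above can be tallied as in the following minimal sketch. The category labels and the function name are illustrative assumptions, not TIGR identifiers, and the two pseudogene figures are the approximate values given in the text, so the total is approximate as well.

```python
# Locus counts quoted in the paragraph above. The keys are hypothetical
# labels chosen for this sketch; they are not TIGR category names.
NON_CODING_LOCI = {
    "transposon": 2424,                  # exact figure from the text
    "pseudogene_known_function": 500,    # "approximately 500"
    "pseudogene_unknown_function": 850,  # "~850"
}


def tally_excluded(counts):
    """Return the total number of loci across the given categories."""
    return sum(counts.values())


if __name__ == "__main__":
    for category, n in NON_CODING_LOCI.items():
        print(f"{category}: {n}")
    print("approximate total outside the protein-coding set:",
          tally_excluded(NON_CODING_LOCI))
```

This is only arithmetic on the numbers stated in this message; it says nothing about the size of the protein-coding set itself, which is not given here.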



Please feel free to comment or write with questions.



Best wishes



Chris Town

_________________________________

Chris Town

Associate Investigator

The Institute for Genomic Research

9712 Medical Center Drive

Rockville, MD 20850



**********NOTE NEW PHONE NUMBERS**********



Office Phone: 301-795-7523

To page me at TIGR: 301-795-7000

Fax: 301-838-0208



Home Phone: 301-990-0878

Cell Phone: 512-422-8810




