Release 22 of TREMBL, a protein sequence database supplementing SWISS-PROT

Maria Jesus Martin martin at ebi.ac.uk
Mon Nov 4 22:06:46 EST 2002


INTRODUCTION
============

TrEMBL is a computer-annotated protein sequence database
supplementing the SWISS-PROT Protein Knowledgebase. TrEMBL
contains the translations of all coding sequences (CDS)
present in the EMBL/GenBank/DDBJ Nucleotide Sequence
Databases and also protein sequences extracted from the
literature or submitted to SWISS-PROT, which are not yet
integrated into SWISS-PROT. TrEMBL can be considered a
preliminary section of SWISS-PROT. SWISS-PROT accession
numbers have been assigned to all TrEMBL entries that should
eventually be upgraded to standard SWISS-PROT quality.

RELEASE 22.0 OF TrEMBL
======================

This TrEMBL release was created from the EMBL Nucleotide
Sequence Database release 72 and contains 821'014 entries
and 36'790'365 amino acids. To minimize redundancy, the
translations of all coding sequences (CDS) in the EMBL
Nucleotide Sequence Database already included in
SWISS-PROT release 40 and updates until 25.10.02 have
been removed from TrEMBL release 22.

TrEMBL is split into two main sections: SP-TrEMBL and
REM-TrEMBL.

SP-TrEMBL (SWISS-PROT TrEMBL) contains the entries
(734'427) which should eventually be incorporated into
SWISS-PROT. SWISS-PROT accession numbers have been
assigned for all SP-TrEMBL entries.

SP-TrEMBL is organized in subsections:

arc.dat (Archaea):                          1694 entries
arp.dat (Complete Archaeal Proteomes):     32840 entries
fun.dat (Fungi):                           19843 entries
hum.dat (Human):                           39753 entries
inv.dat (Invertebrates):                   84525 entries
mam.dat (Other Mammals):                   11880 entries
mhc.dat (MHC proteins):                     8701 entries
org.dat (Organelles):                      89635 entries
phg.dat (Bacteriophages):                   6585 entries
pln.dat (Plants):                          98105 entries
pro.dat (Prokaryotes):                     86915 entries
prp.dat (Complete Prokaryotic Proteomes): 161638 entries
rod.dat (Rodents):                         32982 entries
unc.dat (Unclassified):                      149 entries
vrl.dat (Viruses):                         85797 entries
vrt.dat (Other Vertebrates):               14095 entries
vrv.dat (Retroviruses):                    82256 entries
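
As a rough way to check the entry count of a subsection file, the
sketch below counts entry terminators. It assumes the subsection
files follow the SWISS-PROT flat-file layout, in which each entry
ends with a line containing only "//". The function name
count_entries and the sample text are illustrative only and are not
part of the release.

```python
import io
from typing import Iterable


def count_entries(lines: Iterable[str]) -> int:
    """Count entries by counting '//' terminator lines (assumed format)."""
    return sum(1 for line in lines if line.rstrip("\r\n") == "//")


# Tiny made-up sample, not real TrEMBL data.
sample = io.StringIO(
    "ID   EXAMPLE1\n"
    "AC   X00001;\n"
    "//\n"
    "ID   EXAMPLE2\n"
    "AC   X00002;\n"
    "//\n"
)
print(count_entries(sample))
```

For a real file, an open file handle can be passed instead of the
sample, for example the handle returned by open("arc.dat").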

72'120 new entries have been integrated in SP-TrEMBL.
The sequences of 4357 SP-TrEMBL entries have been
updated and the annotation has been updated in
334'435 entries.

In the document deleteac.txt, you will find a list
of all accession numbers which were previously
present in TrEMBL, but which have now been deleted from
the database.

REM-TrEMBL (REMaining TrEMBL) contains the entries
(86'587) that we do not want to include in SWISS-PROT.
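
The section sizes quoted in this release note can be cross-checked
against the overall entry count. The short sketch below does this
arithmetic; the helper parse_count is a hypothetical name introduced
here to handle the apostrophe separators used in this document.

```python
def parse_count(text: str) -> int:
    """Convert a count written with apostrophe separators, e.g. 821'014."""
    return int(text.replace("'", ""))


# Figures as quoted in this release note.
total = parse_count("821'014")       # all TrEMBL entries
sp_trembl = parse_count("734'427")   # SP-TrEMBL entries
rem_trembl = parse_count("86'587")   # REM-TrEMBL entries

# The two main sections together account for every entry.
assert sp_trembl + rem_trembl == total

print(f"REM-TrEMBL share of all entries: {rem_trembl / total:.1%}")
```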

ACCESS/DATA DISTRIBUTION
========================

FTP server:     ftp.ebi.ac.uk/pub/databases/trembl
SRS server:     http://srs.ebi.ac.uk/

TrEMBL is also available on the SWISS-PROT CD-ROM.
SWISS-PROT + TrEMBL is searchable on the following
servers at the EBI:

FASTA3  (http://www.ebi.ac.uk/fasta33/)
BLAST2  (http://www.ebi.ac.uk/blast2/)
Scanps  (http://www.ebi.ac.uk/scanps/)
MPSrch  (http://www.ebi.ac.uk/MPsrch/)

For each TrEMBL release, a synchronized version of
the concurrent SWISS-PROT release is distributed at
ftp.ebi.ac.uk/pub/databases/trembl/swissprot/

Every week we also produce a complete non-redundant
protein sequence collection, provided as three
compressed files in the directory
/pub/databases/sp_tr_nrdb on the EBI FTP server:
sprot.dat.gz, trembl.dat.gz and trembl_new.dat.gz.
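
For scripted retrieval, the locations of the three weekly files can
be assembled from the directory and file names given above. The
sketch below only builds the URLs and does not contact any server.
It assumes that "the EBI FTP server" is the host ftp.ebi.ac.uk
listed under ACCESS/DATA DISTRIBUTION and that the files are served
over plain FTP; the function name weekly_file_urls is illustrative.

```python
from typing import List
from urllib.parse import urlunsplit

HOST = "ftp.ebi.ac.uk"                    # assumed host, see lead-in
DIRECTORY = "/pub/databases/sp_tr_nrdb"   # directory named in this note
FILES = ("sprot.dat.gz", "trembl.dat.gz", "trembl_new.dat.gz")


def weekly_file_urls() -> List[str]:
    """Return one ftp:// URL per weekly non-redundant file."""
    return [
        urlunsplit(("ftp", HOST, f"{DIRECTORY}/{name}", "", ""))
        for name in FILES
    ]


for url in weekly_file_urls():
    print(url)
```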


TrEMBL IN XML FORMAT
====================

A pre-release version of TrEMBL in XML format has
been developed and is provided with this release
of TrEMBL. More information is available at
http://www.ebi.ac.uk/swissprot/SP-ML and the data
can be downloaded from
ftp://ftp.ebi.ac.uk/pub/databases/trembl/xml


TrEMBL HAS BEEN PREPARED BY:
============================

Maria Jesus Martin, Claire O'Donovan,
Philippe Aldebert, Nicola Althorpe,
Rolf Apweiler, Daniel Barrell, Kirsty Bates,
Paul Browne, Kirill Degtyarenko,
Gill Fraser, Alexander Fedetov, Andre Hackmann,
Henning Hermjakob, Alexander Kanapin,
Youla Karavidopoulou, Paul Kersey, Ernst Kretschmann,
Kati Laiho, Minna Lehvaslaiho, Michele Magrane,
Michelle McHale, Virginie Mittard,
Nicola Mulder, John F. O'Rourke,
Sandra Orchard, Astrid Rakow, Kai Runte,
Sandra van den Broek, Eleanor Whitfield and
Allyson Williams at the
EMBL Outstation - European Bioinformatics Institute (EBI)
in Hinxton, UK;
Amos Bairoch, Alexandre Gattiker, Isabelle Phan and
Sandrine Pilbout at
the Swiss Institute of Bioinformatics in Geneva,
Switzerland.



Maria Jesus Martin
Sequence Database Group Coordinator
EMBL Outstation EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD UK





