Release 23 of TREMBL, a protein sequence database complementing SWISS-PROT

Maria Jesus Martin martin at
Wed Mar 5 19:11:49 EST 2003


TrEMBL is a computer-annotated protein sequence database
complementing the Swiss-Prot Protein Knowledgebase.
TrEMBL contains the translations of all coding sequences
(CDS) present in the EMBL/GenBank/DDBJ Nucleotide Sequence
Databases and also protein sequences extracted from the
literature or submitted to Swiss-Prot, which are not yet
integrated into Swiss-Prot. Swiss-Prot accession numbers
have been assigned to all TrEMBL entries that should
eventually be upgraded to the standard Swiss-Prot
quality.


This TrEMBL release has been produced in synch with
Swiss-Prot release 41. It was created from the EMBL
Nucleotide Sequence Database release 73 and contains
921'952 entries and 40'914'860 amino acids.

TrEMBL is split into two main sections:
SP-TrEMBL (Swiss-Prot TrEMBL) contains the entries
(830'525) that should eventually be incorporated into
Swiss-Prot. Swiss-Prot accession numbers have been
assigned to all SP-TrEMBL entries.

SP-TrEMBL is organized in subsections:

arc.dat (Archaea):                          1736 entries
arp.dat (Complete Archaeal proteomes):     31625 entries
fun.dat (Fungi):                           15977 entries
hum.dat (Human):                           34880 entries
inv.dat (Invertebrates):                   79680 entries
mam.dat (Other Mammals):                   12223 entries
mhc.dat (MHC proteins):                     8813 entries
org.dat (Organelles):                      73538 entries
phg.dat (Bacteriophages):                   6448 entries
pln.dat (Plants):                          80929 entries
pro.dat (Prokaryotes):                     79736 entries
prp.dat (Complete Prokaryotic Proteomes): 181432 entries
rod.dat (Rodents):                         40143 entries
unc.dat (Unclassified):                      331 entries
vrl.dat (Viruses):                         82490 entries
vrt.dat (Other Vertebrates):               14889 entries
vrv.dat (Retroviruses):                    85655 entries

107'123 new entries have been integrated in SP-TrEMBL.
The sequences of 1713 SP-TrEMBL entries have been updated
and the annotation has been updated in 252'549 entries.

In the document deleteac.txt, you will find a list of all
accession numbers which were previously present in TrEMBL,
but which have now been deleted from the database.
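
As an illustration of how deleteac.txt might be used, the
sketch below loads the deleted accession numbers into a set
so that locally stored accessions can be checked against it.
The layout of the file is not described in this announcement;
the code assumes one accession number per line and skips every
line that does not look like a six-character accession
number. The function name, the pattern and the local path
are illustrative assumptions, not part of the release.

```python
import re

# Assumption: six-character accession numbers such as Q12345.
# Any line that does not match is treated as header or comment text.
ACCESSION_RE = re.compile(r"^[OPQ][0-9][A-Z0-9]{3}[0-9]$")


def load_deleted_accessions(path):
    """Return the set of accession numbers found in a deleteac.txt-style file.

    Assumes one accession number per line (hypothetical layout).
    """
    deleted = set()
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            token = line.strip()
            if ACCESSION_RE.match(token):
                deleted.add(token)
    return deleted


def find_stale(local_accessions, deleted):
    """Return, in sorted order, the local accessions that have been deleted."""
    return sorted(set(local_accessions) & deleted)
```

A typical use would be to call
load_deleted_accessions("deleteac.txt") once and pass the
result to find_stale() together with the accession numbers
held in a local annotation table.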

REM-TrEMBL (REMaining TrEMBL) contains the entries (91'427)
that we do not want to include in Swiss-Prot.
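
The entry counts quoted in this announcement can be
cross-checked with a few lines of Python. The sketch below
copies the numbers from the text above; the variable and
function names are illustrative only.

```python
# Entry counts copied from the SP-TrEMBL subsection list in this announcement.
SP_TREMBL_SUBSECTIONS = {
    "arc.dat": 1736,
    "arp.dat": 31625,
    "fun.dat": 15977,
    "hum.dat": 34880,
    "inv.dat": 79680,
    "mam.dat": 12223,
    "mhc.dat": 8813,
    "org.dat": 73538,
    "phg.dat": 6448,
    "pln.dat": 80929,
    "pro.dat": 79736,
    "prp.dat": 181432,
    "rod.dat": 40143,
    "unc.dat": 331,
    "vrl.dat": 82490,
    "vrt.dat": 14889,
    "vrv.dat": 85655,
}

SP_TREMBL_TOTAL = 830_525   # stated number of SP-TrEMBL entries
REM_TREMBL_TOTAL = 91_427   # stated number of REM-TrEMBL entries
RELEASE_TOTAL = 921_952     # stated number of entries in the release


def check_release_counts():
    """Compare the summed subsection counts with the stated totals."""
    sp_sum = sum(SP_TREMBL_SUBSECTIONS.values())
    return {
        "sp_sum": sp_sum,
        "sp_matches": sp_sum == SP_TREMBL_TOTAL,
        "release_matches": sp_sum + REM_TREMBL_TOTAL == RELEASE_TOTAL,
    }


if __name__ == "__main__":
    print(check_release_counts())
```

The subsection counts add up to the stated SP-TrEMBL total,
and SP-TrEMBL plus REM-TrEMBL add up to the stated release
total.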


FTP server:
SRS server:

TrEMBL is also available on the SWISS-PROT CD-ROM.
SWISS-PROT + TrEMBL is searchable on the following
servers at the EBI:

Scanps  (
MPSrch  (

For each TrEMBL release, a synchronized version of
the concurrent SWISS-PROT release is distributed at

We also produce every week a complete non-redundant
protein sequence collection by providing three
compressed files in the directory
/pub/databases/sp_tr_nrdb on the EBI FTP server:
sprot.dat.gz, trembl.dat.gz and trembl_new.dat.gz.
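
The compressed files can be read without unpacking them
first. The following is a minimal sketch, assuming that one
of the files has already been downloaded to a local path and
that it uses the usual flat file layout in which every entry
ends with a line containing only "//". The function name and
the choice of encoding are illustrative assumptions.

```python
import gzip


def count_entries(path):
    """Count entries in a gzip-compressed flat file.

    Each entry is assumed to end with a terminator line
    consisting solely of '//'.
    """
    count = 0
    # latin-1 maps every byte to a character, so decoding cannot fail.
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        for line in handle:
            if line.rstrip() == "//":
                count += 1
    return count
```

For example, count_entries("trembl.dat.gz") would return the
number of entries in a local copy of that file. Reading line
by line keeps memory use low even for large files.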

TrEMBL and Swiss-Prot in XML format

A release version of TrEMBL and Swiss-Prot in XML format
is provided with this release of TrEMBL. More information
is available at
and the data can be downloaded from


Maria Jesus Martin, Claire O'Donovan,
Philippe Aldebert, Nicola Althorpe,
Rolf Apweiler, Daniel Barrell, Kirsty Bates,
Paul Browne, Kirill Degtyarenko,
Gill Fraser, Alexander Fedetov, Andre Hackmann,
Alexander Kanapin, Youla Karavidopoulou,
Paul Kersey, Ernst Kretschmann,
Kati Laiho, Minna Lehvaslaiho, Michele Magrane,
Michelle McHale, Virginie Mittard,
Nicola Mulder, John F. O'Rourke,
Sandra Orchard, Astrid Rakow, Kai Runte,
Sandra van den Broek, Eleanor Whitfield and
Allyson Williams at the
EMBL Outstation - European Bioinformatics Institute (EBI)
in Hinxton, UK;
Amos Bairoch, Alexandre Gattiker, Isabelle Phan and
Sandrine Pilbout at
the Swiss Institute of Bioinformatics in Geneva,
Switzerland.

Maria Jesus Martin
Sequence Database Group Coordinator
EMBL Outstation EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD UK
