Swiss-Prot release 42.0 available

Elisabeth Gasteiger Elisabeth.Gasteiger at
Mon Oct 13 10:59:18 EST 2003


Name        : Swiss-Prot
Description : Protein knowledgebase.
Release     : 42.0 of October 2003
Statistics  : 135'850 fully annotated sequences, 50'046'799 amino acids,
               109'694 references.
Citation    : Boeckmann B., Bairoch A., Apweiler R., Blatter M.-C.,
               Estreicher A., Gasteiger E., Martin M.J., Michoud K.,
               O'Donovan C., Phan I., Pilbout S. and Schneider M.
               The Swiss-Prot protein knowledgebase and its supplement
               TrEMBL in 2003.
               Nucleic Acids Res. 31:365-370(2003).
Availability: FTP:



Note: a much more complete description of the changes and future
developments is available from the release notes. The release notes
can be accessed from the WWW at the address:


or downloaded by FTP from:

A) Summary of the changes in Swiss-Prot release 42

In Swiss-Prot:

- Release 42.0 of Swiss-Prot contains 135'850 sequence entries,
   comprising 50'046'799 amino acids abstracted from 109'694 references.
   13'374 sequences have been added since release 41, the sequence data
   of 1'298 existing entries has been updated and the annotations of
   45'617 entries have been revised. The number of sequence entries has
   thus grown by about 11% since release 41.

- In order to handle the large amount of "raw" data coming from
   microbial genomic sequencing, the High-quality Automated and Manual
   Annotation of microbial Proteomes (HAMAP) project was initiated. It
   aims to annotate a significant percentage of proteins which originate
   from microbial genome sequencing projects. There are currently 146
   complete proteomes in Swiss-Prot and TrEMBL. The HAMAP web site was
   enhanced with many new features:

- The Human Proteomics Initiative (HPI) project is progressing. There
   are currently 10'159 annotated human sequences in Swiss-Prot. Up-to-
   date detailed statistics concerning the HPI project, as well as
   detailed project information, are available at:

- A new format was introduced for "CC ALTERNATIVE PRODUCTS" lines. The
   new format is more structured than the previous one. Associated
   with this change are the introduction of stable identifiers for
   each named splice isoform in all entries that describe more than one
   splice isoform, and the extension of feature identifiers, previously
   used only for human VARIANT and certain CARBOHYD features, to
   VARSPLIC features in entries from all species.

- We have revised the annotation of post-translationally modified amino
   acids in lipoproteins, and made a major overhaul of the controlled
   vocabulary. Lipid annotation that was previously covered by feature
   (FT) keys other than LIPID (e.g. cholesterol-binding) has been moved
   to the LIPID key.
   The controlled vocabulary for the feature descriptions of 'LIPID' FT
   lines can be found in the user manual:

- The feature key 'CROSSLNK' has been introduced to describe bonds
   between amino acids, which are formed post-translationally within a
   peptide or between peptides, such as isopeptidic bonds, carbon-carbon
   linkages, carbon-nitrogen linkages, thioether bonds, thiolester
   bonds, and backbone condensations. The feature keys 'THIOETH' and
   'THIOLEST' have been removed. The controlled vocabulary for the
   'CROSSLNK' key can be found in the user manual.

- A new comment (CC) topic 'ALLERGEN' has been introduced to convey
   information relevant to allergenic proteins.

- We have added cross-references from Swiss-Prot to the following
   databases: GermOnline, GK, GO and PIRSF.

- All the recent changes to Swiss-Prot format are described in detail
   in the continuously updated document:

- The ExPASy WWW server was the target of many improvements that are
   all described at the address:
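
For users whose scripts still look for the removed 'THIOETH' and
'THIOLEST' feature keys, the change described above can be sketched as
follows. This is only a minimal illustration, not an official tool: it
assumes that the feature key occupies columns 6-13 of an FT line and
that continuation lines leave those columns blank, and it treats both
removed keys as superseded by 'CROSSLNK'. The sample entry and the
names RETIRED_KEYS, feature_keys and normalize_key are hypothetical.

```python
# Feature keys removed in release 42.0, mapped to the key that is
# assumed here to supersede them.
RETIRED_KEYS = {"THIOETH": "CROSSLNK", "THIOLEST": "CROSSLNK"}


def feature_keys(entry_text):
    """Yield the key of every FT line that starts a new feature."""
    for line in entry_text.splitlines():
        if not line.startswith("FT"):
            continue
        # Assumption: the key sits in columns 6-13; continuation lines
        # leave these columns blank and are skipped.
        key = line[5:13].strip()
        if key:
            yield key


def normalize_key(key):
    """Map a removed key to its assumed replacement; keep other keys."""
    return RETIRED_KEYS.get(key, key)


# Hypothetical sample lines, made up for this illustration.
SAMPLE = (
    "ID   EXAMPLE_ENTRY\n"
    "FT   THIOETH      12     16       Hypothetical description.\n"
    "FT   CROSSLNK     27     30       Hypothetical description\n"
    "FT                                continued on a second line.\n"
    "FT   LIPID         1      1       Hypothetical description.\n"
)

keys = [normalize_key(k) for k in feature_keys(SAMPLE)]
print(keys)
```

Keeping the mapping in a single table means that only RETIRED_KEYS has
to be edited if further keys are retired in a later release.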

B) Future developments

Forthcoming format changes are listed in the continuously updated
document:

Swiss-Prot is copyright. It is produced through a collaboration between the
Swiss Institute of Bioinformatics and the EMBL Outstation - the European
Bioinformatics Institute. There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and for
commercial entities requires a license agreement. For information about the
licensing scheme see: or send an email to
license at

