ANNOUNCING PFAM RELEASE 6.0

K Howe klh at sanger.ac.uk
Fri Feb 9 07:54:47 EST 2001


   PFAM : Multiple alignments and profile HMMs of protein domains
                             RELEASE 6.0
                --------------------------------------

1. INTRODUCTION

  Pfam is a collection of protein family alignments which were constructed
  semi-automatically using hidden Markov models (HMMs).  Sequences that were
  not covered by Pfam were clustered and aligned automatically, and are
  released as Pfam-B.  Pfam families have permanent accession numbers and
  contain functional annotation and cross-references to other databases,
  while Pfam-B families are re-generated at each release and are unannotated.
  
  See http://www.sanger.ac.uk/Software/Pfam/
      http://pfam.wustl.edu/
      http://www.cgr.ki.se/Pfam/
  
2. STATISTICS

                            Pfam                         Pfam-B
                    -----------------------      -----------------------
  Release   Date  families sequences residues  families sequences residues   Source
  -------  -----  -------- --------- --------  -------- --------- --------  ---------
       
    0.2    01/96       100     10431  2246421     11763     32081  9200334  Swiss 32
    1.0    04/96       175     15610  3560959     11929     31931  8957230  Swiss 33
    2.0    03/97       527     28170  6770529     13289     31349  8224614  Swiss 34
    2.1    10/97       527     28205  6790960     13289     31349  8224614  Swiss 34
    3.0    06/98       806     99043 22766133     33550     79544 20648530  Swiss 35 + SP-TrEMBL 5
    3.1    09/98      1313    114750 27573470     33550     79544 20648530  Swiss 35 + SP-TrEMBL 5
    3.2    10/98      1344    115155 27689081     33550     79544 20648530  Swiss 35 + SP-TrEMBL 5
    3.3    12/98      1390    119420 28085438     33550     79544 20648530  Swiss 35 + SP-TrEMBL 5
    3.4    01/99      1407    119963 28343136     33550     79544 20648530  Swiss 35 + SP-TrEMBL 5
    4.0    05/99      1465    147347 34476183    128689    123610 33470292  Swiss 37 + SP-TrEMBL 9
    4.1    07/99      1488    148195 34692597     36739     89640 22510097  Swiss 37 + SP-TrEMBL 9
    4.2    08/99      1664    155979 36683193     40017     99587 24062200  Swiss 37 + SP-TrEMBL 9
    4.3    09/99      1815    161833 37803491     39506     97492 23115975  Swiss 37 + SP-TrEMBL 9
    4.4    11/99      2000    164412 38411490     39200     96055 22552453  Swiss 37 + SP-TrEMBL 9
    5.0    01/00      2008    178110 41516321     39228     96077 22506088  Swiss 38 + SP-TrEMBL 11
    5.1    02/00      2015    179782 41704446     42357    103709 24762358  Swiss 38 + SP-TrEMBL 11
    5.2    03/00      2128    181068 42018555     42163    102843 24471000  Swiss 38 + SP-TrEMBL 11
    5.3    05/00      2216    183695 42512479     41974    102024 23952537  Swiss 38 + SP-TrEMBL 11
    5.4    06/00      2290    185251 42659663     41885    101728 23774015  Swiss 38 + SP-TrEMBL 11
    5.5    09/00      2478    190302 43837632     41232     99302 22716640  Swiss 38 + SP-TrEMBL 11
    6.0    01/01      2697    258321 59332756     40681     96571 21789591  Swiss 39 + SP-TrEMBL 14

3. DESCRIPTION OF CHANGES MADE SINCE RELEASE 5.5


  Pfam 6.0 is based on Swiss-Prot 39 and SP-TrEMBL 14 sequences.
  These databases can be accessed from

    ftp://ftp.ebi.ac.uk/pub/databases/swissprot/release/
    ftp://ftp.ebi.ac.uk/pub/databases/trembl/


  Release 6.0 contains 226 new families since the last release.

  Pfam now includes active site residues in the multiple sequence
  alignments.  These have been derived from the Swiss-Prot feature
  table. Below is an example line, where an asterisk marks the active
  site position.

    #=GR ODP2_AZOVI/418-637 AS        ...........*..............
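
  As a rough illustration only (not part of the Pfam distribution), the
  Python sketch below reads an AS markup line like the one above and
  reports which alignment columns carry an asterisk. The function names
  and the choice of '.' and '-' as gap characters are assumptions made
  for this example; columns are counted from 0.

```python
def active_site_columns(gr_line):
    """Return (sequence name, 0-based alignment columns marked '*')."""
    parts = gr_line.split()
    # Expected layout: "#=GR", "<name>/<start>-<end>", "AS", "<markup>"
    if len(parts) != 4 or parts[0] != "#=GR" or parts[2] != "AS":
        raise ValueError("not a #=GR active-site line")
    name, markup = parts[1], parts[3]
    return name, [i for i, ch in enumerate(markup) if ch == "*"]


def residue_numbers(aligned_seq, start, columns, gaps=".-"):
    """Map alignment columns to residue numbers in the source sequence.

    aligned_seq is the alignment row for the same sequence and start is
    the first residue number (418 in the example name). Columns that
    fall on a gap map to None.
    """
    position = start - 1
    column_to_residue = {}
    for i, ch in enumerate(aligned_seq):
        if ch not in gaps:  # assumed gap characters
            position += 1
            column_to_residue[i] = position
    return [column_to_residue.get(c) for c in columns]


line = "#=GR ODP2_AZOVI/418-637 AS        ...........*.............."
name, columns = active_site_columns(line)
print(name, columns)
```

  The markup gives alignment columns, not residue numbers, so the
  second function is needed whenever the aligned row contains gaps.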

  A new database link to the LOAD database has been added, for example:
  
    #=GF DR   LOAD; ku;

  We are grateful to the many people who contributed data:
  Laurence Etwiller, Rob Finn, Matthew Bashton, Chris Ponting, Peer Bork,
  Joerg Schultz, Richard Copley, Tim Dudgeon, Harold Hutter, Anton
  Enright, as well as many others.


4. FUTURE FORMAT CHANGES

  There are no planned format changes for the next release.

5. DESCRIPTION OF RELEASE FILES

  relnotes.txt      - This file.
  userman.txt       - A fuller description of Pfam fields.
  Pfam-A.full       - Annotation and full alignments in Pfam format of all Pfam-A families.
  Pfam-A.seed       - Annotation and seed alignments in Pfam format of all Pfam-A families.
  Pfam-B            - All Pfam-B families.
  swissPfam         - Pfam domain organisation of all Swiss-Prot proteins.
  Pfam              - All Pfam-A HMMs in an HMM library searchable with the hmmpfam program.
  PfamFrag          - All Pfam-A HMMs in fs (fragment search) mode in an HMM library
                      searchable with the hmmpfam program.
  diff              - A list of files for each family that have changed since the last
                      release.
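
  For orientation, a search of a sequence file against the Pfam library
  might be launched as sketched below. This is an assumption-laden
  example, not a supported interface: it presumes the HMMER 2 hmmpfam
  program is installed and on the PATH and is called as
  "hmmpfam <library> <sequence file>", and the file name query.fa is
  hypothetical.

```python
import shutil
import subprocess


def hmmpfam_command(library, seqfile):
    """Build the argument list for an hmmpfam search (no options set)."""
    return ["hmmpfam", library, seqfile]


def run_hmmpfam(library, seqfile):
    """Run hmmpfam and return its text output.

    Returns None when the hmmpfam program cannot be found on the PATH.
    """
    if shutil.which("hmmpfam") is None:
        return None
    result = subprocess.run(
        hmmpfam_command(library, seqfile),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

  Calling run_hmmpfam("Pfam", "query.fa") would search the standard
  library; passing "PfamFrag" instead would use the fragment-mode HMMs.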


6. DESCRIPTION OF FIELDS

  Compulsory fields:
  ------------------

  AC   Accession number:             Accession number in form PFxxxxx or PBxxxxxx.
  ID   Identification:               One-word name for family.
  DE   Definition:                   Short description of family.
  AU   Author:                       Authors of the entry.
  AL   Alignment method of seed:     The method used to align the seed members.
  SE   Source of seed:               The source suggesting the seed members belong to one family.
  GA   Gathering method:             Search threshold to build the full alignment.
  TC   Trusted Cutoff:               Lowest sequence score and domain score of a match in the full alignment.
  NC   Noise Cutoff:                 Highest sequence score and domain score of a match not in the full alignment.
  SQ   Sequence:                     Number of sequences in alignment.
  //                                 End of alignment.


  Optional fields:
  ----------------

  DC   Database Comment:             Comment about database reference.
  DR   Database Reference:           Reference to external database.
  RC   Reference Comment:            Comment about literature reference.
  RN   Reference Number:             Reference Number.
  RM   Reference Medline:            Eight-digit Medline UI number.
  RT   Reference Title:              Reference Title.
  RA   Reference Author:             Reference Author.
  RL   Reference Location:           Journal location.
  PI   Previous identifier:          Record of all previous ID lines.
  KW   Keywords:                     Keywords.
  CC   Comment:                      Comments.
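
  The field codes above lend themselves to a simple reader. The Python
  sketch below is an illustration, not an official parser: it assumes
  that annotation lines carry the "#=GF" prefix seen in the DR example
  in section 3, and the sample entry (accession PF00000, name
  example_fam) is invented for the demonstration.

```python
from collections import defaultdict

# Two-letter codes listed as compulsory in section 6.
# The "//" terminator is handled separately below.
COMPULSORY = ("AC", "ID", "DE", "AU", "AL", "SE", "GA", "TC", "NC", "SQ")


def parse_gf_fields(lines):
    """Collect '#=GF <code> <value>' lines of one entry into a dict of lists."""
    fields = defaultdict(list)
    for line in lines:
        if line.startswith("//"):  # end of alignment
            break
        if not line.startswith("#=GF"):
            continue  # alignment rows and other markup are skipped
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        value = parts[2].strip() if len(parts) == 3 else ""
        fields[parts[1]].append(value)
    return dict(fields)


def missing_compulsory(fields):
    """Return the compulsory codes that are absent from a parsed entry."""
    return [code for code in COMPULSORY if code not in fields]


# Invented sample entry, for illustration only.
sample = [
    "#=GF AC   PF00000",
    "#=GF ID   example_fam",
    "#=GF DR   LOAD; ku;",
    "//",
]
fields = parse_gf_fields(sample)
print(fields["AC"], missing_compulsory(fields))
```

  Values are kept as lists because codes such as DR, CC and the
  reference fields can occur more than once in an entry.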


7. REFERENCES


  Papers on Pfam are listed below:

  i)   Sonnhammer ELL, Eddy SR, Durbin R. Proteins: Structure, 
       Function and Genetics 28:405-420 (1997).

  ii)  Sonnhammer ELL, Eddy SR, Birney E, Bateman A, Durbin R.
       Nucleic Acids Research 26:320-322 (1998).

  iii) Bateman A, Birney E, Durbin R, Eddy SR, Finn RD, Sonnhammer ELL.
       Nucleic Acids Research 27:260-262 (1999).

  iv)  Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer ELL.
       Nucleic Acids Research 28:263-266 (2000).
 
  We suggest that you reference the most recent paper.


8. COPYRIGHT NOTICE

Pfam - A database of protein domain family alignments and HMMs
Copyright (C) 1996-2001 The Pfam consortium.

This database is free; you can redistribute it and/or modify it under
the terms of the GNU Library General Public License as published by
the Free Software Foundation; either version 2 of the License, or (at
your option) any later version.

In summary, you are free to redistribute *verbatim* copies of Pfam or
any Pfam files in any way you like, including packaging Pfam in
proprietary software, so long as your copy of Pfam retains our
copyright notice and the GNU license. You may also make *modified*
copies of Pfam and distribute them, but your derivative database must
be freely distributed under the GNU LGPL. Many academic freeware
licenses prohibit any form of commercial use. In contrast, the intent
of our license is that Pfam should be freely available to both
industrial and academic researchers, including the use of the Pfam
database in commercial software; however, proprietary modifications of
the Pfam database itself are prohibited. Proprietary modification of
the Pfam database is possible only by a separate formal licensing
agreement from the Pfam consortium and our host institutions. See the
file GNULICENSE for the full text of the GNU Library General Public
License.

This database is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Library General Public License for more details.

You may also obtain a copy of the GNU LGPL by writing to the Free
Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.

Pfam is maintained by a consortium of researchers. You can contact
the Pfam consortium at:
        pfam-admin at sanger.ac.uk

The current members of the Pfam consortium are:
Alex Bateman, Ewan Birney, Kevin Howe, Lorenzo Cerutti, Richard Durbin,
Mhairi Marshall, Sam Griffiths-Jones: The Sanger Centre, UK.
Ewan Birney, Laurence Etwiller: The European Bioinformatics Institute, UK.
Lorenzo Cerutti: INRA, Station d'Amelioration des Plantes, France.
Erik Sonnhammer, Christian Storm, Michael Asman: Karolinska Institute, Sweden.
Sean Eddy, Ajay Khanna, Christian Zmasek: Washington University, St Louis, USA.
___________________
The Pfam Consortium
February 2001









