[ANNOUNCE] iProClass: Integrated Protein Classification Database

Cathy Wu wuc at NBRF.Georgetown.Edu
Thu Oct 12 15:08:52 EST 2000

The Protein Information Resource (PIR) is pleased to announce the
beta-release of the iProClass Integrated Protein Classification
Database, freely accessible for sequence search and report retrieval at:


iProClass is an integrated resource that provides comprehensive
family relationships and structural/functional features of proteins,
with rich links to various databases. It currently contains more
than 210,000 non-redundant proteins organized by more than 29,000
superfamilies, 2,600 domains, 1,300 motifs, and 280 post-translational
modification sites, with links to more than 30 databases of protein
families, structures, functions, genes, genomes, literature, and
taxonomy. Protein and superfamily summary reports provide rich
annotations, including membership information with length, taxonomy,
and keyword statistics, full family relationships, comprehensive
enzyme and PDB cross-references, and graphical feature display.
iProClass is implemented in the Oracle 8i object-relational database
system. It can facilitate classification-driven annotation of protein
sequences and complete genomes, and support structural/functional
genomics and proteomics research.

The database has three major features: integration, comprehensiveness,
and annotation, as outlined below. Example protein and superfamily
summary reports are available at:
http://pir.georgetown.edu/iproclass/RPTPex.html and

A. Integration:
  - Integration of superfamily, domain and motif classifications
  - Integration of protein sequence, function, and structural classes
B. Comprehensiveness:
  - Protein Sequence Data: non-redundant PIR and SwissProt proteins
    (>210,000 total, 58% PIR unique, 32% PIR-SwissProt redundant, 10%
    SwissProt unique)
  - Family and Alignment Data: 
    PIR superfamilies (>29,000)
    MIPS families and ProtFam alignments (>100,000)
    PIR homology domains and PIR-ALN alignments (>380)
    Pfam domains (>2250)
    ProSite motifs and ProClass motif alignments (>1300)
    PIR-RESID post-translational modifications (>280)
    PIR-ASDB FASTA similarity clusters of all PIR proteins (>195,000)
  - Cross-References: Links to >30 databases of 
    Protein Sequence: PIR-PSD, SwissProt, TrEMBL, GenPept
    Family: PIR-ASDB, PIR-ALN, MIPS-ProtFam, ProClass, Pfam, ProSite, 
            Blocks, Prints, COG, MetaFam
    Protein Enzyme/Pathway: KEGG, BRENDA, WIT, EcoCyc
    Protein Structure and Structural Class: PDB, SCOP, CATH, PIR-RESID
    Genes/Genome: GenBank/EMBL/DDBJ, TIGR, UWGP, SGD, Flybase, MGI, 
            GDB, OMIM
    Literature: Medline
    Taxonomy: NCBI Taxonomy
C. Annotation: Annotated protein entries and curated sets of PIR 
   superfamilies/homology domains, Pfam domains, ProSite/ProClass 
   motifs, and PIR post-translational modification sites
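
As a rough illustration of the sequence-data breakdown under B, the
sketch below turns the stated percentages into approximate entry counts.
It assumes the announced floor of 210,000 entries as the total, so the
results are lower-bound estimates and not exact database statistics. The
names TOTAL, SHARES, and approximate_counts are hypothetical and are not
part of iProClass.

```python
# Rough lower-bound counts implied by the announced source breakdown.
# TOTAL is the announced floor (">210,000"), not an exact entry count.
TOTAL = 210_000

# Percentages as stated in the announcement.
SHARES = {
    "PIR unique": 58,
    "PIR-SwissProt redundant": 32,
    "SwissProt unique": 10,
}


def approximate_counts(total, shares):
    """Return {category: count}; integer arithmetic avoids float rounding."""
    return {name: total * pct // 100 for name, pct in shares.items()}


counts = approximate_counts(TOTAL, SHARES)
for name, n in counts.items():
    print(f"{name:<26}{n:>8,}")
```

Because the three categories are mutually exclusive and sum to 100%,
the estimated counts add back up to the assumed total.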

The current version of iProClass (beta-release, 10/2000) is based on
the PIR-International Protein Sequence Database (PIR-PSD) Release 66.0
(09/00), SwissProt 39.0 (05/00), TrEMBL 14.0 (06/00), Pfam 5.4
(06/00), BLOCKS 12.0 (06/00), PRINTS 27.0 (04/00), PROSITE 14.0
(07/99), PDB (07/00), and COG (01/00). 

This work is supported in part by NSF grant DBI-9974855 and NIH grant
P41 LM05798.

Please contact Cathy Wu at wuc at nbrf.georgetown.edu with any comments,
or with inquiries about obtaining free copies of the iProClass
database or setting up reciprocal links or mirror sites.

Cathy H. Wu, Ph.D.
PIR Project Leader
National Biomedical Research Foundation
Georgetown University Medical Center
3900 Reservoir Road, NW, Washington, DC 20007-2195
Phone: (202)687-2121; Fax: (202)687-1662
Email: wuc at nbrf.georgetown.edu
PIR Home: http://pir.georgetown.edu/


