Announcing PIR-NREF: a comprehensive and non-redundant reference protein sequence database for full-scale or species-based protein identification

Pirmail pirmail at NBRF.Georgetown.Edu
Mon Jul 8 03:57:39 EST 2002


Containing 968,861 non-redundant protein sequences


The Protein Information Resource (PIR) is pleased to announce the
release of the NREF (Non-redundant REFerence) Protein Database at:

The PIR-NREF provides a timely and comprehensive collection of all
protein sequence data from PIR-PSD, Swiss-Prot, TrEMBL, RefSeq, GenPept,
and PDB, with source attribution and minimal redundancy.  Identical
sequences from the same source organism (species) reported in different
databases are presented as a single NREF entry with protein IDs and
names from each underlying database, in addition to protein sequence,
taxonomy, and composite bibliography.
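
The merging rule described above can be illustrated with a minimal
sketch.  This is not PIR's actual pipeline; the record fields (db, id,
name, taxon, sequence), the function name, and the sample IDs are
hypothetical and chosen only for illustration.  The sketch assumes two
records are redundant when both the sequence and the source organism
match exactly.

```python
def merge_identical(records):
    """Collapse records that share sequence and organism into one entry.

    Each input record is a dict with the hypothetical keys
    'db', 'id', 'name', 'taxon', and 'sequence'.
    """
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for rec in records:
        # Assumption: redundancy means identical sequence AND same species.
        key = (rec["sequence"].upper(), rec["taxon"])
        entry = merged.setdefault(
            key, {"sequence": key[0], "taxon": key[1], "sources": []}
        )
        # Keep the ID and name reported by each underlying database.
        entry["sources"].append((rec["db"], rec["id"], rec["name"]))
    return list(merged.values())


# Made-up sample records (IDs and sequences are placeholders, not real entries).
records = [
    {"db": "PIR-PSD", "id": "ID_A", "name": "protein x",
     "taxon": "Homo sapiens", "sequence": "MKTAYIAK"},
    {"db": "Swiss-Prot", "id": "ID_B", "name": "Protein X",
     "taxon": "Homo sapiens", "sequence": "MKTAYIAK"},
    {"db": "TrEMBL", "id": "ID_C", "name": "protein x homolog",
     "taxon": "Mus musculus", "sequence": "MKTAYIAK"},
]

entries = merge_identical(records)
```

In this sketch the first two records collapse into one entry carrying
both source IDs, while the third stays separate because it comes from a
different organism even though the sequence is the same.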

The web site provides direct entry retrieval (based on protein IDs),
text search (protein or species names), and sequence search (BLAST,
peptide match, and pattern match) for full-scale and species-based
protein identification.  Species-based browsing and searching are
supported for about 100 organisms, which includes over 70 complete

The NREF is updated biweekly and is available for free download and
redistribution in XML format (data file) and FASTA format (sequence
file) from our FTP site at:

Please contact Cathy Wu at pirmail at with any comments or

The work is supported in part by NIH Grant# P41 LM05798

