ANNOUNCING PIR-NREF Non-Redundant Reference Protein Database
pirmail at NBRF.Georgetown.Edu
Mon Oct 29 05:32:38 EST 2001
Containing 800,917 non-redundant protein sequences
The Protein Information Resource (PIR) is pleased to announce the beta
release of the PIR-NREF (Non-redundant REFerence) Protein Database at:
The PIR-NREF is designed to provide a timely and comprehensive
collection of all protein sequence data, keeping pace with the genome
sequencing projects while retaining source attribution and keeping
redundancy to a minimum. The database contains all sequences in PIR-PSD,
Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB, and is updated biweekly.
Non-redundancy is achieved by clustering entries on sequence identity
and on taxonomy at the species level. For each resulting entry, the NREF
report gives source attribution (protein IDs and names from the
associated databases) in addition to the protein sequence, taxonomy,
and bibliography.
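
The clustering described above can be illustrated with a minimal Python
sketch. It is not the PIR implementation: it assumes that "sequence
identity" means exact string equality and that each source entry carries
a species-level taxonomy identifier. The SourceEntry record, the
cluster_entries function, and all IDs, taxon numbers, and sequences are
made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class SourceEntry(NamedTuple):
    database: str    # source database name, e.g. "Swiss-Prot"
    protein_id: str  # ID in that database (made-up values below)
    name: str        # protein name as given by the source
    taxon_id: int    # species-level taxonomy identifier (illustrative)
    sequence: str    # amino-acid sequence


def cluster_entries(entries):
    """Group entries that share the same sequence and the same species.

    Assumption: "sequence identity" is treated as exact string equality
    (case-insensitive); the real criterion may differ.
    """
    clusters = defaultdict(list)
    for entry in entries:
        key = (entry.sequence.upper(), entry.taxon_id)
        clusters[key].append(entry)
    return dict(clusters)


# Made-up records, for illustration only.
entries = [
    SourceEntry("Swiss-Prot", "ID_A", "example protein", 1, "MKTAYIAK"),
    SourceEntry("GenPept", "ID_B", "example protein (copy)", 1, "MKTAYIAK"),
    SourceEntry("TrEMBL", "ID_C", "example protein", 2, "MKTAYIAK"),
]

clusters = cluster_entries(entries)
for (sequence, taxon_id), members in clusters.items():
    # Each cluster keeps every contributing source: the "source attribution".
    sources = ", ".join(f"{m.database}:{m.protein_id}" for m in members)
    print(f"taxon {taxon_id}  {sequence}  <- {sources}")
```

In this sketch the first two records collapse into one cluster because
they share both sequence and species, while the third stays separate
because its species differs even though the sequence is the same.
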
The web site supports direct retrieval of NREF reports by unique
sequence identifier, as well as full-scale BLAST searches and
peptide/pattern matching for functional identification of query proteins
or peptides. The results are linked to the underlying databases for
retrieval of up-to-date source entries.
An example NREF entry is at:
An example BLAST search output is at:
The database is downloadable in XML format (data file) and FASTA format
(sequence file) from our FTP site at:
Please visit the pages and give us your feedback!
This work is supported in part by NIH grant P41 LM05798.
Please send comments and suggestions, as well as inquiries about setting
up reciprocal links or mirror sites, to Cathy Wu at
wuc at nbrf.georgetown.edu.