Announcements of PIR Network Request Service

Thu Oct 1 12:47:20 EST 1992

              Announcements of the Protein Identification Resource
                            Network Request Service

1. PIR Release 34 and NRL_3D Release 10
2. Updated Distribution Information
3. Database Standardization Efforts
4. Internet Addresses for Anonymous FTP and Network Request Service
5. FASTA Searches for NRL_3D Only
6. Network Request Service Command Summary

1. PIR Release 34 and NRL_3D Release 10

As of 30 September, Release 34 of the PIR databases and Release 10 of the
NRL_3D database (corresponding to Brookhaven Protein Data Bank Release 61)
are now available through the PIR On-line system and Network Request Server.
Distribution of the tape and CD-ROMs of the new release will begin shortly.

Database   Release Sequences  Residues
  PIR1     34.00   10,550     3,591,370   
  PIR2     34.00   16,188     4,330,190   
  PIR3     34.00   18,162     5,284,017   
  NRL_3D   10.00    1,457       244,804

Growth of the PIR databases is documented in the file DBGROWTH.LIS available
through the Network Request Server.  The following files are also available
through the Server:
  the list of superfamilies in PIR1 is in SUPERFAM.LIS,
  the list of keywords in PIR1 and PIR2 is in KEYWORDS.LIS,
  the list of features in PIR1 and PIR2 is in FEATURES.LIS.

2. Updated Distribution Information

The databases and programs of the PIR are distributed on magnetic tape and
on TK50 and TK70 cartridges in VAX/VMS format and in ASCII card image format;
the protein databases are updated and distributed quarterly, and the
sequence analysis software package is updated irregularly.  The prices listed
are per release and are subject to change.  Tapes may be ordered on a one-time
or a standing-order basis.

The PIR-International Protein Sequence Database ($250) contains substantially
sequenced proteins and sequences translated from nucleic acid sequences.
The database is divided into three data sets categorized by the degree of
annotation in the sequence entries.  The sequences in the PIR1 data set (and
some of the PIR2 data set) have been annotated to identify post-translational
modifications, active sites, signal sequences, disulfide bonds, etc.  The PIR3
data set contains minimal entries that have not yet been examined by
scientific staff.  The datatape also contains the NRL_3D database of sequence
information extracted from the Brookhaven Protein Data Bank.

The VAX/VMS format of the protein sequence datatape contains the PSQ
(Protein Sequence Query) and the NAQ (Nucleic Acid Query) retrieval programs
and programs for creating user databases.  As a service to our users, the PIR
is also including files required to use the PIR database with the GCG software.

The ATLAS multidatabase retrieval program is available on CD-ROM ($100) along
with the PIR-International Protein Sequence Database, the ALN protein alignment
database, the NRL_3D database, the PATCHX database, and the GenBank Genetic
Sequence Databank.  The ATLAS program currently runs on PC/DOS and VAX/VMS
systems; support for UNIX and Macintosh will be added.

The PATCHX database ($250) is produced by MIPS at the Max Planck Institute
for Biochemistry, Martinsried, Germany.  The PATCHX database includes all
protein sequences (not identical with or contained in sequences from PIR1,
PIR2 and PIR3) from the following databases: MIPSOwn (MIPS preliminary
entries), PIRMOD (MIPS/PIR preliminary entries), MIPSH (MIPS yeast entries),
NRL_3D (Brookhaven Data Bank sequences), MIPSTrn (MIPS preliminary
translations), EMTrans (EMBL translation by F. Pfeiffer), SwissProt, GenPept
(GenBank(R) translation by Los Alamos Nat. Lab.), Kabat, and PSeqIP.  All sequences
that are IDENTICAL within or between databases are presented only ONCE.
Also sequences completely contained within others have been removed. 
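
The redundancy rules above (sequences identical with or contained in PIR1,
PIR2 or PIR3 excluded, identical sequences kept once, sequences completely
contained in another removed) can be illustrated with a minimal Python
sketch.  It assumes sequences are plain one-letter-code strings keyed by
hypothetical entry identifiers and uses a naive quadratic substring test; it
is not the procedure actually used by MIPS.

```python
def nonredundant(pir_seqs, candidates):
    """Return the candidate entries that survive the PATCHX-style rules.

    pir_seqs:   list of sequence strings from PIR1/PIR2/PIR3
    candidates: dict mapping a (hypothetical) entry id to its sequence
    """
    # Identical sequences are presented only once: keep the first id seen.
    first_id = {}
    for entry_id, seq in candidates.items():
        first_id.setdefault(seq, entry_id)

    unique = list(first_id)  # distinct candidate sequences, first-seen order

    kept = {}
    for seq in unique:
        # Exclude sequences identical with or contained in a PIR sequence.
        if any(seq in ref for ref in pir_seqs):
            continue
        # Exclude sequences contained in a different (longer) candidate.
        if any(seq != other and seq in other for other in unique):
            continue
        kept[first_id[seq]] = seq
    return kept
```

For example, with PIR holding "MKVLAAGIV", the candidates {"A": "MKVLA",
"B": "GGHHT", "C": "GGHHT", "D": "HHT", "E": "WWWW"} reduce to entries B and
E: A lies inside the PIR sequence, C duplicates B, and D lies inside B.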

The NBRF-PIR Sequence Analysis Software tape ($200) contains programs designed
to run on a VAX computer operating under VMS version 5.  All programs are
written in VAX-11 Fortran (a superset of ANSI Fortran 77), with the exception
of the Lipman-Pearson programs (FASTA, RDF), which are written in VAX-11 C. 
Included are:
  database searching programs (SEARCH, ISEARCH, FASTA);
  global similarity programs (ALIGN, IALIGN);
  local similarity programs (RELATE & DOTMATRIX);
  and prediction programs (PRPLOT & CHOFAS - from the IDEAS package). 

More information about the databases, sequence analysis programs, tapes,
on-line services, custom services or prices can be obtained by contacting:
    Kathryn E. Sidman
    Protein Identification Resource
    National Biomedical Research Foundation
    3900 Reservoir Road, NW
    Washington, DC  20007
    Phone: (202) 687-2121 
    FAX:  (202) 687-1662

3. Database Standardization Efforts

The combined staffs of the PIR-International have been engaged in a vigorous
effort to standardize the keyword and feature records occurring in the PIR1
and PIR2 databases.  Previous efforts to standardize the species and reference
records and the title records for enzymes were very successful.  The
standardization effort proceeded by:
(1) determining the complete variety of information that existed in those
    records,
(2) formulating rules for which forms were acceptable and which were not, and
(3) imposing those rules by correcting the non-compliant entries and
    introducing additional checking procedures during the data entry process.
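
Step (3) amounts to checking each entry's keywords against the agreed list of
acceptable forms.  A minimal Python sketch of such a check follows; the
approved vocabulary, the entry codes and the dict representation of entries
are all hypothetical and do not reflect the PIR's actual data-entry software.

```python
def find_noncompliant(entries, approved):
    """Map each entry code to the keywords not in the approved vocabulary.

    entries:  dict of entry code -> list of keyword strings (hypothetical form)
    approved: set of accepted keyword forms
    Entries whose keywords all comply are left out of the result.
    """
    problems = {}
    for code, keywords in entries.items():
        bad = [kw for kw in keywords if kw not in approved]
        if bad:
            problems[code] = bad
    return problems


approved = {"electron transport", "heme", "mitochondrion"}
entries = {
    "ENTRY1": ["electron transport", "heme"],
    "ENTRY2": ["heme", "mitochondria"],  # variant form, not approved
}
print(find_noncompliant(entries, approved))  # {'ENTRY2': ['mitochondria']}
```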

The success of this standardization effort for the keyword records can be
judged from these results: in Release 30 there were 1614 different keywords
with 63% of those keywords appearing in fewer than 4 entries; in Release 34
there are 1037 different keywords and 40% of those keywords appear in fewer
than 4 entries.  The following table provides a more complete breakdown.

          Frequency of Keywords

Frequency        Different Keywords
in Entries        Rel. 30  Rel. 34
     >400               7       12
  201-400              10       24
  101-200              19       42
   51-100              38       58
   26-50               61       61
   13-25              103      105
    7-12              131      135
    4-6               218      185
    2-3               395      208
     1                632      207
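
The summary figures quoted above follow directly from this table.  The Python
sketch below recomputes them from the counts copied out of the table, taking
"fewer than 4 entries" to mean the "1" and "2-3" rows.  It gives 1614 keywords
with 63.6% rare for Release 30 (quoted above as 63%) and 1037 keywords with
40.0% rare for Release 34.

```python
# Distinct-keyword counts copied from the table, ordered from the "1" row
# up to the ">400" row.
rel30 = [632, 395, 218, 131, 103, 61, 38, 19, 10, 7]
rel34 = [207, 208, 185, 135, 105, 61, 58, 42, 24, 12]


def summarize(counts):
    """Return (total distinct keywords, percent used in fewer than 4 entries)."""
    total = sum(counts)
    rare = counts[0] + counts[1]  # the "1" and "2-3" rows
    return total, 100 * rare / total


for name, counts in (("Rel. 30", rel30), ("Rel. 34", rel34)):
    total, pct = summarize(counts)
    print(f"{name}: {total} keywords, {pct:.1f}% in fewer than 4 entries")
```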

4. Internet Addresses for Anonymous FTP and Network Request Service

During September the PIR Network Request Service was made available through
the National Biomedical Research Foundation's Internet address.  For users
on BITNET the address remains FILESERV at GUNBRF.  For users on Internet and
other networks with gateways to Internet the preferred address is now
  FILESERV at NBRF.Georgetown.Edu.
Provided in the last part of this announcement is a synopsis of instructions
for using this database query and FASTA sequence search service.

Each PIR release and its accompanying NRL_3D release are available for
anonymous FTP from the UH Gene-Server,, IP address
The login is "anonymous" and the password is your e-mail address.  The files
are kept in pub/gene-server/pir/pir_relXX/{ascii,vms}. "XX" is the release
number.  All files are stored as Unix 16-bit compressed files and the file
names end in .Z (e.g. pir.1.dat.Z) as a reminder.

The "ascii" directory contains the CODATA format files, and the "vms"
directory the NBRF format files and indices in VMS format. Note that two of
the files required by GCG V.7.X are not included; those can be generated by
GCG-supplied utilities.
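
The directory layout described above can be captured in a small helper.  This
Python sketch assumes the release number is written with two digits, as the
"pir_relXX" pattern suggests; the server's host name is deliberately left out
because it is not given here.

```python
def release_dir(release, fmt):
    """Return the Gene-Server directory for a PIR release.

    fmt is "ascii" (CODATA format files) or "vms" (NBRF format files and
    VMS indices).  Two-digit zero padding of the release is an assumption.
    """
    if fmt not in ("ascii", "vms"):
        raise ValueError("format must be 'ascii' or 'vms'")
    return f"pub/gene-server/pir/pir_rel{release:02d}/{fmt}"


def local_name(remote_name):
    """Strip the .Z suffix that marks a Unix compressed file."""
    return remote_name[:-2] if remote_name.endswith(".Z") else remote_name


print(release_dir(34, "ascii"))   # pub/gene-server/pir/pir_rel34/ascii
print(local_name("pir.1.dat.Z"))  # pir.1.dat
```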

Uncompress utilities are available for non-Unix systems;
  the DOS archive sites have a file "";
  the Info-Mac archives have "maccompress-32.hqx";
  and various VMS archives have "lhzcomp.exe" or "decompress.exe".
The latter is also available in pub/gene-server/pir, with a sample
(but non-working) .CLD file.
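
Before handing a downloaded file to one of these utilities, it may help to
confirm that it really is in Unix compress format.  The Python sketch below
checks the standard compress header: the magic bytes 0x1F 0x9D, then a flags
byte whose low five bits give the maximum code width (16 for the files
described here) and whose 0x80 bit marks block mode.  It only inspects the
header; it does not decompress anything.

```python
def compress_header(data):
    """Parse the 3-byte header of a Unix compress (.Z) file.

    Returns (max_bits, block_mode); raises ValueError if the magic bytes
    0x1F 0x9D are missing.
    """
    if len(data) < 3 or data[0] != 0x1F or data[1] != 0x9D:
        raise ValueError("not a Unix compress (.Z) file")
    flags = data[2]
    max_bits = flags & 0x1F          # maximum LZW code width in bits
    block_mode = bool(flags & 0x80)  # code 256 clears the dictionary
    return max_bits, block_mode


# Header of a file written by compress with 16-bit codes in block mode.
print(compress_header(b"\x1f\x9d\x90"))  # (16, True)
```

In practice the first three bytes would be read from the file, for example
with `open("pir.1.dat.Z", "rb").read(3)`.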

Questions about the FTP server can be directed to Dan Davison, davison at
Our thanks to Bill Pearson and Dan Davison for their efforts in providing FTP
access to the PIR databases.

5. FASTA Searches for NRL_3D Only

Some users have asked to run FASTA sequence searches only against the
sequences with known three-dimensional structures, i.e. the sequences
extracted from the Brookhaven Protein Data Bank in NRL_3D.  Normally our
FASTA searches are done against all the protein databases: PIR1, PIR2, PIR3,
the non-redundant PATCHX (described in the August announcement and in part 2
above) and NRL_3D.  Now when the command
is used before a SEARCH command, only the NRL_3D database will be used for
the FASTA search.  Otherwise, all the protein databases will be used.
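
The selection rule can be summarized in a few lines of Python.  This is only
an illustration of the behavior described; the name of the command that turns
on NRL_3D-only searching is not reproduced here, so it is modeled as a
hypothetical boolean flag.

```python
# Protein databases searched by FASTA by default, as listed above.
ALL_PROTEIN_DBS = ("PIR1", "PIR2", "PIR3", "PATCHX", "NRL_3D")


def fasta_databases(nrl3d_only):
    """Return the databases a FASTA SEARCH would use.

    nrl3d_only is a hypothetical flag meaning "the NRL_3D-only command was
    given before the SEARCH command".
    """
    return ("NRL_3D",) if nrl3d_only else ALL_PROTEIN_DBS
```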

Thanks to Ada Prochnicka-Chalufour at the Pasteur Institute for her helpful
suggestion and her hospitality this spring.

6. PIR Network Request Service Command Summary

The National Biomedical Research Foundation Protein Identification Resource
network request service is a full-function fileserver and database query
system.  It has been operating since August 1990 and is capable of handling
database queries, sequence searches and sequence submissions, in addition to
fileserver requests.  To use this server, request commands should be sent to
FILESERV at GUNBRF on BITNET.  The FILESERVer recognizes the following commands
sent either in a mail message, or (if the sender is on BITNET) in command
messages or in a file:

  Command        Action
  -------        -----------------------------------------------
  ACCESSION      list entry codes and titles by accession number
  AND            combine QUERY commands with Boolean AND
  AUTHOR         list entry codes and titles by author
  BASES          list accessible databases
  CROSS          list PIR entry codes and titles corresponding to
                 a particular nucleic sequence database entry
  DEPOSIT        deposit entry for database submission
    END DEPOSIT  terminate deposit entry
  FEATURE        list entry codes and titles by feature table entry
  GENE           list entry codes and titles for a gene name
  GET            return entry by entry code
  HELP           return HELP instructions
  HOST           list entry codes and titles by host species
  INDEX          list SENDable files
  JOURNAL        list entry codes and titles by journal citation
  KEYWORD        list entry codes and titles by keyword
  MEMBER         list alignments containing entry code as a member
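
Requests are ordinary mail messages whose body holds the commands.  The
Python sketch below composes (but does not send) such a message with the
standard library's email package, addressed to the Internet form of the
server address given in part 4.  The one-command-per-line layout and the
"GET <entry code>" argument form are assumptions, and both the sender address
and the entry code are hypothetical placeholders.

```python
from email.message import EmailMessage


def build_request(sender, commands):
    """Compose a FILESERV request whose body holds one command per line.

    The one-command-per-line layout is an assumption about the server.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "FILESERV@NBRF.Georgetown.Edu"
    msg.set_content("\n".join(commands) + "\n")
    return msg


request = build_request(
    "user@example.org",               # hypothetical sender
    ["HELP", "INDEX", "GET XXXXXX"],  # XXXXXX: placeholder entry code
)
print(request)
```

The resulting message could then be handed to any mailer, for example
`smtplib.SMTP.send_message`.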
