Announcements of PIR Network Request Service

POSTMASTER at NBRF.Georgetown.Edu
Mon Feb 1 14:55:08 EST 1993

               Announcements of the Protein Information Resource
                            Network Request Service

1. Summaries for PIR-International Release 35 and NRL_3D Release 11
2. The ALN Database of Protein Sequence Alignments
3. Confidentiality of Requests Submitted to the Network Request Server
4. PIR-International Technical Development Bulletin
5. GenBank and EMBL Database Sections
6. PIR Network Request Service Command Summary

1. Summaries for PIR-International Release 35 and NRL_3D Release 11

Release 35.00 of the PIR-International databases, Release 11.00 of the
NRL_3D database (corresponding to Brookhaven Protein Data Bank Release 62),
and Release 3.00 of the ALN database of protein sequence alignments are now
available through the PIR On-line system and Network Request Server.
Distribution of the tapes of the new release has been completed and the
CD-ROMs are due to be shipped shortly.

  Database  Release  Sequences   Residues   Contents
  PIR1      35.00      10,928   3,761,590   Annotated and Classified Entries
  PIR2      34.00      16,662   4,453,825   Preliminary Entries
  PIR3      34.00      19,644   5,660,425   Unverified Entries
  NRL_3D    11.00       1,550     272,744   Sequences from Brookhaven PDB
  ALN        3.00         715 (entries)     Protein Sequence Alignments

Growth of the PIR databases is documented in the file DBGROWTH.LIS available
through the Network Request Server.  The following files are also available
through the Server:
  entries added since Release 34.00 are listed in PADD.LIS,
  entries revised since Release 34.00 are listed in PREV.LIS,
  superfamilies recorded in PIR1 and PIR2 are listed in SUPERFAM.LIS,
  keywords employed in PIR1 and PIR2 are listed in KEYWORDS.LIS,
  features cataloged in PIR1 and PIR2 are listed in FEATURES.LIS,
  recognized journal abbreviations are listed in JOURNALS.LIS,
  a description of the ALN database is in ALNBASE.LIS,
  titles in the ALN database are listed in ALNTITLE.LIS,
  titles in the NRL_3D Database are listed in NRLTITLE.LIS.

2. The ALN Database of Protein Sequence Alignments

The Protein Information Resource (PIR) is developing a system for construction,
storage and retrieval of alignments of protein sequences.  The objective is a
database of characteristic domain alignments with their known properties that
might be useful for characterizing proteins of unknown structure and function
as well as for describing the evolutionary relationships of multidomain
proteins.

In the initial phase, we have constructed a database of alignments of
homologous protein sequences that are less than 55% different from each other. 
Groups of at least three sequences with comparable lengths and more than 50%
identical were selected from Section 1, Annotated and Classified entries of the
PIR-International Protein Sequence Database (PIR1).  The ClustalV program of
Des Higgins at EMBL was used to align the sequences initially.  The alignments
were checked by senior staff members at PIR and corrections were incorporated
wherever necessary using the ALNED program developed at PIR.

Other alignments developed as part of research projects at PIR, as well as
alignments of domains and repeats, have also been included.  The database
currently has 715 entries and can be accessed through the PIR On-line system,
the Network Request Service and the ATLAS retrieval system being developed at

Description of an ALN database entry

Each entry consists of a variable number of consecutive records.  The
information contained in these records is divided into six sections.  The
sections are listed below in the order in which they occur in the entry.

  1. TITLE
     The title of the alignment.

  2. DATE
     Creation and revision dates.

     The sequence identification codes of the sequences used in the alignment.

     The members' title lines, as found in the Protein Sequence Database.

  5. ALIGNMENT (variable number of records)
     The alignment of sequences. The completely conserved residues are
     marked by '*' and partially conserved residues are marked by '.' 
     at the bottom of the alignment.

  6. MATRIX
     The matrix of percent differences.  The upper portion of the matrix
     gives the number of differences between the sequences, while the
     lower portion gives the same differences as percentages.
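
The ALIGNMENT and MATRIX sections described above can be illustrated with a
short Python sketch.  It is not the ALNED or ClustalV code; the function
names and the example sequences are hypothetical.  It assumes that gaps are
written as '-', that a gap never counts as conserved, and that percent
differences are taken over the full alignment length.  The entry description
does not state how gaps or the percentage are handled, nor the rule for the
'.' marker, so only the '*' marker is produced here.

```python
from itertools import combinations


def conservation_line(aligned):
    """Return a marker line with '*' under completely conserved columns."""
    markers = []
    for column in zip(*aligned):
        # Conserved means one distinct residue in the column, and not a gap
        # (treating gaps as non-conserved is an assumption).
        if len(set(column)) == 1 and column[0] != "-":
            markers.append("*")
        else:
            markers.append(" ")
    return "".join(markers)


def difference_matrix(aligned):
    """Build a square matrix: counts above the diagonal, percents below."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    n = len(aligned)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Count the columns in which the two sequences differ.
        diffs = sum(a != b for a, b in zip(aligned[i], aligned[j]))
        matrix[i][j] = diffs                   # upper portion: number
        matrix[j][i] = 100.0 * diffs / length  # lower portion: percent
    return matrix


# Hypothetical aligned sequences, for illustration only.
example = ["MKTAYIAK", "MKTAHIAK", "MRTAHIGK"]
for row in difference_matrix(example):
    print(row)
print(conservation_line(example))
```

With a different gap convention or a different denominator for the percent
values, only the two commented lines in difference_matrix would change.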

3. Confidentiality of Requests Submitted to the Network Request Server

All requests submitted to the PIR Network Request Server, including protein and
nucleotide sequences submitted for FASTA search against the PIR-International
protein sequence databases, are confidential within the following limitations.
The requests are stored in files in a directory that is not accessible to the
public through either network communication or the on-line system.  Network
access is only possible through the server daemon and then only in response to
the network request, coded by date, time and address, that generated the file.
The files are not accessible to PIR personnel except those with the computer
system privileges necessary to conduct computer hardware and software
maintenance.  The files, other than those generated by PIR staff members, are
examined only for accounting purposes and to monitor and ensure correct
software performance.  Accounting summaries are generated for user address
distribution, numbers of requests, and numbers of commands on a monthly basis. 
The files may be retained for up to one month for this accounting and are then
deleted.  This confidentiality does not, of course, apply to protein sequences
submitted through the Server for inclusion in the PIR-International database.

4. PIR-International Technical Development Bulletin

We have on-going efforts to standardize the PIR databases, improving their
parsability and compliance with CODATA and other format standards.  During
the next year the combined staffs of the PIR-International will be imposing
and enforcing many new rules and requirements on the distributed versions of
the database.  Some of these rules and requirements may affect the currently
existing software designed to read the PIR databases in "NBRF format".
Notification of the broader aspects of these changes will be placed in our
newsletters and in announcements posted on the BioSci Newsgroups PROTEINS and
BIONEWS.  However, some people may wish to be informed about the technical
aspects of these changes before they appear in a database release.  For that
reason we will be setting up an electronic mailing list to inform software
developers and others interested in the technical aspects of these database

This electronic bulletin serves as an "early warning system" for people who
are concerned about changes in the format and standards for PIR database
entries.  The first bulletin was posted on 22 January.  Hereafter, they should
appear approximately quarterly.  The first bulletin may be obtained by sending
the request SEND PIRTECH.LIS to the PIR Network Request Server.

If you would be interested in being placed on this mailing list, please send
a brief electronic mail note to me at POSTMAST at GUNBRF.BITNET or
POSTMASTER at NBRF.Georgetown.Edu.

5. GenBank and EMBL Database Sections

The GenBank and EMBL entries available on the On-line system and the Network
Request Server are now divided into the standard 13 libraries.  The GBNEW
section contains the GenBank weekly update entries.  All these databases are
automatically available on the Server through all the commands that can use
them.  Particular databases may be selected with the USE BASES command
described at the end of the Server command summary.

6. PIR Network Request Service Command Summary

The National Biomedical Research Foundation Protein Information Resource
network request service is a full-function fileserver and database query
system.  Operating since August 1990, it is capable of handling database
queries, sequence searches and sequence submissions, in addition to
fileserver requests.  To use this server, request commands should be sent to
FILESERV at GUNBRF on BITNET or FILESERV at NBRF.Georgetown.EDU on Internet.
The server recognizes the following commands sent either in a mail message,
or (if the sender is on BITNET) in a command message or a file:

  Command        Action
  -------        -----------------------------------------------
  ACCESSION      list entry codes and titles by accession number
  AND            combine QUERY commands with Boolean AND
  AUTHOR         list entry codes and titles by author
  BASES          list accessible databases
  CROSS          list PIR entry codes and titles corresponding to
                   a particular nucleic sequence database entry
  DEPOSIT        deposit entry for database submission
    END DEPOSIT  terminate deposit entry
  FEATURE        list entry codes and titles by feature table entry
  GENE           list entry codes and titles for a gene name
  GET            return entry by entry code
  HELP           return HELP instructions
  HOST           list entry codes and titles by host species
  INDEX          list SENDable files
  JOURNAL        list entry codes and titles by journal citation
  KEYWORD        list entry codes and titles by keyword
  MEMBER         list alignments containing entry code as a member
  NOT            combine QUERY commands with Boolean NOT
  OR             combine QUERY commands with Boolean OR
  QUERY          begin collecting QUERY commands
    END QUERY    terminate collecting commands and execute QUERY
  QUIT           ignore the remaining text (E-mail signature blocks)
  RETURN         change return address for gateway mail
  SEARCH         search for matching sequences by FASTA procedure
    END SEARCH   terminate sequence for searching
  SEND           send file
  SPECIES        list entry codes and titles by species
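
As an illustration of sending commands in a mail message, the Python sketch
below builds (but does not send) a request to the Internet address given
above.  Several details are assumptions rather than documented server
behavior: that each command goes on its own line of the message body, that
HELP and INDEX take no arguments, and that the subject line is ignored.  The
helper name build_request and the sender address are hypothetical.  The
request SEND PIRTECH.LIS and the QUIT command are taken from this
announcement.

```python
from email.message import EmailMessage

# Internet address of the server, with "@" restored from the "at" form
# used in this announcement.
SERVER_ADDRESS = "FILESERV@NBRF.Georgetown.EDU"


def build_request(sender, commands):
    """Return a mail message whose body lists one server command per line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVER_ADDRESS
    msg["Subject"] = "PIR request"  # assumption: the server ignores this
    # QUIT is appended so that any signature block added later by a mail
    # program is ignored by the server.
    lines = list(commands) + ["QUIT"]
    msg.set_content("\n".join(lines) + "\n")
    return msg


# Hypothetical sender address, for illustration only.
request = build_request("user@example.org",
                        ["HELP", "INDEX", "SEND PIRTECH.LIS"])
print(request.get_content())
```

The resulting message could be handed to any mail transport; how it is
delivered is outside the scope of this sketch.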

