Announcements of PIR Network Request Service

POSTMASTER at NBRF.GEORGETOWN.EDU
Mon Feb 1 14:55:08 EST 1993


               Announcements of the Protein Information Resource
                            Network Request Service

Highlights
1. Summaries for PIR-International Release 35 and NRL_3D Release 11
2. The ALN Database of Protein Sequence Alignments
3. Confidentiality of Requests Submitted to the Network Request Server
4. PIR-International Technical Development Bulletin
5. GenBank and EMBL Database Sections
6. PIR Network Request Service Command Summary

Announcements
1. Summaries for PIR-International Release 35 and NRL_3D Release 11

Release 35.00 of the PIR-International databases, Release 11.00 of the
NRL_3D database (corresponding to Brookhaven Protein Data Bank Release 62),
and Release 3.00 of the ALN database of protein sequence alignments, are now
available through the PIR On-line system and Network Request Server.
Distribution of the tapes of the new release has been completed and the
CD-ROMs are due to be shipped shortly.

  Database  Release  Sequences   Residues   Description
  PIR1      35.00       10,928   3,761,590  Annotated and Classified Entries
  PIR2      34.00       16,662   4,453,825  Preliminary Entries
  PIR3      34.00       19,644   5,660,425  Unverified Entries
  NRL_3D    11.00        1,550     272,744  Sequences from Brookhaven PDB
  ALN        3.00   715 (entries)           Protein Sequence Alignments
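
As a rough cross-check of the figures in the table above, the following
Python sketch sums the PIR1 to PIR3 rows and computes the mean sequence length
for each database.  The names used here are illustrative only and are not
part of any PIR software.

```python
# Figures copied from the release summary table:
# database -> (number of sequences, number of residues).
RELEASE_SUMMARY = {
    "PIR1": (10_928, 3_761_590),
    "PIR2": (16_662, 4_453_825),
    "PIR3": (19_644, 5_660_425),
    "NRL_3D": (1_550, 272_744),
}


def mean_length(database):
    """Return the mean number of residues per sequence for one database."""
    sequences, residues = RELEASE_SUMMARY[database]
    return residues / sequences


def pir_totals():
    """Return (sequences, residues) summed over PIR1, PIR2 and PIR3."""
    rows = [RELEASE_SUMMARY[name] for name in ("PIR1", "PIR2", "PIR3")]
    return sum(row[0] for row in rows), sum(row[1] for row in rows)


if __name__ == "__main__":
    for name in RELEASE_SUMMARY:
        print(f"{name:8s} mean length {mean_length(name):7.1f} residues")
    print("PIR1-PIR3 totals (sequences, residues):", pir_totals())
```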

Growth of the PIR databases is documented in the file DBGROWTH.LIS available
through the Network Request Server.  The following files are also available
through the Server:
  entries added since Release 34.00 are listed in PADD.LIS,
  entries revised since Release 34.00 are listed in PREV.LIS,
  superfamilies recorded in PIR1 and PIR2 are listed in SUPERFAM.LIS,
  keywords employed in PIR1 and PIR2 are listed in KEYWORDS.LIS,
  features cataloged in PIR1 and PIR2 are listed in FEATURES.LIS,
  recognized journal abbreviations are listed in JOURNALS.LIS,
  a description of the ALN database is in ALNBASE.LIS,
  titles in the ALN database are listed in ALNTITLE.LIS,
  titles in the NRL_3D Database are listed in NRLTITLE.LIS.


2. The ALN Database of Protein Sequence Alignments

The Protein Information Resource (PIR) is developing a system for construction,
storage and retrieval of alignments of protein sequences.  The objective is a
database of characteristic domain alignments with their known properties that
might be useful for characterizing proteins of unknown structure and function
as well as for describing the evolutionary relationships of multidomain
proteins.

In the initial phase, we have constructed a database of alignments of
homologous protein sequences that are less than 55% different from each other.
Groups of at least three sequences of comparable length that are more than 50%
identical were selected from Section 1, Annotated and Classified Entries, of the
PIR-International Protein Sequence Database (PIR1).  The ClustalV program of
Des Higgins at EMBL was used to align the sequences initially.  The alignments
were checked by senior staff members at PIR and corrections were incorporated
wherever necessary using the ALNED program developed at PIR.

Other alignments developed as part of research projects at PIR, as well as
alignments of domains and repeats, have also been included.  The database
currently has 715 entries and can be accessed through the PIR On-line system,
the Network Request Service and the ATLAS retrieval system being developed at
PIR.

Description of an ALN database entry

Each entry consists of a variable number of consecutive records.  The
information contained in these records is divided into six sections.  The
sections are listed below in the order in which they occur in the entry.

  1. TITLE
     The title of the alignment.

  2. DATE
     Creation and revision dates.

  3. MEMBERS
     The sequence identification codes of the sequences used in the 
     alignment.

  4. MEMBERS TITLES
     The members title lines, as found in the Protein Sequence Database.

  5. ALIGNMENT (variable number of records)
     The alignment of sequences. The completely conserved residues are
     marked by '*' and partially conserved residues are marked by '.' 
     at the bottom of the alignment.

  6. MATRIX
     The matrix of percent differences.  The upper portion of the matrix
     gives the number of differences between the sequences while the lower
     portion represents the same as percent differences.
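
The layout of the MATRIX section and the '*' markers of the ALIGNMENT section
can be illustrated with a short Python sketch.  This is not the PIR or
ClustalV implementation.  It assumes that gap characters are compared like
any other character and that percentages are taken relative to the full
alignment length; the announcement does not state either convention.  The '.'
markers are omitted because the criterion for partial conservation is not
given here.

```python
def count_differences(a, b):
    """Count aligned positions at which two equal-length rows differ."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def difference_matrix(rows):
    """Upper triangle: difference counts; lower triangle: percent differences."""
    if not rows or not rows[0]:
        raise ValueError("need at least one non-empty aligned row")
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diffs = count_differences(rows[i], rows[j])
            matrix[i][j] = diffs                          # upper: count
            matrix[j][i] = 100.0 * diffs / len(rows[i])   # lower: percent
    return matrix


def conserved_marks(rows):
    """Return a marker line with '*' under completely conserved columns."""
    return "".join("*" if len(set(col)) == 1 else " " for col in zip(*rows))


if __name__ == "__main__":
    example = ["MKVLA", "MKILA", "MRILA"]   # made-up aligned rows
    for row in example:
        print(row)
    print(conserved_marks(example))
    for line in difference_matrix(example):
        print(line)
```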


3. Confidentiality of Requests Submitted to the Network Request Server

All requests submitted to the PIR Network Request Server, including protein and
nucleotide sequences submitted for FASTA search against the PIR-International
protein sequence databases, are confidential within the following limitations.
The requests are stored in files in a directory that is not accessible to the
public through either network communication or the on-line system.  Network
access is only possible through the server daemon and then only in response to
the network request, coded by date, time and address, that generated the file.
The files are not accessible to PIR personnel except those with the computer
system privileges necessary to conduct computer hardware and software
maintenance.  The files, other than those generated by PIR staff members, are
examined only for accounting purposes and to monitor and ensure correct
software performance.  Accounting summaries are generated for user address
distribution, numbers of requests, and numbers of commands on a monthly basis. 
The files may be retained for up to one month for this accounting and are then
deleted.  This confidentiality does not, of course, apply to protein sequences
submitted through the Server for inclusion in the PIR-International database.


4. PIR-International Technical Development Bulletin

We have on-going efforts to standardize the PIR databases, improving their
parsability and compliance with CODATA and other format standards.  During
the next year the combined staffs of the PIR-International will be imposing
and enforcing many new rules and requirements on the distributed versions of
the database.  Some of these rules and requirements may affect the currently
existing software designed to read the PIR databases in "NBRF format".
Notification of the broader aspects of these changes will be placed in our
newsletters and in announcements posted on the BioSci Newsgroups PROTEINS and
BIONEWS.  However, some people may wish to be informed about the technical
aspects of these changes before they appear in a database release.  For that
reason we will be setting up an electronic mailing list to inform software
developers and others interested in the technical aspects of these database
changes.

This electronic bulletin serves as an "early warning system" for people who
are concerned about changes in the format and standards for PIR database
entries.  The first bulletin was posted on 22 January.  Hereafter, they should
appear approximately quarterly.  The first bulletin may be obtained by sending
the request SEND PIRTECH.LIS to the PIR Network Request Server.

If you would be interested in being placed on this mailing list, please send
a brief electronic mail note to me at POSTMAST at GUNBRF.BITNET or
POSTMASTER at NBRF.Georgetown.Edu.


5. GenBank and EMBL Database Sections

The GenBank and EMBL entries available on the On-line system and the Network
Request Server are now divided into the standard 13 libraries.  The GBNEW
section contains the GenBank weekly update entries.   All these databases are
automatically available on the Server through all the commands that can use
them.  Particular databases may be selected with the USE BASES command
described at the end of the Server command summary.


6. PIR Network Request Service Command Summary

The National Biomedical Research Foundation Protein Information Resource
network request service is a full-function fileserver and database query
system.  Operating since August 1990, it is capable of handling database
queries, sequence searches and sequence submissions, in addition to
fileserver requests.  To use this server, request commands should be sent to
FILESERV at GUNBRF on BITNET or FILESERV at NBRF.Georgetown.EDU on Internet.
The server recognizes the following commands sent either in a mail message,
or (if the sender is on BITNET) in a command message or a file:

  Command        Action
  -------        -----------------------------------------------
  ACCESSION      list entry codes and titles by accession number
  AND            combine QUERY commands with Boolean AND
  AUTHOR         list entry codes and titles by author
  BASES          list accessible databases
  CROSS          list PIR entry codes and titles corresponding to
                   a particular nucleic sequence database entry
  DEPOSIT        deposit entry for database submission
    END DEPOSIT  terminate deposit entry
  FEATURE        list entry codes and titles by feature table entry
  GENE           list entry codes and titles for a gene name
  GET            return entry by entry code
  HELP           return HELP instructions
  HOST           list entry codes and titles by host species
  INDEX          list SENDable files
  JOURNAL        list entry codes and titles by journal citation
  KEYWORD        list entry codes and titles by keyword
  MEMBER         list alignments containing entry code as a member
  NOT            combine QUERY commands with Boolean NOT
  OR             combine QUERY commands with Boolean OR
  QUERY          begin collecting QUERY commands
    END QUERY    terminate collecting commands and execute QUERY
  QUIT           ignore the remaining text (E-mail signature blocks)
  RETURN         change return address for gateway mail
  SEARCH         search for matching sequences by FASTA procedure
    END SEARCH   terminate sequence for searching
  SEND           send file
  SPECIES        list entry codes and titles by species
  SUGGEST        leave suggestion or correction for PIR staff
    END SUGGEST  terminate suggestion text
  SUPERFAMILY    list entry codes and titles by superfamily name
  TAXONOMY       report taxonomy for scientific or common name
  TITLE          list entry codes and titles by title
  USE            set databases, dates or formats to use in limited searches

Multiple commands can be sent with one command on each line of a mail message
or file.  Commands should NOT be sent on the Subject line of a mail message.
Receipt of command messages and files will be acknowledged immediately.  Mail
messages will be acknowledged by return mail.
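
A request of this kind could be composed with Python's standard library, as
in the sketch below.  It only builds the message and does not send it.  The
sender address is hypothetical, and the server address is the Internet
address given above with "at" written as "@".

```python
from email.message import EmailMessage

# Internet address given in this announcement ("at" written as "@").
SERVER_ADDRESS = "FILESERV@NBRF.Georgetown.EDU"


def build_request(sender, commands):
    """Build a mail message carrying one server command per body line."""
    for command in commands:
        if "\n" in command or "\r" in command:
            raise ValueError("each command must fit on a single line")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = SERVER_ADDRESS
    # No Subject header is set: commands belong in the body only.
    message.set_content("\n".join(commands) + "\n")
    return message


if __name__ == "__main__":
    request = build_request(
        "user@example.org",  # hypothetical sender address
        ["HELP SEARCH", "SEND PIRTECH.LIS", "QUIT"],
    )
    print(request.as_string())
```

Ending the list with QUIT keeps any signature block appended by a mail
program from being read as commands.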

For help in using any of the commands, send a request of the form
  HELP topic
for example
  HELP SEARCH

In addition to the commands, help instructions are also available on the
following topics:
  Custom_Services
  Databases
  FTP
  Gateway_Access
  Help_en_Espanol
  Help_en_francais
  Hints
  IBM-VM_BITNET
  On-Line_Access
  PIR_Distribution
  VAX-VMS_BITNET

Because of network gateway communication protocols, there are limitations on
requests sent through gateways.  Users not on BITNET or INTERNET who access the
server through local or network gateways should read and carefully follow these
instructions before sending requests.  Only mail message requests (not command
messages or files) can be sent through gateways.  Because addresses posted on
gateway mail do not always work for the return, before you send requests
through network gateways it is strongly recommended that you first contact
John S. Garavelli (POSTMAST at GUNBRF on BITNET, POSTMASTER at NBRF.Georgetown.EDU on
Internet).  We will confirm a return address for you and may instruct you to
use the RETURN command to ensure that your request output will reach you.  It
is not usually necessary to do this if you are on BITNET or INTERNET, unless
your system employs a local remailer or your mail program applies a
nonstandard return address (for example a personal name on the FROM: line).

The BITNET network and the network gateways impose strict limits on file size.
Poorly posed database queries may result in output so extensive that it could
not be returned by network mail.  Therefore, an output limit of 1000 lines for
each command and 3000 lines for each request is imposed by the PIR server.
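
How the two limits interact can be sketched in Python as follows.  The sketch
assumes that excess output is simply truncated; the announcement does not say
whether the server truncates the output or handles an oversized result in
some other way.

```python
COMMAND_LIMIT = 1000   # maximum output lines per command (from the text)
REQUEST_LIMIT = 3000   # maximum output lines per request (from the text)


def apply_limits(outputs):
    """Truncate per-command outputs (lists of lines) to both limits."""
    limited = []
    remaining = REQUEST_LIMIT
    for lines in outputs:
        allowed = min(COMMAND_LIMIT, remaining)
        kept = lines[:allowed]      # assumed behavior: drop the excess lines
        limited.append(kept)
        remaining -= len(kept)
    return limited
```

Under these assumptions, a request with four commands that each produce 1500
lines would return 1000 lines for each of the first three commands and
nothing for the fourth.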

The DEPOSIT and QUERY commands, and the SEARCH and SUGGEST commands (in their
multiline form) must be followed by their respective END commands after the
text appearing on the intervening lines.  The DEPOSIT command requires, and the
SEARCH command optionally uses, parameters that appear on the same line as the
command.  Because these four commands are so complex, users should obtain and
carefully read the help instructions before attempting to use them.
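
Before sending a request, a user could check locally that every DEPOSIT or
QUERY block is closed by its END line.  The Python sketch below is a
hypothetical helper, not part of the server.  It assumes that commands are
recognized regardless of case and that blocks do not nest.  It leaves SEARCH
and SUGGEST alone because the announcement does not say how their single-line
and multiline forms are told apart.

```python
# Commands that, per the text above, must always be closed by an END line.
ALWAYS_BLOCK = {"DEPOSIT", "QUERY"}


def unclosed_block(lines):
    """Return the name of a DEPOSIT or QUERY block left open, else None."""
    open_block = None
    for line in lines:
        words = line.split()
        if not words:
            continue
        first = words[0].upper()
        if open_block is None:
            if first in ALWAYS_BLOCK:
                open_block = first  # a block starts here
        elif first == "END" and len(words) > 1 and words[1].upper() == open_block:
            open_block = None       # matching END line found
    return open_block
```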

The databases available through the PIR Network Server and their abbreviations
for code specification are as follows:
  Abbreviation  Database                              Update Schedule
  PIR1          PIR Annotated and Classified Entries  approximately biweekly
  PIR2          PIR Preliminary Entries               approximately weekly
  PIR3          PIR Unverified Entries                weekly
  ALN           PIR Alignment Entries                 semiannually
  NRL_3D        Brookhaven Data Bank Sequences        quarterly
  PATCHX        MIPS PIR-Supplementary Database       quarterly
  N             NBRF Nucleic
  GB*           GenBank (TM)                          as received
  GBNEW         GenBank (TM) New Entries              weekly
  EMBL*         EMBL                                  as received

In the FASTA output of the SEARCH command the abbreviation for PATCHX is
shortened to PATX and NRL_3D is shortened to NR3D; the longer abbreviation
should be used to retrieve an entry with the GET command.  Not all commands
work with all databases; please read the information returned by the command
HELP DATABASES.
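
The two shortened abbreviations can be mapped back to the forms expected by
the GET command with a small lookup table, sketched here in Python.  The
function name is hypothetical, and converting the input to upper case is an
assumption.

```python
# Shortened abbreviations in FASTA output -> abbreviations to use with GET.
FASTA_TO_GET = {"PATX": "PATCHX", "NR3D": "NRL_3D"}


def get_abbreviation(fasta_abbreviation):
    """Return the database abbreviation to use with the GET command."""
    name = fasta_abbreviation.upper()
    return FASTA_TO_GET.get(name, name)  # other abbreviations are unchanged
```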

The GenBank (TM) and EMBL databases (abbreviated GB and EMBL) are now divided
into sections corresponding to the sections of their standard releases:
  -BCT          Bacterial Sequences
  -EST          EST Sequences
  -INV          Invertebrate Sequences
  -MAM          Other Mammalian Sequences
  -PHG          Phage Sequences
  -PLN          Plant Sequences
  -PRI          Primate Sequences
  -RNA          Struct RNA Sequences
  -ROD          Rodent Sequences
  -SYN          Synthetic Sequences
  -UNA          Unannotated Sequences
  -VRL          Viral Sequences
  -VRT          Other Vertebrate Sequences
These databases may be individually accessed with the USE BASES command
with the database abbreviation and the section abbreviation, for example
  USE BASES GBPRI
or all sections of a given database may be accessed with the database
abbreviation and an asterisk, for example
  USE BASES PIR*
or
  USE BASES GB*
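
The naming rule above can be sketched in Python as follows.  The section
codes are taken from the list above.  Joining EMBL with a section code in the
same way as GB (for example EMBLPRI) follows the stated rule, but only GBPRI
is shown in this announcement, so the EMBL names are an assumption.  The
pattern matching merely illustrates a trailing asterisk; how the server
itself expands it is not described here.

```python
from fnmatch import fnmatchcase

# Section codes from the list above, without the leading hyphen.
SECTIONS = ["BCT", "EST", "INV", "MAM", "PHG", "PLN", "PRI",
            "RNA", "ROD", "SYN", "UNA", "VRL", "VRT"]


def section_names(database):
    """Join a database abbreviation (e.g. 'GB') with each section code."""
    return [database + section for section in SECTIONS]


def use_bases_command(selector):
    """Return the request line that selects one database or a group."""
    return "USE BASES " + selector


def matching_sections(pattern, names):
    """List the names that a trailing-asterisk selector would cover."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    names = section_names("GB") + section_names("EMBL")
    print(use_bases_command("GBPRI"))
    print(matching_sections("GB*", names))
```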
------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Information Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMAST at GUNBRF.BITNET
                                 POSTMASTER at NBRF.Georgetown.Edu


