Announcements of PIR Network Request Service

POSTMASTER at NBRF.Georgetown.Edu
Tue Apr 13 14:51:33 EST 1993


               Announcements of the Protein Information Resource
                            Network Request Service

Highlights
1. Summaries for PIR-International Release 36, NRL_3D Release 12, ALN Release 4
2. Summary of Database Developments in Release 36.00
3. Second Technical Development Bulletin Available
4. Answer to an FAQ and Changes in PIR Network Request Server Commands
5. PIR Network Request Server Command Summary


Announcements
1. Summaries for PIR-International Release 36, NRL_3D Release 12, ALN Release 4

Release 36.00 of the PIR-International database, Release 12.00 of the NRL_3D
database (corresponding to Brookhaven Protein Data Bank Release 63), and
Release 4.00 of the ALN database of protein sequence alignments are now
available through the PIR On-line system and the Network Request Server.
The PIR1, PIR2, PIR3 and NRL_3D databases have been distributed on tape;
the CD-ROM with those databases and ALN is in production.  An appropriate
announcement will be made when the CD-ROM is distributed.

Database   Release Sequences  Residues
PIR1       36.00   11,252     3,903,802   Classified and Annotated Entries
PIR2       36.00   27,383     7,560,531   Annotated Entries
PIR3       36.00   13,622     4,021,433   Unverified Entries
NRL_3D     12.00    1,630       283,635   Sequences in Brookhaven PDB
ALN         4.00      956 (Entries)       Protein Sequence Alignments

Growth of the PIR databases is documented in the file DBGROWTH.LIS available
through the Network Request Server.  The following files are also available
through the Server:
  PADD.LIS      PIR1 entries added since Release 35.00
  PREV.LIS      PIR1 entries with revised sequences since Release 35.00
  SUPERFAM.LIS  superfamilies recorded in PIR1 and PIR2
  KEYWORDS.LIS  keywords employed in PIR1 and PIR2
  FEATURES.LIS  features cataloged in PIR1 and PIR2
  JOURNALS.LIS  recognized journal abbreviations
  ALNBASE.LIS   a description of the ALN database
  ALNTITLE.LIS  titles in the ALN database
  NRLTITLE.LIS  titles in the NRL_3D Database
To obtain these and other files from the PIR Network Request Server, follow the
instructions in the last section of these announcements.


2. Summary of Database Developments in Release 36.00

The three sections, PIR1, PIR2 and PIR3, of the PIR-International Database now
have a uniform format description.  Previously some PIR3 entries appeared with
an asterisk in the title and included the comment
  *This entry is not verified.
For each such entry, either the unverified status has been resolved or the
comment has been removed and replaced with a new "Status:" comment.  The comment
  Status: preliminary
indicates that the sequence and reference information has been verified or
extracted automatically from another database.  However, the entry may not
have been reviewed subsequently by a PIR-International staff scientist.

Information extracted from the NCBI Backbone Sequence Database is included in
the PIR-International Database.  Initially, some information extracted from
the NCBI data set may not conform to previous PIR standards or conventions.
The database codes "NCBIN:" and "NCBIP:" now appear indicating, respectively,
a cross-reference to a nucleic acid sequence and a cross-reference to a protein
sequence (or conceptual translation) from the NCBI Backbone Database.  Such
cross-references are followed by the comment
  Note: sequence extracted from NCBI backbone
In entries extracted from other databases the comment "Status: preliminary"
additionally indicates that the sequence has not been checked by
PIR-International personnel.

The new molecule type "nucleic acid" has been introduced for those entries 
where the molecule type could not be determined.

As a result of collaborations with the Human Genome Data Base (GDB) Center at
the Johns Hopkins University Welch Medical Library, the human sequence entries
in the PIR-International Database have gene names cross-referenced to GDB gene
symbols.  In the "Gene name" information the database code "GDB:" before a gene
symbol indicates this cross-reference.  Human entries with gene names not
preceded by "GDB:" and those without gene names will be matched in an on-going
joint effort with the Human Genome Data Base.

As a result of collaborations with the National Center for Biotechnology
Information (NCBI), the bibliographic references in the PIR-International
Database have been extensively cross-referenced with the National Library of
Medicine MedLine UIDs.  In the "Reference number" information the database
code "MUID:" followed by a reference number indicates the MedLine UID.
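
The database codes described above ("NCBIN:", "NCBIP:", "GDB:" and "MUID:")
could be picked out of entry text with a short script.  The following Python
sketch is illustrative only: it assumes each cross-reference is written as
CODE:value with no space after the colon, and the sample line and its values
are invented for the example, not taken from a real entry.  The function name
find_cross_references is likewise hypothetical.

```python
import re

# Database codes named in this announcement.
KNOWN_CODES = ("NCBIN", "NCBIP", "GDB", "MUID")

# Assumption: a cross-reference looks like CODE:value, and the value runs
# up to the next whitespace, comma or semicolon.
XREF_PATTERN = re.compile(r"\b(" + "|".join(KNOWN_CODES) + r"):([^\s,;]+)")


def find_cross_references(text):
    """Return a list of (code, value) pairs for each known code in text."""
    return XREF_PATTERN.findall(text)


# Invented sample text; real entries may be laid out differently.
sample = "Gene name: GDB:RYR1; Reference number: A35041; MUID:90000001"
for code, value in find_cross_references(sample):
    print(code, value)
```

Text without any of the four codes simply yields an empty list, so the
function can be applied to every line of an entry.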


3. Second Technical Development Bulletin Available

The second PIR-International Technical Development Bulletin is available in
the file PIRTECH.LIS that can be sent by the PIR Network Request Server or
picked up by anonymous FTP from the UH Gene-Server, ftp.bchs.uh.edu, IP address
129.7.2.43.  This electronic bulletin provides detailed specifications of the
database format and serves as an "early warning system" for software developers
and others who are concerned about changes in the format and standards for the
PIR databases.  If you are interested in the technical aspects of these
database changes and would like to be placed on the mailing list for the
Technical Bulletin, send a brief electronic mail note to POSTMAST at GUNBRF on
BITNET or to POSTMASTER at NBRF.Georgetown.Edu on Internet.


4. Answer to an FAQ and Changes in PIR Network Request Server Commands

One frequently asked question goes something like
> What is the longest (or shortest) known human (or some other species)
> protein sequence?

Fragments, free amino acids and isopeptides should be eliminated from the
contest for shortest.  Then there would have to be a caveat that it might
not be certain whether a particular di- or tripeptide is genetically coded. 
Also, for various reasons there is an inherent bias that may limit the
shortest to 3 rather than 2 residues.  The shortest human sequence in the
PIR databases is:
 length  code    title
      3  GKHU    Growth-modulating peptide - Human

The current longest sequences for various eukaryotes are:
 length  code    title
   6805  S20901  Titin - Rabbit
   6048  S07571  Twitchin - Caenorhabditis elegans
   5147  A41087  Cadherin-related tumor suppressor precursor - Fruit fly
                      (Drosophila melanogaster)
   5032  A35041  Ryanodine receptor - Human

The PIR Network Request Server will now allow the PIR protein sequence
databases to be queried on the basis of length.  The commands
  USE LOWER nnn
and
  USE UPPER nnn
will set the sequence length lower and upper limits.  For example, the
commands
  USE LOWER 1300
  USE UPPER 1600
will restrict the selection to sequences with 1300 through 1600 residues.
The default unrestricted limits can be restored by using the commands
  USE LOWER *
  USE UPPER *
The USE LOWER, USE UPPER, USE BEFORE, USE AFTER and USE FORMAT commands
are applicable only to the PIR1, PIR2, PIR3 and NRL_3D databases; these
commands cannot be used with the ALN, GenBank and EMBL databases.
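
For anyone generating requests from a script, the two length commands can be
produced by a small helper.  This is a minimal Python sketch; the function
name length_limit_commands is hypothetical, and the only server syntax it
relies on is the USE LOWER and USE UPPER forms shown above, with "*" standing
for the unrestricted default.

```python
def length_limit_commands(lower=None, upper=None):
    """Build the USE LOWER / USE UPPER command lines for a length range.

    A limit of None is written as "*", which restores the default
    unrestricted limit described in the announcement.
    """
    def fmt(limit):
        if limit is None:
            return "*"
        if limit < 1:
            raise ValueError("sequence length limits must be positive")
        return str(limit)

    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower limit exceeds upper limit")
    return ["USE LOWER " + fmt(lower), "USE UPPER " + fmt(upper)]


# The example from the announcement: 1300 through 1600 residues.
for line in length_limit_commands(1300, 1600):
    print(line)
```

Calling length_limit_commands() with no arguments gives the two commands
that reset both limits.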

It is now anticipated that with Release 37.00, the "Host" information will
be eliminated in the PIR databases.  When this happens, the HOST command for
the PIR Network Request Server will be disabled.


5. PIR Network Request Server Command Summary

The National Biomedical Research Foundation Protein Information Resource
Network Request Server is a full-function fileserver and database query system.
Operating since August 1990, it is capable of handling database queries,
sequence searches and sequence submissions, in addition to fileserver requests.
To use this server, request commands should be sent to
  FILESERV at GUNBRF on BITNET or
  FILESERV at NBRF.Georgetown.EDU on Internet.
The server recognizes the following commands sent either in a mail message
or (if the sender is on BITNET) in a command message or a file:

  Command        Action
  -------        -----------------------------------------------
  ACCESSION      list entry codes and titles by accession number
  AND            combine QUERY commands with Boolean AND
  AUTHOR         list entry codes and titles by author
  BASES          list accessible databases
  CROSS          list PIR entry codes and titles corresponding to
                   a particular nucleic acid sequence database entry
  DEPOSIT        deposit entry for database submission
    END DEPOSIT  terminate deposit entry
  FEATURE        list entry codes and titles by feature table entry
  GENE           list entry codes and titles for a gene name
  GET            return entry by entry code
  HELP           return HELP instructions
  HOST           list entry codes and titles by host species
  INDEX          list SENDable files
  JOURNAL        list entry codes and titles by journal citation
  KEYWORD        list entry codes and titles by keyword
  MEMBER         list alignments containing entry c
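
Because the server takes its commands from the body of a mail message, a
request can be assembled programmatically.  The Python sketch below only
builds the message and does not send it.  It assumes the Internet address
given above is written with "@" in place of " at ", that commands go one per
line in the body, and that no Subject line is needed; the announcement does
not confirm the last two points.  The sender address and the function name
build_request are placeholders.

```python
from email.message import EmailMessage

# Internet address from the announcement, with " at " written as "@".
SERVER_ADDRESS = "FILESERV@NBRF.Georgetown.EDU"


def build_request(sender, commands):
    """Return a mail message whose body holds one server command per line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVER_ADDRESS
    msg.set_content("\n".join(commands) + "\n")
    return msg


# HELP, INDEX and BASES appear in the command table above without arguments.
request = build_request("user@example.org", ["HELP", "INDEX", "BASES"])
print(request.get_content())
```

Delivering the message is left to whatever mail transport is available
locally.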

