Availability of PIR ATLAS CD-ROM

POSTMASTER at NBRF.GEORGETOWN.EDU
Thu May 6 14:01:30 EST 1993


                ATLAS of Protein and Genomic Sequences CD-ROM

The new release of the ATLAS of Protein and Genomic Sequences CD-ROM is now
available for distribution.

The ATLAS CD contains the most up-to-date versions of the PIR-International
Protein Sequence Database (release 36.00, the most comprehensive and complete
protein sequence database available) and the GenBank Sequence Data Bank
(release 75.0, supplemented with the weekly GenBank new data entries).  Once
again, the Protein Sequence Database has grown by more than 10% since the last
release, and it is nearly double the size of release 25 of the Swiss-Prot database.
In conjunction with the MIPS PATCHX data set (assembled from a collection of
other public domain protein sequence databases and also included on the
CD-ROM), the Protein Sequence Database provides the most complete collection of
protein sequence data currently available in the public domain.  This release
of the PIR-International Protein Sequence Database is comprehensively
cross-referenced to the MedLine abstracts by the MedLine Unique Identifier
(MUID) and contains cross-references to the Genome Data Base (GDB) of the Welch
Medical Library at the Johns Hopkins University.

Also provided on the CD-ROM are: release 12.00 of NRL_3D Structure-Function
Database, release 4.0 of the PIR Alignment Database, and the March 1993 release
of the JIPID ECOLI (Escherichia coli) Nucleic Acid Sequence Database.  The
NRL_3D Database is a protein sequence database extracted from the Brookhaven
Protein Data Bank (PDB) coordinate data files; it provides an interface between
the Protein Sequence Database and the PDB and provides access to the PDB data
via computerized sequence searching and comparison methods.  The ALN database
provides a set of multiple sequence alignments of closely related protein
sequences from the PIR-International Protein Sequence Database.  The ECOLI
Nucleic Acid Sequence Database is a comprehensive, nonredundant, fully merged
(all recognized contigs are assembled into single sequence segments), and
annotated Escherichia coli genomic sequence database.  All entries in this
dataset are directly linked to the corresponding protein sequence products in
the PIR-International Protein Sequence Database.

All data on the ATLAS CD are represented in ASCII files that can be read
directly by any computer system that supports the ISO 9660 CD-ROM standard.
The sequence data files are in the NBRF format that can be accessed by a wide
variety of sequence analysis software, including the GenePro software by
Riverside Scientific, the Genetics Computer Group (GCG) package, the FASTA
series of database searching programs by William Pearson of the University of
Virginia, and any software using the READSEQ subroutines by Don G. Gilbert of
Indiana University.
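
For readers who want to inspect the sequence files without one of the packages
above, the following is a minimal sketch of a reader for NBRF-style text.  It
assumes the commonly described layout: a header line of the form ">P1;CODE", a
title line, then sequence lines ending in "*".  The entry codes, titles, and
residues in SAMPLE are made up for illustration and are not taken from the
CD-ROM, and the function names are hypothetical.

```python
import io


def _finish(entry):
    """Join the sequence lines, drop whitespace and the trailing '*'."""
    residues = "".join("".join(entry["parts"]).split())
    return {
        "type": entry["type"],
        "code": entry["code"],
        "title": entry["title"],
        "sequence": residues.rstrip("*"),
    }


def read_nbrf(handle):
    """Yield one dict per entry from NBRF-style text (assumed layout)."""
    entry = None
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if entry is not None:
                yield _finish(entry)
            # Header: a short type tag, a semicolon, then the entry code.
            seq_type, _, code = line[1:].partition(";")
            entry = {"type": seq_type, "code": code.strip(),
                     "title": None, "parts": []}
        elif entry is not None:
            if entry["title"] is None:
                entry["title"] = line.strip()   # first line after the header
            else:
                entry["parts"].append(line)     # remaining lines are sequence
    if entry is not None:
        yield _finish(entry)


# Made-up sample data, for illustration only.
SAMPLE = """\
>P1;EXAMPLE1
hypothetical protein one - made-up organism
MKTAYIAKQR QISFVKSHFS
RQLEERLGL*
>P1;EXAMPLE2
hypothetical protein two - made-up organism
GSHMLEDPVV*
"""

entries = list(read_nbrf(io.StringIO(SAMPLE)))
for e in entries:
    print(e["code"], len(e["sequence"]), e["title"])
```

The sketch ignores any layout details beyond those three parts, so it should
be checked against the actual files before being relied on.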

Included on the ATLAS CD-ROM is the ATLAS Information Retrieval program that
provides direct and simultaneous retrieval from all of the databases on the
CD-ROM.  This program was recently featured on the CD-ROM produced in
cooperation with the journal Protein Science and the Protein Society.  In this
release of the ATLAS CD-ROM, versions of the ATLAS program are provided for
PC-DOS, VMS (VAX and Alpha), and DEC (RISC) ULTRIX operating systems.  Support
will be added for SunOS and Macintosh systems in the near future.

The ATLAS program provides an effective alternative to the Entrez program of
the National Center for Biotechnology Information (NCBI).  The ATLAS program
is designed on the principle that the sequence database annotations (protein
names, superfamily names, organism names, gene names, keywords, feature
descriptions, authors' names, etc.) provide meaningful, biological information
that can be used to query the database directly.  These data provide direct
links between the nucleic acid and protein sequence database entries and
entries in other specialized data sets.  The ATLAS program provides an
environment where data entries from various databases can be linked dynamically
by simultaneous retrieval on these biological and bibliographic descriptors.

The program presents a command interface modeled on the DEC Command Language
(DCL) of the VMS operating system.  The "command/modifier" interface recognizes
truncated versions of the commands and modifiers.  The ATLAS command language
is similar to that employed in the NBRF PSQ and NAQ programs.  Those familiar
with these systems will experience very little difficulty in adapting to this
new program.  A menu interface is provided for PC-DOS systems.  A complete and
comprehensive Installation and User's Guide is provided on the CD-ROM and the
ATLAS program itself contains an integrated help facility.
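
As an illustration of how a "command/modifier" line with truncated words might
be resolved, here is a minimal sketch based on unique-prefix matching.  The
command and modifier names (SEARCH, SELECT, SHOW, TITLE, KEYWORDS, AUTHOR) are
hypothetical and are not taken from the ATLAS User's Guide; the rule that a
truncation must be unambiguous is likewise an assumption.

```python
def resolve(token, known):
    """Return the single name in `known` that `token` abbreviates."""
    token = token.upper()
    if token in known:
        return token                      # exact match always wins
    matches = [name for name in known if name.startswith(token)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unrecognized word: {token}")
    raise ValueError(f"ambiguous word {token}: {', '.join(sorted(matches))}")


def parse(line, commands, modifiers):
    """Split 'command/modifier/... argument' and expand each truncated word."""
    spec, _, argument = line.strip().partition(" ")
    head, *mods = spec.split("/")
    return (resolve(head, commands),
            [resolve(m, modifiers) for m in mods],
            argument.strip())


# Hypothetical vocabularies, for illustration only.
COMMANDS = {"SEARCH", "SELECT", "SHOW"}
MODIFIERS = {"TITLE", "KEYWORDS", "AUTHOR"}

parsed = parse("sea/tit/key globin", COMMANDS, MODIFIERS)
print(parsed)
```

Here "sea" is enough to identify SEARCH, while a bare "s" would be rejected
because it could mean any of the three commands.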

ATLAS allows simultaneous retrieval on any selected subset (or all) of the
databases on the CD-ROM.  The user may select any combination of fields to
query on.  For example, a single query command will allow retrieval on the
TITLE and KEYWORDS fields of the GenBank and PIR-International databases. 
Queries can be refined by Boolean combination of sequential database queries. 
Queries are evaluated by an efficient substring searching algorithm.  For
example, a search on the term "globin" will retrieve the complete set of
hemoglobin, leghemoglobin, alpha-globin, beta-globin, myoglobin, and various
other globin and globin-like sequences.  This logic alleviates difficulties
resulting from usage of varying or nonstandard biological terminology within
the different databases. 
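
The retrieval logic described above can be sketched in a few lines.  This is
not the ATLAS implementation: the entries, identifiers, and the search()
helper are made up for illustration, case-insensitive matching is an
assumption, and a plain linear scan stands in for whatever substring algorithm
the program actually uses.  Boolean refinement is modeled with set operations
on the results of successive queries.

```python
# Made-up entries, for illustration only.
ENTRIES = [
    {"db": "PIR", "id": "A1", "TITLE": "hemoglobin alpha chain",
     "KEYWORDS": "heme; oxygen transport"},
    {"db": "PIR", "id": "A2", "TITLE": "myoglobin",
     "KEYWORDS": "heme; muscle"},
    {"db": "PIR", "id": "A3", "TITLE": "leghemoglobin",
     "KEYWORDS": "nitrogen fixation"},
    {"db": "GenBank", "id": "G1", "TITLE": "beta-globin gene, complete cds",
     "KEYWORDS": "globin"},
    {"db": "GenBank", "id": "G2", "TITLE": "cytochrome c gene",
     "KEYWORDS": "electron transport; heme"},
]


def search(entries, term, fields, databases=None):
    """Return (db, id) pairs whose chosen fields contain `term` as a substring."""
    term = term.lower()
    return {
        (e["db"], e["id"])
        for e in entries
        if (databases is None or e["db"] in databases)
        and any(term in e.get(field, "").lower() for field in fields)
    }


# One query over two fields of both databases.
globins = search(ENTRIES, "globin", ["TITLE", "KEYWORDS"])

# A second query, then Boolean combinations of the two result sets.
heme = search(ENTRIES, "heme", ["KEYWORDS"])
globin_and_heme = globins & heme     # AND
globin_not_heme = globins - heme     # AND NOT
pir_only = search(ENTRIES, "globin", ["TITLE"], databases={"PIR"})

print(sorted(globins))
print(sorted(globin_and_heme))
print(sorted(globin_not_heme))
```

Because the match is on substrings, the single term "globin" picks up
"hemoglobin", "myoglobin", "leghemoglobin", and "beta-globin" without the
user having to list each variant.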

The ATLAS CD-ROM also contains specially configured versions of the FASTA and
TFASTA programs that allow the sequence databases on the CD-ROM to be searched
(by sequence) directly.  These programs will execute on PC-DOS, VAX/VMS, and
DEC ULTRIX systems.

Orders for the ATLAS CD-ROM are accepted, without prepayment, by FAX or E-mail.
For further information in the US and the Americas, please contact:

                Kathryn Sidman, Technical Services Coordinator
                      Protein Information Resource (PIR)
                National Biomedical Research Foundation (NBRF)
                           3900 Reservoir Rd., NW
                              Washington DC 20007
                             FAX: (202) 687-1662
                            phone: (202) 687-2121
                     E-mail: PIRMAIL at nbrf.georgetown.edu
                             PIRMAIL at gunbrf.bitnet

In Europe contact:

              Martinsried Institute for Protein Sequences (MIPS)
                    Max-Planck-Institute for Biochemistry
                          8033 Martinsried, Germany
                             FAX:  49 89 8578 2655
                            phone: 49 89 8578 2657
                   E-mail: mewes at ehpmic.mips.biochem.mpg.de

In Asia and Oceania contact:

           Japan International Protein Information Database (JIPID)
                         Science University of Tokyo
                        2669 Yamazaki, Noda 278 Japan
                             FAX:  81 47 122 1544 
                            phone: 81 48 124 1501
                       E-mail: Tsugita at JPNSUT31.BITNET



