PIR/GCG

POSTMASTER at NBRF.GEORGETOWN.EDU POSTMASTER at NBRF.GEORGETOWN.EDU
Tue Nov 2 12:01:40 EST 1993


In message <9310301907.AA04875 at net.bio.net> posted to Bio-Soft Jean-Loup Risler
wrote:

> Up to now I received the PIR database from MIPS on a magtape. Like many many
> others, I had troubles with the GCG files provided on the tape, which were
> fixed by re-running DBINDEX.
> 
> Now I receive PIR on their CD-ROM. Since I want to use GCG, I have to copy
> the files onto my VAX/VMS. *NOTE* this was a pain in the neck, a CD_MOUNT
> followed by a COPY doesn't work because the resulting file are "undefined".
> The CD_ACCESS program from P. Stockwell, available from the EMBL server,
> doesn't work either on these CD-ROMs (it's the only case I know of).
> 
> You MUST mount the CD-ROM with the following command:
> 
> CD_MOUNT/media=cdrom/UNDEFINED=(STREAM_LF:512)
> 
> It was rather hard to be aware of the UNDEFINED switch ... this may be
> useful to other people...
> 
> Well, finally it worked. The .REF and .SEQ files seem to be OK. Now, if I
> run PIRTOGCG or DBINDEX, I get the following message for *ALL* the
> sequences:
> 
> * no accession number for sequence XXXX *
> 
> Anyway the programs go on, I get the usual .NAMES, .OFFSET, etc...files
> whose size seem reasonable.
> 
> *BUT* I can't FETCH any sequence. If I try to fetch CCHU from PIR1, for
> example, I get  * no files in PIR1:CCHU *  .....
> 
> NOTE 1: I am lazy and I always wait for a certain time before installing the
> minor releases, waiting for other people to find the new bugs ... :-)
> Hence I'm still under GCG #7.0  Is this the reason?
> 
> NOTE 2: The files are NOT in CODATA format. They just look like the good
> old PIR files. However, the accession number is hidden in a line such as:
> c;Accession: xxxxxx

As we understand it, you would like to use the GCG sequence analysis package
with the CD-ROM.  Although the ATLAS package is designed to be a standalone
database query/retrieval system, the data may be used in conjunction with other
applications.  The following summarizes the procedure we believe you are
looking for.

Since the GCG index files do not exist on the CD, you must off-load all data
files to disk and create the necessary auxiliary files.  The CD_ACCESS program
should work, but after in-house testing we have determined that it does not.
Apparently the CD format is incompatible with our specifications; we are
discussing this issue with the CD publisher.

To copy files from the CD to disk, one must use the CD_MOUNT command with the
UNDEFINED_FAT qualifier (for non-VMS users, that is "undefined file access
type" :-), as you have discovered.  Two types of files can be retrieved:

1) binary files (PIR1.INX, TERM.TDX)
  If you want the BINARY files such as PIR1.INX, use the following CD_MOUNT
  command with no further file modifications but substituting the correct
  device name for your CD-ROM:
    $ CD_MOUNT/MEDIA=CDROM/UNDEFINED_FAT=(FIXED:NONE:512) $1$dka100:

2) ASCII files (PIR1.SEQ, PIR1.REF)
  The process of downloading ASCII files is a little more complicated.
  The files MUST end up in their native format with the following file
  attributes:
    Record format:      Variable length, maximum XXX bytes
    Record attributes:  Carriage return carriage control
  where XXX is "490" for a .SEQ file and "500" for a .REF file, depending on
  the PIR release.  After copying the files from the CD, use the DCL CONVERT
  command to create files with the proper attributes.  The following FDL file
  describes the attributes of the file that CONVERT should produce.  Extract
  the text between the "----cut here----" marks (exclusive) and save it as
  PIRCD.FDL.

----------------------------------cut here-----------------------------------
IDENT	"27-JAN-1993 13:30:42   VAX-11 FDL Editor"

RECORD
	CARRIAGE_CONTROL	carriage_return
	FORMAT			variable
----------------------------------cut here-----------------------------------

  Protocol:
    a) $ CD_MOUNT/MEDIA=CDROM/UNDEFINED_FAT=(STREAM:500) $1$dka100:
       !! DO _NOT_ USE "STREAM_LF" !!
    b) $ COPY $1$dka100:[DATA.NBR]PIR1.SEQ PIR1.SEQTMP
       $ COPY $1$dka100:[DATA.NBR]PIR1.REF PIR1.REFTMP
       ...
    c) $ CONVERT/FDL=PIRCD.FDL PIR1.SEQTMP PIR1.SEQ
       $ CONVERT/FDL=PIRCD.FDL PIR1.REFTMP PIR1.REF
       ...
    d) $ CD_DISMOUNT $1$dka100:

  After this procedure use the DCL command DIRECTORY/FULL to make certain the
  ASCII files have the proper file attributes.
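
  As a rough sketch of that check (the file name follows the protocol above;
  the byte count is the value quoted earlier for a .SEQ file and may differ
  with the PIR release), the relevant lines of the listing should look
  roughly like this, with "..." standing for the other lines of output:

    $ DIRECTORY/FULL PIR1.SEQ
      ...
      Record format:      Variable length, maximum 490 bytes
      Record attributes:  Carriage return carriage control
      ...

  If the record format is still reported as stream or undefined, the CONVERT
  step presumably did not take effect for that file and should be repeated.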

Please use the above protocols and report any problems to us.  We appreciate
being notified of problems with the ATLAS product.  If you like, we will supply
you with a detailed DCL command procedure to facilitate file manipulation for
future releases.
------------------------------------------------------------------------
                                 Christopher Marzec
                                 MARZEC at NBRF.Georgetown.Edu

                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Information Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMAST at GUNBRF.BITNET
                                 POSTMASTER at NBRF.GEORGETOWN.EDU



