An Easy way to retrieve Full PDB entries with Gopher

Dan Jacobson danj at welchgate.welch.jhu.edu
Mon Mar 1 10:35:47 EST 1993


In article <1993Feb25.162306.92 at ntet.nctr.fda.gov> wmelchior at ntet.nctr.fda.gov writes:
>I been using the Ohio State VMS gopher client to browse in the biogopher 
>hole at Johns Hopkins.  When I search in the PDB database, the files end 
>with the line
>
>"DATA TRUNCATED, FOR ENTIRE RECORD USE GOPHER RETRIEVE ..."
>
>What is "gopher retrieve"?  Is it a feature of clients other than VMS, or 
>is there something else that I don't know about.
>
>Thanks, Bill

What they mean is that you should use ftp or gopher to go to the PDB
ftp/gopher site, look through over a thousand file names spread over
eleven directories to find the file containing your entry, and then
retrieve it.  I hear you say "there has to be a better way" - well, now
there is :-).  Point your gopher client at merlot.welch.jhu.edu and
select the following directories:


   12. Search Databases at Welchlab (Vectors, Promoters, NRL-3D, EST,

      9.  Search and Retrieve entries from the PDB (Protein Data Bank) /
 
Where you will see:

          1.  About these searches.
          2.  Retrieve Full PDB Entries by Accession Number <?>
          3.  Search Protein Data Bank Headers (Brookhaven) <?>
          4.  Search Protein Data Bank Headers (NIH) <?> 

  
These three searches are presented here in order to
make it easier to identify PDB entries of interest
and to retrieve the full entry (coordinates and all).

The searches:

      3.  Search Protein Data Bank Headers (Brookhaven) <?>
      4.  Search Protein Data Bank Headers (NIH) <?> 

will search and return the header part of a PDB entry (where the
structure is briefly described, citations are given, authors are
listed, etc.).  The search at NIH allows you to use booleans (and, or,
not) but is run on a copy of the database that is not as up to date as
the one used for the search run at Brookhaven (which does not have
boolean capability).

Once you have used one or both of these searches you can retrieve the
full entries of the proteins of interest (3D coordinates, etc.) by
selecting:

      2.  Retrieve Full PDB Entries by Accession Number <?>

and simply typing in the accession number(s) of the entries that
you want to retrieve.  Now just select the results of this search
and the entry will be transferred to your computer.  PDB entries
are large (about 300 kbytes), so have a little patience: it will take
a little while to transfer these files over the network.
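
For anyone who would rather script this than use an interactive client,
here is a rough sketch of what a gopher client does on the wire: it
opens a TCP connection (port 70 by default), sends the selector
followed by a tab and the search words, ends the line with CR LF, and
reads until the server closes the connection.  The selector in the
usage comment is purely hypothetical; the real one has to be read from
the server's menu.  The function names are mine, not part of any
client.

```python
import socket

GOPHER_PORT = 70  # default gopher port


def build_request(selector, query=None):
    """Build the single request line a gopher client sends.

    A plain item is requested as: selector CR LF
    A search (<?>) item is requested as: selector TAB words CR LF
    """
    line = selector if query is None else selector + "\t" + query
    return line.encode("latin-1") + b"\r\n"


def fetch(host, selector, query=None, port=GOPHER_PORT, timeout=60):
    """Send one gopher request and return every byte the server sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(selector, query))
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks)


# Hypothetical usage; "/hypothetical/pdb-retrieve" is NOT the real selector:
# reply = fetch("merlot.welch.jhu.edu", "/hypothetical/pdb-retrieve", "1apk")
```

Since a full entry is around 300 kbytes, the generous timeout is
deliberate.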


For example - let's say that I'm looking for the known 3D structures
of protein kinases.   I would first use the search at NIH or Brookhaven
and type "kinase".  I read the results of that search and decide that I
want to retrieve one of the entries for cAMP-dependent protein
kinase, which has the accession number 1apk.  I select the search for
retrieving full entries, type "1apk", select the resulting entry, and
voila - the entry comes to my desktop.
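
The "resulting entry" you select is one line of a gopher menu that the
search sends back.  As a small sketch of how such a menu can be picked
apart in a script: each line is a one-character item type followed by
the display string, selector, host and port, separated by tabs, and a
line holding a single period ends the listing.  The sample line in the
usage comment is made up for illustration and is not real server
output.

```python
def parse_menu(text):
    """Split a gopher menu into a list of dicts, one per menu item."""
    items = []
    for line in text.splitlines():
        if line == ".":  # a lone period marks the end of the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:  # not a well-formed menu line, skip it
            continue
        items.append({
            "type": line[0],         # "0" = text file, "1" = directory, "7" = search
            "display": fields[0],
            "selector": fields[1],
            "host": fields[2],
            "port": int(fields[3]),
        })
    return items


# Made-up example line (hypothetical selector):
# parse_menu("01apk\t/hypothetical/1apk\tmerlot.welch.jhu.edu\t70\r\n.\r\n")
```

The selector, host and port of the chosen item are what a client then
sends in its next request to fetch the full entry.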


OK, now a few notes about some of the gopher clients.  If you are using
a Unix client, retrieve the full entry by pointing the selector
arrow at the entry of interest and typing "s".  This will save
the entry straight to your hard disk without requiring you to view the
whole thing, but more importantly it avoids the insertion of bolding
characters (^[[7m^[[m) into the entry (some of the older Unix clients
don't have the "s" feature - update your client if necessary).
The Mac clients GopherApp and Sextant will retrieve entries just fine;
TurboGopher, however, has trouble - a bug I hope to track down soon.
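
If you already have an entry that picked up the bolding characters,
they can be removed after the fact.  Here is a minimal sketch, assuming
the only debris is terminal highlighting sequences of the form
ESC [ ... m (which covers the ^[[7m and ^[[m shown above); the function
name is mine and the sample text in the usage comment is invented.

```python
import re

# ESC, "[", optional digits/semicolons, "m": terminal highlighting codes
# such as ^[[7m (highlight on) and ^[[m (highlight off).
HIGHLIGHT = re.compile(r"\x1b\[[0-9;]*m")


def strip_highlighting(text):
    """Return text with terminal highlighting sequences removed."""
    return HIGHLIGHT.sub("", text)


# Invented sample line, for illustration only:
# strip_highlighting("HEADER \x1b[7mKINASE\x1b[m")
```

Saving with "s" in the first place is still the better route, since it
never puts these characters into the file.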


Best of luck,

Dan Jacobson

danj at welchgate.welch.jhu.edu


