PDB - GCG or GENBANK accession number tables

Dan Jacobson danj at welchdev.welch.jhu.edu
Wed Jul 28 17:07:08 EST 1993


In article <28JUL199312344982 at aardvark.ucs.uoknor.edu> bfrank at aardvark.ucs.uoknor.edu (FRANK,BART) writes:
>Can someone suggest a fast method to obtain the amino acid and/or
>nucleotide sequences of particular proteins in pdb? Is there a 
>table listing for the pdb numbers and accession numbers for GCG or 
>Genbank/EMBL files?
>


For protein sequences you can search NRL_3D (a Protein Sequence-Structure
Database) via gopher.  The full text of each entry is searchable, so you
can search by PDB accession number, title, keyword, and so on.

Point your gopher client at merlot.welch.jhu.edu and select:

13. Search Databases at Hopkins (Vectors, Promoters, NRL-3D, EST, OMIM ../

  -->  10. Sequence Databases (Vectors, EPD, EST, NRL_3D, Kabat, Genbank)/

   -->  7.  NRL_3D Protein Sequence-Structure Database <?>

Now search for a PDB accession number or a topic of interest - for example
try:

kinase 

and you'll see - 


 -->  1.  2CPKE c-AMP-dependent protein kinase (cAPK) (catalytic.
      2.  2CPKI cAMP-dependent protein kinase inhibitor, chain I -.
      3.  6ENL Enolase (2-phospho-D-glycerate hydrolase) Complex.
      4.  1AK3A Nucleoside-triphosphate--adenylate kinase isoenzyme.
      5.  1AK3B Nucleoside-triphosphate--adenylate kinase isoenzyme.
      6.  1APK Protein kinase I (domain A) - Bovine #EC-number.
      7.  1BPK Protein kinase I (domain B) - Bovine #EC-number.
      8.  1CPKE Protein kinase, chain E - Mouse #EC-number 2.7.1.37.
      9.  1CPKI cAMP-dependent protein kinase inhibitor, chain I -.
      10. 2APK Protein kinase II (domain A) - Bovine #EC-number.
      11. 2BPK Protein kinase II (domain B) - Bovine #EC-number.
      12. 3ADK Adenylate kinase - Pig #EC-number 2.7.4.3.
      13. 3ENL Enolase (2-phospho-D-glycerate hydrolase) (apo) -.
      14. 3PGK Phosphoglycerate kinase complex with atp, Magnesium.
      15. 4ENL Enolase (2-phospho-D-glycerate hydrolase) (holo) -.
      16. 5ENL Enolase (2-phospho-D-glycerate hydrolase) Complex.
      17. 7ENL Enolase (2-phospho-D-glycerate hydrolase) Complex.
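If you would rather script the search than step through the menus, the
exchange underneath is simple.  Per the Gopher protocol (RFC 1436), a
search item is queried by sending the item's selector, a TAB, the search
words and CRLF, and the server answers with tab-separated menu lines.
Below is a minimal Python sketch of that exchange.  The selector strings
are not shown by the gopher client, so any selector you pass in (for
example "nrl3d-index" in the usage note) is a placeholder assumption, not
the real one on merlot.welch.jhu.edu.

```python
import socket


def build_search_request(selector: str, query: str) -> bytes:
    """Build a Gopher search request: selector, TAB, query, CRLF."""
    return f"{selector}\t{query}\r\n".encode("ascii")


def parse_menu_line(line: str) -> dict:
    """Split one Gopher menu line into its fields.

    A menu line is: <type char><display>TAB<selector>TAB<host>TAB<port>.
    Lines with fewer than four fields raise ValueError.
    """
    line = line.rstrip("\r\n")
    item_type, rest = line[0], line[1:]
    display, selector, host, port = rest.split("\t")[:4]
    return {
        "type": item_type,
        "display": display,
        "selector": selector,
        "host": host,
        "port": int(port),
    }


def gopher_search(host: str, selector: str, query: str,
                  port: int = 70, timeout: float = 30.0) -> list:
    """Send a search to a Gopher server and return the parsed menu items."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_search_request(selector, query))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    text = b"".join(chunks).decode("latin-1")
    # A line holding a single "." marks the end of the menu.
    return [parse_menu_line(l) for l in text.splitlines() if l and l != "."]
```

With the real selector in hand, a call such as
gopher_search("merlot.welch.jhu.edu", "nrl3d-index", "kinase") would be
expected to return one item per hit, like the list above.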



An entry looks as follows:


---------------

ENTRY           2CPKE      #Type Protein
TITLE           c-AMP-dependent protein kinase (cAPK) (catalytic
                  subunit), chain E - Mus musculus (recombinant
                  mouse) #EC-number 2.7.1.37
DATE            19-Feb-1993 #Sequence 19-Feb-1993 #Text 31-Mar-1993
PLACEMENT          0.0    0.0    0.0    0.0    0.0
COMMENT         PDB code: 2CPK
SOURCE          Mus musculus #Common-name house mouse
COMMENT         Note: "alpha" isoenzyme expressed in (escherichia
                  coli)
REFERENCE
   #Authors     Knighton D.R., Zheng J., Ten Eyck L.F., Ashford
                  V.A., Xuong N.H., Taylor S.S., Sowadski J.M.
   #Citation    coordinates deposited in Brookhaven National
                  Laboratory's Protein Data Bank
REFERENCE
   #Authors     Knighton D.R., Zheng J., Ten Eyck L.F., Ashford
                  V.A., Xuong N.H., Taylor S.S., Sowadski J.M.
   #Journal     Science (1991) 253:407
   #Title       Crystal structure of the catalytic subunit of cyclic
                  adenosine monophosphate-Dependent protein kinase.
REFERENCE
   #Authors     Knighton D.R., Zheng J., Ten Eyck L.F., Xuong N.H.,
                  Taylor S.S., Sowadski J.M.
   #Journal     Science (1991) 253:414
   #Title       Structure of a peptide inhibitor bound to the
                  catalytic subunit of cyclic adenosine
                  monophosphate-Dependent protein kinase.
REFERENCE
   #Authors     Slice L.W., Taylor S.S.
   #Journal     J. Biol. Chem. (1989) 264:20940
   #Title       Expression of the catalytic subunit of
                  cAMP-dependent protein kinase in escherichia coli.
COMMENT         Resolution: 2.7 angstroms
COMMENT         R-value: 0.18
COMMENT         Determination: X-ray diffraction
KEYWORDS        Transferase(phosphotransferase)
FEATURE
   2-17                    #Region helix (right hand alpha)\
   26-28                   #Region helix (right hand 3-10) (not
                             noted in ref 1)\
   62-67                   #Region helix (right hand alpha)\
   71-83                   #Region helix (right hand alpha)\
   114-121                 #Region helix (right hand alpha)\
   126-145                 #Region helix (right hand alpha)\
   155-157                 #Region helix (right hand 3-10) (not
                             noted in ref 1)\
   188-190                 #Region helix (right hand 3-10) (not
                             noted in ref 1)\
   193-196                 #Region helix (right hand alpha) (not
                             noted in ref 1)\
   204-219                 #Region helix (right hand alpha)\
   229-238                 #Region helix (right hand alpha)\
   249-258                 #Region helix (right hand alpha)\
   263-265                 #Region helix (right hand 3-10) (not
                             noted in ref 1)\
   275-278                 #Region helix (right hand alpha)\
   281-285                 #Region helix (right hand 3-10) (not
                             noted in ref 1)\
   288-292                 #Region helix (right hand alpha)\
   29-37,41-48,53-61,
   101-107,92-97           #Region beta sheet\
   148-149,158-160         #Region beta sheet\
   166-168,175-176         #Region beta sheet
SUMMARY       #Molecular-weight 39110  #Length 336  #Checksum  7934
SEQUENCE
                5        10        15        20        25        30
      1 V K E F L A K A K E D F L K K W E T P S Q N T A Q L D Q F D
     31 R I K T L G T G S F G R V M L V K H K E S G N H Y A M K I L
     61 D K Q K V V K L K Q I E H T L N E K R I L Q A V N F P F L V
     91 K L E F S F K D N S N L Y M V M E Y V A G G E M F S H L R R
    121 I G R F S E P H A R F Y A A Q I V L T F E Y L H S L D L I Y
    151 R D L K P E N L L I D Q Q G Y I Q V T D F G F A K R V K G R
    181 T W T L C G T P E Y L A P E I I L S K G Y N K A V D W W A L
    211 G V L I Y E M A A G Y P P F F A D Q P I Q I Y E K I V S G K
    241 V R F P S H F S S D L K D L L R N L L Q V D L T K R F G N L
    271 K N G V N D I K N H K W F A T T D W I A I Y Q R K V E A P F
    301 I P K F K G P G D T S N F D D Y E E E E I R V S I N E K C G
    331 K E F T E F

---------------
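If you save an entry like the one above to a file, the residues can be
pulled out of the SEQUENCE block with a few lines of Python.  This is
only a sketch based on the layout shown here: it assumes the sequence
follows a line starting with SEQUENCE, that residues are single letters
separated by spaces, and that the ruler and position numbers are the only
other tokens.  The function names are made up for illustration.

```python
def extract_entry_id(entry_text: str) -> str:
    """Return the identifier on the ENTRY line (e.g. 2CPKE), or ''."""
    for line in entry_text.splitlines():
        if line.startswith("ENTRY"):
            fields = line.split()
            if len(fields) > 1:
                return fields[1]
    return ""


def extract_sequence(entry_text: str) -> str:
    """Collect the one-letter residues that follow the SEQUENCE line."""
    residues = []
    in_sequence = False
    for line in entry_text.splitlines():
        if line.startswith("SEQUENCE"):
            in_sequence = True
            continue
        if in_sequence:
            # Keep single letters; skip ruler/position numbers and rules.
            residues.extend(
                tok for tok in line.split()
                if len(tok) == 1 and tok.isalpha()
            )
    return "".join(residues)
```

A quick sanity check is to compare the length of the returned string with
the #Length value on the SUMMARY line (336 for the entry above).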


Thus you have the protein sequence.  The DNA sequence is a bit harder:
you would need to run this protein sequence through the PIR or Genbank
(genpept) Fasta or Blast e-mail servers to find a match, and then use
gopher to pull out the full entries found by the Fasta/Blast searches.
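Search programs such as Fasta and Blast generally take the query as a
FASTA record: a ">" header line followed by the sequence in fixed-width
lines.  Here is a small Python sketch that builds such a record from a
residue string.  The exact message layout each e-mail server expects
(command lines, database names and so on) is not covered here, so check
the server's own help text before sending anything.

```python
def to_fasta(identifier: str, description: str, sequence: str,
             width: int = 60) -> str:
    """Format a sequence as a FASTA record with lines of `width` residues."""
    header = f">{identifier} {description}".rstrip()
    # Slice the sequence into chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"
```

For example, to_fasta("2CPKE", "cAMP-dependent protein kinase", seq),
where seq holds the residues of the entry above, gives a record with six
sequence lines (336 residues at 60 per line).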

If you've never heard of gopher, write me a note and I'll send you some
information to get you started.

Best of luck,

Dan Jacobson

danj at welchgate.welch.jhu.edu

Johns Hopkins University



