[Molecular-modelling] Re: protein graft, pdb cleanup

Raoul Fleckman via molmodel%40net.bio.net (by raoul.fleckman At gmail.com)
Sun Nov 26 23:38:21 EST 2006


On 2006-11-27, wkadlo <gkowadlo At gmail.com> wrote:
> I did the grafting with a hypothetical protein in the program Discovery
> (Accelrys).
> so i have all the 3D coordinates, but the numbering in the file is
> wrong (each fragment has retained the numbering from the original pdb's
> that they came from).

you did "grafting" with "Discovery" which somehow placed the modeled in
bit into a reasonable 3D context (we hope...) and yet the software
having done this failed to simply renumber the atom-sequence-numbers
(and the residue sequence numbers for you)?  hmm...

ok..whatever. please keep in mind that the PDB format is old. it has its
roots in punch cards, (please don't ask what that is, if you're too
young to know).  therefore, its fields are column number sensitive; and
it is even possible to have numeric fields get large enough so that no
white space is left between fields.  therefore, the attached perl script
is designed in an odd (FORTRANish) way to deal with that.  if you just
run it on a PDB input without modification, it should just output its
input.  that is, you need to figure out what the new atom and residue
sequence numbers ($asn and $rsn, respectively in the script) should be
to continue on after the unchanging sequence and edit the script
accordingly - as per the comments in the script.

here's a second warning: i have observed software that connects residue
N to residue N+1 no matter how many Angstroms might separate them; and
i've seen software that refuses to do so.  so if your "i have all the 3D
coordinates" up there isn't quite right, get ready to view some
pretzels.

here's the script appended. it's probably only worth slightly more than
you paid for it; but maybe it'll save you some time.  good luck:

#-----------
#!/usr/bin/perl
#         1         2         3         4         5         6         7
#1234567890123456789012345678901234567890123456789012345678901234567890123456
#ATOM      1  CB  ASP A   8       -3.48   27.95   19.71  1.00  36.1      A     
#ATOM   1439  OW  WAT A1259      38.232  42.977   2.328  5.81  7.20   8    10
#ATOM      1  CB  ASP     8      -3.220  27.537  20.293  0.00 49.28   6
# put $cid in $segid if $segid is blank or $segid in $cid if $cid is blank
# change nothing if already there  
# assumes segid and cid are BOTH one character 
# add one to each residue number
# 1- 6  6 $id    card id
# 7-11  5 $asn   atom sequence number **
#12     1 void
#13-16  4 $anam  atom name
#17     1 $ali   alternate location indicator
#18-20  3 $rnam  residue name
#21     1 void
#22     1 $cid   chain identifier
#23-26  4 $rsn   residue sequence number **
#27     1 $ci    code for insertions (e.g. 66A, 66B)
#28-30  3 void   !! is this right ?? there's something up there!!
#31-38  8 $x     x coordinate
#39-46  8 $y     y coordinate
#47-54  8 $z     z coordinate
#55-60  6 $occ   occupany
#61-66  6 $bval  temperature factor
#67     1 void
#68-70  3 $fn    footnote
#71-72  2 void
#73     1 $segid
my $intemp = 'A6 A5 x A4 A A3 x A A4 A x3 A8 A8 A8 A6 A6 x A3 x2 A' ;
#since x generates a zero byte pack must use a different output pattern
my $outemp = 'A6 A5 A A4 A A3 A A A4 A A3 A8 A8 A8 A6 A6 A A3 A2 A A3';
my $K = ' ';
my $K2 = ' ' x 2;
my $K3 = ' ' x 3;
my $sol = $K3;
#a smoother way around this *might* be to make all fields gobble the x before
#it, that is, instead of 'A5 x A4' have 'A5 A5'

while(<>) {
  unless (/^ATOM/) {
    print;
    next;
  }
  chomp;
  my $len = length $_;
  if($len < 76) { $_ .= (' ' x (76 - $len)); }

  ($id,$asn,$anam,$ali,$rnam,$cid,$rsn,$ci,$x,$y,$z,$occ,
   $bval,$fn,$segid) = unpack($intemp,$_);

  # ----  here alter stuff with the fields (most likely: $asn and $rsn)
  #
  # for example, if the unchanging part of the protein ends with its
  # final residue number ($rsn) equal to 197, then you want to replace
  # the $rsn below with "$rsnx++" and initialize it before the above
  # "while(<>) {" with "my $rsnx = 198" 
  # ... and quite similarly for the atom sequence number ($asn)
  # 
  # then you might be able to do something like:
  #
  # this_script.pl new_attachment.pdb >> copy_of_constant_chain.pdb
  #
  # ----

  my $card = pack($outemp,$id,$asn,$K,$anam,$ali,$rnam,$K,$cid,
    $rsn,$ci,$K3,$x,$y,$z,$occ,$bval,$K,$K3,$K2,$segid,$sol);
  print $card, "\n" ;  
}





More information about the Molmodel mailing list