ANNOUNCE: pdbcat - convert PDB columns to fields

Andrew Dalke dalke at uxa.cso.uiuc.edu
Wed Oct 26 03:30:28 EST 1994


	PDBCAT - converts PDB files to a "field based format", and back again

Description:

  I have a problem with the standard PDB file format (used to describe
protein structure information).  It is column based and designed for
FORTRAN, but I like to program in C/C++ and use Unix tools like awk,
which are field based.

  This is a problem.  For example, the following are valid PDB ATOM
record entries:

ATOM     34  N   GLY 1   6       8.420  50.899  85.486  0.50 51.30   4  2PLV
ATOM      1  N   GLY    23       9.927  48.677  66.447  1.00 69.73      1LPE
                     ^no chain identifier          no footnote record^
HETATM12345  N   GLY    23       9.927  48.677  66.447  1.00 69.73      1LPE
      ^the atom number is so large that HETATM and 12345 form one field

The coordinates are fields 7, 8, and 9 in the first record and 6,
7, and 8 in the second.  In the third, the HETATM and atom number form
one field, which moves the coordinates to fields 5, 6, and 7.
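
A quick way to see the shifting field numbers is to split the three
records on whitespace, the way awk would by default.  This is only an
illustrative Python sketch; the helper name first_coordinate_field is
made up for this example and is not part of pdbcat.

```python
# The three example records, copied from above.
records = [
    "ATOM     34  N   GLY 1   6       8.420  50.899  85.486  0.50 51.30   4  2PLV",
    "ATOM      1  N   GLY    23       9.927  48.677  66.447  1.00 69.73      1LPE",
    "HETATM12345  N   GLY    23       9.927  48.677  66.447  1.00 69.73      1LPE",
]


def first_coordinate_field(line):
    """Return the 1-based whitespace field holding the x coordinate.

    Hypothetical helper: it simply looks for the first token containing
    a decimal point, which is enough for these three records.
    """
    for number, token in enumerate(line.split(), start=1):
        if "." in token:
            return number
    return None


for line in records:
    # Print the total field count and where the x coordinate landed.
    print(len(line.split()), first_coordinate_field(line))
```

The x coordinate lands in a different field in each record, so a fixed
field number such as $7 cannot be trusted on raw PDB input.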

  So I wrote a C++ program to convert the PDB file to a field based
format by filling in any missing fields with either a default value,
or the '#' character.  The above gets converted to:

# ATOM 34 N # GLY 1 6 #    8.420   50.899   85.486   0.50  51.30 4 2PLV
# ATOM 1 N # GLY # 23 #    9.927   48.677   66.447   1.00  69.73 # 1LPE
# HETATM 12345 N # GLY # 23 #    9.927   48.677   66.447   1.00  69.73 # 1LPE

which can then be used by field based tools without worrying about the
specifics of the columns.  Pdbcat can also convert that field based
format back to the column based one.
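
The idea of the conversion can be sketched in a few lines of Python:
cut each record at fixed column positions and write '#' wherever a
column range is blank.  This is not the pdbcat source.  The column
ranges below are an assumption taken from the commonly published PDB
ATOM/HETATM layout, the names are made up for the example, and the
sketch stops at the temperature factor, so it does not reproduce
pdbcat's leading field or the trailing footnote and ID fields.

```python
# (name, first column, last column), 1-based and inclusive.
# ASSUMPTION: standard ATOM/HETATM columns, not read from pdbcat itself.
ATOM_COLUMNS = [
    ("record", 1, 6),
    ("serial", 7, 11),
    ("name", 13, 16),
    ("altloc", 17, 17),
    ("resname", 18, 20),
    ("chain", 22, 22),
    ("resseq", 23, 26),
    ("icode", 27, 27),
    ("x", 31, 38),
    ("y", 39, 46),
    ("z", 47, 54),
    ("occupancy", 55, 60),
    ("bfactor", 61, 66),
]


def columns_to_fields(line, empty="#"):
    """Slice one ATOM/HETATM record by column and mark blank slices."""
    fields = []
    for _name, first, last in ATOM_COLUMNS:
        text = line[first - 1:last].strip()
        fields.append(text if text else empty)
    return fields


hetatm = "HETATM12345  N   GLY    23       9.927  48.677  66.447  1.00 69.73      1LPE"
print(" ".join(columns_to_fields(hetatm)))
```

Because the slices are taken by position, HETATM and 12345 come out as
separate fields even though no space separates them in the input.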

pdbcat {-fields | -columns} [[-f] files]

  Read any pdb file from stdin or list of files and convert the
 data to either a column based or field based pdb file.  A '#'
 represents an empty field.  This is useful for field based 
 tools like awk.  The default output is 'columns'.

 The command to convert to the field based format is:  pdbcat -fields
 To convert back to the column format:  pdbcat   (the -columns is optional)
 If any filenames are given, they are read one after the other.
 If no filenames are given, the input is stdin.

How to get it:

  The only way to get it is to use the WWW.  The URL is:
http://www.ks.uiuc.edu:1250/~dalke/utils/pdbcat.html

  That html file describes how to download, configure, and compile the
program, and includes a couple of examples.  Pdbcat compiles with the
C++ compilers on IBM 6000, HP, and SGI.

Examples:

Find the centroid:

% pdbcat -fields polio.pdb | awk '{num++; x+=$10; y+=$11; z+=$12} 
     END {print "Centroid is:", x/num, y/num, z/num}'
Centroid is: 31.0427 37.0688 120.05
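
For anyone who would rather not use awk, the same calculation can be
sketched in Python on the field based output.  It assumes the layout
shown earlier (record name in field 2, coordinates in fields 10 to 12).
Unlike the awk one-liner, which counts every input line, this version
skips anything that is not an ATOM or HETATM record; that filter is my
own choice for the sketch, and the function name is hypothetical.

```python
def centroid(field_lines):
    """Average the x, y, z fields of ATOM/HETATM lines in field format."""
    count = 0
    sum_x = sum_y = sum_z = 0.0
    for line in field_lines:
        fields = line.split()
        # Field 2 is the record name; fields 10-12 are x, y, z (1-based).
        if len(fields) < 12 or fields[1] not in ("ATOM", "HETATM"):
            continue
        sum_x += float(fields[9])
        sum_y += float(fields[10])
        sum_z += float(fields[11])
        count += 1
    if count == 0:
        raise ValueError("no ATOM or HETATM records found")
    return sum_x / count, sum_y / count, sum_z / count


# The first two field based records shown earlier in this post.
sample = [
    "# ATOM 34 N # GLY 1 6 #    8.420   50.899   85.486   0.50  51.30 4 2PLV",
    "# ATOM 1 N # GLY # 23 #    9.927   48.677   66.447   1.00  69.73 # 1LPE",
]
print("Centroid is:", *centroid(sample))
```

In practice field_lines would be the output of "pdbcat -fields", for
example read line by line from sys.stdin.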


Move the data by a certain amount (in this case, -31.0427 -37.0688 -120.05):

Note that the print statement by itself prints the whole line, and
that I can change a field just as if it were a variable.

% head -3 polio.pdb
ATOM      1  HT1 GLY     6       8.960  50.028  85.307  1.00   .00      VP1 
ATOM      2  HT2 GLY     6       8.912  51.471  86.202  1.00   .00      VP1 
ATOM      3  N   GLY     6       8.420  50.899  85.486   .50 51.30      VP1 

% pdbcat -fields polio.pdb | awk '{$10 -= 31.0427; $11 -= 37.0688; 
          $12 -= 120.05; print }' | pdbcat | head -3
ATOM      1 HT1  GLY     6     -22.083  12.959 -34.743  1.00  0.00      VP1    
ATOM      2 HT2  GLY     6     -22.131  14.402 -33.848  1.00  0.00      VP1    
ATOM      3 N    GLY     6     -22.623  13.830 -34.564  0.50 51.30      VP1    
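
The awk step in the middle of that pipeline can also be sketched in
Python.  The function below is a hypothetical stand-in for the awk
program only: it shifts fields 10 to 12 of one field based line and
leaves the rest alone, and its output would still need to go through
pdbcat to get back to columns.  The sample line is hand-written by
analogy with the field format shown earlier, not actual pdbcat output.

```python
def translate(field_line, dx, dy, dz):
    """Shift the x, y, z fields (10-12, 1-based) of one field based line."""
    fields = field_line.split()
    for index, delta in zip((9, 10, 11), (dx, dy, dz)):
        # Three decimals matches the precision of PDB coordinates.
        fields[index] = f"{float(fields[index]) + delta:.3f}"
    return " ".join(fields)


# ASSUMPTION: a field based version of atom 3 of polio.pdb, written by hand.
atom3 = "# ATOM 3 N # GLY # 6 # 8.420 50.899 85.486 0.50 51.30 # VP1"
print(translate(atom3, -31.0427, -37.0688, -120.05))
```

The shifted coordinates come out as -22.623, 13.830, and -34.564, the
same values as in the third line of the pdbcat output above.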

						Andrew
						dalke at uiuc.edu



