ANNOUNCE: pdbcat - convert PDB columns to fields
Andrew Dalke
dalke at uxa.cso.uiuc.edu
Wed Oct 26 03:30:28 EST 1994
PDBCAT - converts PDB files to a "field based format", and back again
Description:
I have a problem with the standard PDB file format (used to describe
protein structure information). It is column based and designed for
FORTRAN, but I like to program in C/C++ and to use Unix tools like awk,
and these tools are field based.
This is a problem. For example, the following are all valid PDB ATOM
and HETATM records:
ATOM 34 N GLY 1 6 8.420 50.899 85.486 0.50 51.30 4 2PLV
ATOM 1 N GLY 23 9.927 48.677 66.447 1.00 69.73 1LPE
(this record has no chain identifier and no footnote number)
HETATM12345 N GLY 23 9.927 48.677 66.447 1.00 69.73 1LPE
(the atom number is so large that HETATM and 12345 run together as one field)
The coordinates are fields 7, 8, and 9 in the first record and 6,
7, and 8 in the second. In the third, the HETATM and atom number form
one field.
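To make the problem concrete, here is a minimal C++ sketch (not part of pdbcat) that contrasts awk-style whitespace splitting with reading the coordinates from their fixed columns. It assumes the standard PDB ATOM/HETATM layout, with x, y and z in columns 31-38, 39-46 and 47-54. The function names `splitFields` and `readCoordinates` are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a line on whitespace, the way awk does by default.
// The field numbers this yields depend on which columns happen to be blank.
std::vector<std::string> splitFields(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string f;
    while (in >> f) fields.push_back(f);
    return fields;
}

// Hypothetical helper: read x, y, z from their fixed columns
// (31-38, 39-46, 47-54, 1-based), which stay put no matter what else
// is blank or run together on the line.
// Returns false if the line is too short to hold all three.
bool readCoordinates(const std::string& line, double xyz[3]) {
    if (line.size() < 54) return false;
    for (int i = 0; i < 3; ++i)
        xyz[i] = std::stod(line.substr(30 + 8 * i, 8));  // stod skips leading blanks
    return true;
}
```

With `splitFields`, x is element 6 of the first sample record but element 5 of the second, and the third record's first element is `HETATM12345`. `readCoordinates` finds x, y and z in the same place for all three.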
So I wrote a C++ program that converts a PDB file to a field based
format by filling in any missing fields with either a default value
or the '#' character. The records above are converted to:
# ATOM 34 N # GLY 1 6 # 8.420 50.899 85.486 0.50 51.30 4 2PLV
# ATOM 1 N # GLY # 23 # 9.927 48.677 66.447 1.00 69.73 # 1LPE
# HETATM 12345 N # GLY # 23 # 9.927 48.677 66.447 1.00 69.73 # 1LPE
which can then be used by field based tools without worrying about the
specifics of the columns. Pdbcat can also convert that field based
format back to the column based one.
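The following is a minimal C++ sketch of one way the two conversions could work. It is not pdbcat's source. The column table `kAtomColumns`, the function names, and the choice to write '#' for every blank column (pdbcat also fills in default values for some fields) are all assumptions. The sketch also omits the extra leading field seen in pdbcat's sample output, so here x is field 9 rather than field 10.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One fixed-width column: 0-based start, width, and whether the value is
// right-justified when the column format is written back out.
struct Column {
    std::size_t start;
    std::size_t width;
    bool rightJustify;
};

// Hypothetical table for ATOM/HETATM records (standard PDB columns 1-66).
// The footnote and ID-code columns at the end of the line are left out.
const std::vector<Column> kAtomColumns = {
    {0, 6, false},   // record name
    {6, 5, true},    // atom serial number
    {12, 4, false},  // atom name
    {16, 1, false},  // alternate location
    {17, 3, false},  // residue name
    {21, 1, false},  // chain identifier
    {22, 4, true},   // residue sequence number
    {26, 1, false},  // insertion code
    {30, 8, true},   // x
    {38, 8, true},   // y
    {46, 8, true},   // z
    {54, 6, true},   // occupancy
    {60, 6, true},   // temperature factor
};

// Strip leading and trailing blanks.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Column format -> field format: one field per column, '#' if it is blank.
std::string toFields(const std::string& line) {
    std::string out;
    for (const Column& c : kAtomColumns) {
        std::string text;
        if (c.start < line.size()) text = trim(line.substr(c.start, c.width));
        if (!out.empty()) out += ' ';
        out += text.empty() ? "#" : text;
    }
    return out;
}

// Field format -> column format: put each field back into its column.
std::string toColumns(const std::string& fields) {
    std::istringstream in(fields);
    std::string line;
    std::string f;
    for (const Column& c : kAtomColumns) {
        if (!(in >> f)) break;
        if (line.size() < c.start) line.resize(c.start, ' ');  // blank gap columns
        const std::string cell = (f == "#") ? "" : f.substr(0, c.width);
        const std::string pad(c.width - cell.size(), ' ');
        line += c.rightJustify ? pad + cell : cell + pad;
    }
    // Drop trailing blanks left by empty columns at the end of the line.
    return line.substr(0, line.find_last_not_of(' ') + 1);
}
```

Applied to a correctly columned version of the second sample record, `toFields` gives `ATOM 1 N # GLY # 23 # 9.927 48.677 66.447 1.00 69.73`, and `toColumns` puts every value back in its column. The round trip is not byte-exact for atom names, since this sketch left-justifies them (" N  " comes back as "N   ").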
pdbcat {-fields | -columns} [[-f] files]
Read a PDB file from stdin or from a list of files and convert
the data to either a column based or a field based PDB file. A '#'
represents an empty field. This is useful for field based
tools like awk. The default output is 'columns'.
The command to convert to the field based format is: pdbcat -fields
To convert back to the column format: pdbcat (the -columns is optional)
If any filenames are given, they are read one after the other.
If no filenames are given, the input is stdin.
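The input handling described in the usage text (named files read one after the other, otherwise stdin) could look like the hypothetical C++ sketch below. Option parsing for -fields, -columns and -f is left out, and `convertStream` and `convertInputs` are made-up names. The usage text does not say what happens when a file cannot be opened, so this sketch reports the problem on stderr and moves on to the next file.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: pass every line of `in` through `convert` and write
// the results to `out`, one per line.
template <typename Convert>
void convertStream(std::istream& in, std::ostream& out, Convert convert) {
    std::string line;
    while (std::getline(in, line)) out << convert(line) << '\n';
}

// Hypothetical driver: read the named files in order, or stdin if none are
// named. Unreadable files are reported and skipped; returns false if any
// file could not be opened.
template <typename Convert>
bool convertInputs(const std::vector<std::string>& files, std::ostream& out,
                   Convert convert) {
    if (files.empty()) {
        convertStream(std::cin, out, convert);
        return true;
    }
    bool ok = true;
    for (const std::string& name : files) {
        std::ifstream in(name);
        if (!in) {
            std::cerr << "cannot open: " << name << '\n';
            ok = false;
            continue;
        }
        convertStream(in, out, convert);
    }
    return ok;
}
```

Taking the per-line conversion as a parameter keeps the file handling the same for both directions: a `-fields` run would pass a column-to-field converter, and the default `-columns` run would pass its inverse.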
How to get it:
The only way to get it is to use the WWW. The URL is:
http://www.ks.uiuc.edu:1250/~dalke/utils/pdbcat.html
That HTML file describes how to download, configure, and compile the
program, and gives a couple of examples. Pdbcat will compile with the
C++ compilers on IBM 6000, HP, and SGI.
Examples:
Find the centroid:
% pdbcat -fields polio.pdb | awk '{num++; x+=$10; y+=$11; z+=$12}
END {print "Centroid is:", x/num, y/num, z/num}'
Centroid is: 31.0427 37.0688 120.05
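For comparison, here is a minimal C++ sketch of the same centroid calculation, reading pdbcat's field format. Based on the sample output above, it assumes that field 2 is the record name and fields 10-12 are x, y and z. The awk one-liner counts every line, but this sketch skips any line that is not an ATOM or HETATM record. `centroid` is a hypothetical name.

```cpp
#include <array>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: mean x, y, z of the ATOM/HETATM records in pdbcat's
// field format. Assumes field 2 is the record name and fields 10, 11, 12
// are x, y, z. Returns false if no atom records were found.
bool centroid(std::istream& in, std::array<double, 3>& result) {
    std::array<double, 3> sum = {0.0, 0.0, 0.0};
    long count = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::vector<std::string> f;
        std::string word;
        while (ls >> word) f.push_back(word);
        if (f.size() < 12 || (f[1] != "ATOM" && f[1] != "HETATM")) continue;
        for (int i = 0; i < 3; ++i) sum[i] += std::stod(f[9 + i]);  // awk's $10..$12
        ++count;
    }
    if (count == 0) return false;
    for (int i = 0; i < 3; ++i) result[i] = sum[i] / count;
    return true;
}
```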
Move the data by a given offset
(in this case, -31.0427, -37.0688, -120.05). Note that a bare print
statement prints the whole line, and that a field can be changed just
as if it were a variable.
% head -3 polio.pdb
ATOM 1 HT1 GLY 6 8.960 50.028 85.307 1.00 .00 VP1
ATOM 2 HT2 GLY 6 8.912 51.471 86.202 1.00 .00 VP1
ATOM 3 N GLY 6 8.420 50.899 85.486 .50 51.30 VP1
% pdbcat -fields polio.pdb | awk '{$10 -= 31.0427; $11 -= 37.0688;
$12 -= 120.05; print }' | pdbcat | head -3
ATOM 1 HT1 GLY 6 -22.083 12.959 -34.743 1.00 0.00 VP1
ATOM 2 HT2 GLY 6 -22.131 14.402 -33.848 1.00 0.00 VP1
ATOM 3 N GLY 6 -22.623 13.830 -34.564 0.50 51.30 VP1
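The awk translation step could be written in C++ roughly as follows. This is a minimal sketch, and `translateLine` is a hypothetical name. Like the awk version, it assumes x, y and z are fields 10-12 and rejoins the fields with single spaces. Unlike the pipeline above, where awk prints the new values in its default numeric format and pdbcat then writes them into the coordinate columns, this sketch rounds them to three decimals itself.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: add (dx, dy, dz) to fields 10, 11, 12 of one
// field-format line. Lines with fewer than 12 fields are returned unchanged.
// The fields are rejoined with single spaces, as awk does after a field is
// assigned, and the new coordinates are rounded to three decimals.
std::string translateLine(const std::string& line, double dx, double dy, double dz) {
    std::istringstream in(line);
    std::vector<std::string> f;
    std::string word;
    while (in >> word) f.push_back(word);
    if (f.size() < 12) return line;
    const double delta[3] = {dx, dy, dz};
    for (int i = 0; i < 3; ++i) {
        std::ostringstream v;
        v << std::fixed << std::setprecision(3) << (std::stod(f[9 + i]) + delta[i]);
        f[9 + i] = v.str();
    }
    std::string out;
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (i > 0) out += ' ';
        out += f[i];
    }
    return out;
}
```

With offsets of -31.0427, -37.0688 and -120.05, a field-format line whose coordinates are 8.960 50.028 85.307 comes back with -22.083 12.959 -34.743, which matches the first line of the output above.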
Andrew
dalke at uiuc.edu
More information about the Bio-soft
mailing list