[Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures

Rolf Huehne via proteins@net.bio.net (by rhuehne from fli-leibniz.de)
Wed Apr 23 07:22:40 EST 2008

Narges Habibi wrote:
> Hi all,
> I'm doing a project on "Protein Contact Map Prediction" and I use some
> features as input for a neural network, including the secondary structure
> of a given amino acid. There are several ways to obtain it:
> 1- getting the DSSP file for each PDB file (from the FTP server)
> 2- extracting it from the PDB file (the HELIX, SHEET and TURN sections)
> 3- getting the ss file from www.pdb.org (as far as I can see, the sequences
> in this file don't match the PDB files; why?)
> What do you suggest? Which method is more accurate?

You should be careful if you use the secondary structure assignments
provided by the authors in the PDB file. You might expect that they are
of high quality because they were curated by people who know the
proteins quite well. But I noticed that there are overlaps between
different secondary structure elements, for example in entry '1AMR':

  HELIX   13  HM PHE A  352  LYS A  355  1
  TURN    16 T16 ILE A 353  GLN A 356     TYPE I

(The helix at residues 352-355 overlaps with the turn at residues 353-356!?)
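
Overlaps like this can be found mechanically. The sketch below reads the start and end residues of HELIX, SHEET and TURN records from their fixed columns (the positions differ between the three record types; they are taken from the PDB format description) and compares the resulting ranges chain by chain. It treats residue numbers as plain integers and ignores insertion codes, the same simplification mentioned in the note further down. The names `parse_elements` and `overlaps` are hypothetical.

```python
from itertools import combinations

# 0-based (init chain, init seqNum, end seqNum) positions per record type,
# following the fixed-column PDB format; insertion codes are ignored here.
COLUMNS = {
    "HELIX": (19, slice(21, 25), slice(33, 37)),
    "SHEET": (21, slice(22, 26), slice(33, 37)),
    "TURN": (19, slice(20, 24), slice(31, 35)),
}


def parse_elements(pdb_text):
    """Return (record, chain, start, end) for every HELIX/SHEET/TURN record."""
    elements = []
    for line in pdb_text.splitlines():
        record = line[:6].rstrip()  # record name is left-justified in columns 1-6
        if record not in COLUMNS:
            continue
        chain_col, start_cols, end_cols = COLUMNS[record]
        elements.append((record, line[chain_col],
                         int(line[start_cols]), int(line[end_cols])))
    return elements


def overlaps(elements):
    """Yield (a, b, shared) for element pairs on the same chain that share residues."""
    for a, b in combinations(elements, 2):
        if a[1] != b[1]:
            continue  # elements on different chains cannot overlap
        shared = min(a[3], b[3]) - max(a[2], b[2]) + 1
        if shared > 0:
            yield a, b, shared


if __name__ == "__main__":
    records = (
        "HELIX   13  HM PHE A  352  LYS A  355  1\n"
        "TURN    16 T16 ILE A 353  GLN A 356     TYPE I\n"
    )
    for a, b, shared in overlaps(parse_elements(records)):
        print(a, b, shared)
    # ('HELIX', 'A', 352, 355) ('TURN', 'A', 353, 356) 3
```

SHEET-versus-SHEET hits deserve a second look: according to the PDB format description, a closed beta-barrel repeats its first strand as the last strand of the sheet, so such pairs are not necessarily errors.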

I analyzed this a few weeks ago for all remediated PDB files and
found overlaps in about 2000 entries.
A summary text file (15 KB) containing the PDB codes and the
corresponding number of overlapping residues is (temporarily) available
at this URL:


(Note: The numbers are not always correct because, for simplicity and
speed, I didn't take into account numbering irregularities in the
non-border residues of an element, such as insertion codes or nice residue
number sequences like "28,328,29" in entry '1BLB'.)
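
One way to avoid this simplification is to expand each element along the residue order found in the coordinate section instead of along the integer range between its boundary numbers. The sketch below assumes that ATOM/HETATM records list residues in chain order (the usual case, but an assumption here) and identifies residues by (residue number, insertion code). The function names, the `atom_line` helper and the toy chain are hypothetical, with the numbering merely modelled on a "28,328,29" sequence.

```python
def residue_order(pdb_text):
    """Map chain ID -> list of (resSeq, iCode) in the order they appear in ATOM/HETATM records."""
    order = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        chain = line[21]                    # column 22
        key = (int(line[22:26]), line[26])  # columns 23-26 (resSeq) and 27 (iCode)
        residues = order.setdefault(chain, [])
        if not residues or residues[-1] != key:
            residues.append(key)            # new residue; further atoms of it are skipped
    return order


def element_residues(order, chain, start, end):
    """Residues from start to end (inclusive), following file order, not numeric order."""
    residues = order[chain]
    i, j = residues.index(start), residues.index(end)
    return set(residues[i:j + 1])


def atom_line(chain, res_seq, i_code=" "):
    """Hypothetical helper: a truncated CA ATOM line with only columns 1-27 filled."""
    return f"ATOM  {1:5d}  CA  ALA {chain}{res_seq:4d}{i_code}"


if __name__ == "__main__":
    # Toy chain numbered 27, 28, 328, 29, 30 (illustrative, not real coordinates).
    pdb_text = "\n".join(atom_line("A", n) for n in (27, 28, 328, 29, 30))
    order = residue_order(pdb_text)
    first = element_residues(order, "A", (28, " "), (29, " "))
    second = element_residues(order, "A", (328, " "), (30, " "))
    print(len(first), len(second), len(first & second))  # 3 3 2
```

With plain integer ranges the first element would contain only 2 residues, and the second (328 to 30) would not form a valid range at all.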

The full output (9 MB) is (temporarily) available at this URL:


Besides, not all PDB entries contain secondary structure assignments at all.
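
For entries without any HELIX, SHEET or TURN records, the DSSP route from the original question (option 1) is the obvious fallback, since DSSP derives the assignment from the atomic coordinates. The sketch below first checks whether an entry carries author assignments and otherwise reduces a DSSP file to a per-residue string. It assumes the classic fixed-column DSSP output (residue table after the line starting with "  #  RESIDUE", structure code in column 17) and uses one common three-state reduction (H, G, I to helix; E, B to strand; everything else to coil); other reductions exist, and the function names are hypothetical.

```python
# Record names are left-justified in columns 1-6 of a PDB line.
SS_RECORDS = {"HELIX", "SHEET", "TURN"}

# One common three-state reduction; other conventions treat G, I or B differently.
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def has_ss_assignment(pdb_text):
    """True if the entry contains at least one HELIX, SHEET or TURN record."""
    return any(line[:6].rstrip() in SS_RECORDS for line in pdb_text.splitlines())


def parse_dssp(dssp_text):
    """Yield (chain, residue_id, amino_acid, dssp_code) for each residue line.

    Assumes the classic fixed-column layout (1-based): residue number in
    columns 6-10, insertion code in 11, chain in 12, amino acid in 14 and
    the secondary structure code in 17 (blank means no assignment)."""
    in_table = False
    for line in dssp_text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_table = True  # the per-residue table follows this header line
            continue
        if not in_table or len(line) < 14 or line[13] == "!":
            continue  # skip preamble lines and chain-break markers ("!")
        residue_id = line[5:11].strip()  # number plus optional insertion code
        code = line[16] if len(line) > 16 else " "
        yield line[11], residue_id, line[13], code


def three_state(dssp_text, chain):
    """Return an H/E/C string for one chain, in DSSP file order."""
    return "".join(THREE_STATE.get(code, "C")
                   for ch, _, _, code in parse_dssp(dssp_text) if ch == chain)


if __name__ == "__main__":
    dssp_sample = "\n".join([
        "  #  RESIDUE AA STRUCTURE BP1 BP2  ACC",
        "    1   10 A M  H",
        "    2   11 A K  E",
        "    3" + " " * 8 + "!",  # chain break, not a residue
        "    4   12 A L   ",
    ])
    print(has_ss_assignment("ATOM      1  CA  ALA A   1"))  # False
    print(three_state(dssp_sample, "A"))                     # HEC
```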


