[Protein-analysis] Re: pdb-l: About PDB Files and Secondary
(by geoff from compbio.dundee.ac.uk)
Thu Apr 24 10:35:23 EST 2008
>>>> I'm doing a project on "Protein Contact Map Prediction" and I use some
>>>> features for neural network's input, including Secondary Structure of a
>>>> given Amino Acid. There are several ways:
>>>> 1- getting dssp file for each pdb file (from ftp server)
>>> This method has the advantage of giving you a consistent definition of
>>> what is helix / sheet / turn / etc., but the disadvantage of sometimes
>>> missing short regions of sheet in particular.
>> Which regions do you mean DSSP might miss? The terminal ends of the
>> strands in a sheet, the terminal strands of the sheet, or perhaps the
>> interrupted strands in a sheet?
>> Would you please share further details, any statistics or any reference
>> that discusses these missing regions?
> I am glad you brought this up, because the truth is that I don't
> really know! I have only ever heard this said - I have not seen any
> statistics or examples. It would be great to get more information from
> the list about the identification (and validity) of short regions of
> secondary structure.
> Sorry for any confusion,
We did a comparison of DSSP, STRIDE and DEFINE in connection with
secondary structure prediction (see Cuff and Barton, 1999, Proteins, 34,
508-519) which may give you some information on this, for example the
different helix-length distributions found for STRIDE and DSSP. If you
can't get that paper, let me know. You might also find the paper by
Colloc'h et al. (http://www.ncbi.nlm.nih.gov/pubmed/8332595) interesting, as
it compared a number of definition methods. The original DSSP paper by
Kabsch and Sander is a good read since this gives a very clear explanation
of how the method works (but don't print out the secondary structure
dictionary that is on the end!)
DSSP defines secondary structure by a set of rules based on hydrogen
bonding. It is a nice algorithm since it only has one adjustable
parameter (the H-bond energy cutoff) and the rules are relatively simple
and hierarchical. For example, beta-strands are not defined directly, but
only as part of a sheet which in turn is defined by joining beta ladders,
which are defined as sequences of beta bridges. For high "quality"
crystal structures, the DSSP definitions usually reproduce what you would
expect by eye, but as with any automatic procedure, for some structures
you may not agree with the definition. For example, loss of a single
hydrogen bond due to a poorly modelled residue can lead to truncation of
a helix, or splitting of a sheet into two sheets. Whether this is
important to you or not will depend on the problem you are interested in.
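
To make the single adjustable parameter concrete, here is a minimal Python
sketch of the electrostatic H-bond energy as I recall it from the Kabsch and
Sander paper: partial charges of 0.42e and 0.20e, a dimensional factor of
332, distances in Angstroms, and an H-bond assigned when the energy is below
-0.5 kcal/mol. The function names and the toy coordinates are made up for
illustration; this is not the DSSP source code.

```python
import math

# Constants as given in the Kabsch and Sander description (assumed here):
# partial charges on C=O and N-H, and the dimensional factor f.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol, the one adjustable parameter


def hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group.

    Each argument is an (x, y, z) tuple in Angstroms.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h, cutoff=HBOND_CUTOFF):
    """True if the energy is below the cutoff."""
    return hbond_energy(c, o, n, h) < cutoff


# Hypothetical, perfectly linear geometry: C=O...H-N along the x axis.
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
good_h, good_n = (3.23, 0.0, 0.0), (4.23, 0.0, 0.0)  # O...H distance 2.0 A
bad_h, bad_n = (6.23, 0.0, 0.0), (7.23, 0.0, 0.0)    # N-H displaced by 3 A

print(is_hbond(c, o, good_n, good_h))  # well-modelled residue
print(is_hbond(c, o, bad_n, bad_h))    # poorly modelled residue
```

With the displaced N-H group the energy rises above the cutoff, so that
H-bond is no longer counted. This is the kind of single-bond loss that can
truncate a helix or split a sheet.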
STRIDE starts with DSSP, but then applies geometric criteria to see if
strands or helices should be extended. For this reason, some people
prefer STRIDE definitions. DEFINE, which is not used these days, worked
from completely different principles.
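
If you want to see how two definition methods differ on your own data, one
simple check is to compare segment lengths between their per-residue
assignments. The sketch below assumes you have already reduced each
program's output to a one-letter-per-residue string (H = helix, E = strand,
C = other). The two strings are invented toy examples, not real DSSP or
STRIDE output.

```python
from itertools import groupby


def segment_lengths(ss, state="H"):
    """Lengths of consecutive runs of `state` in a per-residue string."""
    return [len(list(run)) for label, run in groupby(ss) if label == state]


# Invented assignments for the same hypothetical 22-residue chain.
dssp_like = "CCHHHHHHCCEEEECCHHHHCC"
stride_like = "CHHHHHHHHCEEEEECHHHHHC"

for name, ss in (("DSSP-like", dssp_like), ("STRIDE-like", stride_like)):
    print(name, "helices:", segment_lengths(ss, "H"),
          "strands:", segment_lengths(ss, "E"))
```

Collecting these lengths over a whole set of chains gives the kind of
length distribution you can compare between methods.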
The advantage of using DSSP or STRIDE over the structure-author
definitions is that they follow a consistent model. Of course, many authors
base their secondary structure definitions on those from DSSP.
Geoff Barton, Professor of Bioinformatics, School of Life Sciences
University of Dundee, Scotland, UK. geoff from compbio.dundee.ac.uk
Tel:+44 1382 385860/388731 (Fax:385764) www.compbio.dundee.ac.uk
The University of Dundee is registered Scottish charity: No.SC015096