[Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures

Geoff Barton via proteins@net.bio.net (by geoff from compbio.dundee.ac.uk)
Thu Apr 24 10:35:23 EST 2008


>>>> I'm doing a project on "Protein Contact Map Prediction" and I use some
>>>> features for neural network's input, including Secondary Structure of a
>>>> given Amino Acid. There are several ways:
>>>>
>>>> 1- getting dssp file for each pdb file (from ftp server)
>>>>
>>>
>>> This method has the advantage of giving you a consistent definition of
>>> what is helix / sheet / turn / etc., but the disadvantage of sometimes
>>> missing short regions of sheet in particular.
>>>
>>>
>>
>>  Which region do you mean DSSP might miss ?  The terminal ends of the
>> strands in a sheet, the terminal strands of the sheet or perhaps the
>> interrupted strands in a sheet ?
>>
>>  Would you please share further details, any statistics or any reference
>> that discusses these missing regions please ?
>
> I am glad you brought this up, because the truth is that I don't
> really know! I have only ever heard this said - I have not seen any
> statistics or examples. It would be great to get more information from
> the list about the identification (and validity) of short regions of
> secondary structure.
>
> Sorry for any confusion,
>
> Dan.

We did a comparison of DSSP, STRIDE and DEFINE in connection with
secondary structure prediction (see Cuff and Barton, 1999, Proteins, 34,
508-519) which may give you some information on this, for example the
different helix-length distributions found for STRIDE and DSSP.  If you
can't get that paper let me know.  You might also find the paper by
Colloc'h et al. (http://www.ncbi.nlm.nih.gov/pubmed/8332595) interesting as
this compared a number of definition methods. The original DSSP paper by
Kabsch and Sander is a good read since this gives a very clear explanation
of how the method works (but don't print out the secondary structure
dictionary that is at the end!).
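
As a rough illustration of what comparing helix-length distributions
involves, here is a minimal Python sketch.  It assumes you already have
one-letter-per-residue assignment strings from each program; the two
strings below are invented for the example and are not real DSSP or
STRIDE output, and helix_lengths is just an illustrative helper name.

```python
from collections import Counter
from itertools import groupby


def helix_lengths(ss, helix_codes="H"):
    """Return the length of each contiguous helical segment in ss."""
    return [
        len(list(run))
        for is_helix, run in groupby(ss, key=lambda c: c in helix_codes)
        if is_helix
    ]


# Invented assignment strings for the same hypothetical chain.
dssp_like = "--HHHH--HHHHHH-"
stride_like = "--HHHHH-HHHHHHH"

dssp_hist = Counter(helix_lengths(dssp_like))
stride_hist = Counter(helix_lengths(stride_like))

print("DSSP-like helix lengths:  ", sorted(dssp_hist.items()))
print("STRIDE-like helix lengths:", sorted(stride_hist.items()))
```

Pooling these counts over a whole set of chains gives one length
histogram per method, which is the kind of distribution that can then
be compared.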

DSSP defines secondary structure by a set of rules based on hydrogen 
bonding.  It is a nice algorithm since it only has one adjustable 
parameter (the H-bond energy cutoff) and the rules are relatively simple 
and hierarchical.  For example, beta-strands are not defined directly, but 
only as part of a sheet which in turn is defined by joining beta ladders, 
which are defined as sequences of beta bridges.  For high "quality" 
crystal structures, the DSSP definitions usually reproduce what you would 
expect by eye, but as with any automatic procedure, for some structures 
you may not agree with the definition.  For example, loss of a single
hydrogen bond due to a poorly modelled residue can lead to truncation of
a helix, or splitting of a sheet into two sheets.  Whether this is
important to you or not will depend on the problem you are interested in.
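
To make the helix case concrete, here is a small Python sketch of the
idea, not the DSSP program itself.  The energy formula, the constants
and the -0.5 kcal/mol cutoff are those given by Kabsch and Sander; the
function names, the example distances and the 12-residue toy chain are
assumptions made up for illustration.  Only the minimal-helix rule is
shown (two consecutive n-turns at i-1 and i mark residues i..i+n-1).

```python
# Electrostatic H-bond model from Kabsch and Sander (1983).
Q1, Q2 = 0.42, 0.20   # partial charges, in units of the electron charge
F = 332.0             # dimensional factor; distances in Angstrom, E in kcal/mol
CUTOFF = -0.5         # kcal/mol: the single adjustable parameter


def hbond_energy(r_on, r_ch, r_oh, r_cn):
    """Energy between a C=O group and an N-H group from four distances."""
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def has_hbond(r_on, r_ch, r_oh, r_cn):
    return hbond_energy(r_on, r_ch, r_oh, r_cn) < CUTOFF


def assign_helix(turns, n=4):
    """turns[i] is True if there is an H-bond from CO(i) to NH(i+n)."""
    state = ["-"] * len(turns)
    for i in range(1, len(turns)):
        if turns[i - 1] and turns[i]:
            for j in range(i, min(i + n, len(turns))):
                state[j] = "H"
    return "".join(state)


# Toy chain of 12 residues with i -> i+4 bonds starting at residues 0..7.
turns = [True] * 8 + [False] * 4
full = assign_helix(turns)

# Pretend the last bond is lost because residue 11 is poorly modelled.
damaged = list(turns)
damaged[7] = False
truncated = assign_helix(damaged)

print(full)       # -HHHHHHHHHH-
print(truncated)  # -HHHHHHHHH--
```

In this toy case losing the terminal bond shortens the helix by one
residue.  The sheet case works the same way in principle, because
bridges, ladders and sheets are all built up from the same H-bond test.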

STRIDE starts with DSSP, but then applies geometric criteria to see if
strands or helices should be extended.  For this reason, some people
prefer STRIDE definitions.  DEFINE, which is not used these days, worked
from completely different principles.

The advantage of using DSSP or STRIDE over the structure-author
definitions is that they follow a consistent model.  Of course, many
authors base their secondary structure definitions on those from DSSP.

Geoff.

-- 
Geoff Barton, Professor of Bioinformatics,   School of Life Sciences
University of Dundee, Scotland, UK.       geoff from compbio.dundee.ac.uk
Tel:+44 1382 385860/388731 (Fax:385764)     www.compbio.dundee.ac.uk

The University of Dundee is a registered Scottish charity: No.SC015096
