GD Rose paper

Jong jhp20 at cus.cam.ac.uk
Fri Jul 21 17:36:14 EST 1995


Whatever is being said about it, LINUS does not solve the protein
folding problem. The approach lacks many of the parts needed to
solve it. Also, the true context of the work should be stated as
'another approach to predicting the secondary structure of small
proteins (or long peptides), with partial and imprecise topological
prediction'.

I would find it useful to have a very crude and fast preview for
any small protein before I make a really serious prediction.

Jong


In article <3u71g7$bu5 at saba.info.ucla.edu>,
legrand at tesla.mbi.ucla.edu says...
>
>In article <6gwxdmdks7.fsf at hodgkin.mbi.ucla.edu>,
>arne at hodgkin.mbi.ucla.edu (Arne Elofsson) writes:
>> In article <3u3t5m$pdb at saba.info.ucla.edu>
>> legrand at tesla.mbi.ucla.edu (Scott Le Grand) writes:
>> 
>> > 
>> > In article <1995Jul12.182420.12069 at alw.nih.gov>,
>> > johnk at spasm.niddk.nih.gov (John Kuszewski) writes:
>> > > I think that part of this is explained by his having "solved" these
>> > > structures in short pieces (because the program is computationally
>> > > expensive).
>> > 
>> > This would be a lame excuse if true...  We live in an era of
>> > inexpensive 300 MHz desktop workstations...  In fact, even within the
>> > LINUS paper, there are numerous instances of working with larger
>> > fragments, e.g. the GroES prediction.
>> > My biggest problem with the paper is the 12 day turnaround from
>> > submission to acceptance.  There are numerous ambiguities in the
>> > description of the methods (what proteins was it trained on?  How do
>> > you assemble overlapping fragments?  How were the fragments for the
>> > results selected?  How consistent are independent LINUS runs on the
>> > same fragment?  Why oh why did they neglect to show the DHFR data?)
>> > which should have been caught by the referees and fixed by the
>> > authors.
>> > 
>> 
>> Yeah, I agree that 12 days seems very, very short. (Any reviewers
>> wanna identify themselves?)
>> 
>> However, it must be assumed that the reported simulations (and the
>> DHFR one) were the only simulations (with these parameters) done at
>> the time of submission.
>> 
>> They do not combine overlapping fragments, and do not claim that
>> they do.
>
>You're right, but if you look closely, you'll notice that the lengths
>of some reported fragments vary rather wildly, and the fragment ends
>seem to have been selected to be the ends of various elements of
>secondary structure.  This is observer-introduced bias, no matter how
>small...  Examples include PCY 17-35, PCY 36-50, PCY 51-65, PCY 1-16,
>PCY 66-99, EGLIN 8-40, EGLIN 8-70, and EGLIN 40-70.
>
>I'm also very interested to know what is generated by multiple runs on
>the same fragment...  If they get precisely the same structure (0.0 A
>RMSD), then they aren't using a proper random number generator...
> 
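
To make the repeated-runs point concrete, here is a minimal sketch. It is
not LINUS and not taken from the paper: random_chain() is a hypothetical
stand-in for one stochastic run (a unit-step random walk), and drmsd() is a
distance-matrix RMSD, used here only because it needs no superposition. It
is not the coordinate RMSD quoted in the thread. The idea is simply that two
runs on the same fragment agree exactly only when they consume the same
random stream.

```python
import itertools
import math
import random


def random_chain(n_atoms, seed):
    """Hypothetical stand-in for one stochastic run: a 3-D unit-step random walk."""
    rng = random.Random(seed)  # private generator, so runs do not share state
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_atoms - 1):
        # Draw a direction uniformly on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + r * math.cos(phi), y0 + r * math.sin(phi), z0 + z))
    return coords


def drmsd(a, b):
    """Distance-matrix RMSD: RMS difference over all intra-chain pair distances."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("chains must have the same length, at least 2 atoms")
    pairs = list(itertools.combinations(range(len(a)), 2))
    total = 0.0
    for i, j in pairs:
        total += (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
    return math.sqrt(total / len(pairs))


# Same seed: identical structures. Different seeds: the runs should differ.
same = drmsd(random_chain(20, seed=1), random_chain(20, seed=1))
different = drmsd(random_chain(20, seed=1), random_chain(20, seed=2))
```

With a real folding program the interesting number is the spread of such
pairwise values over many independently seeded runs on one fragment.
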
>> I do not agree that this paper is more ambiguous than many other
>> papers.  The problem is that it was so extremely hyped before
>> publication.
>
>It's right in the middle of the spectrum of ambiguity.  It provides a
>good overview of the method, but when it comes down to implementing it
>based on the methods section, there are unclear aspects, such as parts
>of the potential function (try to figure out >EXACTLY< the hydrophobic
>component, and just what is that 2nd sidechain atom on Thr?) and the
>locking of triplets.
>
>> It is quite certain that they optimised their (very simple) parameters
>> on this training set, or a part of it, but they do not claim anything
>> else, so you cannot hold that against them. (For instance, what do you
>> think Jim did when he optimised his parameters for the 3d-1d paper?)
>
>Jim's folding of 434 repressor clearly is not prediction.  But Jim makes
>it clear which protein he used to train his parameters.  This paper does
>not give any insight as to how the parameters were developed, and that is
>the ambiguity.
> 
>> > > To start another thread, are models of that resolution useful for
>> > > anything?
>> > 
>> > A wonderfully controversial question.  I'm in the school of thought
>> > that if I look at a model and it "looks" like the native structure (I
>> > know, horribly subjective), then it is useful no matter what the
>> > RMSD.  One of the big problems with the results section in this paper
>> > is that the authors usually do not show us a complete model of the
>> > predicted structure, but only seemingly arbitrarily chosen fragments
>> > which "worked"...
>> >  
>> 
>> If they did that (which I really doubt), it is fraud and scientific
>> misconduct.  I have the feeling that actually all they did was what
>> is shown in the paper. And if you want to look at structures,
>> everything is there in MolScript pictures. What more can you ask for?
>
>Well, they definitely have not shown >ALL< that they did.  They neglect
>to show us even a single fragment of DHFR...  I wouldn't call it
>scientific fraud and misconduct though...  They do show examples where
>the algorithm fails, even though they try to talk their way out of it,
>e.g. the packing of the last helix of cytochrome b562...
> 
>> > > |> It is interesting that such a simple method seems to work that
>> > > |> well.
>> > > 
>> > > Precisely.  I just saw Andrej Sali give a talk on MODELLER, and its
>> > > output is amazingly good.  However, he's using a very large
>> > > empirical database.  LINUS does extremely well for having so little
>> > > starting information.
>> > 
>> > If LINUS is really predicting secondary structure as well as it seems
>> > (I'm betting that it's not), then it does seem that the whole game's a
>> > lot simpler than we thought.  I can submit some apocryphal data here.
>> > In my PhD work, I used a Sippl potential to predict several protein
>> > structures.  It did a wonderful job of secondary structure prediction
>> > on melittin, pancreatic polypeptide, and crambin (as good as LINUS, I
>> > daresay, but this was all helix and coil prediction and easy targets),
>> > but it did a miserable job packing things together.  This work is
>> > summarized in Molecular Simulations 13:299-320.
>> > A lot of the figures in the LINUS paper look familiar to me.
>> > 
>> But you could not predict any sheets. (:
>
>True :-).  The most impressive part of this paper is the prediction of
>sheets...  The least impressive is the calculation of RMSDs between
>predicted and X-ray helices...
>
>> And even if they do not do such a great job on tertiary structure
>> packing, it is much better than your PhD work.
>
>Certainly true of IFB, but the eglin structure looks to be about as much
>a mess as my crambin (12.1 A RMSD versus 9.5 A)...  No other reasonably
>complete tertiary structures are presented except for the GroES
>prediction, which remains just that...
>
>> Skolnick also wrote in his 1994 papers (Kolinski & Skolnick, Proteins
>> 1994) that their potential performed very well in predicting sec. str.
>> They claimed to have a paper in preparation, but at least I have not
>> seen it.
>
>You're right.  I suspect that these potentials may be fairly good at
>such prediction where the segment has a locally determined preferred
>conformation, but is it performing better than PhD or GOR?
> 
>> Their "sec. str. prediction" is probably as good, for approximately as
>> many targets, as Rose's. However, their targets were less diverse and
>> that was not at all the focus of the papers. (Skolnick also had to use
>> slightly different potential functions for one protein (ubiquitin?))
>
>Yep... 
>
>> > > One last question:  Are there any other algorithms that predict
>> > > secondary structure as well as LINUS?
>> > 
>> > A tough question.  That requires testing LINUS on a set of
>> > proteins not involved in its development and comparing it to
>> > the performance on those same proteins by PhD and GOR (assuming
>> > GOR does not use them in its database either).  Ignore arguments
>> > that LINUS is not based on amino acid identity.  If training set
>> > data is involved in any way in the development of a method, then
>> > it is not fair to rate the predictive power of a method by its
>> > performance on training set data.  It is only fair to conclude
>> > that the method has learned how to reproduce the training set.
>> > The only fair test is on external data.  The upcoming predictive
>> > targets Moult is putting together should be a wonderful example
>> > of this.  
>> > 
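
The external-data test described above can be sketched as follows. This is
an illustration under stated assumptions, not anyone's published protocol:
q3() is the usual three-state per-residue accuracy (fraction of residues
whose H/E/C state is predicted correctly), held_out_q3() and the protein
identifiers are hypothetical names, and the state strings are made-up toy
data. The point is only that proteins used in developing a method are
excluded before any score is reported.

```python
def q3(predicted, observed):
    """Three-state accuracy: fraction of positions where the states agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def held_out_q3(predictions, observed, training_ids):
    """Mean Q3 over proteins that played no part in developing the method."""
    scores = {
        pid: q3(pred, observed[pid])
        for pid, pred in predictions.items()
        if pid not in training_ids  # training-set proteins are never scored
    }
    if not scores:
        raise ValueError("no held-out proteins to score")
    return sum(scores.values()) / len(scores), scores


# Toy data, invented for illustration only (H = helix, E = strand, C = coil).
predictions = {"protA": "HHHHCCEE", "protB": "CCEEEECC", "protC": "HHHHHHCC"}
observed = {"protA": "HHHHCCEE", "protB": "CCHHHHCC", "protC": "HHHHCCCC"}
training_ids = {"protA"}  # assumed to have been used for parameter fitting

mean_q3, per_protein = held_out_q3(predictions, observed, training_ids)
```

The same held-out list would be given to each method being compared, so
that no method is scored on a protein it has already seen.
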
>> 
>> We do not actually need this, as long as people report what they do.
>
>I disagree.  It is very difficult, perhaps impossible, to truly eliminate
>all forms of bias in structure prediction when you're working with
>targets with already known structures.  Nowhere was this more apparent
>than in the all-around failures at homology modelling presented at last
>year's workshop when the predictors had no clear idea what the target
>structure was...  The results were nothing like the claims of their
>respective papers...
>
>> If you optimize your parameters so that they work very well on
>> a small set of proteins (as Rose probably did), it is not bad
>> science to report that. Even if it would not work on anything
>> outside the test set, it might be very useful and interesting.
>
>It is not bad science to report it.  It is bad science to report it
>as prediction.  The neural networks people went through this phase
>many years ago demonstrating that one can converge a sufficiently
>elaborate neural network to almost any training set...  We seem not 
>to have gotten past it yet...
> 
>> However you are right that it is much more impressive to predict
>> a completely independent test set.
>
>That's what it will take to knock my socks off...
>
>Overall, they score a hit on IFB, the prediction of helical secondary
>structure in cytochrome b562 and myoglobin, and some of the secondary
>structure of plastocyanin and eglin.  They fail at the task of tertiary
>structure prediction of eglin, cytochrome c, and DHFR.  Nothing is shown
>of complete structures for any other protein, so nothing can be said
>about them...
>
>Scott
>
>



