Is the GD Rose paper out?

Scott Le Grand legrand at tesla.mbi.ucla.edu
Fri Jul 14 19:18:15 EST 1995


In article <6gwxdmdks7.fsf at hodgkin.mbi.ucla.edu>, arne at hodgkin.mbi.ucla.edu (Arne Elofsson) writes:
> In article <3u3t5m$pdb at saba.info.ucla.edu> legrand at tesla.mbi.ucla.edu (Scott Le Grand) writes:
> 
> > 
> > In article <1995Jul12.182420.12069 at alw.nih.gov>, johnk at spasm.niddk.nih.gov (John Kuszewski) writes: 
> > > I think that part of this is explained by his having "solved" these 
> > > structures in short pieces (because the program is computationally
> > > expensive).
> > 
> > This would be a lame excuse if true...  We live in an era of inexpensive
> > 300 MHz desktop workstations...  In fact, even within the LINUS paper, there are 
> > numerous instances of working with larger fragments, e.g. the GroES prediction.
> > My biggest problem with the paper is the 12 day turnaround from submission
> > to acceptance.  There are numerous ambiguities in the description of the
> > methods (what proteins was it trained on?  How do you assemble overlapping
> > fragments?  How were the fragments for the results selected?  How consistent
> > are independent LINUS runs on the same fragment?  Why oh why did they neglect
> > to show the DHFR data?) which should have been caught by the referees and fixed 
> > by the authors.
> > 
> 
> Yeah, I can agree that 12 days seems very, very short. (Any reviewers wanna identify
> themselves?) 
> 
> However it must be assumed that by time of submission the (and the DHFR)
> were the only simulations (with these parameters) done at time of submission.
> 
> They do not overlap overlapping fragments. And do not claim they do.

You're right, but if you look closely, you'll notice that the lengths
of some reported fragments vary rather wildly, and the fragment boundaries seem
to have been chosen to coincide with the ends of various elements of secondary
structure.  This is observer-introduced bias, no matter how small...  Examples
include PCY 17-35, PCY 36-50, PCY 51-65, PCY 1-16, PCY 66-99, EGLIN 8-40,
EGLIN 8-70, and EGLIN 40-70.

I'm also very interested to know what is generated by multiple runs on
the same fragment...  If they get precisely the same structure (0.0 A RMSD), 
then they aren't using a proper random number generator...
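
To make the run-to-run check concrete, here is a minimal sketch of how one
might compare two independent runs on the same fragment.  Everything in it is
an assumption for illustration: toy_run is a hypothetical stand-in for a
stochastic folding run (a random walk with 3.8 A steps, not LINUS), and the
comparison uses distance-matrix RMSD (dRMSD) rather than superposition RMSD,
because dRMSD needs no fitting step.  Like superposition RMSD, it is exactly
0.0 only when the two structures have identical internal geometry.

```python
import math
import random
from itertools import combinations


def drmsd(a, b):
    """Distance-matrix RMSD between two conformations of the same chain.

    a and b are equal-length lists of (x, y, z) tuples in Angstroms.
    The value depends only on internal distances, so no superposition is needed.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length coordinate lists with >= 2 atoms")
    total = 0.0
    pairs = 0
    for i, j in combinations(range(len(a)), 2):
        diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
        total += diff * diff
        pairs += 1
    return math.sqrt(total / pairs)


def toy_run(n_atoms, seed):
    """Hypothetical stand-in for one stochastic run: a 3.8 A random walk."""
    rng = random.Random(seed)
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_atoms - 1):
        # Pick a random direction by normalising a Gaussian vector.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        px, py, pz = coords[-1]
        coords.append((px + 3.8 * x / norm, py + 3.8 * y / norm, pz + 3.8 * z / norm))
    return coords


# A 19-residue fragment (the length of PCY 17-35), run with various seeds.
same_seed = drmsd(toy_run(19, seed=1), toy_run(19, seed=1))
diff_seed = drmsd(toy_run(19, seed=1), toy_run(19, seed=2))
print(f"same seed: {same_seed:.3f} A, different seeds: {diff_seed:.3f} A")
```

With a properly seeded generator, only the same-seed pair should come out at
exactly 0.0; the spread over many different seeds is the consistency figure
one would like to see reported.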
 
> I do not agree this paper is more ambiguous than many other papers.
> The problem is that it was so extremely hyped before the publication.

It's right in the middle of the spectrum of ambiguity.  It provides a good
overview of the method, but when it comes down to implementing it based on
the methods section, there are unclear points, such as aspects of the potential
function (try to figure out >EXACTLY< the hydrophobic component, and just 
what is that 2nd sidechain atom on Thr?) and the locking of triplets.

> It is quite certain that they optimised their (very simple) parameters
> on this training set, or a part of it, but they do not claim anything
> else, so you can not hold them to that. (For instance what did you think 
> Jim did when he optimised his parameters for the 3d-1d paper ?)

Jim's folding of 434 repressor clearly is not prediction.  But Jim makes
it clear which protein he used to train his parameters.  This paper does
not give any insight as to how the parameters were developed, and that is
an ambiguity. 
 
> > > To start another thread, are models of that resolution useful for
> > > anything?
> > 
> > A wonderfully controversial question.  I'm in the school of thought that
> > if I look at a model and it "looks" like the native structure (I know, horribly
> > subjective), then it is useful no matter what the RMSD.  One of the big
> > problems with the results section in this paper is that the authors
> > usually do not show us a complete model of the predicted structure, but only
> > seemingly arbitrarily chosen fragments which "worked"...
> >  
> 
> If they did that (which I really doubt) it is fraud and scientific misconduct.
> I have the feeling that actually all they did was what is shown in 
> the paper. And if you want to look at structures, everything is there in 
> molscript pictures. What more can you ask for?

Well, they definitely have not shown >ALL< that they did.  They neglect to
show us even a single fragment of DHFR...  I wouldn't call it scientific
fraud and misconduct though...  They do show examples where the algorithm
fails, even though they try to talk their way out of it, i.e. the packing of
the last helix of cytochrome b562...
 
> > > |> It is interesting that such a simple method seems to work that well.
> > > 
> > > Precisely.  I just saw Andrej Sali give a talk on MODELLER, and its
> > > output is amazingly good.  However, he's using a very large empirical
> > > database.  LINUS does extremely well for having so little starting
> > > information.
> > 
> > If LINUS is really predicting secondary structure as well as it seems
> > (I'm betting that it's not), then it does seem that the whole game's a lot 
> > simpler than we thought.  I can submit some apocryphal data here.  In my PhD 
> > work, I used a Sippl potential to predict several protein structures.  It did 
> > a wonderful job of secondary structure prediction on melittin, pancreatic 
> > polypeptide, and crambin (as good as LINUS I would daresay, but this was all 
> > helix and coil prediction and easy targets), but it did a miserable job packing 
> > things together.  This work is summarized in Molecular Simulations 13:299-320.
> > A lot of the figures in the LINUS paper look familiar to me.
> > 
> But you could not predict any sheets. (:

True :-).  The most impressive part of this paper is the prediction of
sheets...  The least impressive is the calculation of RMSDs between predicted
and X-ray helices...

> And even if they do not do such great work on tertiary structure packing,
> it is much better than your PhD work.

Certainly true of IFB, but the eglin structure looks to be about as much
a mess as my crambin (12.1 A RMSD versus 9.5 A)...  No other reasonably
complete tertiary structures are presented except for the GroES prediction
which remains just that...

> Skolnick also wrote in his 1994 papers (Kolinski & Skolnick, Proteins 1994)
> that their potential performed very well in predicting sec. str. They
> claimed to have a paper in preparation, but at least I have not seen it.

You're right.  I suspect that these potentials may be fairly good at
such prediction where the segment has a locally determined preferred
conformation, but is it performing better than PhD or GOR?
 
> Their "sec.str. prediction" is probably as good for approximately as many
> targets as Rose's. However their targets were less diverse and that was
> not at all the focus of the papers. (Skolnick also had to use slightly 
> different potential functions for one protein (ubiquitin ?))

Yep... 

> > > One last question:  Are there any other algorithms that predict
> > > secondary structure as well as LINUS?
> > 
> > A tough question.  That requires testing LINUS on a set of
> > proteins not involved in its development and comparing it to
> > the performance on those same proteins by PhD and GOR (assuming
> > GOR does not use them in its database either).  Ignore arguments
> > that LINUS is not based on amino acid identity.  If training set
> > data is involved in any way in the development of a method, then
> > it is not fair to rate the predictive power of a method by its
> > performance on training set data.  It is only fair to conclude
> > that the method has learned how to reproduce the training set.
> > The only fair test is on external data.  The upcoming predictive
> > targets Moult is putting together should be a wonderful example
> > of this.  
> > 
> 
> We do not actually need this, as long as people report what they do.

I disagree.  It is very difficult, perhaps impossible, to truly eliminate
all forms of bias in structure prediction when you're working with targets
with already known structures.  Nowhere was this more apparent than in
the all-around failures at homology modelling presented at last year's
workshop when the predictors had no clear idea what the target structure
was...  The results were nothing like the claims of their respective
papers...

> If you optimize your parameters so that they work very well on 
> a small set of proteins (as Rose probably did), it is not bad
> science to report that. Even if it would not work on anything
> outside the test set it might be very useful and interesting.

It is not bad science to report it.  It is bad science to report it
as prediction.  The neural networks people went through this phase
many years ago demonstrating that one can converge a sufficiently
elaborate neural network to almost any training set...  We seem not 
to have gotten past it yet...
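
As a small illustration of the distinction, here is a sketch that scores
per-residue secondary structure accuracy separately for training-set proteins
and for external ones.  The names q3 and split_report are hypothetical helpers
written for this post, the three-state alphabet (H, E, C) is an assumption,
and the strings are made-up toy data, not results for any real protein or
method.

```python
def q3(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and equal in length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def split_report(results, training_ids):
    """Average accuracy over training-set proteins and over external proteins.

    results maps a protein name to a (predicted, observed) pair of strings.
    Any name listed in training_ids counts as training data.
    """
    scores = {"training": [], "external": []}
    for name, (predicted, observed) in results.items():
        group = "training" if name in training_ids else "external"
        scores[group].append(q3(predicted, observed))
    # A group with no proteins has no average, so it is reported as None.
    return {g: (sum(v) / len(v) if v else None) for g, v in scores.items()}


# Made-up toy strings for illustration only (not real proteins or real results).
toy = {
    "toy_train_1": ("HHHHCCEEEE", "HHHHCCEEEE"),
    "toy_external_1": ("HHHHCCEEEE", "HHHCCCEEEC"),
}
report = split_report(toy, training_ids={"toy_train_1"})
print(report)
```

Only the "external" figure says anything about predictive power; the
"training" figure says how well the method reproduces what it was tuned on.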
 
> However you are right that it is much more impressive to predict
> a completely independent test set.

That's what it will take to knock my socks off...

Overall, they score a hit on IFB, the prediction of helical secondary
structure in cytochrome b562 and myoglobin, and some of the secondary structure
of plastocyanin and eglin.  They fail at the task of tertiary structure 
prediction of Eglin, cytochrome c, and DHFR.  Nothing is shown of
complete structures for any other protein, so nothing can be said about
them...

Scott




