GD Rose paper
jhp20 at cus.cam.ac.uk
Fri Jul 21 17:36:14 EST 1995
Whatever talks are going on, the protein folding problem is
not solved by LINUS. The approach used in LINUS lacks a lot
of necessary parts for solving the problem.
Also, the true context of the work should be stated as
'another approach to predicting small protein (or long peptide) sec.
structures, with partial and imprecise topological prediction'.
I would find it useful to have a very crude, fast preview of
any prediction for a small protein before I make a really
serious prediction.
In article <3u71g7$bu5 at saba.info.ucla.edu>,
legrand at tesla.mbi.ucla.edu says...
>In article <6gwxdmdks7.fsf at hodgkin.mbi.ucla.edu>,
>arne at hodgkin.mbi.ucla.edu (Arne Elofsson) writes:
>> In article <3u3t5m$pdb at saba.info.ucla.edu>
>> legrand at tesla.mbi.ucla.edu (Scott Le Grand) writes:
>> > In article <1995Jul12.182420.12069 at alw.nih.gov>,
>> > johnk at spasm.niddk.nih.gov (John Kusze
>> > > I think that part of this is explained by his having "solved" these
>> > > structures in short pieces (because the program is
>> > > expensive).
>> > This would be a lame excuse if true... We live in an era of
>> > 300 MHz desktop workstations... In fact, even within the LINUS
>> > paper, there are
>> > numerous instances of working with larger fragments i.e. the
>> > My biggest problem with the paper is the 12 day turnaround from
>> > to acceptance. There are numerous ambiguities in the description
>> > methods (what proteins was it trained on? How do you assemble
>> > fragments? How were the fragments for the results selected?
>> > are independent LINUS runs on the same fragment? Why oh why
>> > did they neglect to show the DHFR data?) which should have been
>> > caught by the referees and fixed by the authors.
>> Yeah I can agree that 12 days seems very very short. (any reviewers
>> themself ?)
>> However it must be assumed that by time of submission the (and
>> were the only simulations (with these parameters) done at time of
>> They do not overlap overlapping fragments. And do not claim they
>You're right, but if you look closely, you'll notice that the lengths
>of some reported fragments vary rather wildly and seem to be selected to be
>the ends of various elements of secondary structure. This is
>bias, no matter how small... Examples include PCY 17-35, PCY 36-50,
>51-65, PCY 1-16, PCY 66-99, EGLIN 8-40, EGLIN 8-70, and EGLIN 40-70.
>I'm also very interested to know what is generated by multiple runs
>the same fragment... If they get precisely the same structure (0.0 A
>then they aren't using a proper random number generator...
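The multiple-runs check raised above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_run` is a hypothetical stand-in for a stochastic folding run and has nothing to do with the actual LINUS code, and `rmsd` compares coordinates as given, without the optimal superposition a real structure comparison would apply first.

```python
import math
import random


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points.

    No superposition is performed; the coordinates are compared as given.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))


def pairwise_rmsds(models):
    """RMSD for every unordered pair of models."""
    return [
        rmsd(models[i], models[j])
        for i in range(len(models))
        for j in range(i + 1, len(models))
    ]


def toy_run(seed, n_atoms=10):
    """Hypothetical stochastic 'run': a jittered chain with 3.8 A spacing."""
    rng = random.Random(seed)
    return [
        (i * 3.8 + rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        for i in range(n_atoms)
    ]


# Re-using one seed reproduces the same model exactly (all pairs 0.0 A).
same_seed = [toy_run(1995) for _ in range(3)]
# Independent seeds should give a spread of pairwise RMSDs.
diff_seed = [toy_run(seed) for seed in (1, 2, 3)]

print(pairwise_rmsds(same_seed))
print(pairwise_rmsds(diff_seed))
```

If every pair from supposedly independent runs comes back at exactly 0.0 A, the runs were not independent; a non-zero spread is what one would expect to see reported.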
>> I do not agree this paper is more unambiguous than many other
>> The problem is that it was so extremely hyped out before the
>It's right in the middle of the spectrum of ambiguity. It provides a
>overview of the method, but when it comes down to implementing it
>the methods sections, there are unclear points, such as aspects of the
>function (try to figure out >EXACTLY< the hydrophobic component,
>what is that 2nd sidechain atom on Thr?), and the locking of triplets.
>> It is quite certain that they optimised their (very simple) parameters
>> on this training set, or a part of it, but they do not claim anything
>> else, so you can not hold them to that. (For instance what did you
>> Jim did when he optimised his parameters for the 3d-1d paper ?)
>Jim's folding of 434 repressor clearly is not prediction. But Jim
>it clear which protein he used to train his parameters. This paper
>not give any insight as to how the parameters were developed and
>> > > To start another thread, are models of that resolution useful for
>> > > anything?
>> > A wonderfully controversial question. I'm in the school of thought
>> > if I look at a model and it "looks" like the native structure (I know,
>> > subjective), then it is useful no matter what the RMSD. One of the
>> > problems with the results section in this paper is that the authors
>> > usually do not show us a complete model of the predicted structure, but only
>> > seemingly arbitrarily chosen fragments which "worked"...
>> If they did that (which I really doubt) it is fraud and scientific
>> I have the feeling that actually all they did was what is shown in
>> the paper. And if you want to look at structures everything is there in
>> molscript pictures. What more can you ask for ?
>Well, they definitely have not shown >ALL< that they did. They neglect
>show us even a single fragment of DHFR... I wouldn't call it scientific
>fraud and misconduct though... They do show examples where the
>fails, even though they try to talk their way out of it, i.e. the packing of
>the last helix of cytochrome b562...
>> > > |> It is interesting that such a simple method seems to work that
>> > >
>> > > Precisely. I just saw Andrej Sali give a talk on MODELLER, and its
>> > > output is amazingly good. However, he's using a very large
>> > > database. LINUS does extremely well for having so little starting
>> > > information.
>> > If LINUS is really predicting secondary structure as well as it
>> > (I'm betting that it's not), then it does seem that the whole game's
>> > simpler than we thought. I can submit some apocryphal data here. In my PhD
>> > work, I used a Sippl potential to predict several protein structures.
>> > a wonderful job of secondary structure prediction on melittin,
>> > polypeptide, and crambin (as good as LINUS I would daresay, but this was all
>> > helix and coil prediction and easy targets), but it did a miserable
>> > things together. This work is summarized in Molecular
>> > A lot of the figures in the LINUS paper look familiar to me.
>> But you could not predict any sheets. (:
>True :-). The most impressive part of this paper is the prediction of
>sheets... The least impressive is the calculation of RMSDs between
>and X-ray helices...
>> And even if they do not do such a great work on tertiary structure
>> It is much better than your PhD work.
>Certainly true of IFB, but the eglin structure looks to be about as much
>a mess as my crambin (12.1 A RMSD versus 9.5 A)... No other
>complete tertiary structures are presented except for the GroES
>which remains just that...
>> Skolnick also wrote in his 1994 papers (Kolinski & Skolnick, Proteins
>> that their potential performed very well in predicting sec. str. They
>> claimed to have a paper in preparation, but at least I have not seen it.
>You're right. I suspect that these potentials may be fairly good at
>such prediction where the segment has a locally determined
>conformation, but is it performing better than PhD or GOR?
>> Their "sec.str. prediction" is probably as good for approximately as
>> targets as Rose's. However their targets were less diverse and that
>> not at all the focus of the papers. (Skolnick also had to use slightly
>> different potential functions for one protein (ubiquitin?))
>> > > One last question: Are there any other algorithms that predict
>> > > secondary structure as well as LINUS?
>> > A tough question. That requires testing LINUS on a set of
>> > proteins not involved in its development and comparing it to
>> > the performance on those same proteins by PhD and GOR
>> > GOR does not use them in its database either). Ignore arguments
>> > that LINUS is not based on amino acid identity. If training set
>> > data is involved in any way in the development of a method, then
>> > it is not fair to rate the predictive power of a method by its
>> > performance on training set data. It is only fair to conclude
>> > that the method has learned how to reproduce the training set.
>> > The only fair test is on external data. The upcoming predictive
>> > targets Moult is putting together should be a wonderful example
>> > of this.
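The training-set versus external-data distinction argued above can be made concrete with a per-residue score. The sketch below assumes the common three-state accuracy (Q3: fraction of residues whose helix/strand/coil state is predicted correctly) as the measure; the thread itself does not name a score. The strings are invented toy data, not results from LINUS, PhD, or GOR.

```python
def q3(predicted, observed):
    """Fraction of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def mean_q3(pairs):
    """Average Q3 over a list of (predicted, observed) pairs."""
    return sum(q3(p, o) for p, o in pairs) / len(pairs)


# Toy data only (hypothetical): a method tuned on its training set can
# reproduce it perfectly and still do poorly on a protein it never saw.
training_set = [("HHHHCCEEEE", "HHHHCCEEEE")]
held_out_set = [("HHHHCCCCCC", "HHCCCCEEEE")]

print("training Q3:", mean_q3(training_set))
print("held-out Q3:", mean_q3(held_out_set))
```

Only the held-out number says anything about predictive power; the training number shows that the method has learned to reproduce its training set.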
>> We do not actually need this, as long as people report what they do.
>I disagree. It is very difficult, perhaps impossible, to truly eliminate
>all forms of bias in structure prediction when you're working with
>already known structures. Nowhere was this more apparent
>the all-around failures at homology modelling presented at last year's
>workshop when the predictors had no clear idea what the target
>was... The results were nothing like the claims of their respective
>> If you optimize your parameters so that they work very well on
>> a small set of proteins (as Rose probably did), it is not bad
>> science to report that. Even if it would not work on anything
>> outside the test set it might be very useful and interesting.
>It is not bad science to report it. It is bad science to report it
>as prediction. The neural networks people went through this phase
>many years ago demonstrating that one can converge a sufficiently
>elaborate neural network to almost any training set... We seem not
>to have gotten past it yet...
>> However you are right that it is much more impressive to predict
>> a completely independent test set.
>That's what it will take to knock my socks off...
>Overall, they score a hit on IFB, the prediction of helical secondary
>structure in cytochrome b562 and myoglobin, and some of the
>of plastocyanin and eglin. They fail at the task of tertiary structure
>prediction of Eglin, cytochrome c, and DHFR. Nothing is shown of
>complete structures for any other protein so nothing can be said