3D-protein structure according to available DNA-sequence

Andrej Sali sali at tamika.rockefeller.edu
Mon Jan 30 21:15:56 EST 1995

>     (*** preceding parts of article deleted ***)
> >   ... When sequence identity is about 40% or more, you can get a model by  
> >   comparative modeling that is essentially equivalent to a medium resolution  
> >   X-ray structure (2.5A; R-factor 25%).
> >
>     (*** subsequent parts of article deleted ***)
> I'm sorry to have to insert my 2 cents here, but this is COMPLETELY incorrect.
> Having worked at a variety of resolutions in crystal structures, and done some
> homology modeling myself, I have yet to see a model that corresponded to
> anything better than a partially-refined 4.5 Angstrom resolution structure
> (read this as approx. 3x the error level that Dr. Sali suggests).
> I think that the term "essentially equivalent" is being stretched way too far
> here - the overall fold will be correct, some conserved loops will
> perhaps even be correct, but many of the side-chains and a significant portion
> of the backbone will exhibit very substantial deviations from their "true" 
> positions.

I think you probably overestimate the 'biologically meaningful' accuracy (as
opposed to precision; see below) of experimental structures. Molecular
biologists are interested in accuracy, not precision (I am not using these
terms quite in the NMR sense). Also, please note that I quoted an R-factor of
25%, which describes a medium-refined structure, not a well-refined one. And
there are good and bad crystallographers, good and bad modelers, and bad and
less bad modeling programs, so these comparisons can be quite subjective
unless all of these factors are specified (which I am not going to do).

If you compare structures of the same protein solved at medium resolution in
different labs, or independently refined subunits in oligomeric proteins, you
can find RMS mainchain differences close to 1A. Also, about 70% of sidechains
tend to be in different rotameric states. Even for well-refined 2.0A
structures, 10-15% of the sidechains are in different rotameric states, some
of them in the core; obviously, these percentages would be much higher if you
looked only at the exposed regions, where most of the differences are found,
both in experiment and in modeling.

A case in point is our recent model of one subunit in a hexamer of identical
subunits solved at 2.7A (built on templates with about 60% sequence identity).
The model was closer to some of the X-ray subunits than some of the X-ray
subunits were to each other, in terms of both mainchain and sidechain
conformations. Moreover, the usual RMS differences between NMR structures
are much more than 1A (usually 2A), and differences between NMR and X-ray
structures are usually also more than 1A, no matter what the resolution of the
X-ray analysis or the number of restraints used in the NMR refinement.

Since we are usually not interested specifically in a structure in a P212121
unit cell (precision vs. accuracy), medium-resolution, medium-refined
experimental structures are __overall__ as good as homology models based on
40% sequence identity or more. This is not because modelers are smart, but
simply because homologs at that level are likely to have a mainchain RMS
difference (over more than 90% of residues) of about 1A (and because we can
model sidechains with about 70-80% accuracy).

Of course, there are some loops and gaps that are usually defined better by
experiment than by modeling, especially when they are long. On the other hand,
many of these segments are also badly defined in >2.5A structures, and they
may be genuinely flexible, so it is misleading to represent them by one
conformation only. As in experiment, in modeling we also know which parts of
the model are likely to be less accurate.

I am not saying that a homology model based on >40% identity is equivalent to
a medium-resolution, medium-refined X-ray structure for absolutely all
questions; I am saying that this is true for many applications. So, I stand by
my original statement.
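
For readers who want to put numbers on the two measures used above (RMS
mainchain difference and the fraction of sidechains in different rotameric
states), here is a minimal Python sketch. It is only an illustration, not the
procedure behind any of the figures quoted in this post. It assumes the two
structures are already superposed and that atoms and residues are listed in
the same order. The function names, the three chi1 wells at -60, 60 and 180
degrees, and the use of chi1 alone to define a rotameric state are all
assumptions made for the example.

```python
import math

# Assumed chi1 well centres in degrees; a hypothetical three-state definition.
ROTAMER_WELLS = (-60.0, 60.0, 180.0)


def rms_difference(coords_a, coords_b):
    """RMS distance between matched atoms of two already-superposed structures.

    Each argument is a list of (x, y, z) tuples in Angstroms, in the same order.
    No superposition is done here; that step is assumed to have happened already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def rotamer_state(chi1):
    """Return the well centre closest to chi1, with distance measured on the circle."""
    def circular_distance(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return min(ROTAMER_WELLS, key=lambda well: circular_distance(chi1, well))


def fraction_different_rotamers(chi1_a, chi1_b):
    """Fraction of matched residues whose chi1 angles fall in different wells."""
    if len(chi1_a) != len(chi1_b) or not chi1_a:
        raise ValueError("need two non-empty, equally long chi1 lists")
    differing = sum(
        rotamer_state(a) != rotamer_state(b) for a, b in zip(chi1_a, chi1_b)
    )
    return differing / len(chi1_a)
```

With real data the coordinates would be the mainchain atoms (N, CA, C, O) of
the two structures after superposition, and the chi1 lists would come from the
same residues in each structure.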

If anything, this debate illustrates the need for a rigorous evaluation of
modeling, and probably also for a relevant comparison of X-ray structures
determined independently at different resolutions/refinement stages.
Hopefully, the proceedings of the prediction meeting in Asilomar last December
(to be published this summer in Proteins) will go some way toward achieving
this.

Andrej Sali (an optimistic protein modeler)

> > 
> Phil Jeffrey (a protein crystallographer)
> --
> -------------------------------------------------------------------------------
> | Phil Jeffrey                                  |                             |
> | X-ray/Computer Manager, Crystallography Lab   | If you lie to the compiler, |
> | Memorial Sloan-Kettering Cancer Center, NYC   | it will get its revenge     |
> | phil at xray2.mskcc.org, p-jeffrey at ski.mskcc.org |     - Henry Spencer         |
> | Ph: (212) 639 2189   Fax: (212) 717 3066      |                             |
> -------------------------------------------------------------------------------

More information about the Bio-soft mailing list