All data?
Ethan A Merritt
merritt at u.washington.edu
Tue May 20 19:56:07 EST 1997
In article <Pine.SGI.3.95.970520141235.16958B-100000 at medxtal>,
james f head <jfh at MED-XTAL.BU.EDU> wrote:
>
>Thanks for the replies to my post. I had not intended to stand as an
>advocate for cutoffs although I see my mailing may be read that way. I
>stand informed on the value of the most poorly measured data i.e. that it
>is usually weak and poorly measured low values are better than no values.
>
>Of course this turns my question into why many papers, past and present
>use 2sig(F) as a cutoff in refinement? (Phil - I don't know the origins,
>but almost any current structural journal includes examples). Is the
>cutoff used just to give improved statistics?
In case you missed it, George and Randy both qualified their answers
by noting that a "proper" weighting scheme is needed to handle refinement
that includes data with great variation in I/sigma. This is a real issue,
as the commonly-used refinement packages do not work at all well if you
have a typical area detector data set and tell the program to use weights
of 1/sigma**2. So instead people generally use unit weights, but then
the weak refls (say, I/sigma < 1) are problematic. It can be a matter
of frustration as much as anything: "I know with proper weighting I could
use all the data, but where do I get the proper weights? Forget it -
I'll just ignore those problematic weak refls".
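To make the trade-off concrete, here is a minimal sketch of a least-squares target sum w*(Fo - Fc)**2 under three choices: unit weights, weights of 1/sigma(F)**2, and unit weights after an F/sigma(F) cutoff like the 2sig(F) one in the question. The `Reflection` record, the `residual` helper and the numbers are hypothetical illustrations, not taken from any refinement package, and Fc is assumed to be already on the scale of Fo.

```python
from dataclasses import dataclass

@dataclass
class Reflection:
    f_obs: float   # observed amplitude Fo
    sig_f: float   # its standard uncertainty sigma(F)
    f_calc: float  # model amplitude Fc, assumed already scaled to Fo

def residual(refls, weighting="unit", sigma_cutoff=None):
    """Return (sum of w*(Fo - Fc)**2, number of reflections used).

    weighting="unit":  w = 1 for every reflection
    weighting="sigma": w = 1/sigma(F)**2, so noisy reflections count less
    sigma_cutoff=s:    first discard reflections with Fo < s*sigma(F)
    """
    if weighting not in ("unit", "sigma"):
        raise ValueError(f"unknown weighting: {weighting}")
    total, n_used = 0.0, 0
    for r in refls:
        if sigma_cutoff is not None and r.f_obs < sigma_cutoff * r.sig_f:
            continue  # the reflection is simply thrown away
        w = 1.0 if weighting == "unit" else 1.0 / r.sig_f ** 2
        total += w * (r.f_obs - r.f_calc) ** 2
        n_used += 1
    return total, n_used

# Hypothetical data: two well-measured reflections and one weak one (Fo/sigma < 1).
refls = [
    Reflection(f_obs=100.0, sig_f=2.0, f_calc=96.0),
    Reflection(f_obs=50.0, sig_f=2.0, f_calc=52.0),
    Reflection(f_obs=3.0, sig_f=4.0, f_calc=9.0),
]
print(residual(refls, "unit"))                    # (56.0, 3): weak refl contributes 36 of 56
print(residual(refls, "sigma"))                   # (7.25, 3): weak refl kept but down-weighted
print(residual(refls, "unit", sigma_cutoff=2.0))  # (20.0, 2): weak refl discarded
```

With unit weights the single weak reflection dominates the target, which is the frustration described above; 1/sigma**2 weights keep it in but let it count for less, whereas the cutoff removes it altogether.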
Of course there is an element of R-factor beautification as well.
The omission of weak data doesn't bother me as much as the recent tendency
to discard all the low resolution data from refinement. Again a matter of
R-factor beautification I think, although ignorance of bulk solvent models
may also enter into it. Still, there are some recent glaringly bad examples
of refinements that apparently chopped all data below 6A without a word
of explanation in the paper.
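For readers less used to the jargon, "chopping all data below 6A" means discarding every reflection whose d-spacing is larger than 6 Å. A minimal sketch, assuming an orthorhombic (or tetragonal/cubic) cell so that 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2; other lattices need the full metric tensor, and the helper names and the 60 Å cubic cell are hypothetical.

```python
import math

def d_spacing_orthorhombic(h, k, l, a, b, c):
    """d-spacing in Angstrom for an orthorhombic (or higher-symmetry) cell.

    Uses 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2; hkl must not be (0, 0, 0).
    """
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d2)

def split_by_low_res_cutoff(hkls, cell, d_cutoff):
    """Return (kept, discarded); reflections with d > d_cutoff are discarded."""
    a, b, c = cell
    kept, discarded = [], []
    for hkl in hkls:
        d = d_spacing_orthorhombic(*hkl, a, b, c)
        (discarded if d > d_cutoff else kept).append(hkl)
    return kept, discarded

# Hypothetical 60 A cubic cell with a 6 A low-resolution cutoff.
hkls = [(1, 0, 0), (5, 0, 0), (12, 0, 0), (20, 0, 0), (3, 4, 0)]
kept, discarded = split_by_low_res_cutoff(hkls, (60.0, 60.0, 60.0), 6.0)
print("kept:", kept)
print("discarded:", discarded)
```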
>If Rfree and Rcryst are
>indeed lower with a cutoff (as is my own experience and also indicated in
>papers where values are given with and without cutoff), and yet inclusion
>of all data brings us closer to the "true" structure, it suggests that
>these statistics, as generally used, are not necessarily a good guide to
>the "truth".
Careful! You are close to falling into the error of looking at Rfree as
a measure on some absolute scale. Try to rephrase all thoughts of Rfree
in your mind as a question of the first derivative of Rfree. That is,
if you expand your model, or do further refinement of the parameters of
your model, then a drop in Rfree tells you something was good. A rise
in Rfree tells you to go back and try something else instead. The
absolute value of Rfree at the end (or the start) contains very little
information by itself.
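The "first derivative" reading of Rfree can be written down directly: compute R = sum|Fo - Fc| / sum Fo over the free reflections only, before and after a change to the model, and look at the sign of the difference. This is a minimal sketch; `r_factor`, `keep_change` and the data are hypothetical, and Fc is assumed to be already scaled to Fo (no scale factor is refined here).

```python
def r_factor(f_obs, f_calc, mask):
    """R = sum|Fo - Fc| / sum Fo over reflections where mask is True."""
    num = sum(abs(fo - fc) for fo, fc, m in zip(f_obs, f_calc, mask) if m)
    den = sum(fo for fo, m in zip(f_obs, mask) if m)
    return num / den

def keep_change(f_obs, fc_before, fc_after, free_flags):
    """Judge a model change by the sign of the change in Rfree, not its size."""
    r_before = r_factor(f_obs, fc_before, free_flags)
    r_after = r_factor(f_obs, fc_after, free_flags)
    return r_after < r_before, r_after - r_before

# Hypothetical amplitudes; True marks reflections in the free (test) set.
f_obs = [100.0, 50.0, 20.0, 80.0]
free_flags = [True, False, True, False]
fc_before = [90.0, 55.0, 25.0, 78.0]
fc_after = [95.0, 52.0, 22.0, 79.0]

accept, delta = keep_change(f_obs, fc_before, fc_after, free_flags)
print("keep the change:", accept, "change in Rfree:", round(delta, 4))
```

Only the sign of `delta` drives the decision; the absolute Rfree values before and after are not used.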
>How then to assess the "truth"? It seems the maps may be noisier with all
>data (presumably including difference maps), unless you use weighting in
>map production, as per George Sheldrick.
Absolutely one should use weighting in map production.
If you haven't been doing so until now, you may be in for a very pleasant surprise
to see how much it can improve your maps. The CCP4
package contains Randy's SigmaA program for this purpose, and I believe
that the new XPLOR 3.851 will also do SigmaA weighted maps. You have
already noted that shelxl will generate SigmaA weighted coefficients also.
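As a rough illustration of what "SigmaA weighted coefficients" means, here is a sketch of the acentric case as it is usually quoted from Read's SigmaA work: X = 2 sigmaA |Eo||Ec| / (1 - sigmaA^2), figure of merit m = I1(X)/I0(X), and map coefficient 2m|Eo| - sigmaA|Ec| (in normalized units), applied with the model phase. This is not the code of the SigmaA program, XPLOR or shelxl; centric reflections use a different expression and are not handled, and the Bessel-ratio helper (power series below X = 50, asymptotic expansion above, error of order 1e-6 near the switch) is my own assumption.

```python
def bessel_ratio_i1_i0(x):
    """I1(x)/I0(x) for x >= 0, the acentric figure of merit as a function of X."""
    if x > 50.0:
        # Asymptotic expansion for large x.
        return 1.0 - 1.0 / (2.0 * x) - 1.0 / (8.0 * x * x)
    # Power series I_n(x) = sum_k (x/2)^(2k+n) / (k! (k+n)!), built term by term.
    q = (x / 2.0) ** 2
    t0, t1 = 1.0, x / 2.0  # k = 0 terms of I0 and I1
    s0, s1 = t0, t1
    k = 0
    while t0 > 1e-17 * s0:
        k += 1
        t0 *= q / (k * k)
        t1 *= q / (k * (k + 1))
        s0 += t0
        s1 += t1
    return s1 / s0

def sigmaa_coefficient_acentric(e_obs, e_calc, sigma_a):
    """Return (2m|Eo| - sigmaA|Ec|, m) for one acentric reflection."""
    if not 0.0 <= sigma_a < 1.0:
        raise ValueError("sigmaA must satisfy 0 <= sigmaA < 1")
    x = 2.0 * sigma_a * e_obs * e_calc / (1.0 - sigma_a ** 2)
    m = bessel_ratio_i1_i0(x)
    return 2.0 * m * e_obs - sigma_a * e_calc, m

# Hypothetical normalized amplitudes for one reflection and a sigmaA of 0.8.
coefficient, fom = sigmaa_coefficient_acentric(1.5, 1.2, 0.8)
print("map coefficient:", round(coefficient, 3), "figure of merit:", round(fom, 3))
```

When sigmaA is 0 the model carries no information, m is 0 and the coefficient vanishes; as sigmaA grows, the observed amplitudes are trusted more.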
Ethan A Merritt
merritt at u.washington.edu