NMR data treatment: wrong problem?

Gerard Kleijwegt gerard at rigel.bmc.uu.se
Tue May 3 11:14:27 EST 1994

In article <2q2rgk$1fo at mserv1.dl.ac.uk>, pjk at ciclid.csb.ki.se (Per Kraulis) writes:
|> One problem with the Rsym as defined above is that it will be dominated by
|> strong cross peaks. Maybe one should consider some other measure that
|> deals with this, or maybe give Rsym as a function of average
|> intensity. Also, one should give the number of NOE cross peaks
|> contributing to the Rsym, and the number of NOE cross peaks for which
|> the symmetry-related peaks could not be measured, for whatever reason.
|> All this needs testing, and I hope someone out there feels up to it...

In protein xtallography, careful papers include a table
of the following items as a function of resolution
(could be as a function of NOE intensity for you NMR people):

  - Rsym (a.k.a., internal Rmerge)
  - completeness (in our case, which percentage of
    the possible reflections were observed; in your
    case, which percentage of the NOEs, expected
    on the basis of the refined structure, was
    actually observed ?  of course, the latter
    criterion would mix model and data)
  - multiplicity (how often was a reflection measured)
  - percentage of the data with F > 2 (or 3) sigma(F);
    alternatively, <F/sigma> or <F>/<sigma> (<..>=average)
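
To make the table above concrete, here is a rough Python sketch of
how such statistics might be tabulated per intensity bin for NOE
volumes. It assumes the usual crystallographic definition of Rsym,
sum |I - <I>| / sum I, which may differ from the one used earlier
in this thread. The names rsym and stats_by_intensity, the bin
edges and the volumes are all made up for illustration.
Completeness is left out, since it needs the list of NOEs expected
from a model.

```python
def rsym(groups):
    """Rsym = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i I_hi.

    Each group holds the repeated measurements of one peak; groups
    with a single measurement cannot contribute and are skipped.
    """
    num = den = 0.0
    for obs in groups:
        if len(obs) < 2:
            continue
        mean = sum(obs) / len(obs)
        num += sum(abs(i - mean) for i in obs)
        den += sum(obs)
    return num / den if den else float("nan")


def stats_by_intensity(groups, edges):
    """One row per intensity bin [lo, hi): count, unpaired, multiplicity, Rsym."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # A peak is assigned to a bin by the mean of its measurements.
        in_bin = [g for g in groups if lo <= sum(g) / len(g) < hi]
        n = len(in_bin)
        rows.append({
            "range": (lo, hi),
            "n_peaks": n,
            "n_unpaired": sum(1 for g in in_bin if len(g) < 2),
            "multiplicity": sum(len(g) for g in in_bin) / n if n else float("nan"),
            "rsym": rsym(in_bin),
        })
    return rows


# Made-up volumes: two strong peaks, one weak pair, one weak unpaired peak.
groups = [[100.0, 110.0], [10.0, 14.0], [12.0], [95.0, 105.0, 100.0]]
for row in stats_by_intensity(groups, edges=[0.0, 50.0, 200.0]):
    print(row)
```

Reporting the unpaired count per bin also covers Per's request for
the number of peaks whose symmetry mate could not be measured.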

Someone else mentioned Axel Brunger's free Rfactor;
this statistic is rapidly becoming the major tool
for judging whether or not a structure was overfitted.
Another is simply the data-to-parameter ratio (although
I can't imagine that this will be particularly popular
in the NMR community ... ;-); it turns out that many
low-resolution X-ray protein structures have been
refined with more parameters than observations ...
(GJ Kleywegt & TA Jones, to be published - sorry,

If anyone thinks of a procedure to estimate sigmas
of NOE volumes, remember that you should also test
if the obtained sigmas are a good estimate of the
true sigmas (i.e., the distribution of the number
of NOEs with (int(j)-meanint)/sigma(j)= -5,-4,...,+5
should be normal with average zero and standard deviation 1).
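
The check described above could look something like the following
Python sketch. It computes the normalised deviations, counts them
in integer bins from -5 to +5, and reports their mean and standard
deviation, which should come out near 0 and 1 if the sigmas are
realistic. The function names and the numbers are invented for
illustration; how the sigmas themselves are obtained is not
addressed here.

```python
import statistics


def normalised_deviations(intensities, mean_intensities, sigmas):
    """z(j) = (int(j) - meanint) / sigma(j) for every observation j."""
    return [(i - m) / s
            for i, m, s in zip(intensities, mean_intensities, sigmas)]


def histogram(z, lo=-5, hi=5):
    """Count deviations per integer bin; outliers go into the end bins."""
    counts = {k: 0 for k in range(lo, hi + 1)}
    for v in z:
        k = max(lo, min(hi, round(v)))
        counts[k] += 1
    return counts


# Made-up observations of peaks whose mean intensity is 10 in each case.
intensities = [10.0, 12.0, 8.0, 11.0, 9.0]
means = [10.0] * 5
sigmas = [1.0] * 5

z = normalised_deviations(intensities, means, sigmas)
print(histogram(z))
print("mean:", statistics.mean(z), "sd:", statistics.pstdev(z))
```

With real data, a standard deviation well above 1 would suggest
that the sigmas are underestimated, and one well below 1 that they
are overestimated.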

As for Alexandre's R6, is there any statistical justification
for using this (e.g., are you using an objective function
with sixth powers of I in your refinement ?), or is
it just cosmetic (i.e., to get lower values) ?
It might be worthwhile to use a correlation coefficient
instead of Rfactors (contrary to R, a CC is independent
of the averages and scales of the two arrays you're
comparing).  We use this in xtallography as well (Axel
Brunger's PC-refinement, for instance, and in so-called
real-space electron-density averaging procedures).
Also, since the "NMR Rsym" will probably mostly
involve a maximum of 2 observations, you might want to
consider calculating a "CCsym", i.e. the correlation
coefficient between the volumes 'below' and 'above'
the diagonal.  Another option is to collect duplicate
datasets ("multiple xtals", although you would probably
have to use the same sample ?) and to check how well
the two sets of NOE volumes "merge".  This might give
a much more realistic estimate of the errors.
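
As an illustration of the point about scales, here is a small
Python sketch of such a "CCsym": a plain Pearson correlation
coefficient between the volumes above and below the diagonal,
next to a pairwise R-factor for the same data. The function names
and the volumes are invented, and the pairwise R shown here,
sum |a - b| / sum (a + b), is only one possible definition.

```python
import math


def cc_sym(above, below):
    """Pearson correlation coefficient between paired volumes."""
    n = len(above)
    ma = sum(above) / n
    mb = sum(below) / n
    sab = sum((a - ma) * (b - mb) for a, b in zip(above, below))
    saa = sum((a - ma) ** 2 for a in above)
    sbb = sum((b - mb) ** 2 for b in below)
    return sab / math.sqrt(saa * sbb)


def r_sym_pairs(above, below):
    """Pairwise R-factor: sum |a - b| / sum (a + b)."""
    num = sum(abs(a - b) for a, b in zip(above, below))
    den = sum(a + b for a, b in zip(above, below))
    return num / den


# Made-up volumes: 'below' is 'above' on a different scale plus an offset.
above = [100.0, 50.0, 20.0, 10.0]
below = [2.0 * a + 5.0 for a in above]

print("CCsym:", cc_sym(above, below))
print("Rsym :", r_sym_pairs(above, below))
```

In this contrived case the two halves agree perfectly apart from
scale and offset, so the CC is 1 while the R-factor looks poor.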

A question (just out of curiosity): why do so few
NMR-protein structure papers contain a Ramachandran
plot ?

Note that I'm x-posting this to bionet.xtallography
(it's time for some x-pollination).


               Gerard J. Kleywegt              ___  
  Department of Molecular Biology              | |  /\
                Biomedical Centre             /\ -- ||
               Uppsala University             || || ||
                Box 590, S-751 24             || || ||
                  Uppsala, SWEDEN             || \/ --
                                              --  |__|
    E-mail: gerard at xray.bmc.uu.se
           "He's probably pining for the fiords ..."
           "Visit famous Uppsala, home of the ...
            er ... people who live in Uppsala !"
  The opinions in this mail/post are fictional.  Any similarity
   to actual opinions, living or dead, is purely coincidental.
