# R-Value and Free R-Value

Ditlev Brodersen ditlev at kemi.aau.dk
Mon Nov 25 14:14:34 EST 1996

>        Hi to all, my name is Gerard Pujadas and I wonder what the
>difference is between these two concepts, which often appear in the same PDB file:
>R VALUE and FREE R VALUE. Which may be used as a measure of the refinement
>
>        Thanks in advance. Yours sincerely.
>
>        Gerard.
>

Dear Gerard,
The R VALUE entry in the PDB file refers to the traditional
crystallographic R-factor, which measures the mean relative
difference between the structure factors (corresponding to the intensities
of the diffracted rays) calculated from the model (the PDB coordinates in
this case) and the measured structure factors. The definition is
$$
R = \frac{\sum_{hkl} \bigl|\, |F_o| - k\,|F_c| \,\bigr|}{\sum_{hkl} |F_o|}
$$

where the sums run over all Miller indices (h, k, l). Fo and Fc are the
observed and calculated structure factors, respectively; both are
functions of (h, k, l). The vertical bars denote the absolute value (for a
structure factor, its amplitude), and the k in front of |Fc| is a scaling
constant (not the Miller index k).
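To make the definition concrete, here is a minimal Python sketch of the same sum. The function name `r_factor`, the made-up amplitudes, and the default choice of the scale (picking k so that the two amplitude sums match) are my own assumptions for illustration; real refinement programs determine the scale in more elaborate ways.

```python
from typing import Optional, Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             scale: Optional[float] = None) -> float:
    """Return sum(| |Fo| - k*|Fc| |) / sum(|Fo|) over paired reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    abs_obs = [abs(f) for f in f_obs]
    abs_calc = [abs(f) for f in f_calc]
    if scale is None:
        # Assumed convention: choose k so that sum(|Fo|) == k * sum(|Fc|).
        scale = sum(abs_obs) / sum(abs_calc)
    numerator = sum(abs(o - scale * c) for o, c in zip(abs_obs, abs_calc))
    return numerator / sum(abs_obs)


# Made-up amplitudes for four reflections, purely for illustration.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 60.0, 25.0, 25.0]
print(r_factor(f_obs, f_calc, scale=1.0))  # 0.1
```

Two of the four reflections disagree by 10 each, so the numerator is 20 and the denominator is 200, giving R = 0.1 (usually quoted as 10%).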
So clearly, as your model gets closer to the correct structure, the
R-value will drop as the Fc's approach the Fo's. But for refining the
structure, one normally uses a program that automatically fits the built
model to the data in the best possible way. This is done by adjusting
parameters such as bond lengths, bond angles, and so on, always taking into
account the expected values of those parameters. In this process, however,
we face a serious problem: the R-value can be made arbitrarily low simply
by increasing the number of parameters to be fitted (in the same way that any
set of points can be fitted arbitrarily closely by choosing a polynomial of
sufficiently high degree).
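The polynomial analogy can be tried out directly. The sketch below is only an illustration of the curve-fitting point, not of crystallographic refinement: the data points are made up, and `lagrange_eval` is a hypothetical helper that evaluates the unique polynomial passing through all of them.

```python
from typing import Sequence


def lagrange_eval(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the unique degree-(n-1) polynomial through n points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Made-up "measurements": roughly y = x, with some noise added by hand.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.8, 2.3, 2.9, 4.2]

# Five parameters for five points: the fit to the fitted points is perfect...
residual = sum(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))
print(residual)

# ...but the prediction at a point that was not fitted lands far from the
# underlying trend (about 11 here, where the trend suggests roughly 5).
print(lagrange_eval(xs, ys, 5.0))
```

The residual on the fitted points says nothing about how well the curve describes data it has not seen, which is exactly the weakness of the ordinary R-value.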
For this reason, Axel Brunger in 1992 (not long ago, huh!?) suggested the
use of the Free R-factor. The strategy is as follows: before refinement, a
part of the dataset (say 10%) is set aside as a test set, and the refinement
is done only with the remaining 90%. By calculating the R-factor during
refinement as above but ONLY summing over the test set of 10%, one can always
check whether the refinement is going in the right direction and stop when
one starts overfitting.
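The bookkeeping behind this can be sketched in a few lines of Python. Everything here is an assumption for illustration: the names `split_indices` and `r_factor`, the fixed scale of 1, and the randomly generated amplitudes. No refinement is actually performed, so the sketch only shows how the working-set and test-set statistics are computed, not the gap that opens between them when a model is overfitted.

```python
import random
from typing import List, Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             k: float = 1.0) -> float:
    """sum(| |Fo| - k*|Fc| |) / sum(|Fo|) over the reflections given."""
    num = sum(abs(abs(o) - k * abs(c)) for o, c in zip(f_obs, f_calc))
    return num / sum(abs(o) for o in f_obs)


def split_indices(n: int, test_fraction: float = 0.1,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign reflection indices to a working set and a test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Made-up amplitudes standing in for a real data set and model.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(200)]
f_calc = [f * rng.uniform(0.8, 1.2) for f in f_obs]

work, test = split_indices(len(f_obs), test_fraction=0.1)

# A real refinement program would adjust the model using only `work`;
# the test reflections are touched only to compute the free R-factor.
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
print(len(work), len(test))  # 180 20
print(r_work, r_free)
```

The important design point is that the split is made once, before refinement starts, and the same test reflections are kept out of the fitting throughout.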
Because the test set is never used in the fitting, the Free R-factor will
generally be larger than the "normal" one, but the Free R-factor is one of
the best measures of the quality of the refinement.

Hope this cleared things up a bit...

Ditlev  :-)

-------------------------------------------------------------------
PhD-student
Ditlev Brodersen                   Phone: +45 8942 3871
Dept. of Struct. and Mol. Biology  Fax:   +45 8619 6199
c/o Dept. of Chemistry             Email: ditlev at kemi.aau.dk