# Diederichs Novel R available in CCP4?

Kay Diederichs dikay at sun2.ruf.uni-freiburg.de
Fri May 30 08:04:48 EST 1997

```
Phil Evans (pre at mrc-lmb.cam.ac.uk) wrote:
<... deleted ...>
:
: The calculation is slightly different from that in Diederichs & Karplus
: program, as I compare observations to the weighted rather than the
: unweighted mean.

We didn't put the weighted formulas into the paper because that complicates
explanations, but we also use them in the distributed small program 'novel_r.f'.

: I also have not put in a calculation of "R_mrgd" as
: I'm not totally convinced about the division of observations into random
: subsets for this statistic (maybe because I don't understand it
: properly).

Another possible way to define "R_mrgd (on F's)" would be to define it like
"R_meas" in eqn 2, replacing sqrt(n_h/(n_h-1)) by 2.*sqrt(1/(n_h-1)) and the I's
by F's. This would then simply reflect the fact that averaging of n_h
observations improves the accuracy of the mean by a factor of sqrt(n_h).
The way we define it is equivalent, but has two advantages: 1) we test the
hypothesis that the mean gets closer to the truth by comparing two independent
estimates of the mean with each other, instead of just believing in the
sqrt-law. 2) the use of subsets of the redundant observations lends itself to
other uses, like the following: comparing the difference of two randomly
chosen subsets with the difference of two non-randomly chosen subsets.

Example: suppose you have measured the reflection (h,k,l)=(1,2,3) ten times
with values F=(10,12,7,15,9,10,8,6,12,7) in the order of data collection.
a) randomly choosing observations #3,1,6,2,8 for subset A and the remaining
4,5,7,9,10 for subset B gives f_A=45/5=9 and f_B=51/5=10.2
b) non-randomly choosing the 5 observations measured first, and comparing with
the 5 measured near the end of data collection, gives f_P=53/5=10.6 and
f_Q=43/5=8.6
In this example, the difference of the non-random subsets (2.0) is bigger than
for the random subsets (1.2), i.e. the crystal seems to decay, at least judged
from this one reflection. If the differences are accumulated over all unique
reflections, systematic errors like radiation decay can be identified.

Thus, I do think that having an R_mrgd (on F) output from SCALA would help users
to appreciate the real data quality. Whether you do the subset partitioning
or just use the sqrt(n) law really doesn't matter.

Yours, Kay
--
Kay Diederichs  (dikay at ruf.uni-freiburg.de and Kay.Diederichs at uni-konstanz.de)
Universitaet Freiburg, Institut fuer Biophysik, Albertstr. 23
D-79104 Freiburg, Germany   Tel. +49/(0)761/2035391 FAX +49/(0)761/2035016
and: Universitaet Konstanz, Fakultaet fuer Biologie, Postfach 5560, M656
D-78434 Konstanz, Germany   Tel. +49/(0)7531/884049 FAX +49/(0)7531/883183

```
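The subset comparison in the example can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic for the single reflection (1,2,3), using the ten values quoted in the message; the function names `subset_mean` and `half_set_difference` are illustrative and are not taken from novel_r.f or SCALA, and the accumulation over all unique reflections is not shown.

```python
# Observations of reflection (1,2,3), in order of data collection,
# exactly as quoted in the example.
F = [10, 12, 7, 15, 9, 10, 8, 6, 12, 7]


def subset_mean(values, indices):
    """Mean of the observations at the given 1-based observation numbers."""
    return sum(values[i - 1] for i in indices) / len(indices)


def half_set_difference(values, subset_a):
    """Split observations into subset_a and its complement.

    Returns (mean of subset_a, mean of complement, absolute difference).
    Hypothetical helper for illustration only.
    """
    chosen = set(subset_a)
    subset_b = [i for i in range(1, len(values) + 1) if i not in chosen]
    mean_a = subset_mean(values, subset_a)
    mean_b = subset_mean(values, subset_b)
    return mean_a, mean_b, abs(mean_a - mean_b)


# a) the random split from the example: observations 3,1,6,2,8 vs the rest
rand = half_set_difference(F, [3, 1, 6, 2, 8])

# b) the non-random split: first five observations vs last five
nonrand = half_set_difference(F, [1, 2, 3, 4, 5])

print(f"random:     f_A={rand[0]:.1f}  f_B={rand[1]:.1f}  diff={rand[2]:.1f}")
print(f"non-random: f_P={nonrand[0]:.1f}  f_Q={nonrand[1]:.1f}  diff={nonrand[2]:.1f}")
```

The larger difference for the early/late split compared with the random split is what hints at decay for this reflection; a single reflection is of course not conclusive, which is why the message suggests accumulating the differences over all unique reflections.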