complete cross validation question
Yu Wai Chen
ywc at mrc-lmb.cam.ac.uk
Fri Nov 5 08:20:10 EST 1999
Dear all,
I would like to ask some questions about how one should perform a
complete cross-validation (C.V.).
I have a data set of only 2500 reflections and have to refine a model
of ~400 atoms. I have been using 20% of the data (about 500
reflections) for C.V. so that the statistics are more meaningful, and
so far my refinement has gone OK. Now I am at a later stage, where I
intend to use all my data so that I can refine individual B-factors.
I have therefore partitioned my data set into 10 non-overlapping C.V.
sets, each omitting 10% of the data.
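As a rough illustration of the partitioning described above, here is a
minimal Python sketch. It assumes the reflections can simply be referred
to by integer index (0 to 2499); the function name partition_test_sets
and the random seed are hypothetical and are not part of X-PLOR.

```python
import random

def partition_test_sets(n_reflections, n_sets, seed=0):
    """Split reflection indices 0..n_reflections-1 into n_sets disjoint test sets."""
    indices = list(range(n_reflections))
    # Shuffle so that each test set samples the data randomly (seed is an assumption).
    random.Random(seed).shuffle(indices)
    # Striding the shuffled list gives sets whose sizes differ by at most one.
    return [sorted(indices[k::n_sets]) for k in range(n_sets)]

test_sets = partition_test_sets(2500, 10)

# For test set k, the working set is every reflection not in test_sets[k].
working_set_0 = sorted(set(range(2500)) - set(test_sets[0]))

print(len(test_sets[0]), len(working_set_0))  # 250 2250
```

Each reflection ends up in exactly one test set, so every refinement
works against 90% of the data and is tested against the remaining 10%.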
How should I actually carry on with, say, simulated annealing (S.A.)?
With only one C.V. data set, one would run S.A. with several trials and
take the model from the trial with the lowest Rfree. Now if I do S.A.
(say 5 trials on each C.V. set) for the 10 C.V. sets, I get 50 S.A.
results. I suppose I should use all 50 Rfree values to estimate the
mean Rfree and its s.d.? But which model do I then pick for further
refinement? Should I still pick the one with the lowest Rfree?
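To make the bookkeeping concrete, the mean and s.d. over all the Rfree
values could be computed along these lines. This is only a sketch: the
function name summarize_rfree and the input layout (a dictionary mapping
each C.V. set label to the list of Rfree values from its S.A. trials)
are assumptions, and it says nothing about which model to pick.

```python
from statistics import mean, stdev

def summarize_rfree(rfree_by_set):
    """Summarize Rfree values; rfree_by_set maps a test-set label to its trial Rfrees."""
    # Pool the Rfree values from every trial of every test set.
    all_values = [r for trials in rfree_by_set.values() for r in trials]
    # Per-set means show how much the choice of test set alone moves Rfree.
    per_set_mean = {label: mean(trials) for label, trials in rfree_by_set.items()}
    return {
        "n": len(all_values),
        "mean": mean(all_values),
        "sd": stdev(all_values),  # sample s.d. (n - 1 denominator), needs n >= 2
        "per_set_mean": per_set_mean,
    }
```

With 10 sets of 5 trials each, "n" would be 50. Note that Rfree values
computed against different test sets are not measured on the same
reflections, so the per-set means are worth inspecting alongside the
pooled figure.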
I am contemplating another approach: switch off C.V. at this stage and
use all the data for refinement, then compute an a posteriori Rfree
with a final cycle of S.A. when refinement is finished.
Please comment.
--
===================================================================
Yu Wai CHEN, Ph.D. .................. email:ywc at mrc-lmb.cam.ac.uk
Centre for Protein Engineering, tel:+44-(0)1223-402148
MRC Centre, Hills Rd, Cambridge CB2 2QH, UK fax:+44-(0)1223-402140
WWW homepage: http://www.mrc-cpe.cam.ac.uk/~ywc