How to judge the best of 2 fits to 1 data set?
Theo Schoenmakers
theos at sci.kun.nl
Wed Jun 8 03:12:47 EST 1994
Dear knowledgeable statisticians,
I am in doubt regarding the analysis of the results of a nonlinear
regression analysis encountered during my research. I hope you can shed
some light on the following problem:
-My experiments display an "exponentially" decreasing curve. The analysis
program offers a one- or a two-exponential decay (plus an offset). I have
no clue as to which model is correct. So I try both, receiving from the
program the 3 or 5 parameters plus an R value (to 4 decimal places).
The curve contains 493 sampled data points, so I guess the degrees of
freedom are 490 and 488. Okay?
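
A minimal sketch of the two candidate models and the degrees-of-freedom count described above. The parameter names (a, k, c and so on) and the exact parameterization are assumptions for illustration; the analysis program may define its parameters differently.

```python
import math

def one_exp(t, a, k, c):
    """Single-exponential decay plus offset (3 fitted parameters)."""
    return a * math.exp(-k * t) + c

def two_exp(t, a1, k1, a2, k2, c):
    """Double-exponential decay plus offset (5 fitted parameters)."""
    return a1 * math.exp(-k1 * t) + a2 * math.exp(-k2 * t) + c

def residual_df(n_points, n_params):
    """Residual degrees of freedom: N data points minus V fitted parameters."""
    return n_points - n_params

df1 = residual_df(493, 3)  # one-exponential model
df2 = residual_df(493, 5)  # two-exponential model
print(df1, df2)  # 490 488
```
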
-Usually, the more complicated model fits the data better (not
surprisingly). The increase in R is from about 0.9984 to 0.9994. I guess the
high R is also due to the great number of data points, since I think it is
calculated as R = sqrt(SS[regression]/(SS[regression]+SS[error])), where
SS is the sum of squares due to regression, or due to residual error.
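
The assumed definition of R can be written out as a short sketch. It also shows the inversion used later in the post: if R is defined this way, then SS[error]/SS[regression] = 1/R^2 - 1. The sums of squares below are made-up numbers, not values from the experiments.

```python
import math

def r_from_ss(ss_regression, ss_error):
    """R as assumed in the post: sqrt(SS_reg / (SS_reg + SS_err))."""
    return math.sqrt(ss_regression / (ss_regression + ss_error))

def error_to_regression_ratio(r):
    """Invert the definition above: SS_err / SS_reg = 1/R^2 - 1."""
    return 1.0 / (r * r) - 1.0

# Made-up sums of squares, for illustration only.
ss_reg, ss_err = 99.0, 1.0
r = r_from_ss(ss_reg, ss_err)
ratio = error_to_regression_ratio(r)
print(r, ratio, ss_err / ss_reg)
```
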
-I had a look at: Motulsky,H.J. & Ransnas,L.A. Fitting curves to data
using nonlinear regression: a practical and nonmathematical review.
FASEB J. 1: 365-374 (1987). To test whether the R's differ significantly,
they recommend the F test. For two fits with an equal number of
parameters (bear with me), they suggest F = SS1/SS2, where both numerator
and denominator have N-V degrees of freedom (SS=residual sum of squares;
N=number of data points; V=number of parameters fit by the program). Since
this division eliminates the unknown sum of squares due to regression, I
can calculate an F, which ends up to be about 1.9 and is highly
significant for about 489 degrees of freedom. The formula I used:
         1/(R1*R1) - 1
    F = ---------------
         1/(R2*R2) - 1
Apart from the actually unequal number of parameters, I am not doing
anything completely wrong here, am I? I have the impression it's a
reasonable approximation to assume the equal number of parameters, due to
the large number of data points.
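
A minimal sketch of the "equal number of parameters" ratio computed from the two R values. It relies on the post's own simplification that SS[regression] cancels in the division, and on the assumed definition of R given earlier. The inputs are the rounded R values quoted above, used only as illustrative inputs; a particular data set will give a different F.

```python
def f_equal_params(r1, r2):
    """F = SS1/SS2 expressed through R, assuming SS_reg cancels.

    r1 is the R of the fit with the larger residual (simpler model),
    r2 the R of the fit with the smaller residual.
    """
    return (1.0 / (r1 * r1) - 1.0) / (1.0 / (r2 * r2) - 1.0)

# Rounded, illustrative R values taken from the text.
f = f_equal_params(0.9984, 0.9994)
print(f)
```

The resulting F would then be compared against an F distribution with roughly N-V degrees of freedom in both numerator and denominator.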
-Of course, the number of parameters isn't equal. So I should use another
formula they propose:
         (SS1 - SS2) / (df1 - df2)
    F = ---------------------------
                (SS2 / df2)
Again, I use the R's here: if I am right,
         (1/(R1*R1) - 1/(R2*R2)) / (df1 - df2)
    F = ---------------------------------------
                (1/(R2*R2) - 1) / df2
Model 1 is the one with fewer parameters. Here's something I don't
understand: if one is trying to explain the difference between the two SS,
why doesn't one add up the degrees of freedom? Using the formula, my data
yield a very large F (~130), since I am dividing by 2 in the numerator,
but by 488 in the denominator. I can't help having a "gut feeling" that this
isn't correct, since this formula yields such a vastly different F from
the one with the "equal number of parameters" approach. So should I use
the "reasonable approximation" of equal number of parameters, or is this
formula correct? And is this formula REALLY correct, or should one sum the
degrees of freedom?
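
A sketch of the second formula, again written in terms of R and under the same assumptions (the assumed definition of R, and SS[regression] treated as common to both fits). It also checks an algebraic identity that follows directly from the two formulas as written: since (SS1 - SS2)/SS2 = SS1/SS2 - 1, the second F equals (F_equal - 1) * df2 / (df1 - df2). This does not settle which test is appropriate; it only shows how the two numbers are related. The R values are the rounded ones quoted in the text, used as illustrative inputs.

```python
def f_equal_params(r1, r2):
    """Ratio of residual sums of squares, expressed through R."""
    return (1.0 / (r1 * r1) - 1.0) / (1.0 / (r2 * r2) - 1.0)

def f_unequal_params(r1, r2, df1, df2):
    """F = ((SS1 - SS2)/(df1 - df2)) / (SS2/df2), expressed through R.

    Model 1 is the one with fewer parameters, so df1 > df2.
    """
    numerator = (1.0 / (r1 * r1) - 1.0 / (r2 * r2)) / (df1 - df2)
    denominator = (1.0 / (r2 * r2) - 1.0) / df2
    return numerator / denominator

r1, r2 = 0.9984, 0.9994   # rounded, illustrative values from the text
df1, df2 = 490, 488       # 493 points minus 3 or 5 parameters

f_a = f_equal_params(r1, r2)
f_b = f_unequal_params(r1, r2, df1, df2)

# Identity relating the two statistics.
f_b_from_a = (f_a - 1.0) * df2 / (df1 - df2)
print(f_a, f_b, f_b_from_a)
```

The factor df2 / (df1 - df2), here 488 / 2, is what makes the second F so much larger than the first.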
-Finally, as I said "usually the more complicated model fits better". But,
in about 30% of all cases, there doesn't seem to be a second
component, since the program diverges during the analysis. How
should one handle this complicating factor? Supposing the above analysis
yields F's that indicate statistically better fits for the two-exponential
model, what should one do with those cases where it was impossible to describe
the data that way? Just ignore them? Or, since the one-exponential model
never showed that problem and converges in 100% of the cases, prefer the
simpler model since it can always describe the data, even if it isn't as
good? I haven't been able to find an answer to this problem.
Well, that's about it. Sorry about this long post, but I wanted to make
the problem clear. I hope I've succeeded in doing that and I would
appreciate it greatly if you could give me some hints on how to solve
this. If anyone cares to answer: I prefer email, since I don't read this
newsgroup often. However, if people mail me requesting a summary of mailed
responses, I promise to post one after a little while (if I get any
solutions to my riddle).
All the best, many thanks in advance to any willing to study this,
Theo
--
/---------- Dr. Theo J.M. Schoenmakers <theos at sci.kun.nl> ------------\
| Dept. Animal Physiology, Faculty of Science, University of Nijmegen |
| Toernooiveld, NL-6525 ED Nijmegen, The Netherlands, Europe |
\---------------------------------------------------------------------/