How to judge the best of 2 fits to 1 data set?

Theo Schoenmakers theos at sci.kun.nl
Wed Jun 8 03:12:47 EST 1994


Dear knowledgeable statisticians,

I am in doubt regarding the analysis of the results of a nonlinear 
regression analysis encountered during my research. I hope you can shed 
some light on the following problem:

-My experiments display an "exponentially" decreasing curve. The analysis 
program offers a one- or a two-exponential decay (plus an offset). I have 
no clue as to which model is correct. So I try both, receiving from the 
program the 3 or 5 parameters plus an R value (to 4 decimal places). 
The curve contains 493 sampled data points, so I guess the degrees of 
freedom are 490 and 488. Okay?

-Usually, the more complicated model fits the data better (not 
surprisingly). The increase in R is from about 0.9984 to 0.9994. I guess the 
high R is also due to the great number of data points, since I think it is 
calculated as R = sqrt(SS[regression]/(SS[regression]+SS[error])), where 
SS is the sum of squares due to regression, or due to residual error. 
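
A minimal sketch of the definition above, assuming the program really 
computes R this way (the post only guesses that it does). The function 
names are hypothetical. The second function shows the rearrangement used 
later in this post: SS[error]/SS[regression] = 1/(R*R) - 1.

```python
import math


def r_from_sums_of_squares(ss_regression, ss_error):
    """R = sqrt(SS[regression] / (SS[regression] + SS[error]))."""
    return math.sqrt(ss_regression / (ss_regression + ss_error))


def error_to_regression_ratio(r):
    """Invert the definition: SS[error] / SS[regression] = 1/(R*R) - 1."""
    return 1.0 / (r * r) - 1.0


# Made-up sums of squares, purely for illustration.
r = r_from_sums_of_squares(99.0, 1.0)
print(r, error_to_regression_ratio(r))
```

Note that this only recovers the ratio SS[error]/SS[regression], not 
SS[error] itself, so comparing two fits this way assumes SS[regression] 
is nearly the same for both.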

-I had a look at: Motulsky,H.J. & Ransnas,L.A. Fitting curves to data 
using nonlinear regression: a practical and nonmathematical review. 
FASEB J. 1: 365-374 (1987). To test whether the R's differ significantly, 
they recommend the F test. For two fits with an equal number of 
parameters (bear with me), they suggest F = SS1/SS2, where both numerator 
and denominator have N-V degrees of freedom (SS=residual sum of squares; 
N=number of data points; V=number of parameters fit by the program). Since 
this division eliminates the unknown sum of squares due to regression, I 
can calculate an F, which turns out to be about 1.9 and is highly 
significant for roughly 489 degrees of freedom. The formula I used:

     1/(R1*R1) - 1
F = ---------------
     1/(R2*R2) - 1

Apart from the actually unequal number of parameters, I am not doing 
anything completely wrong here, am I? I have the impression it's a 
reasonable approximation to assume the equal number of parameters, due to 
the large number of data points.
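
The ratio formula above can be sketched as follows. The function name is 
hypothetical, and the R values are the rounded, typical ones quoted 
earlier in this post, so the result need not match the F reported for any 
particular curve. Like the formula itself, the sketch assumes that 
SS[regression] is practically the same for both fits.

```python
def f_ratio_equal_parameters(r1, r2):
    """F = SS1/SS2 written in terms of R, as in the formula above.

    r1 belongs to the fit with the larger residual sum of squares.
    """
    ss1_relative = 1.0 / (r1 * r1) - 1.0  # SS1 / SS[regression]
    ss2_relative = 1.0 / (r2 * r2) - 1.0  # SS2 / SS[regression]
    return ss1_relative / ss2_relative


# Typical R values quoted in the post (one- and two-exponential fit).
print(f_ratio_equal_parameters(0.9984, 0.9994))
```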

-Of course, the number of parameters isn't equal. So I should use another 
formula they propose:

    (SS1 - SS2) / (df1 - df2)
F = -------------------------
           (SS2 / df2)


Again, I use the R's here: if I am right,

    (1/(R1*R1)-1/(R2*R2)) / (df1-df2)
F = ---------------------------------
          (1/(R2*R2) - 1) / df2

Model 1 is the one with fewer parameters. Here's something I don't 
understand: if one is trying to explain the difference between the two SS, 
why doesn't one add up the degrees of freedom? Using the formula, my data 
yield a very large F (~130), since I am dividing by 2 in the numerator, 
but by 488 in the denominator. I can't help having a "gut feeling" that this 
isn't correct, since this formula yields such a vastly different F from 
the one with the "equal number of parameters" approach. So should I use 
the "reasonable approximation" of equal number of parameters, or is this 
formula correct? And is this formula REALLY correct, or should one sum the 
degrees of freedom?
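
For concreteness, a sketch of the second formula exactly as written above, 
with df = N - V for each model. The function and argument names are 
hypothetical, and the R values are again the rounded, typical ones from 
earlier in this post, so the output is only an illustration and need not 
reproduce the F quoted for a specific curve.

```python
def f_nested_models(r1, r2, n_points, n_params1, n_params2):
    """F = ((SS1 - SS2) / (df1 - df2)) / (SS2 / df2), expressed via R.

    Model 1 is the one with fewer parameters. Returns the F value and
    the numerator and denominator degrees of freedom used by the formula.
    """
    df1 = n_points - n_params1
    df2 = n_points - n_params2
    ss1_relative = 1.0 / (r1 * r1) - 1.0  # SS1 / SS[regression]
    ss2_relative = 1.0 / (r2 * r2) - 1.0  # SS2 / SS[regression]
    numerator = (ss1_relative - ss2_relative) / (df1 - df2)
    denominator = ss2_relative / df2
    return numerator / denominator, df1 - df2, df2


# 493 data points, 3 versus 5 fitted parameters, as in the post.
f_value, df_num, df_den = f_nested_models(0.9984, 0.9994, 493, 3, 5)
print(f_value, df_num, df_den)
```

Algebraically this F equals (SS1/SS2 - 1) * df2 / (df1 - df2), which is 
why it comes out so much larger than the plain ratio SS1/SS2.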

-Finally, as I said "usually the more complicated model fits better". But, 
in about 30% of all cases, there doesn't seem to be a second component 
at all, since the program diverges during the analysis. How 
should one handle this complicating factor? Supposing the above analysis 
yields F's that indicate statistically better fits for the two-exponential 
model, what should one do with the cases where it was impossible to describe 
the data that way? Just ignore them? Or, since the one-exponential model 
never showed that problem and converges in 100% of the cases, prefer the 
simpler model since it can always describe the data, even if it isn't as 
good? I haven't been able to find an answer to this problem.

Well, that's about it. Sorry about this long post, but I wanted to make 
the problem clear. I hope I've succeeded in doing that and I would 
appreciate it greatly if you could give me some hints on how to solve 
this. If anyone cares to answer: I prefer email, since I don't read this 
newsgroup often. However, if people mail me requesting a summary of the 
mailed responses, I promise to post one after a little while (if I get any 
solutions to my riddle).

All the best, many thanks in advance to any willing to study this,
Theo
--
/---------- Dr. Theo J.M. Schoenmakers <theos at sci.kun.nl> ------------\
| Dept. Animal Physiology, Faculty of Science, University of Nijmegen |
|     Toernooiveld, NL-6525 ED Nijmegen, The Netherlands, Europe      |
\---------------------------------------------------------------------/


