  • r for non-linear fits?
  • =======================Dave Krantz, 18 Mar 1997==========sse
    Message-ID: <199703181643.LAA02764@paradox.psych.columbia.edu>
    From: dhk@paradox.psych.columbia.edu (Dave Krantz)
    Subject: Re: What is the meaning/value of 'r' for non-linear fits?

    In reply to Abe Mantell (mantell@dorsai.org) (message appended below):

    To understand the meaning of r or r^2, it is absolutely essential to keep in mind that these statistics always involve a comparison of TWO models on a particular SCALE of squared residuals. In the case of simple linear versus simple exponential fits, one might have some of the following models under consideration:

      (1) Y = a + b*X + error               LINEAR, Y scale
      (2) Y = A*exp(B*X) + error            EXPONENTIAL, Y scale
      (3) Y = a + error                     CONSTANT, Y scale
      (4) Y = 0 + error                     ZERO, Y scale
      (5) Y = 1 + error                     LOG ZERO, Y scale
      (6) log(Y) = log(A) + B*X + error     EXPONENTIAL, log(Y) scale
      (7) log(Y) = log(A) + error           CONSTANT, log(Y) scale
      (8) log(Y) = 0 + error                LOG ZERO, log(Y) scale

    Any r^2 statistic arises from comparison of two models, one of which is a specialization of the other, with respect to squared residuals (estimates of "error" in the above equations). The specialization relations in the above 8 models are:

      (1) -->         --> (4)
                (3)
      (2) -->         --> (5)

      (6) --> (7) --> (8)

    Note that r^2 makes sense only in comparing models connected by a path; there are 11 different r^2 values that arise (count them!) in comparing the above 8 models. I am NOT saying that some of the other 17 pairwise model comparisons are impossible, merely that it does not make sense to calculate r^2 for purposes of these other 17 comparisons.

    In particular, the model comparisons for (2) with (6), (3) with (7), and (5) with (8) each involve the question of which scale makes the most sense for errors, the Y scale or log(Y) scale. Do the random processes add to Y or multiply Y (add to log Y)? This deserves careful thought.
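The counts of 11 and 17 quoted above can be checked mechanically. The following is only a sketch: it assumes the specialization arrows are read as (1)→(3), (2)→(3), (3)→(4), (3)→(5), (6)→(7), (7)→(8), which is one reading of the diagram that reproduces the stated totals. The names `ARROWS` and `reachable` are illustrative and not from the post.

```python
from itertools import combinations

# ASSUMED reading of the specialization diagram:
# general model -> special case obtained by fixing a parameter.
ARROWS = {
    1: {3},      # linear -> constant (b = 0)
    2: {3},      # exponential -> constant (B = 0)
    3: {4, 5},   # constant -> zero (a = 0) or Y = 1 (a = 1)
    6: {7},      # log-scale exponential -> log-scale constant (B = 0)
    7: {8},      # log-scale constant -> log-scale zero (log(A) = 0)
}

def reachable(start):
    """Return every model that can be reached from `start` along arrows."""
    seen, stack = set(), [start]
    while stack:
        for nxt in ARROWS.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# A pair admits an r^2 when a path leads from the general model to the special one.
connected = sorted((m, s) for m in range(1, 9) for s in reachable(m))
total_pairs = len(list(combinations(range(1, 9), 2)))

print(len(connected), "pairs connected by a path:", connected)
print(total_pairs - len(connected), "pairs with no path")
```

Under this reading, no path crosses between the Y-scale models (1)-(5) and the log(Y)-scale models (6)-(8), which is why the cross-scale comparisons fall among the pairs with no r^2.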
    The very high r^2 values that you cited most often arise in practice from comparisons of (1) with (4) and (6) with (8), since software for nonlinear fits often assumes (4) or (8) as the default null model. It is often the case that (4), (5), and (8) do not make any sense as baseline models, and in such cases, the large r^2 values based on comparisons with those models are essentially useless.

    Even if the r^2 values are sensible [i.e., the models do make sense as baselines, or one is comparing (1) or (2) with (3), and (6) with (7)] it is not straightforward to compare the two different r^2 values--rather, the focus must be on the underlying theory (form of model, nature of error) and on the qualitative aspects of the fit and the residuals.

    Finally, it does not seem right to assume that the Y scale is a priori more meaningful or understandable than log(Y). Simplicity depends on the total context of the problem and on what one is familiar with from other contexts.

    Dave Krantz (dhk@columbia.edu)

    -----original query read as follows:-------

    > A few colleagues and I were discussing the meaning of the correlation
    > coefficient for non-linear least-squares fits. It is our understanding
    > that 'r' gives the linear correlation which can be used for comparisons.
    > Whereas, for example, for an exponential fit, it is the linearization
    > (i.e. taking logs of the dependent) whose correlation is evaluated, and
    > thus has little use for deciding which fit is "best."
    > This discussion was driven by a problem whose linear least-squares fit
    > had r approx = 0.997 (or so), while for an exponential fit r approx = 0.998
    > -- i.e. they were very close! So then the question, which one is really
    > better? Sure 0.998 is *slightly* better than 0.997, but the 0.997 IS for
    > a linear fit (and thus 'r' has true value!?), whereas 0.998 is not for
    > the exponential, but the linearized exponential (and thus 'r' is not as
    > useful or meaningful!?). What do you think/know about this?
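As a rough illustration of Krantz's point about baselines, the sketch below computes r^2 as 1 - SSE(full)/SSE(null) for the same fitted model against two different null models. The data are made up for the example, the helper names are hypothetical, and model (2) is left out because it needs an iterative nonlinear fit. Model (6) is fitted as an ordinary straight line to log(Y).

```python
import math

def sse_linear(x, y):
    """Sum of squared residuals of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def sse_constant(y):
    """Squared residuals around the best-fitting constant (the mean)."""
    mean = sum(y) / len(y)
    return sum((yi - mean) ** 2 for yi in y)

def sse_zero(y):
    """Squared residuals around the fixed value 0."""
    return sum(yi ** 2 for yi in y)

def r_squared(sse_full, sse_null):
    """Proportional reduction in squared error from null to full model."""
    return 1.0 - sse_full / sse_null

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10.4, 9.1, 11.8, 10.2, 12.9, 11.1, 12.2, 13.5]
log_y = [math.log(v) for v in y]

r2_1_vs_3 = r_squared(sse_linear(x, y), sse_constant(y))          # (1) vs (3)
r2_1_vs_4 = r_squared(sse_linear(x, y), sse_zero(y))              # (1) vs (4)
r2_6_vs_7 = r_squared(sse_linear(x, log_y), sse_constant(log_y))  # (6) vs (7)
r2_6_vs_8 = r_squared(sse_linear(x, log_y), sse_zero(log_y))      # (6) vs (8)

print("Y scale:      vs constant %.4f   vs zero %.4f" % (r2_1_vs_3, r2_1_vs_4))
print("log(Y) scale: vs constant %.4f   vs zero %.4f" % (r2_6_vs_7, r2_6_vs_8))
```

Because the mean minimizes squared error among constants, the zero baseline can never have a smaller SSE than the constant baseline, so the r^2 against zero is never the smaller of the two. When the data sit far from zero it can be close to 1 even for a mediocre fit.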
  • r in nonlinear fitting
  • =======================Rich Ulrich, 13 Mar 1997==========ssc,ssm,sse
    Subject: Re: What is the meaning/value of 'r' for non-linear fits?
    Message-ID: <5g9d6b$7jq@usenet.srv.cis.pitt.edu>

    Abe Mantell (mantell@dorsai.org) wrote:
    : A few colleagues and I were discussing the meaning of the correlation
    : coefficient for non-linear least-squares fits. It is our understanding
    : that 'r' gives the linear correlation which can be used for comparisons.
    : Whereas, for example, for an exponential fit, it is the linearization
    : (i.e. taking logs of the dependent) whose correlation is evaluated, and
    : thus has little use for deciding which fit is "best."
    : This discussion was driven by a problem whose linear least-squares fit
    : had r approx = 0.997 (or so), while for an exponential fit r approx = 0.998
    : -- i.e. they were very close!

    -- NOT necessarily "very close". When you look at numbers bounded by 1.0, you should also look at the gap. In this case, treating the numbers as exact, you have "error variances" of 1-R^2; or 0.004 compared to 0.006, which is maybe 50% larger.... See how much variation lies in your round-off error?

    If one extreme score for X corresponds to an extreme for Y, then I can give you an "r" about as big as you name, by doing two transformations that will separate the bulk of (x,y) from that one point. But that will not be a useful Model for most purposes, since it basically fits ONE point with precision, and passes, somewhere, through the squeezed-together block which has all the other points. How useful is a model going to be?

    IF you fit all the values under a transformed Model, and then back-transform all the Predicted values, you can look at the correlation of Y/Predicted in the original metric -- which could happen to be greater than simple correlation in the original metric, and it could be much better or worse than the correlation observed in the transformed metric. Or, instead of correlation, you can describe the average precision of fit.

    By what metric should you WANT to evaluate your error? (Does it matter if your "average error per city", say, is expressed as "10% of the number" or "1 million people"?)

    * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
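Two of Ulrich's suggestions can be sketched in a few lines: comparing the gaps 1 - r^2 rather than the r values themselves, and back-transforming predictions from a log-scale fit so that fit quality can be judged in the original metric. The data below are made up, and the helper names `pearson_r`, `fit_line` and `rms` are hypothetical. Root-mean-square error is used here as just one possible reading of "average precision of fit".

```python
import math

def pearson_r(u, v):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def rms(residuals):
    """Root-mean-square size of a list of residuals."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# The gap below 1.0: unexplained fractions for r = 0.997 and r = 0.998.
gap_997 = 1 - 0.997 ** 2
gap_998 = 1 - 0.998 ** 2
ratio = gap_997 / gap_998
print("1 - r^2: %.6f vs %.6f, ratio %.3f" % (gap_997, gap_998, ratio))

# Made-up, roughly exponential data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 5.8, 8.5, 11.6, 17.0, 23.9]
log_y = [math.log(v) for v in y]

# Fit in the transformed (log) metric, then back-transform the predictions.
a, b = fit_line(x, log_y)
pred_log = [a + b * xi for xi in x]
pred_y = [math.exp(p) for p in pred_log]

r_lin = pearson_r(x, y)            # simple correlation, original metric
r_log = pearson_r(log_y, pred_log) # correlation in the transformed metric
r_back = pearson_r(y, pred_y)      # Y vs back-transformed predictions

# Size of the error in each metric (log units vs original units).
rms_log = rms([l - p for l, p in zip(log_y, pred_log)])
rms_y = rms([v - p for v, p in zip(y, pred_y)])

print("r(X, Y) = %.4f, r in log metric = %.4f, r(Y, back-transformed) = %.4f"
      % (r_lin, r_log, r_back))
print("RMS error: %.4f in log units, %.4f in Y units" % (rms_log, rms_y))
```

An RMS error in log units behaves roughly like a proportional error, while the RMS error in Y units is an absolute one, which is the "10% of the number" versus "1 million people" distinction in the post.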
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html