- file stat 97logis.html ->
Logistic Regr - 5 comments
In this file about logistic regression,
there are notes on:
  Tests - of FIT                      Nichols
  Tests - betas                       Ulrich
  Classification cutoffs              Conroy
  REFs Logistic/Disc function         Helberg
  Normality assumption                Ulrich
  Extreme splits in group             Ulrich
*----------------
Tests in logistic (FIT)
=======================David Nichols, 01 jun 1993==========spss
Subject: Re: logistic regression goodness of fit indicators.
Message-ID:
In article
shawn@shadowfax.ori.org (-Shawn Boles-) writes:
>
>In the SPSS Advanced statistics User's Guide (1990) the Logistic Regression
>Section (2.11 pp.52-53) discusses the use of the following table for
>assessing the overall goodness of fit for a model (starting against a null
>model):
>
>                      Chi-Square    df   Significance
> -2 Log Likelihood        48.126    47          .4274   <- good model
> Model Chi-Square         22.126     5          .0005
> Improvement              22.126     5          .0005
> Goodness of Fit          46.790    47          .4812   <- good model
>
>The point made is that the first & last rows of the table are used to
>decide that the model does not differ from a perfect (i.e., saturated)
>model. The middle two rows are then used to test the significance of the
>coefficients themselves given an adequate model.
>
>My question is what interpretation does one make as to the adequacy of a model
>if the following table is obtained? Does the -2LL or the Goodness of Fit
>test control the decision? Or does the fact that they contradict one another
>point to some flaw in the model construction itself?
>
>                      Chi-Square    df   Significance
> -2 Log Likelihood       255.720     1          .0005   <- model no good
> Model Chi-Square          3.177     1          .0747   <- coef. <~> 0 (p<.10) if good model
> Improvement               3.177     1          .0747      "   "   "
> Goodness of Fit         188.000   186          .4452   <- good model
>
>direct replies appreciated,
>
>thanks in advance,
>
>
>Shawn Boles
>
>Oregon Research Institute Internet: shawn@ori.org
>1899 Willamette St. Voice: (503) 484-2123 Ext. 172
>Eugene, Oregon /97401 USA Fax: (503) 484-1108
>
>....................... non nova, sed novae .........................
Two things here: one a user misreading, the other a since-corrected
mistake by SPSS (and some of the research literature).
First, only the first row of the table is represented as comparing the
current model with a saturated model in the User's Guide. It says that
the last line leads to the same conclusion as the first. You can't compare
Pearson chi-squares for any kind of nested models of any type and get
a chi-square distributed variable, which is the idea behind the statement
about the -2LL statistic.
Second, the -2LL idea is wrong. It is mistakenly stated in a number of
places in the literature (I know we relied heavily on John Fox's Wiley
series book _Linear Statistical Models and Related Methods_) that you
can compare a particular model with a saturated model via subtraction
of one -2LL from the other, with the difference being a chi-square
variable under the null hypothesis. This is not true for the situation
represented in the SPSS LOGISTIC REGRESSION procedure, in which you
have generally as many cells in your design as cases, because the
approximation there is based on asymptotics as the number of cases in
a cell becomes large. The difference between two nested models involving
some parameters is indeed chi square distributed on df equal to the
number of parameters by which the two models differ. You can check
McCullagh and Nelder's 2nd edition of _Generalized Linear Models_ for
a more precise discussion of the issue. Shelby Haberman proved the
nested result originally, I believe.
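To illustrate the nested-model test described above, here is a minimal
sketch in Python (standard library only). It is not part of the original
post. The function names are hypothetical, and the null-model -2LL of
70.252 is an assumption inferred as 48.126 + 22.126 from the first quoted
table, not a figure given in the post.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total

def lr_test(neg2ll_reduced: float, neg2ll_full: float, df_diff: int):
    """Likelihood-ratio test for two NESTED models.

    df_diff is the number of parameters by which the models differ.
    """
    stat = neg2ll_reduced - neg2ll_full
    return stat, chi2_sf(stat, df_diff)

# Assumed null-model -2LL (48.126 + 22.126); model -2LL from the quoted table.
stat, p = lr_test(70.252, 48.126, 5)
print(f"model chi-square = {stat:.3f} on 5 df, p = {p:.4f}")
```

The same function applied to the second quoted table (3.177 on 1 df)
gives a p-value close to the .0747 printed there. Note that the sketch
only covers the nested comparison; it does not make the comparison
against a saturated model valid.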
I posted a long discussion of this some time ago. What we have done for
version 5 is to print just the measures themselves (they are functions
of deviance and Pearson residuals, respectively), without any df or
significance approximation, since we don't know what the distributions
are under the null hypothesis. These should not be used as test
statistics. In more general interpretational terms, the two kinds of
residuals are sometimes sensitive to different things. I'm more
accustomed to seeing situations in which the Pearson residuals are
large while the deviance ones are not. This is often caused by one
or two very badly misfitting cases, which are in effect weighted more
heavily by the Pearson than by the deviance.
If anyone needs any further explanation, let me know.
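
The remark above about Pearson versus deviance residuals can be sketched
numerically. This is an illustration added here, not part of the post;
the three (outcome, fitted probability) pairs are made up, and the
formulas are the usual ones for ungrouped binary data.

```python
import math

def pearson_residual(y: int, p: float) -> float:
    """(observed - fitted) scaled by the binomial standard deviation."""
    return (y - p) / math.sqrt(p * (1.0 - p))

def deviance_residual(y: int, p: float) -> float:
    """Signed square root of the case's contribution to -2LL."""
    loglik = math.log(p) if y == 1 else math.log(1.0 - p)
    return math.copysign(math.sqrt(-2.0 * loglik), y - p)

# Hypothetical cases: two ordinary ones and one badly misfitting case
# (an event observed where the model gave it probability .01).
cases = [(1, 0.60), (0, 0.30), (1, 0.01)]
for y, p in cases:
    pr, dr = pearson_residual(y, p), deviance_residual(y, p)
    print(f"y={y} p={p:.2f}  Pearson^2={pr * pr:7.2f}  deviance^2={dr * dr:6.2f}")
```

For the misfitting case the squared Pearson residual is (1 - p)/p = 99,
while the squared deviance residual is -2 ln(.01), about 9.2, so a single
such case moves the Pearson-based measure far more than the deviance.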
--
*--------
Testing - betas.
=======================Rich Ulrich, 28 Jul 1997==========ssc
Subject: Logistic regression
Message-ID: <5rirki$4f6@usenet.srv.cis.pitt.edu>
Marc BUSSON (busson@NEPTUNE.CHU-STLOUIS.FR) wrote:
: Hello,
: Question about logistic regression.
: Suppose, you have a model with 3 independent covariates A,B,C.
: The Beta coefficients for A and B are significantly different from zero.
: You remove C from the model, and you observe that the likelihood of the
: model decreased significantly, so you decide to keep the first model with
: A,B AND C. but what do you do with the beta of C. In particular, if its
- I think your question should be answered by noting that (a) the
change in likelihood is, in general, a better test than the test based
on the asymptotic variance. Thus, your premise is wrong, that C is
"not significant". And, (b) as someone has noted, when two tests come
out different, your problem is probably screwed up in some way, so you
should be slow about drawing ANY conclusions. Finally, (c), if the
coefficient for C is large, sig. small, then C is greatly confounded
with some other variable or two. In that case, whether it is in the
equation should depend on knowledge of the variables and what can
make up an intelligible model - and NOT much on the p-level.
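
Point (a) above can be seen in a small made-up example. The sketch below
is an added illustration, not the poster's analysis: it uses a single
binary covariate (so the logistic estimates have a closed form) rather
than the three-covariate model in the question, and the cell counts are
hypothetical.

```python
import math

def wald_and_lr(a: int, b: int, c: int, d: int):
    """Wald and likelihood-ratio chi-squares (1 df) for one binary covariate.

    a, b = events / non-events when x = 1
    c, d = events / non-events when x = 0
    All four counts are assumed to be positive.
    """
    beta = math.log((a * d) / (b * c))          # ML slope = log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # asymptotic standard error
    wald = (beta / se) ** 2
    n = a + b + c + d
    observed = [a, b, c, d]
    # Fitted counts under the intercept-only model
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    # Change in -2LL when the covariate is dropped
    lr = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
    return beta, wald, lr

beta, wald, lr = wald_and_lr(10, 10, 1, 19)   # hypothetical counts
print(f"beta = {beta:.3f}  Wald = {wald:.2f}  LR = {lr:.2f}")
```

With these counts the coefficient is large (ln 19), the Wald chi-square
is about 6.9 and the likelihood-ratio chi-square about 11.4: the test
based on the asymptotic variance understates the evidence relative to
the change in likelihood.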
*--------
Classification (cutoffs)
=======================Ronan Conroy, 11 Apr 1997==========ssc
Message-ID: <199704111423.PAA25267@gate.rcsi.ie>
From: Ronan Conroy
Subject: Re: Choice of cutoff points in logistic regression
Anyone interested in the thorny issues involved in using logistic
regression for classification should have a look at the paper by Frank
Harrell and his colleagues in Stats in Medicine
Harrell FE, Lee KL, Mark DB
Multivariate prognostic models: issues in developing models, evaluating
assumptions and adequacy, and measuring and reducing errors. Statistics
in Medicine 1996;15:361-87
Title says it all. Delightfully clear paper.
*--------
REFs logistic, discriminant f.
=======================Clay Helberg, 28 Jan 1997==========ssc
From: Clay Helberg
Subject: Re: Alternatives to Logistic Regression
Message-ID: <32EE30A3.6F2A@spss.com>
Richard F Ulrich wrote:
> In theory, you may feel comfortable looking at TESTS on Logistic
> when your data don't satisfy you for Discriminant function. But it
> still seems to me that there should be little difference in how well they
> work, in the ordinary case where the fit is far less than
> perfect. I would be interested in hearing of simulation results, or
> examples and counter-examples.
Here are a few things I happened to have handy:
*Srinivasan & Kim (1987) compared several classification procedures in
the context of credit granting decisions. Their results based on
resampling from an actual dataset indicate that logistic regression
provides better classification than linear discriminant analysis, mainly
due to inequality of VCV matrices across groups.
*Wiginton (1980) also reports that LR outperforms DA in terms of
classification accuracy in credit related problems, although neither
procedure performed particularly well with this data.
Unfortunately, I'm afraid I don't have the library resources at hand
that I did when I worked for U of Wisconsin, so I can't really look up
more examples. I have a few generic citations which probably contain
useful info, but I don't have copies of them, so I can't comment on them
directly. Perhaps someone who is familiar with the papers/books in
question will comment....
--Clay
Generic citations:
Altman, et al. (1981) Application of Classification Techniques in
Business, Banking and Finance. CT: JAI Press
Eisenbeis (1977) Problems in applying discriminant analysis in credit
scoring models. J of Banking and Finance, 2, 205-219
Eisenbeis & Avery (1972) Discriminant Analysis and Classification
Procedures. Lexington, MA: Lexington Books
Efron (1975) The efficiency of logistic regression compared to normal
discriminant analysis. JASA, 70, 113-121
Press & Wilson (1978) Choosing between logistic regression and
discriminant analysis. JASA, 73, 699-705
References:
Srinivasan & Kim (1987) Credit granting: a comparative analysis of
classification procedures. J of Finance, 42, 665-683
Wiginton (1980) A note on the comparison of logit and discriminant
models of consumer credit behavior. J. Finance and Quant. Analysis, 15,
757-768
--
*--------
Normality assumption
=======================Rich Ulrich, 28 Jan 1997==========ssc
From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Alternatives to Logistic Regression
Message-ID: <5cl6e6$s4g@usenet.srv.cis.pitt.edu>
Helberg, Clay (chelberg@SPSS.COM) wrote:
: Diana Kornbrot wrote:
: >what do you want the alternative to do for you?
: >
: >note that logistic regression is not distribution free
: >it assumes a LOGISTIC DISTRIBUTION underlying the relation between
: >probability of the dependent variable and parameters in the explanatory
: >variable(s)
: Well, yes, but this is a *much* less restrictive assumption than the
: distributional assumptions required for discriminant analysis, e.g.,
: which requires MV normality among the predictors and homogeneity of VCV
: matrices *as well as* a linear link function (or functions, for
: polychotomous DA).
-- Clay: The text you recommended (I think) by Tabachnick and Fidell
points out that, since Logistic regression constructs a linear equation
in the predictors, you are better off with multivariate normality, or
something close to it, even in the Logistic case.
I can understand how a single variable can be modeled usefully,
logistically, taking advantage of non-linearity to incorporate simple
skewness... for one variable. But you still have to make a linear
combination and create a score, and then (perhaps) draw a line to see
how well you have done.
In theory, you may feel comfortable looking at TESTS on Logistic
when your data don't satisfy you for Discriminant function. But it
still seems to me that there should be little difference in how well they
work, in the ordinary case where the fit is far less than
perfect. I would be interested in hearing of simulation results, or
examples and counter-examples.
*--------
Extreme splits in criterion.
=======================Rich Ulrich, 29 Jun 1997==========spss
Subject: Logistic Regression -Reply
Message-ID: <5rlkk2$3gq@usenet.srv.cis.pitt.edu>
Dale Glaser (dale.glaser@SHARP.COM) wrote:
: >>> Gordon Behie 07/23/97 03:02pm >>>
: When using logistic regression, is it problematic to have a dichotomous
: dependent variable with the following split: Category 1 = 10 (10%) and
: Category 2 =90 (90%). If not, why? Is there a point where the split must
: be of a certain distinction (i.e. a 75-25 split).<<<<<
: Gordon...........
: I am also currently encountering this problem with a binary outcome
: variable: Respiratory Distress Syndrome: 0-no (95%); 1-yes
: (5%).......even though markedly unequal sample sizes for the outcome
: variable is not as problematic for logistic regression as it would be for
: discriminant analyses, Fisher and Belle (1993), Biostatistics: A
: Methodology for the Health Sciences; John Wiley & Sons, state that "if the
: event of interest is rare it may be difficult to generate enough information
: to make the prediction of its occurrence reasonably high. Particularly in
: epidemiological screening procedures, if the prevalence (prior probability)
: of the disease is rare the predictive value of a positive test (posterior
: probability) may not be high" (p. 659).
- The quote given ABOVE from Fisher and Belle does not back up the
assertion about markedly unequal sample sizes. Maybe they do say
something like that somewhere, but I just browsed in their text, and
I am not impressed with their own insights. This was my first look
at Fisher & Belle - they give some nice references and mention several
facets; but ultimately I would rather read their references, than
accept an assertion from them about, say, unequal sizes. [I judge
this partly by the fact that I REALLY don't like what they say about
step-wise analyses in an earlier chapter.]
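
What the quoted passage does describe is the arithmetic of predictive
value under a low prevalence, which is a separate matter from unequal
group sizes in the fitting sample. A minimal sketch of that arithmetic
follows; it is an added illustration, and the sensitivity and
specificity of .90 are assumed values, not figures from the thread.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """P(disease | positive test) by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same (assumed) test accuracy at three prevalences, including the
# 5% and 10% rates mentioned in the thread.
for prev in (0.50, 0.10, 0.05):
    ppv = positive_predictive_value(prev, sensitivity=0.90, specificity=0.90)
    print(f"prevalence {prev:.2f} -> predictive value of a positive {ppv:.3f}")
```

At 5% prevalence the predictive value of a positive result is about .32
even though the assumed test is right 90% of the time in each group.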
: One advantage of discriminant analyses (DA), in the case of continuous
: level predictors, is being able to set the prior probabilities, an option
: that is not provided for the SPSS version of logistic regression (equal
: probabilities is the default).....
- "prior probabilities" is, in particular, an advantage of "SPSS-DA"
over "SPSS-logistic" (and, I think, SAS and BMDP, as well... ) rather
than a necessary difference between DA and logistic - which is not
always kept clearly in mind, since the computer packages do tend to
get identified with the Procedures. But it *should* be kept in mind.
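
As an illustration that priors are not tied to DA, one common way to
impose a different prior on a fitted logistic model is to shift the
logit by the difference between the prior log-odds and the sample
log-odds. The sketch below is an added example under that assumption;
the function names are hypothetical and nothing here is an SPSS option.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def reweight_for_prior(p_model: float, sample_rate: float,
                       prior_rate: float) -> float:
    """Re-express a fitted probability under a different prior.

    p_model     : probability from a logistic model fitted to the sample
    sample_rate : proportion of events in the fitting sample
    prior_rate  : prior probability of an event that you want to impose
    Only the intercept (the log-odds offset) changes; slopes are untouched.
    """
    return inv_logit(logit(p_model) - logit(sample_rate) + logit(prior_rate))

# A case the model scores at .50 in a balanced (50/50) sample,
# re-expressed for a population where the event rate is 5%.
print(round(reweight_for_prior(0.50, 0.50, 0.05), 4))
```

When the imposed prior equals the sample rate the probability is
returned unchanged, which is the implicit default.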
: ...of course, with DA more rigorous
: assumptions (i.e., multivariate normality, homogeneity of
: variance-covariance matrices) must be met........
- You have to write a formula, in either case, with scores
that distinguish one end from the other, so (in my opinion) the
assumptions are not MUCH more rigorous for one than for the other.
Here is something to think about: If the R-squared or degree of
prediction is not rather HIGH, then there will be very, very
little difference between the results of logistic and DA. But
if the prediction is nearly perfect, then the ML-logistic has the
hazard of hitting on a pathological solution - which is a risk you
do not take with Least Squares.
- That is the way I characterized the example that W. Sarle
provided in another Usenet group, a few months ago, and neither
he nor anyone else came back with any response. That is, the
number of 'correctly classified cases' happens to be a useful piece
of information which can be derived and described, for either DA
or logistic. Most of the time, it is only *indirectly* related to
the numeric criteria for *either* analysis. However, if there
is SOME formula from the variables which gives 100% correct, at some
split, then the Maximum Likelihood solution used in SPSS logistic
will insist on that formula. I noted for the example provided, that
any LEAST SQUARES solution to the logistic, which could be
constructed conventionally, did not match the Logistic equation.
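
The pathological case can be sketched with a tiny made-up data set in
which one score classifies every case correctly. This is an added
illustration, not W. Sarle's example: the four data points are
hypothetical and the model has a slope only (no intercept) to keep the
arithmetic short.

```python
import math

def log_likelihood(slope: float, xs, ys) -> float:
    """Logistic log-likelihood for a slope-only model, P(y=1) = sigmoid(slope*x).

    Intended for the positive slopes used below; very negative slopes
    would overflow math.exp.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        z = slope * x if y == 1 else -slope * x
        # log(sigmoid(z)) written as -log(1 + exp(-z))
        total += -math.log1p(math.exp(-z))
    return total

# Hypothetical data: every case with x > 0 is an event, every x < 0 is not.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

for slope in (1.0, 10.0, 100.0):
    print(f"slope {slope:6.1f}  log-likelihood {log_likelihood(slope, xs, ys):.3e}")
```

The log-likelihood keeps rising toward zero as the slope grows, so there
is no finite maximum: an ML routine will push the coefficient outward
until it stops on an iteration limit or a tolerance.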
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html