FAQ - Chap. 7, regression
********************* regression topics *******************
alpha inflation in stepwise selection
===========================Alan Miller, 14 Sep 1995================REFS
From: alan@dmsmelb.mel.dms.CSIRO.AU (Alan Miller)
Message-ID: <438g3v$69a@news.dmpe.CSIRO.AU>
>> At 8:55 AM 9/12/95, William Feuer - Ophthalmology wrote:
>> >Can someone give me a quick reference to problems with alpha
>> >inflation in stepwise selection models?
I assume that `alpha inflation' refers to the `nominal' significance level.
Some references on this are:
Draper, N.R., Guttman, I. & Kanemasu, H. (1971) `The distribution of certain
regression statistics', Biometrika, vol. 58, pp. 295-8.
Draper, N.R., Guttman, I. & Lapczak, L. (1979) `Actual rejection levels in a
certain stepwise test', Commun. in Statist., vol. A8, pp. 99-105.
Pope, P.T. & Webster, J.T. (1972) `The use of an F-statistic in stepwise
regression procedures', Technometrics, vol. 14, pp. 327-40.
I recall that in one of the Draper et al. papers it was shown that the true
significance level is usually in excess of 50% for a nominal level of 5%,
and that examples can be constructed in which the true level can be made
arbitrarily close to 100%.
I can dig out more references if anyone wants.
Alan Miller
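
A small simulation gives a feel for the size of the problem. The sketch
below is an illustration only and does not reproduce the calculations in
the papers cited above. It assumes 50 observations and 20 candidate
predictors that are pure noise, and it looks only at the first forward
step: the best-looking candidate is tested at a nominal 5% level. The
function name and all settings are hypothetical.

```python
import math
import random

def first_step_false_positive_rate(n=50, k=20, n_sims=200, t_crit=2.011, seed=1):
    """Fraction of simulated data sets in which the best of k pure-noise
    predictors passes a nominal 5% two-sided t-test at the first forward step.
    t_crit is the two-sided 5% critical value of t with n - 2 = 48 df."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n)]
        y_bar = sum(y) / n
        syy = sum((v - y_bar) ** 2 for v in y)
        best_abs_t = 0.0
        for _ in range(k):
            x = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
            x_bar = sum(x) / n
            sxx = sum((v - x_bar) ** 2 for v in x)
            sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            r = sxy / math.sqrt(sxx * syy)
            t = r * math.sqrt((n - 2) / (1 - r * r))  # t for a simple-regression slope
            best_abs_t = max(best_abs_t, abs(t))
        if best_abs_t > t_crit:  # forward selection would enter this variable
            hits += 1
    return hits / n_sims

rate = first_step_false_positive_rate()
print(f"nominal level 0.05, simulated level for the selected variable: {rate:.2f}")
```

If the 20 tests were independent, the chance that at least one passes
would be 1 - 0.95**20, about 0.64, which is the order of magnitude the
simulation should show. Setting k=1 brings the rate back near 0.05.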
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Propensity scoring (outcomes research).
=====================Frank Harrell, 06 Nov 1995========ssc
From: Frank Harrell
Subject: Reputation of Outcomes Research
Message-ID: <47la54$h1l@news.duke.edu>
Some recent posts have made some nice points about the advantages and
disadvantages of Outcomes Research as it has been done in the past few
years. I want to echo one poster's opinion that Outcomes Researchers
frequently take non-scientific short cuts when the going gets rough.
One classic example is
a paper which appeared in the hallowed New England Journal of Medicine in
which angioplasty (PTCA) was compared with coronary bypass surgery (CABG)
for mortality outcomes. The analysis did not include the single most
important variable used to recommend which revascularization procedure
to use - the number of diseased coronary arteries (which is a strong
prognostic factor)!
One technique that is gaining ground in adjustment for 'treatment by
indication' is the propensity score. Here one uses a huge number of
patient descriptors (which should be carefully chosen to include ones
such as the number of diseased coronaries) to predict treatment received
and then adjusts for the predicted probability (propensity score) as
a single covariable, without having problems of overfitting. Here are
some useful references.
@article{ang95two,
author = {Angrist, Joshua D. and Imbens, Guido W.},
journal = "Journal of the American Statistical Association",
pages = {431-442},
title = {Two--stage least squares estimation of average causal effects in
models with variable treatment intensity},
volume = {90},
year = {1995},
annote = {causal inference; propensity score; instrumental variables;
two-stage models}
}
@article{coo88asy,
author = {Cook, E. Francis and Goldman, Lee},
journal = {American Journal of Epidemiology},
pages = {626-639},
title = {Asymmetric stratification: {An} outline for an efficient method for
controlling confounding in cohort studies},
volume = {127},
year = {1988},
annote = {CART; recursive partitioning; propensity score; confounding;
non-randomized data}
}
@article{dra93eff,
author = {Drake, Christiana},
journal = "Biometrics",
pages = {1231-1236},
title = {Effects of misspecification of the propensity score on estimators
of treatment effect},
volume = {49},
year = {1993},
annote = {propensity score; confounding; observation study; bias}
}
@article{gas94how,
author = {Gastwirth, Joseph and Krieger, Abba and Rosenbaum, Paul},
journal = "American Statistician",
pages = {313-315},
title = {How a court accepted an impossible explanation},
volume = {48},
year = {1994},
annote = {propensity score; adjustment; unmeasured covariables; study
design}
}
@article{lav94cau,
author = {Lavori, Philip W. and Dawson, Ree and Mueller, Timothy B.},
journal = "Statistics in Medicine",
pages = {1089-1100},
title = {Causal estimation of time--varying treatment effects in
observational studies: {Application} to depressive disorder},
volume = {13},
year = {1994},
annote = {propensity score; time-dependent covariates; discrete survival
model; repeated logistic model}
}
@article{lav95mul,
author = {Lavori, Philip W. and Dawson, Ree and Shera, David},
journal = "Statistics in Medicine",
pages = {1913-1925},
title = {A multiple imputation strategy for clinical trials with truncation
of patient data},
volume = {14},
year = {1995},
annote = {multiple imputation; propensity score; informative censoring;
dropouts; completers analysis; last value analysis; longitudinal data; study
design}
}
@article{mar94con,
author = {Mark, D. B. and Nelson, C. L. and Califf, R. M. and Harrell, F. E.
and Lee, K. L. and Jones, R. H. and Fortin, D. F. and Stack, R. S. and Glower,
D. D. and Smith, L. R. and {DeLong}, E. R. and Smith, P. K. and Reves, J. G.
and Jollis, J. G. and Tcheng, J. E. and Muhlbaier, L. H. and Lowe, J. E. and
Phillips, H. R. and Pryor, D. B.},
journal = {Circulation},
pages = {2015-2025},
title = {The continuing evolution of therapy for coronary artery disease:
{Initial} results from the era of coronary angioplasty},
volume = {89},
year = {1994},
annote = {CABG; PTCA; propensity; observational study; Cox model
applications; adjusted survival curves}
}
@article{mcc94,
author = {{McClellan}, Mark and {McNeil}, Barbara J. and Newhouse, Joseph
P.},
journal = {Journal of the American Medical Association},
pages = {859-866},
title = {?},
volume = {272},
year = {1994},
annote = {structural equations; propensity score; confounding; observational
study}
}
@article{rob89con,
author = {Robins, James},
journal = "Statistics in Medicine",
pages = {679-701},
title = {The control of confounding by intermediate variables},
volume = {8},
year = {1989},
annote = {propensity score; dynamic propensity score; time-dependent
treatment; confounding}
}
@article{rob92esti,
author = {Robins, James M. and Mark, Steven D. and Newey, Whitney K.},
journal = "Biometrics",
pages = {479-495},
title = {Estimating exposure effects by modeling the expectation of exposure
conditional on confounders},
volume = {48},
year = {1992},
annote = {propensity score; confounding; causality; continuous exposure}
}
@article{ros83cen,
author = {Rosenbaum, P. R. and Rubin, D.},
journal = "Biometrika",
pages = {41-55},
title = {The central role of the propensity score in observational studies
for causal effects},
volume = {70},
year = {1983},
annote = {propensity score}
}
@article{ros84,
author = "Rosenbaum, P. R. and Rubin, D. B.",
journal = "Journal of the American Statistical Association",
pages = "516-524",
title = "Reducing bias in observational studies using subclassification
on the propensity score",
volume = "79",
year = "1984"
}
@article{ros85con,
author = {Rosenbaum, Paul R. and Rubin, Donald B.},
journal = "American Statistician",
pages = {33-38},
title = {Constructing a control group using multivariate matched sampling
methods that incorporate the propensity score},
volume = {39},
year = {1985},
annote = {propensity score; matching; confounding; bias}
}
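
The propensity-score adjustment described in the post above can be
sketched on simulated data. This is a minimal illustration under stated
assumptions, not the analysis from any of the cited papers: three made-up
covariates drive both treatment choice and outcome, the true treatment
effect is fixed at 1.0, the propensity model is a logistic regression
fitted by plain gradient ascent, and the outcome model is ordinary least
squares on treatment plus the estimated score. All names and numbers are
hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, t, steps=200, lr=1.0):
    """Logistic regression by gradient ascent on the mean log-likelihood."""
    n, beta = len(rows), [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for r, ti in zip(rows, t):
            resid = ti - sigmoid(sum(b * v for b, v in zip(beta, r)))
            for j, v in enumerate(r):
                grad[j] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Simulated cohort with "treatment by indication" (assumed data-generating model).
rng = random.Random(0)
TRUE_EFFECT = 1.0
covs, treat, outcome = [], [], []
for _ in range(1500):
    x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    t = 1 if rng.random() < sigmoid(0.8 * x1 + 0.8 * x2 + 0.4 * x3) else 0
    y = TRUE_EFFECT * t + 1.5 * x1 + 1.0 * x2 + 0.5 * x3 + rng.gauss(0, 1)
    covs.append([1.0, x1, x2, x3])
    treat.append(t)
    outcome.append(y)

# Unadjusted comparison of group means.
treated = [y for y, t in zip(outcome, treat) if t == 1]
control = [y for y, t in zip(outcome, treat) if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

# Step 1: predict treatment received from the patient descriptors.
beta = fit_logistic(covs, treat)
score = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in covs]

# Step 2: adjust for the predicted probability as a single covariable.
adjusted = ols([[1.0, t, s] for t, s in zip(treat, score)], outcome)[1]

print(f"true effect {TRUE_EFFECT:.2f}, naive {naive:.2f}, score-adjusted {adjusted:.2f}")
```

The outcome model has only two predictors however many descriptors go
into the score, which is the sense in which overfitting is avoided. The
adjustment can only remove confounding by variables that were measured
and included in the score.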
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Regression, non-nested models
=====================Raymond Liedka, 10 Feb 1996========ssc
Message-ID:
From: "Raymond V. Liedka"
Subject: Re: Question about a regression hypothesis
On Wed, 7 Feb 1996, Michael Cohen wrote:
> A colleague has asked me to pose this question to the list:
> Does anyone know the appropriate test for testing the following?
> We have two regression models:
>
> Y = b0 + b1 X + e
> Y = b0' + b1' X + b2 Z + e'
> ...
> The hypothesis of interest is H0: b1 = b1'
American Journal of Sociology, March 1995, Volume 100, no 5
"Statistical Methods for Comparing Regression Coefficients between
Models" by Clifford C. Clogg, Eva Petkova, and Adamantios Haritou,
pp. 1261-1293.
"The Impact of Random Predictors on Comparisons of Coefficients between
Models: Comment on Clogg, Petkova, and Haritou." by Paul D. Allison, pp.
1294-1304.
"Reply to Allison: More on Comparing Regression Coefficients." by
Clifford C. Clogg and Eva Petkova with the assistance of Tzuwei Cheng,
pp. 1305-1312.
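
A standard least-squares identity shows why the two slopes cannot be
compared as if they were independent estimates. It is offered here as
background and is not a summary of the test in the papers above. The
S notation (centered sums of squares and cross-products) and the symbol
delta-hat are introduced for this note only.

```latex
% Both models are fitted by least squares to the same n observations.
% S_{xx} = \sum (X_i - \bar X)^2, \quad
% S_{xy} = \sum (X_i - \bar X)(Y_i - \bar Y), \quad
% S_{xz} = \sum (X_i - \bar X)(Z_i - \bar Z).
%
% Reduced model:  \hat b_1 = S_{xy} / S_{xx}.
% Full model: residuals are orthogonal to X, so
%   S_{xy} = \hat b_1' S_{xx} + \hat b_2 S_{xz}.
\[
  \hat b_1 - \hat b_1'
  \;=\; \hat b_2 \,\frac{S_{xz}}{S_{xx}}
  \;=\; \hat b_2 \,\hat\delta ,
  \qquad
  \hat\delta = \text{slope from regressing } Z \text{ on } X .
\]
```

So the estimated difference is zero exactly when the estimated b2 is zero
or Z is uncorrelated with X in the sample, and the two slope estimates
are computed from the same data rather than from independent samples.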
================Bob Wheeler, 19 Apr 1996========ssc
Message-ID: <31778DCF.6C81@echip.com>
From: Bob Wheeler
Subject: Re: Cox test comparing models
> From: "Barry W. Brown"
> In article <4kkj3s$ci5@mark.ucdavis.edu>,
> Mitch Watnik wrote:
> >Steve Buyske (buyske@eden.rutgers.edu) wrote:
> >
> >: I recently came across a reference to Cox's test for comparing
> >: two non-nested models, but I can't find any reference to, or
> >: explanation of, the actual test. The context was in comparing
> >: two logistic models, one with explanatory variables A & B, the
> >: other with explanatory variables B & C. The dependent variable
> >: is the same in both cases.
> >
> >You might start with his paper in the Fourth Berkeley Symposium (1961).
> >There is also a 1962 follow-up paper (I don't remember off-hand where it
> >is located).
> >--Mitch
>
> Try section 9.22 page 327, "Choice between Models" in
> Cox, D.R. and Hinkley, D.V. "Theoretical Statistics" (1986) Chapman and
> Hall, NY. ISBN 0 412 16160 5.
In addition, try searching under "Separable hypotheses."
There is a considerable literature.
Bob Wheeler, ECHIP, Inc.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Time-series, regression. Problems.
Raffalovich.
=====================Larry Raffalovich, 07 Mar 1996========ssc
Message-ID: <960307.102159.EST.LR096@cnsibm.albany.edu>
From: "L. Raffalovich"
Subject: Re: Regression, multicolinearity and time series
On Tue, 5 Mar 1996 23:06:48 GMT
Barry DeCicco wrote:
>(I will avoid snappy comments about economists. I will avoid
>snappy comments about economists. I will avoid ....)
>
> There has been a lot of research done, on dealing with correlated
>randomness in linear models. For a good start, I'd recommend
>a book by Diggle, Liang and Zeger (apologies for the spelling),
>'Analysis of Longitudinal Data'. It is strongly geared to
>the analysis of data which consists of many relatively short
>time series, such as would be found in biostatistics.
>
> When dealing with two or more time series, spurious correlations
>feel right at home, and will dive into your regression. You have
>the problem that there are probably many drivers acting on both
>series, making them move together to some extent.
>
> I saw an article in a mathematical economics journal (specific
>title forgotten), which proved that two non-stationary time
>series will tend to increasing correlation, as the number of observation
>points goes to infinity. There's the classic example of taking the
>non-inflation-adjusted prices of any two commodities, and showing
>that one predicts the other (in a linear model sense).
>
> Frankly, I haven't the faintest idea of how people deal with this,
>when looking at purely observational data, with long time series,
>such as occur in economics and other social data. I await
>someone to post methods or references here.
Some references on spurious regression with time series:
Nelson, Charles R. and Charles I. Plosser, "Trends and Random
Walks in Macroeconomic Time Series: Some Evidence and Implications".
Journal of Monetary Economics 10:139-62 (1982).
Phillips, Peter C.B., "Understanding Spurious Regression in
Econometrics". Journal of Econometrics 33:311-40 (1986).
Stock, James H. and Mark W. Watson, "Variable Trends in Economic
Time Series". Journal of Economic Perspectives 2:147-74 (1988).
I provide a less technical discussion of spurious regression
in "Detrending Time Series", Sociological Methods and Research
22:492-519 (1994). Also, recent econometrics texts
(e.g. the third edition of Gujarati's Basic Econometrics (1995))
have detailed discussions of non-stationary time series.
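
The effect is easy to reproduce by simulation. The sketch below is an
illustration under assumed settings (series length 100, 500 replications)
and is not taken from the references above. It regresses one random walk
on another, independent one and counts how often the slope looks
significant at a nominal 5% level, then repeats the test on first
differences. Differencing is shown as one common remedy, not as the
right answer for every series. Function names are hypothetical.

```python
import math
import random

def slope_t_stat(x, y):
    """t-statistic for the slope in a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(sse / (n - 2) / sxx)

def random_walk(rng, n):
    """Cumulative sum of independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

def rejection_rates(n=100, n_sims=500, t_crit=1.98, seed=2):
    """Share of nominal-5% rejections for levels and for first differences."""
    rng = random.Random(seed)
    levels = diffs = 0
    for _ in range(n_sims):
        x, y = random_walk(rng, n), random_walk(rng, n)  # independent by construction
        if abs(slope_t_stat(x, y)) > t_crit:
            levels += 1
        dx = [b - a for a, b in zip(x, x[1:])]
        dy = [b - a for a, b in zip(y, y[1:])]
        if abs(slope_t_stat(dx, dy)) > t_crit:
            diffs += 1
    return levels / n_sims, diffs / n_sims

levels_rate, diffs_rate = rejection_rates()
print(f"levels: {levels_rate:.2f}   first differences: {diffs_rate:.2f}")
```

The two series share no common driver at all, so every rejection in the
levels regression is spurious.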
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Why is this R-squared negative?
(R^2 in regression through origin).
===================Rich Ulrich, 13 Apr 1995=======ssm
Subject: Re: R2
< Dominik, heeb@urz.unibas.ch >
heeb@ubaclu.unibas.ch wrote:
: R2 in regression thru origin
: ----------------------------
: See e.g. Zar "Biostatistical Analysis"
: Y = a * X + error
: a = (Sum of XiYi) / (Sum of Xi ** 2)
: total SS = Sum of (Yi ** 2), with total DF = n
You can use this formula if you want, but I think that one should
ALWAYS note at this point that a choice has been made to abandon the
usual definition, where the total SS is mean-corrected. So, R2 computed
this way cannot be compared to R2 from regressions that include an
intercept, which use that smaller, mean-corrected Total SS.
This has been discussed here before, and I, for one, was persuaded
that I should use the above formula only on rare occasions (mainly,
where there was absolutely no doubt about the zero intercept).
The residual after regression through zero can be larger than the
mean-adjusted Total SS, in which case the Regression effect SS, and
R2 - computed by subtraction - comes out negative. That is not a
great kind of result to have, but I prefer explaining that occasional
outcome (negative R2, showing a totally inadequate regression model)
over the ever-present need of explaining the two definitions of R2.
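
A five-point example shows both definitions side by side. The data are
made up so that the true intercept is far from zero; the function name
is hypothetical. The uncorrected total SS is the sum of squared Yi, and
the mean-corrected total SS is the usual sum of squared deviations from
the mean of Y.

```python
def r2_through_origin(x, y):
    """Slope through the origin, plus R2 under both total-SS definitions."""
    a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
    ss_resid = sum((yi - a * xi) ** 2 for xi, yi in zip(x, y))
    ss_total_raw = sum(yi ** 2 for yi in y)                 # uncorrected, DF = n
    y_bar = sum(y) / len(y)
    ss_total_corrected = sum((yi - y_bar) ** 2 for yi in y)  # usual definition
    r2_raw = 1 - ss_resid / ss_total_raw
    r2_corrected = 1 - ss_resid / ss_total_corrected
    return a, r2_raw, r2_corrected

x = [1, 2, 3, 4, 5]
y = [10, 9, 8, 7, 6]          # falls with x, so a line through zero fits badly
a, r2_raw, r2_corrected = r2_through_origin(x, y)
print(f"a = {a:.1f}, R2 (uncorrected) = {r2_raw:.3f}, R2 (mean-corrected) = {r2_corrected:.3f}")
# prints: a = 2.0, R2 (uncorrected) = 0.667, R2 (mean-corrected) = -10.000
```

Here the residual SS is 110, the uncorrected total SS is 330 and the
mean-corrected total SS is 10. The same fit therefore reports R2 = 0.667
under one definition and R2 = -10 under the other.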
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
... estimate a break-point?
======================Bill Simpson, 3 Oct 1994==========ssc
Message-ID: <01HHU5UCC90I8XCDGP@UWPG02.UWINNIPEG.CA>
From: BILL SIMPSON
Subject: piecewise linear fit
When you want to estimate the breakpoint, you have a nonlinear regression
problem.
The model is
yhat = b0 + b1*x + b2*(x-b3)*(x>b3)
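Use a nonlinear regression routine, preferably one that doesn't use
derivatives (e.g. the simplex method), because the fitted line has a kink
at b3 and the residual sum of squares is not smooth in b3. Eyeball your
data to get good starting values.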
Systat nonlin routine can handle this, contrary to one poster's comment.
In fact, their manual used to have this as an example! (Maybe it still does)
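
The model above can also be fitted without a general nonlinear routine.
The sketch below does not use the simplex method recommended in the
post. It swaps in a grid (profile) search: for each candidate break
point b3 the model is linear in b0, b1 and b2, so those are found by
ordinary least squares, and the b3 with the smallest residual sum of
squares is kept. This is also derivative-free. The data, the grid and
all function names are assumptions made for the illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_given_break(x, y, b3):
    """Least-squares b0, b1, b2 for a fixed break point b3, and the SSE."""
    # (x - b3) * (x > b3) is the same as max(x - b3, 0)
    rows = [[1.0, xi, max(xi - b3, 0.0)] for xi in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    coef = solve(xtx, xty)
    sse = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(rows, y))
    return sse, coef

def fit_breakpoint(x, y, candidates):
    """Return (sse, [b0, b1, b2], b3) for the best candidate break point."""
    best = None
    for b3 in candidates:
        sse, coef = fit_given_break(x, y, b3)
        if best is None or sse < best[0]:
            best = (sse, coef, b3)
    return best

# Made-up data: y = 1 + 2x, with the slope rising by 3 after x = 5.
rng = random.Random(3)
x = [0.5 * i for i in range(21)]
y = [1.0 + 2.0 * xi + 3.0 * max(xi - 5.0, 0.0) + rng.gauss(0, 0.1) for xi in x]

# Keep candidates inside the data range so each segment has points.
candidates = [i / 10 for i in range(10, 91)]
sse, (b0, b1, b2), b3 = fit_breakpoint(x, y, candidates)
print(f"b0={b0:.2f} b1={b1:.2f} b2={b2:.2f} b3={b3:.1f}")
```

The grid estimate of b3 also makes a reasonable starting value for a
simplex or other nonlinear fit when a finer answer is wanted.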
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html