More words on Stepwise (1996).
******************* in addition to 3 originals **********
Stepwise? (numerous comments)
=========================Rich Ulrich, 15 May 1995============(spss)
Not long ago, I wrote, on the subject of Stepwise Regression -

: The manual makes reference to a lot of
: options, and yes, the good advice is, `DON'T USE THEM!' I don't know if
: the suggestion about a hazard warning was meant to be tongue-in-cheek,
: but it seems to me to be about the proper level of discouragement that
: is DESERVING for stepwise techniques.

Seriously, folks. Since I don't think the manuals are going to take this step that a few people would be offended by, I would like to offer suggestions that may be more constructive:

1) Let the manuals give an emphasis to showing how to ENTER sets of variables as blocks, obtaining a test statistic on the variance that is added. One version of this is simply keeping track of a categorical variable that is dummy-coded - I think that BMDP does that. More simply, the manuals could promote the ENTER strategy as one that is standard for obtaining certain ANOVA tests. Also,

2) Show explicitly how to look at the `partial contribution' of several variables WITHOUT entering them into the equation which is accounting for design factors, etc. <Right now, I think you have to specify a Stepping choice, though you can prevent it from Entering any by setting the Entry criterion to be extreme.>

=======================John P. Ball, 23 Apr 1996=============ssc,sse
From: jpb@szooek.slu.se
Subject: Re: Subject/variable ratio for regression?
Message-ID: <4lihb4$1d1@populus.slu.se>

>I tried to do some regression analysis of the data we collected a while
>ago. The experiment includes 2 sections: 1. Physical measurements:
>total 20 measurements; 2. Dynamic data collection looking at the effect
>of walking speed on foot plantar pressure distribution. In section 2, we
>collected 3 trials under each walking speed. The total subject number
>was 20. My question is how many variables can be included in the
>step-wise regression analysis?
>Also, if there were missing trials, say 5
>subjects did not have certain measurement, how many variables can be
>included in the step-wise regression?
>I am not a regular reader of this group, please reply to my email
>address. I will compile the replies if others are interested.

I'll send you an email copy, but I'd humbly suggest that perhaps you _should_ consider reading the group regularly... you would learn some things. I still do.

Regarding the number of variables that you can include in stepwise regression in your case: ZERO (well, OK, one -- but then it is no longer stepwise regression but simple correlation). Sorry to be the bearer of bad news. I'm afraid that to have any real hope of a "reliable" answer you need to go get a LOT more subjects if you really want to go on a fishing expedition and examine anything like 20 variables.

I recently gave a seminar to graduate students on pretty much exactly your question, but I won't bother with all the multitude of references here. In the interests of brevity, I refer you to Tabachnick and Fidell, 1983, Using Multivariate Statistics. Harper and Row, New York, pages 91-92, where it is put forth quite clearly:

"Ideally one would have 20 times more cases than variables. If stepwise regression is to be used, a procedure that is notorious for capitalizing on chance, a case-to-variable ratio of 40 to 1 would be appropriate."

Years ago (as a grad student), I was assigned a similar problem: assess the needed case-to-variable ratio for "reliable" stepwise multiple regression. My computer simulations indicated about 50 was the minimum for the best algorithm, and I was suitably impressed at how bad all stepwise algorithms were (I even used all-possible-subsets). Even with 80 to 1 ratios, the results were not particularly heartening.

Today, I occasionally use multiple regression, but all the variables _stay_ in the model (NO stepwise ever, of any algorithm).
Obviously, the degrees of freedom then directly limit the number of variables that you can include in the general linear model/regression model. Try too many and your SS will be undefined (or zero, depending on your stat program).

Hope that this is useful information (even if it is NOT what you wanted to hear). Sorry for the typos -- I'm rushing today...

========================Paul Velleman, 8 April 1996===========ssc
From: pfv2@cornell.edu (Paul Velleman)
Subject: Re: Variable Reduction

In article <4ka18h$40h@news1.h1.usa.pipeline.com>, jzhong@usa.pipeline.com wrote:

> I am building a predictive model. The dependent variable is binary. I have
> about 400 block group level census variables as predictor variables. I plan
> to perform principal components analysis on the predictor variables and
> then apply logistic regression on a few selected principal components.
>
> Among the 400 predictor variables, many of them are not significantly
> related to the dependent variable. So I would like to eliminate these
> insignificant variables before I do principal components analysis. What
> would be a good way to do this?

If you know that these variables are not significantly related to the dependent variable, and if your goal is prediction, what is wrong with simply deciding to omit the variables? You *are* allowed to apply thought to statistical analyses; it needn't be all automated computation.

===================Jesse A. Canchola, 15 Apr 1996=============ssc
From: adminjc@psg.ucsf.edu (Jesse A. Canchola)
Subject: Re: Variable Reduction
Message-ID: <4ku0p6$10eo@itssrv1.ucsf.edu>

On the subject of using stepwise regression for selecting variables for you, check out Flack and Chang's article, "Frequency of Selecting Noise Variables in Subset Regression Analysis: A Simulation Study", found in _The American Statistician_, February 1987, Volume 41, No. 1. It pretty much substantiates Rich Ulrich's response.
They mention how you should not use the results of such a stepwise regression as the basis of any conclusions with respect to the subject matter at hand. However, if the results of the subset selection are "confirmed" and/or "validated" by other data sets, you will probably be ok. They go on to say that, "Such confirmation and validation are especially important when the number of candidate variables is large and a priori knowledge about their relationships to the response variables are not clear."

You might also want to look at David Freedman's "A Note on Screening Regression Equations", _The American Statistician_, May 1983, Vol 37, No. 2. Good luck!

==========================William Ware, 30 Apr 1996===========sse
Message-ID: <Pine.PCW.3.91.960430112212.8055A-100000@devil.soe.unc.edu>
From: "William B. Ware" <wbware@email.unc.edu> << wbware@unc.edu >>
Subject: Re: When should Stepwise reg be used?

On Mon, 29 Apr 1996, IRA H BERNSTEIN wrote: <snip>

> I think that there are two distinct questions here: (a) _when_ is
> stepwise selection appropriate and (b) _why_ is it so popular.

I agree with most of what Professor Bernstein wrote in his original message. However, I do think that "stepwise" regression (in which the computer algorithm selects the variables) does have a place in our statistics tool kits. However, that place is extremely limited!

When we are willing to throw all rights to interpretation to the winds and when our primary goal is to develop an efficient predictive mechanism using a small set of variables selected from some larger set, then stepwise techniques are OK. The principal place in which this limited application might be appropriate is in personnel selection (e.g., college admissions).

=====================Ira H Bernstein, 30 Apr 1996============sse
Message-ID: <MAILQUEUE-101.960430164403.320@albert.uta.edu>
From: "IRA H BERNSTEIN" <BERNSTEI@albert.uta.edu>
Subject: Re: When should Stepwise reg be used?

"William B.
Ware" <wbware@email.unc.edu> noted, in response to my original (largely negative) posting about stepwise regression:

> I agree with most of what Professor Bernstein wrote in his original
> message. However, I do think that "stepwise" regression (in which the
> computer algorithm selects the variables) does have a place in our
> statistics tool kits. However, that place is extremely limited!
>
> When we are willing to throw all rights to interpretation to the winds
> and when our primary goal is to develop an efficient predictive
> mechanism using a small set of variables selected from some larger set,
> then stepwise techniques are OK. The principal place in which this
> limited application might be appropriate is in personnel selection
> (e.g., college admissions).

Note that I had said the following in my original posting:

>I would probably only argue slightly with "never" as an answer to the
>use of stepwise selection since I don't know what knowledge we would
>lose if all papers using stepwise regression were to vanish from
>journals at the same time programs providing their use were to become
>terminally virus-laden. However, I have been in situations that
>looked like "I have good reason to look at variables A, B, and C;
>then look at D, and E, but I have no basis to favor F over G or vice
>versa past that point." Older versions of SPSS (I haven't used newer
>versions since switching to SAS a decade ago) allowed this mixture,
>and I would personally not object to it as long as the strategy were
>defined in advance and made clear to readers.

I therefore don't think that Prof. Ware and I are in any disagreement as I believe we are both saying "not often, but sometimes".

Ira H. Bernstein
Professor of Psychology
UT-Arlington
P. O.
Box 19528
Arlington, TX 76019-0528
(817) 272-3183

==========================Jerry Dallal, 30 Apr 1996===========sse
From: jerry@mint.hnrc.tufts.edu (Jerry Dallal)
Message-ID: <1996Apr30.103720@mint.hnrc.tufts.edu>

In article <960429.144610.CDT.B118MEE@UTARLVM1>, Mark Eakin <B118MEE@UTARLVM1.UTA.EDU> writes:

> Why is stepwise so popular?

Because it gives the appearance of objectivity. (Please do not interpret this comment as a statement for or against the use of the technique.)

==========================Kent Campbell, 30 Apr 1996=========sse
From: campbell@acs.ryerson.ca (Kent Campbell)
Subject: Re: When should Stepwise reg be used?
Message-ID: <4m5lh0$i7g@ns2.ryerson.ca>

Hi - try generating some random data sets and then analyzing them with stepwise regression. It is quite likely that you will discover all sorts of "significant" relationships. I have done this in a controlled manner and found that the Type I error rate (using the default settings in SPSS) is much higher than 5%. So one reason why stepwise is so popular is that it produces statistically significant results when fed garbage.

Best wishes, Kent.

============================Carl Huberty, 13 Feb 1996=========ssc
Message-ID: <960213.083427.EST.CHUBERTY@UGA.CC.UGA.EDU>
From: carl huberty <CHUBERTY@UGA.CC.UGA.EDU>
Subject: Re: When are stepwise and backward regression methods appropriate?

About the only time stepwise methods are remotely appropriate is when you have a large number of variables and you want to do some "pre screening" of the variable set -- and you would need a "large" N/p ratio to do such an analysis then. There are MUCH better ways to assess variable ordering and to determine good variable subsets. DOWN WITH STEPWISE!!

============================Rich Ulrich, 19 Feb 1996==========ssc
From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: When are stepwise and backward regression methods appropriate?
Message-ID: <4ga6c7$2de@usenet.srv.cis.pitt.edu>

David H.
Uthe (uthed@clark.net) wrote:
< stuff deleted... >
: I've found stepwise methods useful when I have lots of variables, some of
: which I'm not really sure contribute THAT much to the solution. Some of
: these regressors may be deteriorating the solution by attracting bogus
: coefficients. The stepwise methods eliminate weak regressors to the
: benefit of stronger ones, thereby strengthening the solution.

I like the first part of this statement, which echoes the doubts of several people. "...deteriorating the solution by attracting bogus coefficients" is perfectly apt, since variables that are only *randomly* correlated will survive the step-wise paring, which has the clear function of eliminating redundancy. Useful variables in my area typically do have redundancy, so eliminating them provides an ordinary preference for BOGUS ones.

But "eliminate weak regressors to the benefit of stronger ones" is hopeful thinking, if it is not *wishful* thinking.

- That would justify step-wise, sure; but, I think, you should only do the stepwise to get a valid statement (of any kind - for testing, or for concise description of a sample) when you can tell the difference between `attracting bogus coefficients' and benefitting the strong ones. You are safe, if the set of possible predictors does not include any predictors that might happen to be bogus.
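
Kent Campbell's suggestion above (feed random data to a stepwise procedure and count the "significant" results) is easy to try. What follows is only a minimal sketch in Python, not his SPSS runs. It rests on several assumptions that are mine: 100 cases, 20 pure-noise predictors, a 5% entry criterion, and a check of only the *first* entry step of forward selection (the best single predictor is tested with the usual t test on its correlation). The names `first_step_enters` and `false_entry_rate` are made up for this illustration. Because later steps can only add further variables, the share reported here is a lower bound on how often a stepwise run on garbage reports at least one "significant" predictor.

```python
import math
import random

N_CASES = 100          # assumed sample size (not from the posts above)
N_PREDICTORS = 20      # assumed number of pure-noise candidate predictors
T_CRIT_DF98 = 1.9845   # two-sided 5% critical value of Student's t, 98 df


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_step_enters(rng):
    """True if the best noise predictor passes the 5% entry test."""
    y = [rng.gauss(0, 1) for _ in range(N_CASES)]
    best = 0.0
    for _ in range(N_PREDICTORS):
        x = [rng.gauss(0, 1) for _ in range(N_CASES)]
        best = max(best, abs(pearson_r(x, y)))
    # t statistic for the largest absolute correlation, df = N_CASES - 2
    t = best * math.sqrt((N_CASES - 2) / (1 - best ** 2))
    return t > T_CRIT_DF98


def false_entry_rate(n_trials=300, seed=1996):
    """Share of pure-noise data sets in which some variable enters."""
    rng = random.Random(seed)
    hits = sum(first_step_enters(rng) for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    rate = false_entry_rate()
    print(f"noise data sets with at least one entered variable: {rate:.2f}")
```

For comparison, with 20 independent noise predictors each tested at 5%, the chance that at least one passes is 1 - 0.95^20, or about 0.64, far above the nominal 5%. The simulated share should land near that figure, give or take sampling error.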
  • * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html