More on Stepwise (1997)

There are several files in this FAQ with more general comments on stepwise selection of variables for regression, discriminant function, etc. In this file: (summing up); all-possible subsets; R^2 > .99.
  • (summing up)
  • =======================Donald F Burrill, 23 Mar 1997==========sse
    Message-ID: <Pine.BSF.3.91.970323011533.2395B-100000@user.xtdl.com>
    From: "Donald F. Burrill" <dburrill@user.xtdl.com>
    Subject: Re: Bad statistical practice Was: Re: Meta-Analysis with heterogeneous

    On Sat, 22 Mar 1997, Herman Rubin wrote, inter alia:

    > As for stepwise algorithms, they are never theoretically sound,
    > except in situations where they are unnecessary.

    Lovely! Thank you, Herman, that was _very_ nicely put!
    *--------
  • all-possible subsets
  • =======================Rich Ulrich, 28 Jan 1997==========ssc
    Subject: Re: All possible subset
    Message-ID: <5clo8l$242@usenet.srv.cis.pitt.edu>

    John P. Ball (john.ball@REMOVE.THIS.szooek.slu.se) wrote:
    << deleted, some of the stuff ... concerning stepwise solutions >>

    : .... In ecology, I see that interpretations of causality are
    : routinely made, hardly anyone performs cross-validation, few use
    : sophisticated methods like Akaike's Information Criterion or Mallows
    : Cp plots to determine how many variables to include in the final
    : regression model, and nobody seems to realize the optimistic P-levels
    : that emerge from any stepwise procedure, etc. Still, if one has
    : insufficient replication to LEAVE all of your potential independent
    : variables in a (for example) GLM, then stepwise regression may be the
    : most appropriate choice (it is more data-exploration than hypothesis
    : testing though!). Of all the stepwise algorithms, at least
    : all-possible subsets ensures (computationally) that you get the best
    : regression model (which forward-stepping, backward stepping, stepwise
    : algorithms, and others cannot guarantee). So, if you HAVE to use
    : stepwise regression for some other reason, at least
    : all-possible-subsets ensures that you DO get the best one.

    -- yes, as they say, "You might as well be hung for a sheep, as a goat."

    One vital qualifier before doing stepwise, etc. -- this is close to
    being an absolute, unless you are absolutely EXPLORATORY -- is the
    condition that you KNOW that all the variables are potentially useful,
    and you just want a shorter, more elegant or cheaper equation. Then,
    "all-possible regressions" is what does it. (Well, you can accomplish
    a bit by cross-validation, but, as John says, who bothers?)

    This is FAR DIFFERENT from the condition that you scanned 200 tests
    of various stuff, and took the 10 or 15 "significant" variables to put
    into a stat package. Doing the latter is called: "Using 200 variables
    to overcapitalize on chance."
    In fact, if your "real" predictors tend to be inter-correlated, and
    not much stronger in prediction than the "random" predictors, it is
    easy to see how the "random" predictors will be selected *more often*
    than the "real" ones, being preferred since they are uncorrelated.
    Then "stepwise" will give you a *worse* equation than you would have
    gotten by picking which few of your variables to use by chance; and
    "all-possible" will be among the worst-possible solutions.

    : ... I stand
    : by, awaiting correction by all the REAL statisticians reading this
    : list! :-]

    -- well, you did seem a little enthusiastic. I thought we had
    previously stomped out all pro-stepwise spirit among our readers.
    I will send you some other advice, from previous posts.
    *--------
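The "200 variables" warning above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything from the original posts: the outcome and all 200 candidate predictors are pure Gaussian noise, there are 100 cases, and each predictor is screened with a two-sided test at the .05 level using the Fisher z approximation. The function names (`pearson_r`, `count_false_hits`) and the seed are made up for the illustration.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_false_hits(n_obs=100, n_vars=200, alpha=0.05, rng=None):
    """Screen n_vars noise predictors against a noise outcome.

    Returns how many pass a two-sided test at level alpha, using the
    Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly N(0, 1)
    when the true correlation is zero.
    """
    rng = rng or random.Random()
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_vars):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        z = math.atanh(pearson_r(x, y)) * math.sqrt(n_obs - 3)
        if abs(z) > z_crit:
            hits += 1
    return hits


# Repeat the whole screening exercise 20 times (assumed, arbitrary choice).
rng = random.Random(1997)
counts = [count_false_hits(rng=rng) for _ in range(20)]
mean_hits = sum(counts) / len(counts)
print("'significant' noise predictors per screening:", counts)
print("average:", mean_hits)
```

With nothing real to find, each screening should still hand back something on the order of 200 x .05 = 10 "significant" variables, which is about the size of the "10 or 15" set described in the post. Feeding those into any selection routine, stepwise or all-possible, starts from variables chosen by chance.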
  • R^2 > .99 (stepwise?)
  • =======================Rich Ulrich, 27 Jan 1997==========spss
    Subject: Re: HELP - Q : compute
    Message-ID: <5cin0d$gn7@usenet.srv.cis.pitt.edu>

    Here is one of the few applications where STEPWISE REGRESSION might
    actually be of good service. (...as compared to the ordinary world
    of research, where Stepwise is just a naive error.)

    If your R-squared is approaching 1.0, then you don't worry too much
    about having "irrelevant" predictors. Your professor could try to
    predict, or fit, the NEW_VAR by using stepwise regression (or best
    subset) using variables that are likely candidates to have been
    included. When you hit 100% perfect prediction, then you have the
    original variables, or something that gives exactly the same scores
    as they did.

    =========================in response to ...
    Ling Ting (ting@COMP.UARK.EDU) wrote:
    << start, deleted >>
    : A professor use "COMPUTE" comment of SPSS for Windows (6.1.2) to calculate
    : a new variable say new_var. What she did is : click on
    : Transform -> Compute -> ... etc to get the new variable
    : in term of syntax would be
    : COMPUTE new_var=MEAN(var1, var2, var3, var4)
    : The problem is she doesn't have any information about which variables
    : were used to compute the new_var. Now, she need to know what these 4
    : variables are. Is there any way can be used to find out what are they?
    << rest, deleted >>
    * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
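The recovery idea in the reply above can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the data are simulated, there are eight hypothetical candidate variables named var1 to var8, the hidden recipe averages four of them, and the search is a best-subset pass over all four-variable subsets (the post suggests stepwise or best subset; this sketch uses the latter). The least-squares fit is a bare-bones normal-equations solver written for the example, not SPSS's procedure, and it ignores missing values, which the SPSS MEAN function would handle.

```python
import itertools
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Simulated stand-in for the professor's data set (assumed, not from the post).
rng = random.Random(6)
n_cases = 50
names = [f"var{i}" for i in range(1, 9)]
data = {name: [rng.gauss(0, 1) for _ in range(n_cases)] for name in names}

# The "forgotten" recipe: new_var is the mean of four of the candidates.
true_vars = ("var2", "var5", "var6", "var8")
new_var = [sum(data[v][i] for v in true_vars) / 4 for i in range(n_cases)]

# Best-subset search over every four-variable subset of the candidates.
results = []
for subset in itertools.combinations(names, 4):
    results.append((r_squared([data[v] for v in subset], new_var), subset))
best_r2, best_subset = max(results)
runner_up_r2 = sorted(results)[-2][0]

print("best subset:", best_subset, "R^2 =", best_r2)
print("runner-up R^2 =", runner_up_r2)
```

In this setup only the true subset reaches an R^2 of essentially 1.0, and every other subset falls well short, so the perfect fit is what identifies the original variables. With real data, candidates that duplicate one another would tie, which is the post's "or something that gives exactly the same scores" caveat.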
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html