<- file stat .html -> FAQ - Chap. 10, other analyses. MISC. ******************* CHAID, CART, SEM, Neural nets**********
  • CHAID, CART
  • ===========================Rob Hughes, 09 Jun 1996======ssc From: rhughesmd@aol.com (RHughesMD) Subject: Re: CART Message-ID: <4pev5i$qqi@newsbf02.news.aol.com> KnowledgeSeeker from Angoss does CART & CHAID, S-Plus has CART, SPSS sells CHAID; check out http://info.gte.com/%7Ekdd/ for a list of other classification packages. =============================Tony Dusoir, 10 Jun 1996=======ssc Message-ID: <009A3A33.8F96C87C.3@ujvax.ulst.ac.uk> From: AE.Dusoir@ULST.AC.UK SC (Statistical Calculator) does CART. It allows you to use just about any definable function to do the splitting (you can use exact tests, for example), and includes options for multiplicity-correction. It produces PostScript tree-diagrams, among other things. Details of SC are on the web site below, or e-mail me. * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
  • What is CHAID?
  • ======================Paul Thompson, 3 Oct 1995=========sse Message-ID: <199510031555.LAA04187@owl.INS.CWRU.Edu> From: pat@po.CWRU.Edu (Paul A. Thompson) Subject: Re: CHAID Reply to message from bgunter@pluto.njcc.com of Tue, 03 Oct > >maugx@csv.warwick.ac.uk (Gavin A Jordan) wrote: > >> Does anyone have any information about CHAID. It may be some kind of > >Just wanted to note for public consumption that CHAID (the last 3 >letters stand for "automatic interaction detector", I believe; first >two?) is SPSS's implementation and evolution of something known as I have a serious problem with this. In fact, it makes a serious confusion between an ALGORITHM and an IMPLEMENTATION. SPSS has an IMPLEMENTATION of the CHAID (CHi-squared Automatic Interaction Detection) ALGORITHM. The ALGORITHM is what is called CHAID. Now, in this case, SPSS has used the name of the ALGORITHM for their IMPLEMENTATION. I dislike that, but I can't do anything about it. The name CHAID and the associated ALGORITHM are in the public domain, because they were published before SPSS used them. Having worked with CHAID software myself, and done some consulting on it, I believe that I am in an informed position regarding it. Essentially, what goes on in CHAID is that 1) you have a dependent classification (e.g., voting for Clinton), 2) you have a bunch of other variables (e.g., race, sex, SES), and 3) the algorithm finds splits in the population which are as different as possible. It usually works stepwise, and due to the problem of multiplicity, goes forward. You find the most diverse split, and then work each of these splits to find more diverse splits. You stop when splits no longer are significant; i.e., that group is homogeneous with respect to variables not yet used. CHAID is a stepwise method. I would be real cautious about using it when your population is small; i.e., less than 10,000. When using it, I would do a 2/3 - 1/3 split, and replicate the exploration with the confirmatory group. 
>software and also in the S-Plus statistical software package. As with >all such approaches, there are situations where it is quite useful and >others where it is not. Some background knowledge is therefore >required for effective application. > I agree with this. -- Paul Thompson, Ph.D. | Department of Psychiatry | Case Western Reserve Univ| Cleveland, OH 44106 | * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
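[Editorial sketch] The stepwise splitting that Thompson describes can be illustrated roughly as follows. This is a simplified sketch, not SPSS's implementation or the published algorithm in full: it assumes every variable is binary (so each test is a 2x2 table with 1 degree of freedom), and it omits the category merging and multiplicity adjustment that real CHAID software applies. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def chi2_stat(xs, ys):
    """Pearson chi-squared statistic for the cross-tabulation of xs by ys."""
    n = len(xs)
    row, col, cell = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail chi-squared probability for 1 degree of freedom (2x2 tables)."""
    return math.erfc(math.sqrt(stat / 2.0))

def best_split(rows, target, predictors, alpha=0.05):
    """Return (predictor, p) for the most significant split, or None."""
    ys = [r[target] for r in rows]
    best = None
    for name in predictors:
        p = p_value_df1(chi2_stat([r[name] for r in rows], ys))
        if best is None or p < best[1]:
            best = (name, p)
    return best if best is not None and best[1] < alpha else None

def grow(rows, target, predictors, alpha=0.05):
    """Split on the best predictor, then work each subgroup in turn."""
    found = best_split(rows, target, predictors, alpha)
    if found is None:
        return {"n": len(rows)}  # homogeneous w.r.t. the unused predictors
    name = found[0]
    rest = [p for p in predictors if p != name]
    levels = sorted(set(r[name] for r in rows))
    return {"split": name,
            "children": {v: grow([r for r in rows if r[name] == v],
                                 target, rest, alpha) for v in levels}}

def make_rows(sex, vote, count):
    # Alternate race so that race is unrelated to vote in this toy data.
    return [{"sex": sex, "vote": vote, "race": "A" if i % 2 == 0 else "B"}
            for i in range(count)]

rows = (make_rows("M", "yes", 18) + make_rows("M", "no", 2)
        + make_rows("F", "yes", 2) + make_rows("F", "no", 18))
tree = grow(rows, "vote", ["sex", "race"])
print(tree)
```

In this toy data the first split is on sex, and neither subgroup is split further because race is unrelated to vote. Following Thompson's advice, one would grow the tree on roughly two thirds of the data and check whether the same splits hold up in the remaining third.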
  • SEM features of stat packs?
  • ====================James H. Steiger, 09 Jun 1996========ssc From: steiger@unixg.ubc.ca (James H. Steiger) Subject: Re: Statistica any good? Message-ID: <31bb2490.12604793@news.ucs.ubc.ca> Diana Kornbrot <D.E.Kornbrot@HERTS.AC.UK> wrote: >i also find statistica's advertising offensive. > >however i might change (to support their software engineers rather than >their marketing folk) if i knew of any particular analysis that can be >done in statistica that CANNOT be done in SPSS or SAS or S-PLUS >specific egs are: >1. ANCOVA that gives separate slopes (and intercepts) for each > level of categorical variables (not obvious in SPSS) >2. confirmatory factor analysis >3. robust inferential statistics to go with robust descriptives that > are in most packages >4. easily available power calculations >5. easily specifiable multi-level models >initially my dept had statistica because it calculates Cronbach's alpha >which is very useful for psychologists devising psychometric tests. >however SPSS now has this. > I'm moderately knowledgeable in the field of Structural Equation Modeling and factor analysis, Diana, so, to clarify to the best of my knowledge... a. Base SAS cannot do confirmatory factor analysis. However, the Advanced Stats part of SAS has CALIS, a module that can do confirmatory factor analysis. You have to pay an additional fee to get it, and it only works on a single sample. CALIS was written by Wolfgang Hartmann. b. SPSS base and advanced offer no confirmatory factor analysis. SPSS recently announced it will sell AMOS (price not yet established) as an add-on. AMOS was written by James Arbuckle of Temple University. It features a very nice, user-friendly, graphical interface for doing SEM and confirmatory factor analysis. SPSS was offering LISREL as an add-on at very substantial cost, but is discontinuing it. I am assuming it will be priced as an add-on. c. 
Systat 6 for DOS offers RAMONA, which can do confirmatory factor analysis and SEM on single samples. RAMONA is part of the base Systat package, and does not cost you anything extra. RAMONA offers a user-friendly interface, and was written by Michael Browne, a widely known expert in the field of structural modeling, and Gerhard Mels, a student. d. Statistica for Windows includes SEPATH, my module, which does confirmatory factor analysis and structural equation modeling on one or more groups (up to a maximum of 10 independent groups). SEPATH is part of the base Statistica package, and does not cost anything extra. It features an extremely fast user interface, and was reviewed in a recent issue of Structural Equation Modeling, a journal published by Erlbaum. So, in the realm of confirmatory factor analysis and structural equation modeling, Statistica offers the most standard capability, Systat offers excellent capability for single samples, SAS offers excellent capability at an additional price. SPSS will offer an option sometime soon, at an additional price. Here is the summary table, which seems to indicate that Statistica has an advantage in this domain of confirmatory factor analysis and structural equation modeling.

Summary - SEM and Confirmatory Factor Analysis
--------------------------------------------------------------------
                               SAS   SPSS   Systat   Statistica
Is Module Available?            X     a       X          X
Free w. Base Package?                         X          X
Multiple Sample Capability?           X                  X
   a. Available later this year      X - available now
--------------------------------------------------------------------

I apologize in advance if any aspect of this is not correct, but I believe the information is correct and up-to-date. Nonetheless, I am donning my flame-retardant clothing. * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
  • Neural nets - SPSS claims? SAS varieties? weight decay? Sarle.
  • =====================Warren Sarle, 07 Oct 1995========ssc From: saswss@hotellng.unx.sas.com (Warren Sarle) Subject: Re: shareware packages + neural nets Message-ID: <DG3EG6.GJH@unx.sas.com> In article <453r80$7gp@netsrv2.spss.com>, nichols@spss.com (David Nichols) writes: |> SPSS has just begun distributing a product |> called Neural Connection that fits several kinds of neural net models |> (radial basis functions, Kohonen networks and multilayer perceptrons). The unsolicited brochure that SPSS sent me has some interesting claims in it. The brochure says that "Neural nets can model data faster and more closely than traditional statistics. ... neural networks do not assume linearity or homogeneity of variance." The "highlights" include the claim, "NO ASSUMPTIONS REQUIRED. Neural nets do not require assumptions about the form of the data to analyze it." It is true that a variety of neural nets can fit flexible nonlinear functions, much like splines or kernel regression, and thus do not "assume linearity." But I am very curious to know exactly what kinds of neural nets are free of assumptions about homogeneity of variance, independence of errors, and error distributions. Neural networks such as MLPs and RBF nets are usually trained by maximum likelihood assuming a normal or Bernoulli distribution (although the neural net literature does not use that terminology). What marvelous new estimation methods does the SPSS product use that evade the usual distributional assumptions? -- ================Warren Sarle, 10 Nov 1995========ssm From: saswss@hotellng.unx.sas.com (Warren Sarle) Subject: Re: neural network and cross-sectional data Message-ID: <DHu7xK.JKo@unx.sas.com> In article <47q0nc$s93@clipperton.bondy.orstom.fr>, traissac@orstom.fr (Traissac Pierre) writes: |> In article bcr@hydra.unm.edu, bohara@unm.edu (Alok Bohara ECONOMICS) writes: |> > |> > What would be a good software to do a neural network modeling? How |> >much does it cost? 
(I am thinking in terms of environmental economics, |> >or finance applications etc.). |> |> SAS Institute has developed a neural network module that has been |> successfully tested in the fields of application you suggest. |> From what I remember it is not cheap. Our point-and-click neural net application is indeed "not cheap". But the macros that are used behind that front-end are free. If you have SAS/OR (6.08 or later) and don't mind typing percent signs (our macro language uses lots of percent signs), then you can try the macros to see if you can muddle through without the GUI. The macros are available by anonymous ftp from ftp.sas.com (Internet gateway IP 192.35.83.8) in the directory /pub/sugi19/neural. Read the README file. There will be a new version of the TNN macro by the end of the year that will include RBF nets and other neat stuff. Alok Bohara also asks: |> Could anyone give me a nice reference that shows a nonlinear (for |> example) model estimated using the neural network method that shows some |> interpretation of the estimated parameters? Some applications I have |> seen in finance simply show the predictive power of the neural network |> over other methods such as ARIMA etc... The parameter estimates are rarely interpretable. In most cases, neural nets are so ill-conditioned that you can't even get accurate parameter estimates unless you have a huge sample size (the predictions may be quite accurate even if the parameter estimates are not). Even then, it is common for the parameters to be underidentified. For more information, check out the comp.ai.neural-nets newsgroup and get the FAQ from: http://wwwipd.ira.uka.de/~prechelt/FAQ/neural-net-faq.html * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
  • Neural-net, weight decay. (refs). Sarle.
  • =====================Warren Sarle, 27 Mar 1996========ssm, can-n Newsgroups: comp.ai.neural-nets,sci.stat.math Subject: Re: weight decay Message-ID: <DowzJC.5xB@unx.sas.com> In article <4j5i74$sta@dfw-ixnews4.ix.netcom.com>, jdadson@ix.netcom.com(Jive Dadson ) writes: |> I've added weight decay to my NN software. Now I need to learn how |> to use it. :-) I'm not even sure I've done it right. Should the output |> layer have the same weight penalties as hidden layers? Does the type |> of layer matter? For example, should a softmax output layer have |> a different weight penalty than hidden tanh layers in front of it? I'm working on weight decay for the comp.ai.neural-nets FAQ. Here's what I've got so far. Weight decay adds a penalty term to the error function. The usual penalty is the sum of squared weights times a decay constant. In a linear model, this form of weight decay is equivalent to ridge regression. See "What is jitter?" for more explanation of ridge regression. The penalty term causes the weights to converge to smaller absolute values than they otherwise would. Large weights can hurt generalization in two different ways. Excessively large weights leading to hidden units can cause the output function to be too rough, possibly with near discontinuities. Excessively large weights leading to output units can cause wild outputs far beyond the range of the data if the output activation function is not bounded to the same range as the data. Other penalty terms besides the sum of squared weights are sometimes used. Weight elimination (Weigend, Rumelhart, and Huberman 1991) uses: sum_i (w_i)^2 / ((w_i)^2 + c^2) where w_i is the ith weight and c is a user-specified constant. Whereas decay using the sum of squared weights tends to shrink the large coefficients more than the small ones, weight elimination tends to shrink the small coefficients more, and is therefore more useful for suggesting subset models (pruning). 
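[Editorial sketch] One way to see the difference between the two penalties is to compare their gradients with respect to a single weight, since the gradient is the pull toward zero that the penalty adds during training. The Python below is an illustration only; the function names are hypothetical, and the decay constant that would multiply each penalty is left out.

```python
def decay_penalty(w):
    """Sum-of-squares (ridge-style) penalty for one weight."""
    return w * w

def decay_gradient(w):
    """d/dw of w^2: grows in proportion to the weight."""
    return 2.0 * w

def elimination_penalty(w, c=1.0):
    """Weight-elimination penalty for one weight: w^2 / (w^2 + c^2)."""
    return w * w / (w * w + c * c)

def elimination_gradient(w, c=1.0):
    """d/dw of w^2 / (w^2 + c^2) = 2 w c^2 / (w^2 + c^2)^2."""
    return 2.0 * w * c * c / (w * w + c * c) ** 2

# Compare the pull toward zero for small and large weights (c = 1).
for w in (0.1, 0.5, 1.0, 5.0):
    print(f"w={w:4.1f}  decay={decay_gradient(w):7.3f}  "
          f"elimination={elimination_gradient(w):7.3f}")
```

With c = 1, the weight-elimination gradient is 0.64 at w = 0.5 but only about 0.015 at w = 5, so large weights are left nearly alone while weights of the order of c or smaller are pushed toward zero. The sum-of-squares gradient keeps growing with the weight, which matches the contrast drawn in the post.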
The generalization ability of the network can depend crucially on the decay constant, especially with small training sets. One approach to choosing the decay constant is to train several networks with different amounts of decay and estimate the generalization error for each; then choose the decay constant that minimizes the estimated generalization error. Weigend, Rumelhart, and Huberman (1991) iteratively update the decay constant during training. There are other important considerations for getting good results from weight decay. You must either standardize the inputs and targets, or adjust the penalty term for the standard deviations of all the inputs and targets. It is usually a good idea to omit the biases from the penalty term. A fundamental problem with weight decay is that different types of weights in the network will usually require different decay constants for good generalization. At the very least, you need three different decay constants for input-to-hidden, hidden-to-hidden, and hidden-to-output weights. Adjusting all these decay constants to produce the best estimated generalization error often requires vast amounts of computation. Fortunately, there is a superior alternative to weight decay: hierarchical Bayesian estimation. Bayesian estimation makes it possible to estimate efficiently numerous decay constants. See "What is Bayesian estimation?" [Unfortunately, I haven't written the answer to that question yet]. References: Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press. Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press. Weigend, A. S., Rumelhart, D. E., & Huberman, B. A. (1991). Generalization by weight-elimination with application to forecasting. In: R. P. Lippmann, J. Moody, & D. S. Touretzky (eds.), Advances in Neural Information Processing Systems 3, San Mateo, CA: Morgan Kaufmann. 
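[Editorial sketch] The "train with several decay constants and keep the one with the smallest estimated generalization error" approach can be illustrated in the linear case, where weight decay is ridge regression. The Python below is a toy with a single weight and no bias; the data, the candidate decay values, and the function names are all made up for illustration, and a real network would be trained iteratively, not solved in closed form.

```python
def fit_ridge_slope(xs, ys, decay):
    """Minimise sum((y - b*x)^2) + decay * b^2 over a single weight b."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + decay)

def squared_error(xs, ys, b):
    """Error sum of squares of the fitted line y = b*x on a data set."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys))

def choose_decay(train, valid, candidates):
    """Return (decay, weight) giving the smallest validation error."""
    best = None
    for decay in candidates:
        b = fit_ridge_slope(train[0], train[1], decay)
        err = squared_error(valid[0], valid[1], b)
        if best is None or err < best[0]:
            best = (err, decay, b)
    return best[1], best[2]

# Toy data: a small noisy training set and a separate validation set.
train = ([1.0, 2.0], [1.5, 2.9])
valid = ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
decay, weight = choose_decay(train, valid, [0.0, 1.0, 2.0, 5.0])
print(decay, round(weight, 3))
```

Here the unpenalized fit overshoots the slope seen in the validation set, and a decay of 2.0 (weight about 1.043) gives the smallest validation error among the candidates. With separate decay constants for each group of weights, as the post recommends, the same search runs over a grid of combinations, which is where the computational cost comes from.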
|> The books I have don't have sufficient explanations of the theoretical |> motivation or the practical application. Can anyone recommend one? |> |> I haven't really come to grips with error criteria in general. I want |> to get thoroughly familiar with the Bayesian significance of the |> error penalty as it applies both to weights and to the difference |> between estimated values and the corresponding training values. Bishop (1995) and Ripley (1996) are, of course, excellent sources on weight decay and Bayesian issues. The best textbook I've seen on Bayesian inference is Gelman, Carlin, Stern, and Rubin (1995). O'Hagan (1985) is an excellent explanation of some of the odd things that can happen with MAP estimation. MacKay and Neal have done the most work on Bayesian methods for neural nets. I have had trouble getting some of MacKay's methods to work; my own efforts are described too briefly (there was a 10 page limit!) in Sarle (1995). Bernardo, J.M., DeGroot, M.H., Lindley, D.V. and Smith, A.F.M., eds., (1985), Bayesian Statistics 2, Amsterdam: Elsevier Science Publishers B.V. (North-Holland). Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995), Bayesian Data Analysis, London: Chapman & Hall, ISBN 0-412-03991-5. MacKay, D.J.C. (1992), "A practical Bayesian framework for backpropagation networks," Neural Computation, 4, 448-472. MacKay, D.J.C. (199?), "Probable networks and plausible predictions--a review of practical Bayesian methods for supervised neural networks," ftp://mraos.ra.phy.cam.ac.uk/pub/mackay/network.ps.Z. Neal, R.M. (1995), Bayesian Learning for Neural Networks, Ph.D. thesis, University of Toronto, ftp://ftp.cs.toronto.edu/pub/radford/thesis.ps.Z. O'Hagan, A. (1985), "Shoulders in hierarchical models," in Bernardo et al. (1985), 697-710. Sarle, W.S. 
(1995), "Stopped Training and Other Remedies for Overfitting," to appear in Proceedings of the 27th Symposium on the Interface, ftp://ftp.sas.com/pub/neural/inter95.ps.Z (this is a very large compressed postscript file, 747K, 10 pages) ====================Warren Sarle, 16 Feb 1996===============ssc Message-ID: <DMuDoH.FBv@unx.sas.com> From: Warren Sarle <saswss@UNX.SAS.COM> Subject: Re: stepwise In article <4fq9k4$dfn@news.duke.edu>, Frank Harrell <feh@DUKE.EDU> writes: |> I don't know when stepwise methods would be appropriate (unless you |> use Tibshirani's new "lasso" method in a recent JRSS article to |> shrink the regression coefficients). The lasso is somewhat similar to certain weight decay methods in the neural network (NN) literature. "Weight decay" is the NN term for regularization methods such as ridge regression. Ridge regression can be done by minimizing an objective function equal to the error sum of squares plus a penalty term given by the sum of the squared regression coefficients, sum(b_i)^2, times the ridge value. Instead of sum(b_i)^2, the NN folks sometimes use: sum_i (b_i)^2 / ((b_i)^2 + c^2) which is also called "weight elimination". c is a user-specified constant. Whereas ridge regression tends to shrink the large coefficients more than the small ones, weight elimination tends to shrink the small coefficients more, and is therefore more useful for suggesting subset models. How well this really works, no one seems to know. * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • FAQ top.
  • Ulrich home page.
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html