  • Mixture of gaussians, REFs
  • =======================Steve Cassidy, 28 Jun 1993==========sms
Subject: fitting a mixture of gaussians -- summary
Message-ID: <STEVE.93Jun28131710@srsuna.shlrc.mq.edu.au>

As requested, here's a summary of the responses I got to my query about fitting a gaussian mixture model to speech data.

Radford Neal (radford@cs.toronto.edu) said:

My view is that in most cases the question [of estimating the number of components in a mixture] is not really sensible, because in most cases the number of components in the mixture is known a priori to be infinite (or at least very large). To take a random example, say you are modelling the distribution of weights of pollen grains collected. It's maybe reasonable to suppose that each species of plant gives rise to a Gaussian distribution for weights, so that you expect to see a mixture of Gaussians, with a very large number of components (though it might be that a few species account for most of the pollen).

I develop a Gibbs sampling approach to Bayesian inference for mixtures with an infinite number of components in the following paper:

Neal, R. M. (1992) ``Bayesian mixture modeling'', in C. R. Smith, G. J. Erickson, and P. O. Neudorfer (editors) Maximum Entropy and Bayesian Methods: Proceedings of the 11th International Workshop on Maximum Entropy and Bayesian Methods of Statistical Analysis, Seattle, 1991, Dordrecht: Kluwer Academic Publishers.

A slightly longer version of this paper is available as a TR, obtainable in PostScript form by anonymous ftp from ftp.cs.toronto.edu, directory pub/radford, file bmm.ps. These papers treat only the case where the variables are discrete, but the technique can be generalized.

Radford Neal

As I said, I found this paper (the tech report) to be very useful and mostly understandable given my (lack of) expertise in maths. I'm still working through the algorithm trying to understand how to apply it to my problem.
Charles Johnson (charles@prd.co.uk) wrote:

This is also a question of interest to me since it bears directly on the efficacy of meta-analysis. The one attempt at a practical solution I know of is Hoben Thomas's book ``Distributions of correlation coefficients'' (Springer-Verlag, New York, 1989) where he explores a number of correlational data sets assuming that they are mixtures and attempts to determine the number of components. He also offers a listing of his main programs in an appendix.

Best wishes
Charles Johnson

Tobias Ryden (tobias@tts.lth.se) provided a couple of references which I haven't followed up yet:

@ARTICLE{Henna85,
  AUTHOR = "Henna, J.",
  TITLE = "On Estimating of the Number of Constituents of a Finite Mixture of Continuous Distributions",
  JOURNAL = "Ann.\ Inst.\ Statist.\ Math.",
  VOLUME = 37,
  YEAR = 1985,
  PAGES = "235--240"}

@ARTICLE{Leroux92b,
  AUTHOR = "Leroux, B. G.",
  TITLE = "Consistent Estimation of a Mixing Distribution",
  JOURNAL = "Ann.\ Statist.",
  VOLUME = 20,
  YEAR = 1992,
  PAGES = "1350--1360"}

Peter Forster <forster@sispa.chem.TU-Berlin.DE> referred me to some work in Nuclear Magnetic Resonance based on Bayesian probability theory. The references are:

G. Larry Bretthorst in the Journal of Magnetic Resonance (JMR). There are three articles, they are in Vol. 88, No. 3 (July 1990) and following.

I found the second of these articles (Bayesian Analysis II: Signal Detection and Model Selection) quite useful (although the maths got me again). It basically seems to discuss a method of comparing models using Bayes' theorem to derive the probability of the model given the data from the probability of the data given the model and the priors of the model and the data. Beyond that I didn't go, as Radford Neal's paper came along and I understood it a bit better.
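The comparison described in the paragraph above amounts to P(M|D) = P(D|M) P(M) / P(D), where P(D) is the sum of P(D|M) P(M) over the candidate models. What follows is a minimal Python sketch of that normalisation only, assuming the log evidence log P(D|M) for each model has already been obtained by some other means. The function name posterior_model_probs and all the numbers are made up for illustration and are not taken from Bretthorst's articles.

```python
import math

def posterior_model_probs(log_evidence, prior):
    """Return P(model | data) for each candidate model via Bayes' theorem.

    log_evidence[i] is log P(data | model i); prior[i] is P(model i).
    """
    # Unnormalised log posterior: log P(D|M) + log P(M)
    log_post = [le + math.log(p) for le, p in zip(log_evidence, prior)]
    # Subtract the maximum before exponentiating to avoid underflow
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    # The sum plays the role of P(D), the normalising constant
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log evidences for three candidate models, equal priors
log_evidence = [-1052.3, -1047.9, -1049.5]
prior = [1 / 3, 1 / 3, 1 / 3]
probs = posterior_model_probs(log_evidence, prior)
for i, p in enumerate(probs):
    print(f"model {i}: posterior probability {p:.3f}")
```

Working in logs matters here because P(D|M) for a few hundred data points is usually far too small to represent directly.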
Behzad M Shahshahani (behzad@ecn.purdue.edu) referred me to some work on Minimum Description Length (MDL) and similar models:

AIC and MDL are both methods for selecting order in a model, such as the number of components in a mixture. AIC has been proven to be inconsistent and therefore recently MDL is used much more widely. For AIC, you can look at H. Akaike's paper "A new look at the statistical model identification" in IEEE Transactions on Automatic Control, vol AC-19, Dec 1974, 716-723. Basically AIC penalizes more complex models by subtracting a penalizing factor equal to the number of free parameters in the model from the maximum of the log-likelihood of the data obtained under that model. MDL is similar but the penalizing factor has also a factor equal to log of the number of samples in it. For MDL see Rissanen's papers:

"Modeling by shortest data description", in Automatica 1978, 465-471,
"A universal prior for integers..." in The Annals of Statistics 1983, 416-431
"Stochastic complexity and modeling" in The Annals of Statistics, 1986 1080-1100

Also look at Schwarz's paper which derived a criterion like MDL from a Bayesian standpoint: "Estimating the dimension of a model" in The Annals of Statistics 1978 461-464.

I have yet to follow any of this up short of copying a few papers.

Murray A. Jorgensen [ maj@waikato.ac.nz ] provided a reference to:

Some information about this problem can be found in section 1.10 (Tests for the number of components in a mixture) in 'Mixture Models' by G.J. McLachlan and K.E. Basford.

Which I haven't been able to get hold of yet.

Adrian Raftery (raftery@stat.washington.edu) sent a pointer to the MCLUST package on StatLib which is a general model-based Gaussian hierarchical clustering package. I haven't had a chance to look at this yet. The all-Fortran version is available by sending a message to statlib@stat.cmu.edu of the form "send mclust from general".
The S-driven version is available by sending a message of the form "send mclust from S". This is now included in S-PLUS (version 3.1). Caution: the program is very large.

Finally, Peter Andreae (pondy@comp.vuw.ac.nz) is working on clustering data using a machine learning system based on COBWEB (sorry I don't have any references with me but look in machine learning texts). The algorithm does a hill climbing search through possible partitionings. The useful idea to you might be the evaluation criteria. COBWEB claims to use a partition evaluation criterion that combines intra-class homogeneity and inter-class distinctiveness (my terms - he calls them predictiveness and predictability or vice versa). In fact, he doesn't - he only uses intra-class homogeneity, and trades off against the number of classes. We have experimented briefly with a better metric, and are about (now that I almost have my program working) to experiment more solidly with a variety of metrics.

I hope that this summary is useful. Again many thanks to those who responded to my query. Your replies have been very helpful.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
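For readers who want to try the AIC and MDL criteria from Behzad Shahshahani's reply, here is a minimal Python sketch of how they might be used to choose the number of components. It assumes a one-dimensional Gaussian mixture has already been fitted by maximum likelihood for each candidate number of components. The MDL penalty is taken in the usual Schwarz/Rissanen form, (k/2) log n, which is an assumption about the constant that the reply leaves unstated. The log-likelihood values, the sample size and the function names are invented for illustration.

```python
import math

def n_free_params(m):
    # 1-D mixture of m Gaussians: m means, m variances, m - 1 free weights
    return 3 * m - 1

def aic_score(log_lik, k):
    # Maximised log-likelihood minus the number of free parameters
    return log_lik - k

def mdl_score(log_lik, k, n):
    # Same idea, but the penalty also grows with log(number of samples)
    return log_lik - 0.5 * k * math.log(n)

n_samples = 500  # hypothetical sample size
# Hypothetical maximised log-likelihoods, keyed by number of components
max_log_lik = {1: -1210.4, 2: -1150.2, 3: -1146.9, 4: -1145.8}

best_aic = max(
    max_log_lik,
    key=lambda m: aic_score(max_log_lik[m], n_free_params(m)),
)
best_mdl = max(
    max_log_lik,
    key=lambda m: mdl_score(max_log_lik[m], n_free_params(m), n_samples),
)
print("components chosen by AIC:", best_aic)
print("components chosen by MDL:", best_mdl)
```

With these made-up numbers AIC picks three components and MDL picks two, which shows the heavier MDL penalty at work: the small gain in likelihood from a third component is not enough to pay for three more parameters once log n enters the penalty.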
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html