Mixture of gaussians
Mixture of gaussians, REFs
=======================Steve Cassidy, 28 Jun 1993==========sms
Subject: fitting a mixture of gaussians -- summary
As requested, here's a summary of the responses I got to my
query about fitting a gaussian mixture model to speech data.
Radford Neal (radford@cs.toronto.edu) said:
My view is that in most cases the question [of estimating the
number of components in a mixture] is not really sensible,
because in most cases the number of components in the mixture is
known a priori to be infinite (or at least very large).
To take a random example, say you are modelling the distribution of
weights of pollen grains collected. It's maybe reasonable to suppose
that each species of plant gives rise to a Gaussian distribution for
weights, so that you expect to see a mixture of Gaussians, with a very
large number of components (though it might be that a few species
account for most of the pollen).
I develop a Gibbs sampling approach to Bayesian inference for mixtures
with an infinite number of components in the following paper:
Neal, R. M. (1992) ``Bayesian mixture modeling'', in C. R. Smith,
G. J. Erickson, and P. O. Neudorfer (editors) Maximum Entropy
and Bayesian Methods: Proceedings of the 11th International Workshop
on Maximum Entropy and Bayesian Methods of Statistical Analysis, Seattle,
1991, Dordrecht: Kluwer Academic Publishers.
A slightly longer version of this paper is available as a TR, obtainable
in PostScript form by anonymous ftp from ftp.cs.toronto.edu, directory
pub/radford, file bmm.ps. These papers treat only the case where the
variables are discrete, but the technique can be generalized.
Radford Neal
As I said, I found this paper (the tech report) to be very useful and
mostly understandable given my (lack of) expertise in maths. I'm still
working through the algorithm trying to understand how to apply it to
my problem.
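The infinite-mixture idea above can be illustrated with a small Gibbs
sampler. This is a minimal sketch, not a transcription of the algorithm in
the paper (which treats discrete variables): it assumes 1-D Gaussian
components with a known within-component variance and a normal prior on each
component mean, so the means can be integrated out. The function names and
the parameters alpha, sigma2, m0 and s02 are illustrative choices, not taken
from the original.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def predictive(x, members, sigma2, m0, s02):
    """Density of x under a component whose mean has prior N(m0, s02),
    after observing `members` (assumed known within-component variance sigma2)."""
    n = len(members)
    post_var = 1.0 / (1.0 / s02 + n / sigma2)
    post_mean = post_var * (m0 / s02 + sum(members) / sigma2)
    return normal_pdf(x, post_mean, post_var + sigma2)

def gibbs_infinite_mixture(data, alpha=1.0, sigma2=1.0, m0=0.0, s02=100.0,
                           sweeps=50, seed=0):
    """Return one sampled assignment of points to components.

    The number of components is not fixed in advance: each point may join
    an occupied component or open a new one (weight proportional to alpha).
    """
    rng = random.Random(seed)
    labels = [0] * len(data)                 # start with a single component
    clusters = {0: list(range(len(data)))}   # label -> indices of its points
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Take point i out of its current component.
            old = labels[i]
            clusters[old].remove(i)
            if not clusters[old]:
                del clusters[old]
            # Weight of each occupied component: size times predictive density.
            options = list(clusters)
            weights = [
                len(clusters[c])
                * predictive(x, [data[j] for j in clusters[c]], sigma2, m0, s02)
                for c in options
            ]
            # Weight of opening a brand-new component.
            options.append(next_label)
            weights.append(alpha * predictive(x, [], sigma2, m0, s02))
            choice = rng.choices(options, weights=weights)[0]
            if choice == next_label:
                clusters[choice] = []
                next_label += 1
            clusters[choice].append(i)
            labels[i] = choice
    return labels
```

Calling gibbs_infinite_mixture(data) on a list of floats returns one label
per point; the number of distinct labels is the number of components occupied
in that sample, and it varies from sample to sample rather than being chosen
beforehand.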
Charles Johnson (charles@prd.co.uk) wrote:
This is also a question of interest to me since it bears directly on
the efficacy of meta-analysis. The one attempt at a practical
solution I know of is Hoben Thomas's book ``Distributions of
correlation coefficients'' (Springer-Verlag, New York, 1989) where he
explores a number of correlational data sets assuming that they are
mixtures and attempts to determine the number of components. He also
offers a listing of his main programs in an appendix.
Best wishes
Charles Johnson
Tobias Ryden (tobias@tts.lth.se) provided a couple of references which
I haven't followed up yet:
@ARTICLE{Henna85,
AUTHOR = "Henna, J.",
TITLE = "On Estimating of the Number of Constituents of a Finite Mixture
of Continuous Distributions",
JOURNAL = "Ann.\ Inst.\ Statist.\ Math.",
VOLUME = 37,
YEAR = 1985,
PAGES = "235--240"}
@ARTICLE{Leroux92b,
AUTHOR = "Leroux, B. G.",
TITLE = "Consistent Estimation of a Mixing Distribution",
JOURNAL = "Ann.\ Statist.",
VOLUME = 20,
YEAR = 1992,
PAGES = "1350--1360"}
Peter Forster referred me to some work in nuclear magnetic resonance
based on Bayesian probability theory. The references are three articles
by G. Larry Bretthorst in the Journal of Magnetic Resonance (JMR),
starting in Vol. 88, No. 3 (July 1990).
I found the second of these articles (Bayesian Analysis II: Signal
Detection and Model Selection) quite useful (although the maths got me
again). It discusses a method of comparing models that uses Bayes'
theorem to derive the probability of the model given the data from the
probability of the data given the model and the prior probabilities of
the model and the data. I didn't go beyond that, as Radford Neal's paper
came along and I understood it a bit better.
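In symbols, the comparison described here is the standard Bayesian one. The
notation is an assumption for illustration (M_j for a candidate model, D for
the data, theta_j for the parameters of M_j) and is not taken from the
articles:

```latex
\begin{align*}
P(M_j \mid D) &= \frac{P(D \mid M_j)\, P(M_j)}{P(D)}, \\
P(D \mid M_j) &= \int P(D \mid \theta_j, M_j)\, P(\theta_j \mid M_j)\, d\theta_j, \\
\frac{P(M_1 \mid D)}{P(M_2 \mid D)} &= \frac{P(D \mid M_1)}{P(D \mid M_2)} \cdot \frac{P(M_1)}{P(M_2)}.
\end{align*}
```

The prior of the data, P(D), is the same for every model, so it cancels in
the ratio: two models can be compared using only their marginal likelihoods
and their prior probabilities.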
Behzad M Shahshahani (behzad@ecn.purdue.edu) referred me to some work
on Minimum Description Length (MDL) and similar models:
AIC and MDL are both methods for selecting the order of a model, such as the
number of components in a mixture. AIC has been proven to be inconsistent,
so recently MDL has been used much more widely. For AIC, you can
look at H. Akaike's paper "A new look at the statistical model identification"
in IEEE Transactions on Automatic Control, vol. AC-19, Dec 1974, 716-723.
Basically, AIC penalizes more complex models by subtracting a
penalty equal to the number of free parameters in the model
from the maximized log-likelihood of the data under that model.
MDL is similar, but its penalty also includes a factor of the log of the
number of samples. For MDL see Rissanen's papers:
"Modeling by shortest data description", in Automatica 1978, 465-471,
"A universal prior for integers..." in The Annals of Statistics 1983,
416-431,
"Stochastic complexity and modeling" in The Annals of Statistics 1986,
1080-1100.
Also look at Schwarz's paper which derived a criterion like MDL from a
Bayesian standpoint:
"Estimating the dimension of a model" in The Annals of Statistics 1978
461-464.
I have yet to follow any of this up short of copying a few papers.
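The penalized-likelihood recipe described above can be sketched in a few
lines. This is a minimal illustration under stated assumptions: 1-D data, a
plain EM fit with a variance floor, a parameter count of 3k - 1 for k
components, and the common leading-term form of the MDL penalty,
(p/2) log n, which coincides with Schwarz's criterion. The function names
are hypothetical. Scores are written as "log-likelihood minus penalty", so
larger is better.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_mixture_loglik(data, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    # Start the means at evenly spaced quantiles, with shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    var_floor = 1e-3 * grand_var  # stops a component collapsing onto one point

    def mixture_density(x):
        return sum(w * normal_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances))

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            parts = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = max(sum(parts), 1e-300)
            resp.append([p / total for p in parts])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            variances[j] = max(var_floor, spread / nj)
    return sum(math.log(max(mixture_density(x), 1e-300)) for x in data)

def penalized_scores(data, max_k):
    """Return {k: {"loglik", "aic", "mdl"}} for k = 1..max_k components."""
    n = len(data)
    scores = {}
    for k in range(1, max_k + 1):
        loglik = fit_mixture_loglik(data, k)
        p = 3 * k - 1  # k means + k variances + (k - 1) free mixing weights
        scores[k] = {
            "loglik": loglik,
            "aic": loglik - p,                      # penalty: number of parameters
            "mdl": loglik - 0.5 * p * math.log(n),  # penalty grows with log(n)
        }
    return scores
```

Given scores = penalized_scores(data, 5), the number of components chosen by
MDL is max(scores, key=lambda k: scores[k]["mdl"]), and likewise for "aic".
Because the MDL penalty grows with the sample size, it tends to choose fewer
components than AIC on large data sets.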
Murray A. Jorgensen (maj@waikato.ac.nz) provided a reference:
Some information about this problem can be found in section 1.10 (Tests
for the number of components in a mixture) in 'Mixture Models' by G.J.
McLachlan and K.E. Basford.
I haven't been able to get hold of this book yet.
Adrian Raftery (raftery@stat.washington.edu) sent a pointer to the
MCLUST package on StatLib which is a general model-based Gaussian
hierarchical clustering package. I haven't had a chance to look at this
yet.
The all-Fortran version is available by sending a message to
statlib@stat.cmu.edu of the form "send mclust from general".
The S-driven version is available by sending a message of the form
"send mclust from S". This is now included in S-PLUS (version 3.1).
Caution: the program is very large.
Finally, Peter Andreae (pondy@comp.vuw.ac.nz) is working on clustering
data using a machine learning system based on COBWEB (sorry I don't
have any references with me but look in machine learning texts).
The algorithm does a hill climbing search through possible
partitionings. The useful idea to you might be the evaluation criteria.
COBWEB claims to use a partition evaluation criterion that combines
intra-class homogeneity and inter-class distinctiveness. (my terms -
he calls them predictiveness and predictability or vice versa). In
fact, he doesn't - he only uses intra-class homogeneity, and trades
off against the number of classes. We have experimented briefly with
a better metric, and are about (now that I almost have my program
working) to experiment more solidly with a variety of metrics.
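The partition evaluation criterion usually associated with COBWEB is category
utility. The sketch below assumes the standard textbook form for nominal
attributes; it is not Peter Andreae's program or his improved metric, and the
function name and data layout are illustrative. The final division by the
number of classes is the trade-off against the number of classes mentioned
above.

```python
from collections import Counter

def category_utility(partition):
    """Category utility of a partition of nominal-valued instances.

    `partition` is a list of clusters; each cluster is a non-empty list of
    instances; each instance is a tuple of attribute values.
    """
    all_items = [inst for cluster in partition for inst in cluster]
    n = len(all_items)
    n_attrs = len(all_items[0])

    def sum_sq_probs(items):
        # Sum over attributes and values of P(attribute = value)^2 within `items`.
        total = 0.0
        for a in range(n_attrs):
            counts = Counter(inst[a] for inst in items)
            total += sum((c / len(items)) ** 2 for c in counts.values())
        return total

    baseline = sum_sq_probs(all_items)
    # Gain in expected correct guesses inside each class, weighted by class size.
    gain = sum(len(c) / n * (sum_sq_probs(c) - baseline) for c in partition)
    return gain / len(partition)
```

A hill-climbing search of the kind described would evaluate this score for
each candidate partitioning and keep the best one.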
I hope that this summary is useful. Again many thanks to those who
responded to my query. Your replies have been very helpful.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html