Clusters/proximities
- see also - several other Sarle contributions on
Clustering and related topics.
Sarle, REFs on Cluster membership,
Ulrich, notes on Proximity scoring.
Cluster membership
=======================Warren Sarle, 10 Mar 1997==========ssc
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Probability of Cluster Memb
Message-ID:
In article <5fps4f$k0l@usenet.srv.cis.pitt.edu>, wpilib+@pitt.edu
(Richard F Ulrich) writes:
|> << mconklin@cresearch.com >>
|> Michael Conklin (mconklin_1@CRESEARCH.COM) wrote:
|> : I have a clustering program which assigns respondents to a single
|> : cluster. I have been asked to assign probabilities to these cluster
|> : memberships, that is, for each respondent determine the probability that
|> : this person really belongs in this cluster instead of another cluster.
|>
|> : Is there a general approach for doing this?
|>
|> One way to make sense of "clusters" which might come out of various
|> sorts of programs, using arbitrary selection rules, is to run a
|> "discriminant function" which operates on multivariate-normal
|> distances. Various stat-packages or texts will describe these:
The resulting probability estimates would be extremely biased if
you used any of the classical clustering methods; see, for example:
Marriott, F.H.C. (1971), "Practical Problems in a Method of Cluster
Analysis," Biometrics, 27, 501-514.
The proper approach is to fit a mixture model:
McLachlan, G.J. and Basford, K.E. (1988), Mixture Models,
New York: Marcel Dekker, Inc.
Titterington, D.M., Smith, A.F.M., and Makov, U.E. (1985),
Statistical Analysis of Finite Mixture Distributions,
New York: John Wiley & Sons, Inc.
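
As a rough illustration of what "fit a mixture model" yields, here is a
minimal sketch (not taken from the references above) that fits a
two-component normal mixture to one variable by EM and reports, for each
case, the posterior probability of belonging to each component. The
function names, the starting values, and the toy data are all assumptions
made for the example; a real analysis would use several variables and
well-tested software.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posteriors(xs, weights, means, sds):
    """E-step: probability of each component given each observation."""
    result = []
    for x in xs:
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        result.append([d / total for d in dens])
    return result

def fit_two_component_mixture(xs, n_iter=200):
    """EM for a 1-D mixture of two normals (hypothetical helper)."""
    xs = list(xs)
    n = len(xs)
    lo, hi = min(xs), max(xs)
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    # Assumed starting values: means at the quarter points of the range.
    weights = [0.5, 0.5]
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sds = [grand_sd, grand_sd]
    for _ in range(n_iter):
        post = posteriors(xs, weights, means, sds)
        # M-step: weighted estimates of proportion, mean, and sd.
        for k in range(2):
            nk = sum(p[k] for p in post)
            weights[k] = nk / n
            means[k] = sum(p[k] * x for p, x in zip(post, xs)) / nk
            var = sum(p[k] * (x - means[k]) ** 2 for p, x in zip(post, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, means, sds

# Made-up scores for ten respondents.
scores = [0.2, 0.6, 1.0, 1.4, 1.8, 3.2, 3.6, 4.0, 4.4, 4.8]
weights, means, sds = fit_two_component_mixture(scores)
membership = posteriors(scores, weights, means, sds)
for x, p in zip(scores, membership):
    print(f"{x:4.1f}  P(comp 1)={p[0]:.4f}  P(comp 2)={p[1]:.4f}")
```

The membership probabilities come from the fitted model itself, rather
than from a discriminant function run on clusters that were already
chosen to be far apart.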
*--------
Calculate proximities?
=======================Rich Ulrich, 18 Apr 1997==========spss
From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: cluster analysis with mixed data
Message-ID: <5j8k99$k9q@usenet.srv.cis.pitt.edu>
Purnima Chawla (pchawla@ETS.ORG) wrote:
: Hello, everyone.
: I'm trying to do a cluster analysis based on how people have responded to
: a survey. Some of the questions in the survey required a Y/N response
: and thus yield binomial data. Others required a Likert type response;
: still others were designed like multiple choice questions but are graded
: and can be interpreted as ordinal data.
: My question is this: how should the proximities be calculated if some of
: the data are binomial and some are ordinal? Also, does SPSS do this?
How? carefully!
a) SPSS Proximities has a number of possibilities for scoring
distances or proximities for binomial data. It is possible to use
a subset of variables, and create a Distance-score from dichotomies;
then use that as just one among another set of variables.
b) the default (without STANDARDIZE) means that a variable scored
0-100 would carry 10 or 100 times the weight of a variable scored 1-10.
Et cetera.
c) the choice of VARIABLES going into your distances is what the
weighting will be computed over; if there are 10 trivial variables
and one important one, the distances will not depend much on the
important one....
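
To make point (b) concrete, here is a small sketch in Python (not SPSS
syntax, and not a claim about how SPSS Proximities computes anything).
It compares squared Euclidean distances on raw scores with the same
distances after each variable is rescaled to a 0-1 range. The three
respondents, the variable coding, and the helper names are made up for
illustration; range scaling is only one of several possible
standardizations.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def range_scale(rows):
    """Rescale each column to 0-1 by its observed minimum and range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column has range 0; use 1.0 so it contributes nothing.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]
            for row in rows]

# Hypothetical respondents: (Y/N item coded 0/1, Likert 1-5, score 0-100)
respondents = [
    [0, 1, 50],
    [1, 5, 52],
    [0, 1, 90],
]

raw_01 = squared_euclidean(respondents[0], respondents[1])
raw_02 = squared_euclidean(respondents[0], respondents[2])

scaled = range_scale(respondents)
scaled_01 = squared_euclidean(scaled[0], scaled[1])
scaled_02 = squared_euclidean(scaled[0], scaled[2])

print("raw:   ", raw_01, raw_02)
print("scaled:", scaled_01, scaled_02)
```

On the raw scores the 0-100 variable dominates, so respondent 1 looks
much closer to respondent 0 than respondent 2 does, even though
respondent 1 differs on both the Y/N and the Likert item. After
rescaling, the ordering reverses.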
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html