File stat 97clust.html: Clusters/proximities. See also several other Sarle contributions on clustering and related topics. Contents: Sarle, references on cluster membership; Ulrich, notes on proximity scoring.
  • Cluster membership
  • =======================Warren Sarle, 10 Mar 1997==========ssc
    From: saswss@hotellng.unx.sas.com (Warren Sarle)
    Subject: Re: Probability of Cluster Memb
    Message-ID: <E6t7ru.C6H@unx.sas.com>

    In article <5fps4f$k0l@usenet.srv.cis.pitt.edu>, wpilib+@pitt.edu (Richard F Ulrich) writes:
    |> << mconklin@cresearch.com >>
    |> Michael Conklin (mconklin_1@CRESEARCH.COM) wrote:
    |> : I have a clustering program which assigns respondents to a single
    |> : cluster. I have been asked to assign probabilities to these cluster
    |> : memberships, that is, for each respondent determine the probability that
    |> : this person really belongs in this cluster instead of another cluster.
    |>
    |> : Is there a general approach for doing this?
    |>
    |> One way to make sense of "clusters" which might come out of various
    |> sorts of programs, using arbitrary selection rules, is to run a
    |> "discriminant function" which operates on multivariate-normal
    |> distances. Various stat-packages or texts will describe these:

    The resulting probability estimates would be extremely biased if you used any of the classical clustering methods; see, for example:

    Marriott, F.H.C. (1971), "Practical Problems in a Method of Cluster Analysis," Biometrics, 27, 501-514.

    The proper approach is to fit a mixture model:

    McLachlan, G.J. and Basford, K.E. (1988), Mixture Models, New York: Marcel Dekker, Inc.

    Titterington, D.M., Smith, A.F.M., and Makov, U.E. (1985), Statistical Analysis of Finite Mixture Distributions, New York: John Wiley & Sons, Inc.
    *--------
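Sarle's suggestion of fitting a mixture model can be illustrated with a small sketch. It is not part of the original post. It assumes a one-dimensional, two-component Gaussian mixture fitted by EM, uses made-up data, and the function names (`normal_pdf`, `membership_probabilities`, `fit_two_component_mixture`) are hypothetical. It shows how a fitted mixture gives each observation a posterior probability of belonging to each component, instead of a hard assignment.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(data, weights, means, sds):
    """Posterior probability of each component for each observation."""
    rows = []
    for x in data:
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        if total == 0.0:
            # All densities underflowed; fall back to equal shares.
            rows.append([1.0 / len(dens)] * len(dens))
        else:
            rows.append([d / total for d in dens])
    return rows


def fit_two_component_mixture(data, n_iter=200, min_sd=1e-2):
    """EM for a two-component 1-D Gaussian mixture (illustrative only).

    Assumes neither component becomes empty during fitting.
    """
    n = len(data)
    grand_mean = sum(data) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n)
    # Assumed starting values: equal weights, means at the data extremes.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    sds = [grand_sd, grand_sd]
    for _ in range(n_iter):
        # E-step: posterior membership probabilities under current parameters.
        post = membership_probabilities(data, weights, means, sds)
        # M-step: re-estimate weight, mean, and sd of each component.
        for k in range(2):
            nk = sum(p[k] for p in post)
            weights[k] = nk / n
            means[k] = sum(p[k] * x for p, x in zip(post, data)) / nk
            var = sum(p[k] * (x - means[k]) ** 2 for p, x in zip(post, data)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor avoids a collapsing sd
    return weights, means, sds


# Made-up data: two loose groups with one value in between.
data = [0.2, 0.5, 0.9, 1.0, 1.3, 1.6, 2.4, 3.9, 4.4, 4.7, 5.0, 5.2, 5.6, 6.0]
weights, means, sds = fit_two_component_mixture(data)
probs = membership_probabilities(data, weights, means, sds)
for x, p in zip(data, probs):
    print(f"x = {x:4.1f}   P(component 1) = {p[0]:.3f}   P(component 2) = {p[1]:.3f}")
```

Observations lying between the two groups are the ones for which such probabilities can differ most from a hard 0/1 assignment. For survey data with several variables, one would fit a multivariate mixture, as treated in the references Sarle cites.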
  • Calculate proximities?
  • =======================Rich Ulrich, 18 Apr 1997==========spss
    From: wpilib+@pitt.edu (Richard F Ulrich)
    Subject: Re: cluster analysis with mixed data
    Message-ID: <5j8k99$k9q@usenet.srv.cis.pitt.edu>

    Purnima Chawla (pchawla@ETS.ORG) wrote:
    : Hello, everyone.
    : I'm trying to do a cluster analysis based on how people have responded to
    : a survey. Some of the questions in the survey required a Y/N response
    : and thus yield binomial data. Others required a Likert type response;
    : still others were designed like multiple choice questions but are graded
    : and can be interpreted as ordinal data.
    : My question is this: how should the proximities be calculated if some of
    : the data are binomial and some are ordinal? Also, does SPSS do this? How?

    Carefully!

    a) SPSS Proximities has a number of possibilities for scoring distances or proximities for binomial data. It is possible to use a subset of variables, and create a Distance-score from dichotomies; then use that as just one among another set of variables.

    b) The default (without STANDARDIZE) means that a variable scored 0-100 would have 10 or 100 times the weight of a variable scored 1-10. Et cetera.

    c) The choice of VARIABLES going into your distances is what the weighting will be computed over; if there are 10 trivial variables and one important one, the distances will not depend much on the important one....

    * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
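Ulrich's point (b), that an unstandardized variable on a wide scale dominates the distances, can be seen in a small numeric sketch. This is plain Python, not SPSS syntax, and it is not part of the original post. The respondents, their scores, and the helper names (`euclidean`, `standardize_columns`) are made up for illustration, and z-scoring is only one of several possible standardizations.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize_columns(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
        scaled.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled)]


# Hypothetical respondents: (score on a 0-100 scale, rating on a 1-10 scale).
respondents = {"A": (50, 1), "B": (58, 10), "C": (65, 1), "D": (10, 5), "E": (90, 6)}
names = list(respondents)
raw_rows = dict(zip(names, (respondents[n] for n in names)))
std_rows = dict(zip(names, standardize_columns([respondents[n] for n in names])))

# Raw distances: the 0-100 variable carries most of the weight.
raw_ab = euclidean(raw_rows["A"], raw_rows["B"])
raw_ac = euclidean(raw_rows["A"], raw_rows["C"])

# Standardized distances: both variables count on a comparable footing.
std_ab = euclidean(std_rows["A"], std_rows["B"])
std_ac = euclidean(std_rows["A"], std_rows["C"])

print(f"raw:          d(A,B) = {raw_ab:.2f}   d(A,C) = {raw_ac:.2f}")
print(f"standardized: d(A,B) = {std_ab:.2f}   d(A,C) = {std_ac:.2f}")
```

In the raw scores, A looks closer to B than to C even though A and B sit at opposite ends of the 1-10 rating; after standardizing, A is closer to C, which has the same rating. The same reasoning applies to point (c): every variable included in the distance takes a share of the weight, so the choice of variables matters as much as their scaling.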
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html