On-Line Software for Clustering and Multivariate Analysis

This is a short review of programs and packages available for public access by anonymous ftp, Gopher or World-Wide Web (Mosaic, Lynx or other browser). No attempt has been made to list codes which can be had only by contacting the author directly. No attempt has been made (so far) to list system-specific sites (e.g. SAS, XLisp-Stat). No attempt has been made (again, so far) to list commercial or shareware codes. No guarantees are given or implied with respect to the software referred to here.

F. Murtagh (fmurtagh@eso.org), May 1994. Update Sept. 1994.
S. Hirtle (hirtle+@pitt.edu) Update May 1997.

Most Recent Update: Links to Montreal Programs

Fionn Murtagh has maintained his own set of updates in a non-html text file. We hope to merge these into a single file soon.

Statlib: a major site for statistical software of all sorts.

Here are some areas to check out:

CMLIB - Core Mathematics Library from NIST. CLUSTER "is a sublibrary of Fortran subroutines for cluster analysis and related line printer graphics. It includes routines for clustering variables and/or observations using algorithms such as direct joining and splitting, Fisher's exact optimization, single-link, K-means, and minimum mutations, and routines for estimating missing values. The subroutines in CLUSTER are described in the book 'Clustering Algorithms' by J. A. Hartigan."

APSTAT - Selected Algorithms Transcribed from Applied Statistics. Mostly Fortran. Includes implementations of: minimal spanning tree, single-link hierarchical clustering, discriminant analysis of categorical data, branch and bound algorithm for feature subset selection, etc.
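
Two of the methods listed for APSTAT, the minimal spanning tree and single-link hierarchical clustering, are closely related: merging clusters in order of increasing pairwise distance produces exactly the edges of a minimal spanning tree. The following is a minimal Python sketch of that relationship, not a transcription of any APSTAT algorithm. The function name single_link_merges and the sample points are hypothetical, and Euclidean distance is an assumption.

```python
from itertools import combinations
from math import dist


def single_link_merges(points):
    """Return the single-link merge sequence as (distance, i, j) triples.

    All pairwise distances are visited in increasing order and an edge is
    kept only if it joins two different clusters (Kruskal's algorithm), so
    the kept edges also form a minimal spanning tree of the points.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge the two clusters
            merges.append((d, i, j))
    return merges


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 2.0)]
    for d, i, j in single_link_merges(pts):
        print(f"merge at distance {d:.3f}: points {i} and {j}")
```

Cutting the returned sequence at a chosen distance gives the single-link partition at that level. The sketch sorts all pairs, which is adequate only for small data sets.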

GENERAL - Software of General Statistical Interest. Includes the 3-d interactive data display package, XGobi. Algorithms for convex hull, and Delaunay triangulation. Mclust, model-based clustering routines (Banfield and Raftery). MVE, minimum volume ellipsoid estimator (Rousseeuw), PROGRESS, robust regression (Rousseeuw and Leroy), MARS, projection pursuit. Nonlinear discriminant analysis. LOESS regression. Etc.

MULTI - Multivariate Analysis and Clustering. Hierarchical clustering, principal components analysis, discriminant analysis; these are mainly Fortran. Also Macintosh programs for multivariate data analysis and graphical display, linear regression with errors in both variables, and a software directory including details of packages for phylogeny estimation and for consensus clustering.

Netlib at AT&T Bell Laboratories is a major site for numerical analysis software, including eigenvalue/vector packages EISPACK, SVDPACK, etc. Anonymous ftp to netlib.att.com.

The programs from the "First and Second Multidimensional Scaling Packages of Bell Laboratories" are available in the subdirectory netlib/mds.

MLC++, a Machine Learning library in C++. MLC++ is a library of C++ classes and tools for supervised Machine Learning being developed at the Robotics lab at Stanford University.

DOS-based programs from Glenn Milligan at Ohio State University
Departement des Sciences biologiques, Universite de Montreal
Kovach Computing Services (Pentraeth, Wales, U.K.), including the shareware ordination and clustering program MVSP

TOOLDIAG is a collection of methods for statistical pattern recognition. The main area of application is classification. The main capabilities of the program are:

        - Different classifier architectures
                KNN, QGC, RBF, Parzen, Q*
        - Feature selection 
                + Search strategies: BF, SFS, SBS, B&B, Exhaustive
                + Selection criteria: Minimum error, probabilistic distance, inter-class distance
        - Feature extraction 
                PCA, LDA, Sammon
        - Supervised learning of a classifier 
        - Error estimation 
                LOO, K-fold cross validation, Resubstitution, Holdout, Bootstrap
        - Normalization 
        - Graphical Interface to the GNUPLOT program 
For more information, see http://www.uninova.pt/~tr/home/tooldiag.html or retrieve the README directly from ftp://ftp.uninova.pt/users/tr/soft/tooldiag.README
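
To illustrate how two of the capabilities listed above fit together, here is a minimal Python sketch of a KNN classifier whose error is estimated by LOO (leave-one-out). It is not TOOLDIAG code and does not follow its interface. The function names knn_predict and leave_one_out_error, the Euclidean distance and the toy data are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=1):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_error(data, k=1):
    """Fraction of samples misclassified when each is held out in turn."""
    errors = 0
    for idx, (x, label) in enumerate(data):
        rest = data[:idx] + data[idx + 1:]   # training set without sample idx
        if knn_predict(rest, x, k) != label:
            errors += 1
    return errors / len(data)


if __name__ == "__main__":
    # Two well-separated toy classes (hypothetical data).
    samples = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
               ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
    print("LOO error, k=1:", leave_one_out_error(samples, k=1))
```

Leave-one-out uses every sample once as a test case, which suits small data sets but costs one classifier evaluation per sample.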

ALN (Adaptive Logic Network; William W. Armstrong, Dept. of Comp. Sci., University of Alberta. arms@cs.ualberta.ca): "belongs to the class of artificial neural systems. ... uses only simple logical functions AND, OR, and NOT. In hardware, computations would be done in parallel in a tree of combinational logic gates."

Demonstration software in C-source form is available to researchers for non-commercial purposes only. (Contact author.)
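
The idea of computing with a tree of AND, OR and NOT gates can be sketched in a few lines of Python. This shows only the evaluation of a fixed tree. It is not Armstrong's software, and the adaptive (learning) part of an ALN is not attempted here. The nested-tuple representation, the evaluate function and the XOR_TREE example are assumptions made for illustration.

```python
def evaluate(node, inputs):
    """Evaluate a logic tree on a dict of named Boolean inputs.

    A node is either an input name (str) or a tuple:
    ("NOT", child), ("AND", left, right) or ("OR", left, right).
    """
    if isinstance(node, str):
        return inputs[node]
    op = node[0]
    if op == "NOT":
        return not evaluate(node[1], inputs)
    if op == "AND":
        return evaluate(node[1], inputs) and evaluate(node[2], inputs)
    if op == "OR":
        return evaluate(node[1], inputs) or evaluate(node[2], inputs)
    raise ValueError(f"unknown gate: {op!r}")


# Exclusive-or built only from AND, OR and NOT gates.
XOR_TREE = ("OR",
            ("AND", "a", ("NOT", "b")),
            ("AND", ("NOT", "a"), "b"))

if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, evaluate(XOR_TREE, {"a": a, "b": b}))
```

The recursion here is sequential. In hardware, as the quotation notes, the subtrees would be evaluated in parallel.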

Cluster (Andreas Stolcke, stolcke@ICSI.berkeley.edu): "cluster utility. ... performs Hierarchical Cluster Analysis (HCA) on a set of vectors and outputs the result in a variety of formats on standard output. ... performs Principal Component Analysis (PCA) on a set of vectors and prints the transformed set of vectors on standard output."

Available by anonymous ftp from ftp.icsi.berkeley.edu; cd pub/ai. The program is cluster-2.2.tar.Z
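
As a rough indication of what the PCA step described above involves, the following Python sketch finds the leading principal component of a set of vectors by power iteration and prints the transformed values. It is not taken from the cluster utility, which outputs the full transformed vectors. The function name first_principal_component, the choice of power iteration and the sample data are assumptions, and the data are assumed to have non-zero variance.

```python
from math import sqrt


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component.

    `rows` is a list of equal-length numeric vectors. The direction is a
    unit vector; the scores are the centred data projected onto it.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centred = [[r[j] - means[j] for j in range(dim)] for r in rows]

    # Sample covariance matrix (dim x dim).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(dim)]
           for a in range(dim)]

    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector of the largest eigenvalue, provided the start vector is
    # not orthogonal to it. An uneven start vector makes that unlikely.
    v = [1.0 / (j + 1) for j in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    scores = [sum(c[j] * v[j] for j in range(dim)) for c in centred]
    return v, scores


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    direction, scores = first_principal_component(data)
    print("direction:", direction)
    print("scores:", scores)
```

Further components could be obtained by removing the found direction from the data and repeating, although a library eigenvalue routine (such as those in EISPACK, mentioned above) is the usual choice.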

Voronoi diagram/Delaunay triangulation:

Summary of responses to a message in Vision-List Digest (20 April 1994); see below for the compiler of the summary and for subscription details to this Digest:

Algorithm by Steve Fortune is available from netlib@research.att.com
Use: "send sweep2 from voronoi"
The algorithm computes both Voronoi and Delaunay diagrams.

Quickhull by anonymous ftp from geom.umn.edu
get /pub/software/qhull.tar.Z
The algorithm computes the Delaunay triangulation and convex hull.
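
For readers who want to see what a convex hull computation looks like, here is a small Python sketch for points in the plane. It uses Andrew's monotone chain method, chosen here for brevity. It is not the Quickhull algorithm implemented by qhull, and it handles two dimensions only. The function names cross and convex_hull are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the 2-d convex hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Pop points that would make a clockwise or straight turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    square_with_centre = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
    print(convex_hull(square_with_centre))
```

The interior point (1, 1) is discarded and the four corners are returned. Collinear points on the hull boundary are also dropped because of the `<= 0` test.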

Dave Watson sent me a copy of nnsort.c which computes the Delaunay triangulation and convex hull in 2D and 3D.

Olivier Devillers sent a copy of deltree.c which computes the Voronoi/Delaunay diagrams and also has a function that returns the nearest neighbour point in the diagram to any arbitrarily chosen point. He also includes an interactive interface in SunView. (Comments in French)

"Computational Geometry in C", by Joseph O'Rourke, Cambridge University Press, 1994, ISBN 0-521-44592-2. This has complete programs for Voronoi/Delaunay diagrams.

[Message from feisal@ldc.uwi.tt in the moderated Vision-List Digest; membership requests to vision-list-request@teleos.com]

3-d voronoi diagrams:

vcs (John M. Sullivan, Geometry Center, Univ. Minn.; sullivan@geom.umn.edu): "code for 3-d voronoi diagrams". Available by anonymous ftp from: geom.umn.edu:pub/vcs.tar.Z

Also see a discussion of CART-type methods from Warren Sarle on sci.stat.math.