Here is one posting with references, and a response to that,
with further commentary and references. The subject is
"compositional analyses", i.e., what to do when the
categories add to 100%, and information about the ratio
of A to B, for instance, may be more important than the
absolute levels of A or B.
=====================Steve Cumming, 09 Oct 1995========ssc
From: stevec@geog.ubc.ca (Steve Cumming)
Subject: Re: Testing for differences in proportions
Message-ID: <45ca3n$rrq@nntp.ucs.ubc.ca>
>> ...snip
>I have seen a lot of individuals who have posted help for Maria on this
>problem. I am sure that they have intended to help her with her problem.
And he goes on to recommend Aitchison's book, The Statistical Analysis
of Compositional Data, singing the good professor's praises.
To which I add, hallelujah and amen. Not least among the book's virtues
is that it succeeds in explaining non-trivial multivariate methods so
that I almost feel I understand them.
I must take issue with Brian on one point, though. Compositional data
are not rare, at least in landscape ecology (my discipline), other
areas of ecological enquiry, geochemistry, food science (evidently)
and gawd knows what else. I'm appending a fairly complete bibliography
of Aitchison's work, and a number of recent forward references from the
ecological literature:
@string{jrssb = "J. Roy. Stat. Soc. Ser. B"} % Math Library
@string{mathgeol = "Math. Geol."} % Mathematical Geology, Main Stacks
@string{japecol = "J.\ Appl.\ Ecol."} % Journal of Applied Ecology, Woodward
@Article{aitchison82,
author = "J. Aitchison",
title = "The statistical analysis of compositional data",
journal = jrssb,
year = 1982,
volume = 44,
number = 2,
pages = "139-177",
annote = "With discussion."
}
@Article{aitchison84a,
author = "J. Aitchison",
title = "The statistical analysis of geochemical compositions",
journal = mathgeol,
year = 1984,
volume = 16,
number = 6,
pages = "531-564"
}
@Article{aitchison84b,
author = "J. Aitchison and S. M. Shen",
title = "Measurement Error in Compositional Data",
journal = mathgeol,
year = 1984,
volume = 16,
number = 6,
pages = "637-650"
}
@Article{aitchison84c,
author = "J. Aitchison",
title = "Reducing the Dimensionality of Compositional Data Sets",
journal = mathgeol,
year = 1984,
volume = 16,
number = 6,
pages = "617-635"
}
@Book{aitchison86,
author = "J. Aitchison",
title = "The Statistical Analysis of Compositional Data",
publisher = "Chapman and Hall",
year = 1986,
series = "Monographs on Statistics and Applied Probability",
address = "London",
annote = "A greatly expanded version of the original 1982
paper, with lots of examples of hypothesis testing"
}
@Article{aitchison92,
author = "J. Aitchison",
title = "On Criteria for Measures of Compositional Difference",
journal = mathgeol,
year = 1992,
volume = 24,
number = 4,
pages = "365-379"
}
% Some selected applications from the recent literature
% follow
@Article{aebischer93,
author = "N. J. Aebischer and P. A. Robertson and R. E. Kenward",
title = "Compositional Analysis of habitat use from animal
radio-tracking data",
journal = "Ecology",
year = 1993,
volume = 74,
number = 5,
pages = "1313-1325",
annote = "I'm sending this to Kim"
}
@Article{robertson93,
author = "P. A. Robertson and M. I. A. Woodburn and W. Neutel
and C. E. Bealey",
title = "Effects of land use on breeding pheasant density",
journal = japecol,
year = "1993",
volume = "30",
pages = "465-477"
}
@Article{clements91,
author = "A.-M. Clements and M. C. Jones",
title = "An ecological example of the application of
projection pursuit to compositional data",
journal = "Vegetatio",
year = "1991",
volume = "95",
pages = "101-107",
annote = "An interesting but unsuccessful attempt to relate
vegetation and soils patterns in New South Wales
using the latest neato-keen methods"
}
@Article{hermy91,
author = "M. Hermy and P. J. Lewi",
title = "Multivariate ratio analysis, a graphical method for
ecological ordination",
journal = "Ecology",
year = 1991,
volume = 72,
number = 2,
pages = "735-738"
}
@Article{rayens91,
author = "W. S. Rayens and C. Srinivasan",
title = "Box-{C}ox transformations in the analysis of
compositional data",
journal = "J. Chemometrics",
year = 1991,
volume = 5,
pages = "227-239",
annote = "Generalises Aitchison's transform to improve
multivariate normality in some cases. Also
discusses MLE methods for estimating confidence for
the 'true, unknown compositional constituents'."
}
%% Science Citation Index (that I haven't looked up yet)
%
% Quaternary Research 41 70 1994
% Oecologia 86 147 1991
% Behavioural Ecology 26 139 1990
% J Roy Stat Soc A 157 231 1994
% Biometrika 79 57 1983 QH 301 B5
% Ibis 136 39 1994 Woodward: treats method as routine
%% Silver Platter: search on Compositional near1 data
%
% Can. J. Plant Science 74(3) Mac S1 C35
% Silvae Genetica 42(6)
% Silvae Genetica 39 p 173
% Anatomical Record 240(4) 625-31 Woodward QL 801 A45
%% Further references abound in the Geological, medical, horticultural
%% and food science (ick!) literature
________________________________________________________________
...more on Compositional analysis. Watson.
=====================Dave Watson, 12 Oct 1995========ssm,sgg
From: watson@madvax (Dave Watson)
Newsgroups: sci.geo.geology,sci.stat.math
Subject: Re: (Wrong) Statistical Analysis of Compositional Data
Message-ID: <45hre5$ck5@styx.uwa.edu.au>
Steve Cumming (stevec@geog.ubc.ca) wrote:
: Actually, as far as I can gather, any good stats package should
: be able to do what you want, for example SAS. The only trick is
: doing the log-ratio transforms. I've written a C-language utility to
: do this, and am working on adding multivariate normality tests.
: You can have a copy, if you wish. So can anyone else, by
: writing me.
This is a misunderstanding. Compositional data, because it is
composed of mutually dependent components, should be thought of
only as "directional" data. The chemical, or mineral, composition
of a rock is a set of proportions - although we measure the
magnitudes of the components, it is a grave mistake to consider
the magnitude of a component, or the sum of the magnitudes of the
components, as being a significant characteristic of a composition.
Only the relative magnitudes, that is, proportions, of the
components are relevant. Those proportions define a direction or,
if you like, a vector with an undefined length. This means that
the difference between two compositions is an angle.
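To make the "vector with an undefined length" idea concrete, here is a minimal Python sketch that measures the difference between two compositions as the angle between them, using the ordinary cosine formula. The function name `composition_angle` and the sample compositions are hypothetical, chosen for illustration; this is not code from Watson's papers. Because the angle ignores vector length, rescaling a composition (fractions versus percentages, say) leaves the result unchanged.

```python
import math


def composition_angle(x, y):
    """Angle in radians between two compositions treated as vectors.

    Only the direction of each vector matters, so any positive rescaling
    (fractions, percentages, raw weights) gives the same angle.
    """
    if len(x) != len(y):
        raise ValueError("compositions must have the same number of parts")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] so round-off cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)


# Hypothetical three-part compositions, first as fractions, then as percentages.
a = [0.6, 0.3, 0.1]
b = [0.5, 0.3, 0.2]
print(math.degrees(composition_angle(a, b)))
print(math.degrees(composition_angle([60, 30, 10], [50, 30, 20])))  # same angle
```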
And that in turn means that treating compositional data on the
sum-to-one plane, or a ternary diagram, introduces significant errors.
Anyone who cares to can see these errors for themselves with a sketch
on the back of an envelope. Consider two pairs of compositions,
where each pair has the same small angular difference but one pair is
near the center of the ternary diagram while the other pair is near a
vertex. Your sketch will show that the distances between the members
of each pair are not the same, even though, by construction, both pairs
have the same angular difference.
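The back-of-an-envelope sketch can also be done numerically. The Python sketch below (all function names hypothetical) builds two pairs of unit vectors, each pair exactly 0.1 radian apart, one pair about the center of the ternary diagram and one near the A vertex. It then projects them onto the sum-to-one plane and compares the ordinary Euclidean distances there. The starting points [1, 1, 1] and [1, 0.1, 0.1] and the offset directions are arbitrary illustrative choices.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a point on the unit sphere)."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def close(v):
    """Project a positive vector onto the sum-to-one plane (the ternary diagram)."""
    s = sum(v)
    return [c / s for c in v]


def angle(x, y):
    """Angle in radians between two vectors."""
    cosine = sum(a * b for a, b in zip(x, y)) / (
        math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))
    return math.acos(max(-1.0, min(1.0, cosine)))


def pair_about(center, direction, theta):
    """Two unit vectors, theta radians apart, placed symmetrically about `center`.

    `direction` must be orthogonal to `center`; both are normalized here.
    """
    u, w = normalize(center), normalize(direction)
    if abs(sum(a * b for a, b in zip(u, w))) > 1e-12:
        raise ValueError("direction must be orthogonal to center")
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    p = [c * ui + s * wi for ui, wi in zip(u, w)]
    q = [c * ui - s * wi for ui, wi in zip(u, w)]
    return p, q


def euclid(x, y):
    """Ordinary Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


theta = 0.1  # the same angular difference, in radians, for both pairs

# One pair about the center of the diagram, one near the A vertex.
p1, q1 = pair_about([1, 1, 1], [1, -1, 0], theta)
p2, q2 = pair_about([1, 0.1, 0.1], [0, 1, -1], theta)

d_center = euclid(close(p1), close(q1))
d_vertex = euclid(close(p2), close(q2))
print(f"angles:    {angle(p1, q1):.4f}  {angle(p2, q2):.4f}")
print(f"distances: {d_center:.4f}  {d_vertex:.4f}")
```

Both angles print as 0.1000, but the two distances on the plane differ: where a pair sits on the diagram changes how far apart its members appear.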
The reason is simple. An angle is a distance in spherical space.
Distance in linear space, or any other space including log-ratio space,
is not the same as distance in spherical space. A fixed distance in
spherical space will not be invariant when projected onto another
space. When a set of compositions is analysed on the sum-to-one
plane, any intention to find clusters, classifications, density
contours, or discriminant criteria of any sort, will be confounded
by the errors introduced by the initial projection onto that plane.
No amount of corrective transformations, including log-ratio, will
properly compensate for that distortion. If you want the right
answer, you must treat your compositional data in spherical space.
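The claim about log-ratio space can be checked the same way. This Python sketch (function names hypothetical) uses the centered log-ratio (clr) transform, one of Aitchison's log-ratio transforms, and measures distance as the Euclidean distance between clr vectors. Choosing clr rather than another log-ratio transform is an assumption made for illustration. As in the envelope sketch, it compares two pairs of compositions separated by the same angle, one near the center of the diagram and one near a vertex. The pairs are built inside the block so it runs on its own.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def pair_about(center, direction, theta):
    """Two unit vectors, theta radians apart, symmetric about `center`.

    Assumes `direction` is orthogonal to `center`.
    """
    u, w = normalize(center), normalize(direction)
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ([c * a + s * b for a, b in zip(u, w)],
            [c * a - s * b for a, b in zip(u, w)])


def clr(x):
    """Centered log-ratio transform: log of each part relative to the geometric mean.

    The result is unchanged by rescaling x, so x need not sum to one.
    """
    logs = [math.log(c) for c in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def clr_distance(x, y):
    """Euclidean distance between clr vectors (Aitchison's log-ratio distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


theta = 0.1  # equal angular difference, in radians, for both pairs
p1, q1 = pair_about([1, 1, 1], [1, -1, 0], theta)      # near the center
p2, q2 = pair_about([1, 0.1, 0.1], [0, 1, -1], theta)  # near the A vertex

print(f"log-ratio distance near center: {clr_distance(p1, q1):.4f}")
print(f"log-ratio distance near vertex: {clr_distance(p2, q2):.4f}")
```

The two log-ratio distances differ even though the angles are equal, consistent with the point that a fixed spherical distance is not preserved by the transformation.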
For example, over the last 25 years or more a large number of very
complicated programs have been written to translate a chemical or
oxide composition into the appropriate mineral composition and so
identify the rock that was chemically analysed. But this problem
is simple in spherical space - the correct proportions of the
appropriate minerals, and only those, are directly specified by
the spherical natural neighbor coordinates of the chemical composition
taken with respect to the set of all minerals.
Philip, G.M. and Watson, D.F., 1988, Determining the representative
composition of a set of sandstone samples, Geol. Mag., 125(3), 267-272.
Philip, G.M. and Watson, D.F., 1988, Angles measure compositional
differences, Geology, 16, 976-979.
Philip, G.M. and Watson, D.F., 1989, Some geometric aspects of the
ternary diagram, J. Geol. Education, 37(1), 27-29.
Watson, D.F. and Philip, G.M., 1989, Measures of variability for
geological data, Math. Geol., 21(2), 233-254.
Watson, D.F., 1988, Natural neighbor sorting on the n-dimensional
sphere, Pattern Recognition, 21(1), 63-67.
Traditional and conventional statistical procedures cannot
adequately treat compositional data. Means, and higher moments,
were designed and intended to treat replicate data - that is,
repeated observations of INDEPENDENT variables, and have been
extended to apply to sets of independent events. Applying these
procedures to sets of independent events with associated dependent
variables gives a radically different result, which is surficial
rather than statistical. Again, applying these procedures to sets
of dependent variables gives yet another radically different result.
See the diagrams in Watson and Philip, 1989, which display the
same set of numbers after different independent/dependent
assumptions.
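How much a "mean composition" depends on the assumptions can be seen with a few hypothetical samples. This Python sketch computes three averages of the same closed compositions. The first is the ordinary arithmetic mean, which treats the parts as independent variables. The second is the closed geometric mean, which is the mean taken in Aitchison's log-ratio space. The third is a directional mean, the normalized resultant of the samples as unit vectors, in keeping with the angular view above. The directional mean here is the standard resultant-vector mean from directional statistics. That it matches the procedure in Philip and Watson (1988) is an assumption. The sample values and function names are hypothetical.

```python
import math


def close(v):
    """Rescale a positive vector so its parts sum to one."""
    s = sum(v)
    return [c / s for c in v]


def arithmetic_mean(samples):
    """Part-by-part arithmetic mean, treating the parts as independent variables."""
    n = len(samples)
    return close([sum(s[i] for s in samples) / n for i in range(len(samples[0]))])


def geometric_mean(samples):
    """Closed part-by-part geometric mean, i.e. the mean taken in log-ratio space."""
    n = len(samples)
    return close([math.exp(sum(math.log(s[i]) for s in samples) / n)
                  for i in range(len(samples[0]))])


def directional_mean(samples):
    """Direction of the resultant of the samples as unit vectors, closed to sum to one."""
    total = [0.0] * len(samples[0])
    for s in samples:
        norm = math.sqrt(sum(c * c for c in s))
        for i, c in enumerate(s):
            total[i] += c / norm
    return close(total)


# Hypothetical three-part compositions, for illustration only.
samples = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]

for name, mean in [("arithmetic", arithmetic_mean),
                   ("geometric", geometric_mean),
                   ("directional", directional_mean)]:
    print(f"{name:>11}: {[round(c, 4) for c in mean(samples)]}")
```

The three averages differ, even though the input numbers are identical.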
John Aitchison has vehemently disputed these conclusions. But
although he readily agrees that the differences between compositions
are angles, his refutation depends upon unsubstantiated assertions,
derogatory innuendo, and ridicule. This is understandable: who
could provide a reasoned and logical denial of the evidence
given by the back-of-an-envelope sketch?
Aitchison, J., 1990, Comments on "Measures of variability for
geological data", Math. Geol., 22(2), 223-226.
Aitchison, J., 1991, Delusions of uniqueness and ineluctability,
Math. Geol., 23(2), 275-277.
Aitchison, J., 1992, On criteria for measures of compositional
difference, Math. Geol., 24(4), 365-379.
--
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html