Skewness and outliers (1997)
This file has general overviews of skewness
and outliers. Transformations and
nonparametrics are related topics. Most of the
notes were posted by me, but there are extensive
citations of earlier notes.
Which power transformation?
Robustness of normal. REFs Moore
Power transformations.
What's a "heavy tail"?
Outliers and fat tails?
What is "extreme" skewness?
*----------------
Which power transformation?
=======================Rich Ulrich, 27 Jan 1997==========ssc
Subject: Re: Which transfomation?
Message-ID: <5cjco3$ldm@usenet.srv.cis.pitt.edu>
Mark Myatt (mark@myatt.demon.co.uk) wrote:
: Albert Craig writes:
: >Can anyone help me find a test or guide/book that will
: >enable me to choose the appropriate transformation in order
: >to be able to use parametric stats on my data. The major
: >problem is the heterogeneity of the variances (lack of
: >homoscedasticity)
: Here is a table from my book "Analysing Data" (Brixton Books,
: ISBN 1-873937-46-6, sorry for the plug!) but you'll find something similar in
: most applied stats books:
: Problem     Severity / Nature    Transformation
: ---------   ------------------   --------------
: +ve skew    severe               1/x
:             moderate             log(x)
:             slight               sqr(x)
-- I would like to note that if this is considered strictly true, it
must be a matter of DEFINITION or tautology -- that is, "severe
skewness" is "skewness which is mended by the 1/x transformation."
I discovered when doing some simulations that having a coefficient
of kurtosis of .7 (in my example) did not hurt my t-test, if the
underlying transformation needed was the square root, whereas it DID
matter if the transformation was the logarithm.
How did I know what was NEEDED? - easy, because I started with normal
samples, and transformed AWAY from normal.
If you take a N(3,1) and square it, you get something with much
higher kurtosis than if you take N(10,1) and square it; or N(3,0.3)
and square it. If you take N(0,1) and exponentiate, you get a lot
higher kurtosis than if you start with N(0,0.1).
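A minimal sketch of a simulation along these lines (not the code from the original post). It assumes the second parameter in N(3,1), N(3,0.3), etc. is the standard deviation, which the post does not state; under that reading N(3,0.3) is just N(10,1) rescaled by 0.3, so the two squares have the same shape. The sample size and seed are arbitrary choices.

```python
import math
import random


def excess_kurtosis(values):
    """Sample excess kurtosis: fourth central moment / variance^2, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def simulate(mean, sd, transform, n=100_000, seed=1997):
    """Draw n values from N(mean, sd), transform them away from normal,
    and return the excess kurtosis of the transformed sample."""
    rng = random.Random(seed)
    return excess_kurtosis([transform(rng.gauss(mean, sd)) for _ in range(n)])


def square(x):
    return x * x


# Assumption: the second argument is the standard deviation.
results = {
    "N(3,1) squared": simulate(3.0, 1.0, square),
    "N(10,1) squared": simulate(10.0, 1.0, square),
    "N(3,0.3) squared": simulate(3.0, 0.3, square),
    "exp of N(0,1)": simulate(0.0, 1.0, math.exp),
    "exp of N(0,0.1)": simulate(0.0, 0.1, math.exp),
}

for label, kurt in results.items():
    print(f"{label:18s} excess kurtosis = {kurt:9.3f}")
```

The smaller the spread relative to the mean (for squaring) or the smaller the spread itself (for exponentiating), the closer the transformation is to linear over the range of the data, and the less it distorts the normal shape.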
My usual approach is to look at the extremes, the quartiles and the
median, and see which transformation makes them most symmetric.
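One way this check might be automated, as a rough sketch only: the asymmetry score below (quartile imbalance plus extreme imbalance, each scaled by its own range) is an illustrative assumption, not a formula from the post, and it assumes strictly positive, continuous data so that sqrt, log and 1/x are all defined.

```python
import math
import random
import statistics


def asymmetry(values):
    """Score how lopsided the five-number summary is (0 = symmetric).

    Compares the distances from the median to each quartile and to
    each extreme. Hypothetical score, chosen only for illustration.
    """
    lo, hi = min(values), max(values)
    q1, med, q3 = statistics.quantiles(values, n=4)
    quartile_part = abs((q3 - med) - (med - q1)) / (q3 - q1)
    extreme_part = abs((hi - med) - (med - lo)) / (hi - lo)
    return quartile_part + extreme_part


# Candidate transformations, mildest to most severe.
TRANSFORMS = {
    "none": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log,
    "1/x": lambda x: 1.0 / x,
}


def rank_transforms(values):
    """Return (name, score) pairs, most symmetric transformation first."""
    scores = {name: asymmetry([f(v) for v in values])
              for name, f in TRANSFORMS.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Demonstration on data known to need the log: a lognormal sample.
rng = random.Random(1997)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
ranking = rank_transforms(sample)
for name, score in ranking:
    print(f"{name:5s} asymmetry = {score:.3f}")
```

The extremes are noisy in small samples, so a score like this is a screening aid; looking at the five numbers themselves is still the point.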
*--------
Robustness of normal. REFs
=======================David S Moore, 26 Mar 1997==========sse
From: dsm@b.stat.purdue.edu (David S. Moore)
Subject: Re: sensitivity of normal theory methods
Message-ID: <5hbscv$21rk@b.stat.purdue.edu>
A compact discussion of the robustness of normal theory methods
for inference about _means_ and the lack of robustness of normal
theory methods for inference about _spread_ appears (with references)
in Chapter 7 of Moore and McCabe, Introduction to the Practice of
Statistics.
Here is a partial list of simulation studies:
E. S. Pearson and N. W. Please, Relation between the shape of
population distribution and the robustness of four simple test
statistics, Biometrika, 62 (1975), 223--241.
Posten, H. O. (1978) The robustness of the two-sample t-test
over the Pearson system, J. of Statistical Computation and
Simulation 6, 295-311
Posten, H. O. (1979) The robustness of the one-sample t-test
over the Pearson system, J. of Statistical Computation and
Simulation 9, 133-149
Posten, H. O., Yeh, H. and Owen, D.B. (1982) Robustness of the
two-sample t-test under violations of the homogeneity assumption,
Communications in Statistics 11, 109-126.
Posten, H. O. (1982) Small sample power of the Wilcoxon test over
the Pearson system, and comparison with the t-test, J. of Statistical
Computation and Simulation 16, 1-18.
Posten, H. O. (1984) The Robustness of the t-test, Proceedings
of the Conference on Statistical Robustness and Nonparametric
Statistics Schwerin, East Germany, May, 1983. VEB Deutscher
Verlag der Wissenschaften, Berlin, pp. 92-99.
Posten, H.O. (1992) Robustness of the Two-Sample t-test Under
Violations of the Homogeneity Assumption, Part II.
J. Comp. & Simul. 8, 2169-2184.
David Moore
*--------
Power transformations
=======================Rich Ulrich, 29 Jan 1997==========ssc
Subject: Re: Which transfomation?
Message-ID: <5cnp8a$aae@usenet.srv.cis.pitt.edu>
Mark Myatt (mark@myatt.demon.co.uk) wrote:
: Richard F Ulrich writes:
: > -- I would like to note that if this is considered strictly true, it
: >must be a matter of DEFINITION or tautology -- that is, "severe
: >skewness" is "skewness which is mended by the 1/x transformation."
: You are right. These are (only) rules of thumb but like most heuristics
: are useful. It's not quite that "tautological" as in the ordered
: response to skew "severe" means skewness NOT fixed by a log or square
: root transformation.
- I like that phrase, "the ordered response to skew". And it
inspires further thought and comment:
- On the one hand, "all things being equal", we should use the least
severe transformation that does the job (in practice, there is often
no doubt, but this is an abstract point).
- On the other hand, if the reciprocal changes Frequency to
Wavelength, is it really a "transformation" or is it
a re-parameterization? And, the square root is milder than the
log transform, but the latter is easier to talk about, and is
frequently a "natural" candidate, when biological or chemical
processes are involved. So, "all things" are not always equal.
: >My usual approach is to look at the extremes, the quartiles and the
: >median, and see which transformation makes them most symmetric.
: This is better than the application of blind rules.
: Best wishes,
*--------
What's a heavy tail?
=======================Rich Ulrich, 1 Jun 1997==========sse
Subject: Re: What's a heavy tail?
Message-ID: <5msl6c$ral@usenet.srv.cis.pitt.edu>
Ursula Kellett (kellett@pigeon.qut.edu.au) wrote:
: I have read some times that a heavy tail(s) is not so good when it comes
: to inference. But what is a 'heavy' tail?
: I have also read about 'long', 'thick', and 'high' tails!
: My presumption is that _any_ shape qualifies as being 'heavy' if it
: increases the alpha probability at a given critical value compared to the
: reference non-heavy tailed distribution.
-- As a practical matter, I would say that your criterion is a
"sufficient" one, but it is not a "necessary" one. That is, if it
increases the alpha, then that qualifies it as heavy. But if it
decreases the alpha, THAT would also qualify it as heavy; for
instance, the t-distribution tends to have short tails, when the
data have long tails, for unequal Ns.
: If this sounds ok, then it follows that a more powerful test is performed
: than would otherwise have occurred???
-- If the effective alpha is increased, then the direction of error is
to make the test "more powerful" in the sense that it will reject more
often. (I prefer to use the term "more power" when doing comparisons
between two tests that remain valid, where "not valid" is what we call
a test whose actual size is larger than its purported size.)
-- Here is one way to get fat tails from Normal distributions. Consider
the mixture of two Normals, where N1=90% or so has variance of 1, and
N2=10% has a variance of 10; or 100. Whatever you get as the estimate of
the variance of the mixture, the "VARIANCE of the variance" will be
larger than you would expect for the total N - In effect, the sample
acts as if the 10% WERE the total sample, when it comes to degrees of
freedom.
In the same way, even one or two EXTREME outliers can ruin the variance
properties of ANY sample, making it unfit for ANOVA or correlation, etc.,
no matter how large the total N.
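The mixture example can be checked with a small simulation. This is a sketch under stated assumptions, not code from the post: it uses the 90%/10% split with variances 1 and 100, samples of N=100, and compares against a single Normal with the same overall variance. The "effective degrees of freedom" is computed as 2 * mean(s^2)^2 / var(s^2), which is about N-1 for Normal data.

```python
import math
import random


def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def draw_mixture(rng, wide_share=0.10, wide_sd=10.0):
    """One draw from the mixture: 90% N(0, sd=1), 10% N(0, sd=10)."""
    sd = wide_sd if rng.random() < wide_share else 1.0
    return rng.gauss(0.0, sd)


# A single Normal with the same variance as the mixture (0.9*1 + 0.1*100).
MATCHED_SD = math.sqrt(0.9 * 1.0 + 0.1 * 100.0)


def draw_normal(rng):
    return rng.gauss(0.0, MATCHED_SD)


def effective_df(draw, n=100, reps=2000, seed=1997):
    """Estimate 2 * E[s^2]^2 / Var(s^2) over many samples of size n."""
    rng = random.Random(seed)
    variances = [sample_variance([draw(rng) for _ in range(n)])
                 for _ in range(reps)]
    mean_var = sum(variances) / reps
    return 2.0 * mean_var ** 2 / sample_variance(variances)


df_normal = effective_df(draw_normal)
df_mixture = effective_df(draw_mixture)
print(f"effective df, pure Normal: {df_normal:6.1f}")
print(f"effective df, mixture:     {df_mixture:6.1f}")
```

Under these assumptions the variance estimate from the mixture behaves roughly as if only the wide 10% of the sample were contributing, which is the sense in which the smaller component acts as the total sample.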
*--------
Outliers and fat tails
=======================Rich Ulrich, 9 May 1997==========ssc,ssm
Subject: Re: Testing For Normal Distribution
Message-ID: <5l04f6$j86@usenet.srv.cis.pitt.edu>
Paul Velleman (pfv2@cornell.edu) wrote:
: In article <5ktgvs$a67@usenet.srv.cis.pitt.edu>, (Richard
: F Ulrich) wrote:
: > I do remember a consultee who liked his data as it was, because it was
: > very wonderfully normal, except for just 1 or 2 really, really,
: > extreme data values.
: >
: > So here is a clue: If your results change much when you drop an
: > extreme point or two, then you do NOT have a "moderate deviation
: > from normality." (This still means you want to define when "results"
: I also agree with Clay. But I don't agree with Richard's drift here. If you
: have normal data with a few outliers, separate the outliers and analyze the
: rest of the data accepting the normality assumption. Then look at the
: outliers in terms of the models you fit to the bulk of the data. Never let
: an occasional outlier dictate anything about the distribution.
: Distributions are about the mass of the data. Outliers are often evidence
: of inhomogeneity -- that is, they come from some other process and should
: be dealt with separately.
-- and I further agree with Paul - there are SEVERAL good ways to
deal with outliers. I hope I can clarify my "drift" which drifted too
broadly.
My point was the narrower one, there is one BAD way to deal with
outliers, and that is to fold them into your parametric analysis;
there are other kinds of non-normal, but the Outlier is a sneaky
one which is sometimes overlooked.
It takes experience to make that decision that Paul finds automatic -
remove the extreme outlier. Or transform it, do SOMETHING with it.
I *have* had the experience where my advisee wanted to do his
tests with one or two extremes still in there - the data, in his view,
was "nearly normal" because there were only a couple of data values
that anyone might object to. Since 99% of his points were OK, he
figured the rest could be forgiven, I guess.... which is not quite how
it works.
And Clay's advice accidentally, wrongly, implies that "large n" is
a panacea - yes, large n is good, but *not* a cure-all. No matter
how large the n, one huge data value CAN do things like double the
Variance, or move the average by a ridiculous amount.
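A small illustration of that last point, as a sketch with made-up numbers: standard-normal data of several sizes, each with one added value of 1000. The outlier value, the sample sizes, and the helper names are hypothetical choices for the demonstration.

```python
import random


def mean_and_variance(values):
    """Return the sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def outlier_effect(n, outlier, seed=1997):
    """Add one outlier to n standard-normal values.

    Returns (shift in the mean, ratio of new variance to old variance).
    """
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    clean_mean, clean_var = mean_and_variance(clean)
    dirty_mean, dirty_var = mean_and_variance(clean + [outlier])
    return dirty_mean - clean_mean, dirty_var / clean_var


effects = {n: outlier_effect(n, outlier=1000.0) for n in (100, 10_000, 100_000)}
for n, (shift, ratio) in effects.items():
    print(f"n = {n:7d}: mean shifts by {shift:8.3f}, variance x {ratio:10.1f}")
```

The same comparison, run with and without the one or two most extreme values of a real data set, is a quick way to see whether the results "change much" when those points are dropped.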
*--------
What is "extreme" skew ...?
=======================Rich Ulrich, 07 Mar 1997==========spss
Subject: Re: a question about those small unnoticeable indices (skewness, kurtosis. Kaiser-Meyer-Olkin, etc.)
Message-ID: <5fpe3e$iab@usenet.srv.cis.pitt.edu>
Amir Hetsroni (amiron@michtec.macam98.ac.il) wrote:
: Can any of the pros around can give me a good estimation what is the
: highest rate of kurtosis and skewness parametric tests and central
: tendency indices can live with intact more or less?
-- With variables that are dichotomies or just a few scale points,
the measured kurtosis and skewness do not say much.
I haven't worried about the measures of kurtosis; I do look for
outliers, but they are usually on one end of the scale.
With continuous variables, samples of 75 or so, I have found that
skewness less than .3 never worries me. For the same measured
skewness, the effect on inference will be WORSE if taking logs
would produce symmetry, rather than just taking square roots.
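For reference, a sketch of how a "measured skewness" could be computed and compared with the .3 figure. It uses the plain moment coefficient m3 / m2^1.5; statistical packages often apply a small-sample correction, so their reported values may differ slightly. The two data lists are invented for illustration.

```python
def sample_skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented examples: a symmetric set and one with a single large value.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
one_extreme = [1.0, 1.0, 1.0, 10.0]

for label, data in (("symmetric", symmetric), ("one extreme", one_extreme)):
    skew = sample_skewness(data)
    flag = "under .3" if abs(skew) < 0.3 else "look closer"
    print(f"{label:12s} skewness = {skew:6.3f}  ({flag})")
```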
I wish I could say more, but I just note that you do ask your
question precisely, concerning "parametric tests" and "central
tendency indices." Having *one* extreme outlier can mess up
parametric tests, and can move an average far from the mode or
median -- which is what is often meant by "central tendency."
I would be interested in hearing of rules-of-thumb, for when one
MUST trim/transform/drop one (or more) outlier, for ANOVA-test
purposes.
<< faq snip >>
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html