- file 95bonf.html ->
Multiple testing, Bonferroni (1995).
Multiple testing, Bonferroni, effect size.
(several).
=====================John Whittington, 25 Sep 1995========ssc
Message-ID: <199509251613.RAA21583@mag-net.co.uk>
From: John Whittington
Subject: Re: When to correct
On Fri, 22 Sep 1995, Patrick Fleury wrote:
> Test E is the sticky one. In division 1 it is just barely significant
>( p = .04 ) and in division 2 it is not significant ( p = .30 ). Test E
>is also something of an internal problem for my client. If it is significant,
>he will need to face some unpleasant facts.
>
> Question: Should I correct for multiple tests using a Bonferroni
>correction?
>
> If I do, then test E goes away and we have a nice tidy result. But
>then my client will be criticized for using a statistical "trick" to ignore a
>possible problem.
>
> If I do not, then my client is faced with his other problem.
Pat, this is a common dilemma which I suspect we are all faced with from
time to time, particularly in consultancy.
The greatest issue here is probably not the mathematical wrongs and rights
of undertaking the correction. Perhaps the greatest lesson is that one
should define one's analytical intentions, 'a priori' in a written 'analysis
plan'. Had you done that, you would not be asking this question now. Had
your plan said that you would apply Bonferroni corrections, then you would
not have been faced with a p-value below the magic threshold of 0.05. Had
your plan said that you would *not* undertake such corrections then, again,
the question would not have arisen.
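To make the arithmetic behind Pat's question concrete, here is a minimal Python sketch of the Bonferroni rule for the 10 tests described (5 tests on each of 2 divisions). Only the two p-values for test E (0.04 and 0.30) come from the post; the function name `bonferroni_reject` and the overall level of 0.05 are assumptions made for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05, n_tests=None):
    """Return the per-test threshold and a reject/keep decision for each p-value.

    n_tests is the size of the whole family of tests; it defaults to the
    number of p-values supplied.
    """
    m = n_tests if n_tests is not None else len(p_values)
    threshold = alpha / m  # Bonferroni: split the overall level evenly
    return threshold, [p <= threshold for p in p_values]


# p-values for test E in divisions 1 and 2, as quoted in the post
p_test_e = [0.04, 0.30]

# Family of 10 tests: A to E on each of the two divisions
threshold, decisions = bonferroni_reject(p_test_e, alpha=0.05, n_tests=10)

# The same p-values judged against the uncorrected 0.05 level
uncorrected = [p <= 0.05 for p in p_test_e]

print("per-test threshold:", threshold)
print("significant after correction:", decisions)
print("significant without correction:", uncorrected)
```

With the correction, the division 1 result (p = 0.04) no longer falls below the per-test threshold, which is exactly the choice that a written analysis plan would have settled in advance.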
Perhaps even more worrying is the way in which these p-values appear to be
regarded as having divine inspiration. To suggest that your client 'will
have to face unpleasant facts' if p=0.04, but that (s)he won't if p=0.051 is
carrying the 'magic' of p=0.05 a little too far, isn't it? Furthermore, what
about the *magnitude* of this effect, quite apart from any p-value? It is
surely much more likely to be the magnitude, rather than the p-value, that
is going to determine whether your client has a major cause for
potential concern.
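One way to look at magnitude separately from the p-value, for a chi-square goodness-of-fit test, is Cohen's w: the square root of the chi-square statistic divided by the sample size. The sketch below is an illustration only. The counts are invented (they are not Pat's data), the 50/50 expected split is an assumption, and the function name `chi_square_gof` is hypothetical.

```python
import math


def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, Cohen's w) for a goodness-of-fit test."""
    n = sum(observed)
    chi2 = sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, expected_props))
    w = math.sqrt(chi2 / n)  # effect size: does not grow with sample size
    return chi2, w


# Invented counts: identical observed proportions (60% / 40%),
# but the second sample is ten times larger than the first.
expected = [0.5, 0.5]
small = [60, 40]
large = [600, 400]

chi2_small, w_small = chi_square_gof(small, expected)
chi2_large, w_large = chi_square_gof(large, expected)

print("small sample: chi2 =", chi2_small, " w =", w_small)
print("large sample: chi2 =", chi2_large, " w =", w_large)
```

The test statistic (and hence the p-value) changes with the sample size while w stays put, which is why the p-value alone says little about whether the effect is large enough to worry about.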
With the situation as it is, I would think that by far the best
interpretation (possibly modified by knowledge of the effect *magnitude*) is
that this one area is rather 'iffy' (whether p be 0.04, 0.051 or 0.06) and
that no firm conclusion can be drawn. The situation needs to be looked
into, maybe other/more data needs to be collected, but I certainly don't
think that your client should commit suicide (or book his/her world cruise) yet!
In terms of other sorts of data, this is the situation where confidence
intervals can cause so many fewer headaches than p-values!
...that's my two cents/pence' worth, anyway.
John
=================Rich Ulrich, 26 Sep 1995========ssc
From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: When to correct
Message-ID: <4497j3$9qj@usenet.srv.cis.pitt.edu>
Patrick Fleury (pfleury@popmail.mcs.com) wrote:
<: --Pat F : pfleury@mcs.com >
: This is a question which I ran up against recently and
: on which I would like to see a little discussion. It concerns
: how many tests to run before putting in corrections for multiple
: comparisons.
: Let me give an example. I have a client who has two different
: but parallel datasets. They are the same data on two different divisions
: of his business. He has 5 chi-square goodness of fit tests to run on each
: for a total of 10 tests.
: Let me call the divisions 1 and 2 and the tests A,B,C,D and E.
: Now, tests A B and C turn out to be very significant for both divisions.
< deleted, stuff about varying outcomes, leading to question of
whether Bonferroni adjustment should be made...>
What is an `experiment'? What is a `conservative approach'? What is
magical about the 0.05 test level? - I would like to endorse the
advice given by JohnW, that you should be concerned about magnitude of
effects, and not hypnotized by any stated cut-off like .05. But I
would like to say something more about the question of testing.
You do not necessarily have an `experimental hypothesis' just because
you can perform a TEST. If I do a randomized trial, it may be
important to me to check that two groups got randomized pretty evenly
in terms of age and sex (say). However, to be conservative in
testing my outcome, I do *not* take (Outcome, Age, Sex) as three
experimental hypotheses; instead, I test Age and Sex separately,
as potentially confounding variables that I have to worry about,
being CAREFUL to warn about any suggestion of differences, even if
they do not reach the .05 level. Then I test Outcome, and if age
and sex WERE important to outcome, then I should control for them
in my analysis: In Anova, that would reduce the size of my residual
Error, even if it does not change the size of the estimated effect.
In the described results, several variables (A,B,C) have LARGE effects
on one outcome; the idea occurs to me that a) those effects should
be tested simultaneously, and b) perhaps those effects were quite
readily expected beforehand, putting them in a different status
or perspective from the two variables D and E. (Are we looking at
small samples and therefore large effects, or very large samples,
and relatively small effects...?)
If these several different variables really belong on one equal
platform for comparison, that is, they really are equal parts of
one `experiment', then the reported Outcome seems to be dominated
by the size of the unequivocal effects on the variables A, B, and C.
The fact that someone is still worried about E suggests to me that
it should be regarded as a separate `experiment'. Do you want a
5% error rate on THIS test? Or on the whole set of tests?
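The difference between a 5% error rate on one test and on the whole set can be put in numbers. The sketch below assumes the 10 tests are independent and that every null hypothesis is true, which may well not hold for two parallel datasets from the same business; the function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha_per_test, n_tests):
    """Chance of at least one false positive among n_tests independent tests,
    each run at level alpha_per_test, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests


# One test at the 0.05 level
fwer_single = familywise_error_rate(0.05, 1)

# Ten tests, each at the uncorrected 0.05 level
fwer_uncorrected = familywise_error_rate(0.05, 10)

# Ten tests, each at the Bonferroni level of 0.05 / 10
fwer_bonferroni = familywise_error_rate(0.05 / 10, 10)

print("one test at 0.05:           ", round(fwer_single, 4))
print("ten tests at 0.05 each:     ", round(fwer_uncorrected, 4))
print("ten tests at 0.005 each:    ", round(fwer_bonferroni, 4))
```

Under these assumptions, running all ten tests at 0.05 gives roughly a 40% chance of at least one spurious "significant" result, while the Bonferroni level keeps the chance for the whole set just under 5%. Which of those two error rates matters is the question of what counts as one `experiment'.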
If you (or the client) don't want to see an effect in E, then it
is self-serving to apply Bonferroni correction to minimize the
public statement about the effect. But whether it is FAIR or
not depends on other facts and values.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html