<- file 95bonf.html -> Multiple testing, Bonferroni (1995).
  • Multiple testing, Bonferroni, effect size. (several).
  • =====================John Whittington, 25 Sep 1995========ssc
Message-ID: <199509251613.RAA21583@mag-net.co.uk>
From: John Whittington <johnw@MAG-NET.CO.UK>
Subject: Re: When to correct

On Fri, 22 Sep 1995, Patrick Fleury <pfleury@POPMAIL.MCS.COM> wrote:

> Test E is the sticky one. In division 1 it is just barely significant
> ( p = .04 ) and in division 2 it is not significant ( p = .30 ). Test E
> is also something of an internal problem for my client. If it is significant,
> he will need to face some unpleasant facts.
>
> Question: Should I correct for multiple tests using a Bonferroni
> correction?
>
> If I do, then test E goes away and we have a nice tidy result. But
> then my client will be criticized for using a statistical "trick" to ignore a
> possible problem.
>
> If I do not, then my client is faced with his other problem.

Pat, this is a common dilemma which I suspect we are all faced with from time to time, particularly in consultancy. The greatest issue here is probably not the mathematical rights and wrongs of undertaking the correction. Perhaps the greatest lesson is that one should define one's analytical intentions 'a priori', in a written 'analysis plan'. Had you done that, you would not be asking this question now. Had your plan said that you would apply Bonferroni corrections, then you would not have been faced with a p-value below the magic threshold of 0.05. Had your plan said that you would *not* undertake such corrections then, again, the question would not have arisen.

Perhaps even more worrying is the way in which these p-values appear to be regarded as having divine inspiration. To suggest that your client 'will have to face unpleasant facts' if p=0.040, but that (s)he won't if p=0.051, is carrying the 'magic' of p=0.05 a little too far, isn't it? Furthermore, what about the *magnitude* of this effect, quite apart from any p-value?
It is surely much more likely to be the magnitude, rather than the p-value, that will determine whether your client has a major cause for concern. As the situation stands, I would think that by far the best interpretation (possibly modified by knowledge of the effect *magnitude*) is that this one area is rather 'iffy' (whether p be 0.040, 0.051 or 0.060) and that no firm conclusion can be drawn. The situation needs to be looked into, and maybe other or more data need to be collected, but I certainly don't think that your client should commit suicide (or book his/her world cruise) yet!

In terms of other ways of presenting the data, this is the situation where confidence intervals cause so many fewer headaches than p-values!

...that's my two cents'/pence' worth, anyway.

John

=================Rich Ulrich, 26 Sep 1995========ssc
From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: When to correct
Message-ID: <4497j3$9qj@usenet.srv.cis.pitt.edu>

Patrick Fleury (pfleury@popmail.mcs.com) wrote:
: --Pat F
: pfleury@mcs.com
: This is a question which I ran up against recently and
: on which I would like to see a little discussion. It concerns
: how many tests to run before putting in corrections for multiple
: comparisons.
: Let me give an example. I have a client who has two different
: but parallel datasets. They are the same data on two different divisions
: of his business. He has 5 chi-square goodness of fit tests to run on each
: for a total of 10 tests.
: Let me call the divisions 1 and 2 and the tests A,B,C,D and E.
: Now, tests A B and C turn out to be very significant for both divisions.

< deleted: material about varying outcomes, leading to the question of whether a Bonferroni adjustment should be made... >

What is an `experiment'? What is a `conservative approach'? What is magical about the 0.05 test level?
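The arithmetic behind Pat's question can be sketched in a few lines of Python. This is a minimal illustration, not something from the thread: the helper name `bonferroni_adjust` is hypothetical. Because the thread leaves open whether the "family" is the 5 tests within one division or all 10 tests, the sketch shows both cases. A Bonferroni-adjusted p-value is the raw p-value multiplied by the number of tests m, capped at 1. Comparing that to 0.05 is equivalent to comparing the raw p-value to 0.05 / m.

```python
# Bonferroni adjustment: multiply each raw p-value by the number of tests m,
# capping at 1.0 (equivalent to comparing the raw p-value against alpha / m).

def bonferroni_adjust(p_values, m=None):
    """Return Bonferroni-adjusted p-values; m defaults to len(p_values)."""
    m = len(p_values) if m is None else m
    return [min(1.0, p * m) for p in p_values]

# Test E's raw p-values, as quoted in Pat's post.
raw_e = {"division 1": 0.04, "division 2": 0.30}

# Assumed family sizes: 5 tests per division, or all 10 tests together.
for family_size in (5, 10):
    adjusted = bonferroni_adjust(list(raw_e.values()), m=family_size)
    for (division, p), p_adj in zip(raw_e.items(), adjusted):
        print(f"m={family_size:2d}  {division}: raw p={p:.2f}  adjusted p={p_adj:.2f}")
```

With either choice of family, the p = .04 from division 1 becomes 0.20 or 0.40 and no longer clears 0.05. That is why "test E goes away." It is also why John argues that the choice belongs in the analysis plan, made before the p-values are seen.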
I would like to endorse the advice given by JohnW, that you should be concerned about the magnitude of effects, and not hypnotized by any stated cut-off like .05. But I would like to say something more about the question of testing.

You do not necessarily have an `experimental hypothesis' just because you can perform a TEST. If I do a randomized trial, it may be important to me to check that the two groups got randomized pretty evenly in terms of age and sex (say). However, to be conservative in testing my outcome, I do *not* take (Outcome, Age, Sex) as three experimental hypotheses. Instead, I test Age and Sex separately, as potentially confounding variables that I have to worry about, being CAREFUL to warn about any suggestion of differences, even if they do not reach the .05 level. Then I test Outcome. If age and sex WERE important to outcome, then I should control for them in my analysis: in Anova, that would reduce the size of my residual Error, even if it does not change the size of the estimated effect.

In the described results, several variables (A, B, C) have LARGE effects on one outcome. It occurs to me that a) those effects should be tested simultaneously, and b) perhaps those effects were quite readily expected beforehand, putting them in a different status or perspective from the two variables D and E. (Are we looking at small samples and therefore large effects, or very large samples and relatively small effects...?)

If these several different variables really belong on one equal platform for comparison, that is, they really are equal parts of one `experiment', then the reported Outcome seems to be dominated by the size of the unequivocal effects on the variables A, B, and C. The fact that someone is still worried about E suggests to me that it should be regarded as a separate `experiment'. Do you want a 5% error rate on THIS test? Or on the whole set of tests?
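Rich's closing question, a 5% error rate on one test or on the whole set, can be made concrete with the family-wise error rate. This is the chance of at least one false "significant" result when every null hypothesis is true. The Python sketch below is not from the thread, and `fwer_independent` is a hypothetical helper name. It assumes, for illustration only, that Pat's 10 tests are independent. That is doubtful for several tests run on the same division's data. Bonferroni's guarantee that the family-wise rate stays at or below 0.05 does not need the independence assumption.

```python
# Family-wise error rate (FWER): P(at least one false positive among m tests)
# when all null hypotheses are true. The exact formula below ASSUMES the
# m tests are independent; Bonferroni's bound (m * alpha/m = alpha) does not.

def fwer_independent(alpha_per_test, m):
    """P(at least one rejection) for m independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha_per_test) ** m

alpha = 0.05
m = 10  # Pat's 5 tests x 2 divisions

uncorrected = fwer_independent(alpha, m)     # each test run at 0.05
bonferroni = fwer_independent(alpha / m, m)  # each test run at 0.005

print(f"Uncorrected: per-test {alpha:.3f}, family-wise {uncorrected:.3f}")
print(f"Bonferroni:  per-test {alpha / m:.3f}, family-wise {bonferroni:.3f}")
```

Under these assumptions, running all 10 tests at 0.05 gives roughly a 40% chance of at least one spurious finding. Running each at 0.005 holds the whole set just under 5%. Which of these rates matters is exactly the judgment Rich says depends on whether E is part of one `experiment' or a separate one.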
If you (or the client) don't want to see an effect in E, then it is self-serving to apply Bonferroni correction to minimize the public statement about the effect. But whether it is FAIR or not depends on other facts and values. * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
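Rich's remark that controlling for age in an Anova reduces the residual Error, even when the estimated effect barely moves, can be illustrated with a small simulation. This sketch is not from the thread. The trial is hypothetical, with a made-up treatment effect of 2.0, an age slope of 0.3, and noise SD 3, and the function names are hypothetical too. It adjusts for age only, using a two-group analysis of covariance with a common within-group slope and only the Python standard library.

```python
import random

# Hypothetical randomized trial (made-up parameters, illustration only):
# outcome = 2.0 * treatment + 0.3 * age + noise.
def simulate(n_per_group=50, seed=1):
    rng = random.Random(seed)
    rows = []
    for group in (0, 1):
        for _ in range(n_per_group):
            age = rng.uniform(20, 70)
            outcome = 2.0 * group + 0.3 * age + rng.gauss(0, 3)
            rows.append((group, age, outcome))
    return rows

def anova_vs_ancova(rows):
    """Compare one-way ANOVA (group only) with ANCOVA (group + age).

    Returns (effect, residual mean square) for each model.
    """
    sxx = sxy = syy = 0.0  # pooled within-group sums of squares/products
    means = {}
    for group in (0, 1):
        sub = [(a, y) for g, a, y in rows if g == group]
        a_bar = sum(a for a, _ in sub) / len(sub)
        y_bar = sum(y for _, y in sub) / len(sub)
        means[group] = (a_bar, y_bar)
        sxx += sum((a - a_bar) ** 2 for a, _ in sub)
        sxy += sum((a - a_bar) * (y - y_bar) for a, y in sub)
        syy += sum((y - y_bar) ** 2 for _, y in sub)
    n = len(rows)
    slope = sxy / sxx  # common within-group age slope
    raw_effect = means[1][1] - means[0][1]
    adj_effect = raw_effect - slope * (means[1][0] - means[0][0])
    ms_anova = syy / (n - 2)                    # residual error, group only
    ms_ancova = (syy - slope * sxy) / (n - 3)   # residual error, group + age
    return (raw_effect, ms_anova), (adj_effect, ms_ancova)

(raw, ms_raw), (adj, ms_adj) = anova_vs_ancova(simulate())
print(f"ANOVA : effect {raw:.2f}, residual MS {ms_raw:.2f}")
print(f"ANCOVA: effect {adj:.2f}, residual MS {ms_adj:.2f}")
```

In the covariate-adjusted model, age soaks up outcome variance that the group-only model leaves in its residual, so the residual mean square shrinks. Because randomization balances age between groups on average, the two effect estimates should typically be close, though not identical, in any one sample.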
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html