- file stat 97nonpar.html ->
Nonparametric - 6 topics
In this file, some miscellaneous:
Cramer's V meaning Bohlman
Why is X^2 "nonparametric" Maxwell
CI for *small* proportions Turner
Defining quantiles REF McClelland
Wilcoxon signed rank (no) Watzka
The trouble with tied Ranks Ulrich
Cramer's V
=======================Eric Bohlman, 25 Mar 1997==========sse
From: ebohlman@netcom.com (Eric Bohlman)
Subject: Re: Cramers'V coefficient
Message-ID:
LALOUM ERIC (laloum@pcm.ecp.fr) wrote:
: What is the law that follow the Cramer's V coefficient for contingency tables
: Where can I find informations about its use ?
Cramer's V is simply a way of normalizing the Pearson Chi-Square
statistic for a contingency table so that it will fall between 0 and 1.
It's defined as the square root of (Pearson's statistic divided by
(sample size times (k-1))), where k is the smaller of the number of rows
and columns. Higher values
of V indicate stronger association between the row and column categories.
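
A minimal Python sketch of the calculation just described, using the usual
square-root form of V. The function names and the 2x2 counts are made up
for illustration, and the code assumes every row and column total is
nonzero (otherwise an expected count is zero and the division fails).

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (k - 1))), k = min(number of rows, number of columns)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(pearson_chi_square(table) / (n * (k - 1)))

# Hypothetical counts: chi2 = 20, n = 80, k = 2, so V = sqrt(20 / 80).
table = [[30, 10],
         [10, 30]]
print(cramers_v(table))  # 0.5
```

A table with no association gives V = 0, and a table in which each row
falls entirely in its own column gives V = 1.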
The problem with V is that it's hard to interpret. There are other
measures of association for contingency tables that have more realistic
interpretations; the best known are Goodman and Kruskal's lambda, Goodman
and Kruskal's tau, and Theil's U (uncertainty coefficient). All of these
have the same interpretation as R-squared in regression: the proportional
reduction in variance in one measure based on knowing the other. They
differ in how they compute the "variance" of a row or column: for lambda
it's the probability that an observation is in a category other than the
most common (modal) one, for tau it's the probability that a pair of
observations are in different categories, and for U it's the entropy (minus
the sum of Pi times log(Pi)) of the category probabilities. All of these
measures range from 0 to 1, and they're all asymmetric (treating one
dimension as a response variable and the other as an explanatory variable);
symmetric versions of all of them have been defined, but they're harder
to interpret than the asymmetric ones.
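
The three "proportional reduction" measures above can be sketched in Python
as follows. This is only an illustration under stated assumptions: rows are
treated as the explanatory variable and columns as the response, the
function names and counts are hypothetical, and the response is assumed to
take at least two categories (otherwise its "variance" is zero and the
ratio is undefined).

```python
import math

def _proportions(table):
    """Convert a table of counts (list of rows) to cell proportions."""
    n = sum(sum(row) for row in table)
    return [[count / n for count in row] for row in table]

def _reduction(without, with_knowledge):
    """Proportional reduction in 'variance' from knowing the row category."""
    return (without - with_knowledge) / without

def gk_lambda(table):
    """'Variance' = probability of not being in the modal column."""
    p = _proportions(table)
    col = [sum(c) for c in zip(*p)]
    return _reduction(1 - max(col), 1 - sum(max(row) for row in p))

def gk_tau(table):
    """'Variance' = probability that two observations fall in different columns."""
    p = _proportions(table)
    col = [sum(c) for c in zip(*p)]
    within = 0.0
    for row in p:
        row_total = sum(row)
        if row_total > 0:
            within += sum(q * q for q in row) / row_total
    return _reduction(1 - sum(q * q for q in col), 1 - within)

def theil_u(table):
    """'Variance' = entropy, minus the sum of P * log(P)."""
    p = _proportions(table)
    col = [sum(c) for c in zip(*p)]
    h_col = -sum(q * math.log(q) for q in col if q > 0)
    h_col_given_row = -sum(q * math.log(q / sum(row))
                           for row in p for q in row if q > 0)
    return _reduction(h_col, h_col_given_row)

# Hypothetical counts, for illustration only.
table = [[30, 10],
         [10, 30]]
print(gk_lambda(table), gk_tau(table), theil_u(table))
```

Swapping rows and columns (transposing the table) generally changes the
result, which is the asymmetry mentioned above.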
Why is contingency testing nonparametric?
=======================Nicholas Maxwell, 22 May 1997==========sse
Message-ID:
From: Nicholas Maxwell
Subject: Re: Chi-Square Tests: Non Parametric?
On Thu, 22 May 1997, Javi wrote:
> I'd like to know some reasons (pros and cons) to put Chi-Square Tests in
> the Non-Parametric Section of a Course, instead of locate them in the
> Parametric
> Section.
Due to an accident of history, "non-parametric" has come to mean "not
based on an assumption of normally distributed errors". That is, it does
not mean that parameters in models are not being tested. In the case of
tests of categorical variables, the errors are not normally distributed,
so the test is non-parametric.
> (I'm interested in pros for Non-Parametric)...
In data analysis, non-parametrics usually protect you from being misled by
outliers in the data. I am inclined to test data with both parametrics
and non-parametrics, and if the results are the same, I am confident in
building a model. If the results differ, then I feel that I need to look
at the data more closely to understand what is in the data that is leading
to the inconsistency, and try to report that aspect of the data. (In that
situation, I would love to be able to run a larger data collection to see
if I could get a cleaner image, but that luxury is not always available.)
In statistics education, the thinking behind non-parametrics is usually
not more complex than that behind parametrics. In some cases, such as the
binomial test, the thinking is much easier to understand.
Teaching both parametrics and some non-parametrics may allow
students to extrapolate the commonalities, which are the basic conceptions
of significance testing.
Hope this helps.
Nick
CI for *small* proportion
=======================David L Turner, 29 Jan 1997==========ssc
Subject: Re: Negative confidence interval for proportion ????
Message-ID: <32EF9E6B.13D5@cc.usu.edu>
From: "David L. Turner"
Chiu Kit Jessica Tse wrote:
>
> I calculated the weighted proportion and standard error
> of a rare factor.
>
> I got the proportion as 0.0175 and
> the standard error as 0.011895.
>
> The corresponding 95% Confidence Interval is
> -0.0058, 0.0408
>
> How do I explain the negative lower C.I.?
> I know my math is correct.
>
> Could someone please help me understand this.
> Thanks a lot.
>
An exact solution (assuming an underlying binomial distribution) is
easily computed using the inverse beta distribution. I am assuming
your sample size was 121 and there were 2 successes. Adjust these
for your actual counts. The 0.05 in the formulas is alpha, giving a 95% CI.
In Quattro the lower and upper limits are given by:
@BETAINV(0.05/2,2,121-2+1) and @BETAINV(1-0.05/2,2+1,121-2)
If Excel is your spreadsheet of choice, replace the @ with an =.
This procedure will never give a negative lower limit, although if
there are no successes in the sample, an error is returned for the
lower limit. This should be replaced by zero. The upper limit
can still be computed.
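
For readers without a spreadsheet, here is a Python sketch that should give
the same exact (Clopper-Pearson) limits. It does not call an inverse beta
function; instead it solves the equivalent binomial tail equations by
bisection, using only the standard library. The function names are
hypothetical, and the inputs 2 and 121 are the counts assumed in the post.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _root_of_increasing(f):
    """Bisection on (0, 1) for a function that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(x, n, alpha=0.05):
    """Exact two-sided CI for a proportion with x successes in n trials."""
    # Lower limit: the p at which P(X >= x) equals alpha/2; zero when x == 0.
    if x == 0:
        lower = 0.0
    else:
        lower = _root_of_increasing(
            lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2; one when x == n.
    if x == n:
        upper = 1.0
    else:
        upper = _root_of_increasing(
            lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

lower, upper = exact_binomial_ci(2, 121)
print(lower, upper)
```

The zero-successes case is handled explicitly, so the lower limit is
returned as zero instead of raising an error.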
Defining quantiles
=======================Gary McClelland, 12 Mar 1997==========sse
From: Gary McClelland
Subject: Re: Quartiles age? (y/n) y
Message-ID: <33271AE9.6127@Colorado.edu>
> In article <9703061918.AA06812@oz.plymouth.edu>, Bob Hayden
> wrote:
>
> > There are several ways of defining quartiles.
> >
> > There are even more ways of defining the hinges used in boxplots.
> >
> > Some books and software call the hinges quartiles.
> >
For a good article on the many definitions of sample quantiles
and which programs use which definitions, see
Hyndman, R.J., & Fan, Y. (1996). Sample quantiles in
statistical packages. The American Statistician, 50(4), 361-
Their clear statement of the different definitions helped me a lot.
gary
--
Gary.McClelland@Colorado.edu http://psych.colorado.edu/~mcclella/
Dept of Psych, CB345, Univ of Colorado, Boulder, CO 80309-0345 USA
voice: 303-492-8617 fax: 303-492-5580
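
A small illustration of how much the definition matters, using Python's
standard library (statistics.quantiles, Python 3.8 or later). Its two
methods place the k-th of n ordered values at k/(n+1) ("exclusive") or
(k-1)/(n-1) ("inclusive"), which appear to correspond to definitions 6 and
7 in the Hyndman and Fan article. The ten data values are made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up sample

# Same data, same request for quartiles, two different definitions.
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(q_exclusive)  # [2.75, 5.5, 8.25]
print(q_inclusive)  # [3.25, 5.5, 7.75]
```

The medians agree, but the first and third quartiles differ by half a unit
even on this tidy sample.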
Wilcoxon Signed Rank comments
=======================Kurt Watzka, 21 Apr 1997==========ssc
From: watzka@stat.uni-muenchen.de (Kurt Watzka)
Subject: Re: Wilcoxon Signed Rank Test
Message-ID: <5jgfn5$gd7$1@sparcserver.lrz-muenchen.de>
Susan Mangiero writes:
>I need to document (for my dissertation in finance) that the Wilcoxon
>Signed Rank Test is robust to small sample size. I am using a sample of
>13 pairs of firms in one case and 16 pairs of firms in a second case. If
>anyone can help me with a citation, I will be especially grateful. Thanks.
The "signed ranks, matched pairs" Test is "robust" in the sense that
it does not depend on the assumption that the differences between
the matched observations have a normal distribution. However, it is
less powerful than a parametric test, and for that reason it is not
a good choice for small sample sizes, if the assumptions of the
parametric test (i.e. a matched pairs t-test) are met. The assumption
behind the "signed ranks, matched pairs" Test is _symmetry_ of the
_continuous_ distribution of the differences, so you do not assume _much_
less if you use this test. If your differences come from a symmetric
distribution that is not normal, and if the sample size is reasonably
small, you might consider using Fisher's permutation test. It is
basically the same test as the "signed ranks, matched pairs" test, but
the distribution of the test statistic can only be computed _given_
the absolute values of the differences, and requires a lot of computation
for bigger samples. However, Fisher's permutation test is as powerful
as the matched pairs t-test _if_ the assumptions for that test are met,
but it does not depend on them.
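
A sketch of the permutation test for matched pairs mentioned above, in
Python. Under the null hypothesis of a distribution symmetric about zero,
each difference is equally likely to carry either sign, so the test
enumerates all 2^n sign assignments given the absolute values. The function
name, the choice of the sum of differences as the test statistic, and the
data are assumptions made for illustration.

```python
from itertools import product

def paired_permutation_test(differences):
    """Two-sided exact p-value for H0: differences are symmetric about zero."""
    observed = abs(sum(differences))
    magnitudes = [abs(d) for d in differences]
    extreme = total = 0
    # Enumerate every assignment of + and - signs to the fixed absolute values.
    for signs in product((1, -1), repeat=len(magnitudes)):
        stat = abs(sum(s * m for s, m in zip(signs, magnitudes)))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total

# Made-up differences for 13 matched pairs (2**13 = 8192 sign assignments).
diffs = [1.2, 0.4, -0.3, 2.1, 0.9, 1.5, -0.6, 0.8, 1.1, 0.2, 1.9, -0.1, 0.7]
print(paired_permutation_test(diffs))
```

Full enumeration doubles in cost with each added pair, which is the
computational burden noted above; for larger samples a random subset of
sign assignments is the usual substitute.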
Ranking with most scores tied.
=======================Rich Ulrich, 28 May 1997==========spss
Subject: Re: Spearman rank correlation
Message-ID: <5mi2ng$l0n@usenet.srv.cis.pitt.edu>
<< Bob Golub>>
Robert Golub (Procto@ix.netcom.com) wrote:
: Is it OK to use the spearman rank correlation with a limited range of
: ordered data for one of the variables? I am correlating interval data
: (1-100 range) with ordered subjective data (1-5 range) but most
: responses were 3-5. Spearman coefficient was chosen because the
: subjective data was not normally distributed. I was criticized for not
: correcting the correlation result for the restricted data range which
: could deflate the correlation coefficient. Is this true, and if so, is
: there any way to adjust the coefficient for a limited data range?
After you do a Rank-transform, computing the Spearman is done by
computing the Pearson. So, do you like what the transformation does
to the scores? If there are a whole bunch of ties at one extreme,
then the overall effect of Ranking is to convert the item into a
contrast that shows Extreme vs Other:
If there are 90% scores tied at "5", then your data consist of
10% of scores whose Ranks are between 0-10%; and 90% whose ranks are
at 55%. That is approximately like having 10% of scores that are
(0,1,2) and 90% of scores at 10 - when you do analyses, the trivial
differences between 0-2 are swamped by the difference of 10/other.
Look at your transformed scores: Do you like the apparent scaling
*after* transformation, the intervals shown between the scale-points,
better than the scaling before transformation? - For scales with
just 5 points, it is hard for the ranking to be much improvement,
unless the original scale was really awkward.
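
To look at the transformed scores directly, here is a Python sketch of the
rank transform with ties sharing their average rank, followed by the
Pearson correlation of the ranks. The function names and the 100 made-up
scores (90 of them tied at 5) are assumptions for illustration; pearson()
assumes neither variable is constant.

```python
import math

def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Ordinary product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data."""
    return pearson(midranks(x), midranks(y))

# Made-up 1-5 ratings: 4 threes, 6 fours, and 90 scores tied at 5.
ratings = [3] * 4 + [4] * 6 + [5] * 90
print(sorted(set(midranks(ratings))))  # [2.5, 7.5, 55.5]
```

The 90 tied scores all land at rank 55.5 out of 100, while the remaining
ten sit at 2.5 and 7.5, so the ranked item is close to a two-point contrast.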
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html