<- file stat 97nonpar.html -> Nonparametric - 6 topics In this file, some miscellaneous: Cramer's V meaning Bohlman Why is X^2 "nonparametric" Maxwell CI for *small* proportions Turner Defining quantiles REF McClelland Wilcoxon signed rank (no) Watzka The trouble with tied Ranks Ulrich
  • Cramer's V
  • =======================Eric Bohlman, 25 Mar 1997==========sse From: ebohlman@netcom.com (Eric Bohlman) Subject: Re: Cramers'V coefficient Message-ID: <ebohlmanE7LG0H.I0@netcom.com> LALOUM ERIC (laloum@pcm.ecp.fr) wrote: : What is the law that follow the Cramer's V coefficient for contingency tables : Where can I find informations about its use ? Cramer's V is simply a way of normalizing the Pearson Chi-Square statistic for a contingency table so that it will fall between 0 and 1. It's defined as the square root of Pearson's statistic divided by (sample size times (k-1)), where k is the smaller of the number of rows and columns. Higher values of V indicate stronger association between the row and column categories. The problem with V is that it's hard to interpret. There are other measures of association for contingency tables that have more realistic interpretations; the best known are Goodman and Kruskal's lambda, Goodman and Kruskal's tau, and Theil's U (uncertainty coefficient). All of these have the same interpretation as R-squared in regression: the proportional reduction in variance in one measure based on knowing the other. They differ in how they compute the "variance" of a row or column: for lambda it's the probability that an observation is in a category other than the most common (modal) one, for tau it's the probability that a pair of observations are in different categories, and for U it's the entropy (minus the sum of Pi times log(Pi)) of the category probabilities. All of these measures range from 0 to 1, and they're all asymmetric (treating one dimension as a response variable and the other as an explanatory variable); symmetric versions of all of them have been defined, but they're harder to interpret than the asymmetric ones.
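As a rough illustration of the definitions above, here is a minimal Python sketch that computes Pearson's statistic, Cramer's V, and Goodman and Kruskal's lambda for a small table. The function names and the counts are made up for the example, and the code assumes every row and column total is positive.

```python
import math

def pearson_chi_square(table):
    """Return (chi-square, n) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """V = sqrt(chi2 / (n * (k - 1))), with k = min(rows, columns)."""
    chi2, n = pearson_chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

def goodman_kruskal_lambda(table):
    """Lambda for predicting the column category from the row category."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Errors made by always guessing the overall modal column.
    errors_without = n - max(col_totals)
    # Errors made by guessing the modal column within each row.
    errors_with = sum(sum(row) - max(row) for row in table)
    return (errors_without - errors_with) / errors_without

table = [[30, 10], [10, 30]]  # hypothetical counts
v = cramers_v(table)
lam = goodman_kruskal_lambda(table)
print(v, lam)
```

Lambda reads directly as the proportion of prediction errors removed by knowing the row, which is the kind of interpretation V lacks.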
  • Why is contingency testing nonparametric?
  • =======================Nicholas Maxwell, 22 May 1997==========sse Message-ID: <Pine.A41.3.95b.970522100618.66916A-100000@homer32.u.washington.edu> From: Nicholas Maxwell <nmaxwell@U.Washington.EDU> Subject: Re: Chi-Square Tests: Non Parametric? On Thu, 22 May 1997, Javi wrote: > I'd like to know some reasons (pros and cons) to put Chi-Square Tests in > the Non-Parametric Section of a Course, instead of locating them in the > Parametric Section. Due to an accident of history, "non-parametric" has come to mean "not based on an assumption of normally distributed errors". That is, it does not mean that parameters in models are not being tested. In the case of tests of categorical variables, the errors are not normally distributed, so the test is non-parametric. > (I'm interested in pros for Non-Parametric)... In data analysis, non-parametrics usually protect you from being misled by outliers in the data. I am inclined to test data with both parametrics and non-parametrics, and if the results are the same, I am confident in building a model. If the results differ, then I feel that I need to look at the data more closely to understand what is in the data that is leading to the inconsistency, and try to report that aspect of the data. (In that situation, I would love to be able to run a larger data collection to see if I could get a cleaner image, but that luxury is not always available.) In statistics education, the thinking behind non-parametrics is usually not more complex than that behind parametrics. In some cases, such as the binomial test, the thinking is much easier to understand. Teaching both parametrics and some non-parametrics may allow students to extrapolate the commonalities, which are the basic conceptions of significance testing. Hope this helps. Nick
  • CI for *small* proportion
  • =======================David L Turner, 29 Jan 1997==========ssc Subject: Re: Negative confidence interval for proportion ???? Message-ID: <32EF9E6B.13D5@cc.usu.edu> From: "David L. Turner" <dturner@cc.usu.edu> Chiu Kit Jessica Tse wrote: > > I calculated the weighted proportion and standard error > of a rare factor. > > I got the proportion as 0.0175 and > the standard error as 0.011895. > > The corresponding 95% Confidence Interval is > -0.0058, 0.0408 > > How do I explain the negative lower C.I.? > I know my math is correct. > > Could someone please help me understand this. > Thanks a lot. > An exact solution (assuming an underlying binomial distribution) is easily computed using the inverse beta distribution. I am assuming your sample size was 121 and there were 2 successes. Adjustments for what you really had should be obvious. A significance level of 0.05 gives a 95% CI. In Quattro the lower and upper limits are given by: @BETAINV(0.05/2,2,121-2+1) and @BETAINV(1-0.05/2,2+1,121-2) If Excel is your spreadsheet of choice, replace the @ with an =. This procedure will never give a negative lower limit, although if there are no successes in the sample, an error is returned for the lower limit. This should be replaced by zero. The upper limit can still be computed.
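For readers without a spreadsheet at hand, the following Python sketch reproduces the idea using only the standard library. It is an illustration under stated assumptions, not the poster's procedure: instead of calling an inverse beta function it solves the equivalent binomial tail equations by bisection, which gives the same exact limits. The helper names (`binom_cdf`, `solve`, `exact_ci`) are invented here, and the 2-in-121 sample is the same guess made in the post.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, lo=0.0, hi=1.0, iters=100):
    """Bisection for a root of f on [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Exact binomial interval for x successes in n trials."""
    # Lower limit: the p at which P(X >= x) equals alpha/2 (zero when x == 0).
    lower = 0.0 if x == 0 else solve(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2 (one when x == n).
    upper = 1.0 if x == n else solve(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

# The normal-approximation interval from the question goes below zero.
p_hat, se = 0.0175, 0.011895
wald = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# The exact interval for the assumed sample (2 successes in 121) cannot.
lower, upper = exact_ci(2, 121)
print(wald, (lower, upper))
```

The exact interval is asymmetric around the observed proportion, which is what one would expect for a rare event.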
  • Defining quantiles
  • =======================Gary McClelland, 12 Mar 1997==========sse From: Gary McClelland <Gary.McClelland@Colorado.edu> Subject: Re: Quartiles Message-ID: <33271AE9.6127@Colorado.edu> > In article <9703061918.AA06812@oz.plymouth.edu>, Bob Hayden > <hayden@oz.plymouth.edu> wrote: > > > There are several ways of defining quartiles. > > > > There are even more ways of defining the hinges used in boxplots. > > > > Some books and software call the hinges quartiles. > > For a good article on the many definitions of sample quantiles and which programs use which definitions, see Hyndman, R.J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361- Their clear statement of the different definitions helped me a lot. gary -- Gary.McClelland@Colorado.edu http://psych.colorado.edu/~mcclella/ Dept of Psych, CB345, Univ of Colorado, Boulder, CO 80309-0345 USA voice: 303-492-8617 fax: 303-492-5580
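To see that the choice of definition matters in practice, here is a small Python sketch using the standard library's `statistics.quantiles`, which offers two of the possible definitions under the names "exclusive" and "inclusive". The ten data values are made up for the example.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical sample

# Two different quartile definitions applied to the same ten numbers.
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(q_exclusive)  # [2.75, 5.5, 8.25]
print(q_inclusive)  # [3.25, 5.5, 7.75]
```

The medians agree, but the first and third quartiles differ, so two packages can report different quartiles for identical data.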
  • Wilcoxon Signed Rank comments
  • =======================Kurt Watzka, 21 Apr 1997==========ssc From: watzka@stat.uni-muenchen.de (Kurt Watzka) Subject: Re: Wilcoxon Signed Rank Test Message-ID: <5jgfn5$gd7$1@sparcserver.lrz-muenchen.de> Susan Mangiero <mangiero@SHU.SACREDHEART.EDU> writes: >I need to document (for my dissertation in finance) that the Wilcoxon >Signed Rank Test is robust to small sample size. I am using a sample of >13 pairs of firms in one case and 16 pairs of firms in a second case. If >anyone can help me with a citation, I will be especially grateful. Thanks. The "signed ranks, matched pairs" Test is "robust" in the sense that it does not depend on the assumption that the differences between the matched observations have a normal distribution. However, it is less powerful than a parametric test, and for that reason it is not a good choice for small sample sizes, if the assumptions of the parametric test (i.e. a matched pairs t-test) are met. The assumption behind the "signed ranks, matched pairs" Test is _symmetry_ of the _continuous_ distribution of the differences, so you do not assume _much_ less if you use this test. If your differences come from a symmetric distribution that is not normal, and if the sample size is reasonably small, you might consider using Fisher's permutation test. It is basically the same test as the "signed ranks, matched pairs" test, but the distribution of the test statistic can only be computed _given_ the absolute values of the differences, and requires a lot of computation for bigger samples. However, Fisher's permutation test is as powerful as the matched pairs t-test _if_ the assumptions for that test are met, but it does not depend on them.
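The relationship between the two tests can be sketched in a few lines of Python. This is an illustration only: the function names and the six differences are invented, and the code assumes no difference is exactly zero. Both tests enumerate every assignment of signs to a fixed set of weights; the permutation test uses the absolute differences as weights, the signed rank test uses their ranks.

```python
from itertools import product

def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start..end, 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def sign_flip_p_value(weights, signs):
    """Two-sided exact p-value for the sum of signed weights."""
    observed = abs(sum(w * s for w, s in zip(weights, signs)))
    count = total = 0
    # Under the null of symmetry about zero, each sign pattern is equally likely.
    for flips in product((1, -1), repeat=len(weights)):
        total += 1
        if abs(sum(w * f for w, f in zip(weights, flips))) >= observed - 1e-12:
            count += 1
    return count / total

def permutation_test(diffs):
    """Fisher's permutation test: weights are the absolute differences."""
    signs = [1 if d > 0 else -1 for d in diffs]
    return sign_flip_p_value([abs(d) for d in diffs], signs)

def signed_rank_test(diffs):
    """Signed rank test: weights are the ranks of the absolute differences."""
    signs = [1 if d > 0 else -1 for d in diffs]
    return sign_flip_p_value(midranks([abs(d) for d in diffs]), signs)

diffs = [1.2, 0.8, 2.5, -0.3, 1.9, 0.6]  # hypothetical paired differences
p_perm = permutation_test(diffs)
p_rank = signed_rank_test(diffs)
print(p_perm, p_rank)
```

With 13 or 16 pairs the enumeration covers 2^13 = 8192 or 2^16 = 65536 sign patterns, which is still quick; the count doubles with every added pair, which is the computational burden mentioned in the post.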
  • Ranking with most scores tied.
  • =======================Rich Ulrich, 28 May 1997==========spss Subject: Re: Spearman rank correlation Message-ID: <5mi2ng$l0n@usenet.srv.cis.pitt.edu> << Bob Golub>> Robert Golub (Procto@ix.netcom.com) wrote: : Is it OK to use the spearman rank correlation with a limited range of : ordered data for one of the variables? I am correlating interval data : (1-100 range) with ordered subjective data (1-5 range) but most : responses were 3-5. Spearman coefficient was chosen because the : subjective data was not normally distributed. I was criticized for not : correcting the correlation result for the restricted data range which : could deflate the correlation coefficient. Is this true, and if so, is : there any way to adjust the coefficient for a limited data range? After you do a Rank-transform, computing the Spearman is done by computing the Pearson. So, do you like what the transformation does to the scores? If there are a whole bunch of ties at one extreme, then the overall effect of Ranking is to convert the item into a contrast that shows Extreme vs Other: If there are 90% scores tied at "5", then your data consist of 10% of scores whose Ranks are between 0-10%; and 90% whose ranks are at 55%. That is approximately like having 10% of scores that are (0,1,2) and 90% of scores at 10 - when you do analyses, the trivial differences between 0-2 are swamped by the difference of 10/other. Look at your transformed scores: Do you like the apparent scaling *after* transformation, the intervals shown between the scale-points, better than the scaling before transformation? - For scales with just 5 points, it is hard for the ranking to be much improvement, unless the original scale was really awkward. * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
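The effect of heavy ties on the rank transform is easy to inspect directly. The Python sketch below is a minimal illustration with made-up scores (100 responses, 90 of them tied at "5"); the function names are invented, and Spearman's coefficient is computed exactly as described above, as the Pearson correlation of the ranks.

```python
import math

def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start..end, 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson applied to the rank-transformed scores."""
    return pearson(midranks(x), midranks(y))

# Hypothetical 1-5 responses: 10 low scores, then 90 tied at 5.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4] + [5] * 90
ranks = midranks(scores)

print(sorted(set(ranks[:10])))  # ranks of the ten low scores
print(ranks[-1])                # shared rank of the 90 tied scores
```

The ten low scores land on ranks between 1 and 8.5, while all ninety 5s share rank 55.5, so after ranking the item behaves almost like a two-level contrast.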
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html