- file stat 97reli.html ->
Reliability cautions (1997)
In this file -
- Misused SAS: a profile (across items) is not a proper ICC.
- Pearson is simple for EXAMINING reliability.
Negative ICC?
=======================Rich Ulrich, 07 Feb 1997==========ssc
From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Negative intraclass correlations?
Message-ID: <5dg2la$m4f@usenet.srv.cis.pitt.edu>
Tom Bohman, Ph.D. (tmb@WILDER.ORG) wrote:
: Greetings,
: I'm trying to estimate the degree of agreement between mothers and
: fathers in 172 families which I would like to use in further analyses.
: Both parents responded to 14 items using a 4-point Likert response
: scale. I'm using an intraclass correlation (ICC) (as opposed to a
: Pearson correlation) to measure the agreement across the 14 items for
: each set of parents.
-- Do I read this right: you are computing something like a
correlation (or like an ICC) across the 14 items, separately for
each pair of parents?
If so, I do not think it is fair to call this any kind of
'correlation'. It is rather a sort of profile coefficient, which
is computed like a correlation, except that the 'subjects' are not
a random sample: their role is taken by the fixed set of 14 items,
which have arbitrarily different means.
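To make the point concrete, here is a minimal sketch using hypothetical answers for a single pair of parents. All of the numbers are invented for illustration and are not Tom's data. Each parent's answer is an item's typical level plus a personal departure, and the two parents' departures are chosen to be exactly uncorrelated. The across-item "correlation" still comes out sizeable, because it mostly reflects the fixed differences between the item means.

```python
from statistics import mean

def pearson(x, y):
    """Ordinary Pearson correlation, computed over whatever units x and y index."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical "typical" answer to each of 14 items on a 1-4 scale; the items
# differ in level for reasons that have nothing to do with the family.
item_level = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4]

# Hypothetical personal departures from the item level, chosen so that the
# mother's and father's departures are exactly uncorrelated.
mother_dev = [0, 1, 0, 0, 1, 0, -1, 0, 1, 0, -1, 0, -1, 0]
father_dev = [1, 0, 0, -1, 0, 1, 0, 1, 0, -1, 0, -1, 0, 0]

mother = [lvl + d for lvl, d in zip(item_level, mother_dev)]
father = [lvl + d for lvl, d in zip(item_level, father_dev)]

# The "correlation across 14 items" for this one pair is a profile
# coefficient, and it prints 0.613 ...
print(round(pearson(mother, father), 3))
# ... even though the parents' own departures are unrelated (prints 0.0).
print(round(pearson(mother_dev, father_dev), 3))
```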
: I've downloaded the SAS macro developed by Robert
: Hamer that implements the 6 types of ICCs identified by Shrout and
: Fleiss in their Psychological Bulletin article. The SAS macro uses Proc
: Glm to derive the variance components used to compute the particular
: ICCs.
: About 15 of the 172 ICCs are negative which I would like to understand
: better since it doesn't seem possible to have a negative variance. In
: fact, one of the negative ICCs is a value over 1 (-1.42).
-- Since your data are laid out differently (I am pretty sure) than
the SAS macro ever expected for ICC data, and you have an ICC of -1.42,
you are *almost* assuredly misapplying the formulas.
If you post the data that led to a negative coefficient, someone
will probably be able to show you the correct computation.
*--------
Why I like Pearson
=======================Rich Ulrich, 05 Mar 1997==========spss
Subject: Re: interrater reliability
Message-ID: <5fkpov$5r@usenet.srv.cis.pitt.edu>
David A. Rowe (darowe@FRANK.MTSU.EDU) wrote:
: On Mon, 3 Mar 1997, Michael Lacy wrote:
: (from a previous message)
: > >Hi. I have SPSS 5.0.2 for Windows. I have a number of 7-point Likert scales.
: > >There are two raters for all the measures. The raters tend to hover around
: > I can't get a decent correlation with paired T's, Pearson or Kendall's.
: (Michael's response)
: > Pearson's r, for example,
: > will tell you the extent to which one set of ratings are a linear function
: > of the other, rather than identical to the other. Kappa is available
DAR:
: - in fact, for many, if not most, reliability situations, the Pearson
: coefficient is not appropriate anyway. As an interclass coefficient, it
: should not be used with two measures of the same variable (rather, it is
: appropriate for estimating correlations between two different variables).
Given ordinal variables, you certainly should *not* use the kappa
which someone else suggested. Kappa ignores order: a disagreement of
one scale point counts the same as a disagreement of six. And kappas
for tables bigger than 2x2 are not readily compared with kappas from
tables of any other size, so kappa is mainly for 2x2 comparisons.
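"Kappa ignores order" can be seen in a small sketch with hypothetical ratings on a 7-point scale. The ratings are invented for illustration. The two data sets share the same seven agreements and the same marginal counts. In the "near" set the four disagreements are one scale point apart, and in the "far" set they are 4 to 6 points apart. Unweighted Cohen's kappa, computed by hand below, gives the same value for both sets. Pearson does not.

```python
from collections import Counter
from statistics import mean

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: any two different categories count as full disagreement."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[c] * cb[c] for c in ca) / n ** 2   # chance agreement from the marginals
    return (p_obs - p_exp) / (1 - p_exp)

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5

agree = [1, 2, 3, 4, 5, 6, 7]      # seven cases where the two raters match
near_a = agree + [1, 2, 6, 7]      # four disagreements, each 1 point apart
near_b = agree + [2, 1, 7, 6]
far_a = agree + [1, 7, 2, 6]       # four disagreements, 4 or 6 points apart
far_b = agree + [7, 1, 6, 2]

for label, a, b in [("near", near_a, near_b), ("far", far_a, far_b)]:
    print(label, round(cohen_kappa(a, b), 3), round(pearson(a, b), 3))
# near 0.569 0.963
# far 0.569 0.037
```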
The Pearson coefficient is just FINE for looking at reliability between
two scores with *equal means and variances*: then it will exactly
EQUAL an intraclass correlation, such as the Shrout and Fleiss ICC(3,1)
(consider, intrAclass vs intERclass).
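Here is a quick numerical check of that claim, as a sketch with invented scores. The second rater's scores are a reshuffle of the first rater's, so the two columns have exactly equal means and variances. The ICC shown is the two-way, fixed-raters, single-score form (ICC(3,1) in Shrout and Fleiss's notation), written out by hand from the ANOVA mean squares rather than taken from any package.

```python
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_3_1(x, y):
    """Two-way, fixed-raters, single-score ICC(3,1) for k = 2 raters."""
    n, k = len(x), 2
    grand = mean(x + y)
    ss_targets = k * sum((mean(pair) - grand) ** 2 for pair in zip(x, y))
    ss_raters = n * sum((mean(col) - grand) ** 2 for col in (x, y))
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_error = ss_total - ss_targets - ss_raters
    bms = ss_targets / (n - 1)                  # between-targets mean square
    ems = ss_error / ((n - 1) * (k - 1))        # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)

rater1 = [1, 2, 3, 4, 5, 6, 7]
rater2 = [2, 1, 3, 5, 4, 7, 6]   # same values reshuffled: equal mean and variance

print(round(pearson(rater1, rater2), 4))   # 0.8929
print(round(icc_3_1(rater1, rater2), 4))   # 0.8929
```

For these same scores, the one-way ICC(1,1) works out to about 0.907. That is close to r but not equal to it, so which member of the family matches Pearson exactly depends on the formula used.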
There is a tradition or habit of recommending intraclass correlations,
but the assumption behind using an intraclass correlation seems (to me)
to be that one is ignoring the differences that might exist between
means or variances. Since I insist on looking at those separately, I
prefer to look at similarity using the same Pearson coefficient that
everyone is used to from other contexts.
The advantages of the Pearson over an intraclass correlation, in
fact, are several: you can read it immediately from the output of many
programs; there is only one version of it; and what you have is
essentially orthogonal to the difference between the raters' means,
which you should test with the paired t-test. By contrast, you usually
have to do some special computation to get your intraclass correlation;
there are three or four different formulas (which, by the way, are easy
to mess up), and a couple of the varieties give grossly different
values; and what you get is NOT independent (statistically) of the
difference between the raters' means.
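The "orthogonal" point shows up in a small sketch with invented scores, where one point is added to every score from the second rater. Pearson's r does not move, the paired t-test picks up the shift, and both absolute-agreement ICCs fall. Those two are the one-way ICC(1,1) and the random-raters ICC(2,1), in Shrout and Fleiss's notation, and both are written out by hand from ANOVA mean squares. The fixed-raters ICC(3,1) is not shown. Like Pearson, it would not respond to a constant shift.

```python
from statistics import mean, stdev

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

def paired_t(x, y):
    """Paired t statistic for the mean difference y - x (n - 1 df)."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / len(d) ** 0.5)

def agreement_iccs(x, y):
    """ICC(1,1) and ICC(2,1) for two raters, from one- and two-way ANOVA mean squares."""
    n, k = len(x), 2
    grand = mean(x + y)
    ss_targets = k * sum((mean(p) - grand) ** 2 for p in zip(x, y))
    ss_within = sum((a - b) ** 2 / 2 for a, b in zip(x, y))    # within-target SS when k = 2
    ss_raters = n * sum((mean(c) - grand) ** 2 for c in (x, y))
    bms = ss_targets / (n - 1)                                 # between targets
    wms = ss_within / (n * (k - 1))                            # within targets (one-way)
    jms = ss_raters / (k - 1)                                  # between raters
    ems = (ss_within - ss_raters) / ((n - 1) * (k - 1))        # residual (two-way)
    icc11 = (bms - wms) / (bms + (k - 1) * wms)
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    return icc11, icc21

rater1 = [1, 2, 3, 4, 5, 6, 7]
rater2 = [2, 1, 3, 5, 4, 7, 6]
shifted = [v + 1 for v in rater2]      # rater 2 now scores one point higher

for label, y in [("original", rater2), ("shifted", shifted)]:
    icc11, icc21 = agreement_iccs(rater1, y)
    print(label, round(pearson(rater1, y), 3), round(paired_t(rater1, y), 3),
          round(icc11, 3), round(icc21, 3))
# original 0.893 0.0 0.907 0.907
# shifted 0.893 2.646 0.81 0.818
```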
DAR:
: One of the family of intraclass coefficients should be used (I'm talking
: generally, not as a solution to the problem posed in the original posting)
The minor differences within the family are whether you are assuming
THESE (fixed) raters, or 'random' raters. The big difference is whether
you get an r that represents a SINGLE rater's score, or the combined,
average score of multiple raters.
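Both distinctions can be read directly from the formulas. The sketch below writes out Shrout and Fleiss's two-way forms by hand. ICC(2, ·) treats the raters as a random sample, and ICC(3, ·) treats them as THESE fixed raters. The "1" and "k" versions give the reliability of a single rater's score and of the mean of all k raters, respectively. The 6-target, 4-rater table is only an illustration. In this table the raters' means differ a lot, and that is why ICC(2, ·) falls so far below ICC(3, ·). A paired t-test is meant to catch exactly that kind of difference. The last line checks that each average-score ICC is the Spearman-Brown step-up of the matching single-score ICC.

```python
from statistics import mean

def two_way_iccs(table):
    """Shrout-Fleiss ICC(2,1), ICC(3,1), ICC(2,k), ICC(3,k); rows = targets, cols = raters."""
    n, k = len(table), len(table[0])
    grand = mean([v for row in table for v in row])
    cols = list(zip(*table))
    ss_targets = k * sum((mean(row) - grand) ** 2 for row in table)
    ss_raters = n * sum((mean(c) - grand) ** 2 for c in cols)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    bms = ss_targets / (n - 1)
    jms = ss_raters / (k - 1)
    ems = (ss_total - ss_targets - ss_raters) / ((n - 1) * (k - 1))
    return {
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),  # random raters, single
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),                        # these raters, single
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),                      # random raters, average
        "ICC(3,k)": (bms - ems) / bms,                                          # these raters, average
    }

def spearman_brown(r, k):
    """Reliability of the mean of k parallel scores, given single-score reliability r."""
    return k * r / (1 + (k - 1) * r)

table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
iccs = two_way_iccs(table)
for name, value in iccs.items():
    print(name, round(value, 2))
# ICC(2,1) 0.29 / ICC(3,1) 0.71 / ICC(2,k) 0.62 / ICC(3,k) 0.91
k = len(table[0])
print(abs(spearman_brown(iccs["ICC(3,1)"], k) - iccs["ICC(3,k)"]) < 1e-9)   # True
```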
*--------
FAQ note, March 1998: for examining your data, I still recommend
using Pearson correlations, as many as needed, between pairs of
raters, pairs of diagnoses, etc.; and paired t-tests (or McNemar's
test, for dichotomous ratings) should always be used with them, to
check for differences in "level". For the limited purpose of
summarizing for publication, after you are sure that there are no
problems, the intraclass correlation may be used as a combined
indicator of what you have achieved.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html