  • Computing inter-rater (or inter-method) reliability with panel data?

    Hi, everybody! I can't find a strong answer to this elsewhere, so I'm hoping you all might be able to point me in the right direction.
    I want to compute inter-rater agreement (actually, in this case, inter-method agreement) for the following scenario: the same person provides nominal ratings of whether or not an event occurred each day for 30 days, and I have 10 people doing this. What's the best/most appropriate approach for calculating overall agreement (e.g., kap, kappa, or even icc? Something else)? The data look like this, where columns 3 and 4 are the ratings I want to compare (though of course I can reshape); a minimal example of what I mean follows the listings:

         +-------------------------------------+
         | id       date   mnsexo~d   tlfbse~d |
         |-------------------------------------|
      1. |  1   03/12/14          0          0 |
      2. |  1   03/12/14          0          0 |
      3. |  1   03/12/14          0          0 |
      4. |  1   03/12/14          0          0 |
      5. |  1   03/13/14          0          1 |
         +-------------------------------------+

         +-------------------------------------+
         | id       date   mnsexo~d   tlfbse~d |
         |-------------------------------------|
     89. |  2   03/19/14          0          0 |
     90. |  2   03/19/14          0          0 |
     91. |  2   03/19/14          0          0 |
     92. |  2   03/19/14          0          0 |
     93. |  2   03/22/14          1          0 |
         +-------------------------------------+
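
    Something like this, say (variable names typed with Stata's ~ abbreviation, as in the listings; I haven't run this yet):

    . * overall two-method agreement, pooling all person-days
    . kap mnsexo~d tlfbse~d
    . * or agreement within each person across the 30 days
    . bysort id: kap mnsexo~d tlfbse~d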


  • #2
    You can get the intraclass correlation coefficient with xtprobit. See the example below, starting at the "Begin here" comment. (pid is the participant ID; manifest is the binomial outcome variable.)

    . version 14.0

    . 
    . clear *

    . set more off

    . set seed `=date("2015-07-24", "YMD")'

    . quietly set obs 10

    . 
    . generate byte pid = _n

    . generate pid_u = rnormal()

    . forvalues day = 1/30 {
      2.     generate double latent`day' = pid_u + rnormal()
      3. }

    . quietly reshape long latent, i(pid) j(day)

    . generate byte manifest = latent > 0

    . 
    . *
    . * Begin here
    . *
    . quietly xtprobit manifest i.day, i(pid) re

    . display in smcl as text "ICC = " as result %04.2f e(rho)
    ICC = 0.59

    . 
    . exit

    end of do-file



    • #3
      Ah, thanks for your help! Since I ultimately want to compare across the two methods (or "raters," that is, compare the mnsex* and tlfbs* variables from my post above), would I just reshape them into one variable and add a dummy for "method" as another factor variable? Using your example:

      . quietly xtprobit manifest i.method i.day, i(pid) re
      . display in smcl as text "ICC = " as result %04.2f e(rho)
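
      And for the stacking itself, something like this is what I mean (a rough sketch; rating, row, and method are names I'm making up, and in practice I'd spell out the full variable names):

      . * stack the two rating variables into one long variable
      . rename mnsexo~d rating1
      . rename tlfbse~d rating2
      . * id-date pairs repeat in my data, so key the reshape on the row number
      . generate long row = _n
      . reshape long rating, i(row) j(method)
      . * method now takes values 1 and 2; rating holds the stacked 0/1 ratings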



      • #4
        I'm guessing that you'd like to consider your raters as a representative sample of raters in the wild, and want to be able to generalize beyond your sample of ten. If that's true, then you'll need to fit a cross-classified random effects model (method × rater) using, for example, meprobit, manually extract the variance component for method, and then divide it by the sum of the variance components.

        I don't have the volumes in front of me at the moment, but I recall that Sophia Rabe-Hesketh and Anders Skrondal show an example of this in detail with a linear mixed model in their Multilevel and Longitudinal Modeling Using Stata, Third Edition. The principle is the same with a generalized linear mixed model, although with such models you could be more liable to run into convergence problems when a random effect has few levels, as with your two methods. If that happens, you could "regularize" by a judicious choice of prior distribution on that variance component in a bayesmh approach.
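
        A minimal sketch of what I mean, reusing manifest, pid, and day from my example in #2 and assuming a two-level method variable has been added (the _all: R.method construction is Stata's usual idiom for crossed random effects; I spell out the arithmetic by hand rather than relying on a postestimation command):

        . * crossed random effects for method and rater (pid); day fixed
        . meprobit manifest i.day || _all: R.method || pid:
        . * read var(R.method) and var(_cons) for pid off the output; on the
        . * latent probit scale the residual variance is fixed at 1, so
        . * ICC for method = var_method / (var_method + var_pid + 1)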



        • #5
          Thanks again for your response! I'm having trouble finding this example in the Rabe-Hesketh & Skrondal book, so if you have time to add some specifics at some point, it would be immensely appreciated.

          Yes, it'd be ideal to consider the raters as representative, but I understand the limitations of that with only 10 raters. Any ideas about an alternative method if I didn't care about treating them as representative?



          • #6
            In the first volume of the third edition, it's discussed twice: first on pp. 435–441 (the detailed, step-by-step Stata code is on pp. 440–441), and then again on pp. 448–449. I found these before, and again this time, by looking up "intraclass correlation" in the book's index.

            Short of ignoring the raters altogether (that is, leaving rater out of the model), the alternative to treating them as random is to treat them as fixed. With 60 observations (two methods × 30 days) for each of the 10 raters, it's possible that you would avoid the so-called incidental parameters problem, but you might want to run a simulation just to be sure; a rough sketch follows.
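
            Something along these lines, adapting the data-generating process from #2 with a known latent method effect of 0.5 added (the effect size, replication count, and program name are arbitrary choices for illustration): if the mean of the replicated estimates sits near 0.5, the incidental parameters problem isn't biting badly here.

            // one replication: simulate ratings, fit probit with raters as fixed effects
            capture program drop onerep
            program define onerep, rclass
                drop _all
                quietly set obs 10
                generate byte pid = _n                      // 10 raters
                generate pid_u = rnormal()                  // latent rater effect
                quietly expand 60                           // 2 methods x 30 days each
                bysort pid: generate byte method = _n > 30
                bysort pid method: generate byte day = _n   // day has no true effect;
                                                            // included to mirror the models above
                generate byte manifest = (pid_u + 0.5*method + rnormal()) > 0
                quietly probit manifest i.method i.day i.pid
                return scalar b_method = _b[1.method]
            end

            simulate b_method = r(b_method), reps(500): onerep
            summarize b_method                              // mean near 0.5 => little bias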



            • #7
              Wonderful. Thank you so much for your help, Joseph!
