  • Inter-rater reliability with missing observations

    Hi everyone,

    I am trying to calculate inter-rater reliability for my data but am struggling because of missing data.

    In this dummy dataset (very similar to my own data but a smaller sample), I have 9 raters (1-9) who have each scored (score) 4 vignettes (Vignette, 1-4) out of 100. The 9 raters are constant throughout; however, not all raters completed the questionnaire, so some vignettes have been rated by only 7 or 8 raters. My data are currently in long format,

    e.g.
    ID Vignette Score
    1 1 8
    1 2 32
    1 3 8
    1 4 65
    2 1 16
    2 2 16

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID byte Vignette float score
    1 1  8
    1 2 32
    1 3  8
    1 4 65
    2 1 16
    2 2 16
    2 3  6
    2 4 50
    3 1 14
    3 2 14
    3 3 14
    3 4 32
    4 1  8
    4 2  8
    4 3 16
    4 4 32
    5 1  0
    5 2  0
    5 3 32
    5 4  .
    6 1 14
    6 2 32
    6 3 14
    6 4 16
    7 1  0
    7 2 16
    7 3  8
    7 4 60
    8 1 16
    8 2 14
    8 3  0
    8 4 65
    9 1  8
    9 2  0
    9 3  .
    9 4  .
    end
    (Apologies if this is not the correct way to post my data; please let me know!)

    My initial thought, before encountering the missing data, was to calculate the ICC using a two-way random-effects model; however, Stata excludes two vignettes because of the missing values. In my real dataset I have more vignettes, and in some cases the majority are excluded because of missing data.
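
    For reference, the call I was using looks along these lines (a sketch on the long-format data above, using Stata's built-in icc command, which fits a two-way random-effects model by default when a rater variable is supplied):

    Code:
    icc score Vignette ID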

    What is the best way to calculate inter-rater reliability for this data, taking into consideration the missing values?

    Thank you so much,
    Olivia

  • #2
    Note that Stata's icc command requires a balanced design. It will not only delete cases (observations) with missing values; it will also omit all vignettes that have not been rated by each of the 9 raters.
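
    One quick way to see which vignettes fall short of the full 9 ratings (and will hence be dropped) is to tabulate the non-missing scores in the long data:

    Code:
    tabulate Vignette if !missing(score)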

    kappaetc (SSC or SJ; at this point the two are identical) estimates inter-rater reliability for unbalanced designs and in the presence of missing values.
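
    If it is not yet installed, it can be obtained from SSC:

    Code:
    ssc install kappaetc

    With the data above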

    Code:
    reshape wide score , i(Vignette) j(ID)
    kappaetc score* , icc(random)
    yields

    Code:
    (output omitted)
    
    . kappaetc score* , icc(random)
    
    Interrater reliability                           Number of subjects =       4
    Two-way random-effects model               Ratings per subject: min =       7
                                                                    avg =    8.25
                                                                    max =       9
    ------------------------------------------------------------------------------
                   |   Coef.     F     df1     df2      P>F   [95% Conf. Interval]
    ---------------+--------------------------------------------------------------
          ICC(2,1) |  0.6088  11.08     3.00   21.00   0.000    0.2105     0.9525
    ---------------+--------------------------------------------------------------
           sigma_s | 15.4766
           sigma_r |  0.0000 (replaced)
           sigma_e | 12.4072
    ------------------------------------------------------------------------------
    Note: F test and confidence intervals are based on methods for complete data.

    The methods and formulas that kappaetc implements are discussed in Gwet (2014, ch. 7-10).

    Best
    Daniel

    Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.


    • #3
      Hi Daniel,

      Thank you for that, kappaetc is very useful.

      I have one follow up question:

      Code:
      reshape wide score , i(Vignette) j(ID)
      kappaetc score* , icc(random)


      I have added the code above to my .do file; however, each time I run the file the ICC returned is slightly different.

      For example:
      1st run of do file: ICC = 0.6039
      clear
      2nd run of do file: ICC = 0.5896
      clear
      3rd run of do file: ICC = 0.6201


      I did not change anything in my do file between runs.

      Can you think of a reason for this?

      Thanks,
      Olivia


      • #4
        Olivia

        Thanks for reporting back. I cannot reproduce this problem with the example dataset. Do you have the latest version of kappaetc installed? The latest version is

        Code:
        . which kappaetc
        ...
        *! version 2.0.0 28jun2018 daniel klein
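
        If yours is older, updating from SSC will fetch the current version:

        Code:
        adoupdate kappaetc, update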
        Also, do you have repeated measures for vignettes and raters? That is, does the same rater score the same vignette repeatedly? If so, you should indicate the vignette identifier in the i() option.
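
        If I read that suggestion right, the call would look something like this (a sketch of the syntax only; I am assuming i() takes the identifier of the repeatedly scored subjects, here Vignette):

        Code:
        * sketch: i() assumed to name the subject (vignette) identifier
        kappaetc score* , icc(random) i(Vignette)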

        Best
        Daniel


        • #5
          Hi Daniel,

          Thank you for your reply and apologies for my delayed response.

          After looking over my do file, it seems the reason for this was further up in my code, where I was collapsing variables. Once I have fixed this, I expect the ICC to remain constant.
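
          For anyone running into the same thing: results that change across otherwise identical runs often trace back to an order-dependent step after a sort with tied keys, because Stata breaks ties in random order. A defensive pattern (a hypothetical sketch, reusing the variable names from above):

          Code:
          * sort on enough keys to break all ties, so an order-dependent
          * step such as keeping the first observation per group is
          * reproducible across runs
          sort ID Vignette score
          by ID Vignette: keep if _n == 1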

          Thanks again,
          Olivia
