  • Inter-rater Reliability- help!

    Hi all,

    I am relatively new to both Stata and to statistics in general. I am working on a research project investigating inter-rater reliability between 3 different pathologists. So there are 3 raters per patient, and the diagnoses can take up to 15 different values. The data are set up so that each of the 3 columns is a different rater, with that rater's diagnoses listed under it, and each row is a different patient. My questions are:

    1. How would I go about doing this? I have been using the command "kap rater1 rater2 rater3" - is this correct, or should I be using a different command for Fleiss kappa since there are more than 2 raters?


    2. I have read a lot about weighted kappa, but since I am only looking at diagnoses (which are represented by a number), I don't need to use this, right? To be clearer: one pathologist might give a diagnosis of, let's say, 1, while another pathologist gives a diagnosis of 9. There is no scale of difference between the diagnoses - they either agree or they don't.


    3. For confidence intervals, can I just use "kapci rater1 rater2 rater3"?


    Thank you all for your help!
    Last edited by Nick Leader; 21 Dec 2016, 10:31.

  • #2
    1. -kap rater1 rater2 rater3- is correct for this problem.

    2. You are probably correct. It depends a bit on what the domain of diagnoses involved is. Conceivably one could treat, say, CIN3 and CIN4 as a partial agreement, closer than, say, CIN3 vs normal. But unless a bunch of your diagnoses fall into ordered scales like that, it would be best to characterize pairs of diagnoses as either agreeing or not. So there is probably no reason to use weights here (a small sketch follows at the end of this post).

    3. I'm not familiar with -kapci- so I can't help you with that. It's a user-written program from the Stata Journal. (The FAQ does request that you identify non-official Stata commands as such in your posts and indicate where they come from.) Its help file suggests that it will be suitable for your purpose, but I have no experience with it.
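
    To make the weighting question concrete, here is a minimal sketch of the difference. Note that -kap-'s wgt() option applies only when there are exactly two raters, and it is only sensible when the categories form an ordered scale; the ordinal variables grade1 and grade2 below are hypothetical, not from your data.

    Code:
    * nominal diagnoses: any disagreement counts the same, so no weights needed
    kap rater1 rater2

    * ordered categories (hypothetical grade1/grade2): wgt(w) credits near-misses as partial agreement
    kap grade1 grade2, wgt(w)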

    • #3
      Thank you so much, I appreciate it!!

      • #4
        For confidence intervals (and also for more than one specific agreement coefficient) you might be interested in kappaetc (from SSC), released just yesterday.

        Best
        Daniel

        • #5
          Thank you, Daniel, I will definitely download that.

          • #6
            kappaetc doesn't work for me since I have an older version of the program. Any other suggestions for getting a confidence interval for kappa? When I run "kapci" it gives me slightly different confidence intervals every time. Upon further reading, I think this is due to the use of "bootstraps" (?), which I am unfamiliar with. This is the message Stata gives me every time I run "kapci rater1 rater2 rater3": Note: default number of bootstrap replications has been set to 5 for syntax testing only. reps() needs to be increased when analysing real data.

            How/why do I go about increasing the number of "reps"?

            I obtained the code "kapci" by downloading it after discovering it on this site: http://www.stata-journal.com/sjpdf.h...iclenum=st0076

            Thanks in advance!

            • #7
              To specify the number of reps, just add it to the command as an option:

              Code:
              kapci rater1 rater2 rater3, reps(#)
              where you replace # with the number of reps you want. The larger the number you choose, the more stable the result will be, but the longer it will take to run. I'm not really sure what the sampling distribution of a kappa statistic looks like, but since it must be bounded, it is hard to imagine that anything more than 1000 replications would be needed to get a practical level of precision. And 1000 kappa calculations won't take very long at all. So I'd probably use reps(1000).

              Also, to make the calculations reproducible, you need to set the seed of the random number generator. If you already do that somewhere in your do-file before you get to -kapci-, then you don't need to do it again. If not, then I notice from the help file that you can do it with -kapci-'s -seed()- option. It doesn't matter what integer you pick for the seed: different values will give you slightly different results, but you will get the same results each time you re-run your do-file with the same specification of -seed()-. So
              Code:
              kapci rater1 rater2 rater3, reps(1000) seed(1234)
              will probably serve you well.

              As for learning about bootstrapping, the manual section* on the -bootstrap- command (in [R]), particularly the subsection called Introduction, has a pretty straightforward explanation of what it is and how it works. You may or may not want to pursue it in greater depth than that: if so, the references cited there are quite comprehensive.

              *I'm looking at the manual section for Stata version 14. Since you are using an earlier version, the documentation may differ. I don't have an older version available now, and, in any case, you didn't actually say which older version you're using.
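
              In case it helps to see the mechanics, here is a rough sketch of bootstrapping kappa yourself with the -bootstrap- prefix rather than through -kapci-. This assumes that -kap- leaves the overall estimate in r(kappa) in your version (check with -return list- first), and the prefix syntax shown is from current Stata, so it may need adjusting in an older release.

              Code:
              * see what kap stores, then bootstrap the overall kappa
              kap rater1 rater2 rater3
              return list
              bootstrap kappa=r(kappa), reps(1000) seed(1234): kap rater1 rater2 rater3
              estat bootstrap, percentile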

              • #8
                Originally posted by Nick Leader:
                kappaetc doesn't work for me since I have an older version of the program.
                This means older than Stata 11.2, which is quite old already and is the minimum version required for kappaetc (although I might be able to get it to run under Stata 10.1). Note that stating the version of Stata that you use (if not the latest) is requested in the FAQ and is indeed very helpful for understanding your problem and giving useful advice.

                Best
                Daniel

                • #9
                  Awesome explanation, thank you!!

                  • #10
                    Sorry, I never posted the version. I am unfortunately stuck with Stata version 8.

                    • #11
                      One last question...

                      How would I go about finding the percent agreement between the three raters with the same setup as described above? I cannot seem to find this process anywhere other than through apps on third-party sites, but I need to be able to perform the analysis in Stata. Thanks again in advance.

                      • #12
                        Well, you have three raters. So there are three different proportions (or percentages) of rater agreement: 1 vs 2, 1 vs 3, and 2 vs 3. You'll have to do those separately. When you run -kap- with just two raters, the agreement is part of the output (and is also stored in r(prop_o) if you need to store it in a macro).

                        Added: At least this is true in current Stata. My memory doesn't go back as far as version 8.2. But looking at the code in kappa.ado, it appears to have been written for version 6, so I assume this behavior is still the same.
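
                        If it saves some typing, here is a small sketch that loops over those three pairs and displays each observed proportion of agreement. It just automates the pairwise -kap- runs described above and assumes your variables are named rater1-rater3.

                        Code:
                        * run kap for each pair of raters and show the observed agreement
                        foreach pair in "rater1 rater2" "rater1 rater3" "rater2 rater3" {
                            quietly kap `pair'
                            display "`pair': observed agreement = " %6.4f r(prop_o)
                        }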

                        • #13
                          Is there a way to create a "dummy" variable that's equal to 1 (for example) if all 3 agree with each other? Wouldn't that allow "tab dummyvariable" to give the percent agreement?

                          I guess my main question is: is there a way to get the percentage of patients (entries) for whom ALL 3 raters agree (vs. not)?
                          Last edited by Nick Leader; 22 Dec 2016, 09:34.

                          • #14
                            Sure. That's easy.

                            Code:
                            gen byte all_agree = (rater1 == rater2) & (rater2 == rater3)
                            That won't give you proportions of agreement just among rater1 and rater2, for example. But if you only want to know how often all three agree, that's all there is to it.
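
                            And yes, as you suggested, tabulating (or summarizing) that 0/1 indicator then gives the percentage you asked about. A quick sketch, assuming the variable was created as above:

                            Code:
                            * share of patients on whom all three raters agree
                            tab all_agree

                            * equivalently, the mean of the 0/1 indicator is the proportion of full agreement
                            summarize all_agree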

                            • #15
                              Here is a code snippet, using tuples (from SSC), to get the percent agreement; it is supposed to work with Stata 8.

                              Code:
                              // example dataset
                              webuse p615b , clear
                              keep rater1-rater3
                              
                              // get percent agreement
                              tempname prop_o
                              scalar `prop_o' = 0
                              tuples 1 2 3 , min(2) max(2)
                              forvalues j = 1/`ntuples' {
                                  tokenize `tuple`j''
                                  kap rater`1' rater`2'
                                  scalar `prop_o' = `prop_o' + r(prop_o)
                              }
                              display `prop_o'/`ntuples'
                              Note that what you seem to be looking for (agreement among all three raters at once) is not equivalent to the percent agreement computed here.

                              Best
                              Daniel
