Calculating Kappa with survey weighted data

Nirali Chakraborty

Join Date: Sep 2016

Posts: 6
#1

07 Sep 2016, 15:17

Greetings.
I have 2 questions regarding calculation of kappa in Stata 13.

I am trying to calculate inter-rater reliability using Cohen's kappa statistic. Each of the two variables has a 'score' ranging from 1 to 5.
The underlying data is survey data. Specifically, it is from a Demographic and Health Survey, and includes sampling weights. For other analyses of this data, I am using svy commands, or aweight.

1. Is there a way to calculate kappa while acknowledging that the underlying data is weighted, in other words, that each observation does not necessarily represent one person? It seems to me that observations with a higher sampling weight should influence the kappa statistic more than those with a lower sampling weight.

2. I want to calculate the confidence interval of the kappa statistic. Since the data is from a population-based survey with clustered responses, I chose to bootstrap the calculation, drawing 20 lots of 200 observations with replacement. Which of these two commands is correct? Alternatively, what is the difference between the two commands?

bootstrap r(kappa), reps(20) size(200): kap a b

bootstrap r(kappa), reps(20) size(200) cluster(psu): kap a b

Thank you.
Peta Hitchens

Join Date: Sep 2016

Posts: 13
#2

08 Sep 2016, 06:21

Some example data would be good so we can compare the output from both of your proposed commands.
If each obs is not representative of 1 person, does that mean there are multiple obs for some people, hence the need to cluster on person (psu)?
If this is the case, though without being able to run it, I would say the second code adjusting for clustering would be better.
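To make the difference between the two commands concrete: with `cluster(psu)`, bootstrap draws whole clusters with replacement, whereas without it individual observations are drawn. Below is a minimal Python sketch of the two resampling schemes, not Stata's implementation. The function names and the toy rows are hypothetical and made up purely for illustration.

```python
import random

def resample_observations(rows, rng):
    """Plain bootstrap: draw len(rows) single observations with replacement."""
    return [rng.choice(rows) for _ in rows]

def resample_clusters(rows, rng):
    """Cluster bootstrap: draw whole PSUs with replacement, keeping their rows together."""
    by_psu = {}
    for row in rows:
        by_psu.setdefault(row["psu"], []).append(row)
    psus = list(by_psu)
    drawn = [rng.choice(psus) for _ in psus]
    return [row for psu in drawn for row in by_psu[psu]]

# Toy data (hypothetical): 3 PSUs with 2 households each, ratings a and b.
rows = [
    {"psu": p, "a": a, "b": b}
    for p, a, b in [(1, 1, 1), (1, 1, 2), (2, 3, 3), (2, 3, 3), (3, 5, 4), (3, 5, 5)]
]

rng = random.Random(1)  # fixed seed so the draws are repeatable
sample = resample_clusters(rows, rng)
print([r["psu"] for r in sample])  # PSU ids always appear in complete pairs
print([r["psu"] for r in resample_observations(rows, rng)])  # no such guarantee
```

If responses within a PSU are correlated, resampling single observations treats them as independent and will tend to understate the variability of kappa, which is why the clustered version looks like the safer choice here.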

Nirali Chakraborty

Join Date: Sep 2016

Posts: 6
#3

08 Sep 2016, 14:57

Thank you. Let me try and provide some output data which illustrates the question. If this is not enough, I can see if it is possible to provide the raw data.

The code below results in 2 tables. In the first, we have the frequencies used to calculate the kappa statistic. In the second, we have the 'true' frequencies, as the data has weights associated with it. I am looking for a way to calculate the kappa statistic and confidence interval using the data from the second table.

After this, there is the bootstrap command to calculate the confidence interval of the unweighted kappa, and a user-written command "kapci" which should do the same. The results are different.

Code:

tab hv270 TestedQuintile, cell nolab
tab hv270 TestedQuintile [aw=weight], cell nolab

kap hv270 TestedQuintile
bootstrap r(kappa), reps(20) size(200): kap hv270 TestedQuintile
kapci hv270 TestedQuintile, reps(20) size(200) seed(123455) estim(bsall)

Code:

    wealth |                     TestedQuintile
     index |         1          2          3          4          5 |     Total
-----------+-------------------------------------------------------+----------
         1 |     1,032        447         45          2          0 |     1,526 
           |     19.31       8.36       0.84       0.04       0.00 |     28.55 
-----------+-------------------------------------------------------+----------
         2 |       101        544        350         63          0 |     1,058 
           |      1.89      10.18       6.55       1.18       0.00 |     19.79 
-----------+-------------------------------------------------------+----------
         3 |         5        128        376        341          3 |       853 
           |      0.09       2.39       7.03       6.38       0.06 |     15.96 
-----------+-------------------------------------------------------+----------
         4 |         2          6         84        548        216 |       856 
           |      0.04       0.11       1.57      10.25       4.04 |     16.01 
-----------+-------------------------------------------------------+----------
         5 |         0          0          2         82        968 |     1,052 
           |      0.00       0.00       0.04       1.53      18.11 |     19.68 
-----------+-------------------------------------------------------+----------
     Total |     1,140      1,125        857      1,036      1,187 |     5,345 
           |     21.33      21.05      16.03      19.38      22.21 |    100.00 

. tab hv270 TestedQuintile [aw=weight], cell nolab

+-----------------+
| Key             |
|-----------------|
|    frequency    |
| cell percentage |
+-----------------+

    wealth |                     TestedQuintile
     index |         1          2          3          4          5 |     Total
-----------+-------------------------------------------------------+----------
         1 | 630.03864  380.16759 57.7909885   1.728203          0 | 1,069.725 
           |     11.79       7.11       1.08       0.03       0.00 |     20.01 
-----------+-------------------------------------------------------+----------
         2 | 95.960198  513.00334   386.4086  72.821518          0 | 1,068.194 
           |      1.80       9.60       7.23       1.36       0.00 |     19.98 
-----------+-------------------------------------------------------+----------
         3 | 4.1849994  130.89715   497.6977  433.80299  2.9182943 | 1,069.501 
           |      0.08       2.45       9.31       8.12       0.05 |     20.01 
-----------+-------------------------------------------------------+----------
         4 |1.78537137  8.1804627  109.89313   703.8849  245.00869 | 1,068.753 
           |      0.03       0.15       2.06      13.17       4.58 |     20.00 
-----------+-------------------------------------------------------+----------
         5 |         0          0 7.17159787  88.903221  972.75241 | 1,068.827 
           |      0.00       0.00       0.13       1.66      18.20 |     20.00 
-----------+-------------------------------------------------------+----------
     Total |  731.9692  1,032.249  1,058.962  1,301.141  1,220.679 |     5,345 
           |     13.69      19.31      19.81      24.34      22.84 |    100.00 

. kap hv270 TestedQuintile

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
  64.88%      20.29%     0.5594     0.0068      81.71      0.0000

. bootstrap r(kappa), reps(20) size(200): kap hv270 TestedQuintile
(running kap on estimation sample)

Warning:  Because kap is not an estimation command or does not set e(sample), bootstrap has no way to determine which
          observations are used in calculating the statistics and so assumes that all observations are used.  This
          means that no observations will be excluded from the resampling because of missing values or other reasons.

          If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded.
          Be sure that the dataset in memory contains only the relevant data.

Bootstrap replications (20)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
....................

Bootstrap results                               Number of obs      =      5345
                                                Replications       =        20

      command:  kap hv270 TestedQuintile
        _bs_1:  r(kappa)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .5594453   .0401499    13.93   0.000     .4807528    .6381377
------------------------------------------------------------------------------

. kapci hv270 TestedQuintile, reps(20) size(200) seed(123455) estim(bsall)

                                 B=20    N=5345
------------------------------------------------
 Kappa (95% CI) = 0.559 (0.470 - 0.635)    (BC)
                        (0.470 - 0.635)    (P)
                        (0.452 - 0.666)    (N)
------------------------------------------------
 BC = bias corrected, P = percentile, N = normal

If I calculate kappa by hand from the weighted data, k=0.5258 (95%CI 0.5095 - 0.5421) (http://vassarstats.net/kappa.html).
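As a cross-check on that hand calculation, here is a minimal Python sketch (outside Stata, so it does not solve the svy question) that recomputes kappa from the two tables above. The function name `cohen_kappa` is my own, and the cell values are copied from the `tab` output. It reproduces point estimates only. It says nothing about design-based standard errors, and the interval from the online calculator presumably treats the weighted counts as a simple random sample, so I would not read it as a survey-adjusted CI.

```python
def cohen_kappa(table):
    """Return (observed agreement, expected agreement, kappa) for a square table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(table[i][i] for i in range(k)) / n                      # diagonal share
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2     # chance agreement
    return p_o, p_e, (p_o - p_e) / (1 - p_e)

# Unweighted cell counts (rows: hv270, columns: TestedQuintile).
unweighted = [
    [1032, 447, 45, 2, 0],
    [101, 544, 350, 63, 0],
    [5, 128, 376, 341, 3],
    [2, 6, 84, 548, 216],
    [0, 0, 2, 82, 968],
]

# Weighted cell totals from the [aw=weight] table.
weighted = [
    [630.03864, 380.16759, 57.7909885, 1.728203, 0],
    [95.960198, 513.00334, 386.4086, 72.821518, 0],
    [4.1849994, 130.89715, 497.6977, 433.80299, 2.9182943],
    [1.78537137, 8.1804627, 109.89313, 703.8849, 245.00869],
    [0, 0, 7.17159787, 88.903221, 972.75241],
]

for label, tab in [("unweighted", unweighted), ("weighted", weighted)]:
    p_o, p_e, kappa = cohen_kappa(tab)
    print(f"{label}: agreement={p_o:.4f} expected={p_e:.4f} kappa={kappa:.4f}")
```

Rounded to four decimals this gives 0.5594 for the unweighted table, matching `kap`, and 0.5258 for the weighted table, matching the hand calculation.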

Any suggestions to help understand these differences, and to be able to use Stata to calculate a kappa statistic from the survey weighted data, are much appreciated. It pains me to say that this appears possible in SPSS...

Thank you.


Nirali Chakraborty

Join Date: Sep 2016

Posts: 6
#4

14 Sep 2016, 09:40

After some further searching, I have learned that the calculations I want to do are possible in SPSS, SAS and R (svykappa). Any chance for some help in calculating kappa from survey weighted data in Stata?
Thanks!
Rich Goldstein

Join Date: Mar 2014

Posts: 4438
#5

14 Sep 2016, 11:15

Have you tried tech support? If you do get an answer from them, please post it here in case someone else later wants to do the same thing.