  • "insufficient number of targets" for ICC

Dear Statalisters,
I have a problem calculating a basic ICC as a measure of criterion validity for measurements on a continuous scale.
I am using the command:

    . icc depvar var

The dependent variable is the step count from step-count sensors, and the gold standard 'var' is the actual number of steps taken.

I have 84 measurements, and Stata tells me:
    "(40 targets omitted from computation because of unbalanced data)"

    And:
    "insufficient observations
    You have requested some statistical calculation, and while there
    are some observations, the number is not sufficient to carry out
    your request."

    Anybody know what to do here?

    I thank you all in advance.

    Rasmus
    Last edited by Rasmus Tolstrup; 04 Apr 2018, 03:59.

  • #2
Please show example data (using dataex). You need the data in long form for icc to work; a minimal sketch of what that looks like is below.
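
For instance, icc expects one row per rating, with one variable identifying the target (subject) and one identifying the rater. A hypothetical sketch (variable names and values made up for illustration):

Code:
* hypothetical long-form layout that -icc- expects:
* one row per rating, identifying the target and the rater
clear
input target judge rating
1 1 450
1 2 486
2 1 457
2 2 489
3 1 551
3 2 533
end
icc rating target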

    Best
    Daniel

    • #3
      Hi Daniel,
      Like this?

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int V31 double Truestep
      450   486
      457   489
      551   533
      682   706
      694   677
      665   635
       55   482
      590   635
      554   582
      512   517
      672   624
      612   555
       94   532
      652   643
      493   520
      612   612
      671   656
      599   591
      523   589
      621   619
      593   586
      650   639
      346   427
      558 560.5
        0   424
      620   638
      657   645
       50   229
      498   490
      579   592
      664   655
      562   584
      540   529
      477   570
      551   661
      472   518
      602   569
      626   632
      125   397
      185   451
      369   449
      694   680
      462   486
      483   489
      527   533
      710   706
      690   677
      623   635
      388   482
      595   635
      561   582
      508   517
      624   624
      561   555
      481   532
      649   643
      502   520
      638   612
      665   656
      597   591
      565   589
      611   619
      592   586
      623   639
      351   427
      584 560.5
      414   424
      640   638
      647   645
        0   229
      485   490
      427   592
      653   655
      586   584
      541   529
      521   570
      508   661
      488   518
      584   569
      632   632
      367   397
      345   451
      460   449
      706   680
      end
      ------------------ copy up to and including the previous line ------------------

      Listed 84 out of 84 observations

      • #4
        This is what I had in mind.

Conceptually, are you sure you want ICC(1)? This model assumes that the subjects (the 84 trials in your case) are a random sample (which is probably true) and allows a different set of raters for each subject (which does not seem to be the case here). Perhaps ICC(3) is better suited; it assumes random subjects but fixed raters.

Anyway, allow me to slip in some advertisement for kappaetc (SSC) and then show you how to do the same with Stata's icc (omitting the code for inputting the data, given above).

        Code:
        // rename the variables
        rename (V31 Truestep) (rating#) , addnumber
        
        // use -kappaetc- (SSC) with data in wide form
        // estimate ICC 1 (subjects random, different raters per subject)
        kappaetc rating1 rating2 , icc(oneway)
        
        // estimate ICC 3 (subjects random, raters fixed)
        kappaetc rating1 rating2 , icc(mixed)
        
        // could estimate weighted kappa statistics
        kappaetc rating1 rating2 , wgt(quadratic)
        
        // now replicate the above with Stata's -icc- command
        
        // get data in long form
        generate target = _n
        reshape long rating , i(target) j(judge) // judge is the device/truth
        
        // estimate ICC 1 (subjects random, different raters per subject)
        icc rating target
        
        // estimate ICC 3 (subjects random, raters fixed)
        icc rating target judge , mixed
The results:

        Code:
        [output omitted]
        . // rename the variables
        . rename (V31 Truestep) (rating#) , addnumber
        
        . 
        . // use -kappaetc- (SSC) with data in wide form
        . // estimate ICC 1 (subjects random, different raters per subject)
        . kappaetc rating1 rating2 , icc(oneway)
        
        Interrater reliability                           Number of subjects =      84
        One-way random-effects model                    Ratings per subject =       2
        ------------------------------------------------------------------------------
                       |   Coef.     F     df1     df2      P>F   [95% Conf. Interval]
        ---------------+--------------------------------------------------------------
              ICC(1,1) |  0.6947   5.55    83.00   84.00   0.000    0.5657     0.7907
        ---------------+--------------------------------------------------------------
               sigma_s |112.2756
               sigma_e | 74.4224
        ------------------------------------------------------------------------------
        
        . 
        . // estimate ICC 3 (subjects random, raters fixed)
        . kappaetc rating1 rating2 , icc(mixed)
        
        Interrater reliability                           Number of subjects =      84
        Two-way mixed-effects model                     Ratings per subject =       2
        ------------------------------------------------------------------------------
                       |   Coef.     F     df1     df2      P>F   [95% Conf. Interval]
        ---------------+--------------------------------------------------------------
              ICC(3,1) |  0.7322   6.47    83.00   83.00   0.000    0.6150     0.8178
        ---------------+--------------------------------------------------------------
               sigma_s |114.0098
               sigma_e | 68.9478
        ------------------------------------------------------------------------------
        
        . 
        . // could estimate weighted kappa statistics
        . kappaetc rating1 rating2 , wgt(quadratic)
        
        Interrater agreement                             Number of subjects =      84
        (weighted  analysis)                            Ratings per subject =       2
                                                Number of rating categories =     107
        ------------------------------------------------------------------------------
                             |   Coef.   Std. Err.   t    P>|t|   [95% Conf. Interval]
        ---------------------+--------------------------------------------------------
           Percent Agreement |  0.9780    0.0078 125.14   0.000     0.9625     0.9936
        Brennan and Prediger |  0.7293    0.0963   7.57   0.000     0.5377     0.9208
        Cohen/Conger's Kappa |  0.6987    0.0710   9.84   0.000     0.5574     0.8400
            Scott/Fleiss' Pi |  0.6916    0.0748   9.25   0.000     0.5429     0.8404
                   Gwet's AC |  0.7355    0.0943   7.80   0.000     0.5478     0.9231
        Krippendorff's Alpha |  0.6935    0.0748   9.27   0.000     0.5447     0.8422
        ------------------------------------------------------------------------------
        
        . 
        . // now replicate the above with Stata's -icc- command
        . 
        . // get data in long form
        . generate target = _n
        
        . reshape long rating , i(target) j(judge) // judge is the device/truth
        (note: j = 1 2)
        
        Data                               wide   ->   long
        -----------------------------------------------------------------------------
        Number of obs.                       84   ->     168
        Number of variables                   3   ->       3
        j variable (2 values)                     ->   judge
        xij variables:
                                rating1 rating2   ->   rating
        -----------------------------------------------------------------------------
        
        . 
        . // estimate ICC 1 (subjects random, different raters per subject)
        . icc rating target
        
        Intraclass correlations
        One-way random-effects model
        Absolute agreement
        
        Random effects: target           Number of targets =        84
                                         Number of raters  =         2
        
        --------------------------------------------------------------
                        rating |        ICC       [95% Conf. Interval]
        -----------------------+--------------------------------------
                    Individual |   .6947455       .5657488    .7906549
                       Average |   .8198818       .7226559    .8830902
        --------------------------------------------------------------
        F test that
          ICC=0.00: F(83.0, 84.0) = 5.55              Prob > F = 0.000
        
        Note: ICCs estimate correlations between individual measurements
              and between average measurements made on the same target.
        
        . 
        . // estimate ICC 3 (subjects random, raters fixed)
        . icc rating target judge , mixed
        
        Intraclass correlations
        Two-way mixed-effects model
        Consistency of agreement
        
        Random effects: target           Number of targets =        84
         Fixed effects: judge            Number of raters  =         2
        
        --------------------------------------------------------------
                        rating |        ICC       [95% Conf. Interval]
        -----------------------+--------------------------------------
                    Individual |    .732211       .6149596    .8177858
                       Average |   .8454062        .761579    .8997604
        --------------------------------------------------------------
        F test that
          ICC=0.00: F(83.0, 83.0) = 6.47              Prob > F = 0.000
        
        Note: ICCs estimate correlations between individual measurements
              and between average measurements made on the same target.
        
        . 
        end of do-file
        Best
        Daniel

        • #5
          Hi Daniel,
          Thank you so much.
          Do I have to rename and reshape my data?

I have replicated your results using these two commands:

Code:
. ssc install kappaetc

. kappaetc V31 Truestep , icc(mixed)

Interrater reliability                           Number of subjects =      84
Two-way mixed-effects model                     Ratings per subject =       2
------------------------------------------------------------------------------
               |   Coef.     F     df1     df2      P>F   [95% Conf. Interval]
---------------+--------------------------------------------------------------
      ICC(3,1) |  0.7322   6.47    83.00   83.00   0.000    0.6150     0.8178
---------------+--------------------------------------------------------------
       sigma_s |114.0098
       sigma_e | 68.9478
------------------------------------------------------------------------------

You are probably right about ICC(3,1), as the subjects are a random sample and the "gold standard"/Truestep is the average of two fixed raters' visual counts of the steps.

Did I get it right?

          Best and thanks again!

          • #6
            Do I have to rename and reshape my data?
The rename is necessary for reshape to work, and reshape in turn is necessary for icc to work. Conversely, if you do not rely on icc and stick with kappaetc, there is no need to rename or reshape at all, as you can see from your results, which match those reported earlier. A sketch of the two routes is below.
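
A minimal sketch of both routes, reusing the code from #4 and the variable names from #3:

Code:
// -kappaetc- (SSC) works on the wide data directly
kappaetc V31 Truestep , icc(mixed)

// Stata's -icc- needs long form, hence rename + reshape first
rename (V31 Truestep) (rating#) , addnumber
generate target = _n
reshape long rating , i(target) j(judge)
icc rating target judge , mixed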

            [...] the "gold standard"/Truestep is an average of two fixed raters' visual count of the steps.
Perhaps one could argue that you then want the average ICC (over the two raters), but I do not think so; for one thing, only one of the measures is actually an average, and it additionally happens to be taken as "truth". I think ICC(3,1) is a good choice here.

            Edit:

            It might be worth looking at the agreement between the two visual inspections, given their standing as "truth", but this is up to you.

            Best
            Daniel
            Last edited by daniel klein; 04 Apr 2018, 06:50.

            • #7
Originally posted by daniel klein View Post

It might be worth looking at the agreement between the two visual inspections, given their standing as "truth", but this is up to you.
              Hi again,
This is how I calculate the "truestep":

Code:
gen truestep = (Talteskridt1+Talteskridt2)*0.5

So I assume that the true number of steps taken is the average of the two testers' visually observed counts. But I would like to calculate the ICC for the interrater reliability, as you suggest. Can I use the same formula (ICC 3), or do I need a different one?

Thank you again!

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(Talteskridt1 Talteskridt2) double Truestep int V31
              484 488   486 450
              486 492   489 457
              532 534   533 551
              704 708   706 682
              682 672   677 694
              632 638   635 665
              478 486   482  55
              638 632   635 590
              592 572   582 554
              526 508   517 512
              616 632   624 672
              556 554   555 612
              548 516   532  94
              640 646   643 652
              520 520   520 493
              614 610   612 612
              650 662   656 671
              588 594   591 599
              590 588   589 523
              622 616   619 621
              594 578   586 593
              644 634   639 650
              424 430   427 346
              554 567 560.5 558
              412 436   424   0
              640 636   638 620
              650 640   645 657
              232 226   229  50
              488 492   490 498
              574 610   592 579
              638 672   655 664
              584 584   584 562
              534 524   529 540
              576 564   570 477
              666 656   661 551
              516 520   518 472
              584 554   569 602
              632 632   632 626
              386 408   397 125
              446 456   451 185
              442 456   449 369
              686 674   680 694
              484 488   486 462
              486 492   489 483
              532 534   533 527
              704 708   706 710
              682 672   677 690
              632 638   635 623
              478 486   482 388
              638 632   635 595
              592 572   582 561
              526 508   517 508
              616 632   624 624
              556 554   555 561
              548 516   532 481
              640 646   643 649
              520 520   520 502
              614 610   612 638
              650 662   656 665
              588 594   591 597
              590 588   589 565
              622 616   619 611
              594 578   586 592
              644 634   639 623
              424 430   427 351
              554 567 560.5 584
              412 436   424 414
              640 636   638 640
              650 640   645 647
              232 226   229   0
              488 492   490 485
              574 610   592 427
              638 672   655 653
              584 584   584 586
              534 524   529 541
              576 564   570 521
              666 656   661 508
              516 520   518 488
              584 554   569 584
              632 632   632 632
              386 408   397 367
              446 456   451 345
              442 456   449 460
              686 674   680 706
              end
              ------------------ copy up to and including the previous line ------------------

              Listed 84 out of 84 observations

              • #8
So I assume that the true number of steps taken is the average of the two testers' visually observed counts. But I would like to calculate the ICC for the interrater reliability, as you suggest. Can I use the same formula (ICC 3), or do I need a different one?
I was suggesting estimating the reliability between the two testers, which might be of interest given that these values (or their average) serve as the benchmark. If the numbers were indeed true, in the literal sense, then the two testers should agree exactly in 100 per cent of the cases. If you estimate (unweighted) exact agreement, the two testers agree in only 7 per cent of all cases, and the maximum difference is 36 steps. The mean difference between the testers is 0.4 steps, however, so the assumption that random error cancels out in the average seems to hold. Taking the (ratio) scale of the step counts into account, you obtain a weighted agreement of 99.9 per cent, and the ICC(3) is 0.988. These numbers suggest that the benchmark can be trusted.

                Here is the code for the results above

                Code:
                // exact agreement
                kappaetc Talteskridt1 Talteskridt2
                
                // difference
                generate diff = (Talteskridt1 - Talteskridt2)
                summarize diff
                
                // weighted agreement
                kappaetc Talteskridt1 Talteskridt2 , wgt(ratio)
                
                // ICC
                kappaetc Talteskridt1 Talteskridt2 , icc(mixed)
                Best
                Daniel

                • #9
                  Thank you very much!

                  • #10
                    Originally posted by daniel klein View Post
                    I think ICC(3,1) is a good choice here.
Dear all,
Does anyone know how to perform an ICC(3,k), that is, a two-way mixed-effects, average-measures ICC?

I have already had a look at help kappaetc icc and help kappaetc but could not find the right command.

Thanks in advance
                    Last edited by Carolina Hincapie; 19 Apr 2024, 03:02.
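
Note: the output in #4 above suggests one route. Stata's icc with the mixed option reports both an individual-measures and an average-measures estimate; the "Average" row of that output is the two-way mixed, average-measures ICC, i.e. ICC(3,k). A minimal sketch, assuming the long-form data set up in #4 (whether kappaetc offers a direct option for the average-measures ICC is not shown in this thread):

Code:
// two-way mixed-effects model; the "Average" row of the output
// is the average-measures ICC(3,k) (0.8454 in the #4 output)
icc rating target judge , mixed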
