I have data from three different measurement systems [variable name: system] that each recorded the clock time at which an event occurred (e.g., 5:42 AM, 5:43 AM, 5:42 AM) and a duration (e.g., 407 minutes, 413 minutes, 436 minutes; variable name: duration) on each of three consecutive nights [variable name: night]. I want to test the reliability of the systems in determining the time of the event. I converted my HH:MM time variables to decimals [decimaltime] in Excel and also recoded the clock times into integers in Stata using . gen double [integertime] = clock(time, "hm"). That doesn't seem to make a difference.
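The two codings are linear rescalings of one another, which is presumably why the results do not change: an ICC is a ratio of variances, so multiplying every value by a positive constant leaves it unchanged. A minimal sketch of doing both conversions inside Stata, assuming time is a string variable in 24-hour form (time_ms and time_frac are hypothetical names used only here):

```stata
* Assumes time is a string variable holding 24-hour clock times such as "5:42".
* time_ms and time_frac are hypothetical names used only for this sketch.
* With no date part, clock() counts milliseconds from midnight on 01jan1960.
generate double time_ms = clock(time, "hm")
* Fraction of a 24-hour day, the same scale as Excel's time serial values.
generate double time_frac = time_ms / msofhours(24)
* Display the millisecond value as a clock time.
format time_ms %tcHH:MM
```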
I tried ICC three different ways and still have some issues with interpretation. Here are the means by system for reference:
-----------------------------------------------------------------
Over | Mean Std. Err. [95% Conf. Interval]
----------------+------------------------------------------------
decimaltime |
system1 | .244397 .0041795 .23605 .252744
system2 | .2409406 .0041865 .2325795 .2493017
system3 | .242519 .0042903 .2339508 .2510873
----------------+------------------------------------------------
duration |
system1 | 407.5909 11.99015 383.6449 431.5369
system2 | 413.9146 7.2453 399.4447 428.3845
system3 | 436.1818 4.518426 427.1579 445.2057
#1) I combined the subject id and the repeated measure night to create an idbynight variable. I'm pretty sure this way is wrong because I have repeated measures.
Time
. icc decimaltime system idbynight, abs
Intraclass correlations
Two-way random-effects model
Absolute agreement
Random effects: system Number of targets = 3
Random effects: idbynight Number of raters = 22
--------------------------------------------------------------
decimaltime | ICC [95% Conf. Interval]
-----------------------+--------------------------------------
Individual | .0019337 -.003774 .2284833
Average | .040882 -.0901752 .8669374
--------------------------------------------------------------
F test that
ICC=0.00: F(2.0, 42.0) = 1.34 Prob > F = 0.273
Duration
. icc duration system idbynight, abs
Intraclass correlations
Two-way random-effects model
Absolute agreement
Random effects: system Number of targets = 3
Random effects: idbynight Number of raters = 22
--------------------------------------------------------------
duration | ICC [95% Conf. Interval]
-----------------------+--------------------------------------
Individual | .0998525 .0051437 .8478667
Average | .7093394 .1021304 .99191
--------------------------------------------------------------
F test that
ICC=0.00: F(2.0, 42.0) = 4.58 Prob > F = 0.016
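One thing worth checking in the two outputs above: they report 3 targets and 22 raters, whereas the design has 22 id-by-night targets rated by 3 systems. Stata's icc takes its arguments in the order depvar target rater, so a minimal sketch with the roles swapped would be the following. It has not been run on these data, so treat it as a check to try rather than a confirmed fix.

```stata
* icc syntax: icc depvar target rater
* Here idbynight is the target and system is the rater.
icc decimaltime idbynight system, absolute
icc duration idbynight system, absolute
```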
#2) I tried a mixed model with estat icc. This way seems the most accurate from what I've read. The results for decimaltime make sense looking at the mixed model results, but seem really poor for duration given how similar the systems were. I am also not sure how to determine F and p values from this output.
Time
. mixed decimaltime system##night || id:
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = 222.61294
Iteration 1: log likelihood = 222.61294
Computing standard errors:
Mixed-effects ML regression Number of obs = 66
Group variable: id Number of groups = 8
Obs per group:
min = 6
avg = 8.3
max = 9
Wald chi2(8) = 10.83
Log likelihood = 222.61294 Prob > chi2 = 0.2114
---------------------------------------------------------------------------------
decimaltime | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
system |
system2 | -.0016926 .0032166 -0.53 0.599 -.007997 .0046118
system3 | -.0018227 .0032166 -0.57 0.571 -.0081272 .0044817
|
night |
2 | .0004009 .0033478 0.12 0.905 -.0061607 .0069626
3 | -.0002163 .0033478 -0.06 0.948 -.0067779 .0063453
|
system#night |
system2#2 | .0014446 .0047086 0.31 0.759 -.0077841 .0106733
system2#3 | -.0069878 .0047086 -1.48 0.138 -.0162165 .0022409
system3#2 | -.0002605 .0047086 -0.06 0.956 -.0094892 .0089682
system3#3 | .000087 .0047086 0.02 0.985 -.0091417 .0093157
|
_cons | .2437065 .0068456 35.60 0.000 .2302894 .2571236
---------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | .0003335 .0001694 .0001232 .0009025
-----------------------------+------------------------------------------------
var(Residual) | .0000414 7.69e-06 .0000288 .0000596
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 111.07 Prob >= chibar2 = 0.0000
.
. estat icc
Residual intraclass correlation
------------------------------------------------------------------------------
Level | ICC Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id | .8896069 .0532137 .73589 .9588595
------------------------------------------------------------------------------
.
Duration
. mixed duration system##night || id:
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -326.76031
Iteration 1: log likelihood = -326.76031
Computing standard errors:
Mixed-effects ML regression Number of obs = 66
Group variable: id Number of groups = 8
Obs per group:
min = 6
avg = 8.3
max = 9
Wald chi2(8) = 15.75
Log likelihood = -326.76031 Prob > chi2 = 0.0461
------------------------------------------------------------------------------
duration | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
system |
system2 | 3.386726 15.46029 0.22 0.827 -26.91488 33.68834
system3 | 25.0625 15.46029 1.62 0.105 -5.23911 55.36411
|
night |
2 | -24.08278 16.07087 -1.50 0.134 -55.5811 7.415554
3 | 13.57833 16.07087 0.84 0.398 -17.92 45.07666
|
system#night |
system2#2 | 21.12467 22.63155 0.93 0.351 -23.23234 65.48169
system2#3 | -11.89424 22.63155 -0.53 0.599 -56.25126 32.46277
system3#2 | 21.72321 22.63155 0.96 0.337 -22.6338 66.08023
system3#3 | -10.63393 22.63155 -0.47 0.638 -54.99094 33.72309
|
_cons | 411.125 13.48498 30.49 0.000 384.6949 437.5551
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | 498.6761 309.0827 147.9921 1680.346
-----------------------------+------------------------------------------------
var(Residual) | 956.0821 177.5108 664.4419 1375.731
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 14.56 Prob >= chibar2 = 0.0001
.
. estat icc
Residual intraclass correlation
------------------------------------------------------------------------------
Level | ICC Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id | .3427897 .1485578 .1252824 .6551052
------------------------------------------------------------------------------
.
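For reference, the ICC that estat icc reports after a random-intercept model like this one is var(_cons) / (var(_cons) + var(Residual)). As a sanity check, the arithmetic can be redone with display using the variance estimates printed above; because those are rounded, the results should match the estat icc values only to roughly three decimals.

```stata
* Residual ICC = var(_cons) / (var(_cons) + var(Residual))
* Values are copied from the random-effects tables above.
* decimaltime:
display .0003335 / (.0003335 + .0000414)
* duration:
display 498.6761 / (498.6761 + 956.0821)
```

The closest analogue in this output to the F test of ICC = 0 is, as far as I understand it, the "LR test vs. linear model" line, which tests whether var(_cons) is zero.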
#3) I reshaped the data and used kappaetc. Same problem with duration results as in #2, except I can find the F and p values way more easily.
Time
. kappaetc decimaltime* , icc(mixed) i(idbynight)
Interrater reliability Number of subjects = 22
Two-way mixed-effects model Ratings per subject = 3
------------------------------------------------------------------------------
| Coef. F df1 df2 P>F [95% Conf. Interval]
---------------+--------------------------------------------------------------
ICC(3,1) | 0.8744 21.89 21.00 42.00 0.000 0.7651 0.9411
---------------+--------------------------------------------------------------
sigma_s | 0.0185
sigma_e | 0.0070
------------------------------------------------------------------------------
Duration
. kappaetc duration* , icc(mixed) i(idbynight)
Interrater reliability Number of subjects = 22
Two-way mixed-effects model Ratings per subject = 3
------------------------------------------------------------------------------
| Coef. F df1 df2 P>F [95% Conf. Interval]
---------------+--------------------------------------------------------------
ICC(3,1) | 0.3176 2.40 21.00 42.00 0.008 0.0563 0.5925
---------------+--------------------------------------------------------------
sigma_s | 22.4659
sigma_e | 32.9276
------------------------------------------------------------------------------
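Assuming sigma_s and sigma_e in the kappaetc output are the subject and error standard deviations (my reading of the labels, not something the output states), ICC(3,1) should equal sigma_s^2 / (sigma_s^2 + sigma_e^2). A quick check of that arithmetic with the printed values:

```stata
* ICC(3,1) = sigma_s^2 / (sigma_s^2 + sigma_e^2), under the assumption above.
* decimaltime:
display 0.0185^2 / (0.0185^2 + 0.0070^2)
* duration:
display 22.4659^2 / (22.4659^2 + 32.9276^2)
```

Under that reading, the error SD for duration (about 33 minutes) is larger than the between-subject SD (about 22 minutes), which would pull the coefficient down even when the system means look similar.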
Am I missing something or just in denial about the poor inter-rater reliability for duration? Thanks in advance.