Test-retest reliability?

Karin Jensen

#1

12 Oct 2017, 04:46

Hi Stata forum

I have repeated measures of the same thing on the same subjects, with a variable number (2 to 5) of measurements per subject. What is the best way of measuring test-retest reliability in Stata?

Thank you

Karin
Clyde Schechter

#2

12 Oct 2017, 08:33

If the measures are on a continuous scale, the conventional approach would be the intraclass correlation. -help icc-
Karin Jensen

#3

12 Oct 2017, 08:47

I had had a look at icc, but I couldn't see how to apply it to my situation. It has the syntax "icc depvar target" but I don't have a dependent variable and a target. I have several measurements per person of equal status. Sorry if I am being stupid.
Clyde Schechter

#4

12 Oct 2017, 09:21

The dependent variable refers to your measurement, and the target refers to the person being measured.
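In practice this means icc wants the data in long form, with one row per measurement. A minimal sketch, assuming (hypothetically) that the measurements currently sit in wide form as measurement1 to measurement5 alongside an identifier subject_id; none of these names come from the original question:

```stata
* Hypothetical wide layout: subject_id measurement1 ... measurement5
reshape long measurement, i(subject_id) j(occasion)

* Subjects with fewer than five measurements leave empty rows; drop them
drop if missing(measurement)

* depvar = the measurement, target = the person being measured
icc measurement subject_id
```

If the data are already in long form, only the last line is needed.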
Karin Jensen

#5

12 Oct 2017, 09:30

Oops, I was being stupid. Thank you, Clyde.
Karin Jensen

#6

13 Oct 2017, 06:36

As I mentioned, I have a variable number of measurements per subject (2 to 5). When I run icc, I get the message that targets have been omitted from computation because of unbalanced data. The results seem to include only the subjects with 5 measurements.
Clyde Schechter

#7

13 Oct 2017, 08:58

That's odd. Here's a different approach:

Code:

mixed measurement || subject_id:
estat icc

This will not give you any trouble with unbalanced data.
Karin Jensen

#8

13 Oct 2017, 09:58

Thank you Clyde--that was just what I needed.

daniel klein

#9

13 Oct 2017, 10:28

Originally posted by Clyde Schechter

That's odd. Here's a different approach:

Code:

mixed measurement || subject_id:
estat icc

This will not give you any trouble with unbalanced data.

It will likely yield a slightly different result, though. You can replicate icc results with xtreg using the GLS estimator. The ML estimator will yield the same result as the mixed approach. Asymptotically, the two results should be the same, I think.

Code:

webuse judges , clear
icc rating target

quietly mixed rating || target :
estat icc

yields

Code:

. webuse judges , clear
(Ratings of targets by judges)

. icc rating target

Intraclass correlations
One-way random-effects model
Absolute agreement

Random effects: target           Number of targets =         6
                                 Number of raters  =         4

--------------------------------------------------------------
                rating |        ICC       [95% Conf. Interval]
-----------------------+--------------------------------------
            Individual |   .1657418      -.1329323    .7225601
               Average |   .4427971      -.8844422    .9124154
--------------------------------------------------------------
F test that
  ICC=0.00: F(5.0, 18.0) = 1.79               Prob > F = 0.165

Note: ICCs estimate correlations between individual measurements
      and between average measurements made on the same target.

.
. quietly mixed rating || target :

. estat icc

Intraclass correlation

------------------------------------------------------------------------------
                       Level |        ICC   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
                      target |   .1102339   .1973357      .0023957    .8647096
------------------------------------------------------------------------------

Compare with

Code:

xtset target
xtreg rating
xtreg rating , mle

which yields

Code:

. xtset target
       panel variable:  target (balanced)

. xtreg rating

Random-effects GLS regression                   Number of obs     =         24
Group variable: target                          Number of groups  =          6

R-sq:                                           Obs per group:
     within  = 0.0000                                         min =          4
     between = 0.0000                                         avg =        4.0
     overall = 0.0000                                         max =          4

                                                Wald chi2(0)      =          .
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .

------------------------------------------------------------------------------
      rating |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   5.291667   .6843996     7.73   0.000     3.950268    6.633065
-------------+----------------------------------------------------------------
     sigma_u |  1.1155467
     sigma_e |  2.5027762
         rho |  .16574177   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg rating , mle
Iteration 0:   log likelihood = -57.297209
Iteration 1:   log likelihood = -57.279795
Iteration 2:   log likelihood = -57.279652
Iteration 3:   log likelihood = -57.279652

Random-effects ML regression                    Number of obs     =         24
Group variable: target                          Number of groups  =          6

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          4
                                                              avg =        4.0
                                                              max =          4

                                                Wald chi2(0)      =       0.00
Log likelihood  = -57.279652                    Prob > chi2       =          .

------------------------------------------------------------------------------
      rating |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   5.291667   .6247685     8.47   0.000     4.067143     6.51619
-------------+----------------------------------------------------------------
    /sigma_u |   .8809323   .8226624                      .1412682    5.493394
    /sigma_e |   2.502776   .4171294                      1.805324    3.469676
         rho |   .1102343   .1973356                      .0005208    .7963012
------------------------------------------------------------------------------
LR test of sigma_u=0: chibar2(01) = 0.39               Prob >= chibar2 = 0.267
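The rho rows in both xtreg tables are the share of total variance attributable to the target-level component, sigma_u^2 / (sigma_u^2 + sigma_e^2). A quick check with display, using only the values printed above (the arithmetic is the only thing added here):

```stata
* GLS (default) estimates: should be close to the reported rho of .16574177
display 1.1155467^2 / (1.1155467^2 + 2.5027762^2)

* ML estimates: should be close to the reported rho of .1102343
display .8809323^2 / (.8809323^2 + 2.502776^2)
```

The residual component sigma_e is the same in both fits; the two rho values differ because the estimators disagree about sigma_u.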

Edit: It is interesting that the CIs seem to differ quite substantially.

Best
Daniel

Last edited by daniel klein; 13 Oct 2017, 10:41.

Joseph Coveney

#10

13 Oct 2017, 18:22

Daniel,

Code:

quietly mixed . . ., reml
estat icc
daniel klein

#11

14 Oct 2017, 06:53

Joseph, thanks for the hint. REML indeed matches the point estimate. CIs are still off.

Best
Daniel
Joseph Coveney

#12

14 Oct 2017, 18:48

Daniel, I'm guessing that the CIs differ because icc is based on the ANOVA model formulation, which allows for negative estimates for variance components above the residual (and thus, in principle, negative ICC estimates), and estat icc is derived from an iterative maximum (restricted) likelihood fit, which constrains all variance component estimates to be nonnegative (maybe even strictly positive). So, the CI for icc is wider (and includes negative values) while the CI for estat icc is conditional on the ICC's being nonnegative and is narrower.
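To illustrate how the ANOVA formulation can go negative: for a balanced one-way design with k measurements per target, the individual ICC can be written in terms of the F ratio as (F - 1)/(F + k - 1), which drops below zero whenever F < 1. A small sketch using the rounded F statistic from the icc output earlier in the thread; the second F value is made up purely for illustration:

```stata
* One-way ANOVA ICC from the F ratio, with k = 4 raters per target.
* Reported F = 1.79 (rounded), so expect a value near the reported .1657418
display (1.79 - 1) / (1.79 + 4 - 1)

* Hypothetical F below 1: the ANOVA-based estimate is negative
display (0.50 - 1) / (0.50 + 4 - 1)
```

A likelihood-based fit cannot return the second kind of result, because its variance component estimates are kept at or above zero.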
Joseph Coveney

#13

14 Oct 2017, 21:01

Originally posted by Joseph Coveney

So, the CI for icc is wider . . . while the CI for estat icc is . . . narrower.

Sorry, I should have said, "So, the CIs for estat icc are shifted rightward relative to those CIs from icc", at least for those of the latter that entertain negative values.
daniel klein

#14

15 Oct 2017, 01:31

Originally posted by Joseph Coveney

Daniel, I'm guessing that the CIs differ because icc is based on the ANOVA model formulation, which allows for negative estimates for variance components above the residual (and thus, in principle, negative ICC estimates), and estat icc is derived from an iterative maximum (restricted) likelihood fit, which constrains all variance component estimates to be nonnegative (maybe even strictly positive).

Thanks for this further explanation, which sounds plausible. I have done a bit of reading in the manuals myself, and it seems there are fundamental differences in the way the CIs are estimated. In the ANOVA framework (icc), the CIs are based directly on the respective sums of squares and the F distribution, while in the mixed framework a logit transformation and a normal approximation are involved. So it seems that it is not only the restriction on the estimated variance components but other details that differ between the models as well.
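Both recipes can be checked by hand against the icc and estat icc output posted earlier in the thread. What follows is only a sketch under stated assumptions, not Stata's own code: it assumes the standard one-way random-effects F-based limits for icc, assumes a normal interval on the logit scale with a delta-method standard error for estat icc, and backs the unrounded F out of the reported ICC. All scalar names are made up for this example.

```stata
* icc: limits from the F distribution (6 targets, 4 raters => df 5 and 18)
scalar r_anova = .1657418
scalar F_obs   = (1 + 3*r_anova) / (1 - r_anova)
scalar F_lo    = F_obs / invF(5, 18, .975)
scalar F_hi    = F_obs * invF(18, 5, .975)
* Should land close to the reported -.1329323 and .7225601
display (F_lo - 1)/(F_lo + 3) "  " (F_hi - 1)/(F_hi + 3)

* estat icc: normal interval on the logit scale, then back-transformed
scalar r_ml   = .1102339
scalar se_ml  = .1973357
scalar half_w = invnormal(.975) * se_ml / (r_ml*(1 - r_ml))
* Should land close to the reported .0023957 and .8647096
display invlogit(logit(r_ml) - half_w) "  " invlogit(logit(r_ml) + half_w)
```

The logit-scale interval can never leave (0, 1), whereas the F-based lower limit can go below zero, as it does here.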

Best
Daniel
Chris Martin

#15

21 May 2020, 11:00

Is there a way to incorporate information about the order of waves here? For a given person, a score in five successive waves could be (2,3,4,5,6), which would suggest reasonably high test-retest reliability, or (2,4,6,3,4), which would not, but I'm not sure an icc would pick up the difference here. I know how to model autoregressive residuals in mixed, so maybe a formula using that?

Best,
Chris