  • power analysis with clustered data - simulated vs calculated

    I want to estimate my statistical power for a two-group comparison with clustered data. When I use the power command I get a very different answer than when I run my own simulations (which try to mimic the same analysis). Obviously I'm doing something wrong (or making a conceptual mistake), and I'm trying to figure out why.

    Here's the basic setup. I'm assuming a total sample of 100 clusters (50 in each group), with 25 observations per cluster. The degree of clustering (intraclass correlation) is 0.16. The standard deviation is 1 and the difference between group means is 0.2.

    Here are the results when I use Stata's power twomeans command:

    Code:
    . power twomeans 0 .2, cluster k1(50) k2(50) m1(25) m2(25) rho(0.16) sd(1)
    
    Estimated power for a two-sample means test
    Cluster randomized design, z test assuming sd1 = sd2 = sd
    Ho: m2 = m1  versus  Ha: m2 != m1
    
    Study parameters:
    
            alpha =    0.0500
            delta =    0.2000
               m1 =    0.0000
               m2 =    0.2000
               sd =    1.0000
    
    Cluster design:
    
               K1 =        50
               K2 =        50
               M1 =        25
               M2 =        25
               N1 =     1,250
               N2 =     1,250
              rho =    0.1600
    
    Estimated power:
    
            power =    0.6228
    So my power (for an alpha of .05) is just over 62%. Now here's me trying to simulate the same analysis:

    Code:
    . program define myprog, rclass
            clear
            set obs 100 // number of clusters
            gen cluster_num = _n
            expand 25 // number of observations per cluster
            local rho = 0.16 // intraclass correlation = .16
            local sd_u = sqrt(`rho') // SD of the cluster-level random effect u
            local sd_e = sqrt(1 - `rho') // SD of the observation-level error e
            by cluster_num, sort: gen trial_num = _n // observation index within cluster
            gen x = runiformint(0,1) // assigning clusters to one of two groups
            by cluster_num (trial_num), sort: gen u = rnormal(0, `sd_u') if _n == 1 // one draw per cluster
            by cluster_num (trial_num): replace u = u[1] // copy to every observation in the cluster
            gen e = rnormal(0, `sd_e')
            gen y = (.2 * x) + u + e // generating outcome (group 1: M = 0; group 2: M = 0.2)
            sum y
            return scalar sd = r(sd) // recording SD of y
            regress y x, cluster(cluster_num)
            lincom x
            return scalar p = r(p) // recording p-value
    end
    
    . simulate sd=r(sd) p=r(p), reps(10000) nodots: myprog // running simulation
    
          command:  myprog
               sd:  r(sd)
                p:  r(p)
    
    
    . gen power = (p <= .05) // indicator for whether p <= .05 in each replication
    
    .
    . // checking that SD of y is approx 1
    . sum sd
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              sd |     10,000    1.004122    .0179245   .9414742   1.073392
    
    .
    . // estimating power (alpha = .05)
    . sum power
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           power |     10,000       .9984      .03997          0          1
    So when I run a set of simulations it suggests that my power (for an alpha of .05) is over 99%.

    In sum, when I use Stata's power command I get an estimated power of about 62%, but when I try to simulate the same analysis I get an estimated power north of 99%. I'm trying to figure out where I'm going wrong here.

  • #2
    I figured out the problem with my original simulation. I generated x after expanding the clusters into individual observations, when I should have done so beforehand, so that x is assigned at the cluster level rather than at the individual observation/trial level. Once I do this, the simulation gives results similar to the power command.
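
    Here is a minimal sketch of the corrected ordering (the only change is assigning x before -expand-, so all 25 observations in a cluster inherit the same group):

    Code:
    clear
    set obs 100 // number of clusters
    gen cluster_num = _n
    gen x = runiformint(0,1) // group assignment made once, at the cluster level
    expand 25 // number of observations per cluster
    The rest of myprog stays the same.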

    • #3
      You may also consider a mixed model for the analysis, to compare against the regression method with cluster-robust variance estimates. I suspect that with the large number of clusters and the small effect size, there may not be any material difference between the two, but -mixed- models tend to be preferred in some domains.
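
      For example, a minimal sketch of that alternative inside the simulation program, swapping the -regress-/-lincom- lines for their -mixed- counterparts:

      Code:
      mixed y x || cluster_num: // random intercept for cluster picks up the effect u
      lincom x
      return scalar p = r(p) // recording p-value, as before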
