  • Power Simulations for DiD with Clusters

    Hi Statalist,

    I am trying to do power calculations by simulation for a difference-in-differences design in a repeated cross-section. Treatment is assigned at the village level, so I want to generate clustered data. I have consulted the Stata FAQ and a number of Statalist posts, including this helpful answer by Clyde Schechter and this other helpful answer from Joseph Coveney. But when I implement what I thought was a reasonable adaptation of this advice, I find that power goes up, rather than down, as I increase the intra-cluster correlation (ICC). That doesn't seem right to me. The code block below has the program that generates the data; I then use simulate with this program to estimate power. Even when I run the program just once, I get higher t-statistics with a higher ICC, which is consistent with higher power but is not what I expected. Can anyone see whether I made a mistake in the way I generated the data? Thanks a lot!

    Code:
    capture program drop ddpowersimu
    
    program ddpowersimu, rclass
    version 17.0
    
        // Input parameters
        syntax, nperclust(integer)   /// observations per cluster in each period
        treat_ratio(real)  /// ratio of treated to untreated (parsed but not used below)
        clust_num(integer)   /// number of clusters
        icc(real)  /// set intra-cluster correlation
        b1(real)  /// b1 under the alternative hypothesis
        sd(real)  /// standard deviation of outcome
        [ alpha(real 0.05)  /// set alpha level
        ]
        
         // Gen random data
        clear
        set obs `clust_num'
        gen int clust = _n // clusters
        
        gen byte x = mod(_n, 2) // treatment: odd-numbered clusters are treated
        
        expand `nperclust'
        scalar ntotal = `nperclust'*`clust_num'
        sort clust
        
        expand 2
        
        gen t = 0  // time
        by clust, sort: gen memb_num = _n
        sort clust memb_num
        by clust: replace t = 1 if memb_num > (ntotal/`clust_num')
        
        // y variable
        scalar sd_u = `sd'*sqrt(`icc')      // between-cluster SD
        scalar sd_e = `sd'*sqrt(1-`icc')    // within-cluster SD, so Var(u) + Var(e) = sd^2
        by clust (memb_num), sort: gen u = rnormal(0, sd_u) if _n == 1
        by clust (memb_num): replace u = u[1]
        gen e = rnormal(0, sd_e)
        gen mu = `b1'*x*t
        gen y = mu + e + u
        
        // Fit diff in diff regression
        reg y x##t, vce(cluster clust)
        
        // Return results
        mat a=r(table)
        local p1=el(a,rownumb(a,"pvalue"),colnumb(a,"1.x#1.t"))
        return scalar pvalue = `p1'
        return scalar reject = (`p1'<`alpha')
    end
    Code:
    simulate reject = r(reject) pvalue=r(pvalue), reps(100) seed(1234): ddpowersimu, clust_num(10) nperclust(10) b1(0.5) sd(1) icc(0.1) treat_ratio(0.5)
    sum reject

  • #2
    The simulations are right* and your intuition is wrong. The reason is that the effect you are testing here, the x#t interaction, is a within-cluster (lower-level) contrast. Higher ICC does confer higher power for contrasts at that level. It decreases power for contrasts at the cluster level.

    Yes, this is surprising and confusing, but that's the way it is. If it makes you feel better, I've been working with models like this for a few decades, and I still get blindsided by this from time to time.

    *I mean the simulations are right in this particular respect. I have not tested the code or otherwise scrutinized it for possible errors. I'm just pointing out that the pattern of results you are getting is, in fact, what you should expect.
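
    To see why the pattern goes this way, here is a rough sketch of the variance of the difference-in-differences estimate under the data-generating model in the posted program. It assumes a balanced design (n members per cluster in each period, a cluster effect u that is the same in both periods) and holds the total variance fixed. The symbols ρ (the ICC), σ² (total variance), n, G_1 and G_0 (numbers of treated and control clusters), and d_c are notation introduced here for illustration, not taken from the thread.

    ```latex
    \begin{align*}
    % Model: c = cluster, i = member, t = period
    y_{cit} &= b_1 x_c t + u_c + e_{cit},
      & \operatorname{Var}(u_c) &= \rho\sigma^2,
      \quad \operatorname{Var}(e_{cit}) = (1-\rho)\sigma^2 \\
    % Within-cluster change in period means: u_c cancels
    d_c &= \bar{y}_{c1} - \bar{y}_{c0}
         = b_1 x_c + (\bar{e}_{c1} - \bar{e}_{c0}),
      & \operatorname{Var}(d_c) &= \frac{2(1-\rho)\sigma^2}{n} \\
    % DiD estimate: mean change in treated minus mean change in control clusters
    \hat{b}_1 &= \bar{d}_{\text{treated}} - \bar{d}_{\text{control}},
      & \operatorname{Var}(\hat{b}_1) &= \frac{2(1-\rho)\sigma^2}{n}
      \left(\frac{1}{G_1} + \frac{1}{G_0}\right)
    \end{align*}
    ```

    Because the cluster effect drops out of each within-cluster change, only the member-level variance (1-ρ)σ² is left, and it shrinks as ρ grows. By contrast, a comparison of treated and control cluster means within a single period would have variance (ρσ² + (1-ρ)σ²/n)(1/G_1 + 1/G_0), which increases with ρ whenever n > 1.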


    • #3
      Thanks very much! That makes sense to me now that I think about it. Thank you for your response.
