Power Simulations for DiD with Clusters

Seth Henry Morgan

Join Date: Jul 2022
Posts: 2

26 Jul 2022, 04:53

Hi Statalist,

I am trying to do power calculations by simulation for a difference-in-differences design in a repeated cross-section. Treatment is assigned at the village level, so I want to generate clustered data. I have consulted the Stata FAQ and a number of Statalist posts, including this helpful answer by Clyde Schechter and this other helpful answer from Joseph Coveney.

When I implemented what I thought was a reasonable adaptation of this advice, I found that increasing the intra-cluster correlation (ICC) gave me more power rather than less. That doesn't seem right to me. The code block below shows the program that generates the data; I then use simulate with this program to estimate power. Even when I run the program just once, I get higher t-statistics with a higher ICC, which is consistent with higher power but is not what I expected. Can anyone see whether I made a mistake in the way I generated the data? Thanks a lot!

Code:

capture program drop ddpowersimu

program ddpowersimu, rclass
version 17.0

    // Input parameters
    syntax, nperclust(integer)   /// observations per cluster in each period
    treat_ratio(real)  /// ratio of treated to untreated (accepted but not used below)
    clust_num(integer)   /// number of clusters
    icc(real)  /// intra-cluster correlation
    b1(real)  /// b1 (DiD effect) under the alternative hypothesis
    sd(real)  /// standard deviation of outcome (accepted but not used below)
    [ alpha(real 0.05)  /// alpha level
    ]
    
     // Gen random data
    clear
    set obs `clust_num'
    gen int clust = _n // clusters
    
    gen byte x = mod(_n, 2) // treatment: odd-numbered clusters are treated
    
    expand `nperclust'
    scalar ntotal = `nperclust'*`clust_num'
    sort clust
    
    expand 2
    
    gen t = 0  // time
    by clust, sort: gen memb_num = _n
    sort clust memb_num
    by clust: replace t = 1 if memb_num > (ntotal/`clust_num')
    
    // y variable: total variance is normalized to 1, so var(u) = icc and var(e) = 1 - icc
    scalar sd_u = sqrt(`icc')
    scalar sd_e = sqrt(1-`icc')
    by clust (memb_num), sort: gen u = rnormal(0, sd_u) if _n == 1
    by clust (memb_num): replace u = u[1]
    gen e = rnormal(0, sd_e)
    gen mu = `b1'*x*t
    gen y = mu + e + u
    
    // Fit diff in diff regression
    reg y x##t, vce(cluster clust)
    
    // Return results
    mat a=r(table)
    local p1=el(a,rownumb(a,"pvalue"),colnumb(a,"1.x#1.t"))
    return scalar pvalue = `p1'
    return scalar reject = (`p1'<`alpha')
end

Code:

simulate reject = r(reject) pvalue=r(pvalue), reps(100) seed(1234): ddpowersimu, clust_num(10) nperclust(10) b1(0.5) sd(1) icc(0.1) treat_ratio(0.5)
sum reject

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

26 Jul 2022, 10:33

The simulations are right* and your intuition is wrong. The reason is that the effect you are testing here is a within-cluster contrast, that is, a contrast at the lower level. Higher ICC does confer higher power for contrasts at that level. It decreases power for contrasts at the cluster level.

Yes, this is surprising and confusing, but that's the way it is. If it makes you feel better, I've been working with models like this for a few decades, and I still get blindsided by this from time to time.

*I mean the simulations are right in this particular respect. I have not tested the code or otherwise scrutinized it for possible errors. I'm just pointing out that the pattern of results you are getting is, in fact, what you should expect.
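
A rough sketch of why this happens, under the data-generating process in #1. The notation is assumed for illustration and is not from the thread: G clusters split evenly between treated and control, n observations per cluster in each period, ICC written as rho, and total variance fixed at 1 so that Var(u) = rho and Var(e) = 1 - rho. This is a back-of-the-envelope variance calculation, not a statement about how the cluster-robust standard errors behave with few clusters.

```latex
% Assumed notation (not from the thread): G clusters, half treated;
% n observations per cluster per period; \rho = ICC;
% Var(u_c) = \rho and Var(e_{ict}) = 1 - \rho, as in the program in #1.
\begin{align*}
y_{ict} &= b_1 x_c t + u_c + e_{ict} \\
d_c &= \bar{y}_{c1} - \bar{y}_{c0} = b_1 x_c + \bar{e}_{c1} - \bar{e}_{c0},
  \qquad \operatorname{Var}(d_c) = \frac{2(1-\rho)}{n} \\
\hat{b}_1 &= \bar{d}_{\text{treated}} - \bar{d}_{\text{control}},
  \qquad \operatorname{Var}(\hat{b}_1) = \frac{8(1-\rho)}{nG} \\
\operatorname{Var}\!\left(\bar{y}_{\text{treated},1} - \bar{y}_{\text{control},1}\right)
  &= \frac{4}{G}\left(\rho + \frac{1-\rho}{n}\right)
\end{align*}
```

The cluster effect u_c cancels in the post-minus-pre difference d_c, so the variance of the DiD estimate depends only on the within-cluster variance 1 - rho and falls as the ICC rises. The last line is a purely cluster-level contrast (treated versus control in the post period only), where u_c does not cancel and the variance rises with the ICC. With G = 10 and n = 10, the DiD variance under these assumptions is 0.072 at rho = 0.1 and 0.040 at rho = 0.5.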
Seth Henry Morgan

Join Date: Jul 2022

Posts: 2
#3

26 Jul 2022, 13:22

Thanks very much! That makes sense to me now that I think about it. Thank you for your response.