Hi Statalist,
I am trying to do power calculations by simulation for a difference-in-differences design with repeated cross-sections. Treatment is assigned at the village level, so I want to generate clustered data. I have consulted the Stata FAQ and a number of Statalist posts, including this helpful answer by Clyde Schechter and this other helpful answer from Joseph Coveney. But when I implemented what I thought was a reasonable adaptation of this advice, I found that increasing the intra-cluster correlation (ICC) gave me greater power rather than less. That doesn't seem right to me. The first code block below is the program that generates the data and fits the model; the second uses simulate with this program to estimate power. When I run the program just once, it is clear that I get higher t-statistics with a higher ICC, which is consistent with higher power but is not what I expected. Can anyone see whether I made a mistake in the way I generated the data? Thanks a lot!
Code:
capture program drop ddpowersimu
program ddpowersimu, rclass
    version 17.0

    // Input parameters
    syntax, nperclust(integer)  /// sample size
            treat_ratio(real)   /// ratio of treated to untreated
            clust_num(integer)  /// number of clusters
            icc(real)           /// set intra-cluster correlation
            b1(real)            /// b1 under the alternative hypothesis
            sd(real)            /// standard deviation of outcome
            [ alpha(real 0.05)  /// set alpha level
            ]

    // Gen random data
    clear
    set obs `clust_num'
    gen int clust = _n          // clusters
    gen byte x = mod(_n, 2)     // treatment
    expand `nperclust'
    scalar ntotal = `nperclust'*`clust_num'
    sort clust
    expand 2
    gen t = 0                   // time
    by clust, sort: gen memb_num = _n
    sort clust memb_num
    by clust: replace t = 1 if memb_num > (ntotal/`clust_num')

    // y variable
    scalar sd_u = sqrt(`icc')
    scalar sd_e = sqrt(1-`icc')
    by clust (memb_num), sort: gen u = rnormal(0, sd_u) if _n == 1
    by clust (memb_num): replace u = u[1]
    gen e = rnormal(0, sd_e)
    gen mu = `b1'*x*t
    gen y = mu + e + u

    // Fit diff in diff regression
    reg y x##t, vce(cluster clust)

    // Return results
    mat a=r(table)
    local p1=el(a,rownumb(a,"pvalue"),colnumb(a,"1.x#1.t"))
    return scalar pvalue = `p1'
    return scalar reject = (`p1'<`alpha')
end
Code:
simulate reject = r(reject) pvalue=r(pvalue), reps(100) seed(1234): ///
    ddpowersimu, clust_num(10) nperclust(10) b1(0.5) sd(1) icc(0.1) treat_ratio(0.5)
sum reject
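For reference, the comparison across ICC values can be run along the lines of the sketch below. It assumes the ddpowersimu program above has already been defined and that the code is run from a do-file (because of the /// line continuations). The ICC grid (0.1, 0.3, 0.5) and the loop macro name rho are only illustrative; all other settings are the ones from my simulate call above.

```stata
// Illustrative sketch: re-run the power simulation over a few ICC values.
// Assumes ddpowersimu (defined above) is already in memory.
// The ICC grid and the local macro name rho are illustrative choices.
foreach rho in 0.1 0.3 0.5 {
    display as text "ICC = `rho'"
    simulate reject = r(reject) pvalue = r(pvalue), reps(100) seed(1234): ///
        ddpowersimu, clust_num(10) nperclust(10) b1(0.5) sd(1)            ///
        icc(`rho') treat_ratio(0.5)
    summarize reject    // mean of reject = estimated power at this ICC
}
```

Each pass replaces the data in memory with the simulation results, so the mean of reject reported by summarize is the estimated power for that ICC value.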