Dear Statalist,
I use Stata 16.1 MP and wish to study how an ownership change at the workplace level affects an outcome at the employee level. My dataset consists of approx. 175,000 employees in 6,000 workplaces. Employees are observed monthly (while employed) from 2015 through 2019, which gives me a total of approx. 5,000,000 observations. Both employees and workplaces have a unique ID. Ownership changes occur throughout the study period, which gives me 60 time periods and 59 treatment cohorts. I have covariates at different levels (e.g., gender, marital status, and workplace size). A staggered difference-in-differences design using csdid or jwdid (both from SSC) seems most appropriate (correct me if I'm wrong!).
Due to privacy reasons, I can't share a dataex of the original data, but below is a toy example that shows the structure, followed by the commands I ran.
Because my observations are employee-months, nested in employees who are in turn nested in workplaces, csdid stops with an error (r(451); repeated time values within panel) when I use the workplace ID as the panel identifier.
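To see where r(451) comes from, one can check which identifier pairs are unique. This is a sketch that assumes the toy dataex example below is in memory; it uses only the built-in commands isid and duplicates report.

```stata
* Employee-month pairs are unique, so isid passes silently
isid eid time

* Workplace-month pairs are not unique: several employees share a
* workplace in the same month, so csdid with ivar(wid) sees
* repeated time values within a panel
duplicates report wid time
```

In the toy data, the 28 employee-month observations fall into only 15 distinct workplace-month cells, and that duplication is what the panel estimator rejects.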
If I drop ivar() and cluster on the workplace ID instead, csdid treats the data as repeated cross-sections and runs. Will this repeated cross-section estimator still account for workplace fixed effects? (Not applicable to the example data.)
When I use the repeated cross-section option on my real data, however, estimation is very slow. Is jwdid a better estimator given my "messy" panel structure, or will I run into the same problem with estimation time? Aggregating to a coarser time unit or a higher ID level is of course an option, but I would also like to keep as much variation as possible.
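If aggregation turns out to be acceptable, a minimal sketch of collapsing to workplace-month cells is below. It assumes the toy example is in memory; averaging the covariates within cells and taking (max) of cohort (which is constant within a workplace) are illustrative choices, not a recommendation. Note that collapse replaces the data in memory, and that without weights each workplace would count equally in the estimation regardless of how many employees it has.

```stata
* Aggregate employee-months to one row per workplace and month
* (collapse replaces the employee-level data in memory)
collapse (mean) y x1 x2 x3 (max) cohort, by(wid time)

* wid and time now uniquely identify the observations
isid wid time
```

With wid-time unique, the ivar(wid) call shown below should no longer fail with r(451), at the cost of discarding the within-workplace variation.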
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(eid time wid treat owner cohort y x1 x2 x3)
1 1 1 1 1 4 9.906464 .8840705 1 19
1 2 1 1 1 4 4.7443843 .1828533 1 20
1 3 1 1 1 4 2.8065584 .1280229 1 20
1 4 1 1 2 4 3.2066376 .6478783 1 20
1 5 1 1 2 4 .3065564 .8508041 1 20
2 1 1 1 1 4 4.385781 .6063875 0 19
2 2 1 1 1 4 3.988955 .322536 0 20
2 3 1 1 1 4 .23852494 .8962572 0 20
2 4 1 1 2 4 9.442598 .7763369 0 20
2 5 1 1 2 4 1.143186 .6614004 0 20
3 1 2 1 1 2 2.96908 .8210933 1 15
3 2 2 1 2 2 6.833038 .06343113 1 15
3 3 2 1 2 2 5.724059 .7743081 1 15
3 4 2 1 2 2 1.9303285 .4983811 1 17
3 5 2 1 2 2 7.387565 .6873622 1 17
4 1 3 0 1 0 4.0051527 .4709546 0 25
4 2 3 0 1 0 1.1618755 .14672393 0 25
4 3 3 0 1 0 8.574128 .7951593 0 22
4 4 3 0 1 0 6.941094 .7541171 0 22
4 5 3 0 1 0 4.971269 .8676175 0 22
5 1 3 0 1 0 5.524818 .62491 0 25
5 2 3 0 1 0 1.9233507 .10263631 0 25
6 1 3 0 1 0 5.307955 .6798756 1 25
6 2 3 0 1 0 2.655255 .8680485 1 25
6 3 3 0 1 0 2.7890375 .8407795 1 22
7 3 2 1 2 2 9.030829 .4029677 1 15
7 4 2 1 2 2 3.479616 .6724598 1 17
7 5 3 0 1 0 1.93837 .7535322 1 22
end
label values owner owner
label def owner 1 "public", modify
label def owner 2 "private", modify
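In the toy example, the cohort (gvar) variable equals the first month in which the workplace is privately owned, and 0 for workplaces that never change owner, which is how csdid expects never-treated units to be coded. A sketch of deriving it from owner, assuming the only change is from public (1) to private (2) and that the example above is loaded; cohort_chk is a hypothetical variable name used only for this check:

```stata
* First month each workplace appears as private (owner == 2);
* cond() returns missing for public months, which min() ignores
bysort wid: egen cohort_chk = min(cond(owner == 2, time, .))

* Workplaces that are never private form the never-treated group: code as 0
replace cohort_chk = 0 if missing(cohort_chk)

* Should reproduce the cohort variable in the example
assert cohort_chk == cohort
```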
Code:
* Panel estimator with the workplace as panel unit; stops with r(451)
csdid y x1 x2 x3, ivar(wid) time(time) gvar(cohort)
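Code:
* Without ivar(), csdid treats the data as repeated cross-sections;
* standard errors are clustered by workplace
csdid y x1 x2 x3, cluster(wid) time(time) gvar(cohort)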
Hope this was clear. All suggestions are appreciated.