  • Staggered DID with high dimensional data

    Dear Statalist,

    I use Stata 16.1 MP and wish to study how an ownership change at the workplace level affects an outcome at the employee level. My dataset consists of approx. 175,000 employees in 6,000 workplaces. Employees are observed monthly (conditional on their employment) from 2015 through 2019, which gives me a total of approx. 5,000,000 observations. Both employees and workplaces have a unique ID. Ownership changes occur throughout the study period, which gives me 60 time periods and 59 treatment cohorts. I have covariates at different levels (e.g. gender, marital status, and workplace size). A staggered diff-in-diff design and the commands csdid or jwdid (both SSC) seem most appropriate (correct me if I'm wrong!).

    Due to privacy reasons, I can't share a dataex of the original data with you, but here is a toy example to get an idea of the structure:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(eid time wid treat owner cohort y x1 x2 x3)
    1 1 1 1 1 4  9.906464  .8840705 1 19
    1 2 1 1 1 4 4.7443843  .1828533 1 20
    1 3 1 1 1 4 2.8065584  .1280229 1 20
    1 4 1 1 2 4 3.2066376  .6478783 1 20
    1 5 1 1 2 4  .3065564  .8508041 1 20
    2 1 1 1 1 4  4.385781  .6063875 0 19
    2 2 1 1 1 4  3.988955   .322536 0 20
    2 3 1 1 1 4 .23852494  .8962572 0 20
    2 4 1 1 2 4  9.442598  .7763369 0 20
    2 5 1 1 2 4  1.143186  .6614004 0 20
    3 1 2 1 1 2   2.96908  .8210933 1 15
    3 2 2 1 2 2  6.833038 .06343113 1 15
    3 3 2 1 2 2  5.724059  .7743081 1 15
    3 4 2 1 2 2 1.9303285  .4983811 1 17
    3 5 2 1 2 2  7.387565  .6873622 1 17
    4 1 3 0 1 0 4.0051527  .4709546 0 25
    4 2 3 0 1 0 1.1618755 .14672393 0 25
    4 3 3 0 1 0  8.574128  .7951593 0 22
    4 4 3 0 1 0  6.941094  .7541171 0 22
    4 5 3 0 1 0  4.971269  .8676175 0 22
    5 1 3 0 1 0  5.524818    .62491 0 25
    5 2 3 0 1 0 1.9233507 .10263631 0 25
    6 1 3 0 1 0  5.307955  .6798756 1 25
    6 2 3 0 1 0  2.655255  .8680485 1 25
    6 3 3 0 1 0 2.7890375  .8407795 1 22
    7 3 2 1 2 2  9.030829  .4029677 1 15
    7 4 2 1 2 2  3.479616  .6724598 1 17
    7 5 3 0 1 0   1.93837  .7535322 1 22
    end
    label values owner owner
    label def owner 1 "public", modify
    label def owner 2 "private", modify


    Since I have observations nested in employees nested in workplaces, I get an error message (r(451); repeated time values within panel) when trying to estimate with csdid using workplace ID as the panel identifier.

    Code:
    csdid  y x1 x2 x3, ivar(wid) time(time) gvar(cohort)
    Will the repeated cross-section estimator still apply workplace FE? (not applicable to the example data)

    Code:
    csdid  y x1 x2 x3, cluster(wid) time(time) gvar(cohort)
    When I apply the repeated cross-section option on my real data, estimation is very slow. Is jwdid a better estimator considering my "messy" panel structure? Or will I run into the same problem with regard to estimation time? Aggregation to a higher time or ID unit is of course an option, but I also wish to keep as much variation as possible.

    Hope this was clear. All suggestions are appreciated.

  • #2
    This wasn't very clear, but the "repeated time values" error simply means you've misspecified your panel setup. You need to use the lowest level of aggregation for the panel ID and time (here, the employee level).
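
    As a rough sketch on the toy data above (not run on the real data, and assuming the variable names from the dataex example), a check along these lines shows which ID pairs uniquely with time before any estimation:

    Code:
    * several employees share each workplace-month, so (wid, time) has surplus copies
    duplicates report wid time
    * exits silently if (eid, time) uniquely identifies the rows, errors otherwise
    isid eid time
    * declare the employee-level panel; this is where r(451) would surface for a bad ID
    xtset eid time
    The point is only to locate the level at which the panel is well defined; it says nothing about which fixed effects the estimator then uses.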


    • #3
      Apologies for being unclear!

      I do not want employee fixed effects, which I assume follows from choosing employee ID as the panel identifier in csdid (correct me if I'm wrong). So unless I do some sort of aggregation, I guess panel data estimators are out of the question. But the issue with slow estimation due to the high number of time periods and cohorts still remains.
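
      For reference, the aggregation I have in mind would look roughly like this on the toy data. This is only a sketch: n_emp is a made-up name, averaging the covariates within workplace-month is an assumption about how to summarise them, and it relies on cohort being constant within workplace, as it is in the example.

      Code:
      * collapse to one row per workplace-month
      * n_emp counts the employees behind each cell (hypothetical variable name)
      collapse (mean) y x1 x2 x3 (firstnm) cohort (count) n_emp=eid, by(wid time)
      * confirm that (wid, time) now uniquely identifies the rows
      isid wid time
      xtset wid time
      * the call that returned r(451) could then be retried on the real data:
      * csdid y x1 x2 x3, ivar(wid) time(time) gvar(cohort)
      This removes the repeated time values, but at the cost of the within-workplace variation I would like to keep.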
