  • DID designs for repeated cross sections, a binary response variable, and common timing.

    Hi all,

    I am trying to figure out what my options are for difference-in-differences designs with common timing, repeated cross sections, and a binary outcome.

    I have individual patient-level repeated cross sections from 85 clinics over T = 26 months (and counting). On average, each clinic has about 75 patients per month, but there is wide variation. The two treated clinics are much larger, averaging ~900 patients per month. I am hoping to stratify by specialty (I have 4). On average, within each specialty-clinic-month cell there are about 22 patients (but among treated clinics there are close to 240 on average). The two large clinics I mentioned were treated on the exact same date, so I'm dealing with common timing. I have 6 months of post-treatment observations and 20 months of pre-treatment observations.

    I am planning to estimate LPMs with clinic and month dummies, in both a static (i.e., the T = 2 case) and a dynamic TWFE specification, but I was also hoping to estimate a nonlinear model.
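
For the static case, a minimal sketch of what the LPM recovers may help. It assumes only a treated-group dummy, a post-period dummy, and their interaction (a simplification; the specification above uses clinic and month dummies instead). In that saturated model the OLS coefficient on the interaction equals the difference in differences of the four cell proportions, and the calculation never needs the same patients observed twice. The function names and toy counts are hypothetical.

```python
from collections import defaultdict

def did_2x2(rows):
    """rows: iterable of (treated, post, y) with y in {0, 1}.

    Returns the difference in differences of the four cell means,
    which equals the interaction coefficient in a saturated LPM.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for treated, post, y in rows:
        total[(treated, post)] += y
        count[(treated, post)] += 1
    mean = {cell: total[cell] / count[cell] for cell in count}
    return (mean[(1, 1)] - mean[(1, 0)]) - (mean[(0, 1)] - mean[(0, 0)])

def make_cell(treated, post, n, successes):
    """Hypothetical helper: n patients in a cell, `successes` of them with y = 1."""
    return [(treated, post, 1)] * successes + [(treated, post, 0)] * (n - successes)

# Toy repeated cross sections: different patients in every cell.
rows = (
    make_cell(0, 0, 100, 30)    # control, pre:  30%
    + make_cell(0, 1, 100, 35)  # control, post: 35%
    + make_cell(1, 0, 200, 80)  # treated, pre:  40%
    + make_cell(1, 1, 200, 110) # treated, post: 55%
)
att_hat = did_2x2(rows)  # (0.55 - 0.40) - (0.35 - 0.30)
```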

    I am confused about a few things:

    (1) Can I apply pooled QMLE as described by Wooldridge (2023) (see link) with repeated cross sections, given that I don't have staggered timing?

    (2) If not, will a GLM with a logit or probit link function yield consistent estimates? There are two things I am concerned about and don't fully understand: the incidental parameters problem I might run into with the clinic and time dummies, and how to interpret interaction terms (computing cross-partial effects) in a nonlinear setting.

    Thank you!

    Jeffrey M Wooldridge, Simple approaches to nonlinear difference-in-differences with panel data, The Econometrics Journal, Volume 26, Issue 3, September 2023, Pages C31–C66, https://doi.org/10.1093/ectj/utad016
    Summary. I derive simple, flexible strategies for difference-in-differences settings where the nature of the response variable may warrant a nonlinear mode
    Last edited by Daniel Lipsey; 21 Nov 2024, 16:01.

  • #2
    I can help with your second question. I would probably go for the LPM in your case, then perhaps apply the trimmed estimator of Horrace and Oaxaca (2006) if many of the predicted values fall outside the [0, 1] interval.

    I would not use probit. Logit, on the other hand, has a sufficient statistic for the incidental parameters (here, the clinic fixed effects), so conditioning on it eliminates the incidental parameters problem (IPP), all else equal. This would be xtlogit, fe if my memory serves.
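
The sufficient-statistic argument can be checked numerically in the smallest possible case. This is a sketch under stated assumptions: one clinic, two observations, a single covariate, and hypothetical parameter values. Conditional on exactly one success in the pair, the probability that the success is the first observation does not depend on the clinic effect alpha, which is why conditional logit never has to estimate it.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_first_given_one_success(x1, x2, beta, alpha):
    """P(y1 = 1, y2 = 0 | y1 + y2 = 1) from the unconditional logit probabilities."""
    p1 = logistic(alpha + beta * x1)
    p2 = logistic(alpha + beta * x2)
    only_first = p1 * (1.0 - p2)
    only_second = (1.0 - p1) * p2
    return only_first / (only_first + only_second)

def closed_form(x1, x2, beta):
    """The same conditional probability written without alpha."""
    e1, e2 = math.exp(beta * x1), math.exp(beta * x2)
    return e1 / (e1 + e2)

# Hypothetical values: the conditional probability is the same for every alpha.
x1, x2, beta = 1.0, 0.0, 0.7
conditional_probs = [prob_first_given_one_success(x1, x2, beta, a) for a in (-2.0, 0.0, 3.0)]
```

Probit has no analogous sufficient statistic, so its clinic effects cannot be conditioned out this way.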

    You are correct: interaction terms in nonlinear models are very complicated. Here are two references that should help:

    Shang (2018), "Interaction terms in Poisson and log linear regression models," Bulletin of Economic Research.

    Ai and Norton (2003), "Interaction terms in logit and probit models," Economics Letters.
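
A small numerical illustration of why the interaction coefficient is hard to read in a logit. The index here is assumed to be b0 + b_treat*D + b_post*post + b_inter*D*post, with hypothetical coefficient values. Even with b_inter set to zero, the cross difference in predicted probabilities is not zero, because the logistic function is nonlinear.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_difference(b0, b_treat, b_post, b_inter):
    """Double difference of predicted probabilities across the four (D, post) cells."""
    def p(d, t):
        return logistic(b0 + b_treat * d + b_post * t + b_inter * d * t)
    return (p(1, 1) - p(1, 0)) - (p(0, 1) - p(0, 0))

# Hypothetical coefficients with NO interaction on the index scale.
no_interaction = cross_difference(b0=-1.0, b_treat=1.0, b_post=0.5, b_inter=0.0)

# Same coefficients with a positive interaction on the index scale.
with_interaction = cross_difference(b0=-1.0, b_treat=1.0, b_post=0.5, b_inter=0.4)
```

Here `no_interaction` is small but positive, so a zero interaction coefficient does not imply a zero cross difference on the probability scale.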


    • #3
      Thank you, Maxence! Will read up on all of the above.

      If anyone has thoughts about potential for pooled QMLE with repeated cross sections please let me know. Thanks again!


      • #4
        I think pooled QMLE would be suitable. Do both LPM and QMLE and compare. I doubt there will be much difference between the two.


        • #5
          Thank you, Dr. Ford!


          • #6
            Daniel: It all goes through. Common timing is a special case of staggered timing. Rather than cohort dummies Dg, you simply have one dummy indicating whether the unit is ever treated. Then you define treatment dummies for each different period, Dg*fs, where fs indicates a year during the intervention. I've recently extended the method to repeated cross sections, but I haven't written it down. I have Stata do-files that I haven't yet posted to my shared Dropbox, which is here:

            https://www.dropbox.com/scl/fo/xvuiq...zr2ride3v&dl=0

            I'll post the files later today.
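
A minimal sketch of the dummies described in this reply, for the common-timing case. It assumes months are numbered 1 to 26 with months 21 to 26 as the post-treatment window, matching the 20 pre-treatment and 6 post-treatment months in the original question. The function and variable names are hypothetical and are not taken from the do-files mentioned above.

```python
FIRST_POST_MONTH = 21  # assumption: months 1-20 are pre-treatment
LAST_MONTH = 26        # assumption: months 21-26 are post-treatment

def treatment_dummies(ever_treated, month):
    """Return {name: 0/1} for the interactions D * f_s, s = 21..26.

    D is the single ever-treated dummy; f_s is 1 when the observation
    falls in post-treatment month s.
    """
    return {
        f"D_x_f{s}": int(ever_treated == 1 and month == s)
        for s in range(FIRST_POST_MONTH, LAST_MONTH + 1)
    }

# One patient-level observation per call; patients are not tracked over time.
treated_post = treatment_dummies(ever_treated=1, month=23)
treated_pre = treatment_dummies(ever_treated=1, month=10)
control_post = treatment_dummies(ever_treated=0, month=23)
```

In a pooled logit these six interactions would presumably enter the index alongside the ever-treated dummy and the month dummies, with one coefficient per post-treatment month.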


            • #7
              Thanks so much, Professor Wooldridge!


              • #8
                Something else.
                Depending on the assumptions you make about your variables, you can use jwdid. You just need to drop the "ivar" option for the estimation.


                • #9
                  Thanks Fernando! I forgot that jwdid also does logit. One issue is that, as far as I can tell, it only reports the ATTs on the log odds, not on the probabilities themselves. I was able to reproduce what it does, and also add the partial effects on the probability. I've almost cleaned up the files and will post after a few meetings today.

                  I think we always want a way to compare the linear model with nonlinear models, and that's why I've worked on getting the ATTs in probability units. Same with an exponential function and Poisson regression. The percentage effects are nice, but we'd also like to know how robust the ATTs on the mean are to different functional forms.
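
To make the distinction between the two scales concrete, here is a sketch of an ATT in probability units computed from a fitted logit index. It assumes each treated, post-period observation has an index without the treatment term, to which the log-odds effect delta is added. All numbers and names are hypothetical; this is not the code from the do-files or from jwdid.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def att_on_probability(untreated_indices, delta):
    """Average over treated observations of P(y=1 | treated) - P(y=1 | untreated).

    untreated_indices: the logit index for each treated, post-period
    observation with the treatment term switched off.
    delta: the treatment coefficient on the log-odds scale.
    """
    effects = [logistic(xb + delta) - logistic(xb) for xb in untreated_indices]
    return sum(effects) / len(effects)

# Hypothetical treated sample and log-odds effect.
indices = [-0.3, 0.0, 0.5]
delta = 0.4
att_prob = att_on_probability(indices, delta)
```

The log-odds effect is 0.4 for every observation, but the implied change in probability differs across observations. It is this averaged probability effect that can be compared directly with an LPM coefficient.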


                  • #10
                    Yes, that is true. But that is an issue with margins, not with jwdid.
                    Depending on the Stata version and the “method” used for the analysis (logit, probit, regress, poisson, etc.), when you call for aggregations the command calls margins and requests its defaults. However, just as with margins, one can request other predictions, such as predict(xb), to obtain the effect on the linear index.
                    It is the same reason why the reported SEs are not vce(unconditional) SEs: that is not the default. But one can also request that manually.

                    In the latest version, I also added a full CRE implementation. So it is probably very close to what your FLEX paper proposes for both panel and repeated cross sections.
