  • Matching on pre-treatment outcomes and Difference in differences

    Hello, I have a highly unbalanced panel with t=5 years and n=20k firms. I want to estimate the impact of a training program on firm performance.
    I am using the following DiD specification, based on Callaway and Sant'Anna, to account for treatment effect heterogeneity.

    Code:
    csdid performance i.x1 i.x2, ivar(firm) time(yr) gvar(g_yr_first_post_trai) notyet long2 method(drimp)
    The problem is that the parallel trends assumption is unlikely to hold in my case, because firms tend to select into treatment if they performed poorly the last time they were observed.

    I was thinking of matching on lagged outcomes before running the DiD estimation. Would this be a reasonable approach?

    Ideally, for reasons related to the data-generating process, I would like to match on the most recently observed pre-treatment outcome, whether it was observed 1, 2, or 3 years earlier; the size of the gap does not matter in my case.


    Do you know how I could correctly implement this?

    Thank you very much for your support.

    Best

  • #2
    HTML Code:
    https://drive.google.com/file/d/14yZqtkVCw6Xp_c3yESI-LRz-ho7hk2ig/view



    • #3
      Dear George, yes, I have read this paper, but my sense is that the literature is divided and that there can be situations where matching on pre-treatment outcomes reduces bias, e.g. https://arxiv.org/pdf/2205.08644. Do you really think this is something to be discarded? If it is a viable option, does anyone know how I could implement it?
      Last edited by Tom Ford; 05 Dec 2024, 03:57.



      • #4
        To clarify, I am asking for practical suggestions. I want to match on the pre-treatment outcome, but how can I do it? Should I create a variable 't*_dv' for each year in my panel that takes the value of the outcome in that year, set all post-treatment values to missing for treated units, and then match on those variables before running my main csdid regression?

        Thanks for clarifying this.



        • #5
          You can use egen to create a variable that holds one year's outcome on every row of a unit:

          egen y2011 = mean(cond(year==2011, y, .)), by(cross_section)

          Or match in year==2011 only, then use egen's max() function to spread the result to the unit's remaining observations.
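
          For the case in #1, where the outcome to match on is whatever was last observed before treatment rather than a fixed year, the same egen/cond idea might be adapted as below. This is only a sketch: it borrows the variable names from #1 (firm, yr, performance, g_yr_first_post_trai), assumes the gvar is coded 0 for never-treated firms, and introduces pre, last_pre_yr, and y_pre as hypothetical names.

          Code:
          * flag firm-years observed before treatment with a nonmissing outcome
          * (assumption: g_yr_first_post_trai is 0 for never-treated firms)
          gen byte pre = (g_yr_first_post_trai > 0) & !missing(g_yr_first_post_trai) ///
              & (yr < g_yr_first_post_trai) & !missing(performance)

          * latest such year within each firm
          egen last_pre_yr = max(cond(pre, yr, .)), by(firm)

          * outcome in that year, copied to every row of the firm
          egen y_pre = mean(cond(yr == last_pre_yr, performance, .)), by(firm)

          With this construction y_pre stays missing for never-treated firms and for treated firms that have no pre-treatment observation, so those cases need a separate decision before any matching step.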



          • #6
            Dear George, thank you, this is really helpful! But I am not entirely sure what I should do next. Say I create y2011, y2012, y2013, y2014. How do I do the matching and then the estimation? Would something like this be correct?

            Code:
            * estimate probit
            probit post_trai  i.prod i.year i.y2012 i.y2013 i.y2014
            predict pscore1
            
            * calculate weights (psmatch2 stores them in _weight)
            gen double wpscore = .
            levelsof yr, local(cluster)
            foreach j of local cluster {
                psmatch2 treatment if yr==`j', pscore(pscore1) outcome(y) n(20)
                replace wpscore = _weight if yr==`j'
            }
            
            * use weights in my DiD regression
            reghdfe y  L_ttt_182_* F_ttt_182_* [pweight=wpscore], absorb(prod yr) cluster(prod)
            Thank you in advance for your support on this matter



            • #7
              These are the sorts of things I usually need to play with until I figure out specifically how to do it (depends on the data organization).

              Here are some thoughts.

              I think csdid computes the effects based on the last pre-treatment period. So that would be your pre-treatment Y year. csdid also uses matching (or can), and I think matches on Xs. If so, then I guess you could include the pre-treatment Y as an X.

              Or maybe switch to jwdid (Mundlak regression). You could use ematch (entropy matching) on the mean of pre-treatment y (jwdid uses all pre-treatment values as the "base" of long differences, so the egen/cond would cover all pre-treatment periods). ematch generates a weight, and you could use that weight in jwdid.
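
              As a rough sketch of the first idea, the csdid call from #1 could take a firm-level pre-treatment outcome as an extra covariate. y_pre is a hypothetical name for that variable; it is assumed to already exist and to be constant within firm. Whether conditioning on it is appropriate is the open question raised in #2 and #3.

              Code:
              * same call as in #1, with the pre-treatment outcome added as a covariate
              * (y_pre is a hypothetical, firm-constant variable)
              csdid performance i.x1 i.x2 y_pre, ivar(firm) time(yr) gvar(g_yr_first_post_trai) notyet long2 method(drimp)

              * event-study aggregation, to inspect the pre-treatment estimates
              estat event

              Firms with a missing y_pre would most likely be excluded from the estimation, so it is worth comparing the sample size with the original run.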



              • #8
                Thank you George, this is helpful! I will play around with this a bit and see what I get. What I am unsure about with matching and entropy balancing is how to handle never-treated units. Obviously, they have no pre-treatment outcome to match on, so I am not sure what would make sense. Simply use their lagged rating? Or do this by year? In any case, thanks a lot. I will continue to look into this!



                • #9
                  Simplest is to match on observations prior to any treatment.
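
                  One way this could look, as a sketch only: take the calendar years before the first cohort is treated and build a firm-level baseline outcome from them, which is then defined for never-treated firms too. Variable names are borrowed from #1; first_g, y_base, ever_treated, and one_per_firm are hypothetical names, and the gvar is assumed to be 0 (or missing) for never-treated firms.

                  Code:
                  * first year in which any firm is treated (summarize skips missing gvar values)
                  summarize g_yr_first_post_trai if g_yr_first_post_trai > 0, meanonly
                  local first_g = r(min)

                  * firm-level mean outcome over the years before anyone is treated
                  egen y_base = mean(cond(yr < `first_g', performance, .)), by(firm)

                  * ever-treated indicator to match or balance on
                  gen byte ever_treated = g_yr_first_post_trai > 0 & !missing(g_yr_first_post_trai)

                  * how many firms have a usable baseline, by treatment status
                  egen byte one_per_firm = tag(firm)
                  tabulate ever_treated if one_per_firm & !missing(y_base)

                  In a short, unbalanced panel many firms may have no observation in those early years, so the final tabulation is there to show how much of the sample survives before y_base is passed to any matching or balancing command.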
