  • Propensity Score Matching (PSM) + Difference-in-Difference (DID) regression with control variables

    Hi there,

    I have two-period balanced panel data (200 individuals in both periods), with which I have to estimate the effect of binary treatment "treated" on outcome variable y (I also have two continuous IVs: x1 and x2).


    I need to implement PSM 3 nearest neighbor matching (I do this with -psmatch2-), and thereafter perform a DID regression with the conditioning variables used to estimate the propensity score included as control variables in this regression.


    I see many people simply using the weights constructed by -psmatch2- in the regression. However, this does not take into account which treated id is matched with which three control ids.


    My questions are:

    1) Am I correct in thinking that just using the weights in the DID regression is not enough?

    2) How do I solve this problem? As stated before, x1 and x2 need to be added as controls in the DID regression as well.

    3) Does the method used to solve this change when kernel matching is used?



    Example Dataset
    Code:
    /* install -xfill- by typing net from https://www.sealedenvelope.com/ and clicking on the name */
    use https://www.stata-press.com/data/r17/parallelt, clear
    keep if inlist(t1,5,6)
    rename id1 id 
    rename t1 t 
    rename y1 y 
    replace t = t-5
    generate treated = treated1 if t==1
    xfill treated, i(id)
    order id treated t y x1 x2
    keep id-x2


    Example of what I mean by just using weights in the regression
    Code:
    psmatch2 treated x1 x2 if t==0, n(3) caliper(0.1) common 
    xfill _weight, i(id)
    regress y i.treated##i.t x1 x2 [aw=_weight], robust cluster(id)

  • #2
    I have the same concern; I hope someone can help!



    • #3
      Using the weights is enough for estimating the average treatment effect (ATE) or the average treatment effect on the treated (ATT). Suppose subjects in set A are matched with those in set B: A1 is matched with B1, A2 with B2, and so on. You may compute the treatment effect for each pair of subjects (Y_A1 - Y_B1, Y_A2 - Y_B2, ...) and then average them to get the ATE, or, equivalently, compute the average of Y_A and the average of Y_B and take the difference. The latter does not require knowing who is matched with whom. In other words, the last three lines of code in #1 are appropriate.
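
      To spell out why the two calculations agree, here is a short derivation, written for the ATT. The notation is an assumption introduced for illustration and does not come from the thread: T is the set of N_T matched treated units, C is the set of controls, and w_ij is the weight control j receives as a match for treated unit i (for example 1/3 for each of three nearest neighbours and 0 otherwise), so the weights sum to one over j for every i.

      ```latex
      \begin{align*}
      \widehat{\mathrm{ATT}}
        &= \frac{1}{N_T}\sum_{i\in T}\Bigl(Y_i - \sum_{j\in C} w_{ij}\,Y_j\Bigr) \\
        &= \frac{1}{N_T}\sum_{i\in T} Y_i \;-\; \frac{1}{N_T}\sum_{j\in C} W_j\,Y_j,
        \qquad W_j = \sum_{i\in T} w_{ij}.
      \end{align*}
      ```

      Because the w_ij sum to one for each treated unit, the W_j sum to N_T, so the second term is a weighted mean of the control outcomes. Only the per-control total W_j enters, not the identity of the matched pairs. If _weight from -psmatch2- is read as W_j for controls (and 1 for treated units), the weighted regression uses exactly this information; it is worth confirming that reading in the -psmatch2- help file for the options you use.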



      • #4
        Fei Wang, can I ask you a favor?
        I have panel data, and I want to do PSM and then run a DID regression. But I am confused about the following:
        - Which year can I use for matching (pre- or post-treatment)?
        - After matching, can I drop the unmatched observations?
        - For the matching method, which test can I use to check it?
        Thank you.



        • #5
          Originally posted by CAO DUC SON
          Fei Wang, can I ask you a favor?
          I have panel data, and I want to do PSM and then run a DID regression. But I am confused about the following:
          - Which year can I use for matching (pre- or post-treatment)?
          - After matching, can I drop the unmatched observations?
          - For the matching method, which test can I use to check it?
          Thank you.

          1. Only pre-treatment years can be used for matching.

          2. You don't need to manually drop unmatched observations. If you match with -psmatch2- (from SSC), it automatically assigns zero weight to unmatched obs, and what you need to do is simply a DiD regression with weights.

          3. You need to check if pre-treatment characteristics are sufficiently similar between treatment and control groups (balancing test).
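
          For point 3, one possible way to run a balancing test is -pstest-, which is installed together with -psmatch2-. The sketch below assumes the example data and variable names from #1 (treated, x1, x2, t); it is an illustration under those assumptions, not a definitive recipe.

          ```stata
          * Sketch only: assumes -psmatch2- is installed (ssc install psmatch2)
          * and the example dataset from #1 is in memory.
          psmatch2 treated x1 x2 if t==0, n(3) caliper(0.1) common

          * Compare covariate means and standardized bias between the groups,
          * before matching (unmatched) and after matching (matched).
          pstest x1 x2 if t==0, both
          ```

          Small standardized differences in x1 and x2 after matching suggest that the matched groups are comparable on the pre-treatment characteristics.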



          • #6
            Thank you, Fei Wang. It is clearer to me now.
            Can I keep the panel data and run -psmatch2- on it? Or should I match in the pre-treatment data and then append it to the post-treatment data?



            • #7
              Originally posted by CAO DUC SON
              Thank you, Fei Wang. It is clearer to me now.
              Can I keep the panel data and run -psmatch2- on it? Or should I match in the pre-treatment data and then append it to the post-treatment data?

              You may refer to the last three lines of code in #1. The OP matches the treatment and control groups on first-period characteristics (the first period is also the pre-treatment period), passes the weights obtained from -psmatch2- on to the other period, and finally runs the DiD with weights. So yes, you may keep the panel data and run -psmatch2-, but run it only on the pre-treatment period.

              Things are more complicated if you have multiple pre-treatment periods. One way is to run -psmatch2- on the cross-sectional version of the panel data (after -reshape wide-). Here is a simple example. Suppose you have balanced panel data with 10 periods (t = 1, 2, ..., 10), where d = 1 indicates the treatment group and d = 0 the control group, and treatment occurs at t = 5 (so the pre-treatment periods are t = 1, 2, 3, and 4). Reshape the panel data wide to a cross section, where x and z are time-varying variables whose pre-treatment values are used for matching.

              Code:
              * list every variable that varies within id (here also y and p)
              reshape wide y x z p, i(id) j(t)
              Then run -psmatch2- on the cross-sectional data, where x1-x4 are the values of x at t = 1, ..., 4, and z1-z4 are the values of z at t = 1, ..., 4. Variable w is a time-invariant variable used for matching.

              Code:
              psmatch2 d x1 x2 x3 x4 z1 z2 z3 z4 w, ties
              After that you will obtain a variable _weight. Then reshape the data back to long (panel) form.

              Code:
              reshape long y x z p, i(id) j(t)
              You'll find that each panel id has the same weight in all periods. Some ids have zero or missing weights, and they are dropped automatically in the DiD regression. Finally, run the DiD as below (d is the treatment indicator and p is the pre-post indicator).

              Code:
              xtset id
              xtreg y c.d#c.p covariates i.t [aw=_weight], fe vce(cluster id)
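
              A quick sanity check can be run after reshaping back to long form and before the regression. It assumes the variable names used above (id, t, _weight) and is only a sketch: the first command stops with an error if any id has weights that differ across periods, and the second counts the rows that will not contribute to the weighted regression.

              ```stata
              * Sketch only: run after -reshape long-, before the DiD regression.
              * Assert that _weight is constant within each panel id.
              bysort id (t): assert _weight == _weight[1]

              * Count observations with zero or missing weight.
              count if missing(_weight) | _weight == 0
              ```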



              • #8
                Dear Fei Wang,
                Thank you for your enthusiasm. It is clear to me now.



                • #9
                  Fei Wang
                  Kindly assist.
                  I have a two-period panel data set (t = 2). I want to run PSM (kernel and nearest neighbour) and then DID with a binary outcome.
                  1. Do I match using the pre-treatment period? Here is the command I use:
                  psmatch2 Treatm age marstatus educ houssize child income2a if post==0, out(HWT_effctv) kernel kerneltype(normal) ate common
                  2. After matching, I run the DID with a logistic regression, as my outcome variable is binary. Is this correct?
                  logit HWT_effctv post##_treated [aw=_weight], noomitted vce(cluster Id)
                  3. It seems analytic weights are not allowed. How should I proceed?



                  • #10
                    It is an interesting read. It seems simple when we have two time periods. But what happens when there are three time periods, in the sense that some counties get treated only in the third period (there are, of course, other counties that get treated in the second period)? How should this be handled?



                    • #11
                      Fei Wang, can you explain a bit more about the w variable in your code?
