  • Regression after propensity score matching

    Hi everyone. I need your help with the following.
    I'm running a difference-in-differences analysis, and the first stage is to match each observation in the treated group with one observation in the control group by nearest-neighbour propensity score.
    For simplicity, I use the following sample data:

    use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

    (a treatment indicator t, covariates x1 and x2, and an outcome y)


    Then, I use psmatch2 for propensity score matching:

    psmatch2 t x1 x2, out(y) logit

    Now I have a new id (generated by Stata as _id) for each treated observation, and the id of its matched control observation. After dropping the control observations that are not matched with any treated observation, I have a new sample.
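The matching and trimming step described above could be sketched as follows. This is only an illustration on simulated data (the seed, sample size, and coefficients are made up), and it assumes psmatch2 has been installed from SSC. It relies on psmatch2 leaving _weight missing for control observations that were never used as a match.

```stata
* Minimal sketch on simulated data; assumes: ssc install psmatch2
clear
set seed 1
set obs 300
gen double x1 = rnormal()
gen double x2 = rnormal()
* Hypothetical treatment assignment that depends on the covariates
gen byte t = runiform() < invlogit(-1 + 0.5*x1 + 0.5*x2)
gen double y = 1 + x1 + x2 + 2*t + rnormal()

* One-to-one nearest-neighbour matching on a logit propensity score
psmatch2 t x1 x2, outcome(y) logit

* Keep treated observations and the controls actually used as matches
keep if !missing(_weight)
tab t
```

Because psmatch2 matches with replacement by default, the retained sample can contain fewer controls than treated observations.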

    Next, I want to run a regression to test the effect of the treatment, and I want the variable t (1 for treated, 0 for control) to capture the difference between treated and control within each pair formed by the propensity score matching. I got confused at this stage, because if I simply run:

    reg y t x1 x2

    then what t captures is the average difference between the whole treated group and the whole control group, instead of the difference for each pair.

    Can you please suggest how I can solve this.

    Thank you so much

  • #2
    So, you need the variable that identifies each pair. I haven't used -psmatch2- in a long time and I don't remember how you get that variable. But let's assume you have it and it's called pair_id. You also presumably have a subject_id for each person (or firm, or whatever they are). Then you have to account for the pairing as follows:

    Code:
    mixed y t x1 x2 || pair_id: || subject_id:
    BUT, there is another problem. You said you want to do a difference in differences analysis. That regression equation doesn't do that. So you also need another variable that indicates pre- and post- onset of treatment status, call it pre_post. Then what you want is:

    Code:
    mixed y i.t##i.pre_post x1 x2 || pair_id: || subject_id:
    The DID estimator of the treatment effect will be the coefficient of 1.t#1.pre_post. The best way to understand the results, though, is to look at predicted outcomes in each group both before and after the onset of treatment. -margins t#pre_post- will give you that.
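To make the model above concrete, here is a minimal sketch on simulated data. The data-generating values (including a true interaction effect of 2) are invented for illustration; only the -mixed- and -margins- calls come from the reply.

```stata
* Simulated matched pairs observed before and after treatment onset
clear
set seed 12345
set obs 200
gen long pair_id = ceil(_n/2)        // two subjects per pair
gen long subject_id = _n
gen byte t = mod(_n, 2)              // one treated, one control per pair
gen double u_pair = rnormal()
bysort pair_id: replace u_pair = u_pair[1]   // shared pair-level effect
gen double u_subj = rnormal(0, 0.5)  // subject-level effect
gen double x1 = rnormal()
gen double x2 = rnormal()
expand 2                             // two periods per subject
bysort subject_id: gen byte pre_post = _n - 1
gen double y = 1 + 0.5*x1 - 0.3*x2 + 0.2*t + 0.4*pre_post ///
    + 2*t*pre_post + u_pair + u_subj + rnormal()

mixed y i.t##i.pre_post x1 x2 || pair_id: || subject_id:
lincom 1.t#1.pre_post                // the DID estimate
margins t#pre_post                   // predicted outcome in each cell
```

The -lincom- line simply re-displays the interaction coefficient with its confidence interval; the -margins- table gives the four cell means from which the same difference in differences can be computed by hand.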





    • #3
      Thank you so much, Clyde.
      I forgot to include the variable for pre- and post-treatment. Sorry about that.
      I just want to clarify that I do not have the pair id.
      -psmatch2- provides _id, which is the id number for all observations (both treated and control), and next to the column _id is the column _n1, which contains the id number of the observation matched with this observation. Since we match each observation in the treated group with one observation in the control group, the value of _n1 for observations in the control group is missing.
      For example:
      firm A (treated), with _id=668, is paired with firm B (control), with _id=48.
      Therefore, firm A has _n1=48 while firm B has _n1=.

      In this case what should I do?
      I'm thinking that I should create a variable that specifies the pair id, so that firms A and B, being paired with each other, will have the same pair id. However, I'm still struggling with that. Can you give me some hints?
      Thank you



      • #4
        So this should do it:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(_id _n1)
        32 31
        28 27
        17 18
         2  1
        18 17
        25 26
         9 10
        12 11
        15 16
         5  6
        21 22
        29 30
        19 20
        23 24
        37 38
        36 35
        13 14
        35 36
        33 34
        39 40
        27 28
        34 33
        26 25
        20 19
        38 37
        10  9
        16 15
        31 32
        30 29
        40 39
         7  8
        24 23
         8  7
         4  3
         1  2
        11 12
         6  5
         3  4
        14 13
        22 21
        end
        
        gen temp1 = min(_id, _n1)
        gen temp2 = max(_id, _n1)
        by temp1 temp2, sort: assert _N == 2
        by temp1 temp2: gen pair_id = (_n == 1)
        replace pair_id = sum(pair_id)
        drop temp1 temp2
        Note: In the toy data in that example, the pairings are 1 with 2 and 2 with 1, 3 with 4 and 4 with 3, etc. But the code in no way relies on that and will work generally.



        • #5
          Thank you so much

          I tried your code and it works smoothly if each control is matched with only one treated observation.
          However, in my match, firm B (control) could be matched with both firm A and firm C (treated).
          For example, in the toy data, suppose 11 is matched with 12 and 22 is also matched with 12.
          In this case there is an error:

          by temp1 temp2, sort: assert _N == 2
          522 contradictions in 522 by-groups
          assertion is false
          r(9);


          How should I solve this?
          thank you



          • #6
            This gets a bit more complicated and I don't want to try to write code based on imaginary data. Please use -dataex- to post a small representative sample of your data. I only need the _id and _n1 variables.



            • #7

              Sorry for the inconvenience. Below is the data sample.

              Thank you.
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(_id _n1)
              25 78
              46 78
              95 86
              34 89
              12 92
              26 41
              78  .
              86  .
              89  .
              92  .
              41  .
              32 51
              51  .
              end
              Last edited by Mia Pham; 07 Oct 2016, 18:01.



              • #8
                So it appears in your data that _n1 is sometimes missing, but that values of _n1 can be linked to more than one value of _id. By contrast, _id is never missing, and no value of _id is ever duplicated. Relying on this assumption being true throughout your data:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int(_id _n1)
                25 78
                46 78
                95 86
                34 89
                12 92
                26 41
                78  .
                86  .
                89  .
                92  .
                41  .
                32 51
                51  .
                end
                
                isid _id //    VERIFY ASSUMPTION
                
                drop if missing(_n1)
                by _n1 (_id), sort: gen _j = _n
                reshape wide _id, i(_n1) j(_j)
                isid _n1
                gen long tuple_id = _n
                rename _n1 _id0
                reshape long _id, i(tuple_id) j(_j)
                drop if missing(_id)
                drop _j
                sort _id
                order _id, first
                should do it.



                • #9
                  I'm sorry for confusing you. Let me try to clarify it.
                  _id is unique and never missing.
                  For each observation with treat=1, we need to find an _id with treat=0 to match with it.
                  Observations with treat=0, however, form the control group, and we don't need to find their pairs.
                  Therefore _n1 is missing for observations with treat=0.
                  For example, in the first line, _id=12 (treat=1) is matched with _id=92, so _n1 is 92.
                  If you look at the 12th line, you can see that _id=92 has treat=0 and _n1=.


                  In your code, observations with missing _n1 are dropped, but I need to keep them in my sample so that, for example, if _id 12 has pairid 123, then _id 92 also has pairid 123.

                  Thank you so much

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input byte(_id _n1 treat)
                  12 92 1
                  25 78 1
                  26 41 1
                  32 51 1
                  34 89 1
                  41  . 0
                  46 78 1
                  51  . 0
                  78  . 0
                  86  . 0
                  89  . 0
                  92  . 0
                  95 86 1
                  end



                  • #10
                    In your code, observations with missing _n1 are dropped, but I need to keep them in my sample so that, for example, if _id 12 has pairid 123, then _id 92 also has pairid 123.
                    But all the _id's are kept in the sample. Look carefully at the output it produces. Each value that appears in _n1, except the missing values, also appears as a value of _id (but associated with a missing value of _n1). That is preserved in my code. At the end of the code, you have a list of all _id numbers and an associated tuple-id (not pair, since sometimes there are multiple matches). To follow on your example, if you look at the output my code generates, _id 12 and _id 92 both appear there, and both have tuple_id 6 associated with them.

                    This is the layout you will need for your analysis. The _n1's serve no purpose when you get to the analysis. The data must have a variable (tuple_id) which distinguishes the various matched pairs (and triples and higher order tuples) for grouping purposes. This code creates it. And no _id is left out. (Well, an _id would be left out if it is never matched to any other, but then you don't have a matched pair or tuple for it to participate in.)
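One way to bring tuple_id back into the full analysis data set is sketched below, using the sample posted in #9. The reshape steps are the ones from #8; the tempfile names (full, xwalk) and the final -merge- are illustrative additions, and the block is meant to be run as a single do-file so the local macros stay defined.

```stata
clear
input int(_id _n1) byte(treat)
12 92 1
25 78 1
26 41 1
32 51 1
34 89 1
41  . 0
46 78 1
51  . 0
78  . 0
86  . 0
89  . 0
92  . 0
95 86 1
end
isid _id
tempfile full
save `full'

* Build an _id-to-tuple_id crosswalk with the code from #8
keep _id _n1
drop if missing(_n1)
by _n1 (_id), sort: gen _j = _n
reshape wide _id, i(_n1) j(_j)
gen long tuple_id = _n
rename _n1 _id0
reshape long _id, i(tuple_id) j(_j)
drop if missing(_id)
drop _j
tempfile xwalk
save `xwalk'

* Attach tuple_id to every observation that takes part in a match
use `full', clear
merge 1:1 _id using `xwalk', keep(match) nogenerate
sort tuple_id treat
list, sepby(tuple_id)
```

With keep(match), any _id that never appears in a matched tuple is dropped, which mirrors the caveat in the reply above.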



                    • #11
                      Thank you for your helpful explanation. I got it now. Thank you so much.
                      Have a nice weekend.



                      • #12
                        First of all, many thanks to Clyde for the responses. This has moved my analysis forward quite a bit!

                        my question
                        Do these mixed effects models make use of the paired structure of the data, which was created through matching? I.e., are the results similar to those using matched-pair differences ($Y_{dj} = Y_{1j} - Y_{2j}$ and $X_{dj} = X_{1j} - X_{2j}$, with $j$ identifying the pair)?

                        background
                        I am looking at the effects of a rice farming method called SRI. To do so I am using a data set of observational data on smallholder farms, with each observation representing one farm. I have already identified factors influencing the adoption of SRI using a logit model and used psmatch2 to match households according to those variables.
                        I used the mahalanobis option to match households that are actually similar with regard to the matching variables, in order to generate a fully blocked randomized pseudo-experiment. psmatch2 reports average treatment effects. However, I also want to report coefficients for covariates to show how this method affects different kinds of households differently (e.g. those using mechanization, those hiring external labour), following a suggestion by Rubin (1979) on combining matching with regression adjustment.

                        something (potentially) useful from my side:
                        I have already calculated a pair_id (called block) and I am including my code below. As it is a bit simpler and does not require reshaping, it might be useful for people less experienced with Stata and for those working with large data sets that include too many variables to be reshaped.

                        *generate blocks from output
                        gen block=.
                        replace block =_id if _treated==0
                        replace block= _n1 if _treated==1
                        replace block=. if _weight==.

                        *check blocks
                        sort block
                        browse SRI _treated _id _n1 block _weight
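As a rough check of this block idea, the same logic can be applied to the sample data posted in #9, where the treatment indicator is called treat rather than _treated and there is no _weight variable. The counting variables n_treated and n_control are hypothetical names added for verification only.

```stata
clear
input int(_id _n1) byte(treat)
12 92 1
25 78 1
26 41 1
32 51 1
34 89 1
41  . 0
46 78 1
51  . 0
78  . 0
86  . 0
89  . 0
92  . 0
95 86 1
end

* A control's own _id labels the block; treated observations inherit it via _n1
gen int block = cond(treat == 1, _n1, _id)

* Each block should hold exactly one control and at least one treated observation
bysort block: egen byte n_treated = total(treat == 1)
bysort block: egen byte n_control = total(treat == 0)
assert n_control == 1
drop if n_treated == 0    // controls never used as a match (none in this sample)
list _id _n1 treat block, sepby(block)
```

Here the -drop- line plays the role that the _weight condition plays in the post above: it removes controls that were not matched to anything.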



                        • #13
                          my question
                          Do these mixed effects models make use of the paired structure of the data, which was created through matching? I.e., are the results similar to those using matched-pair differences ($Y_{dj} = Y_{1j} - Y_{2j}$ and $X_{dj} = X_{1j} - X_{2j}$, with $j$ identifying the pair)?
                          By including a random intercept at the pair-id-level, the models are an appropriate matched-pair analysis for multi-level data. They are conceptually similar to doing paired t-tests when there are no covariates and no nesting involved, although they do not produce exactly the same results that a paired t-test (which is the same as a 1-sample t-test of the paired differences) would.
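The comparison with a paired t-test can be explored on simulated data, as in the sketch below. Everything about the data (50 pairs, a true effect of 0.5) is invented. In this balanced case with no covariates, the point estimate from -mixed- should coincide with the mean paired difference, while the standard errors and reference distributions generally differ.

```stata
clear
set seed 2016
set obs 50
gen long pair_id = _n
gen double u = rnormal()             // pair-level random intercept
expand 2
bysort pair_id: gen byte t = _n - 1  // one control (0), one treated (1)
gen double y = 1 + 0.5*t + u + rnormal()

* Matched-pair analysis via a random intercept for each pair
mixed y i.t || pair_id:
scalar b_mixed = _b[y:1.t]

* The same comparison as a one-sample t-test on the paired differences
keep pair_id t y
reshape wide y, i(pair_id) j(t)
gen double d = y1 - y0
ttest d == 0
scalar b_paired = r(mu_1)

display "mixed: " b_mixed "   paired difference: " b_paired
```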



                          • #14
                            Hi, I need your help with analyzing my data after propensity score matching. In my study, the outcome (y) is continuous, the treatment (t) is binary, and the covariates (x) include continuous, binary, and categorical variables.
                            What I have done up to now is:

                            teffects psmatch (y) (t x1 x2 x3 x4 x5 …. x10)

                            The result shows the number of obs = 7,288, min = 1, and max = 5.

                            Then, I examined overlap and balances:

                            teffects overlap
                            tebalance summarize
                            tebalance density
                            tebalance box

                            There is no issue with them.
                            Now I want to run a regression to test the effect of the treatment, but I do not know how to run it. In my data browser, there are no new variables indicating the matched cases or a new id (as Mia described above). I use Stata 15.1.
                            Could you please suggest how I can solve this problem?
                            Thank you so much in advance.



                            • #15
                              Hi Miriam,
                              In case you are still pondering: You could use the psmatch2 command instead of teffects psmatch. Then you should be able to use the procedure I described above.
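If staying with -teffects psmatch- is preferred, its generate() option stores the observation numbers of each observation's nearest neighbours in new variables. The sketch below uses simulated data, and the stub name match and the helper variable obs_no are arbitrary choices. Note that the stored values are observation numbers in the sort order at estimation time, not an id variable, so the data should not be re-sorted without first saving that order.

```stata
clear
set seed 42
set obs 500
gen double x1 = rnormal()
gen double x2 = rnormal()
* Hypothetical treatment assignment and outcome
gen byte t = runiform() < invlogit(0.5*x1 - 0.5*x2)
gen double y = 1 + x1 + x2 + 2*t + rnormal()

gen long obs_no = _n                 // record the sort order used for matching
teffects psmatch (y) (t x1 x2), generate(match)

* match1, match2, ... hold the obs_no of the matched observations
list obs_no t match* in 1/5
```

From there, a grouping variable would still have to be built from obs_no and match1, along the lines of the _id and _n1 code earlier in the thread.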

