
  • Using Difference-in-Differences Analysis

    Dear All,

    I am currently facing challenges regarding the use of DiD. I have longitudinal data with only two waves (2009/2010 and 2014/2015). I am to analyze changes in wealth due to child fostering. The fostering variable is an indicator variable with 1 = foster, 0 = non-foster. I constructed a wealth index using multiple correspondence analysis (MCA).

    Is it appropriate to run a DiD by defining households who received foster children as my treatment group and households who did not as the control group, with my time variable defined as year > 2014 (the follow-up wave)?

    Code:
    gen time= (combdate >= td(05mar2014)) & !missing(year)
    gen treatment = (foster_status==1) & !missing(foster_status)

    My model is specified as follows:
    W_it = B0 + B1*Treatment_i + B2*Time_t + B3*(Treatment_i x Time_t) + e_it

    Your advice would be highly appreciated.
    Regards,
    Stephen

  • #2
    You have the right idea, assuming that the variable foster_status is always 0 or always 1 in all observations of the same household. But the code is not quite correct. It should be:

    Code:
    gen time = (combdate >= td(05mar2014)) if !missing(combdate)
    gen treatment = (foster_status==1) if !missing(foster_status)

    By the way, the code to create the variable treatment can be simplified even further to:
    Code:
    gen treatment = 1.foster_status
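
    One way to verify the assumption above, as a sketch (here hhid is just a placeholder for whatever variable identifies households in your data):
    Code:
    * assert that foster_status is identical across all observations
    * of the same household (hhid = your household identifier)
    bysort hhid (foster_status): assert foster_status[1] == foster_status[_N]
    If the assertion fails, some household changes foster_status between waves and the simple treated/control split is not well defined.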



    • #3
      I am very grateful for the response, Clyde.



      • #4
        Follow-up question
        After running the following code:

        Code:
        areg wealth treated did, a(foster_status) vce(robust)

        where treated = treatment and did = time*treated, following from the previous post.

        The results showed that my did variable was significant, but after adding other variables to the model, it became insignificant.

        Is this a valid result, or am I doing something wrong?

        note: treated omitted because of collinearity

        Linear regression, absorbing indicators         Number of obs     =     35,245
                                                        F(   1,  35242)   =      32.03
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.0007
                                                        Adj R-squared     =     0.0006
                                                        Root MSE          =     0.7149

        ------------------------------------------------------------------------------
                     |               Robust
              wealth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
             treated |          0  (omitted)
                 did |  -.3566215   .0630114    -5.66   0.000    -.4801257   -.2331172
               _cons |  -.2865537   .0038511   -74.41   0.000    -.2941019   -.2790054
        -------------+----------------------------------------------------------------
        foster_sta~s |   absorbed                                       (2 categories)



        Code:
        areg wealth did treated c.child_age##c.child_age numchild i.gender c.age##c.age i.marital_status i.educqual i.region i.urbrur avghhsize_adj i.hhinc i.time, a(foster_status) vce(robust)
        note: treated omitted because of collinearity

        Linear regression, absorbing indicators         Number of obs     =     31,341
                                                        F(  20,  31319)   =     108.11
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.1286
                                                        Adj R-squared     =     0.1280
                                                        Root MSE          =     0.6414

        -----------------------------------------------------------------------------------
                          |               Robust
                   wealth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        ------------------+----------------------------------------------------------------
                      did |  -.0333492   .0609341    -0.55   0.584    -.1527824    .0860841
                  treated |          0  (omitted)
               child_age1 |   .0416836   .0049415     8.44   0.000     .0319979    .0513692
                          |
             c.child_age1#|
             c.child_age1 |  -.0020881   .0002823    -7.40   0.000    -.0026414   -.0015348
                          |
                 numchild |   .0479884   .0024866    19.30   0.000     .0431146    .0528621
                          |
                   gender |
                5. Female |  -.0181797   .0073199    -2.48   0.013     -.032527   -.0038323
                      age |   .0029435   .0007537     3.91   0.000     .0014662    .0044208
                          |
              c.age#c.age |  -.0000313   8.61e-06    -3.63   0.000    -.0000481   -.0000144
                          |
           marital_status |
               1. married |  -.0544142   .0124341    -4.38   0.000    -.0787855   -.0300428
         2. Divorced/se.. |  -.1742315   .0252655    -6.90   0.000    -.2237529     -.12471
                          |
                 educqual |
                 2. Basic |   .0558607   .0115662     4.83   0.000     .0331905    .0785309
             3. Secondary |  -.0214995   .0174891    -1.23   0.219    -.0557789    .0127799
         4. Post-Second.. |   .0787309   .0188486     4.18   0.000      .041787    .1156748
                 5. Other |   .0667422   .0099745     6.69   0.000     .0471919    .0862926
                          |
                   region |
                 1. North |  -.1965014   .0183841   -10.69   0.000    -.2325349   -.1604679
                          |
                   urbrur |
                 5. Rural |  -.0443188   .0091567    -4.84   0.000    -.0622662   -.0263714
            avghhsize_adj |    -.06198   .0020363   -30.44   0.000    -.0659713   -.0579887
                          |
                    hhinc |
               low_income |  -.0070525   .0123708    -0.57   0.569    -.0312998    .0171949
            middle_income |    .159465    .020203     7.89   0.000     .1198664    .1990636
              high_income |   .1238394   .0201587     6.14   0.000     .0843276    .1633512
                          |
                   1.time |  -.2914502   .0065949   -44.19   0.000    -.3043765   -.2785239
                    _cons |  -.0371366   .0231017    -1.61   0.108    -.0824168    .0081436
        ------------------+----------------------------------------------------------------
            foster_status |   absorbed                                      (2 categories)



        • #5
          There are several things going on here. First, you should never be surprised when a coefficient changes after you change a model. If that weren't the case, you wouldn't be able to fix omitted-variable bias by including the omitted variable, right? Also, always remember that the difference between statistically significant and not statistically significant is not, itself, statistically significant. So the real issue here is how much the interaction coefficient changed, and for what specific reason.

          In this case, the coefficient of did changed from about -.36 to -.03, which is a pretty large change. So we might wonder which variable, or combination of variables, added to the model accounts for that.

          But before we sink a lot of energy into that, let's look more carefully at whether we did the analysis correctly in the first place. You did this analysis absorbing foster_status. But that's not right. You have longitudinal data, so what you need to absorb here is the variable that identifies individuals (or households, or whatever each observation represents) in your study. That will give you a different value of the did coefficient (with or without the other covariates). So first let's fix that error and then we can see what the new versions of the did coefficient are and perhaps look into the source of the change further.



          • #6
            Great, thank you very much, Clyde. I will work on these and revert.



            • #7

              FPrimary is the ID for households in my data. When I absorb it, this is what I get:

              Code:
              areg wealth treated did, a(FPrimary) vce(robust)
              
              Linear regression, absorbing indicators         Number of obs     =     35,245
                                                              F(   0,  29826)   =          .
                                                              Prob > F          =          .
                                                              R-squared         =     1.0000
                                                              Adj R-squared     =     1.0000
                                                              Root MSE          =     0.0000
              
              ------------------------------------------------------------------------------
                           |               Robust
                    wealth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                   treated |          0  (omitted)
                       did |          0  (omitted)
                     _cons |  -.2882232          .        .       .            .           .
              -------------+----------------------------------------------------------------
                  FPrimary |   absorbed                                    (5417 categories)

              Also, I just read at https://www.stata.com/manuals13/rareg.pdf that "absorb(varname) specifies the categorical variable, which is to be included in the regression as if it were specified by dummy variables."

              What must I do to resolve this?
              Thank you.



              • #8
                Something is wrong with the way you have coded treated or did, and the time variable also needs to be in the model.

                It is expected that treated will be omitted, because it is constant within FPrimary. But did should not be constant within FPrimary: at least for the treated group it should be 0 before treatment and 1 afterward. The fact that it, too, was dropped says that you have that wrong.

                In any case, hand-coding a did variable is just an invitation to make a mistake. First check that your treated and time variables are correct. Then do it as:
                Code:
                areg wealth i.treated##i.time, absorb(FPrimary) vce(cluster FPrimary)
                and you should be fine.

                The variable treated will be omitted, but time and the treated#time interaction will not be.
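
                The "check that your treated and time variables are correct" step can be sketched like this (a sketch that assumes every household is observed in both waves):
                Code:
                * each household should contribute a time==0 and a time==1 observation
                bysort FPrimary (time): assert time[1] == 0 & time[_N] == 1
                * a hand-coded did must equal the product of the two indicators
                assert did == treated * time
                If either assertion fails, the did variable (or the underlying time or treated variable) is miscoded, which would explain why it was dropped.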



                • #9
                  Thanks, Clyde. I am really grateful for your prompt responses.

                  I was doing everything wrong from the beginning: my data has 35,245 individual observations and 5,417 households. Since I am doing a household-level analysis, I have collapsed my data to the household level.

                  Meanwhile, when I tried the -areg- approach:

                  Code:
                  areg wealth i.treated##i.time, areg(FPrimary) vce(cluster FPrimary)
                  I kept getting the error r(198): "groupvar() required" (presumably because areg() is not a valid option name here; the option should be absorb(FPrimary)).

                  To get around this, I used this code and it worked:
                  
                   reg wealth i.time##i.treated, vce(robust)
                  
                  Linear regression                               Number of obs     =      5,417
                                                                  F(3, 5413)        =     190.60
                                                                  Prob > F          =     0.0000
                                                                  R-squared         =     0.0086
                                                                  Root MSE          =     .96818
                  
                  ------------------------------------------------------------------------------
                               |               Robust
                        wealth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                        1.time |  -.3421045   .0144504   -23.67   0.000    -.3704332   -.3137759
                     1.treated |  -.0479491   .0885624    -0.54   0.588    -.2215669    .1256688
                               |
                  time#treated |
                          1 1  |   .0471244   .0888942     0.53   0.596    -.1271439    .2213928
                               |
                         _cons |  -.0876378   .0144169    -6.08   0.000    -.1159007   -.0593749
                  ------------------------------------------------------------------------------
                  
                  As I then include other variables of interest, I obtain this:

                  Code:
                  reg wealth i.time##i.treated c.child_age##c.child_age numchild i.gender c.age##c.age i.marital_status i.educqual i.region i.urbrur avghhsize_adj i.hhinc, vce(robust)
                  
                  Linear regression                               Number of obs     =      5,417
                                                                  F(21, 5395)       =      34.77
                                                                  Prob > F          =     0.0000
                                                                  R-squared         =     0.1859
                                                                  Root MSE          =      .8788
                  
                  -----------------------------------------------------------------------------------
                                    |               Robust
                             wealth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  ------------------+----------------------------------------------------------------
                             1.time |  -.6091334   .0446078   -13.66   0.000    -.6965827   -.5216841
                          1.treated |   .0764924    .080838     0.95   0.344    -.0819827    .2349674
                                    |
                       time#treated |
                               1 1  |  -.1567473   .0901331    -1.74   0.082    -.3334445    .0199499
                                    |
                          child_age |  -.0102547   .0111167    -0.92   0.356    -.0320479    .0115386
                                    |
                        c.child_age#|
                        c.child_age |   .0012477   .0006483     1.92   0.054    -.0000232    .0025187
                                    |
                           numchild |   .2211294   .0137352    16.10   0.000     .1942028    .2480561
                                    |
                             gender |
                            Female  |  -.2098426   .0324123    -6.47   0.000    -.2733838   -.1463013
                                age |    .017957   .0046023     3.90   0.000     .0089345    .0269794
                                    |
                        c.age#c.age |  -.0000842   .0000449    -1.87   0.061    -.0001722    3.86e-06
                                    |
                     marital_status |
                           married  |    .141425   .0366668     3.86   0.000     .0695432    .2133068
                  Divorced/separ~d  |   .0184881   .0479958     0.39   0.700    -.0756031    .1125794
                                    |
                           educqual |
                             Basic  |   .0202279   .0349925     0.58   0.563    -.0483716    .0888274
                         Secondary  |  -.0725679   .0473287    -1.53   0.125    -.1653514    .0202155
                    Post-Secondary  |   .0756635   .0557805     1.36   0.175    -.0336888    .1850158
                             Other  |   .0558392   .0350746     1.59   0.111    -.0129212    .1245995
                                    |
                             region |
                             North  |  -.0903384    .039311    -2.30   0.022    -.1674039   -.0132729
                                    |
                             urbrur |
                             Rural  |  -.1103094   .0274132    -4.02   0.000    -.1640503   -.0565685
                      avghhsize_adj |   -.220579   .0117509   -18.77   0.000    -.2436156   -.1975424
                                    |
                              hhinc |
                        low_income  |   .1958051   .0388891     5.03   0.000     .1195668    .2720434
                     middle_income  |   .0754736   .0420423     1.80   0.073    -.0069462    .1578933
                       high_income  |   .1168129   .0515738     2.26   0.024     .0157075    .2179183
                                    |
                              _cons |  -.1972036   .1134548    -1.74   0.082    -.4196208    .0252137



                  • #10
                    So, you have a lot of covariates there, and some of them have effects that are pretty large (relative to the interaction coefficient). Again, I want to remind you that when you add or remove variables and change a model, everything is up for grabs and the variables that are in both models can look very different. There is nothing wrong with that, and as I pointed out in #5, it is what makes it possible to deal with omitted variable bias.

                    So there isn't any real reason you need to pursue this. But if you are curious which covariate(s) are leading to the change in the interaction term, you can just try re-running the model several times, each time omitting one of the covariates from the full model, and see what happens to the interaction coefficient in each case. It may turn out, by the way, that no one covariate on its own is largely responsible--it might be some combination of them, but chasing down which combination would be an enormous amount of work.
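
                    That leave-one-out procedure can be sketched with a loop like the following (the covariate list is copied from the model in #9; adjust it to your actual specification):
                    Code:
                    * re-run the full model, dropping one covariate term at a time,
                    * and display the did (interaction) coefficient each time
                    local covs c.child_age##c.child_age numchild i.gender c.age##c.age ///
                        i.marital_status i.educqual i.region i.urbrur avghhsize_adj i.hhinc
                    foreach v of local covs {
                        local dropped `v'
                        local rest : list covs - dropped
                        quietly regress wealth i.time##i.treated `rest', vce(robust)
                        display "dropping `v': interaction coef = " _b[1.time#1.treated]
                    }
                    Note that each factor-variable term (e.g. i.educqual) is dropped as a whole block, which is usually what you want when asking which covariate drives the change.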

                    In addition to thinking about the changes resulting from adding or removing covariates as being about omitted-variable bias, you can also think of this as an example of Simpson's paradox (sometimes called Lord's paradox in the context of regression). There's a very readable explainer of this on Wikipedia, and it may help you decide whether it is the adjusted or the unadjusted model that properly answers your research question.



                    • #11
                      Alright, Clyde. Thank you once again; you have been really helpful. I will look at what you suggested on the "curiosity bit".



                      • #12
                        Please, how do I interpret the coefficients of my full DiD model? Do I use the ordinary interpretation of an OLS model?
                        Also, what about these other effects, obtained using -margins-?

                        Code:
                         margins, dydx(time) at(treated=(0 1))
                        
                        Conditional marginal effects                    Number of obs     =      5,417
                        Model VCE    : Robust
                        
                        Expression   : Linear prediction, predict()
                        dy/dx w.r.t. : 1.time
                        
                        1._at        : treated         =           0
                        
                        2._at        : treated         =           1
                        
                        ------------------------------------------------------------------------------
                                     |            Delta-method
                                     |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        1.time       |
                                 _at |
                                  1  |  -.3421045   .0144504   -23.67   0.000    -.3704332   -.3137759
                                  2  |  -.2949801   .0877118    -3.36   0.001    -.4669305   -.1230297
                        ------------------------------------------------------------------------------
                        Note: dy/dx for factor levels is the discrete change from the base level.
                        
                        
                         margins time#treated
                        
                        Adjusted predictions                            Number of obs     =      5,417
                        Model VCE    : Robust
                        
                        Expression   : Linear prediction, predict()
                        
                        ------------------------------------------------------------------------------
                                     |            Delta-method
                                     |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        time#treated |
                                0 0  |  -.0876378   .0144169    -6.08   0.000    -.1159007   -.0593749
                                0 1  |  -.1355869    .087381    -1.55   0.121    -.3068889    .0357151
                                1 0  |  -.4297424   .0009838  -436.81   0.000    -.4316711   -.4278137
                                1 1  |   -.430567   .0076104   -56.58   0.000    -.4454865   -.4156475
                        ------------------------------------------------------------------------------


                        Attached: Graph.gph



                        • #13
                          The interpretation of coefficients of non-interacted variables is the same as you are accustomed to. For variables that participate in interactions, it is different. The key thing to remember is that if you have a treated#time interaction in the model, then the coefficient of treated is no longer "the effect of treated" and the coefficient of time is no longer "the effect of time." In fact, in such a model, there is no such thing as "the effect of" either treatment or time. Rather, there are two such effects for each of those variables: one for when the other variable is 0 and another for when the other variable is 1. You can calculate them from the regression output, but it's easier to get them from -margins-. In the output you show in #12, you can see that when treated = 0, the expected difference in outcome between time = 0 and time = 1 is about -.34, whereas when treated = 1, the expected change with time is about -.29. The second table of -margins- output you show tells you what the expected values of the outcome variable are for each combination of time and treated.
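
                          For instance, "calculate them from the regression output" is just adding coefficients; using the estimates from the first model in #12:
                          Code:
                          * effect of time in the treated group = time coefficient
                          * plus the interaction coefficient
                          display -.3421045 + .0471244
                          This gives -.2949801, matching the dy/dx at treated = 1 in the -margins- table.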

                          Note, by the way, that instead of using the -at()- option in your first -margins- command, you could have written -margins treated, dydx(time)-. The results would have been the same, but instead of being labeled with values of _at, which you then have to cross-reference to the output above the table, they would have been labeled with the actual values of treated. Your results suggest that at time 0 the two groups start out with slightly different average outcome values, but by time 1 they converge to essentially the same value (though that value is much more negative in both cases). Of course, we can't make too much of that, because the difference between the outcomes at time 0 is pretty small and the confidence intervals substantially overlap. So a more conservative reading would be simply that both groups show a marked decrease in the expected outcome over time, and they differ very little at either time.
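
                          Concretely, a sketch of those -margins- calls (the -pwcompare- variant is my addition and worth verifying against your version of Stata):
                          Code:
                          * same results as -margins, dydx(time) at(treated=(0 1))-,
                          * but rows labeled by the values of treated
                          margins treated, dydx(time)
                          * -pwcompare- should additionally report the difference between
                          * the two marginal effects (the DiD itself) with its std. err.
                          margins treated, dydx(time) pwcompare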



                          • #14
                            Thank you so much for this very clear explanation, Clyde.
