
  • diff command versus reg with interactions for Difference-in-differences estimation.

    Dear All,

    I am trying to understand the differences in the estimation results for a difference-in-differences estimation when using the user-written diff command versus running it 'manually' with interaction terms. To illustrate my confusion, I have run the following:

    use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta
    (Dataset from Card&Krueger (1994))

    . diff fte, t(treated) p(t)

    DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
    Number of observations in the DIFF-IN-DIFF: 801
                Baseline       Follow-up
       Control: 78             77          155
       Treated: 326            320         646
                404            397
    --------------------------------------------------------
     Outcome var.   | fte     | S. Err. |    t    |  P>|t|
    ----------------+---------+---------+---------+---------
    Baseline        |         |         |         |
       Control      | 19.949  |         |         |
       Treated      | 17.065  |         |         |
       Diff (T-C)   | -2.884  | 1.135   | -2.54   | 0.011**
    Follow-up       |         |         |         |
       Control      | 17.542  |         |         |
       Treated      | 17.573  |         |         |
       Diff (T-C)   | 0.030   | 1.143   | 0.03    | 0.979
                    |         |         |         |
    Diff-in-Diff    | 2.914   | 1.611   | 1.81    | 0.071*
    --------------------------------------------------------
    R-square:    0.01
    * Means and Standard Errors are estimated by linear regression
    **Inference: *** p<0.01; ** p<0.05; * p<0.1

    . gen did=t*treated

    . reg fte did treated t, vce(robust)

    Linear regression                               Number of obs =     801
                                                    F(  3,   797) =    1.43
                                                    Prob > F      =  0.2330
                                                    R-squared     =  0.0080
                                                    Root MSE      =   9.003

    ------------------------------------------------------------------------------
                 |               Robust
             fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             did |   2.913982   1.736818     1.68   0.094    -.4952963    6.323261
         treated |  -2.883534   1.403338    -2.05   0.040    -5.638209   -.1288592
               t |   -2.40651   1.594091    -1.51   0.132    -5.535623    .7226031
           _cons |   19.94872   1.317281    15.14   0.000     17.36297    22.53447
    ------------------------------------------------------------------------------

    . reg fte did treated t, vce(cluster id)

    Linear regression                               Number of obs =     801
                                                    F(  3,   408) =    1.89
                                                    Prob > F      =  0.1305
                                                    R-squared     =  0.0080
                                                    Root MSE      =   9.003

                                       (Std. Err. adjusted for 409 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
             fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             did |   2.913982   1.291448     2.26   0.025     .3752599    5.452705
         treated |  -2.883534   1.401798    -2.06   0.040    -5.639182   -.1278858
               t |   -2.40651   1.207109    -1.99   0.047    -4.779439   -.0335815
           _cons |   19.94872   1.318071    15.13   0.000     17.35766    22.53978
    ------------------------------------------------------------------------------

    Although the point estimates from the three regressions are the same, the standard errors differ, and I am not sure I understand why. I would really appreciate some insight into this.

    Also, I am wondering: if I wanted to include a store fixed effect, could I incorporate it simply as follows?

    xtset id
    panel variable: id (unbalanced)

    . xtreg fte did treated t, fe vce(cluster id)

    Fixed-effects (within) regression               Number of obs      =       801
    Group variable: id                              Number of groups   =       409

    R-sq:  within  = 0.0180                         Obs per group: min =         1
           between = 0.0052                                        avg =       2.0
           overall = 0.0006                                        max =         4

                                                    F(2,408)           =         .
    corr(u_i, Xb)  = -0.1811                        Prob > F           =         .

                                       (Std. Err. adjusted for 409 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
             fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             did |   2.942513   1.318861     2.23   0.026     .3499016    5.535123
         treated |   1.278744   .6594305     1.94   0.053    -.0175617    2.575049
               t |  -2.490132    1.23446    -2.02   0.044    -4.916828   -.0634351
           _cons |   16.62192   .6164617    26.96   0.000     15.41008    17.83376
    -------------+----------------------------------------------------------------
         sigma_u |  8.0015503
         sigma_e |  6.2117819
             rho |  .62395631   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

    But in this case the estimates and the standard errors become very different.

    I will really appreciate some help.
    Sincerely,
    Sumedha.

  • #2
    I cannot comment on the results from -diff-: it is a user-written command, and its help file does not explain how it computes its standard errors.

    As for the two different -reg- commands, you have specified two different variance estimators. One is -vce(robust)- and the other is -vce(cluster id)- so, unsurprisingly, you are getting different standard errors.
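
    To see that only the standard errors change while the point estimates stay the same, you can run the identical regression varying nothing but the vce() option (a sketch using the variables from #1; the first line uses conventional OLS standard errors):

    Code:
    * same model, three different variance estimators
    reg fte did treated t                     // conventional (homoskedastic) SEs
    reg fte did treated t, vce(robust)        // heteroskedasticity-robust SEs
    reg fte did treated t, vce(cluster id)    // SEs clustered on store id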

    As for your -xtreg- command, the main takeaway I get from the output is that something is wrong with your data! In a standard diff-in-diff design, the treatment group indicator (your variable treated) is constant within panel variable levels (i.e., within id). Consequently, when you run -xtreg, fe-, treated should be omitted due to collinearity with the fixed effects. The fact that it is not tells me that you have not properly coded treated in your data (or you have some id's wrong), or that you do not actually have data that are appropriate for diff-in-diff analysis.

    Putting that aside for the moment, you should not expect the -xtreg, fe- results to be the same as those from -reg-, nor even expect them to be roughly similar. -xtreg, fe- estimates within-id effects, whereas the results from -reg- are a mixture of within-id and between-id effects. They don't have to be the same. They don't even have to have the same signs or orders of magnitude.
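
    One quick way to check whether treated really is constant within id (a sketch, assuming the variable names from your post):

    Code:
    * flag any id whose value of treated changes across its observations
    bysort id (treated): gen byte bad_id = treated[1] != treated[_N]
    tab bad_id
    list id t treated fte if bad_id, sepby(id)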

    Comment


    • #3
      Thank you for your prompt response, Prof. Schechter. May I bother you with some follow-up questions?
      1) The example I used above uses the Card and Krueger (1994) data. The difference-in-differences framework naturally accounts for fixed differences between the treated and control groups. But within the treated and control groups there are individuals who are likely to have their own fixed effects, which are not accounted for by the DiD estimates. So is xtreg, rather than the reg specification, the right estimation to account for individual fixed effects?

      2) How can I test the common (pre-policy-change) trend assumption if I have multiple pre- and post-policy time periods? I cannot illustrate this with Card and Krueger, as they have only one pre- and one post-policy-change period. But in the data I am using, I have 19 pre-policy-change periods and 15 post-policy-change periods.

      Many thanks for your direction. I will be very very grateful for your help.
      Sincerely,
      Sumedha.

      Comment


      • #4
        But within the treated and control groups there are individuals who are likely to have their own fixed effects, which are not accounted for by the DiD estimates. So is xtreg, rather than the reg specification, the right estimation to account for individual fixed effects?
        In general, yes. Fixed-effects regression will eliminate confounding (omitted-variable) bias from any attributes that are constant within individuals over time.

        But in the data I am using, I have 19 pre-policy-change periods and 15 post-policy-change periods.
        I'm not sure what you mean by this. Perhaps a small representative sample of your data, posted using the -dataex- command (-ssc install dataex-, -help dataex- if you do not have it) will make it clearer.

        As a general remark, it is not a good idea to cite abbreviated references such as Card and Krueger 1994. That paper or book, or whatever it is, may be folklore in your discipline. But I don't think I've ever heard of it. (I have heard the names Card and Krueger--but only in posts on Statalist). This is a multidisciplinary forum, so if you need to cite a reference, it needs to be complete enough that somebody who has no idea what it is, but has library access, can get it.

        In fact, since we're already on the topics of posting references and posting data examples, please take the time to read the entire FAQ. It has excellent advice on how to make the most of Statalist, including more details on those two topics.

        Comment


        • #5
          Dear Prof. Schechter,

          Here is the dataex example:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(prescriberid treated month post did y)
          1 1 1 1 1 2
          1 1 2 1 1 3
          1 1 3 1 1 2
          1 1 4 1 1 1
          1 1 5 1 1 0
          2 1 1 1 1 2
          2 1 2 1 1 2
          2 1 3 1 1 2
          2 1 4 1 1 0
          2 1 5 1 1 1
          3 0 1 1 0 3
          3 0 2 1 0 2
          3 0 3 1 0 2
          3 0 4 1 0 3
          3 0 5 1 0 2
          4 0 1 1 0 2
          4 0 2 1 0 3
          4 0 3 1 0 1
          4 0 4 1 0 3
          4 0 5 1 0 2
          end
          I want to estimate a difference-in-differences model of the impact of treatment on y. I have 5 time periods (the variable month), with months 1, 2, and 3 being pre-treatment and months 4 and 5 being post-treatment. I have 4 different prescribers. I want to include prescriber fixed effects, and I want to test graphically and statistically whether the trend in y was the same for the treated and non-treated prescribers.

          My basic difference-in-differences regression is as follows:

          reg y did treated month, vce(cluster prescriberid)
          note: treated omitted because of collinearity

          Linear regression                               Number of obs =      20
                                                          F(  2,     3) =   24.83
                                                          Prob > F      =  0.0136
                                                          R-squared     =  0.3940
                                                          Root MSE      =  .75049

                                  (Std. Err. adjusted for 4 clusters in prescriberid)
          ------------------------------------------------------------------------------
                       |               Robust
                     y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   did |        -.8   .1220736    -6.55   0.007    -1.188493   -.4115074
               treated |          0  (omitted)
                 month |      -.275   .1455635    -1.89   0.155    -.7382479    .1882479
                 _cons |      3.125   .4575216     6.83   0.006     1.668962    4.581038
          ------------------------------------------------------------------------------

          Can you please advise me on how I might test, graphically and statistically, whether the trend in y was the same for the treated and non-treated prescribers in the pre-treatment period?
          Sincerely,
          Sumedha.

          Comment


          • #6
            First, there is a problem with your data. The variable post is set to 1 in every observation. It should be 1 in those observations where month = 4 or 5, and 0 where month = 1, 2, or 3. So let's fix that first.

            Code:
            clear
            input float(prescriberid treated month y)
            1 1 1 2
            1 1 2 3
            1 1 3 2
            1 1 4 1
            1 1 5 0
            2 1 1 2
            2 1 2 2
            2 1 3 2
            2 1 4 0
            2 1 5 1
            3 0 1 3
            3 0 2 2
            3 0 3 2
            3 0 4 3
            3 0 5 2
            4 0 1 2
            4 0 2 3
            4 0 3 1
            4 0 4 3
            4 0 5 2
            end
            
            gen post = (month > 3)
            
            xtset prescriberid month
            Note that I do not calculate a did variable. It is not needed because we will use factor-variable notation.

            The basic diff-in-diff analysis is then:

            Code:
            xtreg y i.treated##i.post, fe
            
            margins treated#post, noestimcheck
            marginsplot, xdimension(post)
            The coefficient for treated#post is, of course, your D-I-D estimator of the treatment effect.

            Now, you may also want to make some adjustments for time. If you think that there is a time-trend in the data, this might be done by simply adding month as a continuous covariate.
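
            That adjustment would look something like this (a sketch; c.month enters as a single linear time trend common to both groups):

            Code:
            xtreg y i.treated##i.post c.month, fe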

            I often see on this Forum situations like this where the person posting wants to adjust for time on a month-by-month basis. This is quite problematic. Recall that treated is constant within prescriberid, so you cannot get a coefficient for treated in the fixed-effects regression. This isn't a problem, because the coefficient of treated, if you had it, would just represent the difference between the treated and untreated groups in the pre-treatment period. That figure is not of great importance, and it is absorbed in the prescriberid fixed effects anyway.

            If you add i.month to the predictors, post is now constant within month. So, due to this multicollinearity, something additional gets omitted. If you put i.month before i.treated##i.post, the post term will be omitted due to collinearity. If you put it after i.treated##i.post, you will lose not the usual one, but two indicators for month. So either you lose the estimation of the change from pre to post in the untreated group (the coefficient of post), or you have only an incomplete adjustment for monthly shocks. You can't have both in a fixed-effects model.
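
            To see that omission behavior concretely, you can compare the two orderings (a sketch; Stata's output will show which terms it omits because of collinearity):

            Code:
            * i.month listed first: the post term is omitted
            xtreg y i.month i.treated##i.post, fe
            * i.month listed after the interaction: two month indicators are dropped instead
            xtreg y i.treated##i.post i.month, fe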

            So you might want to explore your data graphically by doing things like plotting the mean value of y in each group as a monthly time series. If the plot seems to show no time effects of any substance, then it is best to leave time out. If there appears to be a (more or less) monotone trend in y over time, then adding month as a continuous variable would be warranted. If there are one or two particular months that stand out as shocks, then indicators for just those months might be sensible additions to the model.
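
            A minimal sketch of that kind of exploratory plot, using the variable names from the example above:

            Code:
            preserve
            collapse (mean) y, by(treated month)
            twoway (connected y month if treated==0) (connected y month if treated==1), ///
                legend(order(1 "Control" 2 "Treated")) ytitle("Mean of y") xtitle("Month")
            restore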

            Comment


            • #7
              Thinking about your original post, given that your real data have 19 months of data before the policy change and 15 months after, there is another approach that would make sense (but would not make sense with only five months of data, as in the example). With long-term analyses like this, one often wants to look at linear time trends before and after. The expectation is that both groups will exhibit parallel trends (the same slope) before the policy change and diverge thereafter (different slopes). One nice way to model that is to create a linear spline for time with a knot at the time of the policy change. So let's say, for example, that the policy change took place in January 2012. You could do something like this:

              Code:
              mkspline pre `=tm(2012m1)'  post = month
              xtset prescriberid
              xtreg y i.treated##(c.pre c.post), fe
              margins treated, dydx(pre post)
              The output from margins will give you the slope of the y vs month regression lines in each group of prescribers both before and after the policy change, and you can compare those.

              In addition, if there are particular months that encompassed other events that might be characterized as "shocks" to the system, you can also include indicator variables for those particular months to adjust for those shocks.
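
              For example (a sketch only; the two shock months below are hypothetical, chosen just to illustrate the syntax, and month is assumed to be a Stata monthly date as in the mkspline line above):

              Code:
              * hypothetical shock months, for illustration only
              gen byte shock1 = (month == tm(2011m6))
              gen byte shock2 = (month == tm(2012m9))
              xtreg y i.treated##(c.pre c.post) shock1 shock2, fe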

              Comment


              • #8
                Dear Prof. Schechter,

                You advised me a while ago regarding a difference-in-differences regression (above). Here is the dataex example again:


                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(prescriberid treated month y) float post
                1 1 1 2 0
                1 1 2 3 0
                1 1 3 2 0
                1 1 4 1 1
                1 1 5 0 1
                2 1 1 2 0
                2 1 2 2 0
                2 1 3 2 0
                2 1 4 0 1
                2 1 5 1 1
                3 0 1 3 0
                3 0 2 2 0
                3 0 3 2 0
                3 0 4 3 1
                3 0 5 2 1
                4 0 1 2 0
                4 0 2 3 0
                4 0 3 1 0
                4 0 4 3 1
                4 0 5 2 1
                end

                Could you please also help me figure out how I could run a quantile difference-in-differences regression? There is a user-written command that apparently does it, but it's a bit of a black box that doesn't explain much.

                So far I have run the following:

                reg y i.treated##i.post

                Now I want to do the analysis by different quantiles of y.

                I will really appreciate your help.
                Sincerely,
                Sumedha.

                Comment


                • #9
                  I am not sure what you mean by a quantile difference-in-difference regression. There is a procedure for quantile regression, which estimates a quantile of the predicted distribution of the outcome variable, rather than the mean. The command for that in Stata is -qreg-. And you can run it with interaction terms just as you would any other Stata regression command. See -help qreg-. If that isn't what you have in mind, please provide more detail.
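
                  For instance, something along these lines (a sketch; it just reuses the factor-variable specification you already ran with -reg-, at a few different quantiles):

                  Code:
                  * DiD interaction at different conditional quantiles of y
                  qreg y i.treated##i.post, quantile(0.25)
                  qreg y i.treated##i.post, quantile(0.50)
                  qreg y i.treated##i.post, quantile(0.75)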

                  Comment


                  • #10
                    One of the reasons you're getting different answers is that the commands are using slightly different datasets. Take a look at the example here, which will make the sample identical.
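
                    One standard way to force the commands onto an identical estimation sample (a sketch, not the linked example; it assumes the discrepancy comes from observations that one command drops, e.g. because of missing values, and that -diff- accepts an if qualifier as most estimation commands do) is to mark the sample with e(sample):

                    Code:
                    * run the regression, mark its estimation sample, then feed the
                    * same observations to -diff-
                    reg fte i.treated##i.t, vce(robust)
                    gen byte insample = e(sample)
                    diff fte if insample, t(treated) p(t) robust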

                    Comment


                    • #11
                      Just in case someone who finds this thread needs it: the -diff- command has been updated since #10 and now allows for quantile DiD and different SE estimators.
                      Best Regards,

                      Pedro
                      (StataMP 16 user)

                      Comment


                      • #12
                        Dear Pedro, thanks for this information. Have you tried the quantile option together with the robust option in the -diff- command (ssc install diff)? It does not work in the following case.
                        Code:
                        use "cardkrueger1994.dta", clear
                        // linear
                        diff fte, t(treated) p(t) robust
                        // quantile (median): using qreg directly (OK)
                        qreg fte i.treated##i.t, q(0.5) vce(robust)
                        // quantile (median): using diff indirectly (not OK)
                        diff fte, t(treated) p(t) qdid(0.5) robust
                        The results are:
                        Code:
                        . use "cardkrueger1994.dta", clear
                        (Sample dataset from Card and Krueger (1994))
                        
                        . // linear
                        . diff fte, t(treated) p(t) robust
                        
                        DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
                        Number of observations in the DIFF-IN-DIFF: 780
                                    Before         After    
                           Control: 76             76          152
                           Treated: 314            314         628
                                    390            390
                        --------------------------------------------------------
                         Outcome var.   | fte     | S. Err. |   |t|   |  P>|t|
                        ----------------+---------+---------+---------+---------
                        Before          |         |         |         |
                           Control      | 20.013  |         |         |
                           Treated      | 17.069  |         |         |
                           Diff (T-C)   | -2.944  | 1.440   | -2.04   | 0.041**
                        After           |         |         |         |
                           Control      | 17.523  |         |         |
                           Treated      | 17.518  |         |         |
                           Diff (T-C)   | -0.005  | 1.037   | 0.00    | 0.996
                                        |         |         |         |
                        Diff-in-Diff    | 2.939   | 1.774   | 1.66    | 0.098*
                        --------------------------------------------------------
                        R-square:    0.01
                        * Means and Standard Errors are estimated by linear regression
                        **Robust Std. Errors
                        **Inference: *** p<0.01; ** p<0.05; * p<0.1
                        
                        . // quantile (median): using qreg directly (OK)
                        . qreg fte i.treated##i.t, q(0.5) vce(robust)
                        Iteration  1:  WLS sum of weighted deviations =   2629.719
                        
                        Iteration  1: sum of abs. weighted deviations =   2631.375
                        note:  alternate solutions exist
                        Iteration  2: sum of abs. weighted deviations =   2622.375
                        note:  alternate solutions exist
                        Iteration  3: sum of abs. weighted deviations =   2606.375
                        note:  alternate solutions exist
                        Iteration  4: sum of abs. weighted deviations =   2604.375
                        note:  alternate solutions exist
                        Iteration  5: sum of abs. weighted deviations =   2603.875
                        
                        Median regression                                   Number of obs =        780
                          Raw sum of deviations 2611.125 (about 16.5)
                          Min sum of deviations 2603.875                    Pseudo R2     =     0.0028
                        
                        ------------------------------------------------------------------------------
                                     |               Robust
                                 fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                             treated |
                                 NJ  |         -1    1.30912    -0.76   0.445    -3.569837    1.569837
                                 1.t |         -1   1.729112    -0.58   0.563    -4.394291    2.394291
                                     |
                           treated#t |
                               NJ#1  |          2   1.889586     1.06   0.290    -1.709306    5.709306
                                     |
                               _cons |         17   1.222667    13.90   0.000     14.59987    19.40013
                        ------------------------------------------------------------------------------
                        
                        . // quantile (median): using diff indirectly (not OK)
                        . diff fte, t(treated) p(t) qdid(0.5) robust
                        option robust is not allowed
                        r(198);
                        
                        end of do-file
                        
                        r(198);
                        I used vce(robust) instead of robust, but that does not work either. Do you have any idea about this?

                        By the way, the version of -diff- is
                        c:\ado\plus\d\diff.ado
                        *! 5.0.2 Jul2018
                        Last edited by River Huang; 25 Aug 2019, 19:16.
                        Ho-Chuan (River) Huang
                        Stata 19.0, MP(4)

                        Comment


                        • #13
                          Hi River,
                          The help file explicitly indicates that
                          qdid does not support weights nor robust standard errors.
                          I can't tell how to overcome this within -diff- itself.
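
                          A possible alternative outside -diff- (just a sketch, not from the -diff- documentation; it assumes you only need bootstrap or cluster-bootstrap standard errors for the median DiD, and that id identifies the stores) would be to run the quantile regression directly:

                          Code:
                          * bootstrap SEs for the median DiD, estimated directly with qreg
                          bsqreg fte i.treated##i.t, quantile(0.5) reps(500)
                          * or resample whole stores with a clustered bootstrap
                          bootstrap, reps(500) cluster(id): qreg fte i.treated##i.t, quantile(0.5)
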
                          Best Regards,

                          Pedro
                          (StataMP 16 user)

                          Comment
