
  • diff command versus reg with interactions for Difference-in-differences estimation.

    Dear All,

    I am trying to understand the differences in the estimation results for a difference-in-differences estimation when using the user-written diff command versus running it 'manually' with interaction terms. To illustrate my confusion, I have run the following:

    use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta
    (Dataset from Card&Krueger (1994))

    . diff fte, t(treated) p(t)

    DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
    Number of observations in the DIFF-IN-DIFF: 801
                Baseline       Follow-up
       Control: 78             77          155
       Treated: 326            320         646
                404            397
    --------------------------------------------------------
     Outcome var.   | fte     | S. Err. |    t    |  P>|t|
    ----------------+---------+---------+---------+---------
    Baseline        |         |         |         |
       Control      | 19.949  |         |         |
       Treated      | 17.065  |         |         |
       Diff (T-C)   | -2.884  | 1.135   | -2.54   | 0.011**
    Follow-up       |         |         |         |
       Control      | 17.542  |         |         |
       Treated      | 17.573  |         |         |
       Diff (T-C)   | 0.030   | 1.143   | 0.03    | 0.979
                    |         |         |         |
    Diff-in-Diff    | 2.914   | 1.611   | 1.81    | 0.071*
    --------------------------------------------------------
    R-square:    0.01
    * Means and Standard Errors are estimated by linear regression
    **Inference: *** p<0.01; ** p<0.05; * p<0.1

    . gen did=t*treated

    . reg fte did treated t, vce(robust)

    Linear regression                               Number of obs =     801
                                                    F(  3,   797) =    1.43
                                                    Prob > F      =  0.2330
                                                    R-squared     =  0.0080
                                                    Root MSE      =   9.003

    ------------------------------------------------------------------------------
                 |               Robust
             fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             did |   2.913982   1.736818     1.68   0.094    -.4952963    6.323261
         treated |  -2.883534   1.403338    -2.05   0.040    -5.638209   -.1288592
               t |   -2.40651   1.594091    -1.51   0.132    -5.535623    .7226031
           _cons |   19.94872   1.317281    15.14   0.000     17.36297    22.53447
    ------------------------------------------------------------------------------

    . reg fte did treated t, vce(cluster id)

    Linear regression                               Number of obs =     801
                                                    F(  3,   408) =    1.89
                                                    Prob > F      =  0.1305
                                                    R-squared     =  0.0080
                                                    Root MSE      =   9.003

                                       (Std. Err. adjusted for 409 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
             fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             did |   2.913982   1.291448     2.26   0.025     .3752599    5.452705
         treated |  -2.883534   1.401798    -2.06   0.040    -5.639182   -.1278858
               t |   -2.40651   1.207109    -1.99   0.047    -4.779439   -.0335815
           _cons |   19.94872   1.318071    15.13   0.000     17.35766    22.53978
    ------------------------------------------------------------------------------

    Although the point estimates from the three regressions are the same, the standard errors differ, and I am not sure I understand why. I would really appreciate some insight into this.

    Also, I am wondering: if I wanted to include a store fixed effect, could I incorporate it simply as follows?

    xtset id
    panel variable: id (unbalanced)

    . xtreg fte did treated t, fe vce(cluster id)

    Fixed-effects (within) regression               Number of obs      =       801
    Group variable: id                              Number of groups   =       409

    R-sq:  within  = 0.0180                         Obs per group: min =         1
           between = 0.0052                                        avg =       2.0
           overall = 0.0006                                        max =         4

                                                    F(2,408)           =         .
    corr(u_i, Xb)  = -0.1811                        Prob > F           =         .

                                       (Std. Err. adjusted for 409 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
             fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             did |   2.942513   1.318861     2.23   0.026     .3499016    5.535123
         treated |   1.278744   .6594305     1.94   0.053    -.0175617    2.575049
               t |  -2.490132    1.23446    -2.02   0.044    -4.916828   -.0634351
           _cons |   16.62192   .6164617    26.96   0.000     15.41008    17.83376
    -------------+----------------------------------------------------------------
         sigma_u |  8.0015503
         sigma_e |  6.2117819
             rho |  .62395631   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

    But in this case the estimates and the standard errors become very different.

    I will really appreciate some help.
    Sincerely,
    Sumedha.

  • #2
    I cannot comment on the results from -diff-: it is a user-written command, and its help file does not explain how it computes its standard errors.

    As for the two different -reg- commands, you have specified two different variance estimators. One is -vce(robust)- and the other is -vce(cluster id)- so, unsurprisingly, you are getting different standard errors.
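
    To see that only the standard errors change while the point estimates stay the same, you can run the identical regression varying nothing but the vce() option (a sketch using the variables from #1; the first line uses conventional OLS standard errors):

    Code:
    * same model, three different variance estimators
    reg fte did treated t                     // conventional (homoskedastic) SEs
    reg fte did treated t, vce(robust)        // heteroskedasticity-robust SEs
    reg fte did treated t, vce(cluster id)    // SEs clustered on store id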

    As for your -xtreg- command, the main takeaway I get from the output is that something is wrong with your data! In a standard diff-in-diff design, the treatment group indicator (your variable treated) is constant within panel variable levels (i.e., within id). Consequently, when you run -xtreg, fe-, treated should be omitted due to collinearity with the fixed effects. The fact that it is not tells me that you have not properly coded treated in your data (or you have some id's wrong), or that you do not actually have data that are appropriate for diff-in-diff analysis.

    Putting that aside for the moment, you should not expect the -xtreg, fe- results to be the same as those from -reg-, nor even expect them to be roughly similar. -xtreg, fe- estimates within-id effects, whereas the results from -reg- are a mixture of within-id and between-id effects. They don't have to be the same. They don't even have to have the same signs or orders of magnitude.
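
    One quick way to check whether treated really is constant within id (a sketch, assuming the variable names from your post):

    Code:
    * flag any id whose value of treated changes across its observations
    bysort id (treated): gen byte bad_id = treated[1] != treated[_N]
    tab bad_id
    list id t treated fte if bad_id, sepby(id)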

    Comment


    • #3
      Thank you for your prompt response, Prof. Schechter. May I bother you with some follow-up questions?
      1) The example I used above uses the Card and Krueger (1994) data. The difference-in-differences framework naturally accounts for fixed differences between the treated and control groups. But within the treated and control groups there are individuals who are likely to have their own fixed effects, which are not accounted for by the DiD estimates. So is xtreg, rather than the reg specification, the right estimation to account for individual fixed effects?

      2) How can I test the common (pre-policy-change) trend assumption if I have multiple pre- and post-policy time periods? I cannot illustrate this with Card and Krueger, as they have only one pre- and one post-policy-change period. But in the data I am using, I have 19 pre-policy-change periods and 15 post-policy-change periods.

      Many thanks for your direction. I will be very very grateful for your help.
      Sincerely,
      Sumedha.

      Comment


      • #4
        But within the treated and control groups there are individuals who are likely to have their own fixed effects, which are not accounted for by the DiD estimates. So is xtreg, rather than the reg specification, the right estimation to account for individual fixed effects?
        In general, yes. Fixed-effects regression will eliminate confounding (omitted-variable) bias from any attributes that are constant within individuals over time.

        But in the data I am using, I have 19 pre-policy-change periods and 15 post-policy-change periods.
        I'm not sure what you mean by this. Perhaps a small representative sample of your data, posted using the -dataex- command (-ssc install dataex-, -help dataex- if you do not have it) will make it clearer.

        As a general remark, it is not a good idea to cite abbreviated references such as Card and Krueger 1994. That paper or book, or whatever it is, may be folklore in your discipline. But I don't think I've ever heard of it. (I have heard the names Card and Krueger--but only in posts on Statalist). This is a multidisciplinary forum, so if you need to cite a reference, it needs to be complete enough that somebody who has no idea what it is, but has library access, can get it.

        In fact, since we're already on the topics of posting references and posting data examples, please take the time to read the entire FAQ. It has excellent advice on how to make the most of Statalist, including more details on those two topics.

        Comment


        • #5
          Dear Prof. Schechter,

          Here is the dataex example:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(prescriberid treated month post did y)
          1 1 1 1 1 2
          1 1 2 1 1 3
          1 1 3 1 1 2
          1 1 4 1 1 1
          1 1 5 1 1 0
          2 1 1 1 1 2
          2 1 2 1 1 2
          2 1 3 1 1 2
          2 1 4 1 1 0
          2 1 5 1 1 1
          3 0 1 1 0 3
          3 0 2 1 0 2
          3 0 3 1 0 2
          3 0 4 1 0 3
          3 0 5 1 0 2
          4 0 1 1 0 2
          4 0 2 1 0 3
          4 0 3 1 0 1
          4 0 4 1 0 3
          4 0 5 1 0 2
          end
          I want to estimate a difference-in-differences model of the impact of treatment on y. I have 5 time periods (the variable month), with months 1, 2, and 3 being pre-treatment and months 4 and 5 being post-treatment. I have 4 different prescribers. I want to include prescriber fixed effects, and I want to test graphically and statistically whether the trend in y was the same for the treated and non-treated prescribers.

          My basic difference-in-differences regression is as follows:

          reg y did treated month, vce(cluster prescriberid)
          note: treated omitted because of collinearity

          Linear regression                               Number of obs =      20
                                                          F(  2,     3) =   24.83
                                                          Prob > F      =  0.0136
                                                          R-squared     =  0.3940
                                                          Root MSE      =  .75049

                                  (Std. Err. adjusted for 4 clusters in prescriberid)
          ------------------------------------------------------------------------------
                       |               Robust
                     y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   did |        -.8   .1220736    -6.55   0.007    -1.188493   -.4115074
               treated |          0  (omitted)
                 month |      -.275   .1455635    -1.89   0.155    -.7382479    .1882479
                 _cons |      3.125   .4575216     6.83   0.006     1.668962    4.581038
          ------------------------------------------------------------------------------

          Can you please advise me on how I might test, graphically and statistically, whether the trend in y was the same for the treated and non-treated prescribers in the pre-treatment period?
          Sincerely,
          Sumedha.

          Comment


          • #6
            First, there is a problem with your data. The variable post is set to 1 in every observation. It should be 1 in those observations where month = 4 or 5, and 0 where month = 1, 2, or 3. So let's fix that first.

            Code:
            clear
            input float(prescriberid treated month y)
            1 1 1 2
            1 1 2 3
            1 1 3 2
            1 1 4 1
            1 1 5 0
            2 1 1 2
            2 1 2 2
            2 1 3 2
            2 1 4 0
            2 1 5 1
            3 0 1 3
            3 0 2 2
            3 0 3 2
            3 0 4 3
            3 0 5 2
            4 0 1 2
            4 0 2 3
            4 0 3 1
            4 0 4 3
            4 0 5 2
            end
            
            gen post = (month > 3)
            
            xtset prescriberid month
            Note that I do not calculate a did variable. It is not needed because we will use factor-variable notation.

            The basic diff-in-diff analysis is then:

            Code:
            xtreg y i.treated##i.post, fe
            
            margins treated#post, noestimcheck
            marginsplot, xdimension(post)
            The coefficient for treated#post is, of course, your D-I-D estimator of the treatment effect.

            Now, you may also want to make some adjustments for time. If you think that there is a time-trend in the data, this might be done by simply adding month as a continuous covariate.
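
            That adjustment would look something like this (a sketch; c.month enters as a single linear time trend common to both groups):

            Code:
            xtreg y i.treated##i.post c.month, fe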

            I often see on this Forum situations like this where the person posting wants to adjust for time on a month-by-month basis. This is quite problematic. Recall that treated is constant within prescriberid, so you cannot get a coefficient for treated in the fixed-effects regression. This isn't a problem, because the coefficient of treated, if you had it, would just represent the difference between the treated and untreated groups in the pre-treatment period. That figure is not of great importance, and it is absorbed in the prescriberid fixed effects anyway.

            If you add i.month to the predictors, post is now constant within month. So, due to this multicollinearity, something additional gets omitted. If you put i.month before i.treated##i.post, the post term will be omitted due to collinearity. If you put it after i.treated##i.post, you will lose not the usual one, but two indicators for month. So either you lose the estimation of the change from pre to post in the untreated group (the coefficient of post), or you have only an incomplete adjustment for monthly shocks. You can't have both in a fixed-effects model.
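
            To see that omission behavior concretely, you can compare the two orderings (a sketch; Stata's output will show which terms it omits because of collinearity):

            Code:
            * i.month listed first: the post term is omitted
            xtreg y i.month i.treated##i.post, fe
            * i.month listed after the interaction: two month indicators are dropped instead
            xtreg y i.treated##i.post i.month, fe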

            So you might want to explore your data graphically by doing things like plotting the mean value of y in each group as a monthly time series. If the plot seems to show no time effects of any substance, then it is best to leave time out. If there appears to be a (more or less) monotone trend in y over time, then adding month as a continuous variable would be warranted. If there are one or two particular months that stand out as shocks, then indicators for just those months might be sensible additions to the model.
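
            A minimal sketch of that kind of exploratory plot, using the variable names from the example above:

            Code:
            preserve
            collapse (mean) y, by(treated month)
            twoway (connected y month if treated==0) (connected y month if treated==1), ///
                legend(order(1 "Control" 2 "Treated")) ytitle("Mean of y") xtitle("Month")
            restore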

            Comment


            • #7
              Thinking about your original post, given that your real data have 19 months of data before the policy change and 15 months after, there is another approach that would make sense (but would not make sense with only five months of data, as in the example). With long-term analyses like this, one often wants to look at linear time trends before and after. The expectation is that both groups will exhibit parallel trends (the same slope) before the policy change and diverge thereafter (different slopes). One nice way to model that is to create a linear spline for time with a knot at the time of the policy change. So let's say, for example, that the policy change took place in January 2012. You could do something like this:

              Code:
              mkspline pre `=tm(2012m1)'  post = month
              xtset prescriberid
              xtreg y i.treated##(c.pre c.post), fe
              margins treated, dydx(pre post)
              The output from margins will give you the slope of the y vs month regression lines in each group of prescribers both before and after the policy change, and you can compare those.

              In addition, if there are particular months that encompassed other events that might be characterized as "shocks" to the system, you can also include indicator variables for those particular months to adjust for those shocks.
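
              For example (a sketch only; the two shock months below are hypothetical, chosen just to illustrate the syntax, and month is assumed to be a Stata monthly date as in the mkspline line above):

              Code:
              * hypothetical shock months, for illustration only
              gen byte shock1 = (month == tm(2011m6))
              gen byte shock2 = (month == tm(2012m9))
              xtreg y i.treated##(c.pre c.post) shock1 shock2, fe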

              Comment


              • #8
                Dear Prof. Schechter,

                You advised me a while ago regarding a difference-in-differences regression (above). Here is the dataex example again:


                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(prescriberid treated month y) float post
                1 1 1 2 0
                1 1 2 3 0
                1 1 3 2 0
                1 1 4 1 1
                1 1 5 0 1
                2 1 1 2 0
                2 1 2 2 0
                2 1 3 2 0
                2 1 4 0 1
                2 1 5 1 1
                3 0 1 3 0
                3 0 2 2 0
                3 0 3 2 0
                3 0 4 3 1
                3 0 5 2 1
                4 0 1 2 0
                4 0 2 3 0
                4 0 3 1 0
                4 0 4 3 1
                4 0 5 2 1
                end

                Could you please also help me figure out how I could run a quantile difference-in-differences regression? There is a user-written command that apparently does it, but it's a bit of a black box that doesn't explain much.

                So far I have run the following:

                reg y i.treated##i.post

                Now I want to do the analysis by different quantiles of y.

                I will really appreciate your help.
                Sincerely,
                Sumedha.

                Comment


                • #9
                  I am not sure what you mean by a quantile difference-in-difference regression. There is a procedure for quantile regression, which estimates a quantile of the predicted distribution of the outcome variable, rather than the mean. The command for that in Stata is -qreg-. And you can run it with interaction terms just as you would any other Stata regression command. See -help qreg-. If that isn't what you have in mind, please provide more detail.
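
                  For instance, something along these lines (a sketch; it just reuses the factor-variable specification you already ran with -reg-, at a few different quantiles):

                  Code:
                  * DiD interaction at different conditional quantiles of y
                  qreg y i.treated##i.post, quantile(0.25)
                  qreg y i.treated##i.post, quantile(0.50)
                  qreg y i.treated##i.post, quantile(0.75)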

                  Comment


                  • #10
                    One of the reasons you're getting different answers is that the commands are using slightly different datasets. Take a look at the example here, which will make the sample identical.
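
                    One standard way to force the commands onto an identical estimation sample (a sketch, not the linked example; it assumes the discrepancy comes from observations that one command drops, e.g. because of missing values, and that -diff- accepts an if qualifier as most estimation commands do) is to mark the sample with e(sample):

                    Code:
                    * run the regression, mark its estimation sample, then feed the
                    * same observations to -diff-
                    reg fte i.treated##i.t, vce(robust)
                    gen byte insample = e(sample)
                    diff fte if insample, t(treated) p(t) robust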

                    Comment


                    • #11
                      Just in case someone who finds this thread needs it: the -diff- command has been updated since #10 and now allows for quantile DiD and different SE estimators.
                      Best Regards,

                      Pedro
                      (StataMP 16 user)

                      Comment


                      • #12
                        Dear Pedro, thanks for this information. Have you tried the quantile option together with the robust option in the -diff- command (ssc install diff)? It does not work in the following case.
                        Code:
                        use "cardkrueger1994.dta", clear
                        // linear
                        diff fte, t(treated) p(t) robust
                        // quantile (median): using qreg directly (OK)
                        qreg fte i.treated##i.t, q(0.5) vce(robust)
                        // quantile (median): using diff indirectly (not OK)
                        diff fte, t(treated) p(t) qdid(0.5) robust
                        The results are:
                        Code:
                        . use "cardkrueger1994.dta", clear
                        (Sample dataset from Card and Krueger (1994))
                        
                        . // linear
                        . diff fte, t(treated) p(t) robust
                        
                        DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
                        Number of observations in the DIFF-IN-DIFF: 780
                                    Before         After    
                           Control: 76             76          152
                           Treated: 314            314         628
                                    390            390
                        --------------------------------------------------------
                         Outcome var.   | fte     | S. Err. |   |t|   |  P>|t|
                        ----------------+---------+---------+---------+---------
                        Before          |         |         |         |
                           Control      | 20.013  |         |         |
                           Treated      | 17.069  |         |         |
                           Diff (T-C)   | -2.944  | 1.440   | -2.04   | 0.041**
                        After           |         |         |         |
                           Control      | 17.523  |         |         |
                           Treated      | 17.518  |         |         |
                           Diff (T-C)   | -0.005  | 1.037   | 0.00    | 0.996
                                        |         |         |         |
                        Diff-in-Diff    | 2.939   | 1.774   | 1.66    | 0.098*
                        --------------------------------------------------------
                        R-square:    0.01
                        * Means and Standard Errors are estimated by linear regression
                        **Robust Std. Errors
                        **Inference: *** p<0.01; ** p<0.05; * p<0.1
                        
                        . // quantile (median): using qreg directly (OK)
                        . qreg fte i.treated##i.t, q(0.5) vce(robust)
                        Iteration  1:  WLS sum of weighted deviations =   2629.719
                        
                        Iteration  1: sum of abs. weighted deviations =   2631.375
                        note:  alternate solutions exist
                        Iteration  2: sum of abs. weighted deviations =   2622.375
                        note:  alternate solutions exist
                        Iteration  3: sum of abs. weighted deviations =   2606.375
                        note:  alternate solutions exist
                        Iteration  4: sum of abs. weighted deviations =   2604.375
                        note:  alternate solutions exist
                        Iteration  5: sum of abs. weighted deviations =   2603.875
                        
                        Median regression                                   Number of obs =        780
                          Raw sum of deviations 2611.125 (about 16.5)
                          Min sum of deviations 2603.875                    Pseudo R2     =     0.0028
                        
                        ------------------------------------------------------------------------------
                                     |               Robust
                                 fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                             treated |
                                 NJ  |         -1    1.30912    -0.76   0.445    -3.569837    1.569837
                                 1.t |         -1   1.729112    -0.58   0.563    -4.394291    2.394291
                                     |
                           treated#t |
                               NJ#1  |          2   1.889586     1.06   0.290    -1.709306    5.709306
                                     |
                               _cons |         17   1.222667    13.90   0.000     14.59987    19.40013
                        ------------------------------------------------------------------------------
                        
                        . // quantile (median): using diff indirectly (not OK)
                        . diff fte, t(treated) p(t) qdid(0.5) robust
                        option robust is not allowed
                        r(198);
                        
                        end of do-file
                        
                        r(198);
                        I used vce(robust) instead of robust, but that does not work either. Do you have any idea about this?

                        By the way, the version of -diff- is
                        c:\ado\plus\d\diff.ado
                        *! 5.0.2 Jul2018
                        Last edited by River Huang; 25 Aug 2019, 19:16.
                        Ho-Chuan (River) Huang
                        Stata 19.0, MP(4)

                        Comment


                        • #13
                          Hi River,
                          The help file explicitly indicates that
                          qdid does not support weights nor robust standard errors.
                          I can't tell how to overcome this within -diff- itself.
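
                          A possible alternative outside -diff- (just a sketch, not from the -diff- documentation; it assumes you only need bootstrap or cluster-bootstrap standard errors for the median DiD, and that id identifies the stores) would be to run the quantile regression directly:

                          Code:
                          * bootstrap SEs for the median DiD, estimated directly with qreg
                          bsqreg fte i.treated##i.t, quantile(0.5) reps(500)
                          * or resample whole stores with a clustered bootstrap
                          bootstrap, reps(500) cluster(id): qreg fte i.treated##i.t, quantile(0.5)
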
                          Best Regards,

                          Pedro
                          (StataMP 16 user)

                          Comment
