Difference-in-difference

Felix Chappuis

Join Date: Mar 2021
Posts: 36


27 May 2021, 09:25

Hi Statalist friends

I want to run a diff-in-diff regression, but I always get the following error: Model is not identified. The treatment variable treated was omitted because of collinearity. How can I fix that?

Code:

 didregress (Arbeitslosenrate) (treated), group(code) time(numvar)

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float code int numvar float(Anteil_Betroffene Arbeitslosenrate) str2 anos2 str3 estu str2(nuts1 sexo) float(Anzahl_Betroffene treated post)
12 226 23.6 39.9 "05" "2" "4" "6" 89 1 0
12 230 23.6 42.2 "05" "2" "4" "6" 89 1 0
12 234 23.6 39.9 "05" "2" "4" "6" 89 1 0
12 238 23.6   42 "05" "2" "4" "6" 89 1 1
12 240 23.6 44.4 "05" "2" "4" "6" 89 1 1
49 226   .2 12.7 "05" "5" "2" "1"  1 0 0
49 230   .2  9.2 "05" "5" "2" "1"  1 0 0
49 234   .2    7 "05" "5" "2" "1"  1 0 0
49 238   .2  6.4 "05" "5" "2" "1"  1 0 1
49 240   .2  6.5 "05" "5" "2" "1"  1 0 1
end

I would be very thankful if someone could help me.

Last edited by Felix Chappuis; 27 May 2021, 10:14.

Tags: difference-in-difference, econometry, regression

Justin Niakamal

Join Date: Aug 2017
Posts: 757

27 May 2021, 09:45

Next time, please share your data using dataex and not as an attachment (see FAQ for detail). I don't have Stata 17 so I can't use didregress but you're probably looking to do something along the lines of

Code:

reg Arbeitslosenrate i.treated##i.post

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(3, 6)         =    243.73
       Model |  2795.41785         3   931.80595   Prob > F        =    0.0000
    Residual |  22.9383336         6  3.82305561   R-squared       =    0.9919
-------------+----------------------------------   Adj R-squared   =    0.9878
       Total |  2818.35618         9  313.150687   Root MSE        =    1.9553

------------------------------------------------------------------------------
Arbeitslos~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   1.treated |   31.03333   1.596466    19.44   0.000     27.12692    34.93975
      1.post |  -3.183333   1.784903    -1.78   0.125    -7.550834    1.184168
             |
treated#post |
        1 1  |   5.716666   2.524234     2.26   0.064    -.4599131    11.89325
             |
       _cons |   9.633333   1.128872     8.53   0.000     6.871083    12.39558
------------------------------------------------------------------------------
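As a cross-check on the interaction term, the same number can be recovered by hand from the four group-by-period means. This is only a sketch that assumes the ten-observation -dataex- example from the first post is in memory; it is not a substitute for the regression.

```stata
* Mean of the outcome in each treated-by-post cell (2x2 table)
tabulate treated post, summarize(Arbeitslosenrate) means

* DiD by hand from the posted example values:
* (treated post - treated pre) - (control post - control pre)
display ((42 + 44.4)/2 - (39.9 + 42.2 + 39.9)/3) - ((6.4 + 6.5)/2 - (12.7 + 9.2 + 7)/3)
```

The displayed value is about 5.7167, which matches the coefficient on the treated#post interaction above.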


Felix Chappuis

Join Date: Mar 2021

Posts: 36
#3

27 May 2021, 10:15

Thank you very much. Could someone explain how to do this in Stata 17 with the didregress command? I would be very thankful.

Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015
Posts: 213

27 May 2021, 12:54

Dear Felix,

The issue is that the variable -treated- indicates the treated group, whereas -didregress- requires that the variable in the second set of parentheses indicate which individual observations are treated. In your example this is equivalent to -1.treated*1.post-, which equals 1 only for observations of the treated group in the post-treatment periods.

Code:

generate indicate = 1.treated*1.post
didregress (Arbeitslosenrate) (indicate), group(code) time(numvar)

This will get you the point estimate you want. Notice, however, that the default standard errors are cluster-robust standard errors, clustered at the -code- level, and they are not well defined for the data you sent, which contain only two clusters. Perhaps you just sent a subset of your data. In any case, this is what I get:

Code:

. generate indicate = 1.treated*1.post

. didregress (Arbeitslosenrate) (indicate), group(code) time(numvar)

Number of groups and treatment time

Time variable: numvar
Control:       indicate = 0
Treatment:     indicate = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
        code |         1          1
-------------+---------------------
Time         |
     Minimum |       226        238
     Maximum |       226        238
-----------------------------------

Difference-in-differences regression                        Number of obs = 10
Data type: Repeated cross-sectional

                                   (Std. err. adjusted for 2 clusters in code)
------------------------------------------------------------------------------
             |               Robust
Arbeitslos~e | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
    indicate |
   (1 vs 0)  |   5.716666          .        .       .            .           .
------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.
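Before relying on the clustered standard errors, it may help to check how many clusters the data contain and when treatment switches on. The sketch below assumes the variables from the example above; -first_obs- is a hypothetical helper name introduced only for illustration.

```stata
* Flag one observation per value of code, then count the flags
egen byte first_obs = tag(code)
count if first_obs

* Cross-tabulate time against the treatment indicator to see treatment timing
tabulate numvar indicate
```

With the ten-observation example, the count is 2, which is why no standard error is reported.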


Felix Chappuis

Join Date: Mar 2021

Posts: 36
#5

28 May 2021, 03:30

Dear Enrique
Thank you very much. That was only a simple example. Here is the result with the full dataset. Is it correct to use the panel-data command here? Initially I had a separate cross-sectional dataset for every time period. Then I merged the datasets and collapsed them by groups with identical combinations of variable values. So my data show how individuals with given characteristics developed over time. Is it correct to treat my data as a panel? I would be very thankful for an answer. Have a great day.

Code:

. xtdidregress (Arbeitslosenrate) (indicate), group(code) time(numvar)

Number of groups and treatment time

Time variable: numvar
Control:       indicate = 0
Treatment:     indicate = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
        code |        79         50
-------------+---------------------
Time         |
     Minimum |       226        234
     Maximum |       240        234
-----------------------------------

Difference-in-differences regression                       Number of obs = 512
Data type: Longitudinal

                                 (Std. err. adjusted for 129 clusters in code)
------------------------------------------------------------------------------
             |               Robust
Arbeitslos~e | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
    indicate |
   (1 vs 0)  |  -1.116403    .791332    -1.41   0.161    -2.682189    .4493825
------------------------------------------------------------------------------
Note: ATET estimate adjusted for panel effects and time effects.
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 213
#6

28 May 2021, 06:29

Hi Felix,

You can use the new DID command for panel data sets or repeated cross-sections. From your description it is unclear to me if you have a panel data set or a repeated cross-section. In your example, it depends on the behavior of the variable code. For example, say -code- is a person and you have repeated observations of that person across time. You have a panel. Say -code- is like a country and every year you sample a different set of individuals from the country. You have a repeated cross section.
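One way to check the mechanical side of this distinction is to see whether each value of -code- appears at most once per time period. This is only a sketch using the variable names from the thread. Whether -code- is substantively a panel unit (the same cell of characteristics followed over time) remains a judgment about how the data were built.

```stata
* Report whether any code-by-numvar combination occurs more than once
duplicates report code numvar

* If every combination is unique, the data can be declared as a panel;
* xtset stops with an error when time values repeat within a panel
xtset code numvar

* Summarize the participation pattern of each code over time
xtdescribe
```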

In terms of estimation, the difference between -didregress- and -xtdidregress- is equivalent to the difference between using -regress- or -areg- and using -xtreg ..., fe-.
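The correspondence can be sketched with the older commands. The specifications below are an assumption about which regressions the comparison refers to (group and time effects plus the treatment indicator), using the variable names from this thread.

```stata
* Repeated cross-section form: group and time dummies, SEs clustered on code
regress Arbeitslosenrate indicate i.code i.numvar, vce(cluster code)

* Same model with the group dummies absorbed instead of estimated
areg Arbeitslosenrate indicate i.numvar, absorb(code) vce(cluster code)

* Panel form: fixed-effects (within) estimator with time dummies
xtset code numvar
xtreg Arbeitslosenrate indicate i.numvar, fe vce(cluster code)
```

The coefficient on -indicate- should agree across the three fits. The reported standard errors may differ somewhat because the commands apply different degrees-of-freedom adjustments.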
Felix Chappuis

Join Date: Mar 2021

Posts: 36
#7

28 May 2021, 06:37

Hi Enrique
Now, I understand. Thank you very much for your time.