Help with Classical DID, Parallel trend plot

lal mohan kumar

Join Date: May 2019
Posts: 265

11 Apr 2024, 00:28

Dear All,
I would like to run a classical DID. Below is a sample dataset for illustration purposes.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 id int year float depvar byte treat int(indep_var1 indep_var2) float post
"a" 2005     50 0  66 133 0
"a" 2006     53 0  55 120 0
"a" 2007     53 0  49 241 0
"a" 2008     58 1  33 217 1
"a" 2009     65 1  21 143 1
"a" 2010     69 1  97  45 1
"a" 2011     73 1 160 196 1
"b" 2005     33 0 157 226 0
"b" 2006     33 0 230 188 0
"b" 2007     36 0  68 126 0
"b" 2008     39 1 152 217 1
"b" 2009     44 1 236 219 1
"b" 2010     46 1 196 216 1
"b" 2011     52 1 216 160 1
"c" 2005     35 0 133  93 0
"c" 2006     36 0  96 190 0
"c" 2007     38 0 231 177 0
"c" 2008     36 0  42 138 1
"c" 2009     33 0 208 236 1
"c" 2010     31 0 104 163 1
"c" 2011     31 0  26  82 1
"d" 2005     26 0 103 206 0
"d" 2006     27 0  66 155 0
"d" 2007     24 0  30  61 0
"d" 2008     24 0 234  52 1
"d" 2009     23 0 145 139 1
"d" 2010     22 0 180  32 1
"d" 2011     21 0 129  87 1
"e" 2005     66 0  38 131 0
"e" 2006     69 0 243 233 0
"e" 2007     66 0  85 211 0
"e" 2008     65 0 115  66 1
"e" 2009     64 0 213  78 1
"e" 2010     64 0 224 144 1
"e" 2011     64 0 142 237 1
"f" 2005     18 0 143  31 0
"f" 2006     18 0  64  31 0
"f" 2007     18 0 233 166 0
"f" 2008     23 1 223 158 1
"f" 2009     26 1 135  82 1
"f" 2010     29 1 171 150 1
"f" 2011     32 1 240  46 1
"g" 2005  98.82 0  42  37 0
"g" 2006 101.82 0  95  60 0
"g" 2007 101.82 0  74  82 0
"g" 2008 106.82 1  65  40 1
"g" 2009 113.82 1 100  55 1
"g" 2010 117.82 1  86  77 1
"g" 2011 121.82 1  77  44 1
"h" 2005  81.82 0  45  15 0
"h" 2006  81.82 0  91  59 0
"h" 2007  84.82 0  73  19 0
"h" 2008  87.82 1  89  71 1
"h" 2009  92.82 1  33  60 1
"h" 2010  94.82 1  54  48 1
"h" 2011 100.82 1  63  36 1
"i" 2005  83.82 0  32  16 0
"i" 2006  84.82 0  48  98 0
"i" 2007  86.82 0  10  46 0
"i" 2008  84.82 0  19  88 1
"i" 2009  81.82 0  61  27 1
"i" 2010  79.82 0  21  91 1
"i" 2011  79.82 0  48  38 1
"j" 2005  74.82 0  36  22 0
"j" 2006  75.82 0  28  53 0
"j" 2007  72.82 0  36  94 0
"j" 2008  72.82 0  32  51 1
"j" 2009  71.82 0 100  11 1
"j" 2010  70.82 0  89  82 1
"j" 2011  69.82 0  54  92 1
"k" 2005 114.82 0  23  64 0
"k" 2006 117.82 0  45  14 0
"k" 2007 114.82 0  94  23 0
"k" 2008 113.82 0  25  13 1
"k" 2009 112.82 0  39  91 1
"k" 2010 112.82 0  36  35 1
"k" 2011 112.82 0  74  18 1
"l" 2005  66.82 0  69  47 0
"l" 2006  66.82 0  24  38 0
"l" 2007  66.82 0  44  24 0
"l" 2008  71.82 1  57  35 1
"l" 2009  74.82 1  72  60 1
"l" 2010  77.82 1  36  23 1
"l" 2011  80.82 1  55  97 1
end


In the data above, treat is coded 1 for treated firms (a, b, f, g, h, l) and 0 otherwise. post is coded 1 for the treatment period, which runs from 2008 through 2011. Given this, I started with a parallel-trends plot of depvar and ran the following commands:

Code:

ssc install lgraph
lgraph depvar year, by( treat )

L_Graph.gph

I am not sure whether this graph makes sense or is even correct. I took the approach from a post by George Ford (https://www.statalist.org/forums/for...84#post1723184).

Next, I ran a DID with the code below, and here is what I got:

Code:

. encode id, gen (ID)

. xtset ID year

Panel variable: ID (strongly balanced)
 Time variable: year, 2005 to 2011
         Delta: 1 unit

. xtreg depvar i.post##i.treat i.year, fe vce (r)
note: 0b.post#1.treat identifies no observations in the sample.
note: 1.post#1.treat omitted because of collinearity.
note: 2011.year omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =         84
Group variable: ID                              Number of groups  =         12

R-squared:                                      Obs per group:
     Within  = 0.7678                                         min =          7
     Between = 0.0000                                         avg =        7.0
     Overall = 0.0119                                         max =          7

                                                F(5,11)           =          .
corr(u_i, Xb) = -0.0969                         Prob > F          =          .

                                    (Std. err. adjusted for 12 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
      depvar | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      1.post |   .0833333   1.292325     0.06   0.950    -2.761054    2.927721
     1.treat |   14.83333   .8870762    16.72   0.000     12.88089    16.78577
             |
  post#treat |
        0 1  |          0  (empty)
        1 1  |          0  (omitted)
             |
        year |
       2006  |   1.333333   .3929874     3.39   0.006     .4683738    2.198293
       2007  |   1.166667   .6146741     1.90   0.084     -.186222    2.519555
       2008  |  -4.666667   2.505364    -1.86   0.089    -10.18094    .8476017
       2009  |         -3   1.397337    -2.15   0.055    -6.075519    .0755189
       2010  |         -2   .7929615    -2.52   0.028    -3.745296   -.2547036
       2011  |          0  (omitted)
             |
       _cons |      62.41   .4615509   135.22   0.000     61.39413    63.42587
-------------+----------------------------------------------------------------
     sigma_u |  31.013364
     sigma_e |  2.7728405
         rho |  .99206962   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg depvar i.post##i.treat i.year, re
note: 0.post#1.treat identifies no observations in the sample.
note: 1.post#1.treat omitted because of collinearity.
note: 2011.year omitted because of collinearity.

Random-effects GLS regression                   Number of obs     =         84
Group variable: ID                              Number of groups  =         12

R-squared:                                      Obs per group:
     Within  = 0.7678                                         min =          7
     Between = 0.0000                                         avg =        7.0
     Overall = 0.0119                                         max =          7

                                                Wald chi2(7)      =     217.05
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
      depvar | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      1.post |   .0936147   1.279556     0.07   0.942    -2.414268    2.601498
     1.treat |   14.81277   1.215402    12.19   0.000     12.43063    17.19492
             |
  post#treat |
        0 1  |          0  (empty)
        1 1  |          0  (omitted)
             |
        year |
       2006  |   1.333333   1.126038     1.18   0.236    -.8736607    3.540327
       2007  |   1.166667   1.126038     1.04   0.300    -1.040327    3.373661
       2008  |  -4.666667   1.126038    -4.14   0.000    -6.873661   -2.459673
       2009  |         -3   1.126038    -2.66   0.008    -5.206994   -.7930059
       2010  |         -2   1.126038    -1.78   0.076    -4.206994    .2069941
       2011  |          0  (omitted)
             |
       _cons |      62.41   9.277165     6.73   0.000     44.22709    80.59291
-------------+----------------------------------------------------------------
     sigma_u |  32.188195
     sigma_e |  2.7728405
         rho |  .99263376   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.
What is happening here, and why is my DID interaction term omitted? What is wrong with the data, and how should I proceed? I would appreciate help with both the plot and the coefficients.

Last edited by lal mohan kumar; 11 Apr 2024, 00:33.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17606
#2

11 Apr 2024, 01:10

Lal:
why not start with -xtdidregress-?

Kind regards,
Carlo
(StataNow 18.5)
Comment
lal mohan kumar

Join Date: May 2019

Posts: 265
#3

11 Apr 2024, 01:36

Dear Carlo Lazzaro, thank you very much for the swift response. I am learning -xtdidregress-, as I haven't used it before. However, to make sure I understand the basics clearly, I would like to start with the trend plots that we usually see in articles, which I have not been able to reproduce. I tried to use

Code:

preserve
collapse (mean) depvar, by(treat year)
reshape wide depvar, i(year) j(treat)
graph twoway connect depvar* year if year < 2008
restore

which was suggested here (https://www.statalist.org/forums/for...18#post1601218), but that is not helping either, as the graph looks bizarre.
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17606

11 Apr 2024, 01:49

Lal:
as Stata warns you, the interaction terms are omitted because of no observations and perfect collinearity, respectively.
In addition, your graph does not mark the treatment year, so the pre-treatment period cannot be distinguished from the post-treatment period for the treated.
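Both notes can be traced to how treat is coded in the sample data: it equals 1 only for firms a, b, f, g, h, and l from 2008 onwards, so it is already the product of group and period. The cell with post = 0 and treat = 1 is therefore empty, and 1.post#1.treat coincides with 1.treat. A minimal sketch of one way to set this up, assuming the sample data from the top of the thread are in memory with ID encoded and xtset as shown there; treated_group is a hypothetical variable name:

```stata
* Time-invariant group indicator: 1 for firms that are ever treated
* (treated_group is a hypothetical name, not in the original data)
bysort id: egen byte treated_group = max(treat)

* The cell post == 0 & treat == 1 has no observations
tabulate post treat

* Trend plot by group, with a vertical line at the first treatment year
lgraph depvar year, by(treated_group) xline(2008)

* Two-way fixed-effects DID. Firm effects absorb treated_group and the
* year dummies absorb post, so only the interaction term is entered
xtreg depvar 1.post#1.treated_group i.year, fe vce(cluster ID)
```

Because post times treated_group reproduces treat in these data, the interaction coefficient should match the 1.treat coefficient in the original output, which is then the DID estimate.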
Again, provided that your dataset is correctly set up for DID, switching to -xtdidregress- would give you what you're after via the -estat trendplots- postestimation command:

Code:

. use https://www.stata-press.com/data/r18/parallelt
(Simulated data to test parallel-trends assumption)

. xtset id1

Panel variable: id1 (unbalanced)

. xtdidregress (y1 c.x1##c.x2) (treated1), group(id1) time(t1)

Treatment and time information

Time variable: t1
Control:       treated1 = 0
Treatment:     treated1 = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
         id1 |       102         98
-------------+---------------------
Time         |
     Minimum |         1          6
     Maximum |         1          6
-----------------------------------

Difference-in-differences regression                     Number of obs = 2,000
Data type: Longitudinal

                                             (Std. err. adjusted for 200 clusters in id1)
-----------------------------------------------------------------------------------------
                        |               Robust
                     y1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
------------------------+----------------------------------------------------------------
ATET                    |
               treated1 |
(Treated vs Untreated)  |   .5069426   .0220218    23.02   0.000     .4635166    .5503686
-----------------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates, panel effects, and time effects.

. estat trendplots

.
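The same workflow can be sketched for the sample data at the top of the thread, assuming those data are in memory with ID encoded and xtset as in the original post. Because treat there is 1 only for treated firms in treated years, it can serve directly as the treatment variable:

```stata
* treat is 1 for treated observations only (treated firm AND post period),
* which is the coding -xtdidregress- expects for its treatment variable
xtdidregress (depvar) (treat), group(ID) time(year)

* Observed means and the linear-trends model for the two groups
estat trendplots
```

With only 12 firms, the cluster-robust standard errors rest on very few clusters, so inference from this sketch should be read with caution.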

Kind regards,
Carlo
(StataNow 18.5)

Comment

lal mohan kumar

Join Date: May 2019

Posts: 265
#5

11 Apr 2024, 02:15

Dear Carlo Lazzaro, thanks for the sample data and the illustration (I didn't know about them). However, I am unable to understand the data, as they are not in the typical long panel form. Do you know of any sample panel dataset that works for classical DID? I hope I am not troubling you, but a panel dataset amenable to a classical DID illustration would be extremely helpful.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17606
#6

11 Apr 2024, 02:19

Lal:
sorry, no.
But for DID guidance predating -didregress-, see PowerPoint Presentation (princeton.edu).

Kind regards,
Carlo
(StataNow 18.5)
1 like
Comment

lal mohan kumar

Join Date: May 2019
Posts: 265

11 Apr 2024, 11:08

Dear Carlo Lazzaro, thanks for the excellent reference. I tried to learn from it and from the Stata forum, and I have used the following commands:

Code:

use "http://www.princeton.edu/~otorres/WDI.dta", clear
* Fake event X happens in 2009 affecting all countries
* Creating the before/after dummy variable: 0 = before, 1 =after
gen after = (year >= 2009) if !missing(year)
merge m:1 country using"http://www.princeton.edu/~otorres/Treated.dta",gen(merge1)
*The untreated units will have a missing value (".")
replace treated = 0 if treated ==.
gen did = after * treated
encode country, gen(country1)
xtset country1 year 

*Plotting for Parallel trend
lgraph gdppc year, by( treated ) xline(2009)
Graph_Way1.gph

xtreg gdppc did imports labor i.year , fe vce(cluster country1)

Fixed-effects (within) regression               Number of obs     =      2,772
Group variable: country1                        Number of groups  =        126

R-squared:                                      Obs per group:
     Within  = 0.3057                                         min =         22
     Between = 0.0925                                         avg =       22.0
     Overall = 0.0972                                         max =         22

                                                F(24,125)         =       9.91
corr(u_i, Xb) = -0.0849                         Prob > F          =     0.0000

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
       gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         did |   1083.795   553.1513     1.96   0.052    -10.96023     2178.55
     imports |   1.62e-08   5.42e-09     2.98   0.003     5.44e-09    2.69e-08
       labor |  -.0001776   .0000453    -3.92   0.000    -.0002673   -.0000878
             |
        year |
       2001  |   192.3805   33.94936     5.67   0.000     125.1905    259.5705
       2002  |   383.6992   67.48982     5.69   0.000     250.1285    517.2699
       2003  |   596.9964   96.40343     6.19   0.000     406.2021    787.7908
       2004  |   1010.178   171.1637     5.90   0.000      671.424    1348.933
       2005  |   1317.479   210.0023     6.27   0.000      901.858    1733.099
       2006  |    1720.05   274.9326     6.26   0.000     1175.925    2264.176
       2007  |   2172.455   358.9161     6.05   0.000     1462.115    2882.795
       2008  |   2208.525   364.8852     6.05   0.000     1486.372    2930.678
       2009  |   1311.352   307.3992     4.27   0.000      702.971    1919.734
       2010  |   1563.268   352.8561     4.43   0.000     864.9218    2261.614
       2011  |   1798.775   419.9763     4.28   0.000     967.5901     2629.96
       2012  |   1915.791   456.5907     4.20   0.000     1012.142    2819.441
       2013  |   2084.955   512.9638     4.06   0.000     1069.736    3100.174
       2014  |    2234.28    499.513     4.47   0.000     1245.682    3222.878
       2015  |   2345.149   409.9833     5.72   0.000     1533.741    3156.557
       2016  |   2555.844   428.4298     5.97   0.000     1707.928     3403.76
       2017  |    2841.42   472.1121     6.02   0.000     1907.051    3775.788
       2018  |   3100.052   508.6039     6.10   0.000     2093.462    4106.642
       2019  |   3284.786   513.2284     6.40   0.000     2269.043    4300.529
       2020  |   2330.943   476.1929     4.89   0.000     1388.498    3273.387
       2021  |     3034.1   517.2743     5.87   0.000      2010.35     4057.85
             |
       _cons |   13832.09   529.9392    26.10   0.000     12783.28    14880.91
-------------+----------------------------------------------------------------
     sigma_u |  18555.692
     sigma_e |  2562.3242
         rho |  .98128842   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. 


. xtdidregress (gdppc imports labor) (did), group(country1) time(year)

Number of groups and treatment time

Time variable: year
Control:       did = 0
Treatment:     did = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
    country1 |        58         68
-------------+---------------------
Time         |
     Minimum |      2000       2009
     Maximum |      2000       2009
-----------------------------------

Difference-in-differences regression                     Number of obs = 2,772
Data type: Longitudinal

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
       gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
         did |
   (1 vs 0)  |   1083.795   553.1513     1.96   0.052    -10.96023     2178.55
------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates, panel effects, and time effects.

Code:

estat trendplots, ytitle(GDP pc)

Graph_Estat plot.gph
.
On inspection, it seems to me that the parallel-trends plot from the first approach (lgraph) and the one from estat trendplots are the same. Is that true? Also, how should the estat graph be interpreted?

Attached Files

Graph_Estat plot.gph (19.1 KB, 2 views)

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17606
#8

11 Apr 2024, 11:39

Lal:
yes, it seems so.
You can check whether the parallel-trends hypothesis is supported after -didregress- or -xtdidregress- via:

Code:

estat ptrends

The null of the test is that the trends are parallel in the pretreatment period, as seems to be the case when visually inspecting your graphs.
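A minimal sketch of the checks available after estimation, assuming the WDI data prepared earlier in the thread are in memory. -estat ptrends- tests whether the linear pre-treatment trends of the two groups are parallel, and -estat granger- tests for effects in anticipation of treatment. Failing to reject either null is consistent with the identifying assumptions but does not prove them.

```stata
* Re-estimate quietly so the postestimation commands have results to use
quietly xtdidregress (gdppc imports labor) (did), group(country1) time(year)

* H0: linear trends are parallel in the pre-treatment period
estat ptrends

* H0: no effect in anticipation of treatment
estat granger
```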

Kind regards,
Carlo
(StataNow 18.5)
1 like
Comment
lal mohan kumar

Join Date: May 2019

Posts: 265
#9

11 Apr 2024, 12:08

Dear Carlo Lazzaro, thank you once again, also for -estat ptrends-. However, there is a caveat: these estat commands seem to work only for a balanced panel. For instance, if we remove one post-treatment observation from the same dataset (I removed the Albania observation for 2009), Stata reports: treatment assignment times vary; not allowed with estat ptrend. In such cases we have to use lgraph, as estat trendplots won't work.
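The message suggests that the restriction concerns treatment timing rather than balance as such: with the 2009 observation for Albania dropped, did first equals 1 for Albania in 2010, so treatment no longer starts in the same year for every treated unit. A minimal sketch of a hand-built plot of group means that does not depend on -xtdidregress- or on user-written commands, assuming the WDI data prepared earlier in the thread are in memory and treated is constant within country:

```stata
* Run from a do-file: /// continues a command onto the next line
preserve

* One row per group and year, holding the mean outcome
collapse (mean) gdppc, by(treated year)

* Control and treated means over time, with a line at the event year
twoway (connected gdppc year if treated == 0) ///
       (connected gdppc year if treated == 1), ///
       xline(2009) ytitle("Mean GDP pc") ///
       legend(order(1 "Control" 2 "Treated"))

restore
```

Missing observations only change which countries enter each yearly mean, so the plot is still drawn when the panel is unbalanced.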
Comment
