Parallel Trend Assumption in Difference-in-Difference Estimation

Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#1

28 Jan 2022, 10:02

Hi,
I am working on a DID model; the data set is shown below. The model consists of a treatment area and a control area for the years 2009 to 2019. There are five independent variables (loggrdp, logpopden, logvkt, cng, petrol); all other variables are dependent variables, except Diff, treat, and post. The treatment occurred in 2016. treat indicates the treatment area, post indicates the treatment period, and Diff is the dummy variable for treatment. To check the parallel trend assumption graphically and statistically, I need some guidance on the Stata commands.

Last edited by Sakib Nazmus; 28 Jan 2022, 10:04.
Tags: difference-in-difference, parallel trends
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

28 Jan 2022, 11:01

Please read the FAQ which discusses how to ask a question.

I can't help you until I have a reproducible example right here on screen, one with context. I can't even tell if you have panel data or a unique identifier, and I also don't understand why your years are sorted the way they are. I'm not saying this to be mean; I'm saying this because I can't help you with decontextualized screenshots, or until you've followed the FAQ instructions. This is an important question for your research, one I could give many thoughts on, but not with the question as it's presently formatted.

If nobody's told you yet, welcome to Statalist.
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#3

28 Jan 2022, 11:40

Sorry for the incomplete info I shared. This is my first time posting here. This is panel data where DHK is the treatment area and CTG is the control area. My data set is uploaded as an Excel file: STATA_DATASET.xlsx
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#4

28 Jan 2022, 16:02

You need to upload your data using the dataex command.

Sakib Nazmus

Join Date: Jan 2022
Posts: 19

29 Jan 2022, 03:10

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 area int year float(loggrdp logvkt logpopden cng petrol tti co2capita co2ac logdt logcost logco2 logcostac) byte(treat post diff)
"DHK" 2019 5.08289 4.89612 4.10358 .516 1.12 3.97927  .35639 .850572 3.26923  3.5905 3.85907 2.66114 1 1 1
"DHK" 2018 5.03987 4.88075 4.08819  .48 1.12 3.68113   .2818 .672554 3.14392 3.46519 3.74172  2.5512 1 1 1
"DHK" 2017  4.9995  4.8653 4.07275  .48 1.12   3.383 .229172  .54695 3.03242  3.3537 3.63649 2.45515 1 1 1
"DHK" 2016 4.94728 4.84986 4.05731  .42 1.12 3.08486 .189826 .453046 2.93002 3.25129 3.53924 2.36819 1 1 1
"DHK" 2015 4.89229 4.83441 4.04186  .42  1.3 2.78673 .158951 .379358 2.83308 3.15435 3.44671  2.2867 1 0 0
"DHK" 2014 4.83983 4.81897 4.02641 .201  1.3 2.68755 .142177 .339324 2.77097 3.09224 3.38283 2.24002 1 0 0
"DHK" 2013 4.77812 4.80352 4.01098 .201 1.15 2.58836 .127138 .303432 2.70863  3.0299 3.31883 2.19313 1 0 0
"DHK" 2012 4.72709 4.78808 3.99552 .201 1.15 2.48918 .113573 .271058 2.64576 2.96703 3.25438 2.14571 1 0 0
"DHK" 2011 4.71144 4.77263 3.98009 .201 1.09    2.39 .101267 .241686 2.58201 2.90328 3.18913  2.0974 1 0 0
"DHK" 2010 4.66381 4.75719 3.96466 .201 1.09   2.331 .091581 .218571 2.52446 2.84573 3.13002 2.05529 1 0 0
"DHK" 2009  4.6127 4.74174  3.9492 .201 1.17   2.272 .082753 .197501 2.46652 2.78779 3.07056 2.01281 1 0 0
"CTG" 2019 4.56001 3.90558 3.63007 .516 1.12   3.255 .351007 .482152 2.57161 2.88212 3.23684 2.32847 0 1 0
"CTG" 2018   4.517 3.89674 3.62123  .48 1.12 3.09642 .325438  .44703 2.53137 2.84188 3.19515 2.29706 0 1 0
"CTG" 2017 4.47662 3.88809 3.61258  .48 1.12 2.93784 .300237 .412414 2.48916 2.79967  3.1515 2.26351 0 1 0
"CTG" 2016  4.4244 3.87935 3.60385  .42 1.12 2.77925 .275274 .378123  2.4442 2.75472 3.10506 2.22728 0 1 0
"CTG" 2015 4.36941 3.87073 3.59522  .42  1.3 2.62067 .250414 .343975   2.396 2.70651 3.05533 2.18771 0 0 0
"CTG" 2014 4.31695 3.86212 3.58661 .201  1.3 2.49246 .225925 .310336  2.3427 2.65321 3.00203 2.14301 0 0 0
"CTG" 2013 4.25524 3.85344 3.57793 .201 1.15 2.36425 .202279 .277856 2.28601 2.59652 2.94534   2.095 0 0 0
"CTG" 2012 4.20421 3.84479 3.56928 .201 1.15 2.23604 .179331 .246334 2.22506 2.53557 2.88439  2.0427 0 0 0
"CTG" 2011 4.18856 3.83617 3.56066 .201 1.09 2.10783 .156944 .215583 2.15852 2.46904 2.81786 1.98479 0 0 0
"CTG" 2010 4.14094 3.82747 3.55197 .201 1.09 1.97962 .134985 .185419 2.08437 2.39488  2.7437 1.91933 0 0 0
"CTG" 2009 4.08982 3.81882 3.54331 .201 1.17 1.85141 .113321  .15566 1.99974 2.31025 2.65908 1.84336 0 0 0
end


Jared Greathouse

Join Date: Sep 2021
Posts: 2170

30 Jan 2022, 07:14

Okay thanks, this is so much more helpful.

Okay I see what you've got here. The issue is, you're doing a 2 by many DD study here... that is, you've only one treated unit and one comparison unit. Not bad, but not good either.

The reality of the matter is, you can't do very much here. Unless I'm mistaken, which is perfectly possible, the most you can rely on is a graphical validation of parallel trends. So I'll just do an example for CO2 per capita.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 area int year float(loggrdp logvkt logpopden cng petrol tti co2capita co2ac logdt logcost logco2 logcostac) byte(treat post diff)
"DHK" 2019 5.08289 4.89612 4.10358 .516 1.12 3.97927  .35639 .850572 3.26923  3.5905 3.85907 2.66114 1 1 1
"DHK" 2018 5.03987 4.88075 4.08819  .48 1.12 3.68113   .2818 .672554 3.14392 3.46519 3.74172  2.5512 1 1 1
"DHK" 2017  4.9995  4.8653 4.07275  .48 1.12   3.383 .229172  .54695 3.03242  3.3537 3.63649 2.45515 1 1 1
"DHK" 2016 4.94728 4.84986 4.05731  .42 1.12 3.08486 .189826 .453046 2.93002 3.25129 3.53924 2.36819 1 1 1
"DHK" 2015 4.89229 4.83441 4.04186  .42  1.3 2.78673 .158951 .379358 2.83308 3.15435 3.44671  2.2867 1 0 0
"DHK" 2014 4.83983 4.81897 4.02641 .201  1.3 2.68755 .142177 .339324 2.77097 3.09224 3.38283 2.24002 1 0 0
"DHK" 2013 4.77812 4.80352 4.01098 .201 1.15 2.58836 .127138 .303432 2.70863  3.0299 3.31883 2.19313 1 0 0
"DHK" 2012 4.72709 4.78808 3.99552 .201 1.15 2.48918 .113573 .271058 2.64576 2.96703 3.25438 2.14571 1 0 0
"DHK" 2011 4.71144 4.77263 3.98009 .201 1.09    2.39 .101267 .241686 2.58201 2.90328 3.18913  2.0974 1 0 0
"DHK" 2010 4.66381 4.75719 3.96466 .201 1.09   2.331 .091581 .218571 2.52446 2.84573 3.13002 2.05529 1 0 0
"DHK" 2009  4.6127 4.74174  3.9492 .201 1.17   2.272 .082753 .197501 2.46652 2.78779 3.07056 2.01281 1 0 0
"CTG" 2019 4.56001 3.90558 3.63007 .516 1.12   3.255 .351007 .482152 2.57161 2.88212 3.23684 2.32847 0 1 0
"CTG" 2018   4.517 3.89674 3.62123  .48 1.12 3.09642 .325438  .44703 2.53137 2.84188 3.19515 2.29706 0 1 0
"CTG" 2017 4.47662 3.88809 3.61258  .48 1.12 2.93784 .300237 .412414 2.48916 2.79967  3.1515 2.26351 0 1 0
"CTG" 2016  4.4244 3.87935 3.60385  .42 1.12 2.77925 .275274 .378123  2.4442 2.75472 3.10506 2.22728 0 1 0
"CTG" 2015 4.36941 3.87073 3.59522  .42  1.3 2.62067 .250414 .343975   2.396 2.70651 3.05533 2.18771 0 0 0
"CTG" 2014 4.31695 3.86212 3.58661 .201  1.3 2.49246 .225925 .310336  2.3427 2.65321 3.00203 2.14301 0 0 0
"CTG" 2013 4.25524 3.85344 3.57793 .201 1.15 2.36425 .202279 .277856 2.28601 2.59652 2.94534   2.095 0 0 0
"CTG" 2012 4.20421 3.84479 3.56928 .201 1.15 2.23604 .179331 .246334 2.22506 2.53557 2.88439  2.0427 0 0 0
"CTG" 2011 4.18856 3.83617 3.56066 .201 1.09 2.10783 .156944 .215583 2.15852 2.46904 2.81786 1.98479 0 0 0
"CTG" 2010 4.14094 3.82747 3.55197 .201 1.09 1.97962 .134985 .185419 2.08437 2.39488  2.7437 1.91933 0 0 0
"CTG" 2009 4.08982 3.81882 3.54331 .201 1.17 1.85141 .113321  .15566 1.99974 2.31025 2.65908 1.84336 0 0 0
end

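* Numeric panel identifier (alphabetical order: CTG = 1, DHK = 2)
egen id = group(area), label

* Declare the panel structure with yearly time units
xtset id year, yearly

graph close _all
graph drop _all

* Plot each outcome for both units, with a vertical line at 2016
foreach v of varlist co2capita logcost {
    twoway ///
        (line `v' year if id == 1, lcolor(pink)) /// Untreated
        (line `v' year if id == 2, lcolor(black) lwidth(medthick)), /// Treated
        legend(order(1 "Untreated" 2 "Treated")) name(`v') xline(2016)
}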

So, let's remind ourselves what basic PTA means: when we strip away the technical mathematics, PTA posits that the comparison unit is a good counterfactual for the treated unit, based on the idea that the units' trends would have continued in the same direction absent the intervention.

Looking at the graph of CO2, the untreated unit doesn't appear at all to be a good comparison unit for the treated one. The treated unit appears to have a quadratic pre-intervention trend, whereas the untreated unit appears to have a linear trend. This implies that the treated unit has a different data generating process than the untreated, and that it would not be a good comparison unit for the treated one. Similar comments might be made about logcost. The untreated unit's trend appears to be almost flattening, whereas the treated unit's trend is persistently rising. They diverge in different directions before the intervention, and that's not what we want.
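One way to make this visual comparison more direct, offered as a rough sketch rather than a formal test, is to plot the treated-minus-untreated gap by year. If the trends were parallel, the gap would stay roughly flat before 2016. The variable name gap and the graph name gap_co2capita are made up for illustration, and the code assumes the dataex data from this thread are in memory.

```stata
* Assumes the -dataex- data from this thread are in memory
preserve
keep area year co2capita

* One row per year, one column per area: co2capitaCTG and co2capitaDHK
reshape wide co2capita, i(year) j(area) string

* gap is a hypothetical name for the treated-minus-untreated difference
generate gap = co2capitaDHK - co2capitaCTG

* A flat pre-2016 gap would be consistent with parallel trends
twoway connected gap year, sort xline(2016) ///
    ytitle("DHK minus CTG, CO2 per capita") name(gap_co2capita, replace)
restore
```

The same lines can be repeated for logcost or any other outcome by swapping the variable name.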

Additionally, both outcomes' trends for the treated unit begin to rise in 2015, a year before the intervention took place. This implies anticipation, especially since it shows up in both outcomes. Or, put a little differently, it implies that there are dormant common factors which "wake up" in the year before the policy was passed; so how can we be sure it was the intervention that affected the outcomes and not something else? If you had more than one comparison unit, there would be other ways to mitigate this issue, like matching or synthetic controls, but in a two-unit setup, the most you could hope for aside from standard regression would be a simple two-unit interrupted time series approach.

My advice to you is this: I don't know what topic you're studying, but you have a few options. Either do the simple DD approach and acknowledge the parallel trends violations, or, if at all possible, get data on additional comparison units so that you can better justify PTA or get around the issue altogether with a simple synthetic control approach. Sakib Nazmus


Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#7

30 Jan 2022, 11:04

Thanks a lot! Actually, my treatment initially started in 2015, and I am assuming it had an impact from the beginning. I ran a DD analysis before this PTA check, and in it the treatment shows a significant positive result. The command I used is

Code:

reg tti Diff loggrdp logpopden logvkt cng petrol i.post, robust cluster(treat)
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#8

30 Jan 2022, 11:09

Okay, so I was incorrect about anticipation then. Alright. Your specification for the DD equation is wrong, however. What you want is to interact the post variable with the treated variable, so

Code:

reg tti i.post##i.treat loggrdp logpopden logvkt cng petrol, robust cluster(treat)

accomplishes this in Stata. You don't need to create your own DD coefficient, Stata's interaction terms do that for you.
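To see that the two routes give the same DD estimate, here is a minimal sketch that fits the factor-variable version and a hand-made product term side by side. It assumes the dataex data from this thread are in memory; did, b_fv, and b_hand are hypothetical names (the data already contain diff), and the covariates are left out only to keep the comparison short.

```stata
* Assumes the -dataex- data from this thread are in memory

* Factor-variable version: the DD estimate is the coefficient on 1.post#1.treat
regress tti i.post##i.treat
scalar b_fv = _b[1.post#1.treat]

* Hand-made version; did is a hypothetical name for the product term
generate byte did = post * treat
regress tti i.post i.treat did
scalar b_hand = _b[did]

* The two coefficients should agree up to rounding
display "factor-variable: " b_fv "   hand-made: " b_hand
```

The point of the comparison is that the hand-made term only matches when both main effects, post and treat, are also in the model, which is what the ## operator adds automatically.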
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#9

31 Jan 2022, 08:53

Could you show me the xtdidregress command for any one of the dependent variables in my DID model?
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#10

31 Jan 2022, 08:57

I don't understand the question. What do you mean?
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#11

31 Jan 2022, 09:08

I did the analysis with the reg command in Stata 16. Stata 17 has a command, xtdidregress, which I want to use for my analysis. It would be very helpful if you could guide me on this.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#12

02 Feb 2022, 07:42

I've never used xtdidregress, but the help file should be pretty explicit about how to use it.
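Going only by the documented syntax, a call for the tti outcome might look like the sketch below. Treat it as a starting point, not a checked specification: it assumes Stata 17 or later and the dataex data from this thread in memory, and it leaves out cng and petrol because their values are identical in both areas in every year, so they would be collinear with the time effects. With one treated and one comparison unit, the cluster-robust standard errors rest on only two clusters, so whatever inference it reports deserves skepticism.

```stata
* Assumes Stata 17 or later and the -dataex- data from this thread in memory

* Numeric panel identifier, created only if it does not exist yet
capture confirm variable id
if _rc {
    egen id = group(area), label
}
xtset id year

* Outcome and covariates in the first parentheses, treatment dummy in the second
xtdidregress (tti loggrdp logpopden logvkt) (diff), group(id) time(year)

* Postestimation: plot the group trends, then test for parallel pre-treatment trends
estat trendplots
estat ptrends
```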
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#13

02 Feb 2022, 08:27

Following your valuable advice and some further research, I realized I was missing something basic: my data set is panel data, but I was using a cross-sectional command for my analysis, namely

Code:

reg tti Diff loggrdp logpopden logvkt cng petrol i.post, robust cluster(treat)

Now I think the command for panel data should be:

Code:

gen diff = treat*post
xtset treat year
xtreg tti diff loggrdp logpopden logvkt cng petrol i.post, robust cluster(treat)

I want to apply time fixed effects here, which is why I used i.post, and my independent variables are correlated, so I added the robust cluster(treat) part.

Let me know whether my command is right or wrong for my analysis.
Thank you. Jared Greathouse

Last edited by Sakib Nazmus; 02 Feb 2022, 08:32.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#14

03 Feb 2022, 01:43

Both of these would work for DD, as would many others. You'll need to read the help files to know if one of these is right for you; I can't make that choice.
