Difference-in-Difference Estimation using the xtdidregress command for panel data

Sakib Nazmus

Join Date: Jan 2022
Posts: 19


13 Feb 2022, 07:35

Hi,
I have a dataset covering 11 years for 2 areas: one is the treatment area and the other is the control area. The policy change occurred in 2016. I want to estimate the effect of that change on the treatment area with DID, using -xtdidregress-.
It would be helpful if anyone could guide me on how to use -xtdidregress- with my dataset.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str9 area int year float(loggrdp logvkt logpopden cng petrol tti) byte(treat_dummy_var post_time_dummy)
"CONTROL"   2009 4.08982 3.81882 3.54331 .201 1.17 1.85141 0 0
"CONTROL"   2010 4.14094 3.82747 3.55197 .201 1.09 1.97962 0 0
"CONTROL"   2011 4.18856 3.83617 3.56066 .201 1.09 2.10783 0 0
"CONTROL"   2012 4.20421 3.84479 3.56928 .201 1.15 2.23604 0 0
"CONTROL"   2013 4.25524 3.85344 3.57793 .201 1.15 2.36425 0 0
"CONTROL"   2014 4.31695 3.86212 3.58661 .201  1.3 2.49246 0 0
"CONTROL"   2015 4.36941 3.87073 3.59522  .42  1.3 2.62067 0 0
"CONTROL"   2016  4.4244 3.87935 3.60385  .42 1.12 2.77925 0 1
"CONTROL"   2017 4.47662 3.88809 3.61258  .48 1.12 2.93784 0 1
"CONTROL"   2018   4.517 3.89674 3.62123  .48 1.12 3.09642 0 1
"CONTROL"   2019 4.56001 3.90558 3.63007 .516 1.12   3.255 0 1
"TREATMENT" 2009  4.6127 4.74174  3.9492 .201 1.17   2.272 1 0
"TREATMENT" 2010 4.66381 4.75719 3.96466 .201 1.09   2.331 1 0
"TREATMENT" 2011 4.71144 4.77263 3.98009 .201 1.09    2.39 1 0
"TREATMENT" 2012 4.72709 4.78808 3.99552 .201 1.15 2.48918 1 0
"TREATMENT" 2013 4.77812 4.80352 4.01098 .201 1.15 2.58836 1 0
"TREATMENT" 2014 4.83983 4.81897 4.02641 .201  1.3 2.68755 1 0
"TREATMENT" 2015 4.89229 4.83441 4.04186  .42  1.3 2.78673 1 0
"TREATMENT" 2016 4.94728 4.84986 4.05731  .42 1.12    2.94 1 1
"TREATMENT" 2017  4.9995  4.8653 4.07275  .48 1.12   3.286 1 1
"TREATMNET" 2018 5.03987 4.88075 4.08819  .48 1.12    3.63 1 1
"TREATMENT" 2019 5.08289 4.89612 4.10358 .516 1.12 3.97927 1 1
end

Tags: Difference-in-difference, fixed effects, panel data

Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#2

13 Feb 2022, 08:52

I want to add that my dependent variable is tti and my independent variables are loggrdp logpopden logvkt petrol cng.
Thank you
Comment
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 211
#3

13 Feb 2022, 09:10

Hi Sakib,

Below is the code I would use. Note that there appears to be a typo in the second-to-last observation: it reads TREATMNET instead of TREATMENT. I corrected this assuming it was a typo, but you know your data best. Also, I do not know whether this is a subset of your data, provided to get guidance on the syntax, or your full dataset. I ask because with 2 panels and 22 observations I would be skeptical of the results.

Code:

encode area, generate(narea)
xtset narea year
generate did = treat_dummy_var*post_time_dummy
xtdidregress (tti loggrdp logpopden logvkt petrol cng) (did), group(narea) time(year)
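
The typo correction mentioned in this post is not shown in the code. A minimal sketch of that step, assuming the misspelled value should read TREATMENT (an assumption about the data, not something the posted code confirms), would run before -encode- so that only two groups are created:

```stata
* Assumption: "TREATMNET" is a misspelling of "TREATMENT"
replace area = "TREATMENT" if area == "TREATMNET"

* Check that only two distinct areas remain before running -encode-
tabulate area
```

Without this step, -encode- would treat the misspelled value as a third area.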
1 like
Comment
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#4

13 Feb 2022, 09:33

Thanks for your reply. Yes, your correction is right, and this is my full dataset. Could you let me know how I can use time fixed effects and robust standard errors clustered at the area level to deal with potential heteroskedasticity?
When I used -xtreg- for this model, the commands were:

Code:

xtset treat_dummy_var year
xtreg tti i.treat_dummy_var##i.post_time_dummy loggrdp logpopden logvkt petrol cng, cluster(treat_dummy_var) robust

The results from -xtdidregress- seem different from those of my previous analysis with -xtreg-.

Last edited by Sakib Nazmus; 13 Feb 2022, 09:47.
Comment
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 211
#5

13 Feb 2022, 09:45

Hi Sakib,

-xtdidregress- automatically adds time fixed effects and gives you cluster-robust standard errors at the group level. The equivalent -xtreg- command would be:

Code:

xtreg tti loggrdp logpopden logvkt petrol cng i.year did, fe vce(cluster narea)
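
One way to see where the command in #4 departs from this one is to fit both and put the estimates side by side. The sketch below assumes narea and did were created as in #3. The stored-result names re_spec and fe_spec are made up for illustration. Note that -xtreg- without an estimator option fits the random-effects model (its default), and that the #4 specification has no year indicators.

```stata
xtset narea year

* Specification from #4: no fe option, so this is the random-effects
* estimator, with a single post-period dummy instead of year indicators
xtreg tti i.treat_dummy_var##i.post_time_dummy loggrdp logpopden logvkt petrol cng, vce(cluster narea)
estimates store re_spec

* Specification from #5: within (fixed-effects) estimator with year indicators
xtreg tti loggrdp logpopden logvkt petrol cng i.year did, fe vce(cluster narea)
estimates store fe_spec

* Coefficients and standard errors side by side
estimates table re_spec fe_spec, se
```

Differences between the two columns come from the estimator and the time controls, not from the treatment coding: did is the product of the same two dummies that #4 interacts.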

Last edited by Enrique Pinzon (StataCorp); 13 Feb 2022, 10:02.
1 like
Comment
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#6

16 Feb 2022, 09:05

Is the command I wrote in #4 wrong? If so, please explain what the problem is. Also, will the small size of the dataset be a problem for DID estimation?
Thank you.
Comment
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 211
#7

16 Feb 2022, 09:38

Hi Sakib,

I think that, more than the specification, the issue is that you have 22 observations and 2 panels. For the within estimator used by -xtreg- and -xtdidregress- to work as expected, you need a large number of panels. What we mean by a large number of panels is an asymptotic statement, but I think in your case the requirement is not met. I would not suggest you use DID estimation with this number of observations.
1 like
Comment
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#8

16 Feb 2022, 10:56

Hello,
Could you suggest an analysis with which I can estimate the effect of the policy in the treatment area? And how much data, at a minimum, is required for DID estimation?
Thank you.
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#9

16 Feb 2022, 16:04

For DD to make sense you need (usually) many treated units. If you have only two groups, what you're interested in is decidedly not DD, but interrupted time series/segmented regression that Ariel Linden's ITSA command handles. Either way, you want many more than two panels.
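
A rough sketch of what a single-group interrupted time series could look like with the user-written -itsa- command follows. It assumes narea was created as in #3 after the typo fix, so that -encode- numbers CONTROL as 1 and TREATMENT as 2, and that the intervention starts in 2016. The lag(1) choice is illustrative only, and the option names should be checked against -help itsa- for the installed version.

```stata
* itsa is user-written (Ariel Linden); install it once from SSC
ssc install itsa, replace

* itsa expects the data to be tsset with a panel and a time variable
tsset narea year

* Single-group ITSA on the treated area only
* Assumption: narea == 2 is TREATMENT and the policy starts in 2016
itsa tti, single treatid(2) trperiod(2016) lag(1) posttrend
```

The output reports the pre-intervention trend, the level change in 2016, and the change in trend afterwards. With 7 pre-period and 4 post-period observations, these estimates will still be imprecise.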

I'm most curious: What question are you studying anyways?
Comment
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#10

16 Feb 2022, 23:11

Hello,
I am trying to find out the impact of a ride-sharing service in our city. This is a new service that launched in 2016, and I am evaluating its impact on traffic congestion. I am building a model that includes independent variables that generally influence congestion, and then adding this dummy to check whether the service increases or decreases congestion. The dependent variables I am using measure congestion intensity.
Thank you.
Comment

Sakib Nazmus

Join Date: Jan 2022
Posts: 19

#11

21 Feb 2022, 08:07

Hello,
I would like clear advice regarding my DD analysis. My dataset is given below:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 area int year float(loggrdp logvkt logpopden cng petrol tti co2capita co2ac logdt logcost logco2 logcostac) byte(areaid treat post) float did
"DHK" 2009  4.6127 4.74174  3.9492 .201 1.17    2.272 .082753 .197501 2.46652 2.78779 3.07056 2.01281 1 1 0 0
"DHK" 2010 4.66381 4.75719 3.96466 .201 1.09    2.331 .091581 .218571 2.52446 2.84573 3.13002 2.05529 1 1 0 0
"DHK" 2011 4.71144 4.77263 3.98009 .201 1.09     2.39 .101267 .241686 2.58201 2.90328 3.18913  2.0974 1 1 0 0
"DHK" 2012 4.72709 4.78808 3.99552 .201 1.15  2.48918 .113573 .271058 2.64576 2.96703 3.25438 2.14571 1 1 0 0
"DHK" 2013 4.77812 4.80352 4.01098 .201 1.15  2.58836 .127138 .303432 2.70863  3.0299 3.31883 2.19313 1 1 0 0
"DHK" 2014 4.83983 4.81897 4.02641 .201  1.3  2.68755 .142177 .339324 2.77097 3.09224 3.38283 2.24002 1 1 0 0
"DHK" 2015 4.89229 4.83441 4.04186  .42  1.3  2.78673 .158951 .379358 2.83308 3.15435 3.44671  2.2867 1 1 0 0
"DHK" 2016 4.94728 4.84986 4.05731  .42 1.12 3.084865 .189826 .453046 2.93002 3.25129 3.53924 2.36819 1 1 1 1
"DHK" 2017  4.9995  4.8653 4.07275  .48 1.12    3.383 .229172  .54695 3.03242  3.3537 3.63649 2.45515 1 1 1 1
"DHK" 2018 5.03987 4.88075 4.08819  .48 1.12 3.681135   .2818 .672554 3.14392 3.46519 3.74172  2.5512 1 1 1 1
"DHK" 2019 5.08289 4.89612 4.10358 .516 1.12  3.97927  .35639 .850572 3.26923  3.5905 3.85907 2.66114 1 1 1 1
"CTG" 2009 4.08982 3.81882 3.54331 .201 1.17  1.85141 .113321  .15566 1.99974 2.31025 2.65908 1.84336 2 0 0 0
"CTG" 2010 4.14094 3.82747 3.55197 .201 1.09  1.97962 .134985 .185419 2.08437 2.39488  2.7437 1.91933 2 0 0 0
"CTG" 2011 4.18856 3.83617 3.56066 .201 1.09  2.10783 .156944 .215583 2.15852 2.46904 2.81786 1.98479 2 0 0 0
"CTG" 2012 4.20421 3.84479 3.56928 .201 1.15  2.23604 .179331 .246334 2.22506 2.53557 2.88439  2.0427 2 0 0 0
"CTG" 2013 4.25524 3.85344 3.57793 .201 1.15  2.36425 .202279 .277856 2.28601 2.59652 2.94534   2.095 2 0 0 0
"CTG" 2014 4.31695 3.86212 3.58661 .201  1.3  2.49246 .225925 .310336  2.3427 2.65321 3.00203 2.14301 2 0 0 0
"CTG" 2015 4.36941 3.87073 3.59522  .42  1.3  2.62067 .250414 .343975   2.396 2.70651 3.05533 2.18771 2 0 0 0
"CTG" 2016  4.4244 3.87935 3.60385  .42 1.12  2.77925 .275274 .378123  2.4442 2.75472 3.10506 2.22728 2 0 1 0
"CTG" 2017 4.47662 3.88809 3.61258  .48 1.12  2.93784 .300237 .412414 2.48916 2.79967  3.1515 2.26351 2 0 1 0
"CTG" 2018   4.517 3.89674 3.62123  .48 1.12  3.09642 .325438  .44703 2.53137 2.84188 3.19515 2.29706 2 0 1 0
"CTG" 2019 4.56001 3.90558 3.63007 .516 1.12    3.255 .351007 .482152 2.57161 2.88212 3.23684 2.32847 2 0 1 0
end

I am following a published article, https://papers.ssrn.com/sol3/papers....act_id=2843301, specifically the analysis described on page 8. For that analysis with my dataset, I am using the commands below in Stata:

Code:

xtset areaid year

xtreg Dependent_var i.treat##i.post loggrdp logpopden logvkt petrol cng, cluster(areaid) robust

As I am not very experienced with DD, I want to know whether my analysis is heading in the right direction. Expert guidance would be highly appreciated.
Thank you

Comment

Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#12

22 Feb 2022, 12:14

Yeah this looks okay. I mean, if I were you, I would use synthetic controls, since this is superior to DD (generally speaking), but so long as you're sure your untreated units are good comparisons to the treated unit, go ahead and use this model.

EDIT: I didn't see that you only had two panels. This makes synthetic controls impossible, so you must either use interrupted time series or a simple 2 by 2 DD.

No need to use xtreg, just use

Code:

* group() takes the panel identifier: narea as created in #3, or areaid in the #11 data
xtdidregress (tti loggrdp logpopden logvkt petrol cng) (did), group(narea) time(year)

My honest advice to you is to get more panels, get more data on untreated units. What you can do is limited by your current data structure.
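
For reference, the simple 2 by 2 DD mentioned in the edit above could be sketched as follows. This assumes the #11 data are in memory, with tti as the outcome and treat and post as the group and period dummies. It leaves out covariates and year effects, so it is a baseline comparison and not a replacement for the models discussed earlier.

```stata
* The four cell means behind the 2 by 2 comparison
tabulate treat post, summarize(tti) means

* DD estimate = coefficient on 1.treat#1.post
* (post-minus-pre change in the treated area minus the same change in the control area)
regress tti i.treat##i.post, vce(robust)
```

The interaction coefficient equals the difference of the two pre/post differences in the table of means. Inference stays fragile with only two areas, for the reasons given in #7.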

Last edited by Jared Greathouse; 22 Feb 2022, 12:18.
Comment
Sakib Nazmus

Join Date: Jan 2022

Posts: 19
#13

22 Feb 2022, 23:37

Hello,

First of all, thank you for your valuable advice. If I use -xtreg- as described in #11, I get significant results, and the relationships between the dependent and independent variables can be explained. If I use -xtdidregress- as in #12, I get insignificant results for the same dataset, and the relationships between the variables seem abnormal. In this case, should I use #11?
Also, should I collect data on another untreated area, which would make 3 panels in total, or collect data on the existing panels over more time periods? And if I need to use a simple 2 by 2 DD, what would the command be? I expect that collecting more data will be difficult.

Thank you

Last edited by Sakib Nazmus; 22 Feb 2022, 23:49.
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#14

24 Feb 2022, 08:02

Why does it matter if the results are significant or not? My advice to you is to collect data on as many untreated units as you possibly can. If you can get data on 20 other untreated units, use those.
Comment
