Difference-in-Difference (DID) Analysis and more independent variables

Arine Thongsathitya

Join Date: Jul 2016

Posts: 1
#1


22 Jul 2016, 20:46

Hello everyone,

Many papers present the basic DID model as Y = a + B1(POST) + B2(Treated) + B3(POST*Treated) + e. Is it possible to add more independent variables to this regression model in order to reduce bias? I am a bit confused because I have only seen previous studies apply the basic DID model, never one with additional variables.

Suppose I want to see whether the introduction of the new accounting standard for leases (IFRS16) in 2013 affected the amount of operating leases (OL). My treated group is then firms that are affected by IFRS (coded 1), my control group is firms that use their local GAAP (coded 0), and the post period is the years after 2013 (coded 1, otherwise 0).

Initially, my regression model will be: OL = a + B1(IFRS16) + B2(POST) + B3(IFRS16*POST) + e.
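Because IFRS16 and POST are both 0/1 dummies, the coefficients of this basic model map directly onto the four group-by-period means, which is why B3 is called the difference-in-differences estimate. The derivation below assumes only that E[e | IFRS16, POST] = 0:

```latex
\begin{aligned}
E[OL \mid IFRS16=0,\ POST=0] &= a \\
E[OL \mid IFRS16=0,\ POST=1] &= a + B_2 \\
E[OL \mid IFRS16=1,\ POST=0] &= a + B_1 \\
E[OL \mid IFRS16=1,\ POST=1] &= a + B_1 + B_2 + B_3 \\[4pt]
B_3 &= \big[(a + B_1 + B_2 + B_3) - (a + B_1)\big] - \big[(a + B_2) - a\big]
\end{aligned}
```

The first bracket is the before/after change for the treated firms and the second is the same change for the control firms; B3 is the difference between the two.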

However, I also think that the amount of OL might be affected by other factors, such as tax rates or debt-to-equity ratios. In this case, can I add TAXRATE and DERATIO as additional independent variables, as below?

OL = a + B1(IFRS16) + B2(POST) + B3(IFRS16*POST) + B4(TAXRATE) + B5(DERATIO) + e.

I have never done DID before, so I am struggling and am not sure how to start this in Stata.

I would really appreciate it if someone could help me out.

Cheers,
Arine
Tags: difference-in-difference, regression, stata
Clyde Schechter

Join Date: Apr 2014

Posts: 29950
#2

23 Jul 2016, 10:23

Yes, you can add covariates to a DID analysis just as you would to any other kind of regression. Of course, you want to choose your covariates with some care: they should reasonably be expected to be associated with your outcome variable, and it is best if they are also distributed differently in the intervention and control groups. Moreover, if you are using panel data and a fixed-effects model, a covariate has to vary within panels over time, or it will be collinear with the fixed effects and will be dropped. But these are just the usual criteria for covariate selection; the fact that you are doing a DID analysis doesn't change any of that.
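A minimal sketch of both points, using simulated data so it runs on its own (run it from a do-file). The variable names ol, ifrs16, post, taxrate and deratio follow the first post; firm_id, year and the time-invariant covariate region are hypothetical, and all numbers, including the true DID effect of -2, are made up for illustration.

```stata
* Simulated firm panel: every name and number here is hypothetical
clear
set seed 2016
set obs 200
gen firm_id = _n
* the first 100 firms are treated
gen ifrs16 = (firm_id <= 100)
* a firm characteristic that never changes over time
gen region = mod(firm_id, 3)
* six years per firm (2011-2016); the post period is after 2013
expand 6
bysort firm_id: gen year = 2010 + _n
gen post = (year > 2013)
* time-varying covariates
gen taxrate = 0.2 + 0.1*runiform()
gen deratio = exp(rnormal())
* outcome with an assumed true DID effect of -2
gen ol = 10 + ifrs16 + 0.5*post - 2*ifrs16*post + 3*taxrate + 0.4*deratio + 0.5*region + rnormal()

* Pooled OLS: covariates enter exactly as in any other regression;
* ## adds both main effects and their interaction
regress ol i.ifrs16##i.post taxrate deratio i.region, vce(cluster firm_id)
display _b[1.ifrs16#1.post]

* Fixed-effects model: 1.ifrs16 and the region dummies do not vary
* within firm, so Stata omits them as collinear with the fixed effects
xtset firm_id year
xtreg ol i.ifrs16##i.post taxrate deratio i.region, fe vce(cluster firm_id)
display _b[1.ifrs16#1.post]
```

In both models the DID estimate is the coefficient on 1.ifrs16#1.post; the time-varying covariates taxrate and deratio are estimated in both.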
yunana arin

Join Date: Jan 2020

Posts: 4
#3

20 Jan 2020, 09:41

I am currently doing a study on the impact of the implementation of the Basel III Net Stable Funding Ratio (NSFR) on the Return on Assets (ROA) of commercial banks, using a difference-in-differences design. The dependent variable is Return on Assets and the independent variable is NSFR. There are 6 control variables, namely Size, Funding Structure, Cost-to-Income ratio, Overhead cost, Non-Interest Income (NII) and HHI. The study covers 2009 to 2018, with a pre-implementation period of 2009-2013 and a post-implementation period of 2014-2018. The treatment group is made up of 8 banks with an international banking licence and the control group is made up of 6 banks with a national licence.
What difference-in-differences regression command should I use to incorporate the dependent variable, the independent variable and the control variables? Thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 29950
#4

20 Jan 2020, 09:47

The command will almost certainly be -xtreg-. (-help xtreg-) And you will be best off if you use factor-variable notation (-help fvvarlist-). Difference in differences analyses involve using an interaction between the treatment variable and the pre-post implementation variable. The -margins- command makes it easier to interpret the results. (See
https://www3.nd.edu/~rwilliam/stats/Margins01.pdf for Richard Williams' excellent introduction to -margins-.)
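A minimal sketch of that approach on simulated data shaped like the bank panel described above (14 banks, 2009-2018, 8 treated). The names id, year, roa, treatment and post follow this thread; size is a single hypothetical stand-in for the six control variables, which would simply be listed after the interaction term in a real model. Run it from a do-file.

```stata
* Simulated bank panel: values are made up for illustration
clear
set seed 2020
set obs 14
gen id = _n
* banks 1-8 are treated, banks 9-14 are controls
gen treatment = (id <= 8)
expand 10
bysort id: gen year = 2008 + _n
gen post = (year >= 2014)
* hypothetical control variable
gen size = rnormal(10, 1)
gen roa = 1 + 0.3*treatment + 0.2*post + 0.5*treatment*post + 0.1*size + rnormal()

xtset id year
* ## expands to treatment, post and their interaction; with -fe-
* the 1.treatment main effect is absorbed by the bank fixed effects
xtreg roa i.treatment##i.post size, fe vce(cluster id)
* the DID estimate with its standard error and confidence interval
lincom 1.treatment#1.post

* -margins- after a pooled -regress- shows the four group-by-period
* predicted means, evaluated at the observed values of size
regress roa i.treatment##i.post size, vce(cluster id)
margins treatment#post
```

With only 14 clusters, cluster-robust standard errors should be read with some caution.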

But the details depend on the names of your variables, and maybe on other details. If you need a more specific response, you need to show example data, using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

yunana arin

Join Date: Jan 2020
Posts: 4
#5

01 Mar 2020, 07:44

Dear Stata users,
To check the parallel trends assumption, I need to produce a graph of the pre-treatment trend of the dependent variable (ROA). I created two dummy variables, one for the time period and one for treatment/control status. The pre-treatment period is 2009 to 2013 and the post-treatment period is 2014 to 2018. The treatment group is made up of Banks A-G (coded 1) and the control group is made up of Banks H-N (coded 0). The pre-treatment period is coded 0 and the post-treatment period is coded 1. Can someone help me do this in Stata? Below is my data.
Please, it is urgent.
Thank you.
Yunana Arin

. edit

. *(5 variables, 140 observations pasted into data editor)

. drop roe

. tsset id year
panel variable: id (strongly balanced)
time variable: year, 2009 to 2018
delta: 1 unit

. gen post=(year>=2014) & !missing(year)

. gen treatment=(id<8) & !missing(id)

. gen did=treatment*post

. dataex

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str1 bank int year float(roa post treatment did)
 1 "A" 2009  1.61 0 1 0
 1 ""  2010   .97 0 1 0
 1 ""  2011  1.05 0 1 0
 1 ""  2012  2.57 0 1 0
 1 ""  2013  1.97 0 1 0
 1 ""  2014  2.05 1 1 1
 1 ""  2015  2.54 1 1 1
 1 ""  2016  1.98 1 1 1
 1 ""  2017  1.46 1 1 1
 1 ""  2018  1.92 1 1 1
 2 "B" 2009   .12 0 1 0
 2 ""  2010  1.48 0 1 0
 2 ""  2011 -1.28 0 1 0
 2 ""  2012  1.66 0 1 0
 2 ""  2013   1.6 0 1 0
 2 ""  2014  1.89 1 1 1
 2 ""  2015   .41 1 1 1
 2 ""  2016  1.22 1 1 1
 2 ""  2017   .73 1 1 1
 2 ""  2018  1.05 1 1 1
 3 "C" 2009   .33 0 1 0
 3 ""  2010  1.17 0 1 0
 3 ""  2011   .53 0 1 0
 3 ""  2012  1.96 0 1 0
 3 ""  2013   .71 0 1 0
 3 ""  2014  1.16 1 1 1
 3 ""  2015  1.13 1 1 1
 3 ""  2016   .42 1 1 1
 3 ""  2017  1.29 1 1 1
 3 ""  2018  1.33 1 1 1
 4 "D" 2009   .17 0 1 0
 4 ""  2010  1.24 0 1 0
 4 ""  2011   .71 0 1 0
 4 ""  2012  2.28 0 1 0
 4 ""  2013  1.75 0 1 0
 4 ""  2014  2.06 1 1 1
 4 ""  2015   .07 1 1 1
 4 ""  2016   .26 1 1 1
 4 ""  2017   .87 1 1 1
 4 ""  2018  1.07 1 1 1
 5 "E" 2009  2.65 0 1 0
 5 ""  2010  3.39 0 1 0
 5 ""  2011  3.22 0 1 0
 5 ""  2012  5.03 0 1 0
 5 ""  2013  4.28 0 1 0
 5 ""  2014  4.01 1 1 1
 5 ""  2015  3.94 1 1 1
 5 ""  2016  4.16 1 1 1
 5 ""  2017  5.01 1 1 1
 5 ""  2018  5.62 1 1 1
 6 "F" 2009 -5.88 0 1 0
 6 ""  2010 10.68 0 1 0
 6 ""  2011 -7.88 0 1 0
 6 ""  2012   .39 0 1 0
 6 ""  2013   .62 0 1 0
 6 ""  2014  2.64 1 1 1
 6 ""  2015  1.35 1 1 1
 6 ""  2016  1.23 1 1 1
 6 ""  2017   .89 1 1 1
 6 ""  2018  1.24 1 1 1
 7 "G" 2009   .15 0 1 0
 7 ""  2010  -.12 0 1 0
 7 ""  2011  -.45 0 1 0
 7 ""  2012  2.26 0 1 0
 7 ""  2013  1.76 0 1 0
 7 ""  2014  1.73 1 1 1
 7 ""  2015  2.17 1 1 1
 7 ""  2016  2.06 1 1 1
 7 ""  2017  1.91 1 1 1
 7 ""  2018  1.61 1 1 1
 8 "H" 2009  1.56 0 0 0
 8 ""  2010  1.98 0 0 0
 8 ""  2011  2.09 0 0 0
 8 ""  2012  3.87 0 0 0
 8 ""  2013  3.03 0 0 0
 8 ""  2014  2.65 1 0 0
 8 ""  2015  2.64 1 0 0
 8 ""  2016  2.62 1 0 0
 8 ""  2017  3.11 1 0 0
 8 ""  2018  3.25 1 0 0
 9 "I" 2009  6.54 0 0 0
 9 ""  2010   3.5 0 0 0
 9 ""  2011  2.56 0 0 0
 9 ""  2012  4.03 0 0 0
 9 ""  2013  4.01 0 0 0
 9 ""  2014  3.86 1 0 0
 9 ""  2015  2.44 1 0 0
 9 ""  2016   4.2 1 0 0
 9 ""  2017  5.36 1 0 0
 9 ""  2018  4.16 1 0 0
10 "J" 2009 -1.29 0 0 0
10 ""  2010   .36 0 0 0
10 ""  2011  1.78 0 0 0
10 ""  2012   .59 0 0 0
10 ""  2013    .8 0 0 0
10 ""  2014  1.68 1 0 0
10 ""  2015   .63 1 0 0
10 ""  2016   .32 1 0 0
10 ""  2017   1.1 1 0 0
10 ""  2018  1.39 1 0 0
end

------------------ copy up to and including the previous line ------------------

Listed 100 out of 140 observations
Use the count() option to list more

.


Clyde Schechter

Join Date: Apr 2014
Posts: 29950
#6

01 Mar 2020, 12:04

Code:

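* one observation per group and year: mean ROA, plus the post indicator
collapse (mean) roa (first) post, by(treatment year)
* treat the two groups as "panels" so -xtline- can overlay them
xtset treatment year
* pre-treatment years only, both groups on one graph
xtline roa if post == 0, overlay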
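The graph can be complemented by a regression-based check. A minimal sketch, on simulated data whose names follow this thread (run it from a do-file): restricted to the pre-treatment years, it estimates a separate treatment-control gap for each year and then tests jointly whether those year-specific gaps differ from the first year's. This is one common way to probe pre-trends, not the only one, and a non-significant test does not prove that the trends are parallel.

```stata
* Simulated bank panel: values are made up, parallel by construction
clear
set seed 314
set obs 14
gen id = _n
gen treatment = (id < 8)
expand 10
bysort id: gen year = 2008 + _n
gen post = (year >= 2014)
gen roa = 1 + 0.5*treatment + 0.05*(year - 2009) + rnormal()

xtset id year
* pre-treatment years only: treatment-by-year interactions
* (2009 is the base year; 1.treatment is absorbed by the fixed effects)
xtreg roa i.treatment##i.year if post == 0, fe vce(cluster id)
* joint test that all treatment-by-year interactions are zero
testparm i.treatment#i.year
```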


yunana arin

Join Date: Jan 2020

Posts: 4
#7

02 Mar 2020, 06:12

Thank you for the advice. It was really helpful. However, how do I distinguish the line for the treatment group from the line for the control group (for example, by using a dotted, dashed or solid line)?
Clyde Schechter

Join Date: Apr 2014

Posts: 29950
#8

02 Mar 2020, 11:47

-xtline- accepts most -graph twoway- options. In particular, the -lpattern()- option gives you control over the appearance of the line. See -help connect_options- for details.
yunana arin

Join Date: Jan 2020

Posts: 4
#9

03 Mar 2020, 04:09

Thank you for the advice regarding the -lpattern()- option. However, despite going through -help connect_options-, I am having trouble writing the code that gives the treatment group and the control group different line patterns on the graph. Kindly help me with the exact code to use. Thank you. Yunana Arin
Clyde Schechter

Join Date: Apr 2014

Posts: 29950
#10

03 Mar 2020, 11:44

Code:

* plot1 is the first panel (treatment == 0, control); plot2 is treatment == 1
xtline roa if post == 0, overlay plot1opts(lpattern(solid)) plot2opts(lpattern(dash))
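An alternative that does not depend on the order in which -xtline- draws the panels is to write the two lines explicitly with -twoway line-, which also makes it easy to label the legend. A minimal sketch on simulated data with the names used in this thread (the /// continuation lines require running it from a do-file); the legend text is illustrative.

```stata
* Simulated bank panel: values are made up for illustration
clear
set seed 42
set obs 14
gen id = _n
gen treatment = (id < 8)
expand 10
bysort id: gen year = 2008 + _n
gen post = (year >= 2014)
gen roa = 1 + 0.5*treatment + 0.05*(year - 2009) + rnormal()

* one observation per group and year, as in the code above
collapse (mean) roa (first) post, by(treatment year)

* one explicit line per group, so each pattern and legend label is set directly
twoway (line roa year if treatment == 1 & post == 0, sort lpattern(solid)) ///
       (line roa year if treatment == 0 & post == 0, sort lpattern(dash)), ///
       legend(order(1 "Treatment (Banks A-G)" 2 "Control (Banks H-N)")) ///
       xtitle("Year") ytitle("Mean ROA")
```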
Guest
#11

04 Mar 2020, 02:48

Hello Clyde Schechter, I am also working on an analysis of a randomized controlled trial (RCT) using the dataset cashtransfers.dta, in which each observation is a household. The variable treat denotes whether a household received a transfer. The variable spillover denotes whether a household was a control household living in a treatment village. The variable purecontrol denotes whether a household was a control household in a control village.

1. a. Using the data in cashtransfers.dta, determine whether the household-level randomization was successful.
2. The primary outcome variables for the project at hand are 3 measures of household consumption and 2 measures of psychological wellbeing (taken from an interview with the head of household).
a. The measures of psychological wellbeing are questions taken from the World Values Survey. How would you transform these variables so that they can be more readily interpreted?
b. Evaluate the effect of the cash transfers on each outcome.

Any guidance? I was thinking of using a regression or a t-test to evaluate whether the randomization was successful, for example:
regress cons_total household_id if treat==1
or
ttest cons_total if treat==1, by(gender)
or
ttest cons_total if treat==0, by(gender)
For the transformation, I was combining the two variables into a single wellbeing measure:
egen wellbeing = rowtotal(wvs_happiness wvs_life_sat)
Any guidance?
Clyde Schechter

Join Date: Apr 2014

Posts: 29950
#12

04 Mar 2020, 11:03

The questions raised in #11 are unrelated to the topic of this thread. Please re-post your question in a new thread. It is important that threads stay on topic so that the many people who search for answers to specific questions, or who browse the Forum with interest in specific topics can find what they are looking for and don't waste time reading through extraneous posts.

Before you do re-post, please read the FAQ for excellent advice on how to ask questions that have a high probability of getting a timely and helpful response. In particular,

1. you are unlikely to get useful advice without using the -dataex- command to show example data.
2. explain which variable names in the example data correspond to the outcome and predictors of interest
3. explain the meaning of the variables you are interested in working with (referring to the World Values Survey requires somebody to either already be familiar with that survey, or else acquire knowledge about it before responding to you--that will deter most Forum members from taking an interest in your question.)
4. do not address your question to any particular person. There are many Forum members who answer questions here, and you should be interested in getting a response from whoever can do it quickest and best.