parallel trend assumption

Omar Shaher

Join Date: Feb 2019

Posts: 164
#1

09 Aug 2021, 18:29

Dear Statalists,
I have a quick question, please.
I have an unbalanced panel dataset covering 2000 to 2010. A standard was issued and firms started to adopt it, but adoption is not compulsory, so there is no single cut-off point at which all firms adopted simultaneously. To clarify, one group of firms adopted in 2005, others adopted in 2006, and so on.
I am using the generalized DID. I have read about the parallel trend assumption and, as I understand it, it can be investigated when adoption happens at a specific cut-off point. I think I can't investigate it in my case because there is no specific cut-off point. Do you think I am correct?
Thanks.
Maria Boutchkova

Join Date: May 2016

Posts: 103
#2

10 Aug 2021, 04:54

Hi Omar,
This is not a Stata question.
What you are asking is basic shock-based causal inference practice. There are numerous papers and textbooks explaining the steps required to ensure reliable results. See, for example:
Imbens, G., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. https://doi-org.ezproxy.is.ed.ac.uk/...O9781139025751
On your case of gradual adoption, see the section discussing "encouragement" designs in randomised trials in:
Atanasov, V., & Black, B. (2016). Shock-Based Causal Inference in Corporate Finance and Accounting Research. Critical Finance Review, 5(2), 207–304. http://dx.doi.org/10.1561/104.00000036
Omar Shaher

Join Date: Feb 2019

Posts: 164
#3

10 Aug 2021, 09:51

Dear Maria,
Thanks very much for the answer. Much appreciated.
I apologize if this isn't a Stata question. About a year ago I explained my case in order to decide which test I should use, and Clyde Schechter suggested the following paper:
https://www.annualreviews.org/doi/pd...-040617-013507

Clyde Schechter helped me a great deal in this regard, and I will be grateful to him for the rest of my life.

Accordingly, I applied the generalized DID with two-way fixed effects. The code was as below:

Code:

ge id = _n
encode Companyname, gen(COMPANY)
xtset COMPANY Year, yearly
xtreg EM i.Event##(c.Size c.Leverage c.growth) i.Year, fe cluster(COMPANY)

Where:
Event is a binary variable coded 1 for firm-year observations after the adoption of the standards and 0 for firm-year observations before the adoption of the standards.

Kindly note that all firms had complied with the standards by the end of the period, so there is no separate control group; I had to treat the firm-year observations before adoption as the control group.

I collected the data, ran the analysis, wrote the paper and submitted it to a journal. One of the reviewers suggested checking the parallel trend assumption, also called the common trends assumption.

I just saw on page 457 of the paper above that both the simple and the generalized DID rely on the common trends assumption. From what I have read, there is no statistical test for this assumption, but visual inspection is useful when you have observations over many time points.
So, what is the code to graph that?

Many thanks in advance for your help.
Maria Boutchkova

Join Date: May 2016

Posts: 103
#4

11 Aug 2021, 02:42

Hi Omar,
Jumping to run a diff-in-diff before checking covariate balance, common support and the parallel trends assumption is like regressing heart disease on coffee drinking without controlling for smoking and concluding that coffee kills. The sources I gave you in #2 are the go-to guides on proper academic practice in causal inference.
On how to construct a graph to see if the parallel trends assumption holds: it is just a timeline of the averages of all your covariates and the outcome, by treated and control status. The code below produces such timeline graphs, saves them as Stata graphs and exports them as .png. It assumes that your time variable is year and that all variables have labels, which it uses as titles.

Code:

* put the name of your treated indicator below:
local tr_var treated
* put the list of your covariates and the outcome below in place of v1 v2 v3:
local vars v1 v2 v3
local v_max: word count `vars'
* put the start and end years you want to graph below:
local yr_st 2010
local yr_end 2019
local cond year >= `yr_st' & year <= `yr_end'
forvalues i= 1/`v_max' {
    local v: word `i' of `vars'
    local lab: var label `v'
    egen mean_`v' = mean(`v'), by(`tr_var' year)
    line mean_`v' year if `tr_var' == 1 & `cond', c(L) ///
        || line mean_`v' year if `tr_var' == 0 & `cond', c(L) ///
        legend(order(1 "Treated" 2 "Controls") size(small) ) scheme(sj) ///
        title("`lab'", size(small)) ylabel(, labsize(vsmall) ) xlabel(`yr_st'(1)`yr_end') ///
        xscale(r(`yr_st' `yr_end')) xtitle("") ytitle("") ///
        saving(par_tr_`v', replace)
    graph export par_tr_`v'.png, replace
}
*

Last edited by Maria Boutchkova; 11 Aug 2021, 03:14.

Omar Shaher

Join Date: Feb 2019
Posts: 164
#5

11 Aug 2021, 07:38

Dear Maria Boutchkova, thanks a bunch for providing me with the code. Greatly appreciated.

I have used it as described in #4, as below:

Code:

ge id= _n
encode Companyname, gen(COMPANY)
xtset COMPANY Year, yearly

local Event treated
local vars EM Leverage SizeAssets GrowthinTurnover
local v_max: word count `vars'
local yr_st 2010
local yr_end 2019
local cond Year >= `yr_st' & Year <= `yr_end'
forvalues i= 1/`v_max' {
local v: word `i' of `vars'
local lab: var label `v'
egen mean_`v' = mean(`v'), by(`tr_var' Year )
line mean_`v' Year if `tr_var' == 1 & `cond', c(L) ///
    || line mean_`v' Year if `tr_var' == 0 & `cond', c(L) ///
    legend(order(1 "Treated" 2 "Controls") size(small) ) scheme(sj) ///
    title("`lab'", size(small)) ylabel(, labsize(vsmall) ) xlabel(`yr_st'(1)`yr_end') ///
    xscale(r(`yr_st' `yr_end')) xtitle("") ytitle("") ///
    saving(par_tr_`v', replace)
graph export par_tr_`v'.png, replace}*

But Stata showed:

Code:

==1 invalid name
r(198);

I don't know what's wrong. Could you please help?

Many thanks in advance.


Rich Goldstein

Join Date: Mar 2014

Posts: 4438
#6

11 Aug 2021, 07:45

The code in #5 does not include a line defining the local macro "tr_var", which is used later. Because it is never defined, it is empty. You need to define that local (see the example in #4, which does define it).
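
To illustrate the point, here is a minimal sketch on Stata's bundled auto dataset, not the data from this thread; the variable foreign merely stands in for the treated indicator. An undefined local macro expands to nothing, so Stata ends up with an incomplete if condition and should stop with an error of the kind reported in #5.

```stata
* Illustration only: auto.dta stands in for the thread's data
sysuse auto, clear

* Clear tr_var so that it is undefined, as in #5
local tr_var

* Nothing is substituted for the macro, so the command Stata
* actually sees is:  count if  == 1
capture noisily count if `tr_var' == 1
display "return code: " _rc

* Once the local is defined, the same line runs
local tr_var foreign
display "tr_var expands to: `tr_var'"
count if `tr_var' == 1
```

Displaying a macro before using it, as in the second-to-last line, is a quick way to check what Stata will substitute.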

Omar Shaher

Join Date: Feb 2019
Posts: 164
#7

11 Aug 2021, 07:53

Hi Rich Goldstein, thanks very much for the response. I truly appreciate it.

Okay, I think I have now defined it, and I used the code below:

Code:

local tr_var Event 
local vars AP2 Leverage SizeAssets FCP Purchases GrowthinTurnover
local v_max: word count `vars'
local yr_st 2009
local yr_end 2019
local cond Year >= `yr_st' & Year <= `yr_end'
forvalues i= 1/`v_max' {
local v: word `i' of `vars'
local lab: var label `v'
egen mean_`v' = mean(`v'), by(`tr_var' Year )
line mean_`v' Year if `tr_var' == 1 & `cond', c(L) ///
    || line mean_`v' Year if `tr_var' == 0 & `cond', c(L) ///
    legend(order(1 "Treated" 2 "Controls") size(small) ) scheme(sj) ///
    title("`lab'", size(small)) ylabel(, labsize(vsmall) ) xlabel(`yr_st'(1)`yr_end') ///
    xscale(r(`yr_st' `yr_end')) xtitle("") ytitle("") ///
    saving(par_tr_`v', replace)
graph export par_tr_`v'.png, replace}*

Stata showed me the following message:

Code:

option / not allowed
r(198);

What do you think?


Maria Boutchkova

Join Date: May 2016
Posts: 103
#8

11 Aug 2021, 08:00

Stata does not like a / somewhere. Get rid of the triple slashes /// like so:

Code:

forvalues i= 1/`v_max' {
local v: word `i' of `vars'
local lab: var label `v'
egen mean_`v' = mean(`v'), by(`tr_var' Year )
line mean_`v' Year if `tr_var' == 1 & `cond', c(L) || line mean_`v' Year if `tr_var' == 0 & `cond', c(L) legend(order(1 "Treated" 2 "Controls") size(small) ) scheme(sj) title("`lab'", size(small)) ylabel(, labsize(vsmall) ) xlabel(`yr_st'(1)`yr_end') xscale(r(`yr_st' `yr_end')) xtitle("") ytitle("") saving(par_tr_`v', replace)
graph export par_tr_`v'.png, replace
}

And note that the closing curly brace of the loop sits alone on a new line.
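
A note on the triple slashes, offered as a possible explanation rather than a diagnosis, since the thread does not say how the code was run: /// is a line-continuation marker that Stata accepts in do-files and ado-files, but not in commands typed or pasted into the Command window. Another way to spread a long graph command over several lines in a do-file is #delimit, sketched below on the bundled auto dataset (the variables and titles are illustrative, not from the thread):

```stata
* Run this from a do-file; #delimit is not available interactively
sysuse auto, clear

#delimit ;
twoway (scatter mpg weight if foreign == 1)
       (scatter mpg weight if foreign == 0),
       legend(order(1 "Foreign" 2 "Domestic") size(small))
       title("Mileage by weight", size(small))
       xtitle("") ytitle("") ;
#delimit cr
```

Between #delimit ; and #delimit cr a command ends only at a semicolon, so it can span as many lines as needed.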


Rich Goldstein

Join Date: Mar 2014

Posts: 4438
#9

11 Aug 2021, 08:04

Returning to #7, the way to find which line is causing a problem is to turn on tracing with -set trace on-; see

Code:

help trace

In general, you also want to set -tracedepth- to a small number (I usually start with 1 and see if that gives me enough information).
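
A minimal sketch of that workflow, using the bundled auto dataset and a toy loop in place of the graph loop from #7:

```stata
sysuse auto, clear

* Limit tracing to the top level of nested programs
set tracedepth 1
* Echo each command as it runs, including the macro-expanded version
set trace on

foreach v of varlist mpg weight {
    summarize `v'
}

* Switch tracing off again, otherwise all later output stays verbose
set trace off
```

With tracing on, the Results window shows each line after macro substitution, so an empty or misspelled local is visible just before the error message.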

Omar Shaher

Join Date: Feb 2019
Posts: 164

#10

11 Aug 2021, 08:41

Dear Maria Boutchkova,

Thanks a million for the code and for the answers.

I have followed your recommendations and used the first code as below:

Code:

ge id= _n
encode Companyname, gen(COMPANY)
xtset COMPANY Year, yearly

local tr_var Event
local vars AP2 Leverage SizeAssets FCP Purchases GrowthinTurnover
local v_max: word count `vars'
local yr_st 2009
local yr_end 2019
local cond Year >= `yr_st' & Year <= `yr_end'
forvalues i= 1/`v_max' {
local v: word `i' of `vars'
local lab: var label `v'
egen mean_`v' = mean(`v'), by(`tr_var' Year )
line mean_`v' Year if `tr_var' == 1 & `cond', c(L) 
    || line mean_`v' Year if `tr_var' == 0 & `cond', c(L) 
    legend(order(1 "Treated" 2 "Controls") size(small) ) scheme(sj) 
    title("`lab'", size(small)) ylabel(, labsize(vsmall) ) xlabel(`yr_st'(1)`yr_end') 
    xscale(r(`yr_st' `yr_end')) xtitle("") ytitle("") 
    saving(par_tr_`v', replace)
graph export par_tr_`v'.png, replace
}
*

Stata showed me just one graph, the one for AP2 against Year.

I will continue in the next post to show you the results.

Attached Files


Omar Shaher

Join Date: Feb 2019
Posts: 164

#11

11 Aug 2021, 08:45

Maria Boutchkova,

Then I used the following code:

Code:

forvalues i= 1/`v_max' {
local v: word `i' of `vars'
local lab: var label `v'
egen mean_`v' = mean(`v'), by(`tr_var' Year )
line mean_`v' Year if `tr_var' == 1 & `cond', c(L) || line mean_`v' Year if `tr_var' == 0 & `cond', c(L) legend(order(1 "Treated" 2 "Controls") size(small) ) scheme(sj) title("`lab'", size(small)) ylabel(, labsize(vsmall) ) xlabel(`yr_st'(1)`yr_end') xscale(r(`yr_st' `yr_end')) xtitle("") ytitle("") saving(par_tr_`v', replace)
graph export par_tr_`v'.png, replace
}

Stata showed me a graph for each variable. I don't know whether I should do that for all variables or whether there should be just one graph. I show an example below:

Based on the graph, I think the assumption is not met, or there might be something wrong. I don't know. What do you think?

Attached Files


Maria Boutchkova

Join Date: May 2016

Posts: 103
#12

11 Aug 2021, 09:13

You should show all graphs to your reviewer. You can combine them to save space. But first make sure your data is properly organised and that the graphs show what you intend. From the 2nd graph it seems to me that there is no data for the controls after 2010. How is your event variable defined? Perhaps that is not the treated indicator we expect, but a pre-post indicator?
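
One way to combine saved graphs is -graph combine-. The sketch below is a toy example on the bundled auto dataset: it saves a few graphs with the same par_tr_ naming pattern as the loop in #8 and then combines them. The variable list and the output name par_tr_all.png are assumptions for illustration; with the thread's data, the .gph files written by saving() would be used instead.

```stata
* Toy example; the scatter plots stand in for the trend graphs
sysuse auto, clear
local vars mpg weight price

* Build the list of saved graph files while creating them
local graphs
foreach v of local vars {
    scatter `v' length, title("`v'", size(small)) saving(par_tr_`v', replace)
    local graphs `graphs' par_tr_`v'.gph
}

* Put all saved graphs on one page and export it
graph combine `graphs', cols(2)
graph export par_tr_all.png, replace
```

The cols() option controls the layout of the combined page, so a reviewer sees all panels in one figure.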
Omar Shaher

Join Date: Feb 2019

Posts: 164
#13

11 Aug 2021, 09:55

Well, thanks very much for the answer, Maria Boutchkova.
The time frame of my sample is 2009-2019. Firms started to adopt the standards in 2013, other groups adopted in 2014, others in 2015, and so on, and by 2018 all firms in the study sample had adopted the standards.
So, I defined Event as a binary variable coded 1 for all firm-year observations after the adoption of the standards and 0 for all firm-year observations before the adoption. For instance, firm A has data for 2009-2019 and adopted in 2014, so its firm-year observations from 2009-2013 are coded 0 and those from 2014-2019 are coded 1.
But I am thinking about it from another perspective:
I have found that around 52 firms adopted in 2013, 41 adopted in 2014, and the rest of the firms adopted in 2015. So I am planning to divide the sample into three subgroups: those that adopted in 2013, in 2014, and in 2015. Each firm in each group has data for 2009-2019.
For instance, firm X adopted in 2015 and has data from 2009 until 2019, so Event is coded 0 for all its observations from 2009-2014 and 1 from 2015-2019. That way there is a cut-off point for this group, namely 2015; it is like saying that adoption was mandatory for this group in 2015. I can then analyse the trend before and after 2015, but I am wondering: what is the usual code for checking the parallel trend assumption when there is just one cut-off point?
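
The thread does not answer this last question. One common way to look at pre-adoption trends under staggered adoption, offered only as a sketch, is to plot the mean outcome by calendar year separately for each adoption cohort. The example simulates a toy panel so that it runs on its own. The names COMPANY, Year, Event and EM follow the thread; adopt_year, mean_EM and pick are hypothetical helper variables, and the simulated numbers carry no meaning.

```stata
* Toy panel: 30 firms, 2009-2019, adoption staggered over 2013-2015
clear
set seed 12345
set obs 30
gen COMPANY = _n
gen adopt = 2013 + mod(_n, 3)
expand 11
bysort COMPANY: gen Year = 2008 + _n
gen byte Event = Year >= adopt
gen EM = rnormal() + 0.1 * (Year - 2009) + 0.5 * Event
drop adopt

* Recover each firm's adoption year from the pre/post indicator
bysort COMPANY: egen adopt_year = min(cond(Event == 1, Year, .))

* Mean outcome per adoption cohort and calendar year
egen mean_EM = mean(EM), by(adopt_year Year)
* Flag one observation per cohort-year so each point is drawn once
egen pick = tag(adopt_year Year)

* The line continuations below require running this from a do-file
twoway (line mean_EM Year if adopt_year == 2013 & pick, sort) ///
       (line mean_EM Year if adopt_year == 2014 & pick, sort) ///
       (line mean_EM Year if adopt_year == 2015 & pick, sort), ///
       xline(2013 2014 2015, lpattern(dash)) ///
       legend(order(1 "Adopted 2013" 2 "Adopted 2014" 3 "Adopted 2015") size(small)) ///
       xtitle("") ytitle("Mean EM")
```

Before 2013 no cohort has adopted yet, so that stretch is where roughly parallel lines would lend some support to the common trends assumption. After 2013 the later adopters can serve as not-yet-treated comparisons for the earlier ones.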
