Difference in difference example

Mislav Sagovac

Join Date: Jul 2015

Posts: 39
#1

24 Dec 2015, 08:34

I want to apply the DID method to an economic problem. I would like to estimate the effect of financial constraints (if there is any) on investment. I want to be sure I have done everything right, so I will explain my procedure and regression equation. I would just like to hear your opinion about the procedure.

(YOU CAN SKIP THIS PART AND START TO READ BOLD TEXT) In 2012, the government in Croatia passed a law named the "Pre-bankruptcy law" (in short). This enables companies that are illiquid and insolvent (by some parameters) to start the Pre-bankruptcy process. During the process, firms try to reach an agreement with their creditors, restructure their business activities, and conclude a Pre-bankruptcy agreement. The main result of the agreement is debt forgiveness and a grace period for debt repayments. So, after the process ends, the financial position of the firm is improved.

Now, I would like to test the impact of the Pre-bankruptcy agreement (for companies that successfully finish the whole process and reduce their debt) on investment. I think the best method to test the relationship between a better financial position (lower debt) and investment would be DD estimation, where the control group is firms that didn't experience an improvement in financial position.

So, the treatment group is firms that have successfully finished the process (improved financial position), and the control group is firms that are illiquid and insolvent but didn't improve their financial position. I have data on the dates when firms made the Pre-bankruptcy agreement. The dates fall in the interval 23.04.2013 - 02.07.2014. I have chosen to take only the period 23.04.2013 - 31.12.2013. I define the variables like this:
treatment = 1 if the firm finished the process (0 otherwise)
generate t = 0 // pre-treatment periods: 2011, 2012
replace t = 1 if year == 2013 | year == 2014 // post-treatment periods

As you can see, I have two time periods before the treatment and two time periods after the treatment. This is my DID equation (i is the outcome variable, investment):

xtreg "(or reg?)" i treatment 2011 2012 2013 2014 treatment#t

where 2011 2012 2013 2014 are equal to 1 if year is 2011, 2012 ...

I'm not sure if this is the right specification because:
1) I have multiple time periods in DID model (not just 2)
2) treatment doesn't occur on one specific date but during the period 23.04.2013 - 02.07.2014 (I chose to include only observations in the period 23.04.2013 - 31.12.2013, and the treated firms are those that finished the Pre-bankruptcy process in that period)
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

24 Dec 2015, 09:23

I don't understand this at all. You start out by saying you want to estimate the effect of financial constraints on investment. Then you never again mention either of those. So it is unclear how any of your variables are supposed to represent those constructs.

Your explanation of treatment vs control does not make sense to me. If the treatment is going through the pre-bankruptcy procedure, then the control is those who do not go through the pre-bankruptcy procedure. You seem to be defining the control on the basis of an (intermediate?) outcome, namely improvement in financial situation. I can't understand that.

It seems to me there are a couple of different things going on. First there is the passage of the law creating the pre-bankruptcy procedure. Then there is the actual use of that procedure by some (but presumably not all) eligible firms. You need to be clear whether you are trying to estimate the effect of the passage of the law, or the effect of using the procedure. The models for this would be rather different. (In fact, to study the effect of the passage of the law, it seems to me you would have to either just do a pre-post comparison, or you could do a difference-in-differences comparison with another country that you think is reasonably similar to Croatia but didn't pass a similar law. Either approach is, of course, fraught with potential for bias.)

Even assuming you are testing the effect of the procedure, you will need to deal with the possibility that firms that choose to use the procedure may not be comparable to those that do not. That is where a difference-in-differences analysis comes into play. At least it will enable you to estimate how similar the two groups of firms were beforehand.

I don't see any mention of an outcome variable in your post. It certainly isn't shown in your regression command, and if you mentioned it in your description of the problem, I didn't recognize it.
Mislav Sagovac

Join Date: Jul 2015

Posts: 39
#3

24 Dec 2015, 10:15

Maybe my post is a little bit confusing. I will try to explain it again by adding additional comments:

Originally posted by Clyde Schechter View Post

I don't understand this at all. You start out by saying you want to estimate the effect of financial constraints on investment. Then you never again mention either of those. So it is unclear how any of your variables are supposed to represent those constructs.

The outcome variable in the regression is investment (I may also estimate the equation with other outcome variables like employment and profits, but the baseline model has investment as the outcome variable). The model should estimate whether there is a causal effect of financial and/or liquidity constraints on investment. Firms are financially constrained if banks don't want to lend them money even though they have a profitable project. Firms are liquidity constrained if they don't have enough cash to finance operating activities and fixed investment.

As I wrote in the first post, the "Pre-bankruptcy law" enables firms to start the Pre-bankruptcy procedure. If the firm reaches an agreement with its creditors, the results include debt forgiveness and a grace period for debt repayments. My assumption is that firms that started the Pre-bankruptcy process were financially and liquidity constrained, and that after the process (if it was successful) they are not constrained (or at least much less constrained). I want to see if firms invest more after they improve their financial position (decrease debt ratios and improve liquidity).

Originally posted by Clyde Schechter View Post

Your explanation of treatment vs control does not make sense to me. If the treatment is going through the pre-bankruptcy procedure, then the control is those who do not go through the pre-bankruptcy procedure. You seem to be defining the control on the basis of an (intermediate?) outcome, namely improvement in financial situation. I can't understand that.

The treatment group is firms that have successfully finished the Pre-bankruptcy process (experienced debt forgiveness). The control group is firms that are illiquid or insolvent (similar in that respect) but haven't started the Pre-bankruptcy process. I would also add some controls like employment, profitability, etc.

Originally posted by Clyde Schechter View Post

It seems to me there are a couple of different things going on. First there is the passage of the law creating the pre-bankruptcy procedure. Then there is the actual use of that procedure by some (but presumably not all) eligible firms. You need to be clear whether you are trying to estimate the effect of the passage of the law, or the effect of using the procedure. The models for this would be rather different. (In fact, to study the effect of the passage of the law, it seems to me you would have to either just do a pre-post comparison, or you could do a difference-in-differences comparison with another country that you think is reasonably similar to Croatia but didn't pass a similar law. Either approach is, of course, fraught with potential for bias.)

Even assuming you are testing the effect of the procedure, you will need to deal with the possibility that firms that choose to use the procedure may not be comparable to those that do not. That is where a difference-in-differences analysis comes into play. At least it will enable you to estimate how similar the two groups of firms were beforehand.

As you specified, I think I have to estimate the effect of the procedure.

Originally posted by Clyde Schechter View Post

I don't see any mention of an outcome variable in your post. It certainly isn't shown in your regression command, and if you mentioned it in your description of the problem, I didn't recognize it.

In my regression equation

xtreg "(or reg?)" i treatment 2011 2012 2013 2014 treatment#t

the outcome variable is i (investment_t / capital_t-1).
I suppose I should use xtreg?
I am not sure how to include multiple periods (two before and two after the treatment). Do I add year dummies to my equation?
Finally, I am not sure whether it is a problem to assume that the post-treatment period is 2013 and 2014 when the procedure ended for the treated firms during the period 23.04.2013 - 31.12.2013.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#4

24 Dec 2015, 10:57

Well, this is much clearer. I'm still not sure I understand about financial/liquidity constraints and the relationship to the pre-bankruptcy procedure. That is, I don't quite get whether you just assume that after the procedure the firms are no longer constrained, whereas pre-procedure they were, or whether you have to actually create a new variable for constrained that you calculate from both the pre-bankruptcy procedure variable and other variables not necessarily mentioned already.

Be that as it may, let's just say that you have a correct treatment variable to work with. It appears you are restricting your analysis to firms who underwent the procedure in 2013 (after 23 April of that year): this simplifies matters considerably in that you can take 2013 to be the start of the treatment era. Stata will not calculate a main effect of your variable t because it is collinear with the year variables, but you will still get the treatment#t interaction effect, which is what you are actually interested in.

As for -xtreg- vs -reg-, you have panel data: the observations are therefore presumably correlated within firms. So -reg- would be incorrect as it treats all observations as independent. If the output from -xtreg- says that rho is near 0 and that there is no difference between -xtreg- and OLS, then if you want to go back to -reg- for simplicity, you can. But it certainly would not be appropriate to start with -reg- or use it if rho is not effectively zero. Also, -xtreg- with no options is a random effects model. If you prefer fixed effects, don't forget the -fe- option.
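To make the -xtreg- versus -reg- comparison concrete, here is a minimal sketch on simulated data. Everything in it is hypothetical (the variable names id, year, treatment, t and outcome, and the simulated effect sizes); it only shows where rho and the F test that all u_i = 0 can be read off.

```stata
* Simulated panel (hypothetical data): 200 firms observed 2011-2014
clear
set seed 2015
set obs 200
generate id = _n
generate byte treatment = (id <= 100)
generate u = rnormal()                      // firm-level effect
expand 4
bysort id: generate year = 2010 + _n
generate byte t = (year >= 2013)            // post-treatment era
generate outcome = 0.02*treatment*t + 0.1*u + rnormal(0, 0.15)
xtset id year

* Random-effects model (the -xtreg- default); rho is stored in e(rho)
xtreg outcome i.treatment##i.t i.year, re
display "rho (RE) = " e(rho)

* Fixed-effects alternative; its output ends with the F test that all u_i = 0
xtreg outcome i.treatment##i.t i.year, fe
display "rho (FE) = " e(rho)
```

If rho is close to zero and the F test does not reject, the panel estimates should be close to what -reg- would give.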

Some might suggest it would be better to have investment_t by itself as the dependent variable and to include the lag of capital as a covariate, rather than using the ratio as the outcome. You might want to discuss this with experienced colleagues in your field: if your variable i is conventional in your field, then I suppose you should stick with it. But using outcome investment_t with lagged capital as a covariate allows for a more flexible model that does not have an implicit constraint that may not actually hold.
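The two specifications can be set side by side as in the sketch below. The data are simulated and all names (id, year, capital, investment, i_ratio) are hypothetical; an extra year (2010) is simulated only so that the lag of capital exists for 2011.

```stata
* Simulated panel (hypothetical data): 200 firms, 2010-2014
clear
set seed 2015
set obs 200
generate id = _n
generate byte treatment = (id <= 100)
expand 5
bysort id: generate year = 2009 + _n
generate byte t = (year >= 2013)            // post-treatment era
generate capital = exp(rnormal(5, 0.5))
generate investment = 0.05*capital + 2*treatment*t + rnormal(0, 3)
xtset id year

* (a) Ratio outcome: investment_t / capital_{t-1}
generate i_ratio = investment / L.capital
xtreg i_ratio i.treatment##i.t i.year, fe

* (b) Investment in levels, with lagged capital as a covariate
xtreg investment i.treatment##i.t i.year L.capital, fe
```

In (a), lagged capital can enter only as a divisor of the outcome; in (b), its relationship to investment is estimated from the data.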

Finally, from a coding perspective, assuming that you are using a current version of Stata, you should take advantage of factor variable notation (-help fvvarlist-) throughout (you already use it for your interaction term). This means not creating your own indicator variables for four years, but rather having a single year variable with values 2011 through 2014, and then the code becomes something like this:

Code:

xtreg outcome i.treatment##i.t i.year // and possibly L.capital and other variables

The fact that you have four years here shouldn't matter. You still have only two eras: before and after, and the treatment#t interaction will be a single degree of freedom, so you can interpret the results in the usual way for DID. The year variables will allow you to adjust for any annual shocks to investment that are extraneous to the effects of treatment.
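Written out as an equation, the model behind that command looks roughly as follows. This is a sketch in notation that does not appear elsewhere in the thread: u_i is the firm-level effect, the gammas are the year effects relative to 2011, and delta is the DID coefficient.

```latex
\[
y_{it} = \beta_0 + \beta_1\,\mathrm{treatment}_i + \beta_2\, t_t
       + \delta\,(\mathrm{treatment}_i \times t_t)
       + \sum_{k=2012}^{2014} \gamma_k\,\mathbf{1}[\mathrm{year}_t = k]
       + u_i + \varepsilon_{it},
\qquad
t_t = \mathbf{1}[\mathrm{year}_t = 2013] + \mathbf{1}[\mathrm{year}_t = 2014].
\]
```

Because t is an exact sum of two year indicators, beta_2 and the gammas cannot all be estimated separately, so Stata drops one of those terms. The interaction coefficient delta is unaffected and remains a single parameter however many years are in the data.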

As for making a causal inference--that is a lot to hope for from purely observational data. Do your best to include relevant covariates that distinguish the treatment and control groups. But the possibility that important unobserved variables are at work can never be dismissed altogether.

Hope this is helpful. Please note that I have no expertise in finance or economics, I am commenting from a rather generic statistical perspective. There are numerous people on this forum who do have finance and economics expertise and they may have other substantive suggestions.
Mislav Sagovac

Join Date: Jul 2015

Posts: 39
#5

24 Dec 2015, 13:11

Originally posted by Clyde Schechter View Post

Well, this is much clearer. I'm still not sure I understand about financial/liquidity constraints and the relationship to the pre-bankruptcy procedure. That is, I don't quite get whether you just assume that after the procedure the firms are no longer constrained, whereas pre-procedure they were, or whether you have to actually create a new variable for constrained that you calculate from both the pre-bankruptcy procedure variable and other variables not necessarily mentioned already.

I just assume that firms are constrained before they start the procedure, because a firm must be insolvent or illiquid to start the pre-bankruptcy procedure (the firm has to prove that using financial statements etc.). It is hard to say whether the companies are unconstrained after the procedure, but they are surely less constrained (on average 50% of debt is written off).

Originally posted by Clyde Schechter View Post

Be that as it may, let's just say that you have a correct treatment variable to work with. It appears you are restricting your analysis to firms who underwent the procedure in 2013 (after 23 April of that year): this simplifies matters considerably in that you can take 2013 to be the start of the treatment era. Stata will not calculate a main effect of your variable t because it is collinear with the year variables, but you will still get the treatment#t interaction effect, which is what you are actually interested in.

That's right. I assume that the treatment is applied in 2013. That is, 2011, 2012 are pre-treatment periods and 2013 and 2014 are post-treatment periods.

Originally posted by Clyde Schechter View Post

As for -xtreg- vs -reg-, you have panel data: the observations are therefore presumably correlated within firms. So -reg- would be incorrect as it treats all observations as independent. If the output from -xtreg- says that rho is near 0 and that there is no difference between -xtreg- and OLS, then if you want to go back to -reg- for simplicity, you can. But it certainly would not be appropriate to start with -reg- or use it if rho is not effectively zero. Also, -xtreg- with no options is a random effects model. If you prefer fixed effects, don't forget the -fe- option.

I think "xtreg, fe" best suits this problem.

Originally posted by Clyde Schechter View Post

Some might suggest it would be better to have investment_t by itself as the dependent variable and to include the lag of capital as a covariate, rather than using the ratio as the outcome. You might want to discuss this with experienced colleagues in your field: if your variable i is conventional in your field, then I suppose you should stick with it. But using outcome investment_t with lagged capital as a covariate allows for a more flexible model that does not have an implicit constraint that may not actually hold.

Most of the time, authors use the acceleration ratio (I/K); that's why I use it here.

Originally posted by Clyde Schechter View Post

Finally, from a coding perspective, assuming that you are using a current version of Stata, you should take advantage of factor variable notation (-help fvvarlist-) throughout (you already use it for your interaction term). This means not creating your own indicator variables for four years, but rather having a single year variable with values 2011 through 2014, and then the code becomes something like this:

Code:

xtreg outcome i.treatment##i.t i.year // and possibly L.capital and other variables

The fact that you have four years here shouldn't matter. You still have only two eras: before and after, and the treatment#t interaction will be a single degree of freedom, so you can interpret the results in the usual way for DID. The year variables will allow you to adjust for any annual shocks to investment that are extraneous to the effects of treatment.

As for making a causal inference--that is a lot to hope for from purely observational data. Do your best to include relevant covariates that distinguish the treatment and control groups. But the possibility that important unobserved variables are at work can never be dismissed altogether.

I use covariates that are common for this type of research.

I'm not sure I understand the code completely. i.year gives dummies for every year (instead of writing 2011 2012 2013 2014 for every year), but what about i.treatment##i.t? Are those DID coefficients for every year? And where is the plain treatment variable in this equation (in the two-period case we have outcome d t d#t)? Can fe be added for fixed effects?

Originally posted by Clyde Schechter View Post

Hope this is helpful. Please note that I have no expertise in finance or economics, I am commenting from a rather generic statistical perspective. There are numerous people on this forum who do have finance and economics expertise and they may have other substantive suggestions.

THAT WAS VERY HELPFUL. THANK YOU VERY MUCH FOR YOUR COMMENTS.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#6

24 Dec 2015, 13:21

I'm not sure I understand the code completely. i.year gives dummies for every year (instead of writing 2011 2012 2013 2014 for every year), but what about i.treatment##i.t? Are those DID coefficients for every year? And where is the plain treatment variable in this equation (in the two-period case we have outcome d t d#t)? Can fe be added for fixed effects?

Yes, i.year gives dummies for every year (with one omitted as the reference category). As for i.treatment##i.t, that will generate dummies for treatment and t and also an interaction term between them. (As indicated, though, the dummy for t will be dropped due to collinearity with the year dummies.) You will not get DID coefficients for every year. You will get a single DID coefficient that applies overall to the pre- vs. post-intervention era. The year dummies will then account for any "shocks" to the outcome occurring separately in each year, independent of the other variables in the model.

If you wanted to look at differences for all four years, you could replace i.treatment##i.t with i.treatment##i.year and leave out the t variable altogether. That will get you three interaction terms, and it's harder to interpret. You could then look at something like how far the 2013 and 2014 interaction terms are from zero and from the 2012 interaction term. But that's kind of complicated, and it at least partially confounds the treatment effect with the annual shocks. So I think having just one interaction term for treatment#t and allowing for yearly shocks is cleaner.
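For anyone who does want the year-by-year version, a minimal sketch on simulated data follows. The variable names and simulated values are hypothetical, and the -lincom- contrast is only one possible reading of "comparing the 2013 and 2014 interaction terms with the 2012 one", not the only sensible choice.

```stata
* Simulated panel (hypothetical data): 200 firms observed 2011-2014
clear
set seed 2015
set obs 200
generate id = _n
generate byte treatment = (id <= 100)
generate u = rnormal()                      // firm-level effect
expand 4
bysort id: generate year = 2010 + _n
generate byte t = (year >= 2013)            // post-treatment era
generate outcome = 0.02*treatment*t + 0.1*u + rnormal(0, 0.15)
xtset id year

* Year-by-year interactions; 2011 is the base year
xtreg outcome i.treatment##i.year, fe

* Pre-period check: does the 2012 interaction differ from zero?
test 1.treatment#2012.year

* Average post-period interaction minus the 2012 interaction
lincom 0.5*_b[1.treatment#2013.year] + 0.5*_b[1.treatment#2014.year] ///
     - _b[1.treatment#2012.year]
```

Because treatment does not vary within a firm in this simulation, its main effect is omitted by the fixed-effects estimator, while the three interaction terms are still estimated.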
Mislav Sagovac

Join Date: Jul 2015

Posts: 39
#7

24 Dec 2015, 14:14

Now, I understand the code.

Thank you very much one more time. You helped me a lot.
Mislav Sagovac

Join Date: Jul 2015

Posts: 39
#8

26 Dec 2015, 06:31

Clyde, my final code looks like this:

xtreg i treatment i.reportyear treatment##t L.(cf nva dr) dr if reportyear > 2010 , fe

where i is investment/capital, reportyear takes the values 2011, 2012, 2013, 2014, treatment is equal to 1 if the firm successfully finished the procedure, t is equal to 1 for the post-treatment period (2013, 2014) and 0 for the pre-treatment period (2011, 2012), and L.(cf nva dr) and dr are controls.

Here are results:
. xtreg i treatment i.reportyear treatment##t L.(cf nva dr) dr if reportyear > 2010 , fe
note: 2014.reportyear omitted because of collinearity
note: 1.treatment omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =       744
Group variable: id                              Number of groups   =       186

R-sq:  within  = 0.0208                         Obs per group: min =         4
       between = 0.0029                                        avg =       4.0
       overall = 0.0037                                        max =         4

                                                F(9,549)           =      1.29
corr(u_i, Xb)  = -0.2601                        Prob > F           =    0.2365

------------------------------------------------------------------------------
           i |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |   -.049018   .0233196    -2.10   0.036    -.0948246   -.0032114
             |
  reportyear |
       2012  |  -.0092632   .0161683    -0.57   0.567    -.0410224    .0224961
       2013  |  -.0012836   .0169428    -0.08   0.940    -.0345644    .0319971
       2014  |          0  (omitted)
             |
 1.treatment |          0  (omitted)
         1.t |  -.0421396   .0237827    -1.77   0.077    -.0888558    .0045765
             |
 treatment#t |
        1 1  |   .0307058   .0263401     1.17   0.244    -.0210339    .0824454
             |
         cfl |
         L1. |    .027224   .0441681     0.62   0.538    -.0595352    .1139833
             |
         nva |
         L1. |  -.0309129   .0334381    -0.92   0.356     -.096595    .0347693
             |
          dr |
         L1. |   .1198437   .1755711     0.68   0.495    -.2250297    .4647171
         --. |    .245168   .1784219     1.37   0.170    -.1053053    .5956412
       _cons |   .0497327   .0226776     2.19   0.029     .0051873    .0942782
-------------+----------------------------------------------------------------
     sigma_u |  .07876878
     sigma_e |  .15323989
         rho |  .20899802   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(185, 549) =     0.90            Prob > F = 0.8019

I am confused about the omitted 2014. Why would 2014 be omitted?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

26 Dec 2015, 08:45

Mislav:
- 2014 was omitted due to collinearity (no particular wonder; this often happens in panel data analysis);
- your R-squared values are dramatically low, and the F-test at the foot of the output table tells you that your -xtreg, fe- specification is not better than a pooled regression;
- please, as per FAQ, post what you typed and what Stata gave you back using code delimiters. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Mislav Sagovac

Join Date: Jul 2015

Posts: 39
#10

26 Dec 2015, 09:04

Carlo, I am not sure if my code is right. I use DID estimator and use more than 2 time periods. I know 2014 is omitted due to collinearity, but which 2 variables are collinear, 2014 and ??

I am also confused because I am not getting the same answer when I use the -diff- command.

How would I use pooled regression when I have 4 periods?
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#11

26 Dec 2015, 09:44

2014.reportyear is collinear with t and 2013.reportyear. This is fine--don't worry about it.
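The collinearity can be checked directly. A minimal sketch on a four-row toy dataset, where y2013 and y2014 are hypothetical names for the year indicators that factor-variable notation creates behind the scenes:

```stata
* Toy data (hypothetical): one row per year, 2011-2014
clear
set obs 4
generate reportyear = 2010 + _n
generate byte t = inrange(reportyear, 2013, 2014)   // post-treatment era
generate byte y2013 = (reportyear == 2013)
generate byte y2014 = (reportyear == 2014)

* t is exactly the sum of the two post-period year indicators
assert t == y2013 + y2014
```

Since t can be reproduced exactly from the 2013 and 2014 indicators, one of the three has to be dropped, and which one Stata drops does not change the fitted model.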

-diff- is a user written command. The help file for diff explains that it uses pooled estimation, not panel estimation. That is why its results don't resemble what you are getting.
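To see pooled and panel estimation of the same DID term next to each other, here is a minimal sketch on simulated data. It does not reproduce the internals of -diff-; it only contrasts a plain pooled OLS fit with -xtreg, fe-, and all variable names and simulated values are hypothetical.

```stata
* Simulated panel (hypothetical data): 200 firms observed 2011-2014
clear
set seed 2015
set obs 200
generate id = _n
generate byte treatment = (id <= 100)
generate u = rnormal()                      // firm-level effect
expand 4
bysort id: generate year = 2010 + _n
generate byte t = (year >= 2013)            // post-treatment era
generate outcome = 0.02*treatment*t + 0.1*u + rnormal(0, 0.15)
xtset id year

* Pooled OLS DID (ignores the panel structure)
regress outcome i.treatment##i.t
scalar b_pooled = _b[1.treatment#1.t]

* Fixed-effects panel DID with year dummies
xtreg outcome i.treatment##i.t i.year, fe
scalar b_fe = _b[1.treatment#1.t]

display "pooled: " b_pooled "   fixed effects: " b_fe
```

With real data, differences between the two are to be expected once the panel is unbalanced or time-varying covariates are included, and the standard errors are computed differently in any case.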
Mislav Sagovac

Join Date: Jul 2015

Posts: 39
#12

27 Dec 2015, 06:59

Thank you one more time. I suppose this is my baseline specification.

P.S. I have contacted the author of the diff code, and he said I should include time dummies in the cov() option. But I won't use it in the end, because I'm not sure how it works. Also, it is not possible to use time lags in the cov() option (I don't know why).
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#13

27 Dec 2015, 13:17

Mislav, in future posts, please put all commands and results between CODE delimiters, as described in FAQ 12.

Steve Samuels
Statistical Consulting

Stata 14.2