Dear Stata experts,
I am writing my thesis using firm-level panel data ranging from 2007 to 2016.
Here is a simplified representation of my dataset.
The firms fall into 24 industry categories.
Code:
year  id  industry  growth  X1    X2   recession
2007  1   10        0.22    0.42  112  0
2008  1   10        0.25    0.22  132  0
2009  1   10        0.21    0.65  128  0
2010  1   10        0.28    0.56  122  0
2011  1   10        0.19    0.47  128  1
2012  1   10        0.18    0.32  129  1
2013  1   10        0.18    0.65  132  1
2014  1   10        0.16    0.55  127  1
2015  1   10        0.19    0.45  122  1
2016  1   10        0.18    0.42  128  1
2007  2   11        0.22    0.21  501  0
2008  2   11        0.24    0.22  499  0
2009  2   11        0.29    0.24  489  0
2010  2   11        0.22    0.26  468  0
2011  2   11        0.24    0.20  496  1
2012  2   11        0.02    0.27  497  1
2013  2   11        0.02    0.18  501  1
2014  2   11        0.10    0.19  458  1
2015  2   11        0.13    0.21  456  1
2016  2   11        0.21    0.22  432  1
My data indicate that 2011 to 2016 was a recession period, so I created a recession dummy (1 for 2011 to 2016, 0 otherwise).
The purpose of my study is to determine whether the explanatory variable X2 has a significant positive or negative effect on firm growth, especially during the recession period.
Therefore, I performed a DID (difference-in-differences) estimation using an interaction between X2 and the recession dummy within a panel fixed-effects model.
I created the interaction term 'X2_recession' by multiplying X2 by the recession dummy, and I also generated industry and year dummy variables.
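A minimal sketch of this variable construction is below. It assumes the variable names shown in the data excerpt (id, year, industry, X2, recession) and dummy stubs industry_ and year_ that match the wildcards in the estimation commands. The exact commands are an illustrative assumption, not a record of what was actually run.

```stata
* Sketch only: variable names follow the data excerpt; the stub names
* industry_ and year_ are assumed to match the wildcards used later.
xtset id year

* Interaction between X2 and the recession dummy
generate X2_recession = X2 * recession

* One dummy per category: generate() creates industry_1, industry_2, ...
* and year_1, year_2, ... in the sorted order of the category values
tabulate industry, generate(industry_)
tabulate year, generate(year_)
```

Note that the numeric suffixes created by generate() follow the sort order of the categories, so industry_1 refers to the lowest industry code rather than to an industry coded 1.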
I estimated two models, one with a recession dummy, and the other with individual year dummies.
I understand that industry dummies are time-invariant in most cases and are therefore not estimated under the fixed-effects model, but I still included them because I found some studies using firm-level fixed-effects models that include industry dummies.
The following is the code I used. I got coefficients for the industry dummies, although none of them were significant.
Code:
* Model 1: recession dummy
xtreg growth X1 X2 X2_recession recession industry_*, fe
* Model 2: individual year dummies instead of the recession dummy
xtreg growth X1 X2 X2_recession year_* industry_*, fe
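Written out, the first command corresponds roughly to the equation below. The symbols are notation introduced here for illustration only: $R_t$ is the recession dummy, $D^k_{it}$ is the dummy for industry $k$, $\alpha_i$ is the firm fixed effect, and $\beta_3$ is the additional effect of X2 during the recession. Terms that Stata drops for collinearity are not singled out.

```latex
\[
growth_{it} = \beta_1 X1_{it} + \beta_2 X2_{it}
  + \beta_3 \,(X2_{it} \times R_t) + \gamma R_t
  + \sum_{k} \lambda_k D^{k}_{it} + \alpha_i + \varepsilon_{it}
\]
```

In the second command, the term $\gamma R_t$ is replaced by a set of year dummies.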
I presented the results to my dissertation committee and was asked to analyze the interaction term for each industry separately.
Therefore, I am trying to add three-way interaction terms 'X2_recession_industry*' among X2, the recession dummy, and the industry dummies.
I multiplied X2 by each of the 24 industry dummies, creating 'X2_industry*', and then multiplied each of these by the recession dummy, creating 'X2_recession_industry*'.
(Therefore, I have X2_industry1, X2_industry2, X2_industry3, X2_industry4, X2_industry5, X2_industry6, ... , X2_industry24
and X2_recession_industry1, X2_recession_industry2, X2_recession_industry3, X2_recession_industry4, X2_recession_industry5, X2_recession_industry6, ... , X2_recession_industry24.)
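The construction of these 48 variables can be sketched with a loop like the one below. It assumes the industry dummies are named industry_1 through industry_24, which is a guess based on the wildcard industry_* and may not match the actual variable names.

```stata
* Sketch only: assumes dummies industry_1 ... industry_24 already exist
forvalues k = 1/24 {
    * X2 slope specific to industry k
    generate X2_industry`k' = X2 * industry_`k'
    * additional X2 slope for industry k during the recession years
    generate X2_recession_industry`k' = X2_industry`k' * recession
}
```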
The following are the models I am trying to estimate. All 24 industry categories are included for both X2_industry* and X2_recession_industry*.
Code:
xtreg growth X1 X2_industry* X2_recession_industry* recession industry_*, fe
xtreg growth X1 X2_industry* X2_recession_industry* year_* industry_*, fe
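As a rough sketch, the first of these commands can be written as the equation below. The notation is introduced here for illustration only: $R_t$ is the recession dummy, $D^k_{it}$ is the dummy for industry $k$, and $\alpha_i$ is the firm fixed effect. Because X2 does not enter on its own, $\delta_k$ reads as the X2 slope for industry $k$ outside the recession, and $\theta_k$ as the change in that slope during the recession.

```latex
\[
growth_{it} = \beta_1 X1_{it}
  + \sum_{k=1}^{24} \delta_k \,(X2_{it} \times D^{k}_{it})
  + \sum_{k=1}^{24} \theta_k \,(X2_{it} \times R_t \times D^{k}_{it})
  + \gamma R_t + \sum_{k} \lambda_k D^{k}_{it}
  + \alpha_i + \varepsilon_{it}
\]
```

In the second command, $\gamma R_t$ is again replaced by year dummies.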
Most examples of DID estimation I have found deal with binary variables only, so I think my case may be different.
Here are some questions about my models.
1) (main question) In my revised model, is it okay to include all 24 interaction terms for X2_industry* and X2_recession_industry*, with no reference level?
When I estimated the model with all 24 interaction variables, Stata gave me coefficients for all of the interaction terms.
However, my understanding is that only n-1 interaction terms should be included when the interaction terms are products of binary variables.
If I can include all 24 interaction terms, how do I interpret the results without a base level?
2) Is it wise to include industry dummies just because preceding studies included them (and also because they are not dropped for collinearity anyway)?
I still have doubts about what I have done.
3) Should I include interaction terms for all explanatory variables, including X1, because a recession is a macroeconomic shock that might affect all economic variables?
The additional effect of X1 during the recession is not of interest to me, and I don't want my model to become too complicated.
I found some literature that includes interaction terms between a macro shock and all explanatory variables, but I want to know whether this is necessary for the model's integrity.
Any comments will be greatly appreciated.
Thank you in advance.
Hyeseon