Dear Statalist forum,
Hi everyone! This is my first post here and I am relatively new to Stata. I have never used this forum before, but I am desperately in need of answers to some points of confusion in my research. After spending half a day searching I am only more confused, so maybe you could give me some guidance.
I am writing my final year thesis on the following:
I am examining the effect of family control during the 2008-2009 crisis on several dependent variables (investment, financing, employees).
- 1 country (so no country fixed effects)
- 383 firms (with Firm ID)
- 4 years (crisis dummy = 0 for 2006 and 2007, = 1 for 2008 and 2009), so around 1,500 observations
- 33 industries
- 5 or 6 control variables
- family control dummy * crisis period dummy and non-family control dummy * crisis period dummy (to measure differences between pre-crisis and crisis)
I use a panel data regression with firm fixed effects and I use the following commands before my regression:
gen crisis = 0
replace crisis = 1 if Year == 2008
replace crisis = 1 if Year == 2009
gen familycontrol_crisis = familycontrol * crisis
gen nonfamilycontrol_crisis = nonfamilycontrol * crisis
gen industry_year = Industry * Year
quietly tabulate industry_year, generate(industry_year)
tsset FirmID Year
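In case it makes the setup easier to check, here is a more compact sketch of the same steps. It is only an illustration and rests on assumptions: that familycontrol and nonfamilycontrol are 0/1 dummies, and that grouping by industry and year with egen is an acceptable substitute for the Industry * Year product (it is not what I originally ran).

```stata
* Crisis indicator: 1 for 2008 and 2009, 0 for other years, missing if Year is missing
generate crisis = inlist(Year, 2008, 2009) if !missing(Year)

* Interactions of the two control-type dummies with the crisis indicator
generate familycontrol_crisis    = familycontrol    * crisis
generate nonfamilycontrol_crisis = nonfamilycontrol * crisis

* One integer code per industry-year combination
* (assumption: used here instead of the Industry * Year product)
egen industry_year = group(Industry Year)

* Declare the panel structure: firm identifier and time variable
xtset FirmID Year
```

With a single coded variable like this, factor-variable notation (i.industry_year) could stand in for the separately generated dummy variables in the regression command.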
I hope the above commands and dummies are correct. If so, I have the following questions:
a. I use Industry*Year because my industry variable does not change over the years. However, this creates 132 dummy variables and my total number of observations is not that large. I was taught that you need around 10 observations per independent variable. Is this true? Does this matter? I am not really interested in the coefficients of the industry dummies, but would including these dummies give me wrong results?
b. If so, can I exclude the industry dummy and assume that the industry effect is captured in the firm fixed effect (Firm ID)?
c. I use the following command for my regression: xtreg dep variable familycontrol_crisis nonfamilycontrol_crisis other independent variables industry_year_full_1-industry_year_full_132, fe
This gives me several R^2 values (within, between and overall), but no adjusted R^2. If I type the command to display the adjusted R^2, it shows a negative value. Even when I exclude all industry effects it stays negative, although I am sure that my model is correct (I follow another research paper). Is this bad? Which R^2 should I focus on?
d. I am interested in the difference between the coefficients on familycontrol_crisis and nonfamilycontrol_crisis, so after the xtreg, fe regression I type: test familycontrol_crisis=nonfamilycontrol_crisis. Is this the correct method to test for this?
e. I also experimented by reducing the number of industries, using only year effects, etc. This all gives me somewhat different results, but I have no idea which statistic to focus on to say "OK, this is the model I go with". Do I need to focus on the R^2, the coefficients, the F-statistics or something else?
f. I also use some log variables, for instance Log(Employees). This gives me a very high overall R^2. How can this be? Is this normal?
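To make question d concrete, this is roughly the sequence I have in mind. It is a sketch only: depvar, control1 and control2 are hypothetical placeholder names rather than my actual variables, and the lincom line is an addition I have not run myself.

```stata
* depvar, control1 and control2 are hypothetical placeholders
xtreg depvar familycontrol_crisis nonfamilycontrol_crisis control1 control2 i.industry_year, fe

* Wald test of the null that the two interaction coefficients are equal
test familycontrol_crisis = nonfamilycontrol_crisis

* The same comparison reported as an estimated difference,
* with its standard error and confidence interval
lincom familycontrol_crisis - nonfamilycontrol_crisis
```

As far as I understand, test reports only the test statistic and p-value, while lincom also shows the size and sign of the difference.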
As you can probably see, I am not that experienced in statistics. However, I am very much struggling with these questions so any help would be highly appreciated. Many thanks in advance.
Best regards,
Daniel