Hello everyone,
I'm running a probit regression with a binary dependent variable that reflects each firm's choice of ownership structure. My data are cross-sectional and cover 5 countries, but the observations (firms) are not evenly distributed: of 4,021 firms in total, 90% are from one country.
I have 4 independent variables: 3 are continuous variables measured at the firm level, and 1 is an index measured at the country level, ranging from 0 to 10.
My professor suggested that I use clustered standard errors, but with this method I do not get the Wald chi2 and Prob > chi2 that I would use to assess goodness of fit. Moreover, when I include country fixed effects, one of the countries is omitted by Stata due to collinearity.
Below are the commands and output for both the probit and the heteroscedastic probit models that I used:
Code:
. xi: probit ownstruct_3 m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc i.country, vce (cluster country)
i.country         _Icountry_1-5       (_Icountry_1 for country==France omitted)
note: _Icountry_5 omitted because of collinearity

Iteration 0:   log pseudolikelihood = -2443.1828
Iteration 1:   log pseudolikelihood = -2324.7114
Iteration 2:   log pseudolikelihood = -2323.5639
Iteration 3:   log pseudolikelihood = -2323.5623
Iteration 4:   log pseudolikelihood = -2323.5623

Probit regression                               Number of obs     =      4,021
                                                Wald chi2(2)      =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -2323.5623               Pseudo R2         =     0.0490

                                (Std. Err. adjusted for 5 clusters in country)
---------------------------------------------------------------------------------
                |               Robust
    ownstruct_3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
  m_totalassets |   .0460297   .0303607     1.52   0.129    -.0134761    .1055355
 m_acc_capex_3y |   .0609838   .0180626     3.38   0.001     .0255817    .0963859
m_acc_ebitda_3y |  -.0691417   .0278767    -2.48   0.013     -.123779   -.0145044
  sip_index_inc |  -.5856417   .0836959    -7.00   0.000    -.7496827   -.4216007
    _Icountry_2 |   .0781847   .1147213     0.68   0.496    -.1466649    .3030343
    _Icountry_3 |  -.5212683    .019825   -26.29   0.000    -.5601246   -.4824121
    _Icountry_4 |   .2262234   .1854274     1.22   0.222    -.1372077    .5896545
    _Icountry_5 |          0  (omitted)
          _cons |    3.23095   .4177767     7.73   0.000     2.412123    4.049778
---------------------------------------------------------------------------------

. margins, dydx(*)

Average marginal effects                        Number of obs     =      4,021
Model VCE    : Robust

Expression   : Pr(ownstruct_3), predict()
dy/dx w.r.t. : m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc _Icountry_2 _Icountry_3 _Icountry_4 _Icountry_5

---------------------------------------------------------------------------------
                |            Delta-method
                |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
  m_totalassets |   .0151098   .0097796     1.55   0.122    -.0040578    .0342775
 m_acc_capex_3y |   .0200187   .0056936     3.52   0.000     .0088595    .0311779
m_acc_ebitda_3y |  -.0226966   .0093473    -2.43   0.015    -.0410169   -.0043764
  sip_index_inc |  -.1922443    .025003    -7.69   0.000    -.2412493   -.1432393
    _Icountry_2 |   .0256651   .0379884     0.68   0.499    -.0487908     .100121
    _Icountry_3 |   -.171113   .0086079   -19.88   0.000     -.187984   -.1542419
    _Icountry_4 |   .0742607   .0618231     1.20   0.230    -.0469103    .1954317
    _Icountry_5 |          0  (omitted)
---------------------------------------------------------------------------------

. xi: hetprobit ownstruct_3 m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc i.country, het(m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc) vce (cluster country)
i.country         _Icountry_1-5       (_Icountry_1 for country==France omitted)
note: _Icountry_5 omitted because of collinearity

Fitting full model:

Iteration 0:   log pseudolikelihood = -2787.1448
Iteration 1:   log pseudolikelihood = -2326.2546  (not concave)
Iteration 2:   log pseudolikelihood = -2308.9978
Iteration 3:   log pseudolikelihood = -2299.8337  (not concave)
Iteration 4:   log pseudolikelihood = -2294.9131
Iteration 5:   log pseudolikelihood = -2292.7566  (not concave)
Iteration 6:   log pseudolikelihood = -2292.4218
Iteration 7:   log pseudolikelihood = -2291.7139
Iteration 8:   log pseudolikelihood =  -2290.557
Iteration 9:   log pseudolikelihood = -2289.6984
Iteration 10:  log pseudolikelihood = -2288.2845
Iteration 11:  log pseudolikelihood = -2287.2823
Iteration 12:  log pseudolikelihood = -2285.9255
Iteration 13:  log pseudolikelihood = -2285.3482
Iteration 14:  log pseudolikelihood = -2284.5698
Iteration 15:  log pseudolikelihood =  -2283.603
Iteration 16:  log pseudolikelihood = -2283.1531
Iteration 17:  log pseudolikelihood = -2282.6975
Iteration 18:  log pseudolikelihood = -2282.5959
Iteration 19:  log pseudolikelihood = -2282.2646
Iteration 20:  log pseudolikelihood = -2282.1775
Iteration 21:  log pseudolikelihood = -2282.0261
Iteration 22:  log pseudolikelihood = -2281.9917
Iteration 23:  log pseudolikelihood = -2281.9588
Iteration 24:  log pseudolikelihood = -2281.9321
Iteration 25:  log pseudolikelihood = -2281.9247
Iteration 26:  log pseudolikelihood =  -2281.923
Iteration 27:  log pseudolikelihood = -2281.9227
Iteration 28:  log pseudolikelihood = -2281.9227

Heteroskedastic probit model                    Number of obs     =      4,021
                                                Zero outcomes     =      2,830
                                                Nonzero outcomes  =      1,191

                                                Wald chi2(0)      =          .
Log pseudolikelihood = -2281.923                Prob > chi2       =          .

                                (Std. Err. adjusted for 5 clusters in country)
---------------------------------------------------------------------------------
                |               Robust
    ownstruct_3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
ownstruct_3     |
  m_totalassets |   13.46958   4.512941     2.98   0.003     4.624374    22.31478
 m_acc_capex_3y |   7.411253   3.778995     1.96   0.050      .004559    14.81795
m_acc_ebitda_3y |  -3.111114   12.65374    -0.25   0.806    -27.91198    21.68976
  sip_index_inc |  -213.5485   230.1645    -0.93   0.354    -664.6626    237.5656
    _Icountry_2 |   -61.3819    71.2403    -0.86   0.389    -201.0103    78.24653
    _Icountry_3 |   27.19912   50.43485     0.54   0.590    -71.65136    126.0496
    _Icountry_4 |   -33.3174   56.67617    -0.59   0.557    -144.4006    77.76585
    _Icountry_5 |          0  (omitted)
          _cons |   1132.197   1224.025     0.92   0.355    -1266.848    3531.241
----------------+----------------------------------------------------------------
lnsigma2        |
  m_totalassets |   .1319477   .1229202     1.07   0.283    -.1089716    .3728669
 m_acc_capex_3y |   .0160137   .0874773     0.18   0.855    -.1554387    .1874661
m_acc_ebitda_3y |  -.1268203   .3350114    -0.38   0.705    -.7834306      .52979
  sip_index_inc |   .7552447   .1390858     5.43   0.000     .4826416    1.027848
---------------------------------------------------------------------------------
Wald test of lnsigma2=0:             chi2(4) = 21318.05   Prob > chi2 = 0.0000

. margins, dydx(*)

Average marginal effects                        Number of obs     =      4,021
Model VCE    : Robust

Expression   : Pr(ownstruct_3), predict()
dy/dx w.r.t. : m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc _Icountry_2 _Icountry_3 _Icountry_4 _Icountry_5

---------------------------------------------------------------------------------
                |            Delta-method
                |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
  m_totalassets |   .0751858   .0482783     1.56   0.119    -.0194379    .1698095
 m_acc_capex_3y |   .0317486   .0207763     1.53   0.126    -.0089721    .0724694
m_acc_ebitda_3y |  -.0337452   .0124208    -2.72   0.007    -.0580896   -.0094009
  sip_index_inc |  -.7079649   .3023874    -2.34   0.019    -1.300633   -.1152966
    _Icountry_2 |  -.2404023   .1224559    -1.96   0.050    -.4804114   -.0003932
    _Icountry_3 |   .1065254   .1259398     0.85   0.398     -.140312    .3533628
    _Icountry_4 |  -.1304876   .1335276    -0.98   0.328    -.3921968    .1312216
    _Icountry_5 |          0  (omitted)
---------------------------------------------------------------------------------
My questions are:
1. Given that my sample is heavily concentrated in one country, should I still use clustered standard errors? If I replace the clustered standard errors with vce(robust), all the independent variables are statistically significant, but with clustered standard errors some of them are insignificant.
2. Is the way I included country fixed effects correct? I read somewhere that in a probit model we cannot include fixed effects by creating indicator dummies, as is commonly done in OLS regression.
3. When I use country fixed effects, Stata omits one of the country dummies due to collinearity. Can I still use this output even though one dummy is omitted?
4. Is my hetprobit command correct? I'm not sure which independent variables I should use to model the variance; here I put all of my independent variables inside "het( )".
5. Some of the iterations in my hetprobit output are marked "not concave". What does this mean? Can I still use this output?
6. Is it true that Wald chi2 and Prob > chi2 are missing because I use clustered standard errors? If so, is there another way to measure the goodness of fit of this model?
I hope somebody can help me.
Thanks in advance.