Probit, Heteroscedastic Probit, Clustered Standard Errors, Country Fixed Effects

Stephani Tatengkeng

Join Date: Jun 2018
Posts: 1

12 Jul 2018, 03:11

Hello everyone,

I'm running a probit regression with a binary dependent variable that reflects each firm's choice of ownership structure. My data are cross-sectional and cover 5 countries, but the observations (firms) are unevenly distributed: of the 4,021 firms in total, 90% are from one particular country.

I have 4 independent variables: 3 are continuous variables measured at the firm level, and 1 is an index measured at the country level that ranges from 0 to 10.

My professor suggested that I use clustered standard errors, but with this method I do not get the Wald chi2 and Prob > chi2 statistics to measure the goodness of fit. Moreover, when I include country fixed effects, Stata omits one of the countries due to collinearity.

Below are the commands and output for both the probit and the heteroscedastic probit models that I used:

Code:

. xi: probit ownstruct_3 m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc i.country, vce (
> cluster country)
i.country         _Icountry_1-5       (_Icountry_1 for country==France omitted)

note: _Icountry_5 omitted because of collinearity
Iteration 0:   log pseudolikelihood = -2443.1828  
Iteration 1:   log pseudolikelihood = -2324.7114  
Iteration 2:   log pseudolikelihood = -2323.5639  
Iteration 3:   log pseudolikelihood = -2323.5623  
Iteration 4:   log pseudolikelihood = -2323.5623  

Probit regression                               Number of obs     =      4,021
                                                Wald chi2(2)      =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -2323.5623               Pseudo R2         =     0.0490

                                   (Std. Err. adjusted for 5 clusters in country)
---------------------------------------------------------------------------------
                |               Robust
    ownstruct_3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
  m_totalassets |   .0460297   .0303607     1.52   0.129    -.0134761    .1055355
 m_acc_capex_3y |   .0609838   .0180626     3.38   0.001     .0255817    .0963859
m_acc_ebitda_3y |  -.0691417   .0278767    -2.48   0.013     -.123779   -.0145044
  sip_index_inc |  -.5856417   .0836959    -7.00   0.000    -.7496827   -.4216007
    _Icountry_2 |   .0781847   .1147213     0.68   0.496    -.1466649    .3030343
    _Icountry_3 |  -.5212683    .019825   -26.29   0.000    -.5601246   -.4824121
    _Icountry_4 |   .2262234   .1854274     1.22   0.222    -.1372077    .5896545
    _Icountry_5 |          0  (omitted)
          _cons |    3.23095   .4177767     7.73   0.000     2.412123    4.049778
---------------------------------------------------------------------------------

. margins, dydx(*)

Average marginal effects                        Number of obs     =      4,021
Model VCE    : Robust

Expression   : Pr(ownstruct_3), predict()
dy/dx w.r.t. : m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc _Icountry_2 _Icountry_3
               _Icountry_4 _Icountry_5

---------------------------------------------------------------------------------
                |            Delta-method
                |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
  m_totalassets |   .0151098   .0097796     1.55   0.122    -.0040578    .0342775
 m_acc_capex_3y |   .0200187   .0056936     3.52   0.000     .0088595    .0311779
m_acc_ebitda_3y |  -.0226966   .0093473    -2.43   0.015    -.0410169   -.0043764
  sip_index_inc |  -.1922443    .025003    -7.69   0.000    -.2412493   -.1432393
    _Icountry_2 |   .0256651   .0379884     0.68   0.499    -.0487908     .100121
    _Icountry_3 |   -.171113   .0086079   -19.88   0.000     -.187984   -.1542419
    _Icountry_4 |   .0742607   .0618231     1.20   0.230    -.0469103    .1954317
    _Icountry_5 |          0  (omitted)
---------------------------------------------------------------------------------

. xi: hetprobit ownstruct_3 m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc i.country, he
> t(m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc) vce (cluster country)
i.country         _Icountry_1-5       (_Icountry_1 for country==France omitted)
note: _Icountry_5 omitted because of collinearity


Fitting full model:

Iteration 0:   log pseudolikelihood = -2787.1448  
Iteration 1:   log pseudolikelihood = -2326.2546  (not concave)
Iteration 2:   log pseudolikelihood = -2308.9978  
Iteration 3:   log pseudolikelihood = -2299.8337  (not concave)
Iteration 4:   log pseudolikelihood = -2294.9131  
Iteration 5:   log pseudolikelihood = -2292.7566  (not concave)
Iteration 6:   log pseudolikelihood = -2292.4218  
Iteration 7:   log pseudolikelihood = -2291.7139  
Iteration 8:   log pseudolikelihood =  -2290.557  
Iteration 9:   log pseudolikelihood = -2289.6984  
Iteration 10:  log pseudolikelihood = -2288.2845  
Iteration 11:  log pseudolikelihood = -2287.2823  
Iteration 12:  log pseudolikelihood = -2285.9255  
Iteration 13:  log pseudolikelihood = -2285.3482  
Iteration 14:  log pseudolikelihood = -2284.5698  
Iteration 15:  log pseudolikelihood =  -2283.603  
Iteration 16:  log pseudolikelihood = -2283.1531  
Iteration 17:  log pseudolikelihood = -2282.6975  
Iteration 18:  log pseudolikelihood = -2282.5959  
Iteration 19:  log pseudolikelihood = -2282.2646  
Iteration 20:  log pseudolikelihood = -2282.1775  
Iteration 21:  log pseudolikelihood = -2282.0261  
Iteration 22:  log pseudolikelihood = -2281.9917  
Iteration 23:  log pseudolikelihood = -2281.9588  
Iteration 24:  log pseudolikelihood = -2281.9321  
Iteration 25:  log pseudolikelihood = -2281.9247  
Iteration 26:  log pseudolikelihood =  -2281.923  
Iteration 27:  log pseudolikelihood = -2281.9227  
Iteration 28:  log pseudolikelihood = -2281.9227  

Heteroskedastic probit model                    Number of obs     =      4,021
                                                Zero outcomes     =      2,830
                                                Nonzero outcomes  =      1,191

                                                Wald chi2(0)      =          .
Log pseudolikelihood = -2281.923                Prob > chi2       =          .

                                   (Std. Err. adjusted for 5 clusters in country)
---------------------------------------------------------------------------------
                |               Robust
    ownstruct_3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
ownstruct_3     |
  m_totalassets |   13.46958   4.512941     2.98   0.003     4.624374    22.31478
 m_acc_capex_3y |   7.411253   3.778995     1.96   0.050      .004559    14.81795
m_acc_ebitda_3y |  -3.111114   12.65374    -0.25   0.806    -27.91198    21.68976
  sip_index_inc |  -213.5485   230.1645    -0.93   0.354    -664.6626    237.5656
    _Icountry_2 |   -61.3819    71.2403    -0.86   0.389    -201.0103    78.24653
    _Icountry_3 |   27.19912   50.43485     0.54   0.590    -71.65136    126.0496
    _Icountry_4 |   -33.3174   56.67617    -0.59   0.557    -144.4006    77.76585
    _Icountry_5 |          0  (omitted)
          _cons |   1132.197   1224.025     0.92   0.355    -1266.848    3531.241
----------------+----------------------------------------------------------------
lnsigma2        |
  m_totalassets |   .1319477   .1229202     1.07   0.283    -.1089716    .3728669
 m_acc_capex_3y |   .0160137   .0874773     0.18   0.855    -.1554387    .1874661
m_acc_ebitda_3y |  -.1268203   .3350114    -0.38   0.705    -.7834306      .52979
  sip_index_inc |   .7552447   .1390858     5.43   0.000     .4826416    1.027848
---------------------------------------------------------------------------------
Wald test of lnsigma2=0: chi2(4) = 21318.05               Prob > chi2 = 0.0000

. margins, dydx(*)

Average marginal effects                        Number of obs     =      4,021
Model VCE    : Robust

Expression   : Pr(ownstruct_3), predict()
dy/dx w.r.t. : m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc _Icountry_2 _Icountry_3
               _Icountry_4 _Icountry_5

---------------------------------------------------------------------------------
                |            Delta-method
                |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
  m_totalassets |   .0751858   .0482783     1.56   0.119    -.0194379    .1698095
 m_acc_capex_3y |   .0317486   .0207763     1.53   0.126    -.0089721    .0724694
m_acc_ebitda_3y |  -.0337452   .0124208    -2.72   0.007    -.0580896   -.0094009
  sip_index_inc |  -.7079649   .3023874    -2.34   0.019    -1.300633   -.1152966
    _Icountry_2 |  -.2404023   .1224559    -1.96   0.050    -.4804114   -.0003932
    _Icountry_3 |   .1065254   .1259398     0.85   0.398     -.140312    .3533628
    _Icountry_4 |  -.1304876   .1335276    -0.98   0.328    -.3921968    .1312216
    _Icountry_5 |          0  (omitted)
---------------------------------------------------------------------------------

.

My questions are:

1. Given that my sample is heavily concentrated in one country, should I still use clustered standard errors? If I replace the clustered standard errors with vce(robust), all the independent variables are statistically significant, but with clustered standard errors some of them are insignificant.

2. Is the way I included country fixed effects correct? I read somewhere that in a probit model we cannot include fixed effects by creating indicator dummies, as we commonly do in OLS regression.

3. When I use country fixed effects, Stata omits one of the countries due to collinearity. Can I still use this output even though Stata omits one dummy?

4. Is the command that I used for hetprob correct? I'm not sure which independent variables I should use to model the variance; here I put all of my independent variables inside "het( )".

5. Some of the iterations in my hetprob output are marked "not concave". What does that mean? Can I still use this output?

6. Is it true that Wald chi2 and Prob > chi2 are missing because I use clustered standard errors? If so, is there any method to measure the goodness of fit for this model?

I hope somebody can help me.

Thanks in advance.

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

13 Jul 2018, 11:19

You'll generally increase your chances of a useful answer by offering a shorter, more focused question. Please read the FAQ on asking questions.
1. xi is no longer needed unless you explicitly want to create dummies. That things become insignificant is no reason to choose a particular estimator or set of options.
2. You're probably better off using xtlogit or xtprobit, which are built for your model, instead of probit.
3. Any time you convert a variable into a series of dummies, you must omit one of the dummies; otherwise, the sum of the dummies is collinear with the intercept. But with 5 countries, you only get 3 dummies estimated instead of 4. You need to figure out what is going on with the fifth country. How many usable observations do you have for that country? Do you have any variation in the dependent variable within the usable observations on that country?
5. Having non-concave iterations in a maximum likelihood estimation is not a problem if you don't have them at the end and you do get a maximum. You can ignore them.
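
The checks in points 1 and 3 could be sketched roughly as follows, assuming the dataset from post #1 is in memory. The xi note (country==France) suggests that country may be stored as a string, in which case factor-variable notation needs a numeric copy; country_id is a hypothetical name for that copy. One possibility worth checking, which the output alone does not confirm, is that sip_index_inc is constant within each country: a country-level variable is a linear combination of the constant and the country dummies, so Stata would have to drop one more dummy.

```stata
* Sketch only: assumes the data from post #1 are in memory and that this
* is run from a do-file (the /// line continuation needs one).
* country_id is a hypothetical name; skip encode if country is already numeric.
encode country, generate(country_id)

* Observations and variation in the dependent variable, by country
tabulate country_id ownstruct_3

* Does the country-level index vary within any country?
* A standard deviation of 0 in every country means it is collinear
* with the full set of country dummies plus the constant.
bysort country_id: summarize sip_index_inc

* Factor-variable notation replaces the xi: prefix
probit ownstruct_3 m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc ///
    i.country_id, vce(cluster country_id)
```

If sip_index_inc turns out to be constant within country, its coefficient and the country dummies cannot all be identified together, so either the index or the full set of dummies has to go.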

I'd worry a lot about using financial outcomes to explain ownership structure - ownership structure should influence financial outcomes.