  • Probit, Heteroscedastic Probit, Clustered Standard Errors, Country Fixed Effects

    Hello everyone,

    I'm running a probit regression with a binary dependent variable that reflects the choice of ownership structure for each firm. My data are cross-sectional and cover 5 countries, but the firms are not evenly distributed across them: of 4,021 firms in total, 90% are from one country.

    I have 4 independent variables: 3 are continuous variables measured at the firm level, and 1 is an index measured at the country level that ranges from 0 to 10.

    My professor suggested that I use clustered standard errors, but with this method I do not get the Wald chi2 and Prob > chi2 that I would use to assess goodness of fit. Moreover, when I include country fixed effects, one of the countries is omitted by Stata because of collinearity.

    Below are the commands and output for both the probit and the heteroscedastic probit models that I used:

    Code:
    . xi: probit ownstruct_3 m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc i.country, vce (
    > cluster country)
    i.country         _Icountry_1-5       (_Icountry_1 for country==France omitted)
    
    note: _Icountry_5 omitted because of collinearity
    Iteration 0:   log pseudolikelihood = -2443.1828  
    Iteration 1:   log pseudolikelihood = -2324.7114  
    Iteration 2:   log pseudolikelihood = -2323.5639  
    Iteration 3:   log pseudolikelihood = -2323.5623  
    Iteration 4:   log pseudolikelihood = -2323.5623  
    
    Probit regression                               Number of obs     =      4,021
                                                    Wald chi2(2)      =          .
                                                    Prob > chi2       =          .
    Log pseudolikelihood = -2323.5623               Pseudo R2         =     0.0490
    
                                       (Std. Err. adjusted for 5 clusters in country)
    ---------------------------------------------------------------------------------
                    |               Robust
        ownstruct_3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
      m_totalassets |   .0460297   .0303607     1.52   0.129    -.0134761    .1055355
     m_acc_capex_3y |   .0609838   .0180626     3.38   0.001     .0255817    .0963859
    m_acc_ebitda_3y |  -.0691417   .0278767    -2.48   0.013     -.123779   -.0145044
      sip_index_inc |  -.5856417   .0836959    -7.00   0.000    -.7496827   -.4216007
        _Icountry_2 |   .0781847   .1147213     0.68   0.496    -.1466649    .3030343
        _Icountry_3 |  -.5212683    .019825   -26.29   0.000    -.5601246   -.4824121
        _Icountry_4 |   .2262234   .1854274     1.22   0.222    -.1372077    .5896545
        _Icountry_5 |          0  (omitted)
              _cons |    3.23095   .4177767     7.73   0.000     2.412123    4.049778
    ---------------------------------------------------------------------------------
    
    . margins, dydx(*)
    
    Average marginal effects                        Number of obs     =      4,021
    Model VCE    : Robust
    
    Expression   : Pr(ownstruct_3), predict()
    dy/dx w.r.t. : m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc _Icountry_2 _Icountry_3
                   _Icountry_4 _Icountry_5
    
    ---------------------------------------------------------------------------------
                    |            Delta-method
                    |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
      m_totalassets |   .0151098   .0097796     1.55   0.122    -.0040578    .0342775
     m_acc_capex_3y |   .0200187   .0056936     3.52   0.000     .0088595    .0311779
    m_acc_ebitda_3y |  -.0226966   .0093473    -2.43   0.015    -.0410169   -.0043764
      sip_index_inc |  -.1922443    .025003    -7.69   0.000    -.2412493   -.1432393
        _Icountry_2 |   .0256651   .0379884     0.68   0.499    -.0487908     .100121
        _Icountry_3 |   -.171113   .0086079   -19.88   0.000     -.187984   -.1542419
        _Icountry_4 |   .0742607   .0618231     1.20   0.230    -.0469103    .1954317
        _Icountry_5 |          0  (omitted)
    ---------------------------------------------------------------------------------
    
    . xi: hetprobit ownstruct_3 m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc i.country, he
    > t(m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc) vce (cluster country)
    i.country         _Icountry_1-5       (_Icountry_1 for country==France omitted)
    note: _Icountry_5 omitted because of collinearity
    
    
    Fitting full model:
    
    Iteration 0:   log pseudolikelihood = -2787.1448  
    Iteration 1:   log pseudolikelihood = -2326.2546  (not concave)
    Iteration 2:   log pseudolikelihood = -2308.9978  
    Iteration 3:   log pseudolikelihood = -2299.8337  (not concave)
    Iteration 4:   log pseudolikelihood = -2294.9131  
    Iteration 5:   log pseudolikelihood = -2292.7566  (not concave)
    Iteration 6:   log pseudolikelihood = -2292.4218  
    Iteration 7:   log pseudolikelihood = -2291.7139  
    Iteration 8:   log pseudolikelihood =  -2290.557  
    Iteration 9:   log pseudolikelihood = -2289.6984  
    Iteration 10:  log pseudolikelihood = -2288.2845  
    Iteration 11:  log pseudolikelihood = -2287.2823  
    Iteration 12:  log pseudolikelihood = -2285.9255  
    Iteration 13:  log pseudolikelihood = -2285.3482  
    Iteration 14:  log pseudolikelihood = -2284.5698  
    Iteration 15:  log pseudolikelihood =  -2283.603  
    Iteration 16:  log pseudolikelihood = -2283.1531  
    Iteration 17:  log pseudolikelihood = -2282.6975  
    Iteration 18:  log pseudolikelihood = -2282.5959  
    Iteration 19:  log pseudolikelihood = -2282.2646  
    Iteration 20:  log pseudolikelihood = -2282.1775  
    Iteration 21:  log pseudolikelihood = -2282.0261  
    Iteration 22:  log pseudolikelihood = -2281.9917  
    Iteration 23:  log pseudolikelihood = -2281.9588  
    Iteration 24:  log pseudolikelihood = -2281.9321  
    Iteration 25:  log pseudolikelihood = -2281.9247  
    Iteration 26:  log pseudolikelihood =  -2281.923  
    Iteration 27:  log pseudolikelihood = -2281.9227  
    Iteration 28:  log pseudolikelihood = -2281.9227  
    
    Heteroskedastic probit model                    Number of obs     =      4,021
                                                    Zero outcomes     =      2,830
                                                    Nonzero outcomes  =      1,191
    
                                                    Wald chi2(0)      =          .
    Log pseudolikelihood = -2281.923                Prob > chi2       =          .
    
                                       (Std. Err. adjusted for 5 clusters in country)
    ---------------------------------------------------------------------------------
                    |               Robust
        ownstruct_3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
    ownstruct_3     |
      m_totalassets |   13.46958   4.512941     2.98   0.003     4.624374    22.31478
     m_acc_capex_3y |   7.411253   3.778995     1.96   0.050      .004559    14.81795
    m_acc_ebitda_3y |  -3.111114   12.65374    -0.25   0.806    -27.91198    21.68976
      sip_index_inc |  -213.5485   230.1645    -0.93   0.354    -664.6626    237.5656
        _Icountry_2 |   -61.3819    71.2403    -0.86   0.389    -201.0103    78.24653
        _Icountry_3 |   27.19912   50.43485     0.54   0.590    -71.65136    126.0496
        _Icountry_4 |   -33.3174   56.67617    -0.59   0.557    -144.4006    77.76585
        _Icountry_5 |          0  (omitted)
              _cons |   1132.197   1224.025     0.92   0.355    -1266.848    3531.241
    ----------------+----------------------------------------------------------------
    lnsigma2        |
      m_totalassets |   .1319477   .1229202     1.07   0.283    -.1089716    .3728669
     m_acc_capex_3y |   .0160137   .0874773     0.18   0.855    -.1554387    .1874661
    m_acc_ebitda_3y |  -.1268203   .3350114    -0.38   0.705    -.7834306      .52979
      sip_index_inc |   .7552447   .1390858     5.43   0.000     .4826416    1.027848
    ---------------------------------------------------------------------------------
    Wald test of lnsigma2=0: chi2(4) = 21318.05               Prob > chi2 = 0.0000
    
    . margins, dydx(*)
    
    Average marginal effects                        Number of obs     =      4,021
    Model VCE    : Robust
    
    Expression   : Pr(ownstruct_3), predict()
    dy/dx w.r.t. : m_totalassets m_acc_capex_3y m_acc_ebitda_3y sip_index_inc _Icountry_2 _Icountry_3
                   _Icountry_4 _Icountry_5
    
    ---------------------------------------------------------------------------------
                    |            Delta-method
                    |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
      m_totalassets |   .0751858   .0482783     1.56   0.119    -.0194379    .1698095
     m_acc_capex_3y |   .0317486   .0207763     1.53   0.126    -.0089721    .0724694
    m_acc_ebitda_3y |  -.0337452   .0124208    -2.72   0.007    -.0580896   -.0094009
      sip_index_inc |  -.7079649   .3023874    -2.34   0.019    -1.300633   -.1152966
        _Icountry_2 |  -.2404023   .1224559    -1.96   0.050    -.4804114   -.0003932
        _Icountry_3 |   .1065254   .1259398     0.85   0.398     -.140312    .3533628
        _Icountry_4 |  -.1304876   .1335276    -0.98   0.328    -.3921968    .1312216
        _Icountry_5 |          0  (omitted)
    ---------------------------------------------------------------------------------
    
    .
    My questions are:

    1. Given that my sample is heavily concentrated in one country, should I still use clustered standard errors? If I replace the clustered standard errors with the vce(robust) option, all the independent variables are statistically significant, but with clustered standard errors some of them are insignificant.

    2. Is the method I used to include country fixed effects correct? I read somewhere that in a probit model we cannot include fixed effects by creating indicator dummies, as is commonly done in OLS regression.

    3. When I use country fixed effects, Stata omits one of the countries because of collinearity. Can I still use this output even though one dummy is omitted by Stata?

    4. Is the command I used for hetprobit correct? I'm not sure which independent variables I should use to model the variance; here I put all of my independent variables inside "het( )".

    5. Some of the iterations in my hetprobit output are marked "not concave". What does this mean? Can I still use this output?

    6. Is it true that Wald chi2 and Prob > chi2 are missing because I use clustered standard errors? If so, is there another way to measure the goodness of fit of this model?

    I hope somebody can help me.

    Thanks in advance.


  • #2
    You'll generally increase your chances of a useful answer by offering a shorter, more focused question. Please read the FAQ on asking questions.
    1. xi is no longer needed unless you explicitly want to create dummy variables; Stata's factor-variable notation (i.varname) handles categorical variables directly. That coefficients become insignificant is no reason to choose a particular estimator or option.
    2. You're probably better off using xtlogit or xtprobit, which are built for this kind of model, instead of probit.
    3. Any time you convert a variable into a series of dummies, you must omit one of the dummies; otherwise, the sum of the dummies is collinear with the intercept. But with 5 countries, you are getting only 3 dummies estimated instead of 4. You need to figure out what is going on with the fifth country. How many usable observations do you have for that country? Do you have any variation in the dependent variable within the usable observations for that country?
    5. Having non-concave iterations in maximum likelihood estimation is not a problem as long as they do not occur at the end and the estimation does reach a maximum. You can ignore them.
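
    A minimal sketch of points 1 and 3, using the variable names from the original post and assuming Stata 11 or later and a do-file (for the /// line continuation). The xi message "country==France" suggests that country may be a string variable, so the sketch first encodes it; country_id is a hypothetical name, and the encode step should be skipped if country is already numeric. The last check rests on an assumption that the thread does not confirm: since sip_index_inc is described as measured at the country level, it may be constant within each country, in which case it is a linear combination of the constant and the country dummies and one extra dummy has to be dropped.

    ```stata
    * Assumption: country is a string variable; skip this line if it is numeric.
    * country_id is a hypothetical name for the encoded copy.
    encode country, generate(country_id)

    * Point 1: factor-variable notation replaces the xi: prefix
    probit ownstruct_3 m_totalassets m_acc_capex_3y m_acc_ebitda_3y ///
        sip_index_inc i.country_id, vce(cluster country_id)

    * Point 3, check (a): usable observations per country, and whether the
    * dependent variable varies within each country
    tabulate country_id ownstruct_3 if e(sample)

    * Point 3, check (b): if min equals max in every row, sip_index_inc is
    * constant within country and collinear with the country dummies
    tabstat sip_index_inc if e(sample), by(country_id) statistics(n min max)
    ```

    If check (b) shows a single value per country, the omitted dummy reflects how the model is specified rather than a problem with the data for that country.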

    I'd worry a lot about using financial outcomes to explain ownership structure: ownership structure should itself influence financial outcomes.
