  • Hausman Test - "V_b-V_B is not positive definite" appears

    Background of question

    I am an economics student, currently writing my bachelor thesis, and quite inexperienced with Stata. I would be grateful for any help!

    The purpose of my research is to analyse the drivers of export sophistication of Malaysian exports.

    The dependent variable is the natural logarithm of the export sophistication index, more specifically the export sophistication of Malaysian exports to 171 countries.

    The independent variables are:
    • Foreign Direct Investment (FDI) proxied by the stock and flow of FDI inflow, FDIS and FDIF respectively
    • Research and Development (R&D) proxied by Gross Domestic Expenditure on R&D as a percentage of GDP and Number of researchers per thousand in the labour force, GDE and RES respectively
    Control variables are Malaysia’s GDP per capita PPP (current international $) proxying for the level of economic development (GDPc); Malaysia’s total population proxying for the country size (POPc); Malaysia’s gross enrolment ratio of the tertiary education segment proxying for Malaysia’s human capital (HCc); and the rule of law proxying for Malaysia's institutional quality (INSc).

    Important here is that the data for the independent and control variables do not vary between the countries (id), only throughout the years since the data is specific to Malaysia.

    My question

    To check whether I should use a fixed-effects or random-effects model, I did the Hausman test, but the output does not seem right.

    The coefficients in the random and fixed effects model are exactly the same. Furthermore, "V_b-V_B is not positive definite" appears.

    I also tried "hausman fixed random, sigmamore", but that does not change my results.

    What would you recommend I do? Does that simply mean I will have to stick to the random-effects model? And should I simply ignore the "V_b-V_B is not positive definite" message?


    Please find a screenshot of the test attached.

    I would highly appreciate it if you could help me.
    Please let me know if you need further clarification.
    Thank you and kind regards,
    Julie

  • #2
    Julie:
    are you sure that your data support the evidence of a panel-wise effect?
    Can you please share (via CODE delimiters, not screenshots) what you typed and what Stata gave you back after -xtreg,fe- and -xtreg,re-? Thanks.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Hello Carlo,

      Thank you for your quick response!

      I am not sure how to check whether the data support evidence of a panel-wise effect. I am sorry, but what exactly do you mean by "panel-wise effects"?

      Sorry for the screenshots! Please find my steps below:

      To set the data as panel data (where id are the export destinations - export sophistication of Malaysia to 171 destinations; and t is the time period from 1996 to 2016):
      Code:
      xtset $id $t
      Code:
      . xtset $id $t
             panel variable:  id (strongly balanced)
              time variable:  t, 1 to 21
                    delta:  1 unit
      For the Hausman test:
      Code:
       xtreg lnexpy GDE lnGDPc lnPOPc HCcfrac INSc, fe
      Code:
      Fixed-effects (within) regression               Number of obs     =      3,591
      Group variable: id                              Number of groups  =        171
      
      R-sq:                                           Obs per group:
           within  = 0.0012                                         min =         21
           between =      .                                         avg =       21.0
           overall = 0.0002                                         max =         21
      
                                                      F(5,3415)         =       0.84
      corr(u_i, Xb)  = -0.0000                        Prob > F          =     0.5213
      
      ------------------------------------------------------------------------------
            lnexpy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               GDE |   .0455878   .2640788     0.17   0.863    -.4721806    .5633562
            lnGDPc |   .3780056   .4112761     0.92   0.358    -.4283666    1.184378
            lnPOPc |  -.8991652    .773179    -1.16   0.245    -2.415105     .616775
           HCcfrac |   .3137872   1.139772     0.28   0.783    -1.920916    2.548491
              INSc |   .2890588   .2471716     1.17   0.242    -.1955603     .773678
             _cons |   48.53091   11.16886     4.35   0.000     26.63257    70.42924
      -------------+----------------------------------------------------------------
           sigma_u |  2.2553056
           sigma_e |  1.0113474
               rho |  .83257725   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(170, 3415) = 104.43                 Prob > F = 0.0000
      
      estimates store fixed
      Code:
      xtreg lnexpy GDE lnGDPc lnPOPc HCcfrac INSc, re
      Code:
      Random-effects GLS regression                   Number of obs     =      3,591
      Group variable: id                              Number of groups  =        171
      
      R-sq:                                           Obs per group:
           within  = 0.0000                                         min =         21
           between = 0.0000                                         avg =       21.0
           overall = 0.0002                                         max =         21
      
                                                      Wald chi2(5)      =       4.20
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.5212
      
      ------------------------------------------------------------------------------
            lnexpy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               GDE |   .0455878   .2640788     0.17   0.863    -.4719971    .5631727
            lnGDPc |   .3780056   .4112761     0.92   0.358    -.4280808    1.184092
            lnPOPc |  -.8991652    .773179    -1.16   0.245    -2.414568    .6162377
           HCcfrac |   .3137872   1.139772     0.28   0.783    -1.920124    2.547699
              INSc |   .2890588   .2471716     1.17   0.242    -.1953885    .7735062
             _cons |   48.53091   11.17018     4.34   0.000     26.63775    70.42407
      -------------+----------------------------------------------------------------
           sigma_u |  2.2444816
           sigma_e |  1.0113474
               rho |  .83123174   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      estimates store random
      Code:
      hausman fixed random
      Code:
                       ---- Coefficients ----
                   |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                   |     fixed        random       Difference          S.E.
      -------------+----------------------------------------------------------------
               GDE |    .0455878     .0455878        6.62e-11        5.53e-07
            lnGDPc |    .3780056     .3780056        3.40e-10        2.51e-06
            lnPOPc |   -.8991652    -.8991652       -1.75e-09        .0000181
           HCcfrac |    .3137872     .3137872        1.40e-09        .0000146
              INSc |    .2890588     .2890588        7.36e-11        7.58e-07
      ------------------------------------------------------------------------------
                                 b = consistent under Ho and Ha; obtained from xtreg
                  B = inconsistent under Ha, efficient under Ho; obtained from xtreg
      
          Test:  Ho:  difference in coefficients not systematic
      
                        chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                =        0.00
                      Prob>chi2 =      1.0000
                      (V_b-V_B is not positive definite)
      Hausman test with sigmamore:
      Code:
      hausman fixed random, sigmamore
      
      Note: the rank of the differenced variance matrix (4) does not equal the number
              of coefficients being tested (5); be sure this is what you expect, or
              there may be problems computing the test.  Examine the output of your
              estimators for anything unexpected and possibly consider scaling your
              variables so that the coefficients are on a similar scale.
      
                       ---- Coefficients ----
                   |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                   |     fixed        random       Difference          S.E.
      -------------+----------------------------------------------------------------
               GDE |    .0455878     .0455878        6.62e-11        5.53e-07
            lnGDPc |    .3780056     .3780056        3.40e-10        2.51e-06
            lnPOPc |   -.8991652    -.8991652       -1.75e-09        .0000181
           HCcfrac |    .3137872     .3137872        1.40e-09        .0000146
              INSc |    .2890588     .2890588        7.36e-11        7.58e-07
      ------------------------------------------------------------------------------
                                 b = consistent under Ho and Ha; obtained from xtreg
                  B = inconsistent under Ha, efficient under Ho; obtained from xtreg
      
          Test:  Ho:  difference in coefficients not systematic
      
                        chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                =        0.00
                      Prob>chi2 =      1.0000
                      (V_b-V_B is not positive definite)
      Thank you very much, I would highly appreciate any comments.

      Kind regards,
      Julie



      • #4
        You are in the happy condition that there is zero correlation between the unobserved panel level effects identified in the fixed effects estimation and the included right hand side variables (see correlation in the fixed effects estimation results) which results in random effects and fixed effects giving you essentially identical parameter estimates.

        So, you can present both and say it doesn't matter. Or, you can go with fixed effects and say it's more robust. I wouldn't waste a lot of time with this kind of test when your parameters are identical to 7 decimals. The Hausman test probably failed because the variances for the parameters in the two sets of estimates are so incredibly close.

        If you look at the bottom of the fixed effects results, it provides an F test that the unobserved panel level effects are zero and this is strongly rejected. So you do have panel effects. It just happens that they're not correlated with the included right hand side variables.



        • #5
          Julie:
          as you can see from Prob > F = 0.5213 (-fe- specification) and Prob > chi2 = 0.5212 (-re- specification), your regression models are no better than the average of the regressand, as you do not seem to have either within or between variation.
          Kind regards,
          Carlo
          (StataNow 18.5)



          • #6
            Dear Phil and Carlo,

            Thank you so much. This helps a lot.

            Have a lovely weekend,
            Julie



            • #7
              Julie:

              It's virtually impossible to have six coefficients agree to seven decimal places and not have the two estimators be identical by construction. There are not two estimators here; there's only one.

              Here is my hunch: that all explanatory variables change only over time and not across id. In other words, lnexpy varies across i and t, but the right hand side variables only vary across t. Is this correct? If so, FE and RE are always numerically identical. I'm thinking this is also why you did not include time period dummies, as those would wipe out every explanatory variable.
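
              A quick way to check this hunch is sketched below, assuming the regressors are the five used in the -xtreg- commands in #3. It computes each regressor's standard deviation across id within every t. A largest value of zero (or something tiny such as 1e-16, which is floating-point noise) means the variable takes a single value per period. The name sd_check is a scratch variable introduced only for this illustration.

              Code:
              * Sketch: does each regressor vary across id within a time period?
              foreach v of varlist GDE lnGDPc lnPOPc HCcfrac INSc {
                  * standard deviation of the regressor across id, separately for each t
                  bysort t: egen double sd_check = sd(`v')
                  summarize sd_check, meanonly
                  display "`v': largest within-period sd across id = " r(max)
                  drop sd_check
              }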

              I will add that, if I have described the problem correctly, then you're not really using the panel nature of the data. You could aggregate into a single time series and use a time series regression. If I'm incorrect then I need to see more summary statistics about your data.

              JW



              • #8
                Hello Jeff,

                Yes, your assumption is correct! All explanatory variables (GDE, lnGDPc, lnPOPc, HCcfrac, INSc) change only over time and not across id, as they are Malaysia-specific, whereas lnexpy changes across i and t. That is also why I did not include time dummies, because they would just remove any effect of the explanatory variables.

                I want to know whether these Malaysia-specific variables, such as Malaysia's R&D investment, influence the country's export sophistication.

                To recap:
                Explanatory variables are GDE, RES, FDIF, FDIS, lnGDPc, lnPOPc, HCcfrac, INSc (with the last 4 variables being control variables). The response variable is lnexpy.


                Code:
                describe $id $t $ylist $xlist
                Code:
                              storage   display    value
                variable name   type    format     label      variable label
                ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                id              int     %10.0g                id
                t               byte    %10.0g                t
                lnexpy          double  %10.0g                lnexpy
                GDE             double  %10.0g                GDE
                RES             double  %10.0g                RES
                FDIF            double  %10.0g                FDIF
                FDIS            double  %14.2f                FDIS
                lnGDPc          double  %10.0g                lnGDPc
                lnPOPc          double  %10.0g                lnPOPc
                HCcfrac         double  %10.0g                HCcfrac
                INSc            double  %10.0g                INSc
                Code:
                 xtsum $id $t $ylist $xlist
                Code:
                Variable         |      Mean   Std. Dev.       Min        Max |    Observations
                -----------------+--------------------------------------------+----------------
                id       overall |        86   49.36948          1        171 |     N =    3591
                         between |             49.50758          1        171 |     n =     171
                         within  |                    0         86         86 |     T =      21
                                 |                                            |
                t        overall |        11   6.056144          1         21 |     N =    3591
                         between |                    0         11         11 |     n =     171
                         within  |             6.056144          1         21 |     T =      21
                                 |                                            |
                lnexpy   overall |  37.17059   2.456059   27.61441   43.77088 |     N =    3591
                         between |             2.255306   35.59304   43.72121 |     n =     171
                         within  |             .9869958   29.06605   38.40653 |     T =      21
                                 |                                            |
                GDE      overall |  .7771429   .3418306        .22       1.44 |     N =    3591
                         between |                    0   .7771429   .7771429 |     n =     171
                         within  |             .3418306        .22       1.44 |     T =      21
                                 |                                            |
                RES      overall |  3.214286   2.167379         .5        7.3 |     N =    3591
                         between |             4.45e-16   3.214286   3.214286 |     n =     171
                         within  |             2.167379         .5        7.3 |     T =      21
                                 |                                            |
                FDIF     overall |  3.599524    1.45384         .6       7.24 |     N =    3591
                         between |             4.45e-16   3.599524   3.599524 |     n =     171
                         within  |              1.45384         .6       7.24 |     T =      21
                                 |                                            |
                FDIS     overall |  41.03429   8.489167      30.97      62.44 |     N =    3591
                         between |                    0   41.03429   41.03429 |     n =     171
                         within  |             8.489167      30.97      62.44 |     T =      21
                                 |                                            |
                lnGDPc   overall |  9.863192   .1714436   9.602382   10.17209 |     N =    3591
                         between |                    0   9.863192   9.863192 |     n =     171
                         within  |             .1714436   9.602382   10.17209 |     T =      21
                                 |                                            |
                lnPOPc   overall |  17.07103   .1148203   16.86087   17.23928 |     N =    3591
                         between |                    0   17.07103   17.07103 |     n =     171
                         within  |             .1148203   16.86087   17.23928 |     T =      21
                                 |                                            |
                HCcfrac  overall |  .3107463   .0823402   .1453082    .467621 |     N =    3591
                         between |                    0   .3107463   .3107463 |     n =     171
                         within  |             .0823402   .1453082    .467621 |     T =      21
                                 |                                            |
                INSc     overall |  .4430952    .083855        .27        .59 |     N =    3591
                         between |                    0   .4430952   .4430952 |     n =     171
                         within  |              .083855        .27        .59 |     T =      21
                Thank you for your help. Please let me know if you need further clarification of the dataset.
                I would be more than grateful for your advice,
                Julie
                Last edited by Julie Iloul; 02 May 2020, 07:17.



                • #9
                  Julie: With your data, there is only one estimator. It can be computed many different ways. Pooled OLS using

                  Code:
                   
                   reg lnexpy GDE lnGDPc lnPOPc HCcfrac INSc, vce(cluster id)
                  will give you the same estimates (and valid standard errors). In fact, if you take each time period and compute the average of your y(i,t) across i, to get ybar(t), and then use a time series regression, you'll get exactly the same result. That's what I meant by the panel data is not doing anything for you. You might as well replace the dependent variable, which changes across i and t, with the average across i at every time period.
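
                  The time-series version described above could be sketched as follows. This is only an illustration: it assumes the panel from #8 is in memory, and it wraps the steps in -preserve-/-restore- so the original data are left untouched. Because the panel is balanced and the regressors are constant across id within each t, the point estimates should match the pooled OLS ones. The standard errors will generally differ, since this regression sees only the 21 period averages.

                  Code:
                  * Sketch: average lnexpy across destinations within each period,
                  * then regress the 21 period means on the Malaysia-level regressors.
                  preserve
                  collapse (mean) lnexpy GDE lnGDPc lnPOPc HCcfrac INSc, by(t)
                  tsset t
                  regress lnexpy GDE lnGDPc lnPOPc HCcfrac INSc
                  restore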

                  Are there any variables that change with i? That would be a more interesting analysis. For example, how do exports from Malaysia to other countries depend on the currency exchange rate, or on some measure of trade openness? Of course, that would require collecting more data.
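
                  If a destination-specific variable were collected, a specification along these lines would become possible. The name lnexrate (say, the log bilateral exchange rate) is hypothetical and does not exist in the current dataset. Because such a variable varies across id, period dummies (i.t) can be included. The Malaysia-wide regressors are left out of this sketch, since they would be collinear with those dummies.

                  Code:
                  * Sketch only: lnexrate is a hypothetical destination-specific regressor
                  xtreg lnexpy lnexrate i.t, fe vce(cluster id)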

                  If you stick with what you have, there are no decisions to be made about estimators: they're all the same. You should cluster your standard errors by id.

                  As a general rule for the future, when using highly aggregated data it is almost always necessary to use fixed effects to account for differences across units. Here it does not matter.

                  JW



                  • #10
                    Dear Jeff,

                    This makes so much more sense now. I felt that it was kind of wrong to have explanatory variables that do not change across id. Since I still have a few weeks before the deadline, I will try to find destination-specific variables and collect the data.

                    Thank you very much, Jeff.

                    Best regards,
                    Julie



                    • #11
                      Hi all,
                      I obtained the following after estimating pmg and mg.
                      Although there is no rank problem with the number of coefficients tested, I still get the following:
                      Code:
                      hausman mg pmg, sigmamore

                                       ---- Coefficients ----
                                   |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                                   |       mg          pmg         Difference          S.E.
                      -------------+----------------------------------------------------------------
                               fpi |   -65.95131     .7529812       -66.70429        80.96974
                               fdi |   -.0560912     .0121663       -.0682575        .0810818
                              cbpd |    -.055054     -.011595        -.043459        .0460314
                             lmscp |
                               L1. |     1.86557     .7804481        1.085122        1.343731
                      ------------------------------------------------------------------------------
                                                 b = consistent under Ho and Ha; obtained from xtpmg
                                  B = inconsistent under Ha, efficient under Ho; obtained from xtpmg

                          Test:  Ho:  difference in coefficients not systematic

                                        chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                                =        4.74
                                      Prob>chi2 =      0.3153
                                      (V_b-V_B is not positive definite)


                      The "(V_b-V_B is not positive definite)" note is the issue. How can I solve the problem?



                      • #12
                        Opeyemi:
                        see https://www.stata.com/statalist/arch.../msg00085.html
                        Kind regards,
                        Carlo
                        (StataNow 18.5)



                        • #13
                          Dear Carlo,
                          I do appreciate your feedback.

                          Kind regards
                          Opeyemi

