
  • Bootstrapping or robust standard errors?

    Hi, I am new to Stata and self-taught, so please bear with me if I have misunderstood anything. I am currently writing a thesis in finance and have a panel-data sample that is both heteroskedastic and autocorrelated. To deal with this, I have gotten the impression that I should use fixed effects (after a Hausman test) and then apply vce(bootstrap) or vce(robust). I get different results doing so, so I'm a bit uncertain which one to use. I will attach both of the regressions.
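
    Roughly, the two specifications I am comparing look like this (just a sketch with placeholder names; id and year stand for my panel and time variables):

    Code:
    xtset id year                        // declare the panel structure
    xtreg y x1 x2, fe vce(robust)        // cluster-robust (panel-level) standard errors
    xtreg y x1 x2, fe vce(bootstrap)     // panel-bootstrap standard errors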

    [Attachment: stataforum1.PNG — first regression output]

    [Attachment: stataforum2.PNG — second regression output]

    Best regards, Clara


  • #2
    Dear Clara Simonsson,

    The two sets of results are quite similar, but there is no advantage in using bootstrap, so I would stick to the robust (clustered) standard errors.
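
    In -xtreg-, -vce(robust)- amounts to standard errors clustered on the panel variable; written explicitly, it is something along these lines (a sketch, with id standing in for the panel identifier):

    Code:
    xtreg y x1 x2, fe vce(cluster id)    // equivalent to vce(robust) for xtreg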

    Best wishes,

    Joao



    • #3
      Dear Joao Santos Silva

      Thank you for your reply. Does bootstrapping help against autocorrelation and heteroskedasticity as well? Since my female variable's significance changes if I use robust, I really want to make sure I use the correct one.

      Best regards,
      Clara
      Last edited by Clara Simonsson; 06 Jan 2020, 06:23.



      • #4
        The results are asymptotically equivalent; both deal with heteroskedasticity and serial correlation.

        Best wishes,

        Joao



        • #5
          Clara:
          as an aside to Joao's helpful advice, please note that it is not correct to run -hausman- and then invoke non-default standard errors.
          If you detected heteroskedasticity and/or autocorrelation in your dataset and you wisely invoked clustered robust standard errors to deal with both of these nuisances, you should leave -hausman- aside and switch to the community-contributed command -xtoverid- to test which specification fits your data better.
          Just type -search xtoverid- to locate and install it.
          As you can see from the following toy example:
          - being glorious but a bit old-fashioned, -xtoverid- does not support -fvvarlist- notation. Hence, you should prefix your regression code with -xi:-;
          - there's no need to run both -xtreg, fe- and -xtreg, re- and save their estimates. You can simply run the latter and then invoke -xtoverid-. If the p-value gives no evidence against the null, go with -re-; otherwise, switch to -fe-:
          Code:
          . use "http://www.stata-press.com/data/r16/nlswork.dta"
          (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
          
          . xi: xtreg ln_wage i.race tenure, re vce(cluster idcode)
          i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
          
          Random-effects GLS regression                   Number of obs     =     28,101
          Group variable: idcode                          Number of groups  =      4,699
          
          R-sq:                                           Obs per group:
               within  = 0.0972                                         min =          1
               between = 0.2079                                         avg =        6.0
               overall = 0.1569                                         max =         15
          
                                                          Wald chi2(3)      =    1797.00
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
                                       (Std. Err. adjusted for 4,699 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
              _Irace_2 |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
              _Irace_3 |   .1039944    .062132     1.67   0.094     -.017782    .2257708
                tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
                 _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
          -------------+----------------------------------------------------------------
               sigma_u |  .33623102
               sigma_e |  .30357621
                   rho |  .55090591   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . xtoverid
          
          Test of overidentifying restrictions: fixed vs random effects
          Cross-section time-series model: xtreg re  robust cluster(idcode)
          Sargan-Hansen statistic 249.947  Chi-sq(1)    P-value = 0.0000
          
          .
          In the toy example reported above, the -xtoverid- outcome rejects the null and points towards the -fe- specification.
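          Since the null is rejected here, the fixed-effects counterpart of the same toy regression would look roughly as follows (just a sketch; note that a time-invariant regressor such as race is absorbed by the fixed effects, so Stata drops its dummies):
          Code:
          . xi: xtreg ln_wage i.race tenure, fe vce(cluster idcode)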
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Excellent point, Carlo Lazzaro.



            • #7
              Thanks, Joao.
              Very flattering and much appreciated.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Thank you both, Carlo Lazzaro and Joao Santos Silva, for the excellent help. During my regressions I ran into another problem with my panel data. The data I am investigating cover companies' risk (their stock's standard deviation) over eight years. As the regression stands now, I haven't controlled for time effects. Doing this by adding year dummies makes the whole result change a lot. If I run -testparm- I get Prob > F = 0.000, so I assume I should include the year dummies and accept the changed results. But I wonder if there is something else I can do to control for year other than the dummy variables?
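
                The joint test of the year dummies I ran looks roughly like this (a sketch; the regressors are the ones from my model and YEAR is my year variable):

                Code:
                xtreg Totalrisk FEMALE AGE INDEPENDENT EMPLOYEE BOARD_SIZE LOG_FIRM_SIZE WROA WDEBT_EQUITY i.YEAR, fe vce(robust)
                testparm i.YEAR    // joint test that all year dummies are zero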

                Kind regards,
                Clara



                • #9
                  Clara:
                  no wonder that results change when a new predictor is plugged into the right-hand side of the regression equation.
                  I would stick with time as a categorical variable.
                  Another option to model time is to look for turning points by adding a squared term in addition to the linear one (all in all, it boils down to interacting time with itself):
                  Code:
                  c.time##c.time
                  As an aside (and an amateur's advice, since my last experience with financial data is lost somewhere in the past millennium): since you're working on financial data, have you already ruled out cross-panel correlation due to shocks that are common to all the companies included in your dataset? A sketch of one way to check this follows below.
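                  A quick way to check for that kind of cross-sectional dependence after -xtreg, fe- is the community-contributed command -xtcsd- (type -ssc install xtcsd- to get it). A minimal sketch, with y and the x's as placeholders for your variables:
                  Code:
                  xi: xtreg y x1 x2 i.year, fe    // -xi:- because older community commands may not handle factor variables
                  xtcsd, pesaran                  // Pesaran's test of cross-sectional dependence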
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Carlo, thank you for your quick response.
                    I am not sure I understand what it means to use time as a categorical variable. Right now my xtreg code looks like this:
                    Code:
                    xtreg Totalrisk FEMALE AGE INDEPENDENT EMPLOYEE BOARD_SIZE LOG_FIRM_SIZE WROA WDEBT_EQUITY i.YEAR, fe vce(robust)
                    As for cross-panel correlation, is that the same as multicollinearity? I'm at bachelor level in finance, so it might be beyond my knowledge.



                    • #11
                      Clara:
                      - -i.year- is the correct -fvvarlist- notation for treating time as a categorical variable;
                      - not quite; they are different things. Autocorrelation = serial correlation of the epsilon error;
                      - multicollinearity means that, non-technically speaking, two predictors give roughly the same information: as it is difficult/impossible to disentangle the informative contribution of each of them, one of the two is kicked out;
                      - cross-panel correlation is correlation of the epsilon error across panels. If a generalized credit crunch affects all the companies of a given nation, no matter the industry they belong to, in all likelihood their epsilon errors will be correlated not only within but also between panels (see the sketch below for standard errors that allow for this).
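                      If such between-panel correlation is a concern, one common remedy is standard errors that are robust to cross-sectional dependence, for instance Driscoll-Kraay standard errors via the community-contributed command -xtscc- (-ssc install xtscc-). A minimal sketch, reusing your regressors from #10:
                      Code:
                      xi: xtscc Totalrisk FEMALE AGE INDEPENDENT EMPLOYEE BOARD_SIZE LOG_FIRM_SIZE WROA WDEBT_EQUITY i.YEAR, fe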
                      Kind regards,
                      Carlo
                      (Stata 19.0)
