
  • Bootstrapping or robust standard errors?

    Hi, I am new to Stata and self-taught, so please bear with me if I have misunderstood anything. I am currently writing a thesis in finance and have a panel-data sample that is both heteroskedastic and autocorrelated. To deal with this, I have gotten the impression that I should use fixed effects (after a Hausman test) and then apply vce(bootstrap) or vce(robust). I get different results doing so, so I'm a bit uncertain which one to use. I will attach both of the regressions.
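
    Roughly, the two specifications I am comparing look like this (just a sketch with placeholder names; id and year stand for my panel and time variables):

    Code:
    xtset id year                        // declare the panel structure
    xtreg y x1 x2, fe vce(robust)        // cluster-robust (panel-level) standard errors
    xtreg y x1 x2, fe vce(bootstrap)     // panel-bootstrap standard errors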

    [Attachment: stataforum1.PNG — first regression output]

    [Attachment: stataforum2.PNG — second regression output]

    Best regards, Clara


  • #2
    Dear Clara Simonsson,

    The two sets of results are quite similar, but there is no advantage in using bootstrap, so I would stick to the robust (clustered) standard errors.
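
    In -xtreg-, -vce(robust)- amounts to standard errors clustered on the panel variable; written explicitly, it is something along these lines (a sketch, with id standing in for the panel identifier):

    Code:
    xtreg y x1 x2, fe vce(cluster id)    // equivalent to vce(robust) for xtreg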

    Best wishes,

    Joao



    • #3
      Dear Joao Santos Silva

      Thank you for your reply. Does bootstrapping help against autocorrelation and heteroskedasticity as well? Since my female variable's significance changes if I use robust, I really want to make sure I use the correct one.

      Best regards,
      Clara
      Last edited by Clara Simonsson; 06 Jan 2020, 06:23.



      • #4
        The results are asymptotically equivalent; both deal with heteroskedasticity and serial correlation.

        Best wishes,

        Joao



        • #5
          Clara:
          as an aside to Joao's helpful advice, please note that it is not correct to run -hausman- and then invoke non-default standard errors.
          If you detected heteroskedasticity and/or autocorrelation in your dataset and you wisely invoked clustered robust standard errors to deal with both of these nuisances, you should leave -hausman- aside and switch to the community-contributed command -xtoverid- to test which specification fits your data better.
          Just type -search xtoverid- to locate and install it.
          As you can see from the following toy example:
          - being glorious but a bit old-fashioned, -xtoverid- does not support -fvvarlist- notation. Hence, you should prefix your regression code with -xi:-;
          - there's no need to run both -xtreg, fe- and -xtreg, re- and save their estimates. You can simply run the latter and then invoke -xtoverid-. If the p-value gives no evidence against the null, go with -re-; otherwise, switch to -fe-:
          Code:
          . use "http://www.stata-press.com/data/r16/nlswork.dta"
          (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
          
          . xi: xtreg ln_wage i.race tenure, re vce(cluster idcode)
          i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
          
          Random-effects GLS regression                   Number of obs     =     28,101
          Group variable: idcode                          Number of groups  =      4,699
          
          R-sq:                                           Obs per group:
               within  = 0.0972                                         min =          1
               between = 0.2079                                         avg =        6.0
               overall = 0.1569                                         max =         15
          
                                                          Wald chi2(3)      =    1797.00
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
                                       (Std. Err. adjusted for 4,699 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
              _Irace_2 |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
              _Irace_3 |   .1039944    .062132     1.67   0.094     -.017782    .2257708
                tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
                 _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
          -------------+----------------------------------------------------------------
               sigma_u |  .33623102
               sigma_e |  .30357621
                   rho |  .55090591   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . xtoverid
          
          Test of overidentifying restrictions: fixed vs random effects
          Cross-section time-series model: xtreg re  robust cluster(idcode)
          Sargan-Hansen statistic 249.947  Chi-sq(1)    P-value = 0.0000
          
          .
          In the toy example reported above, the -xtoverid- outcome rejects the null and points towards the -fe- specification.
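          Since the null is rejected here, the fixed-effects counterpart of the same toy regression would look roughly as follows (just a sketch; note that a time-invariant regressor such as race is absorbed by the fixed effects, so Stata drops its dummies):
          Code:
          . xi: xtreg ln_wage i.race tenure, fe vce(cluster idcode)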
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Excellent point, Carlo Lazzaro.



            • #7
              Thanks, Joao.
              Very flattering and much appreciated.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Thank you both, Carlo Lazzaro and Joao Santos Silva, for the excellent help. During my regressions I ran into another problem with my panel data. The data I am investigating cover companies' risk (their stock's standard deviation) over eight years. As the regression stands now, I haven't controlled for time effects. Doing this by adding year dummies makes the whole result change a lot. If I run -testparm- I get Prob > F = 0.000, so I assume I should include the year dummies and accept the changed results. But I wonder if there is something else I can do to control for year other than the dummy variables?
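
                The joint test of the year dummies I ran looks roughly like this (a sketch; the regressors are the ones from my model and YEAR is my year variable):

                Code:
                xtreg Totalrisk FEMALE AGE INDEPENDENT EMPLOYEE BOARD_SIZE LOG_FIRM_SIZE WROA WDEBT_EQUITY i.YEAR, fe vce(robust)
                testparm i.YEAR    // joint test that all year dummies are zero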

                Kind regards,
                Clara



                • #9
                  Clara:
                  no wonder that results change when a new predictor is plugged into the right-hand side of the regression equation.
                  I would stick with time as a categorical variable.
                  Another option to model time is to look for turning points by adding a squared term in addition to the linear one (all in all, it boils down to interacting time with itself):
                  Code:
                  c.time##c.time
                  As an aside (and an amateur's advice, since my last experience with financial data is lost somewhere in the past millennium): since you're working on financial data, have you already ruled out cross-panel correlation due to shocks that are common to all the companies included in your dataset? A sketch of one way to check this follows below.
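                  A quick way to check for that kind of cross-sectional dependence after -xtreg, fe- is the community-contributed command -xtcsd- (type -ssc install xtcsd- to get it). A minimal sketch, with y and the x's as placeholders for your variables:
                  Code:
                  xi: xtreg y x1 x2 i.year, fe    // -xi:- because older community commands may not handle factor variables
                  xtcsd, pesaran                  // Pesaran's test of cross-sectional dependence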
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Carlo, thank you for your quick response.
                    I am not sure I understand what it means to use time as a categorical variable. Right now my xtreg code looks like this:
                    Code:
                    xtreg Totalrisk FEMALE AGE INDEPENDENT EMPLOYEE BOARD_SIZE LOG_FIRM_SIZE WROA WDEBT_EQUITY i.YEAR, fe vce(robust)
                    As for cross-panel correlation, is that the same as multicollinearity? I'm at bachelor level in finance, so it might be beyond my knowledge.



                    • #11
                      Clara:
                      - -i.year- is the correct -fvvarlist- notation for treating time as a categorical variable;
                      - not quite; they are different things. Autocorrelation = serial correlation of the epsilon error;
                      - multicollinearity means that, non-technically speaking, two predictors give roughly the same information: as it is difficult/impossible to disentangle the informative contribution of each of them, one of the two is kicked out;
                      - cross-panel correlation is correlation of the epsilon error across panels. If a generalized credit crunch affects all the companies of a given nation, no matter the industry they belong to, in all likelihood their epsilon errors will be correlated not only within but also between panels (see the sketch below for standard errors that allow for this).
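                      If such between-panel correlation is a concern, one common remedy is standard errors that are robust to cross-sectional dependence, for instance Driscoll-Kraay standard errors via the community-contributed command -xtscc- (-ssc install xtscc-). A minimal sketch, reusing your regressors from #10:
                      Code:
                      xi: xtscc Totalrisk FEMALE AGE INDEPENDENT EMPLOYEE BOARD_SIZE LOG_FIRM_SIZE WROA WDEBT_EQUITY i.YEAR, fe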
                      Kind regards,
                      Carlo
                      (Stata 19.0)
