Help_Bootstrap

redline zhang

Join Date: May 2015

Posts: 8
#1

18 May 2015, 19:57

Dear Stata-Community,

I want to bootstrap some of my FGLS regressions.

My FGLS regression code is

"xtset id year, yearly

set matsize 2000

xtgls wreturn wpercorr wretiree yd*, panels(hetero) corr(psar1) force"

This FGLS regression works well and I got my regression results.

When I try to bootstrap the FGLS regression, I use the following code:

"program fgls, eclass
quietly {
xtgls wreturn wpercorr wretiree yd*, panels(hetero) corr(psar1) force
}
end

bootstrap fgls _b, reps(1000) noesample cluster(stateid)"

However, Stata gave me the following error.

"insufficient observations to compute bootstrap standard errors
no results will be saved"

Could anyone tell me what is going on? How can I fix my errors?

Thanks and best regards,

Redline
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

18 May 2015, 20:27

bootstrap fgls _b, reps(1000) noesample cluster(stateid)

That is the syntax for -bootstrap- many versions back. But no more. What's surprising is that anything happened at all other than a syntax error.

Try:

Code:

bootstrap _b, reps(1000) cluster(stateid): xtgls wreturn wpercorr wretiree yd*, panels(hetero) corr(psar1)

Also, to be on the safe side, you should probably specify the idcluster() option as well.

There is no -noesample- option in the current version of Stata's -bootstrap-. I don't recall what it used to do, so I can't be sure whether you need to do something else to accomplish that.

Secondarily, though it does no harm, wrapping your single -xtgls- command in a program is unnecessary.

I have also removed the -force- option from your -xtgls- command. If your data are equally spaced, then you don't need this option: the cluster sampling will maintain that. If your data are not equally spaced, -force- will simply conceal the fact that you are getting dubious results.
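One way to check the equal-spacing condition before deciding on -force- is to look at the gaps in the time variable directly. The following is only a minimal sketch, assuming the panel and time variables are id and year as in the original post; the variable name gap is made up for illustration.

```stata
* Declare the panel structure; the output reports whether year has gaps
xtset id year, yearly

* Show the pattern of observed years per panel
xtdescribe

* Flag observations that follow a missing year within the same id
* (gap is a hypothetical variable name)
bysort id (year): generate byte gap = (year - year[_n-1] > 1) if _n > 1
tabulate gap
```

Any observation with gap equal to 1 marks a place where the series for that id skips at least one year.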

There are several commands in Stata that have a -force- option. But that option should really be named -force_at_your_own_peril-. It is only rarely wise to use it.
redline zhang

Join Date: May 2015

Posts: 8
#3

19 May 2015, 00:18

Hi Clyde,

Thank you very much for your reply. I tried the code you suggested, and Stata gave me the following error,

"repeated time values within panel
the most likely cause for this error is misspecifying the cluster(), idcluster(), or group() option"

I tried putting the code "xtset id year, yearly" before bootstrapping my regression, but Stata kept giving me the same error.

I don't think there is anything wrong with the cluster(). The following code worked when I ran the FGLS regression without bootstrapping.

"xtset id year, yearly
xtgls wreturn wpercorr wretiree yd*, panels(hetero) corr(psar1) force"

Is there anything I need to do to fix my mistake?

Thanks,

Redline
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

19 May 2015, 09:27

The message you got means exactly what it says. You have repeated observations for the same values of id and year. That is not allowed in -xtset- data. The solution to the problem depends on whether this means your data is corrupted or not. If there is only supposed to be one observation for each combination of id and year, then there is something wrong with your data and you need to go back and clean up your data set. -duplicates list id year- will help you identify the offending surplus observations. If they are completely identical observations, then you could just drop all but one, -duplicates drop-. If they conflict then you have to decide either which is correct or how to combine them and reduce to a single observation.

If, on the other hand, your data can legitimately have more than one observation for the same id and year, then you have a different problem: in order to carry out a regression model with an autoregressive structure, you must have single observations for each id and year, and the years must be equally spaced within each id. Clearly that is not what you have, and if your data are in fact correct, then you need to figure out what other model to use.
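The duplicate checks described above could be run along these lines. This is a sketch, assuming the identifiers are id and year as in the original -xtset- command; whether dropping anything is appropriate depends on what the listing shows.

```stata
* Summarize how many id-year combinations appear more than once
duplicates report id year

* List the surplus observations so they can be compared
duplicates list id year

* Drop observations that are identical on every variable, keeping one copy
duplicates drop

* Confirm that id and year now uniquely identify observations
* (isid exits with an error if they do not)
isid id year
```

If -isid- still fails after -duplicates drop-, the remaining duplicates conflict with each other and have to be resolved by hand.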
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

19 May 2015, 09:37

I think the implication is that the repeated observations arise when bootstrapping. Bootstrapping out of a time series or panel data context inevitably means that, in almost all the bootstrap samples, some observations will be repeated in the sample. (The exception is necessarily when the bootstrap sample happens to coincide one to one with the original dataset.)

In a time series or panel data context that clashes with information supplied to tsset or xtset.

Otherwise put, naive bootstrapping is not compatible with panel data modelling. Whether other kinds of bootstrap are a good idea here is a question on which I will let others opine.
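The clash described above can be seen by drawing a single cluster resample by hand. This is only a sketch: it assumes, as in the thread, that stateid is the cluster variable, and further assumes that each stateid corresponds to one panel. The seed is arbitrary.

```stata
set seed 12345

* Resample whole clusters with replacement; a cluster drawn twice
* brings two copies of every stateid-year pair with it
preserve
bsample, cluster(stateid)

* Report how many stateid-year combinations now occur more than once
duplicates report stateid year
restore
```

Any combination reported with more than one copy is a repeated time value within a panel, which is what the time-series settings reject.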
redline zhang

Join Date: May 2015

Posts: 8
#6

19 May 2015, 10:31

Clyde,

Thank you for your response. I really appreciate it. I checked my data. I don't have duplicates for each id. My problem is that my panel is unbalanced: some ids have missing observations in some years.

When I run "xtset id year, yearly", Stata gives me the following result:

. xtset id year, yearly
panel variable: id (unbalanced)
time variable: year, 93 to 2008, but with gaps
delta: 1 year

The unbalanced panel does not affect my xtgls regression. Does it cause the problem when I try to bootstrap my regression?

Thanks,

Redline
Nick Cox

Join Date: Mar 2014

Posts: 35698
#7

19 May 2015, 10:34

redline: My post #5 already answered your question in #6. You would not have been successful in applying xtset had there been repeated observations. It's the bootstrapping that's problematic. I don't have advice on what to do instead.
redline zhang

Join Date: May 2015

Posts: 8
#8

19 May 2015, 10:56

Nick, thank you for your reply. Does anyone know of another kind of bootstrap that avoids the repeated observations problem? Thanks, Redline
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

19 May 2015, 11:29

From page 232 of [R]

Similarly, when you have panel (longitudinal) data, all resampled panels must be unique
in each of the bootstrap samples to obtain correct bootstrap estimates of statistics. Therefore,
both cluster(panelvar) and idcluster(newpanelvar) must be specified with bootstrap, and
i(newpanelvar) must be used with the main command. Moreover, you must clear the current xtset
settings by typing xtset, clear before calling bootstrap.

So I think you need to do this:

Code:

xtset, clear

Code:

bootstrap _b, reps(1000) cluster(stateid) idcluster(new_state_id): ///
    xtgls wreturn wpercorr wretiree yd*, panels(hetero) corr(psar1) i(new_state_id) t(year)

This should work. The changes to the original code are the -xtset, clear- command, the idcluster() option on -bootstrap-, and the i() and t() options on -xtgls-. Note that although the passage from the manual does not mention setting the t() option on xtgls, in this case it is necessary because a time variable must be set in order to estimate autoregressive models.
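To see what idcluster() contributes, one can draw a single resample by hand and check that the new identifier and year uniquely identify observations. This is a sketch under the assumption that each stateid corresponds to one panel; the seed is arbitrary and new_state_id is the same illustrative name used in the code above.

```stata
set seed 12345

preserve
* Each drawn copy of a cluster receives its own value of new_state_id
bsample, cluster(stateid) idcluster(new_state_id)

* isid exits with an error if new_state_id and year are not unique
isid new_state_id year

* The resampled data can therefore be declared as a panel
xtset new_state_id year
restore
```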
redline zhang

Join Date: May 2015

Posts: 8
#10

20 May 2015, 11:41

Hi Clyde,

I tried the code you suggested. It works! Thank you so much for your help. I appreciate it.

Redline
Majdala dakh

Join Date: Apr 2016

Posts: 6
#11

05 Apr 2016, 13:22

Dear All,

I would like to use bootstrap estimation in order to avoid the problem of small sample size (N), but Stata displays the following error:

. bootstrap _b: xtgls ccp qa1 trans1 iqae taille lev roa age, panels (hetero) corr (ar1)
(running xtgls on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 50
insufficient observations to compute bootstrap standard errors
no results will be saved
r(2000);

I have tried several solutions but the problem is not solved. Can you help me? Especially how can I specify the cluster variable?

Many thanks!