Dear statalist-forum,
I have a hopefully straightforward question and would be very grateful for any help. I use Stata/SE 14.2.
I would like to draw with replacement from a subset of my data, where the subset is defined by a condition that is fulfilled by, say, n_1<_N observations. I would like to draw more than n_1 times with replacement from this subset of n_1 observations.
(Background: I have wealth data of size _N and a certain wealth value w that splits the wealth data into two halves. The condition wealth<w defines the above-mentioned subset. I want to create many synthetic datasets from these original wealth data, each of size _N, but for each observation in a synthetic dataset I want to randomize over whether I draw from the subset (probability n_1/_N) or from some theoretical distribution (probability 1-n_1/_N). This is why it can happen that I want to draw more than n_1 times from the subset defined by wealth<w.)
I thought that bsample with an if condition defining the subset to draw from would be a handy option. However, a crucial restriction of bsample is that the number of draws must not exceed the number of observations drawn from, as the simple example in the code below illustrates.
For that example (and against the background described above), what would be a short way to draw 8 times with replacement from the subset defined by index<3?
Code:
. set obs 10
number of observations (_N) was 0, now 10

. gen index = _n

. bsample 8 if index<3
resample size must not be greater than number of observations
r(498);
Looking forward to any advice,
Tom Storwitz
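One possible workaround, offered as a minimal sketch rather than a tested solution for the actual wealth data: keep only the subset, copy every observation the same number of times with expand until there are at least as many observations as draws, and then call bsample. Because each original observation is copied equally often, every draw should still pick each of the n_1 originals with probability 1/n_1. The seed value and the local macro names draws and k are illustrative assumptions, not part of the original example.

```stata
clear
* arbitrary seed, only so that the sketch is reproducible
set seed 12345
set obs 10
gen index = _n

* keep only the subset to draw from (here n_1 = 2 observations)
keep if index < 3

* number of draws wanted from the subset
local draws = 8

* copies of each observation needed so that _N >= number of draws
local k = ceil(`draws' / _N)

* each observation now appears k times in total
expand `k'

* 8 draws with replacement; each original is equally likely per draw
bsample `draws'

list index
```

If the full dataset is still needed afterwards, the subset steps could be wrapped in preserve and restore, with the drawn sample saved to a temporary file in between.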