Dear all,
I would like to ask for suggestions on the following imputation/random assignment problem.
Let's assume that a categorical variable X (values 1, 2, 3, 4) has the following distribution (based on "reference data"):
X     | %    | Cum. %
1     | 20%  | 20%
2     | 20%  | 40%
3     | 30%  | 70%
4     | 30%  | 100%
Total | 100% |
Now, let's have a look at my example data:
Code:
sysuse nlsw88, clear
Further, let's assume that the frequency weights are distributed in the following (very unequal) way:
Code:
gen weight = (2.25/(_n^(2.5)))*20000
Now, I generate the new variable X and randomly assign its values to the observations, replicating the reference distribution:
Code:
set seed 123
gen random = runiform()
gen x = 0
replace x = 1 if random < .2
replace x = 2 if inrange(random, .2, .4)
replace x = 3 if inrange(random, .4, .7)
replace x = 4 if random > .7
Replicating the reference distribution has worked relatively well in the unweighted case:
Code:
tabulate x
However, when tabulating the weighted frequency distribution, the random assignment has not worked well:
Code:
tabulate x [aw=weight]
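A quick look at how concentrated the weights are suggests why the weighted table is so far off. By construction weight = 45000/_n^2.5 (2.25 × 20000 = 45000), so over the 2,246 observations the total is roughly 45000 × 1.34 ≈ 60,400. If my arithmetic is right, observation 1 alone then carries about 75% of the total weight and the first three observations together more than 90%. Independent draws still give the correct *expected* weighted shares, but the realized weighted table is essentially decided by whichever categories those few observations happen to draw. The sketch below reloads the data so that it runs on its own; the variable names `wshare` and `cumshare` are illustrative, not part of the original code.

```stata
* Self-contained sketch: reload the example data and rebuild the weights
sysuse nlsw88, clear
generate double weight = (2.25/(_n^(2.5)))*20000

* Total weight across all observations
quietly summarize weight
local total = r(sum)

* Share of the total weight held by each observation, and the running
* (cumulative) share; the weights fall with _n, so the largest come first
generate double wshare = weight / `total'
generate double cumshare = sum(wshare)

* Inspect the five heaviest observations
list weight wshare cumshare in 1/5
```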
Does anyone have a suggestion on how to randomly assign a categorical variable so that its weighted distribution also matches the reference distribution when the weights are very unequal?
Any suggestion is greatly appreciated!
Andreas
-----------------------------------------------------------
Plain Stata code:
Code:
sysuse nlsw88, clear
gen weight = (2.25/(_n^(2.5)))*20000
sum weight
set seed 123
gen random = runiform()
gen x=0
replace x=1 if random < .2
replace x=2 if inrange(random,.2,.4)
replace x=3 if inrange(random,.4,.7)
replace x=4 if random>.7
tab x
tab x [aw=weight]
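The chain of replace statements maps the uniform draw onto the cumulative cut-offs .2, .4 and .7 from the reference table. As a sketch, the same mapping can be written in one line with Stata's irecode() function. With the same seed it should reproduce the same x, because the two versions differ only for a draw lying exactly on a cut-off, which in practice does not occur. This only tidies the assignment step; it does not by itself change the weighted result.

```stata
* Sketch: the same assignment step written with irecode()
sysuse nlsw88, clear
set seed 123
generate random = runiform()

* irecode(random, .2, .4, .7) returns 0 if random <= .2, 1 if .2 < random <= .4,
* 2 if .4 < random <= .7, and 3 if random > .7; adding 1 gives categories 1-4
generate byte x = 1 + irecode(random, .2, .4, .7)

tabulate x
```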