Random assignment of a categorical variable - with known frequency distribution - taking weights into account

Andreas Thiemann

Join Date: Feb 2017

Posts: 4
#1

20 Mar 2017, 07:30

(Stata: Version 14.2)

Dear all,

I would like to kindly ask for suggestions on the following imputation/random assignment problem:

Let’s assume that a categorical variable X (values are 1,2,3,4) has the following distribution (based on ‘reference data’):

X         %   Cum. %
1       20%      20%
2       20%      40%
3       30%      70%
4       30%     100%
Total  100%

Now, let's have a look at my example data:

sysuse nlsw88, clear

// Further, let's assume that the frequency weights are distributed in the following (very unequal) way:

gen weight = (2.25/(_n^(2.5)))*20000

//Now, I generate the new variable X and randomly assign its values to the observations, replicating the initial distribution:

set seed 123
gen random = runiform()

gen x=0
replace x=1 if random < .2
replace x=2 if inrange(random,.2,.4)
replace x=3 if inrange(random,.4,.7)
replace x=4 if random>.7
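
For reference, the same cutoffs can be written in one line with Stata's irecode() function. This is only a compact sketch of the assignment step above; the variable name x_alt is made up for illustration. irecode() uses "less than or equal" at each cutoff, so it should match x except for a draw that falls exactly on .2, .4 or .7.

```stata
sysuse nlsw88, clear
set seed 123
gen random = runiform()

* irecode() returns 0 if random <= .2, 1 if .2 < random <= .4,
* 2 if .4 < random <= .7, and 3 if random > .7; adding 1 gives 1..4
gen x_alt = irecode(random, .2, .4, .7) + 1   // x_alt is a hypothetical name

tab x_alt
```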

//Replicating the initial sample distribution has worked relatively well in the unweighted case:

tabulate x

X       Freq        %   Cum. %
1        439   19.55%   19.55%
2        440   19.59%   39.14%
3        695   30.94%   70.08%
4        672   29.92%     100%
Total   2246     100%

However, when tabulating the weighted frequency distribution, the result is far from the target distribution:

tabulate x [aw=weight]

X             Freq        %   Cum. %
1       55.4565667    2.47%    2.47%
2       1,690.2808   75.26%   77.73%
3       312.920003   13.93%   91.66%
4       187.342635    8.34%     100%
Total        2,246     100%
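
The weighted table is dominated by whichever category the heaviest observations happen to fall into. A quick way to see how concentrated the example weights are is to compute each observation's share of the total weight. This is a minimal sketch; the variable names wshare and cumshare are made up for illustration. With this weight formula, the first observation alone should carry roughly three quarters of the total weight, which is in line with the 75.26% shown for category 2.

```stata
sysuse nlsw88, clear
gen weight = (2.25/(_n^(2.5)))*20000

* Share of the total weight carried by each observation
quietly summarize weight
gen double wshare = weight / r(sum)      // wshare is a hypothetical name

* Running total of those shares in the current sort order
gen double cumshare = sum(wshare)        // cumshare is a hypothetical name

* The first few observations account for most of the total weight
list wshare cumshare in 1/5
```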

Does anyone have a suggestion for randomly assigning a categorical variable while taking frequency weights into account when the weights are very unequally distributed?

Any suggestion is greatly appreciated!

-----------------------------------------------------------
Plain Stata code:

sysuse nlsw88, clear

gen weight = (2.25/(_n^(2.5)))*20000
sum weight

set seed 123
gen random = runiform()

gen x=0
replace x=1 if random < .2
replace x=2 if inrange(random,.2,.4)
replace x=3 if inrange(random,.4,.7)
replace x=4 if random>.7

tab x
tab x [aw=weight]
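
One possible direction, offered only as a sketch and not as an established answer, is to put the observations in random order and assign categories by cumulative weight share instead of by the raw uniform draw. The names cumw and x_w are made up for illustration. Note the limitation: an observation cannot be split across categories, so when a single observation holds most of the weight (as with the example weights), no assignment can reproduce the 20/20/30/30 target closely.

```stata
sysuse nlsw88, clear
gen weight = (2.25/(_n^(2.5)))*20000

set seed 123
gen random = runiform()
sort random                                  // random order of observations

* Cumulative share of the total weight in that random order
quietly summarize weight
gen double cumw = sum(weight) / r(sum)       // cumw is a hypothetical name

* Category is determined by where the cumulative weight share falls:
* <= .2 gives 1, (.2, .4] gives 2, (.4, .7] gives 3, > .7 gives 4
gen x_w = irecode(cumw, .2, .4, .7) + 1      // x_w is a hypothetical name

tab x_w [aw=weight]
```

With more evenly spread weights, the weighted shares from this approach should land close to the target, at the cost of the unweighted shares drifting away from it.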

Last edited by Andreas Thiemann; 20 Mar 2017, 07:50.
Andreas Thiemann

Join Date: Feb 2017

Posts: 4
#2

20 Mar 2017, 08:05

(Stata: Version 14.2)

Dear all,

I would like to kindly ask for suggestions on the following imputation/random assignment problem:

Let’s assume that a categorical variable X (values are 1,2,3,4) has the following distribution (based on ‘reference data’):

X         %   Cum. %
1       20%      20%
2       20%      40%
3       30%      70%
4       30%     100%
Total  100%

Now, let's have a look at my example data:

Code:

sysuse nlsw88, clear

// Further, let's assume that the frequency weights are distributed in the following (very unequal) way:

Code:

gen weight = (2.25/(_n^(2.5)))*20000

//Now, I generate the new variable X and randomly assign its values to the observations, replicating the initial distribution:

Code:

set seed 123
gen random = runiform()

gen x=0
replace x=1 if random < .2
replace x=2 if inrange(random,.2,.4)
replace x=3 if inrange(random,.4,.7)
replace x=4 if random>.7

//Replicating the initial sample distribution has worked relatively well in the unweighted case:

Code:

tabulate x

However, when tabulating the weighted frequency distribution, the result is far from the target distribution:

Code:

tabulate x [aw=weight]

Does anyone have a suggestion for randomly assigning a categorical variable while taking frequency weights into account when the weights are very unequally distributed?

Any suggestion is greatly appreciated!

-----------------------------------------------------------
Plain Stata code:

Code:

sysuse nlsw88, clear

gen weight = (2.25/(_n^(2.5)))*20000
sum weight

set seed 123
gen random = runiform()

gen x=0
replace x=1 if random < .2
replace x=2 if inrange(random,.2,.4)
replace x=3 if inrange(random,.4,.7)
replace x=4 if random>.7

tab x
tab x [aw=weight]

Last edited by Andreas Thiemann; 20 Mar 2017, 08:20.