  • Random assignment of a categorical variable - with known frequency distribution - taking weights into account

    (Stata: Version 14.2)


    Dear all,

    I would like to kindly ask for suggestions on the following imputation/random assignment problem:

    Let’s assume that a categorical variable X (values are 1,2,3,4) has the following distribution (based on ‘reference data’):
    X        %       Cum. %
    1        20%     20%
    2        20%     40%
    3        30%     70%
    4        30%     100%
    Total    100%
    Now, let's have a look at my example data:

    sysuse nlsw88, clear

    // Further, let's assume that the frequency weights are distributed in the following (very unequal) way:

    gen weight = (2.25/(_n^(2.5)))*20000

    //Now, I generate the new variable X and randomly assign its values to the observations, replicating the initial distribution:

    set seed 123
    gen random = runiform()

    gen x=0
    replace x=1 if random < .2
    replace x=2 if inrange(random,.2,.4)
    replace x=3 if inrange(random,.4,.7)
    replace x=4 if random>.7
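
    For reference, the same assignment can be written in one line by counting how many of the cumulative cut points (.2, .4, .7) the random draw has passed. This is only a compact sketch of the step above; the variable name x2 is hypothetical, and the two versions can differ only if a draw lands exactly on a cut point.

```stata
sysuse nlsw88, clear
set seed 123
gen random = runiform()

* Start at category 1 and move up one category for every
* cumulative cut point that the draw has reached or passed
gen byte x2 = 1 + (random >= .2) + (random >= .4) + (random >= .7)

tabulate x2
```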

    //Replicating the initial sample distribution has worked relatively well in the unweighted case:

    tabulate x
    X        Freq    %         Cum. %
    1        439     19.55%    19.55%
    2        440     19.59%    39.14%
    3        695     30.94%    70.08%
    4        672     29.92%    100%
    Total    2246    100%
    However, when tabulating the weighted frequency distribution, the random assignment has not worked well:

    tabulate x [aw=weight]
    X        Freq          %         Cum. %
    1        55.4565667    2.47%     2.47%
    2        1,690.2808    75.26%    77.73%
    3        312.920003    13.93%    91.66%
    4        187.342635    8.34%     100%
    Total    2,246         100%
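
    A quick check of how concentrated the weights are may help in reading the weighted table. The sketch below computes each observation's share of the total weight; the variable name wshare is hypothetical. With the weight formula used here, the first observation alone should carry roughly three quarters of the total weight, so whichever category it lands in dominates the weighted distribution.

```stata
sysuse nlsw88, clear
gen weight = (2.25/(_n^(2.5)))*20000

* Each observation's share of the total weight
quietly summarize weight
gen double wshare = weight / r(sum)

* Largest single share, then the five largest shares
quietly summarize wshare
display "largest single weight share: " %6.4f r(max)
gsort -wshare
list wshare in 1/5
```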
    Does anyone have a suggestion for randomly assigning a categorical variable while taking frequency weights into account when the weights are very unequally distributed?

    Any suggestion is greatly appreciated!




    -----------------------------------------------------------
    Plain Stata code:

    sysuse nlsw88, clear

    gen weight = (2.25/(_n^(2.5)))*20000
    sum weight

    set seed 123
    gen random = runiform()

    gen x=0
    replace x=1 if random < .2
    replace x=2 if inrange(random,.2,.4)
    replace x=3 if inrange(random,.4,.7)
    replace x=4 if random>.7

    tab x
    tab x [aw=weight]
    Last edited by Andreas Thiemann; 20 Mar 2017, 08:50.
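
    One possible direction, offered only as a sketch and not as an established solution: put the observations in random order, accumulate the weights, and cut the cumulative weight share (rather than the uniform draw itself) at the target cumulative percentages. The names u, cumshare and x_w are hypothetical. Two caveats apply: the assignments are no longer independent across observations, and when a single observation carries more weight than a whole target category, as with the weights in this example, no assignment can reproduce the target shares closely.

```stata
sysuse nlsw88, clear
gen weight = (2.25/(_n^(2.5)))*20000

* Put the observations in random order
set seed 123
gen double u = runiform()
sort u

* Cumulative share of the total weight in that random order
quietly summarize weight
gen double cumshare = sum(weight) / r(sum)

* Cut the cumulative weight share at the target cumulative percentages
gen byte x_w = 1 + (cumshare > .2) + (cumshare > .4) + (cumshare > .7)

tabulate x_w
tabulate x_w [aw=weight]
```

    With more evenly spread weights, the weighted table should sit close to 20/20/30/30, while the unweighted table will generally move away from it.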
