  • Imputation of missing values and point estimates for a variable that is completely bracketed

    I'm a new user of Stata and have been forced to jump into bigger issues already. I'm not sure about the imputation I did, so I just need someone to confirm whether it's correct. This is how my variable is distributed:
    Income                       Freq.  Percent    Cum.
    1. no income             1,136,644    49.06   49.06
    2. R1 - R400             213,710.9     9.22   58.28
    3. R401 - R800          362,154.39    15.63   73.91
    4. R801 - R1600         199,897.98     8.63   82.54
    5. R1601 - R3200        172,102.47     7.43   89.97
    6. R3201 - R6400         125,421.5     5.41   95.38
    7. R6401 - R12 800      66,210.959     2.86   98.24
    8. R12 801 - R25 600    25,469.541     1.10   99.34
    9. R25 601 - R51 200    8,683.8753     0.37   99.72
    10. R51 201 - R102 400  3,364.3909     0.15   99.86
    11. R102 401 - R204 800 2,186.1542     0.09   99.96
    12. R204 801+            1,028.276     0.04  100.00
    Total                    2,316,874   100.00
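    A minimal Stata sketch for storing the rand limits of each band from the table, which the later steps need. It assumes income is coded 1-12 exactly as tabulated; the names inc_lo and inc_hi are hypothetical, and the 12 observations are just one row per band for illustration.

```stata
* Hypothetical sketch: one row per band, coded 1-12 as in the table above
clear
set obs 12
gen byte income = _n

* from band 2 on, each upper limit doubles (400, 800, 1600, ...) and each
* lower limit is the previous band's upper limit plus 1; band 1 is "no income"
gen double inc_lo = cond(income == 1, 0, cond(income == 2, 1, 400*2^(income - 3) + 1))
* the top band (R204 801+) is open-ended, so its upper limit is left missing
gen double inc_hi = cond(income == 1, 0, cond(income == 12, ., 400*2^(income - 2)))

list income inc_lo inc_hi, noobs
```

    Leaving the top band's upper limit missing matches the convention intreg uses for right-censored observations.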
    To impute the missing values, this is what I did.
    /* Redefining zero income: set income equal to
       1. missing if the individual is aged 15 or older, and zero if aged under 15
       2. missing if age < 15 & income > 6
       3. missing if income == 0 & employed == 1 */
    My actual imputation:

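    * declare the mi style and register income as the variable to be imputed
    mi set mlong
    mi register imputed income
    * impute the missing band codes with an ordered logit (10 imputations)
    mi impute ologit income age i.province i.industry hours_worked i.occupation ///
    i.educ_grpd gender race residence employed i.married i.citizen, add(10) force
    * fit the same model across imputations and save the estimates to miest.ster
    mi estimate, saving(miest, replace): ologit income age i.province i.industry ///
    hours_worked i.occupation i.educ_grpd gender race residence employed ///
    i.married i.citizen
    * combined linear prediction; for ologit, xb is the latent index, not a band code
    mi predictnl income_hat = predict(xb) if employed==1 using miest
    * flag employed individuals whose income was missing but who now have a prediction
    gen byte income_imputed = (employed==1 & missing(income) & !missing(income_hat))
    replace income=income_hat if income_imputed==1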
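    For comparison, mi impute itself writes completed band codes into each imputation m = 1, 2, ..., which is where mi estimate takes them from. A minimal sketch on simulated data (x and the 3-band income are made up for illustration) that inspects the original data (m = 0) next to one completed dataset:

```stata
* Hypothetical sketch on simulated data; x and the 3-band income are made up
clear
set seed 2024
set obs 500
gen double x = rnormal()
* integer outcome 1-3 related to x, then about 20% of values set to missing
gen byte income = 1 + (x + rnormal() > -0.5) + (x + rnormal() > 0.5)
replace income = . if runiform() < 0.2

mi set mlong
mi register imputed income
mi impute ologit income x, add(5) rseed(2024)

* m = 0 holds the original data (with missing values); m = 1..5 are completed
mi xeq 0 1: tab income, missing
```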

    After this, I want to impute a point estimate of income within each band to get a continuous variable, and I'm not sure how to go about it.
    From the literature, this is the approach I picked:
    1. Generate a CDF for each band under different candidate distributions, such as normal, Pareto, uniform and lognormal.
    2. Generate a random probability for each individual.
    3. Assign each individual the income value at which the cumulative probability under the chosen distribution is greater than or equal to the generated probability.
    I'm not sure about these three steps.
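    Steps 1-3 amount to inverse-transform sampling. For bracketed data the random probability has to be confined to the individual's own band: draw u from U(0,1), set p = F(lo) + u*(F(hi) - F(lo)), and take income = F^-1(p); without this rescaling, step 3 can return an income outside the band the person reported. A minimal Stata sketch on simulated band codes, assuming a lognormal for the bands plus, as an alternative, uniform draws within closed bands and a Pareto tail for the open top band. The values of mu, sigma and alpha are placeholders, not estimates (one option for mu and sigma would be fitting intreg to the log band limits, not shown), and all variable names are hypothetical.

```stata
* Hypothetical sketch of steps 1-3 (inverse-CDF draws within each band).
* ASSUMPTIONS: bands coded 1-12 as in the table; mu, sigma, alpha are illustrative.
clear
set seed 12345
set obs 1200
gen byte income = mod(_n - 1, 12) + 1

* band limits in rand; the open top band (12) has a missing upper limit
gen double inc_lo = cond(income == 1, 0, cond(income == 2, 1, 400*2^(income - 3) + 1))
gen double inc_hi = cond(income == 1, 0, cond(income == 12, ., 400*2^(income - 2)))

* Step 1: CDF of an assumed lognormal at each band limit
local mu    = 7      // hypothetical mean of log income
local sigma = 1.2    // hypothetical s.d. of log income
gen double F_lo = normal((ln(inc_lo) - `mu') / `sigma') if income >= 2
gen double F_hi = cond(missing(inc_hi), 1, normal((ln(inc_hi) - `mu') / `sigma')) if income >= 2

* Step 2: one uniform random number per individual
gen double u = runiform()

* Step 3: rescale u to the band's slice of the CDF, then invert the CDF,
* so each draw stays inside the individual's reported band
gen double p = F_lo + u * (F_hi - F_lo)
gen double income_ln = exp(`mu' + `sigma' * invnormal(p))
replace income_ln = 0 if income == 1          // "no income" stays at zero

* Alternative: uniform within closed bands, Pareto tail for the open top band
local alpha = 2      // hypothetical Pareto shape parameter
gen double income_up = inc_lo + u * (inc_hi - inc_lo) if income <= 11
replace income_up = inc_lo * (1 - u)^(-1 / `alpha') if income == 12

tabstat income_ln income_up, by(income) statistics(min max)
```

    Comparing income_ln and income_up band by band gives a rough sense of how much the choice of distribution in step 1 matters, mostly in the wide upper bands and the open top band.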

    I'm not doing a study of the income variable itself; I'm going to use it as my dependent variable.

    Sorry it's a bit long.
    Thank you.