I'm a new Stata user and have been forced to jump into bigger issues already. I'm not sure about the imputations I did, so I just need someone to confirm whether they are correct. My income variable is tabulated below.
To impute the missing values, this is what I did.
/* Redefining zero income: set equal to
1. missing if the individual is age>=15, zero if age<15
2. missing if age<15 & income>6
3. missing if income==0 & employed==1 */
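One literal reading of the three rules above, written out as Stata commands. This is only a sketch: the toy data, and the assumption that "no income" is stored as code 1 (as in the tabulation) before being recoded to 0, are illustrative and not taken from the actual dataset.

```stata
* Hypothetical toy data, for illustration only
clear
input age income employed
10 1 0
10 8 0
30 1 1
30 5 1
12 1 1
end

* Rule 1: "no income" (assumed code 1) becomes missing for ages 15+, zero below 15
replace income = . if income == 1 & age >= 15 & !missing(age)
replace income = 0 if income == 1 & age < 15

* Rule 2: children reporting a band above 6 are set to missing
replace income = . if age < 15 & income > 6

* Rule 3: zero income combined with being employed is set to missing
replace income = . if income == 0 & employed == 1

list
```

The order matters here: rule 3 is applied after rule 1, so it only affects the under-15 zeros created by rule 1.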
My actual imputation:
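* Declare the data as mi data and register income as the variable to impute
mi set mlong
mi register imputed income
* Impute the income band with an ordered logit, 10 imputations
mi impute ologit income age i.province i.industry hours_worked i.occupation ///
    i.educ_grpd gender race residence employed i.married i.citizen, add(10) force
* Fit the same model on the imputed data and save the combined estimates
mi estimate, saving(miest, replace): ologit income age i.province i.industry ///
    hours_worked i.occupation i.educ_grpd gender race residence employed ///
    i.married i.citizen
* Linear prediction (xb) from the saved estimates, employed respondents only
mi predictnl income_hat = predict(xb) using miest if employed == 1
* Flag employed respondents with missing income and a non-missing prediction
gen income_imputed = 1 if employed == 1 & income == . & income_hat != .
replace income_imputed = 0 if income_imputed != 1
* Fill in the flagged observations with the prediction
replace income = income_hat if income_imputed == 1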
After this I want to impute a point estimate of income within each band to get a continuous variable, and I'm not sure how to go about it.
From the literature, this is what I picked up:
1. Generate a CDF for different distributions (normal, Pareto, uniform and lognormal) for each band.
2. Then generate a random probability for each individual.
3. Then assign an income such that the cumulative probability of observing that value under the distribution is >= the generated probability.
I'm not sure about these 3 steps.
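For concreteness, the three steps could be sketched in Stata roughly as follows. This is only a sketch under assumptions that are not in the post: income holds whole-number band codes 1 to 12 as in the tabulation, income is uniform within each closed band, and the open top band follows a Pareto distribution with a purely hypothetical shape parameter of 2. The toy data and the names band_lo, band_hi, u and income_cont are made up for illustration.

```stata
* Hypothetical toy data: 100 people in each of the 12 bands
clear
set obs 1200
generate byte income = 1 + mod(_n, 12)
set seed 12345

* Step 1: band limits from the tabulation; band b (2..11) ends at 400*2^(b-2)
generate double band_hi = 400 * 2^(income - 2) if inrange(income, 2, 11)
generate double band_lo = cond(income == 2, 1, band_hi/2 + 1) if inrange(income, 2, 11)
replace band_lo = 204801 if income == 12

* Step 2: one random probability per individual, strictly between 0 and 1
generate double u = runiform()

* Step 3: invert the assumed CDF at u
generate double income_cont = 0 if income == 1
* Uniform within a closed band: x = lo + u*(hi - lo)
replace income_cont = band_lo + u * (band_hi - band_lo) if inrange(income, 2, 11)
* Pareto top band: F(x) = 1 - (lo/x)^alpha, so x = lo*(1 - u)^(-1/alpha)
local alpha = 2
replace income_cont = band_lo * (1 - u)^(-1/`alpha') if income == 12

summarize income_cont, detail
```

Swapping in a lognormal or another distribution would only change the inversion in step 3; the band limits and the random draw stay the same.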
I'm not doing a study on this income variable itself; I'm going to use it as my dependent variable.
Sorry it's a bit long.
Thank you.
Income                   |       Freq.    Percent      Cum.
-------------------------+---------------------------------
 1. no income            |   1,136,644      49.06     49.06
 2. R1 - R400            |   213,710.9       9.22     58.28
 3. R401 - R800          |  362,154.39      15.63     73.91
 4. R801 - R1600         |  199,897.98       8.63     82.54
 5. R1601 - R3200        |  172,102.47       7.43     89.97
 6. R3201 - R6400        |   125,421.5       5.41     95.38
 7. R6401 - R12 800      |  66,210.959       2.86     98.24
 8. R12 801 - R25 600    |  25,469.541       1.10     99.34
 9. R25 601 - R51 200    |  8,683.8753       0.37     99.72
10. R51 201 - R102 400   |  3,364.3909       0.15     99.86
11. R102 401 - R204 800  |  2,186.1542       0.09     99.96
12. R204 801+            |   1,028.276       0.04    100.00
-------------------------+---------------------------------
Total                    |   2,316,874     100.00