Some continuous variables in my original (observed) data have missing observations but no zero values. However, Stata's multiple imputation produced some zero values for the missing cases. Zero values are implausible in practice: in my case, for example, every respondent earns some income to live on. I tried both predictive mean matching (PMM) and the regression method, and imputed a large number of data sets, but the zero values could not be avoided. The regression method gave slightly better results than PMM. The summary output for PMM is given below, where you can see 202 cases with a value of 0 in one imputation set (the 20th). These zeros affect the estimated mean. I asked this question previously as well. Can you please advise on a solution?
.................................................. .................................................. .................................................. ..........................................
mi impute chained (pmm, knn(200)) myinc0 myinc2 myinl0 educ0 educ1 educ2 = i.female, add(20) rseed(213) replace
(output truncated)
..............
Performing chained iterations ...
Multivariate imputation                     Imputations =       40
Chained equations                                 added =       20
Imputed: m=1 through m=40                       updated =       20
Initialization: monotone                     Iterations =      400
                                                burn-in =       10
myinc0: predictive mean matching
(output truncated)
------------------------------------------------------------------
                   |               Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
            myinc0 |       3962         2363      2363 |      6325
            myinc2 |       5028         1297      1297 |      6325
            myinl0 |       4554         1771      1771 |      6325
             educ0 |       3770         2555      2555 |      6325
             educ1 |       4169         2156      2156 |      6325
             educ2 |       2035         4290      4290 |      6325
------------------------------------------------------------------
. tabulate _20_myinc0
 _20_myinc0 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        202        3.19        3.19
       .066 |          2        0.03        3.23
       .121 |          1        0.02        3.24
        .16 |          1        0.02        3.26
       .198 |          4        0.06        3.32
        .22 |          1        0.02        3.34
       .253 |          3        0.05        3.38
       .308 |          8        0.13        3.51
(output truncated)