Getting non-zero values in missing data cells by doing multiple imputation for continuous variable

Bhuban Dha

#1

30 May 2018, 15:58

Some continuous variables in my original (observed) data have missing observations but contain no zero values. However, when I used multiple imputation in Stata, some of the missing cases were imputed as zero. Zero values are implausible in real life here: in my case, every respondent earns or receives some amount of income to live on. I tried both predictive mean matching (PMM) and regression methods, and imputed a large number of data sets, but I could not avoid the zero values. The regression method gave slightly better results than PMM. The summary output from PMM is shown below; you can see 202 zero values in one imputation (the 20th). These zero values affect the estimated mean. I have asked this question before as well. Could you please advise on a solution?

mi impute chained (pmm, knn(200)) myinc0 myinc2 myinl0 educ0 educ1 educ2 = i.female, add(20) rseed(213) replace
(output truncated)
Performing chained iterations ...

Multivariate imputation                  Imputations =    40
Chained equations                              added =    20
Imputed: m=1 through m=40                    updated =    20

Initialization: monotone                  Iterations =   400
                                             burn-in =    10

myinc0: predictive mean matching
(output truncated)
------------------------------------------------------------------
                   |               Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
            myinc0 |       3962         2363      2363 |      6325
            myinc2 |       5028         1297      1297 |      6325
            myinl0 |       4554         1771      1771 |      6325
             educ0 |       3770         2555      2555 |      6325
             educ1 |       4169         2156      2156 |      6325
             educ2 |       2035         4290      4290 |      6325
------------------------------------------------------------------

. tabulate _20_myinc0

 _20_myinc0 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        202        3.19        3.19
       .066 |          2        0.03        3.23
       .121 |          1        0.02        3.24
        .16 |          1        0.02        3.26
       .198 |          4        0.06        3.32
        .22 |          1        0.02        3.34
       .253 |          3        0.05        3.38
       .308 |          8        0.13        3.51
(output truncated)
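To make the PMM mechanism concrete, here is a simplified sketch in Python. It is not Stata's `mi impute pmm` code; the helper `pmm_impute` and the simulated data are hypothetical. It fits a linear regression on the complete cases, finds the `k` observed cases whose predicted values are closest to each missing case's prediction (loosely analogous to the `knn()` option), and copies the observed value of one of them at random. A proper implementation also draws the regression coefficients from their posterior distribution before matching; that step is omitted here. The key property survives the simplification: every imputed value is copied from an observed donor, so PMM can only impute values that already occur among the observed values of that variable.

```python
import numpy as np


def pmm_impute(x, y, k=10, seed=None):
    """Hypothetical helper: fill np.nan entries of y by predictive mean matching on x.

    Simplified sketch: one OLS fit on the complete cases, a single predictor,
    and no posterior draw of the regression coefficients.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    miss = np.isnan(y)

    # Linear prediction of y from an intercept and x, fitted on complete cases only.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[~miss], y[~miss], rcond=None)
    yhat = X @ beta

    donor_values = y[~miss]    # observed values that may be copied
    donor_preds = yhat[~miss]  # predicted means of those donors
    for i in np.flatnonzero(miss):
        # Indices of the k donors whose predicted means are closest to case i's.
        nearest = np.argsort(np.abs(donor_preds - yhat[i]))[:k]
        # Copy the observed value of one randomly chosen donor.
        y[i] = donor_values[rng.choice(nearest)]
    return y


# Simulated example: strictly positive "income" with about 30% of values missing.
rng = np.random.default_rng(213)
female = rng.integers(0, 2, size=500).astype(float)
income = np.exp(rng.normal(1.0 + 0.3 * female, 0.5))
income[rng.random(500) < 0.3] = np.nan

observed = set(income[~np.isnan(income)].tolist())
imputed = pmm_impute(female, income, k=10, seed=1)
print("smallest imputed value:", imputed.min())
print("all imputed values were observed:", set(imputed.tolist()) <= observed)
```

Because the donors here are all strictly positive, no imputed value can be zero in this simulation.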
Clyde Schechter

#2

30 May 2018, 16:27

It is not necessary that the imputed data sets used for multiple imputation estimation contain realistic values for the missing data. The mathematical theory that underlies multiple imputation neither contains nor implies any such condition. The purpose of multiple imputation estimation is to reduce the bias associated with missing data. There are some conditions that apply to the imputation method to make that work. But those conditions can be met with imputation models that produce data that are unrealistic, or even impossible in the real world. As long as the imputation procedure itself is admissible, the bias reduction will happen, and that is all you can ask of multiple imputation.

tl;dr This is a non-problem. Don't waste time thinking about it.
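The estimates that multiple imputation reports are pooled across all the imputed data sets rather than taken from any single one; in Stata, `mi estimate` does this pooling with Rubin's rules. The hypothetical helper `rubin_combine` below is a minimal Python sketch of those rules for one scalar parameter (it omits the degrees-of-freedom calculation): the pooled point estimate is the average of the m per-imputation estimates, and the total variance adds the between-imputation spread, inflated by (1 + 1/m), to the average within-imputation variance.

```python
import numpy as np


def rubin_combine(estimates, variances):
    """Hypothetical helper: pool one parameter across m imputations (Rubin's rules).

    estimates: the m point estimates; variances: their m squared standard errors.
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()   # pooled point estimate
    u_bar = u.mean()   # average within-imputation variance
    b = q.var(ddof=1)  # between-imputation variance
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Illustrative numbers only (not from this thread): three imputations.
est, var = rubin_combine([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(est, var)  # pooled estimate 11.0, total variance about 2.33
```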
Bhuban Dha

#3

30 May 2018, 18:54

Thanks, Prof. Clyde, for your excellent and constructive suggestion. Would it introduce bias, or be considered unprofessional, if I replaced the zero values with the average value?
Clyde Schechter

#4

30 May 2018, 22:03

Replacing the zeros with the average is a form of single imputation, which is definitely prone to introducing bias. It is inferior to the multiple imputation you've already done.

As I said, the zero values are not a problem. Just use them, and don't give it a second thought.
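One reason single mean imputation is problematic can be seen in a small simulation (hypothetical data, written in Python rather than Stata). Even in the favorable case where values are missing completely at random, so the mean is preserved, filling every gap with the observed mean puts all the filled-in values exactly at the center of the distribution. The standard deviation is therefore understated, and any analysis of the filled-in data treats guessed values as if they had been observed.

```python
import numpy as np

rng = np.random.default_rng(0)
income = np.exp(rng.normal(1.0, 0.5, size=1000))  # simulated, strictly positive
income[rng.random(1000) < 0.35] = np.nan          # about 35% missing completely at random

observed = income[~np.isnan(income)]
# Single mean imputation: every missing value is replaced by the observed mean.
filled = np.where(np.isnan(income), observed.mean(), income)

sd_observed = observed.std(ddof=1)
sd_filled = filled.std(ddof=1)
print(f"mean of observed values:   {observed.mean():.3f}")
print(f"mean after mean imputation: {filled.mean():.3f}")
print(f"SD of observed values:     {sd_observed:.3f}")
print(f"SD after mean imputation:  {sd_filled:.3f}")
```

The sum of squared deviations is unchanged by adding values at the mean, but it is divided by a larger n, so the filled-in standard deviation is always smaller than the observed one.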
Bhuban Dha

#5

31 May 2018, 21:51

Prof. Clyde Schechter, many thanks for your valuable suggestions.