  • Getting non-zero values in missing data cells by doing multiple imputation for continuous variable

    Some continuous variables in my original (observed) data have missing observations but no zero values. However, Stata's multiple imputation produced some zero values for the missing cases. Zero values are implausible in real life: in my case, for example, every respondent earns some amount of income to live on. I tried both the predictive mean matching (PMM) and regression methods and generated a large number of imputed data sets, but the zero values could not be avoided. The regression method gave slightly better results than PMM. The summary output from the PMM run is given below; it shows 202 zero cases in one imputed data set (the 20th). The zero cases affect the estimated mean. I asked this question previously as well. Could you please advise on a solution?

    . mi impute chained (pmm, knn(200)) myinc0 myinc2 myinl0 educ0 educ1 educ2 = i.female, add(20) rseed(213) replace
    (output truncated)
    Performing chained iterations ...

    Multivariate imputation                     Imputations =       40
    Chained equations                                 added =       20
    Imputed: m=1 through m=40                       updated =       20

    Initialization: monotone                     Iterations =      400
                                                    burn-in =       10

    myinc0: predictive mean matching
    (output truncated)
    ------------------------------------------------------------------
                       |               Observations per m
                       |----------------------------------------------
              Variable |   Complete   Incomplete   Imputed |     Total
    -------------------+-----------------------------------+----------
                myinc0 |       3962         2363      2363 |      6325
                myinc2 |       5028         1297      1297 |      6325
                myinl0 |       4554         1771      1771 |      6325
                 educ0 |       3770         2555      2555 |      6325
                 educ1 |       4169         2156      2156 |      6325
                 educ2 |       2035         4290      4290 |      6325
    ------------------------------------------------------------------

    . tabulate _20_myinc0

     _20_myinc0 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        202        3.19        3.19
           .066 |          2        0.03        3.23
           .121 |          1        0.02        3.24
            .16 |          1        0.02        3.26
           .198 |          4        0.06        3.32
            .22 |          1        0.02        3.34
           .253 |          3        0.05        3.38
           .308 |          8        0.13        3.51
    (output truncated)

  • #2
    It is not necessary that the imputed data sets used for multiple imputation estimation contain realistic values for the missing data. The mathematical theory that underlies multiple imputation neither contains nor implies any such condition. The purpose of multiple imputation estimation is to reduce the bias associated with missing data. There are some conditions that apply to the imputation method to make that work. But those conditions can be met with imputation models that produce data that are unrealistic, or even impossible in the real world. As long as the imputation procedure itself is admissible, the bias reduction will happen, and that is all you can ask of multiple imputation.

    tl;dr This is a non-problem. Don't waste time thinking about it.
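
    To make that concrete: the imputed values are not analyzed one data set at a time. They feed an estimation command run under mi estimate, which fits the model in every imputed data set and pools the results with Rubin's rules. A minimal sketch, assuming the data are already mi set and imputed as in #1; the regression is purely illustrative and is not a model anyone in this thread specified:

    ```stata
    * Sketch only: assumes the data are mi set and the imputations from #1 exist.

    * Pooled estimate of mean income across all imputations (Rubin's rules).
    mi estimate: mean myinc0

    * Illustrative model (an assumption, not from the thread): income on sex.
    mi estimate: regress myinc0 i.female

    * Optional diagnostic: count the zero values in imputation 20 only.
    mi xeq 20: count if myinc0 == 0
    ```

    The pooled estimates and standard errors from mi estimate are the results to report, not summaries of any single imputed data set such as _20_myinc0.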


    • #3
      Thanks, Prof. Clyde, for your excellent, constructive suggestion. Would it introduce bias, or be considered unprofessional, if I replaced the zero cases with the average value?


      • #4
        Single imputation is definitely prone to introducing bias. It's definitely inferior to the multiple imputation you've done.

        As I said, the zero values are not a problem. Just use them, and don't give it a second thought.
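
        A small simulation can illustrate the problem with replacing values by the average. This is a sketch on made-up data, not an analysis of the data in #1: the variable names, the seed, the log-normal income distribution, and the 40% missingness rate are all assumptions chosen for illustration.

        ```stata
        * Sketch on simulated data; all names and parameters are hypothetical.
        clear
        set seed 12345
        set obs 1000

        * Strictly positive "income", log-normal by construction.
        generate income = exp(rnormal(3, 0.5))

        * Delete roughly 40% of the values completely at random.
        generate income_obs = income if runiform() > 0.4

        * Mean substitution: fill every missing value with the observed mean.
        summarize income_obs, meanonly
        generate income_fill = cond(missing(income_obs), r(mean), income_obs)

        * Compare the three versions; note the standard deviations.
        summarize income income_obs income_fill
        ```

        The mean of income_fill essentially matches the observed mean by construction, but its standard deviation is smaller than that of either income or income_obs, because a block of identical values has been added at the center of the distribution. Standard errors computed from the filled-in variable are therefore too small, which is one way single imputation misleads.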


        • #5
          Prof. Clyde Schechter, Many thanks for your valuable suggestions.
