  • multiple imputation on multiple variables

    Hello. I am trying to do multiple imputation to compare its regression output to listwise deletion.

    I was chiefly concerned with the p_educ variable because, as you can see from my data below, it has by far the most missings.

    So I started with that variable, and this code seemed to work well:

    Code:
    mi set mlong
    mi register imputed health p_educ p_income gender race age
    mi impute regress p_educ health p_income gender race age, add(20) rseed(1234) force
    mi estimate: regress health p_educ p_income gender race age convinced_level
    But, as you can also see, my other variables have a handful of missings as well. Is it possible for me to do multiple imputation for all of those other variables? (except for convinced_level, which is my primary predictor variable and, for a number of reasons, shouldn't be imputed)

    Is it as simple as just repeating the "mi impute regress var1 var2 etc." line over and over for each variable that I want to impute, before doing "mi estimate: regress"? Or is there a more efficient way to impute multiple variables all in one go?

    Here is some toy data that resembles the structure of my actual data (which contains identifiable information and cannot be shared publicly). Thank you so much!!

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(health p_educ) int p_income byte(gender race age) long convinced_level
    3 3  23 1 . 13 2
    4 .   . 2 2 14 1
    2 5  45 1 3 15 .
    3 .  65 2 4 14 2
    4 .  23 1 1 14 1
    5 3  45 1 3 12 3
    3 4  88 . 2 12 1
    2 . 132 1 3 12 3
    3 5  34 2 2 14 2
    4 3  54 1 1 14 1
    3 2  23 2 3 18 3
    2 .  52 1 2 18 2
    2 4  47 1 2 18 1
    3 3   . 2 2 15 3
    4 2  78 . 1 15 2
    2 2  43 2 4  . 1
    3 3  23 1 1 14 3
    4 4  65 . 4 17 .
    5 . 143 2 1 17 3
    3 3   5 1 . 17 2
    2 4   . 2 1 19 1
    3 .  23 1 1  . 2
    4 3  34 1 1 18 1
    3 4  45 2 2 15 3
    2 3  76 1 3 15 2
    2 2   6 2 2 15 1
    3 .  33 2 2 15 3
    4 3   . 1 1 12 1
    2 4  23 2 1 12 3
    3 2  52 1 1 12 2
    2 .  47 1 2 14 1
    2 .  84 2 2 14 2
    3 .  78 1 . 18 1
    4 .  43 2 3 18 3
    2 2  23 2 4 18 2
    3 4  34 . 1 15 1
    4 5  45 2 1 15 3
    5 3  76 1 1 14 1
    3 4  61 1 2 14 3
    2 3  90 2 1 14 2
    3 2  27 1 4 18 1
    4 .  63 2 1 18 3
    3 2  34 2 4 18 2
    3 3  63 1 1 15 1
    3 2  52 2 2 13 3
    3 4  15 1 2 14 2
    2 1  62 . 1 15 2
    2 1  73 2 1 14 2
    end
    label values convinced_level label
    label def label 1 "no", modify
    label def label 2 "yes", modify
    label def label 3 "maybe", modify

  • #2
    Originally posted by Anne Todd:
    Is it possible for me to do multiple imputation for all of those other variables?
    Yes. See

    Code:
    help mi impute chained

    Originally posted by Anne Todd:
    (except for convinced_level, which is my primary predictor variable and, for a number of reasons, shouldn't be imputed)
    If this is your primary predictor, it must be included in the imputation model. You might or might not want to restrict the imputed datasets to observations with initially non-missing values on the predictor later in the analyses step. If you tell us more about the "number of reasons", we can comment on that, too.
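
    That restriction might be sketched like this (the flag variable cl_observed is made up for illustration and must be created before mi set):

    Code:
    * flag observations whose predictor is observed, before -mi set-
    generate byte cl_observed = !missing(convinced_level)
    * ... mi set mlong, mi register imputed ..., mi impute chained ...
    * then restrict only the analysis step, not the imputation
    mi estimate: regress health p_educ p_income gender race age ///
        convinced_level if cl_observed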

    Comment


    • #3
      Yes, it makes sense to impute all variables mutually in one step. Something like this should do the trick:

      Code:
      mi set mlong
      mi register imputed health p_educ p_income gender race age convinced_level
      mi impute chained (regress) p_educ health p_income race age ///
          (logit) gender ///
          (ologit) convinced_level ///
          , add(20) rseed(1234) force
      mi estimate: regress health p_educ p_income gender race age convinced_level
      But check the measurement scales carefully: you do not want to impute a binary variable like gender with regress, for example; use logit instead. See my short example and adapt it as you think fits best for your variables. Check the generated values afterwards using summarize.
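
      For example, such a check could look like this (a sketch; m = 0 holds the original data, m = 1 the first imputed dataset):

      Code:
      * compare the observed data (m = 0) with the first imputed dataset (m = 1)
      mi xeq 0: summarize health p_educ p_income gender race age
      mi xeq 1: summarize health p_educ p_income gender race age
      * for categorical variables, check that only plausible categories occur
      mi xeq 1: tabulate gender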
      Best wishes

      (Stata 16.1 MP)

      Comment


      • #4
        Thank you Felix Bittmann, it seems the "chained" option was the main thing I was missing, and this is very helpful. Your point about scaling is something I was wondering about in trying to do them all at once--since, as you say, imputing a binary variable with regress alongside continuous variables wouldn't make much sense. Thanks for your assistance!

        Comment


        • #5
          Originally posted by daniel klein:

          Yes. See

          Code:
          help mi impute chained



          If this is your primary predictor, it must be included in the imputation model. You might or might not want to restrict the imputed datasets to observations with initially non-missing values on the predictor later in the analyses step. If you tell us more about the "number of reasons", we can comment on that, too.
          Sorry, I was typing too fast there and didn't mean what I wrote--the actual data I have is restricted to non-missing values on the predictor, so there is nothing to impute!

          Comment


          • #6
            Originally posted by Anne Todd:
            Sorry, I was typing too fast there and didn't mean what I wrote--the actual data I have is restricted to non-missing values on the predictor, so there is nothing to impute!
            Your example data does not reflect that fact either. Note that restricting the sample before imputation will result in bias if the restricted sample is not a random subset, which it very likely is not. Anyway, if you have variables that do not have missing values, those go to the right-hand side of the equals sign:

            Code:
            mi impute chained ... = non_missing_variables , add(20)
            Also, get rid of the force option; you never want that.
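
            Applied to the toy data, and supposing for illustration that only p_educ and p_income had missing values, that would look like:

            Code:
            mi set mlong
            mi register imputed p_educ p_income
            mi impute chained (regress) p_educ p_income ///
                = health gender race age convinced_level, add(20) rseed(1234)
            mi estimate: regress health p_educ p_income gender race age convinced_level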

            Comment


            • #7
              Originally posted by daniel klein:

              Your example data does not reflect that fact either. Note that restricting the sample before imputation will result in bias if the restricted sample does not represent a random subset, which is very likely. Anyway, if you have variables that do not have missing values, those go to the right-hand side of the equals sign:

              Code:
              mi impute chained ... = non_missing_variables , add(20)
              Also, get rid of the force option; you never want that.
              I just made the example data manually; I must have unintentionally put in missings for that variable...thank you for the explanation on putting them on the right-hand side of the =.

              For the force option, it was my understanding that the imputation wouldn't run (when I was imputing the p_educ variable alone) while other variables had missings, unless the force option was specified. But I see now that when I impute this group of variables simultaneously, that shouldn't be an issue.

              Comment


              • #8
                Originally posted by Anne Todd:
                For the force option, it was my understanding that the impute wouldn't work (when I was just doing it on the p_educ variable alone) when other variables were missing, unless the force option was specified.
                Yes and no. Technically, force makes the imputation "work". I have yet to hear an argument that justifies this practice on theoretical/statistical grounds.

                Comment
