  • Multiple Imputation - how to deal with newly created dichotomous variable, winsorized variables, and the augment function

    Hi fellow Statalisters,

    I am currently trying to take a deep dive into Multiple Imputation, and while doing so, some rather specific questions have come up that I'd be super grateful to get some advice on.

    1. Question: How to impute a newly created dichotomous variable
    I have created a dichotomous variable, "dieting vs. no dieting within the past year", from a categorical item describing how many times a person has dieted within the past year. If I understand the documentation correctly, any variable that is created via gen/egen from existing variables is supposed to be registered via
    Code:
    mi register passive
    and then imputed via
    Code:
    mi passive: generate
    However, I have only found examples in which the passive variable was created from two or more variables, e.g.,
    Code:
    mi passive: generate new_Var = Var_1 + Var_2
    I have not yet found a way to implement the creation of my dichotomous diet variable, which is after all derived from only one variable, and a categorical one at that. Does anyone know how to proceed here?

    2. Question: Dealing with winsorized variables
    There is another continuous variable I would like to impute. For this variable, I would like to conduct sensitivity analyses after winsorizing outliers in the observed, non-missing values. I am really uncertain how to implement this in the context of imputation: Am I supposed to conduct the imputation based on the winsorized observed values, e.g.,
    Code:
    mi register imputed Var_1 Var_2 Var_3_winsorized
    mi impute chained (ologit) Var_1 (logit) Var_2 (regress) Var_3_winsorized, add(15) savetrace(trace1, replace)
    or is there a way to winsorize the imputed values instead? Or is there a different procedure, or should I not use winsorizing in this context at all? Any insight is appreciated.

    3. Question: Augment option in the case of "mi impute logit: perfect predictor(s) detected"
    With one binary predictor, I have encountered the error message "mi impute logit: perfect predictor(s) detected". I have been able to resolve the issue by including the option
    Code:
    augment
    like so:
    Code:
    mi impute chained (ologit) Var_1 (logit) Var_2 (regress) Var_3_winsorized, add(15) augment savetrace(trace1, replace)
    However, the documentation calls this procedure "ad hoc", and it results in the warning message
    Warning: the sets of predictors of the imputation model vary across imputations or iterations
    which I am unsure how to interpret. Does anyone have advice on whether using the
    Code:
    augment
    option is appropriate here?


    Sorry for asking all these questions; I'm very new to MI, and any help is appreciated. Thank you very much in advance!

    Last edited by Stephanie Peschel; 31 Mar 2023, 13:07.

  • #2
    Stephanie,

    I'll give a shot at your questions...

    1. A passive variable is one that is created from already-imputed data. That forces Stata to make the original and imputed variables compatible. From the way you framed the question the dichotomous variable is created before imputation so you wouldn't need to worry about registering it as passive. Whether you should impute the original or recoded variable is a different question entirely.

    2. I think the advice you would find about winsorizing variables on Statalist is that you shouldn't do it. If there are problems with outlying values, you should probably address those based on model diagnostics. A related question is what you should do if you get predictions beyond the bounds of the original variable, for example, imputed values of income that are negative. My understanding of the MI literature is that you shouldn't worry about those values: as I think of them (which probably isn't technically correct), they aren't really data but just placeholders so that you don't have to drop the data you do have. You should keep them as imputed so that the standard errors from your imputations are correct.

    3. I don't think there is a connection between your use of augment and the predictors varying across imputations. If you double-check the documentation for the augment option, it tries to avoid the issue of perfect prediction by simulating some additional observations and including them in the imputation, but then weighting them down so that the predictions aren't biased. Usually, the problem of different predictors arises because you have factor variables somewhere in the imputation equations and a different category gets excluded in different imputations. An example would be if you have race in the model and in one imputation Stata excludes the white category but in another it excludes black. That means the coefficients will likely be downwardly biased, because you're combining a non-zero coefficient in one imputation with a zero coefficient in another. If this is the problem, then you can use mi fvset. Another possibility is that you just have sparse data, and in one of the iterations of the model there are no observations in a category of a categorical variable, so that category is excluded. You could also have misspecified the model and have a collinearity issue you didn't catch, in which case Stata will drop any categories facing that problem. If your problem is that you haven't run mi fvset, then give that a try. If that's not the problem, you may have to use the noisily option and look at the models to see where the problem arises.
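
    If the mismatch does come from shifting base categories, a minimal sketch of pinning the base level would be (the variable name race here is just an illustration, not taken from your model):

    Code:
    mi fvset base 1 race

    This fixes level 1 of race as the omitted base category in every imputation, so the same category is excluded each time and the coefficients stay comparable across imputations.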

    That's my take on your questions. I hope something there is useful.

    Best,
    Lance



    • #3
      Lance,

      thank you for your insightful reply. I didn't realize before that you can also use the -rreg- command with mi, so that definitely rules out the winsorizing.

      I have one follow-up question/clarification regarding the dieting-variable (would be very grateful if you would find the time):
      I don't think I stated my problem clearly. It is the categorical variable (how many times a person has dieted) from which I would create my dichotomous variable (has dieted vs. has not dieted). However, from my understanding, it is not the dichotomous variable but preferably the categorical variable that should be imputed here in the case of missing values. And now I am unsure how I would create the passive dichotomous variable based on the imputed values of the categorical one. Are there any ideas?

      Thank you so much in advance!



      • #4
        Stephanie,

        I would probably try to impute the original variable as well and then create the dichotomous one after imputation, if the analytical model required a dichotomy. The safest way to do it would be using the mi passive command, something like...

        Code:
        mi passive: gen dichotomous = categorical != 0
        The categorical != 0 part evaluates to either true or false for each observation; dichotomous is assigned 1 where it evaluates to true and 0 where it evaluates to false. Since these would be imputed data, you wouldn't have to worry about any missing values in categorical, which you would have to worry about with non-imputed data.
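
        If you want to verify the result, you can run a command within one of the completed datasets, for example (a sketch, using the same hypothetical variable names):

        Code:
        mi xeq 1: tabulate categorical dichotomous

        This cross-tabulates the original and derived variables in imputation 1, so you can confirm the recoding did what you expect.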

        Best,
        Lance



        • #5
          Sorry, late to the thread here. Question: Would creating the dichotomous variable via the "mi passive: gen" step happen after all original variables/items are imputed via mi impute? Or do these steps happen concurrently?



          • #6
            John,

            I would impute the variable with its original distribution. Then, *after* the imputation process is over and you have multiple datasets, you would use mi passive. So, the steps do not happen concurrently.

            Best,
            Lance



            • #7
              Lance - thank you so much!

              To make sure I am tracking this correctly:

              Step 1: mi set ....
              Step 2: mi register imputed ....
              Step 3: mi impute..... , add (.)...

              then,

              Step 4: mi passive: egen/gen....(using the imputed variables above)

              Final question: do the original variables need to be imputed in steps 2-3 before step 4, or does step 4 automatically impute the variables used to generate the new measure?

              Thank you once more!



              • #8
                I *think* I understand your question. mi passive does not do any imputations. Any imputations have to be done with mi impute. I suggest that you take a look at the mi passive help page, in particular the "mi passive basics" section.
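
                To make the ordering concrete, a minimal sketch of the whole sequence (variable names are placeholders) would be:

                Code:
                mi set wide
                mi register imputed categorical
                mi impute ologit categorical covar1 covar2, add(15)
                mi passive: generate dichotomous = categorical != 0

                mi impute fills in the missing values of categorical across the 15 imputations; only then does mi passive compute dichotomous from the completed data. mi passive itself never imputes anything.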



                • #9
                  Thank you.

