
  • Multiple Imputation

    Good afternoon,

    I'm using multiple imputation on three variables:
    Code:
    Females_hh
    Number_Employed_hh
    AtLeastSecondary_hh
    The first and the last are binary variables; Number_Employed_hh is continuous.
    Code:
    mi set mlong
    
    . mi register imputed Females_hh
    (162 m=0 obs now marked as incomplete)
    
    mi impute logit Females_hh i.Health_Number YSM, add(20) rseed(1234)
    
    Univariate imputation                       Imputations =       20
    Logistic regression                               added =       20
    Imputed: m=1 through m=20                       updated =        0
    
    ------------------------------------------------------------------
                       |               Observations per m             
                       |----------------------------------------------
              Variable |   Complete   Incomplete   Imputed |     Total
    -------------------+-----------------------------------+----------
            Females_hh |        657          162       162 |       819
    ------------------------------------------------------------------
    (Complete + Incomplete = Total; Imputed is the minimum across m
     of the number of filled-in observations.)
    
    Note: Right-hand-side variables (or weights) have missing values;
          model parameters estimated using listwise deletion.
    Even though it reports a total of 819 observations for Females_hh, which is what I want, when I tab this variable the number of observations is larger. Can someone explain what I'm doing wrong? Thanks

    Code:
    tab Females_hh
    
     Females_hh |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      1,693       31.72       31.72
              1 |      3,644       68.28      100.00
    ------------+-----------------------------------
          Total |      5,337      100.00

  • #2
    Note that your data set now includes multiple versions of the data. If you summarize the variable "_mi_m" that Stata added to the data, you will see that it runs to a maximum of 20, because it identifies each imputed data set. If you just want to see what is in the original data, you need to use "if _mi_m==0", as 0 marks the unimputed data.
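
    As a minimal sketch of that check, assuming the data are still mi set in the mlong style as in the first post (the variable name comes from the thread; nothing here has been run on the poster's data):

    ```stata
    * _mi_m is created by mi set; 0 = original data, 1 to 20 = imputations
    summarize _mi_m

    * Tabulate only the original data; the missing option also counts
    * the values that are still missing in m=0
    tabulate Females_hh if _mi_m == 0, missing

    * The same view through mi itself, plus a summary of the mi setup
    mi xeq 0: tabulate Females_hh, missing
    mi describe
    ```

    The extra rows counted by a plain tab are the stored imputations, not additional respondents.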



    • #3
      Ok thanks! I'm sorry, I've never used multiple imputation before.
      I want to use Females_hh as a control variable in an OLS regression, but with the 819 observations. How do I do it?
      Do I do
      Code:
      keep if _mi_m==1
      ??

      Thanks for all the help



      • #4
        Code:
        h mi estimate



        • #5
          Thank you! I still haven't figured out how to run the OLS regressions with just the 819 observations.

          I did this:
          Code:
          mi estimate, ni(40): regress Y X1 X2 X3 X4 X5 Females_hh AtLeastSecondary_hh NumberEmployed_hh
          but it still gives me the same number of observations plus the 40 imputed, not the 819 observations that the other variables have.


          I already imputed the
          Code:
          Females_hh
          AtLeastSecondary_hh
          which are dummy variables (taking the values of 1 or 0)
          And also
          Code:
          NumberEmployed_hh
          that is a continuous variable.

          I want to include them in an OLS regression as control variables, and I want the regression table to have the format I coded previously, but with the 819 observations that the other variables have:

          Code:
          regress Y X1
          outreg2 using Regression2, excel append ctitle(Basic) dec(3)
          regress Y X1 X2
          outreg2 using Regression2, excel append ctitle(Model 2) dec(3)
          regress Y X1 X2 X3
          outreg2 using Regression2, excel append ctitle(Model 3) dec(3)
          regress Y X1 X2 X3 X4 X5 Females_hh AtLeastSecondary_hh NumberEmployed_hh
          outreg2 using Regression2, excel append ctitle(All the Control Variables) dec(3)
          Thank you for the help
          Last edited by Beatriz Gomes; 10 Dec 2022, 06:13.



          • #6
            I believe there are many misunderstandings here. For one thing, the very idea of multiple imputations is to produce multiple plausible values for every missing value. You do not want to only use 819 cases picking one imputed value; you want to use all imputed values. You also do not want to impute missing values in multiple variables successively; you want one imputation model that imputes all missing values in all variables simultaneously. Your final syntax should probably look something like this:

            Code:
            mi set mlong
            mi register imputed Females_hh AtLeastSecondary_hh NumberEmployed_hh
            mi impute chained (logit) Females_hh AtLeastSecondary_hh (regress) NumberEmployed_hh = Y X1 X2 X3 X4 X5 , add(20)
            mi estimate : regress Y X1 X2 X3 X4 X5 Females_hh AtLeastSecondary_hh NumberEmployed_hh
            Do not(!) blindly copy this code!

            In the second line of the code, add all variables that have missing values.
            In the third line of code, include, before the equals sign, all variables that have missing values, choosing an appropriate model. After the equals sign, include only those (but all) variables that do not have missing values. It is very important that the outcome (dependent variable, response, ...) is included in the imputation model.
            In the fourth line (mi estimate), you probably want to include the post option so that outreg2 (probably from SSC) finds the estimates where it expects them.
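
            A hedged sketch of how those notes might translate into commands, assuming the data are already mi set and imputed as above. The variable names and the outreg2 call are taken from earlier in the thread; whether outreg2 picks up the posted results has not been verified here.

            ```stata
            * Check which analysis variables have missing values in the original data (m=0)
            mi misstable summarize Y X1 X2 X3 X4 X5 Females_hh AtLeastSecondary_hh NumberEmployed_hh

            * Pooled regression over all imputations; post stores the combined
            * results in e(b) and e(V), where later commands look for them
            mi estimate, post: regress Y X1 X2 X3 X4 X5 Females_hh AtLeastSecondary_hh NumberEmployed_hh

            * outreg2 is community-contributed (ssc install outreg2); that it
            * reads the posted mi results correctly is an assumption
            outreg2 using Regression2, excel append ctitle(All the Control Variables) dec(3)
            ```

            The smaller models (Y on X1, then X1 X2, and so on) can be run the same way, each with its own mi estimate, post: prefix.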

            Having said all that, you might want to consider full information maximum likelihood (FIML) as an alternative to multiple imputation. FIML is much easier to implement for linear models and has similar properties to MI. The code would be something along the lines of

            Code:
            sem  Y <- X1 X2 X3 X4 X5 Females_hh AtLeastSecondary_hh NumberEmployed_hh , method(mlmv)
            Last edited by daniel klein; 10 Dec 2022, 06:05.

