  • Mean centering predictor using multiply imputed data (with mi passive)

    Hi Statalist,

    I've been puzzling over “mi” commands today, trying to understand what I can and can't do with imputed data, and would very much appreciate any insight into mean centering a predictor using imputed data. The model will examine predictors of work-related injuries, and the variable "dayhours" (hours worked per day) has 36% missing values. My code is -

    mi set wide
    mi register imputed dayhours
    mi register regular daysweek injuries ohr Cash_y threats vio sector Fluent Nodoc agecentred age2 monthscentred months3 Exp dsm_symptomatic sit_break any_memprob
    mi impute chained (truncreg, ul(24) ll(1)) dayhours = daysweek injuries ohr Cash_y threats vio sector Fluent Nodoc agecentred age2 monthscentred months3 Exp dsm_symptomatic sit_break any_memprob, add(20) rseed(4409) force
    **Create new var "day2" to create new hours worked/week var based on daysweek (days worked/week)
    mi passive: gen day2=dayhours
    mi passive: gen hoursweek2=.
    mi xeq: replace hoursweek2=day2*5 if daysweek==1
    mi xeq: replace hoursweek2=day2*6 if daysweek==2
    mi xeq: replace hoursweek2=day2*7 if daysweek==3
    mi xeq: replace hoursweek2=day2*7 if daysweek==4
    **Rescale hours worked/week to 10
    mi passive: gen hour10mi=hoursweek2/10

    Now, I would like to create a variable, "c_hour10mi", that is mean centered, for each dataset. This is where I run into a problem -

    mi xeq: sum hour10mi, meanonly
    mi passive: gen c_hour10mi = hour10mi - r(mean)

    When I try to create the mean centered variable, the command above generates all missing observations for “c_hour10mi” in all the imputed datasets. I’m not sure where I’m going wrong. Is there a command in the mi suite that can take care of mean centering?

    With many thanks, Nicola

  • #2
    I think that when you run -mi xeq: sum hour10mi, meanonly-, you are, in effect, looping over the imputed data sets, calculating the mean of hour10mi in each--but each iteration of that wipes out the results from the previous one. So when you get to the -mi passive:gen c_hour10mi...- command, r(mean) is long gone. I think the following will work:

    Code:
    mi xeq: egen mean_hour10mi = mean(hour10mi)
    mi passive: gen c_hour10mi = hour10mi - mean_hour10mi

    • #3
      Thank you Clyde, this worked! I have a follow-up question. I’ve decided it will be more sensible to impute dayhours by group (ff), which indicates which of 3 work sectors the person was in. To give the MI model full information, I include the var “sector”, which has the full 12 work sectors, instead of the collapsed category “ff”. I note I can run the command below just fine -

      Code:
      mi impute chained (pmm, knn(5)) dayhours = daysweek injuries ohr Cash_y threats vio sector Fluent Nodoc agecentred age2 monthscentred months3 Exp dsm_symptomatic sit_break any_memprob sitscore freedom gear_no, add(36) rseed(4409) by(ff) force
      But I would like to check for convergence. I note that I cannot use “savetrace (mitrace,replace)” when using the “by” option to impute. Are there any workarounds so I can get the means and SDs to graph, that don’t involve me manually recording this from each imputed dataset? Looking at the means for each imputed dataset –

      Code:
      mi xeq: sum dayhours
      My imputation means are centered around the complete-case mean, which reassures me. Is there anything else I can check to make sure my MI model didn’t go wrong? I’m not worried about Monte Carlo error with 36 imputations (I have 36% missing data, and I followed the rule of thumb proposed by White, Royston and Wood, as cited in this guide: https://www.ssc.wisc.edu/sscc/pubs/stata_mi_intro.htm). But maybe I should check Monte Carlo error anyway?

      Thanks much in advance for any insight on the above.
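
      For anyone wanting to check Monte Carlo error directly, a minimal sketch follows. It assumes the imputed data from the command above are in memory. The thread does not state the analysis model, so the regress call and its predictor list are placeholders, not the actual model.

      ```stata
      * Placeholder analysis model: swap in the real command and predictors.
      * The mcerror option asks mi estimate to report Monte Carlo error
      * estimates alongside the coefficients, standard errors, and p-values.
      mi estimate, mcerror: regress injuries dayhours daysweek
      ```

      If the reported Monte Carlo errors are small relative to the standard errors of the coefficients, that would suggest 36 imputations are enough for this model.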

      • #4
        I've never actually used the -by- option with multiple imputation myself, so I can't claim to have a deep understanding of its implications. But what about dividing your unimputed data set into separate data sets by the grouping variable ff? Then do MI in each separately, saving the trace information as you go. At the end, you can -mi append- the imputed data sets. I think that would be equivalent to doing the imputation with the -by- option and would allow you to capture detailed convergence data, though, as I say, I've never done this myself and am not completely sure of all the ramifications.
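
        A rough, untested sketch of that split-impute-append workflow, meant to be run from a do-file. The file names (workers.dta, imp_ff*.dta, trace_ff*.dta) are hypothetical, ff is assumed to be coded 1, 2, 3, and the per-group seeds are an arbitrary choice. The trace variable names are assumed to follow the varname_mean / varname_sd pattern that savetrace() uses.

        ```stata
        * Predictor list copied from the by(ff) imputation command in #3.
        local preds daysweek injuries ohr Cash_y threats vio sector Fluent Nodoc ///
            agecentred age2 monthscentred months3 Exp dsm_symptomatic sit_break  ///
            any_memprob sitscore freedom gear_no

        * Impute each ff group separately, saving one trace file per group.
        forvalues g = 1/3 {
            use workers, clear                  // hypothetical unimputed data
            keep if ff == `g'
            mi set wide
            mi register imputed dayhours
            local seed = 4409 + `g'             // assumed: a different seed per group
            mi impute chained (pmm, knn(5)) dayhours = `preds', ///
                add(36) rseed(`seed') savetrace(trace_ff`g', replace) force
            save imp_ff`g', replace
        }

        * Stack the three imputed data sets back together.
        use imp_ff1, clear
        mi append using imp_ff2
        mi append using imp_ff3
        save imp_all, replace

        * Trace plot for one group: mean of imputed dayhours by iteration,
        * one line per imputation.
        use trace_ff1, clear
        reshape wide dayhours_mean dayhours_sd, i(iter) j(m)
        tsset iter
        tsline dayhours_mean*, legend(off) ytitle("Mean of imputed dayhours")
        ```

        Repeating the last four lines with trace_ff2 and trace_ff3 gives one plot per group, so the traces are never pooled across groups.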

        • #5
          Thanks much Clyde, novel idea. But as you say, I'd also be cautious about the implications of imputing separately, appending, and then looking at a collective trace file, this being the first time I've tried mi impute. Looking at the convergence data from the imputation model that does not group by(ff), it seems to converge, and I'm wondering if that would suffice. I also just discovered "midiagplots" for MI diagnostics. Looking at kdensity plots for the 36 imputations (it's a bit small, attached) using the code below, none of the imputations seem to have gone too wrong.

          Code:
          midiagplots dayhours, combine
          With this evidence, alongside the imputed means which seem to cluster around the original (although only n=6/36 below the original mean) and convergence from the MI model without grouping, I think it should be OK for me to proceed with analysis. I would welcome other diagnostic ideas if you have any though! (imputation is not common in my field and I would like to be sure the MI model is as "correct" as it can be)

          Many thanks, Nicola
          • #6
            The plots look good. You've been quite thorough; I can't think of anything to add.

            • #7
              Thank you again Clyde, really appreciate it!

              • #8
                The discussion in this thread was really helpful in solving an issue I had. I might be wrong, but it seems that the code from #2 may not center variables correctly. One can try using this instead:

                Code:
                mi xeq: egen mean_hour10mi = mean(hour10mi)
                mi xeq: gen c_hour10mi = hour10mi - mean_hour10mi
                Additionally, this only seems to work when:

                Code:
                mi set flong
                One can then verify whether variables were centered correctly using:

                Code:
                mi xeq: summarize c_hour10mi
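
                Another option that may be worth trying is to build the mean with -mi passive- as well, since -mi passive- accepts egen in addition to generate and replace. This is only a sketch: it assumes the imputed data with hour10mi are in memory and that mean_hour10mi and c_hour10mi do not already exist. The 1e-4 tolerance in the check is an arbitrary allowance for float rounding.

                ```stata
                * Compute the mean of hour10mi separately within each imputation
                * and register the result as a passive variable.
                mi passive: egen mean_hour10mi = mean(hour10mi)

                * Subtract the per-imputation mean to get the centered variable.
                mi passive: generate c_hour10mi = hour10mi - mean_hour10mi

                * Check: the centered variable should average to about zero
                * in every imputation.
                mi xeq: quietly summarize c_hour10mi, meanonly ; assert abs(r(mean)) < 1e-4
                ```

                Because both variables are created by -mi passive-, they are registered as passive, which should keep them consistent if the data are later converted to another mi style.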
