Adding multiply imputed data using Rubin's rules into registered multiple imputation variables

Matthew Howard

Join Date: Jul 2018

Posts: 3
#1

14 Jul 2018, 05:01

Hello,

I am currently working on a survival analysis project on melanoma (a form of skin cancer). I am reasonably new to Stata, having only started using it in the past 4 months.
I have been using a Cox proportional hazards model in my analyses so far.
Within the dataset of approximately 3,600 observations, up to 20% of values are missing for some variables.
I have explored excluding incomplete observations and other missing-data methods; however, too many of my failures would be lost from the analysis (there are currently 400 failures in total, defined as melanoma-specific deaths).
I have chosen multiple imputation using chained equations (MICE), given that some of the key prognostic variables are not normally distributed and are heavily skewed.
To begin with, I have selected the key melanoma prognostic variables recorded in the dataset: Breslow thickness (continuous), ulceration status (binary) and mitotic rate (classified as an ordinal categorical variable). As independent variables I have selected those with complete data (no missing observations): age, melanoma subtype, sex and subsite location, as well as the outcome indicator and the survival hazard function.

Below is my imputation code so far. I am fairly happy that the mi estimate coefficients very closely mirror those estimated from the non-imputed dataset.
My question to the forum is: what would be the appropriate process/syntax to incorporate the imputed values into the incomplete/missing data points, so that I can continue my survival analysis models with a 'complete' dataset? (Apologies if I have not worded this correctly or if this is a basic question. I have trawled through the Statalist forums and other useful sites such as UCLA, various MI lectures and the Stata manual, but could not find this process described; I have also found the MI menu interface tricky to follow.)

Code:

* mi stset requires the data to be mi set first
mi set mlong
mi stset timem, failure(censor2==1) scale(1)
mi register imputed breslow ulcer mitosescat4
mi impute chained (regress) breslow (logit) ulcer (ologit) mitosescat4 = agecat2 subtype sex subsitecat4 matthews_haz censor2, add(10)
mi estimate: regress breslow i.ulcer i.mitosescat4

Many thanks in advance,

Last edited by Matthew Howard; 14 Jul 2018, 05:11.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#2

14 Jul 2018, 05:34

Matthew:
welcome to this forum.
Via -mi- you obtain a number of completed datasets (if I'm not mistaken, various contributions advise something like 5-50 of them), and -mi estimate- allows you to re-run your regression model on each of them and combine the point estimates, taking both the within- and between-imputation variances into account (as per Rubin's rules, as you mention).
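For readers curious about what Rubin's rules actually compute, here is a minimal sketch of the pooling step for a single coefficient. It is written in Python purely to show the arithmetic; it is not Stata code and not Stata's implementation, and in Stata -mi estimate- does all of this for you. It assumes you already have the point estimate and its squared standard error from each of the M completed-data analyses; the function name `rubin_pool` and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool one coefficient across M imputed-data analyses with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)      # pooled point estimate: average of the M estimates
    w = mean(variances)          # within-imputation variance: average squared SE
    b = variance(estimates)      # between-imputation variance (denominator M - 1)
    t = w + (1 + 1 / m) * b      # total variance used for the pooled SE
    return q_bar, w, b, t


# Hypothetical results from M = 5 imputed datasets
est = [0.10, 0.12, 0.09, 0.11, 0.13]
var = [0.0020, 0.0022, 0.0019, 0.0021, 0.0023]
q_bar, w, b, t = rubin_pool(est, var)
print(f"pooled estimate = {q_bar:.4f}, pooled SE = {math.sqrt(t):.4f}")
```

The (1 + 1/M) factor inflates the between-imputation part to account for using a finite number of imputations.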
If, after -mi-, we want a single dataset (if I understood you correctly, you mean something like a mix of original and imputed data), we would probably have to consider something like -append- and then -collapse- with the -mean- function across the completed datasets (by the way, I do not really know whether Stata allows this procedure), and then re-run the regression model on this made-up dataset. However, this approach, if feasible, would discard the between-imputation variance that -mi- creates. Hence, even if the above were technically feasible, the regression results would probably be flawed, with standard errors that are too small.
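An analysis of a single averaged dataset has no term for the between-imputation variance B, so it effectively treats B as zero. The Python sketch below (illustrative arithmetic only, not Stata code; the function name and numbers are hypothetical) compares the Rubin's-rules standard error with one that drops B.

```python
import math
from statistics import mean, variance


def pooled_vs_within_se(estimates, variances):
    """Compare Rubin's pooled SE with an SE that ignores between-imputation variance."""
    m = len(estimates)
    w = mean(variances)                          # within-imputation variance
    b = variance(estimates)                      # between-imputation variance
    se_rubin = math.sqrt(w + (1 + 1 / m) * b)    # Rubin's rules: includes B
    se_within_only = math.sqrt(w)                # behaves as if B were zero
    return se_rubin, se_within_only


# Hypothetical estimates and squared SEs from M = 5 imputed datasets
se_rubin, se_within_only = pooled_vs_within_se(
    [0.10, 0.12, 0.09, 0.11, 0.13],
    [0.0020, 0.0022, 0.0019, 0.0021, 0.0023],
)
print(f"Rubin SE = {se_rubin:.4f}, SE ignoring B = {se_within_only:.4f}")
```

Whenever the imputations disagree (B > 0), the second figure is smaller, which is the understatement of uncertainty described above.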
At the risk of being late to the party, I would recommend the following article which, in my opinion, gives one of the best examples of dealing with missing values via multiple imputation in biostatistics: https://www.ncbi.nlm.nih.gov/pubmed/12589867.

Last edited by Carlo Lazzaro; 14 Jul 2018, 05:39.

Kind regards,
Carlo
(StataNow 18.5)
Matthew Howard

Join Date: Jul 2018

Posts: 3
#3

14 Jul 2018, 06:54

Thanks for your time, Carlo.
That article was very helpful to read, and what you have mentioned certainly makes sense to me.
If I read it correctly, it suggests rerunning my initial survival models (in my case, Cox PH regression models) on each of the imputed datasets and then taking the mean of the results?
This sounds rather tricky to do in Stata; have you had any experience converting this type of theory into practical code?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#4

14 Jul 2018, 07:12

Matthew:
I meant that you should follow the -mi estimate- approach after multiple imputation.
That is:

Code:

mi estimate: stcox <indepvars>

See also example #3, -mi estimate- entry, Stata .pdf manual.
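-mi estimate: stcox- pools the Cox coefficients, which are log hazard ratios, rather than the hazard ratios themselves. The Python sketch below (illustrative arithmetic only, not Stata code; the function name and inputs are hypothetical) pools on the log scale and exponentiates only at the end. For simplicity it uses a large-sample normal critical value, whereas Stata's -mi estimate- uses a t reference distribution with Rubin's degrees of freedom.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_hazard_ratio(log_hrs, ses, level=0.95):
    """Pool Cox coefficients (log hazard ratios) with Rubin's rules, then exponentiate."""
    m = len(log_hrs)
    q_bar = mean(log_hrs)                        # pooled log hazard ratio
    w = mean([se ** 2 for se in ses])            # within-imputation variance
    b = variance(log_hrs)                        # between-imputation variance
    se_pooled = math.sqrt(w + (1 + 1 / m) * b)   # Rubin's total variance -> SE
    # Large-sample normal critical value (Stata uses a t distribution with Rubin's df)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = q_bar - z * se_pooled, q_bar + z * se_pooled
    return math.exp(q_bar), math.exp(lower), math.exp(upper)


# Hypothetical coefficient and SE for one covariate from M = 3 imputed datasets
hr, lo, hi = pooled_hazard_ratio([0.50, 0.60, 0.55], [0.10, 0.10, 0.10])
print(f"pooled HR = {hr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Building the interval on the log scale keeps it symmetric there, so the exponentiated interval is asymmetric around the hazard ratio, as in Stata's own hazard-ratio output.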

Kind regards,
Carlo
(StataNow 18.5)
Matthew Howard

Join Date: Jul 2018

Posts: 3
#5

15 Jul 2018, 07:41

Many thanks again, Carlo.
I had somehow assumed that the mi estimate command was purely for diagnostic purposes rather than for obtaining post-imputation estimates; this certainly makes the analysis much more efficient and straightforward!

Best wishes

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#6

15 Jul 2018, 08:01

Matthew:
what you mean can easily be checked via the following toy example:

Code:

. webuse mheart1s20
(Fictional heart attack data; bmi missing)

. mi describe

  Style:  mlong
          last mi update 20jan2017 14:52:04, 216 days ago

  Obs.:   complete          132
          incomplete         22  (M = 20 imputations)
          ---------------------
          total             154

  Vars.:  imputed:  1; bmi(22)

          passive:  0

          regular:  5; attack smokes age female hsgrad

          system:   3; _mi_m _mi_id _mi_miss

         (there are no unregistered variables)

. * outcome of -logit- after -mi- (20 completed datasets)
. mi estimate, dots: logit attack smokes age bmi hsgrad female
Imputations (20):
  .........10.........20 done

Multiple-imputation estimates                   Imputations       =         20
Logistic regression                             Number of obs     =        154
                                                Average RVI       =     0.0312
                                                Largest FMI       =     0.1355
DF adjustment:   Large sample                   DF:     min       =   1,060.38
                                                        avg       = 223,362.56
                                                        max       = 493,335.88
Model F test:       Equal FMI                   F(   5,71379.3)   =       3.59
Within VCE type:          OIM                   Prob > F          =     0.0030

------------------------------------------------------------------------------
      attack |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smokes |   1.198595   .3578195     3.35   0.001     .4972789    1.899911
         age |   .0360159   .0154399     2.33   0.020     .0057541    .0662776
         bmi |   .1039416   .0476136     2.18   0.029      .010514    .1973692
      hsgrad |   .1578992   .4049257     0.39   0.697    -.6357464    .9515449
      female |  -.1067433   .4164735    -0.26   0.798    -.9230191    .7095326
       _cons |  -5.478143   1.685075    -3.25   0.001    -8.782394   -2.173892
------------------------------------------------------------------------------

. * outcome of -logit- on the original data (_mi_m==0), where Stata applies listwise deletion
. logit attack smokes age bmi hsgrad female if _mi_m==0

Iteration 0:   log likelihood = -91.359017 
Iteration 1:   log likelihood = -79.374749 
Iteration 2:   log likelihood = -79.342218 
Iteration 3:   log likelihood =  -79.34221 

Logistic regression                             Number of obs     =        132
                                                LR chi2(5)        =      24.03
                                                Prob > chi2       =     0.0002
Log likelihood =  -79.34221                     Pseudo R2         =     0.1315

------------------------------------------------------------------------------
      attack |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smokes |   1.544053   .3998329     3.86   0.000     .7603945    2.327711
         age |    .026112    .017042     1.53   0.125    -.0072898    .0595137
         bmi |   .1129938   .0500061     2.26   0.024     .0149837     .211004
      hsgrad |   .4048251   .4446019     0.91   0.363    -.4665786    1.276229
      female |   .2255301   .4527558     0.50   0.618    -.6618549    1.112915
       _cons |  -5.408398   1.810603    -2.99   0.003    -8.957115    -1.85968
------------------------------------------------------------------------------

.
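The header of the -mi estimate- output above reports Average RVI, Largest FMI and a large-sample DF adjustment. As a rough guide to what those summaries mean, the Python sketch below computes the textbook Rubin's-rules versions for one coefficient from its within (W) and between (B) variances, plus the usual relative-efficiency formula that motivates the advice about how many imputations to use. It is illustrative arithmetic only: the function name and inputs are hypothetical, and Stata's exact computations may differ in detail (Stata reports these per coefficient and then summarises them).

```python
import math


def mi_diagnostics(w, b, m):
    """Rubin's RVI, large-sample df, FMI and relative efficiency for one coefficient.

    w: within-imputation variance, b: between-imputation variance, m: imputations
    """
    if b == 0:
        # No disagreement between imputations: no variance increase, infinite df
        return 0.0, math.inf, 0.0, 1.0
    rvi = (1 + 1 / m) * b / w                  # relative increase in variance
    df = (m - 1) * (1 + 1 / rvi) ** 2          # large-sample degrees of freedom
    fmi = (rvi + 2 / (df + 3)) / (rvi + 1)     # fraction of missing information
    rel_eff = 1 / (1 + fmi / m)                # efficiency relative to infinitely many imputations
    return rvi, df, fmi, rel_eff


# Hypothetical variance components from M = 5 imputations
rvi, df, fmi, rel_eff = mi_diagnostics(w=0.0021, b=0.00025, m=5)
print(f"RVI = {rvi:.4f}, df = {df:.1f}, FMI = {fmi:.4f}, RE = {rel_eff:.4f}")
```

A small RVI means imputation added little variance for that coefficient, and a very large df means the t reference distribution is practically normal, which is consistent with the large DF values in the header above.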

Kind regards,
Carlo
(StataNow 18.5)
