
  • Multivariate multiple imputation - Imputation versus analytic model

    Hello all,

    I have the following data, in which I am trying to impute the missing values of two outcome variables, y and z. The design is a three-level hierarchy: repeated measures (level 1) nested within individuals (level 2) nested within clusters (level 3), with measurements taken at discrete time points during the study.

    Code:
    id    time    y    z    cluster    intervention
    1    1    0.5    0.23    1    0
    1    2    .    0.11    1    1
    1    3    .    .    1    1
    2    1    0.15    .    2    0
    2    2    0.05    0.05    2    0
    2    3    .    .    2    1
    3    1    0.90    0.90    1    0
    3    2    0.23    0.81    1    1
    3    3    0.22    0.22    1    1
    To impute hierarchical data like this, it is recommended (https://www.stata.com/support/faqs/s...and-mi-impute/, strategy 3) that the data first be reshaped from long to wide format and that a multivariate normal model then be used to impute all clusters simultaneously. I have been able to do the reshape successfully, like so:

    Code:
    reshape wide y z intervention, i(id) j(time)
    Code:
    id    y3    z3    intervention3    y2    z2    intervention2    y1    z1    intervention1    cluster
    1    .    .    1    .    .11    1    .5    .23    0    1
    2    .    .    1    .05    .05    0    .15    .    0    2
    3    .22    .22    1    .23    .81    1    .9    .9    0    1
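    For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the nine example rows above and applies the reshape. It assumes time is stored as a numeric variable (so the string option is left off), and the confirm and assert lines are only sanity checks added for illustration.

```stata
* Sketch only: rebuild the example data from the listing above
clear
input id time y z cluster intervention
1 1 0.50 0.23 1 0
1 2 .    0.11 1 1
1 3 .    .    1 1
2 1 0.15 .    2 0
2 2 0.05 0.05 2 0
2 3 .    .    2 1
3 1 0.90 0.90 1 0
3 2 0.23 0.81 1 1
3 3 0.22 0.22 1 1
end

* Assumption: time is numeric, so no string option is needed.
* cluster is constant within id, so it is carried along unchanged.
reshape wide y z intervention, i(id) j(time)

* Sanity checks: one row per id, one column per variable and time point
assert _N == 3
confirm variable y1 y2 y3 z1 z2 z3 intervention1 intervention2 intervention3
list, noobs
```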
    My questions are as follows:

    1. The example under strategy 3 in the link is for a two-level model, but my model has three levels. Would it be correct in principle to reshape the data twice and then impute? If so, how would I do the second reshape?

    2. After the reshape, every variable in the dataset (except id and cluster) has been split into three variables, one per time point (e.g. y1, y2, y3). My imputation code should therefore look like this:

    Code:
    mi set wide
    mi register imputed y1 y2 y3 z1 z2 z3
    // fixed effects for cluster and id are added here because they enter the analytic model as random effects
    mi impute mvn y1 y2 y3 z1 z2 z3 = intervention1 intervention2 intervention3 cluster id, add(100) noisily
    mi reshape long y z intervention, i(id) j(time)
    mi estimate: mixed y intervention time || cluster: || id:
    mi estimate: mixed z intervention time || cluster: || id:
    In theory, the imputation model and the analytic model should contain the same variables (including the dependent variable). However, I need to include time as a fixed effect in the analytic model that I fit with mi estimate above, and time has evidently been 'removed' by the reshape to wide format. How do I reconcile this?
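    One partial check on the time question: a minimal sketch with plain reshape (not mi reshape, and without any imputation) suggests that time is rebuilt from the numeric suffixes when the data go back to long format. The toy values are taken from ids 1 and 3 above; I am assuming, but have not verified on imputed data, that mi reshape long handles j() the same way.

```stata
* Sketch only: hypothetical wide-format toy data (y values of ids 1 and 3)
clear
input id y1 y2 y3
1 0.50 .    .
3 0.90 0.23 0.22
end

* reshape long rebuilds time from the numeric suffixes of y1-y3
reshape long y, i(id) j(time)

* Sanity checks: three rows per id, and time takes the values 1, 2, 3
assert _N == 6
assert inlist(time, 1, 2, 3)
list, noobs sepby(id)
```

    If that assumption holds, time would be available to mi estimate after the reshape, although it still never enters the imputation model as a variable in its own right.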

    Note as well that the analytic model uses the single variables y, intervention, and time, whereas in the imputation model (mi impute) y, z, and intervention each appear three times (e.g. y1, y2, y3). Is this permissible?

    Is it okay to run the mi estimate command twice since I have two outcome variables?

    Thanks!






    Last edited by CEdward; 30 May 2020, 08:06.