  • passive MI variables

    Why does registering a variable as passive in an MI dataset change the results of an estimation (and apparently the data itself)? I am working with a large, publicly available dataset that contains several variables (calculated by the survey team) which depend on imputed variables. As I understand it, this meets Stata's definition of a "passive" variable. Based on the documentation, I do not expect registering these variables as passive to change the data internally or to affect the results of estimation commands. However, in my data both clearly happen. I run the following commands:

    use DATASET
    mi est: mean VAR [gives result X, based on N_1 observations]
    mi register passive VAR [reports "N_2 observations of passive variable VAR in m>0 updated to match values in m=0"]
    mi est: mean VAR [gives result Y, based on N_3 observations]

    There are two things I do not understand: 1) why is mi register passive updating my imputed data based on the non-imputed data, thus seemingly wiping out the imputation; 2) why SPECIFICALLY is the result X based on a different number of observations than Z. I note that N_1 + N_2 does not equal N_3...so it is not the case that the change in observations from the first estimation to the second is equivalent to the changes made by mi register...

  • #2
    I will see whether I can find the time to look into it and attempt an answer; for now, this is just a quick cross-reference to the initial post, which might give more background information.

    Edit:

    Ok, I guess 1) is rather simple to answer. The values are updated with the non-missing values in m=0 because Stata sees no point in imputing values that are not missing and regards this as a likely unintended coding error. Stata is probably right about this, although I have had situations where I wanted something similar. Anyway, once you register the variable, Stata will check for such (apparent) inconsistencies. If you are sure this is what you want, then either do not register the variables or find a suitable work-around for the problem (I have done something similar, but would need to dig out the code).
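
    To see what registering would overwrite, it may help to look at the variable in m=0 and in one imputation before running mi register. This is only a rough sketch: DATASET and VAR are the placeholders from the first post, and it assumes the data hold at least one imputation.

    ```stata
    * Sketch only; DATASET and VAR are the placeholders from the first post.
    use DATASET, clear

    * Which variables are currently registered, and as what?
    mi describe, noupdate

    * Compare VAR in the original data (m=0) and in the first imputation (m=1).
    * Differing N or differing means hint at what -mi register passive- would touch.
    mi xeq 0 1: summarize VAR
    ```

    If the two summaries differ in observations that are non-missing in m=0, those are presumably the values that registering would reset to the m=0 values.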

    Concerning 2), notice that you ask about Z but never introduced it; you used X and Y to describe your problem. I do not say this to be pedantic or to get on your nerves. I say it because, as I mentioned in my response to your earlier questions, it is often pivotal that we see as many details as possible. If you cannot share data, at least show the output you get, using code delimiters as explained in the FAQ. This is no guarantee of better answers, but it increases the likelihood. Sorry that I cannot give better advice on this now.


    Edit 2:

    By the way, this is likely related to the super-varying variables that I mentioned before (although I never got feedback on whether you really have those). From the help file:

    A variable is said to be super varying if its values in the complete observations differ across m. The existence of super-varying variables is usually an indication of error. It makes no sense for a variable to have different values in, say, m=0 and m=2 in the complete observations -- in observations that contain no missing values. That is, it makes no sense unless the values of the variable is a function of the values of other variables across multiple observations. If variable sumx is the sum of x across observations, and if x is imputed, then sumx will differ across m in all observations after the first observation in which x is imputed.
    If you have the situation of sumx described above, then (again from the help):

    Super-varying variables, which rarely occur and can be stored only in flong and flongsep data, should never be registered.
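
    A minimal sketch of how one might check for this and then simply leave the variable unregistered. DATASET and VAR are again the placeholders from the first post, and I am assuming the data are already stored in flong or flongsep style, since per the help those are the only styles that can hold super-varying values.

    ```stata
    * Sketch only; DATASET and VAR are the placeholders from the first post.
    use DATASET, clear

    * Report the mi style and the number of imputations.
    mi query

    * List variables that are unexpectedly varying or super varying,
    * without letting the command update the data first.
    mi varying, noupdate

    * If VAR is reported as super varying, skip -mi register passive VAR-
    * and estimate on the unregistered variable instead.
    mi estimate: mean VAR
    ```
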
    Best
    Daniel
    Last edited by daniel klein; 24 Jul 2017, 13:01.

    • #3
      Just to close this thread: the problem is indeed related to super-varying variables, as proposed by Daniel (more accurately, to variables that mistakenly appear super-varying due to a problem with the records in m=0). For posterity, anyone looking at this post for solutions to similar problems could consider checking whether their passive variables have complete observations in m=0. If they do not, registering the variable as passive can result in changes. Also, my apologies for the confusion of Y and Z: any reference to Z should be to Y (apparently I can no longer edit my post).
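
      One way to run that check, as a rough sketch only (DATASET and VAR are the placeholders from the first post, not real names):

      ```stata
      * Sketch only; DATASET and VAR are the placeholders from the first post.
      use DATASET, clear

      * Count records in the original data (m=0) where VAR is missing ...
      mi xeq 0: count if missing(VAR)

      * ... and where it is non-missing, for comparison with the N that
      * -mi estimate- reports before and after registering.
      mi xeq 0: count if !missing(VAR)
      ```

      Comparing these two counts with the number of observations that mi register reports as updated should help narrow down which m=0 records are responsible.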
