
  • fiml (mlmv) with basic regressions

    Hi. Has anybody tried things like this?

    Code:
    sysuse auto
    * rep78 is missing for 5 of the 74 cars
    sem price <- mpg foreign rep78, method(mlmv)

    I am sort of excited about using fiml for simple regressions, and sem can do this. It seems a lot simpler than multiple imputation when you are running linear regressions. Are there reasons I shouldn't do things like the above, or cautions I should consider first?
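
    For reference, the listwise-deletion fit of the same model is the ordinary regression, which drops the cases with a missing rep78, while method(mlmv) keeps them in the likelihood:

    Code:
    * listwise deletion: only complete cases are used
    regress price mpg foreign rep78
    * FIML: all observations contribute to the likelihood
    sem price <- mpg foreign rep78, method(mlmv)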
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam

  • #2
    No clear answer on that, but a few thoughts.

    Above all, we know that multiple imputation and full information ML are asymptotically equivalent.

    You are probably aware of Paul Allison's recent discussion of the topic.

    If I remember correctly, and I would need to check, he viewed the two distinct models in MI as an advantage back in 2001, stating that this two-step process adds some kind of robustness. The argument was that even if the imputation model is not correctly specified, you can still obtain correct answers, whereas in ML there is only one model. If you get this model wrong, the answers will be wrong.

    I have not heard or read a lot on FIML, but I think this method crucially depends on the choice and quality of the auxiliary variables, and I have no clear idea if and how these are handled in Stata's sem suite. Do you just write them all down as predictors?

    FIML, like all likelihood estimators, is probably more dependent on the normality assumption, as this is the starting point for such models. The manual says

    MLMV takes the assumption of joint normality seriously in most cases. If your observed variables do not follow a joint normal distribution, you will be better off using ML, QML, or ADF and simply omitting observations with missing values. The assumption of conditional normality, however, will work well with MLMV when the missing values occur only in endogenous variables.
    If you are willing to accept a multivariate normal distribution, then you could impute using mi impute mvn, which has simpler syntax and is much faster than chained-equation approaches.
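
    For example, something along these lines (a minimal sketch; note that mvn treats the 1-5 scale of rep78 as continuous):

    Code:
    sysuse auto, clear
    mi set mlong
    mi register imputed rep78
    * impute rep78 under a multivariate normal model, 20 imputations
    mi impute mvn rep78 = price mpg foreign, add(20) rseed(12345)
    * pool the regression estimates across imputations
    mi estimate: regress price mpg foreign rep78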

    Looking forward to more comments.
    Daniel


    Allison, P. (2001). Missing Data. SAGE.
    Last edited by daniel klein; 14 Jan 2015, 03:05.



    • #3
      If I remember correctly, and I would need to check, he viewed the two distinct models in MI as an advantage back in 2001, stating that this two-step process adds some kind of robustness. The argument was that even if the imputation model is not correctly specified, you can still obtain correct answers,
      Better not quote me on that. After re-reading the chapter in Allison, this is anything but a general conclusion; it holds only under specific circumstances.

      Best
      Daniel



      • #4
        I have not heard or read a lot on FIML, but I think this method crucially depends on the choice and quality of the auxiliary variables
        Thanks! Interesting point about the auxiliary variables. I wonder if you need to specify equations with auxiliary variables for the other variables in your main equation.
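
        One possibility I can think of is to bring an auxiliary variable in as an extra outcome whose error is allowed to covary with the main equation's error, rather than as a predictor. A rough sketch using weight from the auto data as the auxiliary variable (I am not at all sure this is the intended way):

        Code:
        * extra-equation approach: weight enters the likelihood but not the model of interest
        sem (price <- mpg foreign rep78) (weight <- mpg foreign rep78), ///
            method(mlmv) cov(e.price*e.weight)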

        The fact that SEM can't do factor variables is another concern. mi lets you specify all these imputation methods, like logit, mlogit, etc. If you are supposed to specify auxiliary variables for your independent variables, then I imagine you have a problem if your variables are anything other than continuous.
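
        For what it is worth, here is a chained-equations sketch that treats rep78 as ordinal and then uses factor-variable notation in the analysis model:

        Code:
        sysuse auto, clear
        mi set mlong
        mi register imputed rep78
        * impute the 1-5 rep78 scale with an ordered logit model, 20 imputations
        mi impute chained (ologit) rep78 = price mpg foreign, add(20) rseed(12345)
        * factor-variable notation works fine in the analysis model
        mi estimate: regress price mpg foreign i.rep78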
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam
