  • incorporating full information maximum likelihood function into the multiple regression

    Hi everyone,
    I am trying to handle missing values in the multiple regression model shown below. How can I use full information maximum likelihood (FIML) in this code to handle the missing values?
    Thanks,
    Nader


    svy: regress overallsatis i.BQ4 i.BQ5 BQ1 BQ2 i.Race i.BQ6 i.BQ8 i.BQ7 ADLs i.BQ10 Help i.size_cat form_payement i.Ownership i.urban


  • #2
    I see that virtually all of your predictors are categorical variables. Note that FIML usually assumes that variables are normally distributed. If you can live with that, see

    Code:
    help sem
    which implements FIML through the option method(mlmv) and also supports the svy prefix. However, sem does not (or at least used not to) support factor-variable notation.
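
    A minimal sketch of what the FIML route might look like, assuming the data have already been svyset (as the svy prefix in the original command implies). Only the predictors that were entered without the i. prefix are used here; the categorical ones would first need hand-made indicator variables if sem rejects factor-variable notation.

    ```stata
    * Sketch only: FIML via sem's method(mlmv), with the svy prefix.
    * Assumes svyset has already been run on this dataset.
    * Categorical predictors (BQ4, BQ5, Race, ...) are left out here;
    * they would have to enter as manually created indicator variables.
    svy: sem (overallsatis <- BQ1 BQ2 ADLs Help), method(mlmv)
    ```

    Note that method(mlmv) treats all observed variables in the model as jointly normal, which is the assumption mentioned above.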

    As an alternative, you might consider multiple imputation.

    Code:
    help mi impute chained
    The chained equations approach handles categorical variables well. mi might get a bit tricky with complex survey data, though.



    • #3
      Thanks Daniel. Does "mi impute mvn" handle missing values for both continuous and categorical variables together?



      • #4
        Originally posted by Nader Mehri:
        Does "mi impute mvn" handle missing values for both continuous and categorical variables together?
        No. Well, yes but under the same assumption as FIML. Which of the two is more robust against violations, I cannot tell. Which is better suited for survey data, I cannot tell either. Sorry.



        • #5
          This makes perfect sense; thanks for the explanation. How about "mi impute chained"? It appears to handle missing values in binary, categorical, and continuous variables. May I have your thoughts on the code below? "overallsatis" and "Help" are continuous but fairly skewed.

          mi impute chained (logit) form_payement (mlogit) BQ10 (reg) overallsatis (reg) Help= BQ4 BQ5 BQ1 BQ2 Race BQ6 BQ8 BQ7 ADLs size_cat Ownership urban, add(5) by(NH_RCF) force



          • #6
            Why force? I see 9 out of 10 posts on Statalist specifying force when they show imputation models; 9 out of 10 times you do not want this! As general advice: never ever specify an option just to suppress an error message. Do that if and only if you fully understand why the error arose in the first place and what the option you are specifying does to prevent it.

            I think you should probably use factor-variable notation for categorical variables at the right-hand side of the equals sign, i.e., instead of Race type i.Race.

            Concerning skewed variables: they are not necessarily problematic. If you are comfortable using regress in your final (analysis) model, regress should also be fine for the imputation model. You might, however, want to consider transforming the respective variable before imputation (and perhaps in the final model also).

            If you have any weights, you might want to include them as an additional predictor in your imputation model.

            Last, 5 imputed datasets are often not enough. Modern computers make it possible to go with a (much) larger number of imputations.
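
            Put together, the suggestions above might look roughly like the sketch below. The weight variable wt, the choice of mi set wide, the number of imputations, and the seed are placeholders rather than anything taken from the posted data, and it assumes the right-hand-side variables are complete.

            ```stata
            * Sketch only: revised imputation model without force.
            * wt is a hypothetical name for the survey weight variable.
            mi set wide
            mi register imputed form_payement BQ10 overallsatis Help

            * Factor-variable notation for categorical predictors,
            * weight added as a predictor, more than 5 imputations.
            mi impute chained (logit) form_payement (mlogit) BQ10   ///
                (regress) overallsatis Help                         ///
                = i.BQ4 i.BQ5 BQ1 BQ2 i.Race i.BQ6 i.BQ8 i.BQ7 ADLs ///
                  i.size_cat i.Ownership i.urban wt,                ///
                add(20) by(NH_RCF) rseed(12345)
            ```

            If this stops with an error, that error is worth reading and understanding before reaching for force.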



            • #7
              Thanks for your helpful comment. How many imputed datasets do you think are often good enough?



              • #8
                See this blog entry for a recent discussion. You might want to start with 5 imputations and see how well you are doing.



                • #9
                  Thanks. Do you know how to compare the regression coefficients of the two regression models using mi estimate? I got the error below:

                  test [NH_RCF0]2.BQ4=[NH_RCF1]2.BQ4

                  Adjusted Wald test
                  requested action not valid after most recent estimation command
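
                  One possible way around this error, offered as a sketch rather than a confirmed answer: test is not available after mi estimate, but mi test is. Instead of comparing two separately fitted models, the sketch fits one model with an NH_RCF interaction and tests the interaction term. It assumes NH_RCF is coded 0/1 (as the equation names in the failing command suggest) and uses a shortened predictor list for illustration.

                  ```stata
                  * Sketch only: assumes the data are already mi set and imputed.
                  * For survey data, mi svyset would be declared first and
                  * "mi estimate: svy: regress ..." used instead.
                  mi estimate: regress overallsatis i.NH_RCF##i.BQ4 BQ1 BQ2

                  * Does the coefficient on 2.BQ4 differ between NH_RCF groups?
                  mi test 1.NH_RCF#2.BQ4
                  ```

                  A nonzero interaction coefficient here corresponds to unequal 2.BQ4 coefficients in the two NH_RCF groups.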

