incorporating full information maximum likelihood function into the multiple regression

Nader Mehri

Join Date: Jun 2019

Posts: 189
#1

06 Jul 2020, 07:42

Hi everyone,
I am trying to handle missing values in the multiple regression model shown below. How can I use full information maximum likelihood (FIML) in this code to handle the missing values?
Thanks,
Nader

svy, regress overallsatis i.BQ4 i.BQ5 BQ1 BQ2 i.Race i.BQ6 i.BQ8 i.BQ7 ADLs i.BQ10 Help i.size_cat form_payement i.Ownership i.urban
Tags: multiple imputation, regression
daniel klein

Join Date: Mar 2014

Posts: 3849
#2

06 Jul 2020, 09:31

I see that virtually all of your predictors are categorical variables. Note that FIML usually assumes that the variables are jointly (multivariate) normally distributed. If you can live with that, see

Code:

help sem

which implements FIML in the option method(mlmv) and which also supports the svy prefix. sem does not (or at least it used not to) support factor-variable notation, though.
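
As a rough illustration of the sem route, here is a minimal sketch. It assumes the data have already been svyset, and it uses only a subset of the predictors from post #1 to stay short. Because sem may not accept i.varname, the indicator variables are built by hand; the stub Race_ and the local macro racedums are names made up for this example.

```stata
* Sketch only: assumes the data are already svyset.
* sem may reject i.varname, so build indicator variables by hand.
tabulate Race, generate(Race_)   // Race_1, Race_2, ...; missing where Race is missing
drop Race_1                      // first category becomes the reference
unab racedums : Race_*           // expand the remaining indicator names

* method(mlmv) keeps observations with missing values instead of
* dropping them listwise; it assumes joint normality.
svy: sem (overallsatis <- BQ1 BQ2 ADLs Help `racedums'), method(mlmv)
```

Note that the indicator variables are then treated as if they were normally distributed, which is exactly the assumption mentioned above.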

As an alternative, you might consider multiple imputation.

Code:

help mi impute chained

The chained equations approach handles categorical variables well. mi might get a bit tricky with complex survey data, though.
Nader Mehri

Join Date: Jun 2019

Posts: 189
#3

15 Jul 2020, 20:26

Thanks Daniel. Does "mi impute mvn" handle missing values for both continuous and categorical variables together?
daniel klein

Join Date: Mar 2014

Posts: 3849
#4

15 Jul 2020, 21:42

Originally posted by Nader Mehri

Does "mi impute mvn" handle missing values for both continuous and categorical variables together?

No. Well, yes, but under the same normality assumption as FIML. Which of the two is more robust against violations, I cannot tell. Which is better suited for survey data, I cannot tell either. Sorry.
Nader Mehri

Join Date: Jun 2019

Posts: 189
#5

16 Jul 2020, 05:39

This makes perfect sense; thanks for the explanation. How about "mi impute chained"? It appears that it handles missing values for binary, categorical, and continuous variables. May I have your thoughts on the code below? "overallsatis" and "Help" are continuous but fairly skewed.

mi impute chained (logit) form_payement (mlogit) BQ10 (reg) overallsatis (reg) Help= BQ4 BQ5 BQ1 BQ2 Race BQ6 BQ8 BQ7 ADLs size_cat Ownership urban, add(5) by(NH_RCF) force
daniel klein

Join Date: Mar 2014

Posts: 3849
#6

16 Jul 2020, 06:19

Why force? I see 9 out of 10 posts on Statalist specifying force when they show imputation models; 9 out of 10 times you do not want this! As general advice: never ever specify an option just to suppress an error message. Do that if and only if you fully understand why the error arose in the first place and what the option you are specifying does to prevent it.

I think you should probably use factor-variable notation for the categorical variables on the right-hand side of the equals sign, i.e., instead of Race, type i.Race.

Concerning skewed variables: they are not necessarily problematic. If you are comfortable using regress in your final (analysis) model, regress should also be fine for the imputation model. You might, however, want to consider transforming the respective variable before imputation (and perhaps in the final model also).

If you have any weights, you might want to include them as an additional predictor in your imputation model.

Last, 5 imputed datasets are often not enough. Modern computers make it possible to go with a (much) larger number of imputations.
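
Putting the points above together, one possible revision of the imputation step might look like the sketch below. The names wt, psu, and stratum are hypothetical placeholders for the sampling weight, primary sampling unit, and stratum variables, none of which are named in this thread. The choice of 20 imputations and the seed are arbitrary. The sketch also assumes that every predictor on the right-hand side of the equals sign is complete; a predictor with missing values would need to be moved to the left-hand side with its own imputation method rather than silenced with force.

```stata
* Sketch only. wt, psu and stratum are hypothetical variable names.
mi set wide
mi register imputed form_payement BQ10 overallsatis Help

* No force option; i. notation for categorical predictors; the weight
* enters as a predictor; 20 imputations instead of 5.
mi impute chained (logit) form_payement (mlogit) BQ10          ///
    (regress) overallsatis Help                                ///
    = i.BQ4 i.BQ5 BQ1 BQ2 i.Race i.BQ6 i.BQ8 i.BQ7 ADLs        ///
      i.size_cat i.Ownership i.urban wt,                       ///
    add(20) rseed(20200716) by(NH_RCF)

* Declare the survey design to mi, then fit the analysis model.
mi svyset psu [pweight = wt], strata(stratum)
mi estimate: svy: regress overallsatis i.BQ4 i.BQ5 BQ1 BQ2 i.Race ///
    i.BQ6 i.BQ8 i.BQ7 ADLs i.BQ10 Help i.size_cat form_payement   ///
    i.Ownership i.urban
```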
Nader Mehri

Join Date: Jun 2019

Posts: 189
#7

16 Jul 2020, 07:09

Thanks for your helpful comment. How many imputed datasets do you think are often good enough?
daniel klein

Join Date: Mar 2014

Posts: 3849
#8

16 Jul 2020, 07:23

See this blog entry for a recent discussion. You might want to start with 5 imputations and see how well you are doing.
Nader Mehri

Join Date: Jun 2019

Posts: 189
#9

16 Jul 2020, 09:32

Thanks. Do you know how to compare the regression coefficients of the two regression models using mi estimate? I got the error below:

test [NH_RCF0]2.BQ4=[NH_RCF1]2.BQ4

Adjusted Wald test
requested action not valid after most recent estimation command
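
For readers who hit the same message: test is not available after mi estimate, and mi test is the command designed for that situation. One possible workaround, sketched here and not confirmed anywhere in this thread, is to fit a single model in which the group indicator is interacted with the predictors and then test the interaction term. The sketch assumes NH_RCF is coded 0/1, that the data are already mi set, imputed, and mi svyset, and it lists only a subset of the predictors for brevity.

```stata
* Sketch only: assumes NH_RCF is coded 0/1 and the data are mi set,
* imputed and mi svyset. Remaining predictors are omitted for brevity.
mi estimate: svy: regress overallsatis ///
    i.NH_RCF##(i.BQ4 c.BQ1 c.BQ2 c.ADLs c.Help)

* The interaction coefficient is the between-group difference in the
* 2.BQ4 coefficient; mi test tests that it equals zero.
mi test 1.NH_RCF#2.BQ4
```

Interacting NH_RCF with every predictor mirrors fitting the model separately in each group, which is also consistent with imputing by(NH_RCF).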