Missing variables in multi-level logistic regression

Mara deVries

Join Date: Mar 2024

Posts: 5
#1


13 Mar 2024, 07:48

Hi!

I want to use a multilevel logistic regression, as I have data at both the individual and the country level and a binary dependent variable.

My independent, dependent, and moderating variables have no missing values, but my control variables at both the individual and the country level do have missing values.
Since the country-level controls are missing because of merging with other databases, I figured I could conclude that they are missing completely at random (MCAR). Therefore, I want to create a binary indicator of whether the variable is missing and include it in the analysis.

However, the individual-level controls are not MCAR. I read that, in that case, deleting these observations or using a missingness indicator introduces bias. Does anyone know how to deal with these missing values?
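A minimal sketch of how one might inspect missingness before choosing a strategy. It runs on simulated data with hypothetical variable names (y, age, ctrl), not on the data described above. -misstable- tabulates missing values and their patterns; regressing a missingness indicator on observed variables can give evidence against MCAR (here ctrl is deliberately made more often missing when y==1). The indicator is used only as a diagnostic outcome, not as a covariate in the analysis model, and no check on the observed data can distinguish MAR from missing not at random.

```stata
* Simulated example; y, age and ctrl are hypothetical names
clear
set seed 12345
set obs 2000
generate double age = 18 + 50*runiform()
generate byte y = runiform() < invlogit(-1 + 0.02*age)
generate byte ctrl = runiform() < 0.4
* Make ctrl more likely to be missing when y==1, so it is not MCAR
replace ctrl = . if runiform() < 0.1 + 0.2*y

* How many values are missing, and in which combinations?
misstable summarize
misstable patterns

* Is missingness in ctrl related to observed variables?
generate byte miss_ctrl = missing(ctrl)
logit miss_ctrl y age
```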

Kind regards,
Mara
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

13 Mar 2024, 08:35

Mara:
welcome to this forum.
If your data are MCAR, you need not do anything, as the complete cases are an (inefficient) random sample of your original sample. In that case, you may still want to go -mi- to recover the lost efficiency (but this is not mandatory).
If your data are MAR, you should go -mi-.
Avoid half-way fixes, such as a categorical variable indicating missingness.
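A minimal sketch of the -mi- route for a two-level binary outcome, on simulated data with hypothetical names (y, age, ctrl, edu, country_id); ctrl and edu stand in for a binary and a categorical individual-level control. The outcome is included as a predictor in the imputation model, and country dummies (i.country_id) are an assumed, simple way to carry the clustering into the imputations; other approaches exist. Because the outcome is binary, the analysis model uses -melogit-, Stata's multilevel logistic command (-mixed- fits a linear model).

```stata
* Simulated two-level data; all names are hypothetical
clear
set seed 12345
set obs 2000
generate int country_id = ceil(_n/100)               // 20 countries
bysort country_id: generate double u = rnormal(0, 0.5) if _n == 1
by country_id: replace u = u[1]                      // country-level random effect
generate double age = 18 + 50*runiform()
generate byte ctrl = runiform() < 0.4                // binary control
generate byte edu = 1 + (runiform() > 0.3) + (runiform() > 0.6)   // 3 categories
generate byte y = runiform() < invlogit(-1 + 0.02*age + 0.5*ctrl + u)
replace ctrl = . if runiform() < 0.1 + 0.1*y         // missingness depends on observed y
replace edu = . if runiform() < 0.1 + 0.1*y
drop u

* mlong stores, for m>0, only the observations that had missing values
mi set mlong
mi register imputed ctrl edu
mi register regular y age country_id

* Chained equations: logit for the binary control, mlogit for the categorical one
mi impute chained (logit) ctrl (mlogit) edu = y age i.country_id, add(20) rseed(2024)

* Fit the multilevel logit on each imputation and pool with Rubin's rules
mi estimate: melogit y age ctrl i.edu || country_id:
```

The mlong style is a design choice that keeps memory use down, which tends to matter with a dataset of around 145,000 observations.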

Kind regards,
Carlo
(Stata 19.0)
Mara deVries

Join Date: Mar 2024

Posts: 5
#3

14 Mar 2024, 03:30

Dear Carlo Lazzaro,

Thank you for the reply!

I am not really familiar with multiple imputation, but I will definitely see whether I can use it.
Do you have any tips, or things to be aware of, when using this technique?
I have a large dataset (about 145,000 observations), so I hope -mi- is still feasible.

Kind regards,
Mara
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

14 Mar 2024, 03:37

Mara:
take a look at the -mi- entry in the Stata .pdf manual and the related references.

Kind regards,
Carlo
(Stata 19.0)
Mara deVries

Join Date: Mar 2024

Posts: 5
#5

18 Mar 2024, 06:07

Hi!

I have used multiple imputation for the missing values of one binary variable and four categorical variables.
The output shows that all incomplete observations have been imputed.

However, when I tried estimating:
mi estimate: mixed TEAyy age gender_bin easystartL UNEDUC KNOWENyy opportL suskillL fearfailL GDP Innovation_index || country_id:, mle

I received the message:
"estimation sample varies between m=1 and m=20"

I have tried the -noisily- option: it seems to run fine at first, but then stops after "(running mixed on m=20)".
I also do not have any passive variables.

Does anyone know how to fix this?

Thank you!
Mara
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#6

18 Mar 2024, 08:15

Mara:
investigate whether the sample size is consistent from -m1- up to -m20-.

Kind regards,
Carlo
(Stata 19.0)
Mara deVries

Join Date: Mar 2024

Posts: 5
#7

18 Mar 2024, 08:34

Dear Carlo Lazzaro,

Thank you for the reply.
I have looked at the sample sizes using:
count if _mi_m==1

For each value from 1 up to 25, it showed 18,595 observations.

Do you know anything else that could cause the error?
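A possible reading of these counts, assuming the data are in mlong style (-mi query- reports the style): in that style, the rows with _mi_m==1 are only the incomplete observations added for imputation 1, so count if _mi_m==1 gives the number of imputed observations rather than the estimation sample. The sketch below contrasts the two on hypothetical simulated data (y, age, ctrl, edu, country_id are illustrative names); with the real data, the variables of the -mixed- model in post #5 would go inside -missing()-.

```stata
* Hypothetical simulated data, mi set mlong with 20 imputations
clear
set seed 12345
set obs 2000
generate int country_id = ceil(_n/100)
bysort country_id: generate double u = rnormal(0, 0.5) if _n == 1
by country_id: replace u = u[1]
generate double age = 18 + 50*runiform()
generate byte ctrl = runiform() < 0.4
generate byte edu = 1 + (runiform() > 0.3) + (runiform() > 0.6)
generate byte y = runiform() < invlogit(-1 + 0.02*age + 0.5*ctrl + u)
replace ctrl = . if runiform() < 0.1 + 0.1*y
replace edu = . if runiform() < 0.1 + 0.1*y
drop u
mi set mlong
mi register imputed ctrl edu
mi impute chained (logit) ctrl (mlogit) edu = y age i.country_id, add(20) rseed(2024)

* Report the mi style and the number of imputations
mi query

* mlong: m=0 holds all observations; _mi_m==1 holds only the imputed rows
count if _mi_m == 0
count if _mi_m == 1

* Complete cases on the model variables within imputations 1 and 20
mi xeq 1 20: count if !missing(y, age, ctrl, edu)
```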

Kind regards,
Mara
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#8

18 Mar 2024, 11:15

Mara:
are you sure that your -mi- procedure did not leave missing values?
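A sketch of how one might check for leftover missing values, again on hypothetical simulated data. -mi describe- lists registered and unregistered variables; -mi varying- flags unregistered variables whose values differ across imputations, which can make the estimation sample differ across m; and -mi xeq- with -misstable summarize- reports, for the original data (m=0) and chosen imputations, which model variables still contain missing values. With the real data, the variables of the model in post #5 would go in the -misstable- varlist.

```stata
* Hypothetical simulated data, mi set mlong with 20 imputations
clear
set seed 12345
set obs 2000
generate int country_id = ceil(_n/100)
bysort country_id: generate double u = rnormal(0, 0.5) if _n == 1
by country_id: replace u = u[1]
generate double age = 18 + 50*runiform()
generate byte ctrl = runiform() < 0.4
generate byte edu = 1 + (runiform() > 0.3) + (runiform() > 0.6)
generate byte y = runiform() < invlogit(-1 + 0.02*age + 0.5*ctrl + u)
replace ctrl = . if runiform() < 0.1 + 0.1*y
replace edu = . if runiform() < 0.1 + 0.1*y
drop u
mi set mlong
mi register imputed ctrl edu
mi impute chained (logit) ctrl (mlogit) edu = y age i.country_id, add(20) rseed(2024)

* Registration status of every variable
mi describe

* Unregistered variables that vary across imputations should be registered
mi varying

* Missing values in the model variables: m=0 versus imputations 1 and 20
mi xeq 0 1 20: misstable summarize y age ctrl edu
```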

Kind regards,
Carlo
(Stata 19.0)
Mara deVries

Join Date: Mar 2024

Posts: 5
#9

18 Mar 2024, 11:42

Dear Carlo Lazzaro,

For all the variables I imputed, the output said that the number of imputed observations equaled the number of incomplete ones.
I also used only complete variables in the imputation.

When I -summarize- the variables, they do not have the same number of observations as the total, but I thought this was unavoidable.

Kind regards,
Mara
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#10

18 Mar 2024, 12:05

Mara:
I'd check whether a -mle- convergence issue can explain what's going on.
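To check the convergence hypothesis, one can refit the model separately on the two imputations named in the error message and compare the sample size and the convergence flag stored by the estimator. This sketch uses hypothetical simulated data and -melogit-; with the real data, the -mixed ..., mle- command from post #5 could be substituted, since -mixed- also stores e(N) and e(converged). The -esampvaryok- option of -mi estimate- lets estimation proceed despite a varying sample; it is shown only as a comment because it suppresses the symptom rather than explaining it.

```stata
* Hypothetical simulated data, mi set mlong with 20 imputations
clear
set seed 12345
set obs 2000
generate int country_id = ceil(_n/100)
bysort country_id: generate double u = rnormal(0, 0.5) if _n == 1
by country_id: replace u = u[1]
generate double age = 18 + 50*runiform()
generate byte ctrl = runiform() < 0.4
generate byte edu = 1 + (runiform() > 0.3) + (runiform() > 0.6)
generate byte y = runiform() < invlogit(-1 + 0.02*age + 0.5*ctrl + u)
replace ctrl = . if runiform() < 0.1 + 0.1*y
replace edu = . if runiform() < 0.1 + 0.1*y
drop u
mi set mlong
mi register imputed ctrl edu
mi impute chained (logit) ctrl (mlogit) edu = y age i.country_id, add(20) rseed(2024)

* Fit the model on m=1 and m=20 only; show sample size and convergence flag
mi xeq 1 20: quietly melogit y age ctrl i.edu || country_id: ; ///
    display "N = " e(N) ", converged = " e(converged)

* Only once the cause of the varying sample is understood:
* mi estimate, esampvaryok: melogit y age ctrl i.edu || country_id:
```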

Kind regards,
Carlo
(Stata 19.0)