Data still missing variables after multiple imputation

alex badalyan

Join Date: Feb 2018

Posts: 24
#1


12 Apr 2021, 13:40

Hi,

I've been trying to impute missing data with multiple imputation. I have GDPG as my dependent variable and OILR, FDI, IMP, AGR, IND, SER as my independent variables. All independent variables have missing values. I used the following command:

mi impute regress OILR FDI IMP AGR IND SER, add(20) rseed(1234)

Then I checked the dataset, and all the missing values were still missing. I then used this command:

mi impute regress OILR FDI IMP AGR IND SER GDPG, add(20) rseed(1234)

Still no luck. I also tried imputing each variable one at a time together with GDPG (e.g., mi impute regress OILR GDPG, add(20) rseed(1234)), but that did not work either.

Can someone please advise me on what to do in this case?

Thanks.

Last edited by alex badalyan; 12 Apr 2021, 13:43.
FernandoRios

Join Date: Apr 2014

Posts: 2430
#2

12 Apr 2021, 13:41

Can you show the output you get from "mi impute regress"?
alex badalyan

Join Date: Feb 2018

Posts: 24
#3

12 Apr 2021, 13:47

FernandoRios

mi impute regress OILR FDI IMP AGR IND SER, add(20) rseed(1234)
note: variables FDI IMP AGR IND SER registered as imputed and used to model variable OILR; this
may cause some observations to be omitted from the estimation and may lead to missing
imputed values
OILR: missing imputed values produced
This may occur when imputation variables are used as independent variables or when
independent variables contain missing values. You can specify option force if you wish to
proceed anyway.
r(498);

mi impute regress OILR FDI IMP AGR IND SER GDPG, add(20) rseed(1234)
note: variables FDI IMP AGR IND SER registered as imputed and used to model variable OILR; this
may cause some observations to be omitted from the estimation and may lead to missing
imputed values
OILR: missing imputed values produced
This may occur when imputation variables are used as independent variables or when
independent variables contain missing values. You can specify option force if you wish to
proceed anyway.
r(498);

mi impute regress OILR GDPG, add(20) rseed(1234)

Univariate imputation                       Imputations =       90
Linear regression                                 added =       20
Imputed: m=71 through m=90                      updated =        0

------------------------------------------------------------------
                   |               Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
              OILR |        248           12       12. |       260
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled-in observations.)
FernandoRios

Join Date: Apr 2014

Posts: 2430
#4

12 Apr 2021, 14:38

Did you check that at least one of the variables is nonmissing for every observation?
If you have cases where ALL of them are missing, mi impute cannot do much.
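A minimal sketch of that check. It first builds a small simulated dataset with the thread's variable names (the data-generating lines, the 5% missingness, and the helper variable `nmiss` are hypothetical, not the poster's data), then counts how many of the six imputation variables are missing on each observation and tabulates the missing-value patterns.

```stata
* Hypothetical stand-in data: 260 observations with the thread's variable names
clear
set obs 260
set seed 1234
generate GDPG = rnormal(3, 2)
foreach v in OILR FDI IMP AGR IND SER {
    generate `v' = 0.5*GDPG + rnormal()
    * make roughly 5% of each variable missing at random
    replace `v' = . if runiform() < 0.05
}

* Which variables have missing values, and how many?
misstable summarize GDPG OILR FDI IMP AGR IND SER

* Which combinations of variables are missing together?
misstable patterns OILR FDI IMP AGR IND SER

* Number of missing imputation variables on each observation
egen nmiss = rowmiss(OILR FDI IMP AGR IND SER)
tabulate nmiss

* Observations on which every imputation variable is missing
count if nmiss == 6
```

With the real data, replace the simulated block with `use` of the poster's dataset; a fully observed variable such as GDPG is what lets the imputation models predict observations where all six variables are missing.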
alex badalyan

Join Date: Feb 2018

Posts: 24
#5

12 Apr 2021, 15:36

GDPG has no missing values (it is the dependent variable), but all of the independent variables have missing values.
daniel klein

Join Date: Mar 2014

Posts: 3824
#6

12 Apr 2021, 16:10

You want something like

Code:

mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG , add(20) rseed(1234)

assuming that (i) none of the variables on the right-hand side of the equals sign have missing values, (ii) all the variables on the left-hand side of the equals sign are continuous, and (iii) linear regression is a reasonable model for filling in their respective missing values.

You want to make sure that your imputation model includes all variables -- including the dependent variable -- that you will use in your analyses later. Any variable that you omit from the imputation model will have its association with the other variables biased towards zero; the same is true for any non-linear associations and, really, anything that is not built into the imputation model.
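Note that `mi impute regress` imputes only the first variable listed and uses the remaining ones as predictors; the note in #3 says exactly that, and the r(498) error appears because those predictors themselves contain missing values. `mi impute chained` instead imputes every variable to the left of `=`. The sketch below runs the whole workflow on simulated data; the data-generating lines and the choice of `mi set mlong` are assumptions for illustration, while the imputation and estimation commands are the ones used in this thread.

```stata
* Hypothetical stand-in data with the thread's variable names (not the poster's data)
clear
set obs 260
set seed 1234
generate GDPG = rnormal(3, 2)
foreach v in OILR FDI IMP AGR IND SER {
    generate `v' = 0.5*GDPG + rnormal()
    replace `v' = . if runiform() < 0.05
}

* Declare the data as mi data and register the variables
mi set mlong
mi register imputed OILR FDI IMP AGR IND SER
mi register regular GDPG

* Impute every variable left of "=" by linear regression,
* with the fully observed GDPG as an additional predictor
mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG, add(20) rseed(1234)

* Fit the analysis model in each completed dataset and combine the results
mi estimate: regress GDPG OILR FDI IMP AGR IND SER
```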
alex badalyan

Join Date: Feb 2018

Posts: 24
#7

13 Apr 2021, 03:15

Hi Daniel,

I had actually tried this before, and I tried it again; this is the output I get (all my variables meet the assumptions you outlined):

mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG, add(20) rseed(1234)

Conditional models:
IMP: regress IMP FDI OILR AGR IND SER GDPG
FDI: regress FDI IMP OILR AGR IND SER GDPG
OILR: regress OILR IMP FDI AGR IND SER GDPG
AGR: regress AGR IMP FDI OILR IND SER GDPG
IND: regress IND IMP FDI OILR AGR SER GDPG
SER: regress SER IMP FDI OILR AGR IND GDPG

Performing chained iterations ...

Multivariate imputation                     Imputations =      130
Chained equations                                 added =       20
Imputed: m=111 through m=130                    updated =        0

Initialization: monotone                     Iterations =      200
                                                burn-in =       10

              OILR: linear regression
               FDI: linear regression
               IMP: linear regression
               AGR: linear regression
               IND: linear regression
               SER: linear regression

------------------------------------------------------------------
                   |               Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
              OILR |        248           12       12. |       260
               FDI |        249           11        11 |       260
               IMP |        252            8         8 |       260
               AGR |        248           12        12 |       260
               IND |        233           27        27 |       260
               SER |        233           27        27 |       260
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled-in observations.)


However, all the missing values still remain in my dataset.
daniel klein

Join Date: Mar 2014

Posts: 3824
#8

13 Apr 2021, 03:42

Yes, the original dataset still has missing values. That is supposed to be the case.

I get the impression that you are fairly new to multiple imputation. I cannot tell whether that applies only to the technical details of Stata or also to the theoretical foundations of the approach. A forum discussion will probably not compensate for the latter. I would recommend that you stop here, take a step back, and start by reading (at least) pages 1--15 of [MI] Multiple Imputation.
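To see the imputed values, look at the completed datasets rather than the original one: in Stata's mi data, m = 0 is the original data and keeps its missing values, while m = 1, 2, ... hold the completed datasets. A sketch on simulated data (the data-generating lines and the mlong style are hypothetical choices for illustration):

```stata
* Hypothetical stand-in data, imputed as in #6
clear
set obs 260
set seed 1234
generate GDPG = rnormal(3, 2)
foreach v in OILR FDI IMP AGR IND SER {
    generate `v' = 0.5*GDPG + rnormal()
    replace `v' = . if runiform() < 0.05
}
mi set mlong
mi register imputed OILR FDI IMP AGR IND SER
mi register regular GDPG
mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG, add(20) rseed(1234)

* Compare the original data (m = 0) with the first completed dataset (m = 1)
mi xeq 0 1: summarize OILR FDI IMP AGR IND SER

* Work with a single completed dataset as ordinary data, then go back
preserve
mi extract 1, clear
misstable summarize OILR FDI IMP AGR IND SER
restore
```

For the analysis itself there is usually no need to extract anything: `mi estimate:` runs the model on all completed datasets and combines the results.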
alex badalyan

Join Date: Feb 2018

Posts: 24
#9

13 Apr 2021, 04:45

Hi Daniel,

I have read the Multiple Imputation manual, but I still don't understand what I'm doing wrong, or why the newly imputed values don't appear in my dataset after imputing.
Felix Bittmann

Join Date: Aug 2018

Posts: 662
#10

13 Apr 2021, 04:57

What happens if you actually run the analysis on the imputed data? For example:

Code:

mi estimate: regress GDPG OILR FDI IMP AGR IND SER

Can you provide the output?
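`mi estimate` fits the model separately in each completed dataset and pools the results (Rubin's rules); the pooled coefficient is simply the average of the per-imputation coefficients. The sketch below checks that on simulated data by fitting the 20 regressions by hand. The simulated data and the scalars `sum_b` and `mean_b` are illustrative assumptions, not part of the poster's setup.

```stata
* Hypothetical stand-in data, imputed as in #6
clear
set obs 260
set seed 1234
generate GDPG = rnormal(3, 2)
foreach v in OILR FDI IMP AGR IND SER {
    generate `v' = 0.5*GDPG + rnormal()
    replace `v' = . if runiform() < 0.05
}
mi set mlong
mi register imputed OILR FDI IMP AGR IND SER
mi register regular GDPG
mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG, add(20) rseed(1234)

* Fit the model in each completed dataset m = 1..20 and add up the OILR coefficients
scalar sum_b = 0
forvalues m = 1/20 {
    preserve
    quietly mi extract `m', clear
    quietly regress GDPG OILR FDI IMP AGR IND SER
    scalar sum_b = scalar(sum_b) + _b[OILR]
    restore
}
scalar mean_b = scalar(sum_b) / 20

* mi estimate pools the same 20 fits; its point estimate is their average
mi estimate: regress GDPG OILR FDI IMP AGR IND SER
display "average of 20 fits: " scalar(mean_b) "   mi estimate: " _b[OILR]
```

The standard errors, by contrast, also include the between-imputation variance, which is what the RVI and FMI figures in the `mi estimate` header summarize.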

Best wishes

(Stata 16.1 MP)
alex badalyan

Join Date: Feb 2018

Posts: 24
#11

13 Apr 2021, 05:54

Hi Felix,

Here is the output:

mi estimate: regress GDPG OILR FDI IMP AGR IND SER

Multiple-imputation estimates                   Imputations       =         20
Linear regression                               Number of obs     =        260
                                                Average RVI       =     0.2623
                                                Largest FMI       =     0.5038
                                                Complete DF       =        253
DF adjustment:   Small sample                   DF:     min       =      49.21
                                                        avg       =     114.79
                                                        max       =     176.96
Model F test:       Equal FMI                   F(   6,  220.5)   =       6.11
Within VCE type:          OLS                   Prob > F          =     0.0000

------------------------------------------------------------------------------
        GDPG |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        OILR |   .1569129   .0720589     2.18   0.034      .012121    .3017049
         FDI |   .8074652   .2665314     3.03   0.003     .2774461    1.337484
         IMP |   .0143569   .0400254     0.36   0.720      -.06469    .0934039
         AGR |   .3803026   .1657976     2.29   0.023     .0521299    .7084754
         IND |  -.1397202   .1097591    -1.27   0.208    -.3594361    .0799956
         SER |  -.0922856   .0729068    -1.27   0.207    -.2361643     .051593
       _cons |   8.334273   6.398373     1.30   0.195    -4.306914    20.97546
------------------------------------------------------------------------------

Kind regards
daniel klein

Join Date: Mar 2014

Posts: 3824
#12

13 Apr 2021, 06:05

The outputs in #7 and #11 suggest that you have 20 complete datasets with 260 observations each. I do not know where your confusion comes from.

Show (in code delimiters, as Felix did) the results of

Code:

mi query

Perhaps you have used flongsep style, in which case the dataset in memory only includes the original observations with missing values while the completed datasets are stored separately on disk.
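`mi query` reports the storage style. In flongsep style the completed datasets are separate files on disk, and `mi convert flong, clear` (or `wide`) brings them into memory. The sketch below uses simulated data in mlong style (an assumption for illustration) and converts it to wide style, where `OILR` keeps the original values, missing entries included, and `_1_OILR` through `_20_OILR` hold the completed versions.

```stata
* Hypothetical stand-in data, imputed as in #6
clear
set obs 260
set seed 1234
generate GDPG = rnormal(3, 2)
foreach v in OILR FDI IMP AGR IND SER {
    generate `v' = 0.5*GDPG + rnormal()
    replace `v' = . if runiform() < 0.05
}
mi set mlong
mi register imputed OILR FDI IMP AGR IND SER
mi register regular GDPG
mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG, add(20) rseed(1234)

* Which style is the mi data stored in, and how many imputations does it have?
mi query

* Wide style: one row per observation; OILR is the m = 0 variable,
* _1_OILR ... _20_OILR are its completed versions
mi convert wide, clear
summarize OILR _1_OILR _20_OILR
```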