Creating Dummy for Imputed Values

Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#1

18 Feb 2022, 05:03

Dear Statalisters,

My control variables (CV) have many missing values. For this reason, I will create a table with three columns: coefficients without CV, with CV (and a low number of observations), and with imputed values for the CV.
For the last column, I will impute the mean for these controls and include a dummy variable for imputation in the model.

I have already imputed the values with the command below. Note that IDescola is the school ID.

Code:

foreach var of varlist $controlvar2 {
    egen mean = mean(`var'), by(IDescola)
    gen `var'imp = cond(missing(`var'), mean, `var')
    drop mean
}

Now I would like to create a dummy indicating whether the observation (row) contains any imputation for the CV. In other words: my command should look at all the variables in $controlvar2 and report 1 if it finds any imputed value for the observation/row.

Does anyone have an idea of how I can create this dummy?
Any advice would be highly appreciated!
Thanks in advance.
Felix Bittmann

Join Date: Aug 2018

Posts: 616
#2

18 Feb 2022, 05:41

Hi Tharcisio, you could try

Code:

egen nmiss = rowmiss($controlvar2)
gen anymiss = nmiss > 0
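As a minimal sketch of how this fits together with the imputation loop from #1, here is a toy example. The dataset, the variable names school, x1 and x2, and the definition of $controlvar2 are made up for illustration. The point it tries to show is that rowmiss() has to be applied to the original controls, not to the imputed copies, so the flag marks every row that received (or would have needed) an imputed value.

```stata
* Toy data, hypothetical: school stands in for IDescola, x1 and x2 for the controls
clear
input school x1 x2
1 10 .
1 12 5
1 .  7
2 20 1
2 22 .
2 24 3
end

global controlvar2 x1 x2

* Count missing controls per row on the ORIGINAL variables
egen nmiss = rowmiss($controlvar2)
gen byte anymiss = nmiss > 0

* School-mean imputation as in #1
foreach var of varlist $controlvar2 {
    egen mean = mean(`var'), by(school)
    gen `var'imp = cond(missing(`var'), mean, `var')
    drop mean
}

* Sanity checks: the flag is 1 exactly where at least one control was missing
assert anymiss == 1 if missing(x1) | missing(x2)
assert anymiss == 0 if !missing(x1) & !missing(x2)

list, sepby(school)
```

If a school has no observed values at all for a control, the school mean is itself missing, so the imputed copy stays missing in those rows even though the flag is 1.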

Best wishes

(Stata 16.1 MP)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17601
#3

18 Feb 2022, 10:54

Tharcisio:
as an aside to Felix's helpful code, do you think that imputing the mean of the observed values of the CV is the most reliable strategy to deal with missing data?
Have you performed a diagnostic check on the underlying missing mechanism?
Is your missingness ignorable or not?

Kind regards,
Carlo
(StataNow 18.5)
Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#4

18 Feb 2022, 11:18

Hello Everybody,

@Felix: Thanks !! A simple and effective code.

@Carlo: Thanks for your feedback.

1. Do you think that imputing the mean of the observed values of the CV is the most reliable strategy to deal with missing data?

No, I would prefer to use multiple imputation (MI). But the reviewer of my paper asked for mean imputation, so that is what I am using.

2. Have you performed a diagnostic check on the underlying missing mechanism?

Not yet, but I will do that. I will include a table in the appendix comparing the final sample (with CV and missing) with the whole sample (without CV and missing). Is that what you mean by "diagnostic check"?

Thanks.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17601
#5

18 Feb 2022, 11:40

Tharcisio:
1) just to save (as per a famous Italian saying) the goat (the reviewer's recommendation, which should obviously be satisfied) and the cabbages (the methodological standing of your research), can't you present both mean imputation (which, as we know, underestimates the variance) and a (simple) -mi- (provided that your data are MAR)?
2) no, I meant diagnosing whether your data are missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR).
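A rough sketch of what presenting both could look like, using the auto dataset that ships with Stata as a stand-in. Here rep78 plays the role of a control with missing values and foreign plays the role of the school ID; the imputation model, the number of imputations and the seed are arbitrary choices for illustration, not recommendations for the actual data.

```stata
* Stand-in data: rep78 has a few missing values, price and mpg are complete
sysuse auto, clear

* (a) Group-mean imputation plus a missing-value dummy
egen double rep78_mean = mean(rep78), by(foreign)
gen byte rep78_miss = missing(rep78)
gen double rep78_imp = cond(rep78_miss, rep78_mean, rep78)
regress price mpg rep78_imp rep78_miss

* (b) A simple multiple imputation of the same control
mi set mlong
mi register imputed rep78
* The outcome (price) is included in the imputation model
mi impute regress rep78 price mpg, add(20) rseed(12345)
mi estimate: regress price mpg rep78
```

The two regressions can then go into adjacent columns of the same table, so that the reviewer sees the mean-imputation column requested and readers can compare its standard errors with those combined across imputations by -mi estimate-.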

Kind regards,
Carlo
(StataNow 18.5)
Tharcisio Leone

Join Date: Sep 2019

Posts: 37
#6

18 Feb 2022, 12:11

Carlo Lazzaro,
Yes, I will check this assumption later using -mcartest-. Would you recommend this command as well?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17601
#7

18 Feb 2022, 12:19

Tharcisio:
yes, I use it from time to time.
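For readers who have not used it, a minimal sketch of a call follows. -mcartest- is a user-written command, not part of official Stata, so it has to be installed first. The auto dataset and the choice of variables are stand-ins only, and the syntax shown is the one I understand from its help file (variables to test, optionally followed by = and fully observed covariates); please check -help mcartest- after installing.

```stata
* -mcartest- is user-written; one way to locate and install it:
*   search mcartest
sysuse auto, clear

* Little's MCAR test on a set of variables, one of which (rep78) has missing values
mcartest rep78 price mpg

* Variant that allows missingness to depend on a fully observed covariate
mcartest rep78 price mpg = foreign
```

A non-significant result only means that MCAR is not rejected. The test cannot distinguish MAR from MNAR, since that depends on the unobserved values.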

Kind regards,
Carlo
(StataNow 18.5)