Dummy variable adjustment for missing values

Paul Olz

#1


20 Dec 2022, 08:24

In my regression, I control for variable Z in addition to four other control variables. Unfortunately, Z is the only variable with missing values, because it is a growth variable calculated as the change (delta) between two values. Without Z, I would have more than 1,200 observations; adding it reduces the sample to about 800.
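To see why a differenced variable loses observations, here is a small sketch in Python. Python is used only for illustration; the thread's own code is Stata, and the panel and values are hypothetical. The sketch mimics what Stata's D.z does after xtset. The change is z in year t minus z in year t-1 within the same panel. It is therefore missing in each panel's first year, after a gap in the time variable, and whenever either value is missing.

```python
def first_difference(panel):
    """Mimic Stata's D.z on xtset data (illustrative sketch).

    panel maps (panel_id, year) -> value, with None for a missing value.
    Returns a dict mapping (panel_id, year) -> z[t] - z[t-1], or None
    when the previous year is absent or either value is missing.
    """
    out = {}
    for (pid, year), value in panel.items():
        prev = panel.get((pid, year - 1))  # None if the row or its value is absent
        if value is None or prev is None:
            out[(pid, year)] = None
        else:
            out[(pid, year)] = value - prev
    return out


# Hypothetical panel of three firms
panel = {
    (1, 2019): 10.0, (1, 2020): 12.0, (1, 2021): 15.0,
    (2, 2019): 5.0, (2, 2021): 7.0,   # firm 2 has no 2020 row (gap)
    (3, 2020): None, (3, 2021): 4.0,  # firm 3 is missing z in 2020
}

dz = first_difference(panel)
n_missing = sum(v is None for v in dz.values())
print(dz)
print(f"{n_missing} of {len(dz)} observations have a missing difference")
```

In this toy panel, only firm 1's 2020 and 2021 rows get a value. Every other row loses its difference for one of the three reasons above.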

Looking at similar papers, I found that a dummy variable adjustment for missing values can be used with panel data. I have already found some reports that criticize the method, but I would still like to apply it in my analysis. Here is my approach so far:

* assumes the data are already -xtset- (panel id and time variable)
gen Zgrowth = D.z

gen Zgrowth_dummy = 0
replace Zgrowth_dummy = 1 if Zgrowth == .

replace Zgrowth = 0 if Zgrowth == .

How do I now run the xtreg command? Should I estimate two separate regressions?

Thank you very much in advance for your support!
George Ford

#2

23 Dec 2022, 15:12

I'd run it without the missing data and then just add the Zgrowth_dummy as an additional regressor to see if there's much difference.
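The comparison suggested here means fitting the model twice and comparing the coefficients. Below is a minimal sketch in Python with NumPy rather than Stata. It uses simulated cross-sectional data with no panel structure or fixed effects. All names, the sample size and the true coefficients (0.5 and 0.8) are hypothetical. Z is made missing completely at random and unrelated to the other control, so in this setup both fits should land near the true values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1200

# Hypothetical data-generating process: y depends on a control x and on Z
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.8 * z + rng.normal(size=n)

# Roughly a third of Z missing completely at random (an assumption)
missing = rng.random(n) < 1 / 3
z_obs = np.where(missing, np.nan, z)


def ols(outcome, *regressors):
    """Least-squares coefficients, constant first."""
    X = np.column_stack([np.ones(len(outcome)), *regressors])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta


# 1) Complete cases only: drop rows where Z is missing
cc = ~np.isnan(z_obs)
b_cc = ols(y[cc], x[cc], z_obs[cc])

# 2) Missing-indicator method: fill Z with 0, add the dummy, keep all rows
z_dummy = np.isnan(z_obs).astype(float)
z_filled = np.where(np.isnan(z_obs), 0.0, z_obs)
b_mi = ols(y, x, z_filled, z_dummy)

print("complete cases    (x, Z):", np.round(b_cc[1:3], 3), "n =", cc.sum())
print("missing indicator (x, Z):", np.round(b_mi[1:3], 3), "n =", n)
```

With real data, a noticeable gap between the two sets of estimates suggests that the way the missing Z values are handled matters for the conclusions.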
Joro Kolev

#3

23 Dec 2022, 15:47

The dummy variable method of handling missing data works in three steps. First, create a dummy equal to 1 if your regressor is missing and 0 otherwise. Then replace the missing values of the regressor with some constant, often the unconditional mean of the regressor. (In your case, you have chosen 0 as this constant.) Finally, regress your outcome on both the filled-in regressor and the dummy you created.

So your code should be

Code:

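gen Zgrowth = D.z                      // requires the data to be xtset
gen Zgrowth_dummy = Zgrowth == .       // 1 if Zgrowth is missing, 0 otherwise
replace Zgrowth = 0 if Zgrowth == .    // fill the missing values with 0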

Then you just use your regressor and the dummy in your regressions, e.g.,

Code:

xtreg Y Zgrowth Zgrowth_dummy
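The choice of constant matters less than it may seem. In this linear model, as long as the dummy is included, filling the missing values with 0 or with the mean of Z gives identical coefficients on Z and on every other regressor. Only the dummy's coefficient changes, by minus the Z coefficient times the fill value. Below is a Python/NumPy sketch that mirrors the Stata steps above and checks this. The data are hypothetical and simulated, with no panel structure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical simulated data (no panel structure)
x = rng.normal(size=n)                    # another control
z = rng.normal(loc=2.0, size=n)           # the growth variable Z
y = 0.5 * x + 0.8 * z + rng.normal(size=n)
z_obs = np.where(rng.random(n) < 0.3, np.nan, z)  # about 30% of Z missing

# Stata: gen Zgrowth_dummy = Zgrowth == .
dummy = np.isnan(z_obs).astype(float)


def fit(fill_value):
    # Stata: replace Zgrowth = fill_value if Zgrowth == . , then regress
    z_filled = np.where(np.isnan(z_obs), fill_value, z_obs)
    X = np.column_stack([np.ones(n), x, z_filled, dummy])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [constant, x, Z, dummy]


fill_mean = np.nanmean(z_obs)  # mean of the observed Z values
b_zero = fit(0.0)
b_mean = fit(fill_mean)

print("fill with 0:   ", np.round(b_zero, 4))
print("fill with mean:", np.round(b_mean, 4))
```

Here is why this holds. The mean-filled variable equals the 0-filled variable plus (fill value × dummy), so both designs span the same columns. Only the dummy's coefficient absorbs the difference.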
Bruce Weaver

#4

24 Dec 2022, 06:38

This article may be of interest.

Groenwold, R. H., White, I. R., Donders, A. R. T., Carpenter, J. R., Altman, D. G., & Moons, K. G. (2012). Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis. CMAJ, 184(11), 1265-1269. https://www.cmaj.ca/content/184/11/1265.short

Key points
The missing-indicator method is a popular and simple method to handle missing data in clinical research but has been criticized for introducing bias.

In nonrandomized studies, the factor or test under study is often related to variables with missing values, in which case the missing-indicator method typically results in biased estimates.

In randomized trials, the distribution of baseline covariates with missing values is likely balanced across treatment groups, which means the missing-indicator method will give unbiased estimates and obeys the intention-to-treat principle.
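The contrast between the last two key points can be checked with a small simulation. This is a hypothetical Python/NumPy sketch, not taken from the article. It uses a binary "treatment" t, a covariate z that is missing completely at random for half of the observations, and a true treatment effect of 1.0. When t depends on z (nonrandomized), the missing-indicator estimate of t's effect should drift away from 1.0 in this setup. When t is assigned independently of z (randomized), it should stay close to 1.0.

```python
import numpy as np


def treatment_effect(randomized, n=200_000, seed=42):
    """Missing-indicator estimate of t's effect (true effect = 1.0)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)  # covariate that will have missing values
    if randomized:
        t = rng.integers(0, 2, size=n).astype(float)       # independent of z
    else:
        t = (z + rng.normal(size=n) > 0).astype(float)     # depends on z
    y = 1.0 * t + 1.0 * z + rng.normal(size=n)             # hypothetical model

    miss = rng.random(n) < 0.5          # half of z missing completely at random
    z_filled = np.where(miss, 0.0, z)   # fill with 0 ...
    X = np.column_stack([np.ones(n), t, z_filled, miss.astype(float)])  # ... plus dummy
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


effect_rand = treatment_effect(randomized=True)
effect_nonrand = treatment_effect(randomized=False)
print("randomized:   ", round(effect_rand, 3))
print("nonrandomized:", round(effect_nonrand, 3))
```

In the thread's setting, Zgrowth is a control variable. If the regressors of interest are correlated with it, the situation resembles the nonrandomized case.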

George Ford

#5

25 Dec 2022, 16:52

You could also use a Heckman selection model for the missing values.