  • Dummy variable adjustment for missing values

    In a regression I am controlling for variable Z in addition to four other control variables. Unfortunately, Z is the only variable with missing values, because it is a growth variable calculated as the difference between two values. All the other variables would give me more than 1,200 observations; adding this variable would reduce the sample to almost 800 observations.

    I therefore looked at similar papers and found that a dummy variable adjustment exists for panel data. Although I have already found some reports that criticize the method, I would like to add the approach to my analysis. The following lines show my current approach.

    gen Zgrowth = D.z

    gen Zgrowth_dummy = 0
    replace Zgrowth_dummy = 1 if Zgrowth == .

    replace Zgrowth = 0 if salesgrwoth == .


    How do I now run the xtreg command? Should I estimate two separate regressions?


    Thank you very much in advance for your support!!!!

  • #2
    I'd run it without the missing data and then just add the Zgrowth_dummy as an additional regressor to see if there's much difference.
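
    One way to read this suggestion is to fit the model twice, once on complete cases and once with the dummy variable adjustment, and put the two sets of estimates side by side. The lines below are only a sketch on simulated data: id, year, Y, X1 and z are hypothetical stand-ins for the real panel, and the fe option is an assumption, since the thread does not say which xtreg estimator is intended.

```stata
* Toy panel; id, year, Y, X1 and z are hypothetical stand-ins for the real data
clear
set seed 12345
set obs 200
gen id = ceil(_n/10)
bysort id: gen year = 2000 + _n
gen z  = rnormal()
gen X1 = rnormal()
gen Y  = 1 + 0.5*X1 + rnormal()
xtset id year

* Growth variable: missing in each unit's first year because there is no lag
gen Zgrowth = D.z
gen Zgrowth_dummy = missing(Zgrowth)

* (1) Complete cases: rows with missing Zgrowth are dropped automatically
xtreg Y X1 Zgrowth, fe
estimates store complete

* (2) Dummy variable adjustment: fill with 0 and add the indicator
replace Zgrowth = 0 if Zgrowth_dummy == 1
xtreg Y X1 Zgrowth Zgrowth_dummy, fe
estimates store adjusted

* Coefficients, standard errors and sample sizes side by side
estimates table complete adjusted, b se stats(N)
```

    If the coefficients on the other regressors move noticeably between the two columns, that is a hint that the missingness is not innocuous.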



    • #3
      The dummy variable method of handling missing data involves creating a dummy equal to 1 if your regressor is missing, and 0 otherwise; then replacing the missing values of the regressor with some constant, often the unconditional mean of the regressor. (In your case you have decided this constant to be 0.) Then you run a regression on both your regressor and the dummy you created.

      So your code should be

      Code:
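      * growth variable (requires the data to be xtset; missing where no lag exists)
      gen Zgrowth = D.z

      * indicator: 1 if the growth variable is missing, 0 otherwise
      gen Zgrowth_dummy = Zgrowth == .

      * fill the missing values with the chosen constant (here 0)
      replace Zgrowth = 0 if Zgrowth == .

      Then you just use your regressor and the dummy in your regressions, e.g.,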

      Code:
      xtreg Y Zgrowth Zgrowth_dummy
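
      As a rough illustration of the choice of constant mentioned above, the sketch below fills the missing values once with 0 and once with the mean of the observed values. Everything here is an assumption for illustration: the data are simulated, id, year, Y and z are hypothetical names, and fe is just one possible xtreg estimator. Because the mean-filled regressor equals the zero-filled one plus a multiple of the dummy, the slope on the regressor should come out the same either way; only the coefficient on the dummy shifts.

```stata
* Toy panel; id, year, Y and z are hypothetical stand-ins for the real data
clear
set seed 2024
set obs 300
gen id = ceil(_n/10)
bysort id: gen year = 2000 + _n
gen z = rnormal()
gen Y = 1 + 0.3*z + rnormal()
xtset id year

gen Zgrowth = D.z
gen Zgrowth_dummy = missing(Zgrowth)

* Variant A: fill with 0, as in the thread
gen Zgrowth_zero = cond(Zgrowth_dummy, 0, Zgrowth)

* Variant B: fill with the mean of the observed values
summarize Zgrowth, meanonly
gen Zgrowth_mean = cond(Zgrowth_dummy, r(mean), Zgrowth)

xtreg Y Zgrowth_zero Zgrowth_dummy, fe
scalar b_zero = _b[Zgrowth_zero]

xtreg Y Zgrowth_mean Zgrowth_dummy, fe
scalar b_mean = _b[Zgrowth_mean]

* The two slopes should agree up to rounding
display "slope with 0 fill:    " b_zero
display "slope with mean fill: " b_mean
```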



      • #4
        This article may be of interest.


        Groenwold, R. H., White, I. R., Donders, A. R. T., Carpenter, J. R., Altman, D. G., & Moons, K. G. (2012). Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis. CMAJ, 184(11), 1265-1269. https://www.cmaj.ca/content/184/11/1265.short

        Key points
        • The missing-indicator method is a popular and simple method to handle missing data in clinical research but has been criticized for introducing bias.
        • In nonrandomized studies, the factor or test under study is often related to variables with missing values, in which case the missing-indicator method typically results in biased estimates.
        • In randomized trials, the distribution of baseline covariates with missing values is likely balanced across treatment groups, which means the missing-indicator method will give unbiased estimates and obeys the intention-to-treat principle.
        --
        Bruce Weaver
        Version: Stata/MP 18.5 (Windows)



        • #5
          You could use a Heckman selection model for the missing values.
