
  • Listwise deletion

    Hi everybody,

    I have a small sample, and a few values are sometimes missing for some variables.
    If I use the "reg" command, observations with missing values are dropped (listwise deletion).
    I would like to include all observations in the regression.

    Is this possible and how?

    Thanks in advance,
    Nik




  • #2
    Nik:
    no, it is not possible unless you deal with the missing data first (say, via -mi-).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      For the simple linear model, I would go with

      Code:
      sem depvar <- indepvars , method(mlmv)
      and work around its technical limitations (e.g., no support for factor-variable notation).



      • #4
        For what Carlo suggested, look up -mi impute regress-, but what Daniel suggests is easier.

        There is also a frowned-upon method called dummy variable adjustment.

        Code:
        . sysuse auto, clear
        (1978 automobile data)
        
        . replace mpg = . in 1/10
        (10 real changes made, 10 to missing)
        
        . replace headroom = . in 20/30
        (11 real changes made, 11 to missing)
        
        . summ mpg
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                 mpg |         64    21.57813    6.054789         12         41
        
        . gen mpgimp = cond(missing(mpg),r(mean), mpg)
        
        . gen mpgd = missing(mpg)
        
        . summ headroom
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
            headroom |         63     2.97619    .8300242        1.5          5
        
        . gen headroomimp =  cond(missing(headroom),r(mean), headroom)
        
        . gen headroomd = missing(headroom)
        
        . reg price mpgimp mpgd headroomimp headroomd
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(4, 69)        =      4.88
               Model |   139936057         4  34984014.2   Prob > F        =    0.0016
            Residual |   495129339        69  7175787.53   R-squared       =    0.2203
        -------------+----------------------------------   Adj R-squared   =    0.1752
               Total |   635065396        73  8699525.97   Root MSE        =    2678.8
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
              mpgimp |  -247.8063   59.92469    -4.14   0.000    -367.3528   -128.2598
                mpgd |   -703.501    937.412    -0.75   0.456    -2573.587    1166.585
         headroomimp |  -149.5545   435.2293    -0.34   0.732    -1017.813    718.7043
           headroomd |  -60.57027   912.7814    -0.07   0.947    -1881.519    1760.379
               _cons |   12061.63   2123.454     5.68   0.000     7825.451     16297.8
        ------------------------------------------------------------------------------
        
        
        . sem price <- mpg headroom, method(mlmv) nolog
        note: Missing values found in observed exogenous variables. Using the noxconditional behavior.
              Specify the forcexconditional option to override this behavior.
        Endogenous variables
          Observed: price
        
        Exogenous variables
          Observed: mpg headroom
        
        Fitting saturated model:
        Iteration 0:   log likelihood = -967.33167  
        Iteration 1:   log likelihood = -966.56026  
        Iteration 2:   log likelihood = -966.54433  
        Iteration 3:   log likelihood =  -966.5443  
        
        Fitting baseline model:
        Iteration 0:   log likelihood = -974.95096  
        Iteration 1:   log likelihood = -974.93581  
        Iteration 2:   log likelihood =  -974.9358  
        
        Structural equation model                                   Number of obs = 74
        Estimation method: mlmv
        
        Log likelihood = -966.5443
        
        ----------------------------------------------------------------------------------
                         |                 OIM
                         | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -----------------+----------------------------------------------------------------
        Structural       |
          price          |
                     mpg |  -236.5288   56.92707    -4.15   0.000    -348.1038   -124.9537
                headroom |  -173.8385   443.7614    -0.39   0.695    -1043.595    695.9179
                   _cons |   11787.66   2158.633     5.46   0.000     7556.816     16018.5
        -----------------+----------------------------------------------------------------
                mean(mpg)|   21.56434   .7305129    29.52   0.000     20.13256    22.99612
           mean(headroom)|   3.001718   .1033196    29.05   0.000     2.799215    3.204221
        -----------------+----------------------------------------------------------------
             var(e.price)|    6719195    1122370                       4843208     9321835
                 var(mpg)|   35.48287   6.163183                      25.24466    49.87327
            var(headroom)|   .6799505   .1214264                      .4791468     .964908
        -----------------+----------------------------------------------------------------
        cov(mpg,headroom)|  -1.737664   .6894317    -2.52   0.012    -3.088925   -.3864027
        ----------------------------------------------------------------------------------
        LR test of model vs. saturated: chi2(0) = 0.00                     Prob > chi2 = .
        
        . reg price mpg headroom, noheader
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 mpg |  -209.5492   67.06329    -3.12   0.003    -344.2498   -74.84864
            headroom |  -194.9173    473.468    -0.41   0.682    -1145.906    756.0712
               _cons |   11344.27   2382.565     4.76   0.000     6558.745    16129.79
        ------------------------------------------------------------------------------



        • #5
          There is more to say. The dummy variable adjustment is frowned upon because it produces biased estimates more often than the alternatives and almost always underestimates the standard errors (because imputing a constant shrinks the variance of the respective predictor).
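          The variance-shrinkage point is easy to verify by hand. The thread's examples are in Stata, but the arithmetic is language-agnostic; here is a minimal Python sketch (illustrative only, toy numbers made up for this post):

          Code:
          ```python
          import statistics

          # Toy predictor with two missing values (None stands in for Stata's ".").
          x = [12.0, 15.0, 17.0, 21.0, 24.0, 28.0, None, None]

          observed = [v for v in x if v is not None]
          m = statistics.mean(observed)  # 19.5

          # Mean imputation: replace each missing value with the observed mean.
          imputed = [v if v is not None else m for v in x]

          var_observed = statistics.variance(observed)  # sample variance, n-1 denominator
          var_imputed = statistics.variance(imputed)

          # The imputed points sit exactly at the mean, contributing zero to the
          # sum of squared deviations while inflating n, so the variance shrinks.
          print(var_observed, var_imputed)
          ```

          The sum of squared deviations stays at 177.5 while the denominator grows from 5 to 7, so the "variance" of the mean-imputed series is smaller than that of the observed values, which is exactly why the standard errors come out too small.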

          In the situation that Joro's example portrays, there are no missing values in the outcome and the missing values in the predictors are (probably) missing completely at random. In this situation, all methods, including listwise deletion, will do about equally well. Moreover, in a linear model that conditions on all predictors, the coefficients would remain unbiased even if the missing values depended on the predictors. If, however, there were missing values in the outcome and/or missing values depended on (i.e., were correlated with) the outcome, the fancier methods (FIML and MI) would start outperforming listwise deletion and dummy variable adjustment would quickly become biased.

          Obviously, a small sample does not help with any of these methods.


          Edit:

          For illustration (only; to evaluate the differences, we need to run simulations), here is an example in which missing values in the predictors depend on the outcome:

          Code:
          version 17
          
          set seed 42
          
              quietly {
          
          sysuse auto, clear
          
          sem price <- mpg headroom
          estimates store truth
          
          replace mpg = . if runiform() < .3 & price >= 6165
          replace headroom = . if runiform() < .3 & price < 6165
          
          sem price <- mpg headroom
          estimates store listwise
          
          sem price <- mpg headroom, method(mlmv)
          estimates store fiml
          
          generate mpgd = missing(mpg)
          summarize mpg
          replace mpg = r(mean) if mpgd
          
          generate headroomd = missing(headroom)
          summarize headroom
          replace headroom =  r(mean) if headroomd
          
          sem price <- mpg mpgd headroom headroomd
          estimates store mean_imp
          
          replace mpg = . if mpgd
          replace headroom = . if headroomd
          
          mi set flong
          mi register imputed mpg headroom
          mi impute chained (regress) mpg headroom = price , add(20)
          
          mi estimate , cmdok post : sem price <- mpg headroom
          estimates store mi
              
              } // quietly
          
          estimates table truth listwise fiml mi mean_imp ///
              , b(%9.3f) se(%9.3f) keep(mpg headroom) stats(N)

          The results are:

          Code:
          --------------------------------------------------------------------------
              Variable |   truth     listwise      fiml         mi       mean_imp   
          -------------+------------------------------------------------------------
                   mpg |  -259.106    -247.383    -241.713    -238.028    -203.772  
                       |    57.228      65.891      59.347      61.282      54.899  
              headroom |  -334.021    -520.517    -322.713    -335.361    -216.044  
                       |   391.367     457.227     414.857     416.918     389.370  
          -------------+------------------------------------------------------------
                     N |        74          56          74          74          74  
          --------------------------------------------------------------------------
                                                                        Legend: b/se
          Last edited by daniel klein; 29 Apr 2022, 12:23.

