
  • Listwise deletion

    Hi everybody,

    I have a small sample, and a few values are sometimes missing for some variables.
    If I use the "reg" command, observations with missing values are dropped (listwise deletion).
    I would like to include all observations in the regression.

    Is this possible and how?

    Thanks in advance,
    Nik




  • #2
    Nik:
    no, it is not possible unless you deal with the missing data first (say, via -mi-).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      For the simple linear model, I would go with

      Code:
      sem depvar <- indepvars , method(mlmv)
      and work around its technical limitations (e.g., no support for factor-variable notation).



      • #4
        For what Carlo suggested, look up -mi impute regress-, but what Daniel suggests is easier.

        There is also a frowned-upon method called dummy variable adjustment.

        Code:
        . sysuse auto, clear
        (1978 automobile data)
        
        . replace mpg = . in 1/10
        (10 real changes made, 10 to missing)
        
        . replace headroom = . in 20/30
        (11 real changes made, 11 to missing)
        
        . summ mpg
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                 mpg |         64    21.57813    6.054789         12         41
        
        . gen mpgimp = cond(missing(mpg),r(mean), mpg)
        
        . gen mpgd = missing(mpg)
        
        . summ headroom
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
            headroom |         63     2.97619    .8300242        1.5          5
        
        . gen headroomimp =  cond(missing(headroom),r(mean), headroom)
        
        . gen headroomd = missing(headroom)
        
        . reg price mpgimp mpgd headroomimp headroomd
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(4, 69)        =      4.88
               Model |   139936057         4  34984014.2   Prob > F        =    0.0016
            Residual |   495129339        69  7175787.53   R-squared       =    0.2203
        -------------+----------------------------------   Adj R-squared   =    0.1752
               Total |   635065396        73  8699525.97   Root MSE        =    2678.8
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
              mpgimp |  -247.8063   59.92469    -4.14   0.000    -367.3528   -128.2598
                mpgd |   -703.501    937.412    -0.75   0.456    -2573.587    1166.585
         headroomimp |  -149.5545   435.2293    -0.34   0.732    -1017.813    718.7043
           headroomd |  -60.57027   912.7814    -0.07   0.947    -1881.519    1760.379
               _cons |   12061.63   2123.454     5.68   0.000     7825.451     16297.8
        ------------------------------------------------------------------------------
        
        
        . sem price <- mpg headroom, method(mlmv) nolog
        note: Missing values found in observed exogenous variables. Using the noxconditional behavior.
              Specify the forcexconditional option to override this behavior.
        Endogenous variables
          Observed: price
        
        Exogenous variables
          Observed: mpg headroom
        
        Fitting saturated model:
        Iteration 0:   log likelihood = -967.33167  
        Iteration 1:   log likelihood = -966.56026  
        Iteration 2:   log likelihood = -966.54433  
        Iteration 3:   log likelihood =  -966.5443  
        
        Fitting baseline model:
        Iteration 0:   log likelihood = -974.95096  
        Iteration 1:   log likelihood = -974.93581  
        Iteration 2:   log likelihood =  -974.9358  
        
        Structural equation model                                   Number of obs = 74
        Estimation method: mlmv
        
        Log likelihood = -966.5443
        
        ----------------------------------------------------------------------------------
                         |                 OIM
                         | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -----------------+----------------------------------------------------------------
        Structural       |
          price          |
                     mpg |  -236.5288   56.92707    -4.15   0.000    -348.1038   -124.9537
                headroom |  -173.8385   443.7614    -0.39   0.695    -1043.595    695.9179
                   _cons |   11787.66   2158.633     5.46   0.000     7556.816     16018.5
        -----------------+----------------------------------------------------------------
                mean(mpg)|   21.56434   .7305129    29.52   0.000     20.13256    22.99612
           mean(headroom)|   3.001718   .1033196    29.05   0.000     2.799215    3.204221
        -----------------+----------------------------------------------------------------
             var(e.price)|    6719195    1122370                       4843208     9321835
                 var(mpg)|   35.48287   6.163183                      25.24466    49.87327
            var(headroom)|   .6799505   .1214264                      .4791468     .964908
        -----------------+----------------------------------------------------------------
        cov(mpg,headroom)|  -1.737664   .6894317    -2.52   0.012    -3.088925   -.3864027
        ----------------------------------------------------------------------------------
        LR test of model vs. saturated: chi2(0) = 0.00                     Prob > chi2 = .
        
        . reg price mpg headroom, noheader
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 mpg |  -209.5492   67.06329    -3.12   0.003    -344.2498   -74.84864
            headroom |  -194.9173    473.468    -0.41   0.682    -1145.906    756.0712
               _cons |   11344.27   2382.565     4.76   0.000     6558.745    16129.79
        ------------------------------------------------------------------------------



        • #5
          There is more to say. The dummy variable adjustment is frowned upon because it produces biased estimates more often than the alternatives and almost always underestimates the standard errors (because imputing a constant shrinks the variance of the respective predictor).
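          The variance-shrinkage point is easy to verify by hand. The thread's examples are in Stata, but the arithmetic is language-agnostic; here is a minimal Python sketch (illustrative only, toy numbers made up for this post):

          Code:
          ```python
          import statistics

          # Toy predictor with two missing values (None stands in for Stata's ".").
          x = [12.0, 15.0, 17.0, 21.0, 24.0, 28.0, None, None]

          observed = [v for v in x if v is not None]
          m = statistics.mean(observed)  # 19.5

          # Mean imputation: replace each missing value with the observed mean.
          imputed = [v if v is not None else m for v in x]

          var_observed = statistics.variance(observed)  # sample variance, n-1 denominator
          var_imputed = statistics.variance(imputed)

          # The imputed points sit exactly at the mean, contributing zero to the
          # sum of squared deviations while inflating n, so the variance shrinks.
          print(var_observed, var_imputed)
          ```

          The sum of squared deviations stays at 177.5 while the denominator grows from 5 to 7, so the "variance" of the mean-imputed series is smaller than that of the observed values, which is exactly why the standard errors come out too small.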

          In the situation that Joro's example portrays, there are no missing values in the outcome and the missing values in the predictors are (probably) missing completely at random. In this situation, all methods, including listwise deletion, will do about equally well. Moreover, in a linear model that conditions on all predictors, the coefficients would remain unbiased even if the missing values depended on the predictors. If, however, there were missing values in the outcome and/or missing values depended on (i.e., were correlated with) the outcome, the fancier methods (FIML and MI) would start outperforming listwise deletion and dummy variable adjustment would quickly become biased.

          Obviously, a small sample does not help with any of these methods.


          Edit:

          For illustration (only; to evaluate the differences, we need to run simulations), here is an example in which missing values in the predictors depend on the outcome:

          Code:
          version 17
          
          set seed 42
          
              quietly {
          
          sysuse auto, clear
          
          sem price <- mpg headroom
          estimates store truth
          
          replace mpg = . if runiform() < .3 & price >= 6165
          replace headroom = . if runiform() < .3 & price < 6165
          
          sem price <- mpg headroom
          estimates store listwise
          
          sem price <- mpg headroom, method(mlmv)
          estimates store fiml
          
          generate mpgd = missing(mpg)
          summarize mpg
          replace mpg = r(mean) if mpgd
          
          generate headroomd = missing(headroom)
          summarize headroom
          replace headroom =  r(mean) if headroomd
          
          sem price <- mpg mpgd headroom headroomd
          estimates store mean_imp
          
          replace mpg = . if mpgd
          replace headroom = . if headroomd
          
          mi set flong
          mi register imputed mpg headroom
          mi impute chained (regress) mpg headroom = price , add(20)
          
          mi estimate , cmdok post : sem price <- mpg headroom
          estimates store mi
              
              } // quietly
          
          estimates table truth listwise fiml mi mean_imp ///
              , b(%9.3f) se(%9.3f) keep(mpg headroom) stats(N)

          The results are:

          Code:
          --------------------------------------------------------------------------
              Variable |   truth     listwise      fiml         mi       mean_imp   
          -------------+------------------------------------------------------------
                   mpg |  -259.106    -247.383    -241.713    -238.028    -203.772  
                       |    57.228      65.891      59.347      61.282      54.899  
              headroom |  -334.021    -520.517    -322.713    -335.361    -216.044  
                       |   391.367     457.227     414.857     416.918     389.370  
          -------------+------------------------------------------------------------
                     N |        74          56          74          74          74  
          --------------------------------------------------------------------------
                                                                        Legend: b/se
          Last edited by daniel klein; 29 Apr 2022, 12:23.

