
  • Regressions affected by missing values

    Hello,

    First of all, here is a sample of my data so that you can advise me more efficiently:

    Code:
    input float(var1 var2 var3 var4) int var5 float(var6 var7 var8) int var9 float var10
            .        .     .      .    .      .     .    .    .         .
            .        .     .      .    .      .     .    .    .         .
            .        .     .      .    .      .     .    .    .         .
            .        .     .      .    .      .     .    .    .         .
     .0013657    .0025  2.16   2.05    0   .013  .751 2.67    0 3.6963515
     .0016142    .0025  2.16   1.66    0   .012  .294  .83    0 2.1860511
    -.0087183    .0025  2.58  13.99    1   .015  .055 4.97    1  .3364722
    -.0117893    .0025  4.58   4.34    0   .014  .975 1.63    1  5.605434
    -.0049639    .0025  8.24   7.02    0  .0106  .727  .38    0  6.633187
     .0136232    .0025  8.36    .75    0   .013  .779  .45    0 4.3515673
        .0039    .0025  9.13   6.35    0   .011  .753  .08    1  6.466611
            .        .     .      .    .      .     .    .    .         .
            .        .     .      .    .      .     .    .    .         .
            .        .     .      .    .      .     .    .    .         .
    -.0175924        .     .      .    .      .     .    .    .         .
    -.0021373    .0025 10.07   4.36    0  .0096  .623 1.45    1  5.063228
            .        .     .      .    .      .     .    .    .         .
            .        .     .      .    .      .     .    .    .         .
    -.0028949        .     .      .    .      .     .    .    .         .
     -.003418        .     .      .    .      .     .    .    .         .
     .0017405        .     .      .    .      .     .    .    .         .
     .0017405        .     .      .    .      .     .    .    .         .
     -.003514        .     .      .    .      .     .    .    .         .
     .0026327        .     .      .    .      .     .    .    .         .
    -.0050415      .01  1.29   1.46    0  .0208  .055  .26    0         .
    -.0050251        .     .      .    .      .     .    .    .         .
     -.004573    .0025  1.29   1.46    0  .0133  .055  .26    0  .6418539
    end
    I want to regress the first variable on the other nine variables in my data. However, as the sample above shows, some observations are incomplete (i.e., some have data for the first variable but not for the others, and others have data for all variables except the last one). My main question is whether running the regress command without modifying the data (i.e., as it appears in the sample above) would severely affect the regression results. If so, what changes should I make to the data?

    Thanks a lot for your help.

  • #2
    Like most estimation commands in Stata, regress ignores any observation for which one or more of the dependent and independent variables are missing. So you need do nothing special; Stata will do the right thing for you.
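
    As a minimal sketch of that behaviour (using Stata's shipped auto dataset rather than the data in post #1, so the variable names are only illustrative), e(sample) marks the observations that regress actually used:

    ```stata
    * Illustration on the auto dataset that ships with Stata (not the poster's data).
    sysuse auto, clear

    * rep78 is missing for some cars; regress drops those observations on its own.
    regress price mpg rep78

    * e(sample) equals 1 for observations used in the most recent estimation.
    count if e(sample)

    * Observations left out because at least one model variable is missing.
    count if missing(price, mpg, rep78)
    ```

    The first count should match the "Number of obs" shown in the regression header, and the two counts together should add up to the total number of observations in the dataset.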



    • #3
      Missing data generally cause two problems for the analyst: (1) a reduction in statistical power because of the reduced N for the analysis; (2) bias, if the observations with missing data differ from the observations without missing data and only the complete cases are analyzed. You don't give us any information about why some data are missing. However, in many situations, multiple imputation will reduce bias. You should read up on this; a good place to start is the Stata documentation for the "mi" set of commands (so, start with "h mi").
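
      For orientation only, a multiple-imputation workflow with the "mi" commands might look roughly like the sketch below. It uses Stata's shipped auto dataset, and the imputation method (predictive mean matching), the number of imputations, and the seed are all assumptions chosen for illustration, not a recommendation for the data in post #1:

      ```stata
      * Illustration on the auto dataset that ships with Stata (not the poster's data).
      sysuse auto, clear

      * Describe how much is missing and in which patterns.
      misstable summarize price mpg rep78
      misstable patterns price mpg rep78

      * Declare the mi storage style and the variable to be imputed.
      mi set wide
      mi register imputed rep78

      * Impute rep78 20 times by predictive mean matching on price and mpg.
      * add(20), knn(5), and the seed are arbitrary illustrative choices.
      mi impute pmm rep78 price mpg, add(20) knn(5) rseed(12345)

      * Fit the regression in each imputed dataset and combine the estimates.
      mi estimate: regress price mpg rep78
      ```

      Whether imputation is sensible at all depends on why the values are missing, which is exactly the information not given in the original question.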



      • #4
        Just another question. When I run the regressions the result window shows me the number of observations. This number of observations corresponds to the number of observations for which there is full data (i.e. data on all variables), right?

        Thank you again.



        • #5
          Inigo:
          yes, you're right.
          Besides, if you add an -if- qualifier at the end of your regression command, Stata will report the number of observations with no missing values that satisfy your condition(s) (i.e., a subsample of observations with no missing values in <depvar> and <indepvars>).
          Kind regards,
          Carlo
          (Stata 19.0)
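
          A small sketch of the -if- behaviour described above, again on Stata's shipped auto dataset (the condition foreign == 0 is just an illustrative choice):

          ```stata
          * Illustration on the auto dataset that ships with Stata (not the poster's data).
          sysuse auto, clear

          * Restrict to domestic cars; observations missing rep78 are still dropped.
          regress price mpg rep78 if foreign == 0

          * Number of observations used in the estimation.
          display e(N)

          * The same number, counted by hand: condition met and no missing values.
          count if foreign == 0 & !missing(price, mpg, rep78)
          ```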



          • #6
            Carlo: I tried the following command, but the number of observations in the results window didn't change:
            reg <depvar> <indepvars> if <depvar>!=.
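
            One likely reason the count stays the same: observations with a missing value in any model variable are already excluded by regress, so an -if- condition that only screens out missing values removes nothing further. A sketch on Stata's shipped auto dataset (variable names are illustrative, not the ones in this thread):

            ```stata
            * Illustration on the auto dataset that ships with Stata (not the poster's data).
            sysuse auto, clear

            * Regression without any -if- qualifier.
            quietly regress price mpg rep78
            scalar n_all = e(N)

            * Same regression, explicitly excluding missing rep78.
            quietly regress price mpg rep78 if rep78 != .
            scalar n_if = e(N)

            * Both runs use the same estimation sample.
            display "N without if: " n_all "   N with if: " n_if
            assert n_all == n_if
            ```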



            • #7
              Try:

              Code:
              reg depvar indepvar1 indepvar2 if !missing(depvar, indepvar1, indepvar2)
              Best wishes

              (Stata 16.1 MP)



              • #8
                Inigo:
                Otherwise, you can use the -rowmiss()- function available from -egen-:
                Code:
                . egen wanted=rowmiss(var1- var10)
                
                . regress var1 var2-var10 if wanted==0
                note: var2 omitted because of collinearity
                
                      Source |       SS           df       MS      Number of obs   =         9
                -------------+----------------------------------   F(8, 0)         =         .
                       Model |  .000455235         8  .000056904   Prob > F        =         .
                    Residual |           0         0           .   R-squared       =    1.0000
                -------------+----------------------------------   Adj R-squared   =         .
                       Total |  .000455235         8  .000056904   Root MSE        =         0
                
                ------------------------------------------------------------------------------
                        var1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                        var2 |          0  (omitted)
                        var3 |   -.005761          .        .       .            .           .
                        var4 |   .0823143          .        .       .            .           .
                        var5 |  -.2624258          .        .       .            .           .
                        var6 |  -104.3477          .        .       .            .           .
                        var7 |   1.738927          .        .       .            .           .
                        var8 |  -.1390943          .        .       .            .           .
                        var9 |  -.0275399          .        .       .            .           .
                       var10 |  -.3090507          .        .       .            .           .
                       _cons |   1.409393          .        .       .            .           .
                ------------------------------------------------------------------------------
                
                .
                As expected, the outcome is identical to the one obtained by simply letting Stata apply listwise deletion by default:
                Code:
                . regress var1 var2-var10
                note: var2 omitted because of collinearity
                
                      Source |       SS           df       MS      Number of obs   =         9
                -------------+----------------------------------   F(8, 0)         =         .
                       Model |  .000455235         8  .000056904   Prob > F        =         .
                    Residual |           0         0           .   R-squared       =    1.0000
                -------------+----------------------------------   Adj R-squared   =         .
                       Total |  .000455235         8  .000056904   Root MSE        =         0
                
                ------------------------------------------------------------------------------
                        var1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                        var2 |          0  (omitted)
                        var3 |   -.005761          .        .       .            .           .
                        var4 |   .0823143          .        .       .            .           .
                        var5 |  -.2624258          .        .       .            .           .
                        var6 |  -104.3477          .        .       .            .           .
                        var7 |   1.738927          .        .       .            .           .
                        var8 |  -.1390943          .        .       .            .           .
                        var9 |  -.0275399          .        .       .            .           .
                       var10 |  -.3090507          .        .       .            .           .
                       _cons |   1.409393          .        .       .            .           .
                ------------------------------------------------------------------------------
                
                .
                Last edited by Carlo Lazzaro; 16 Nov 2021, 03:16.
                Kind regards,
                Carlo
                (Stata 19.0)

