  • Missing value generated after predicting standardized residual

    Hello,

    When I predict standardized residuals after a regression that includes a dummy with only one non-zero observation, predict generates a missing value if I use factor notation (i.e. the i. prefix) for the dummy. If I omit the i. prefix, no missing value is generated. Does anyone have an explanation for this? If I predict ordinary residuals, there is no problem in either case. See the example code below:

    Code:
    sysuse auto, clear
    gen D=0
    replace D=1 in 1
    reg price mpg headroom D
    predict r1, res 
    predict rs1, rstandard
    reg price mpg headroom i.D
    predict r2, res 
    predict rs2, rstandard
    list r1 r2 rs1 rs2 in 1/5
    Thank you,
    Mike

  • #2
    They are not missing, they are 0s.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . 
    . gen D=0
    
    . 
    . replace D=1 in 1
    (1 real change made)
    
    . 
    . reg price mpg headroom D
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(3, 70)        =      7.12
           Model |   148557366         3  49519122.1   Prob > F        =    0.0003
        Residual |   486508030        70  6950114.71   R-squared       =    0.2339
    -------------+----------------------------------   Adj R-squared   =    0.2011
           Total |   635065396        73  8699525.97   Root MSE        =    2636.3
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -259.8201   58.59083    -4.43   0.000    -376.6758   -142.9644
        headroom |  -355.7493   401.5929    -0.89   0.379    -1156.701    445.2025
               D |  -2087.359   2660.911    -0.78   0.435    -7394.378     3219.66
           _cons |   12791.77   2084.729     6.14   0.000     8633.913    16949.64
    ------------------------------------------------------------------------------
    
    . 
    . predict r1, res 
    
    . 
    . predict rs1, rstandard
    
    . 
    . reg price mpg headroom i.D
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(3, 70)        =      7.12
           Model |   148557366         3  49519122.1   Prob > F        =    0.0003
        Residual |   486508030        70  6950114.71   R-squared       =    0.2339
    -------------+----------------------------------   Adj R-squared   =    0.2011
           Total |   635065396        73  8699525.97   Root MSE        =    2636.3
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -259.8201   58.59083    -4.43   0.000    -376.6758   -142.9644
        headroom |  -355.7493   401.5929    -0.89   0.379    -1156.701    445.2025
             1.D |  -2087.359   2660.911    -0.78   0.435    -7394.378     3219.66
           _cons |   12791.77   2084.729     6.14   0.000     8633.913    16949.64
    ------------------------------------------------------------------------------
    
    . 
    . predict r2, res 
    
    . 
    . predict rs2, rstandard
    
    . 
    . list r1 r2 rs1 rs2 in 1/5
    
         +-----------------------------------------------+
         |        r1          r2         rs1         rs2 |
         |-----------------------------------------------|
      1. | -7.28e-12           0   -4.49e-08           0 |
      2. | -2558.585   -2558.585   -.9817648   -.9817648 |
      3. | -2209.484   -2209.484   -.8440055   -.8440055 |
      4. |   -1178.5     -1178.5   -.4613964   -.4613964 |
      5. |  355.5243    355.5243    .1375527    .1375527 |
         +-----------------------------------------------+
    
    .



    • #3
      #1 is reproducible in Stata 17.


      Code:
      . sysuse auto, clear
      (1978 automobile data)
      
      . gen D=0
      
      . replace D=1 in 1
      (1 real change made)
      
      . reg price mpg headroom D
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(3, 70)        =      7.12
             Model |   148557366         3  49519122.1   Prob > F        =    0.0003
          Residual |   486508030        70  6950114.71   R-squared       =    0.2339
      -------------+----------------------------------   Adj R-squared   =    0.2011
             Total |   635065396        73  8699525.97   Root MSE        =    2636.3
      
      ------------------------------------------------------------------------------
             price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               mpg |  -259.8201   58.59083    -4.43   0.000    -376.6758   -142.9644
          headroom |  -355.7493   401.5929    -0.89   0.379    -1156.701    445.2025
                 D |  -2087.359   2660.911    -0.78   0.435    -7394.378     3219.66
             _cons |   12791.77   2084.729     6.14   0.000     8633.913    16949.64
      ------------------------------------------------------------------------------
      
      . predict r1, res 
      
      . predict rs1, rstandard
      
      . reg price mpg headroom i.D
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(3, 70)        =      7.12
             Model |   148557366         3  49519122.1   Prob > F        =    0.0003
          Residual |   486508030        70  6950114.71   R-squared       =    0.2339
      -------------+----------------------------------   Adj R-squared   =    0.2011
             Total |   635065396        73  8699525.97   Root MSE        =    2636.3
      
      ------------------------------------------------------------------------------
             price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               mpg |  -259.8201   58.59083    -4.43   0.000    -376.6758   -142.9644
          headroom |  -355.7493   401.5929    -0.89   0.379    -1156.701    445.2025
               1.D |  -2087.359   2660.911    -0.78   0.435    -7394.378     3219.66
             _cons |   12791.77   2084.729     6.14   0.000     8633.913    16949.64
      ------------------------------------------------------------------------------
      
      . predict r2, res 
      
      . predict rs2, rstandard
      (1 missing value generated)
      
      . list r1 r2 rs1 rs2 in 1/5
      
           +-----------------------------------------------+
           |        r1          r2         rs1         rs2 |
           |-----------------------------------------------|
        1. | -3.64e-12   -1.82e-12   -3.78e-08           . |
        2. | -2558.585   -2558.585   -.9817648   -.9817648 |
        3. | -2209.484   -2209.484   -.8440055   -.8440055 |
        4. |   -1178.5     -1178.5   -.4613964   -.4613964 |
        5. |  355.5243    355.5243    .1375527    .1375527 |
           +-----------------------------------------------+
      This is evidently a side effect of the value 1 being (literally) unique: set two or more observations to 1 and the missing value no longer appears.
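
      As a quick check of that observation, a minimal sketch could set the dummy to 1 in two observations and count the missing standardized residuals. The variable names D2 and rs3 are made up for illustration, and whether other releases behave the same way is an assumption:

      Code:
      sysuse auto, clear
      * dummy that is 1 for two observations instead of one
      gen byte D2 = 0
      replace D2 = 1 in 1/2
      regress price mpg headroom i.D2
      predict rs3, rstandard
      * number of missing standardized residuals
      count if missing(rs3)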



      • #4
        This is probably a bug that they tried to fix in Stata 17 without quite succeeding. You might want to write to StataCorp technical support.

        According to the Methods and Formulas of regress postestimation, the statistic that you are calculating is

        stdEj = Ej/ (s*sqrt(1-Hj)), where Ej is the usual residual, Hj is the diagonal element of the hat matrix, and s is the standard error of the regression (RMSE).

        For the singleton dummy, my Stata 15.1 gives me Hj = 1, which seems right: the observation flagged by a singleton dummy is fitted exactly and so has the highest possible leverage.

        Since 1 - 1 = 0 and that 0 is in the denominator, my Stata 15.1 should give me a missing standardized residual, but she does not; she gives me a value of 0.

        Fast forward to Stata 17: when you tell Stata explicitly that this is a dummy, she calculates the correct missing value. However, if you do not alert Stata that this is a singleton dummy, she still calculates the incorrect value of 0.

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . gen D=0
        
        . replace D=1 in 1
        (1 real change made)
        
        . qui reg price mpg headroom D
        
        . predict r1, res 
        
        . predict rs1, rstandard
        
        . predict hat, hat
        
        . qui reg price mpg headroom i.D
        
        . predict r2, res 
        
        . predict rs2, rstandard
        
        . predict hat2, hat
        
        . list r1 r2 hat rs1 rs2 hat2 in 1/5
        
             +---------------------------------------------------------------------+
             |        r1          r2        hat         rs1         rs2       hat2 |
             |---------------------------------------------------------------------|
          1. | -7.28e-12           0          1   -4.49e-08           0          1 |
          2. | -2558.585   -2558.585   .0227791   -.9817648   -.9817648   .0227791 |
          3. | -2209.484   -2209.484   .0139493   -.8440055   -.8440055   .0139493 |
          4. |   -1178.5     -1178.5   .0613164   -.4613964   -.4613964   .0613164 |
          5. |  355.5243    355.5243   .0388123    .1375527    .1375527   .0388123 |
             +---------------------------------------------------------------------+
        
        .


        So clearly Stata 15.1 does not use the formula displayed in the Methods and Formulas section: according to that formula she would divide by 0, and yet she gets an outcome equal to 0.
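
        To look at the numerator and denominator of that formula separately, a minimal sketch could store the residual and the leverage in double precision. The names rd and hd are hypothetical; predict stores its results as float by default, which can hide small deviations of Hj from 1:

        Code:
        sysuse auto, clear
        gen byte D = 0
        replace D = 1 in 1
        quietly regress price mpg headroom D
        * store residual and leverage as doubles rather than floats
        predict double rd, residuals
        predict double hd, hat
        * numerator Ej and the quantity 1 - Hj for the singleton observation
        display %21.0g rd[1]
        display %21.0g 1 - hd[1]
        * the formula's denominator; sqrt() of a negative number is missing
        display e(rmse)*sqrt(1 - hd[1])

        If 1 - hd[1] turns out to be exactly 0 or slightly negative, the formula yields a missing value; if it is a tiny positive number, dividing a tiny residual by the resulting denominator can give a value close to 0 instead.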





        • #5
          I'm running Stata 16.1 and have the same result as post #3.
          Since the dummy is non-zero for only one observation, I would expect a perfect prediction for that observation, and hence a residual and standardized residual equal to 0 (as in post #2). But in my results (and those of post #3):
          1. The residual is (almost) zero, but the standardized residual is missing when i.D is used.
          2. The standardized residual differs (almost zero vs. missing) depending on whether the i. prefix is used for the dummy, whereas I would expect the results to be identical.
          I'm trying to understand what's happening, but ...



          • #6
            When I manually calculate the standardized residuals according to the formula displayed in the Methods and Formulas, they all match the ones produced by predict, except for the residual in question, which is missing in my calculation but not in the one from predict. Here:

            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . gen D=0
            
            . replace D=1 in 1
            (1 real change made)
            
            . reg price mpg headroom D
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(3, 70)        =      7.12
                   Model |   148557366         3  49519122.1   Prob > F        =    0.0003
                Residual |   486508030        70  6950114.71   R-squared       =    0.2339
            -------------+----------------------------------   Adj R-squared   =    0.2011
                   Total |   635065396        73  8699525.97   Root MSE        =    2636.3
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -259.8201   58.59083    -4.43   0.000    -376.6758   -142.9644
                headroom |  -355.7493   401.5929    -0.89   0.379    -1156.701    445.2025
                       D |  -2087.359   2660.911    -0.78   0.435    -7394.378     3219.66
                   _cons |   12791.77   2084.729     6.14   0.000     8633.913    16949.64
            ------------------------------------------------------------------------------
            
            . predict r1, res 
            
            . predict rs1, rstandard
            
            . predict hat, hat
            
            . gen myrs1 = r1/(e(rmse)*sqrt(1-hat))
            (1 missing value generated)
            
            . list D r1 rs1 hat myrs1 in 1/5
            
                 +--------------------------------------------------+
                 | D          r1         rs1        hat       myrs1 |
                 |--------------------------------------------------|
              1. | 1   -7.28e-12   -4.49e-08          1           . |
              2. | 0   -2558.585   -.9817648   .0227791   -.9817648 |
              3. | 0   -2209.484   -.8440055   .0139493   -.8440056 |
              4. | 0     -1178.5   -.4613964   .0613164   -.4613964 |
              5. | 0    355.5243    .1375527   .0388123    .1375527 |
                 +--------------------------------------------------+
            
            .



            • #7
              Strange ... the two regressions (i.e. using D and using i.D) generate identical hat values (1 for the first observation), so in the end the standardized residuals should be equal too, no?

              Code:
              sysuse auto, clear
              gen D=0
              replace D=1 in 1
              reg price mpg headroom D
              predict hat1, hat
              predict rs1, rstandard
              reg price mpg headroom i.D
              predict hat2, hat
              predict rs2, rstandard
              list D hat* rs* in 1/5


              Code:
              . list D hat* rs* in 1/5
              
                   +-------------------------------------------------+
                   | D       hat1       hat2         rs1         rs2 |
                   |-------------------------------------------------|
                1. | 1          1          1   -3.78e-08           . |
                2. | 0   .0227791   .0227791   -.9817648   -.9817648 |
                3. | 0   .0139493   .0139493   -.8440055   -.8440055 |
                4. | 0   .0613164   .0613164   -.4613964   -.4613964 |
                5. | 0   .0388123   .0388123    .1375527    .1375527 |
                   +-------------------------------------------------+
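
              The listed hat values are stored as float and rounded for display, so one hedged way to check whether the two fits really give the same leverage is to store them as doubles and compare them. The names hd1 and hd2 are hypothetical:

              Code:
              sysuse auto, clear
              gen byte D = 0
              replace D = 1 in 1
              quietly regress price mpg headroom D
              predict double hd1, hat
              quietly regress price mpg headroom i.D
              predict double hd2, hat
              * report how often the two variables differ and by how much
              compare hd1 hd2
              * full-precision view of the singleton observation
              format hd1 hd2 %21.0g
              list D hd1 hd2 in 1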
