  • Deviance residuals after logit

    Why do the deviance residuals produced by -predict- (with the deviance option) after -logit- differ from the deviance residuals I calculate following the formula in the [R] Base Reference Manual (p. 11 or 1273)?

    See the following example:
    Code:
    sysuse auto
    logit foreign mpg
    predict pr, pr
    predict dev, dev
    
    * Calculate the deviance residuals "manually" according to the formula given in the manual:
    gen d = -sqrt(2*abs(ln(1-pr))) if foreign==0
    replace d = sqrt(2*abs(ln(pr))) if foreign==1
    
    * If everything is correct, the sum of the squared deviance residuals should equal -2*e(ll):
    gen d2 = d^2
    qui sum d2
    di "Deviance = " r(sum) " ~= " -2*e(ll)
    
    * However, the deviance residuals produced by -predict dev, dev- are different and (seemingly?) wrong:
    gen dev2 = dev^2
    qui sum dev2
    di "Deviance = " r(sum) " ~= " -2*e(ll)
    The deviance residuals calculated "manually" seem to be OK, whereas those produced by -predict- do not. Any ideas?
    Last edited by Dirk Enzmann; 12 Apr 2016, 13:57.

  • #2
    In the Methods and formulas section (p. 11 of the manual), you have

    Let j index observations. Define Mj for each observation as the total number of observations sharing j’s covariate pattern. Define Yj as the total number of positive responses among observations sharing j’s covariate pattern.

    What is immediately apparent to me is that in your manual calculations you are imposing that Mj (the total number of observations sharing j's covariate pattern) is equal to one, which is not correct. Even for a continuous variable such as mpg, observations share j's covariate pattern whenever they have the same mpg value. We can take an example (a more general sketch follows the output below):

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . logit foreign mpg
    
    Iteration 0:   log likelihood =  -45.03321  
    Iteration 1:   log likelihood = -39.380959  
    Iteration 2:   log likelihood = -39.288802  
    Iteration 3:   log likelihood =  -39.28864  
    Iteration 4:   log likelihood =  -39.28864  
    
    Logistic regression                             Number of obs     =         74
                                                    LR chi2(1)        =      11.49
                                                    Prob > chi2       =     0.0007
    Log likelihood =  -39.28864                     Pseudo R2         =     0.1276
    
    ------------------------------------------------------------------------------
         foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |   .1597621   .0525876     3.04   0.002     .0566922     .262832
           _cons |  -4.378866   1.211295    -3.62   0.000    -6.752961   -2.004771
    ------------------------------------------------------------------------------
    
    . predict pr, pr
    
    . predict dev, dev
    
    . list mpg foreign pr dev in 1
    
         +---------------------------------------+
         | mpg    foreign         pr         dev |
         |---------------------------------------|
      1. |  22   Domestic   .2964837   -1.875271 |
         +---------------------------------------+
    
    . count if mpg==22
      5
    
    . scalar d1= -sqrt(5*2*abs(ln(1-.2964837)))
    
    . di d1
    -1.8752713

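     More generally, if I read the manual's formula correctly, you can reproduce what -predict, dev- returns for every observation from Mj and Yj. Here is a minimal sketch (M_j, Y_j, h_j, d_j and diff are my own variable names, not anything Stata creates):

     Code:
     * Sketch: rebuild the deviance residuals from the covariate-pattern formula
     sysuse auto, clear
     qui logit foreign mpg
     predict double pr, pr
     predict double dev, dev
     bysort mpg: gen M_j = _N               // observations sharing the covariate pattern
     by mpg: egen Y_j = total(foreign)      // positive responses in that pattern
     gen double h_j = cond(Y_j==0, 0, Y_j*ln(Y_j/(M_j*pr))) + cond(Y_j==M_j, 0, (M_j-Y_j)*ln((M_j-Y_j)/(M_j*(1-pr))))
     gen double d_j = sign(Y_j - M_j*pr)*sqrt(abs(2*h_j))
     gen double diff = abs(d_j - dev)
     sum diff                               // should be essentially zero throughout
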
     Now, the log likelihood of the saturated model will be equal to zero ("perfect fit") if the saturated model has as many parameters as there are observations, and in that case you can reduce the deviance statistic to -2*e(ll), as you do above. However, that condition is not satisfied here, and the deviance statistic is therefore given by 2*(e(lls) - e(ll)), where e(lls) is the maximized log likelihood of the saturated model, defined as the largest model we can fit, the one that predicts the outcome of interest as closely as we can achieve.

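     To make e(lls) concrete: in this setting the saturated model fits one probability per covariate pattern, namely Yj/Mj, so its log likelihood can be computed directly. A minimal sketch along those lines (again with my own variable names):

     Code:
     * Sketch: saturated log likelihood over covariate patterns, and the resulting deviance
     sysuse auto, clear
     qui logit foreign mpg
     bysort mpg: gen M_j = _N
     by mpg: egen Y_j = total(foreign)
     gen double ll_j = cond(Y_j==0, 0, Y_j*ln(Y_j/M_j)) + cond(Y_j==M_j, 0, (M_j-Y_j)*ln(1 - Y_j/M_j))
     by mpg: gen byte tag = (_n == 1)       // count each covariate pattern once
     qui sum ll_j if tag
     di "lls = " r(sum) "   2*(lls - ll) = " 2*(r(sum) - e(ll))
     This deviance counts each covariate pattern once, so it should match the sum of the squared -predict, dev- residuals taken over patterns (one per pattern) rather than over all 74 observations.
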
     Bellocco and Algeri gave an excellent presentation on these issues at the 2011 Italian Stata Users Group meeting, which can be accessed from the link below:

    HTML Code:
    http://www.stata.com/meeting/italy11/abstracts/italy11_bellocco.pdf



    • #3
      Thanks a lot!

      If I follow your explanation, I can calculate the deviance as follows:
      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . logit foreign mpg
      
      Iteration 0:   log likelihood =  -45.03321  
      Iteration 1:   log likelihood = -39.380959  
      Iteration 2:   log likelihood = -39.288802  
      Iteration 3:   log likelihood =  -39.28864  
      Iteration 4:   log likelihood =  -39.28864  
      
      Logistic regression                             Number of obs     =         74
                                                      LR chi2(1)        =      11.49
                                                      Prob > chi2       =     0.0007
      Log likelihood =  -39.28864                     Pseudo R2         =     0.1276
      
      ------------------------------------------------------------------------------
           foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               mpg |   .1597621   .0525876     3.04   0.002     .0566922     .262832
             _cons |  -4.378866   1.211295    -3.62   0.000    -6.752961   -2.004771
      ------------------------------------------------------------------------------
      
      . predict dev, dev
      
      . * Using the deviance residuals produced by -predict dev- I can calculate D as:
      . qui tabulate dev, matrow(cdev)
      
      . mata: st_numscalar("D",colsum(st_matrix("cdev"):^2))
      
      . di "Deviance = " D
      Deviance = 31.447075
      which corresponds to the way the deviance is calculated in Hosmer & Lemeshow (1989, p. 138) [Hosmer, D. W. & Lemeshow, S. (1989). Applied Logistic Regression. New York: Wiley]. (Note that the -tabulate, matrow()- step sums each distinct residual value once, so it relies on every covariate pattern producing a distinct deviance residual, which happens to be the case here.)

      Note that glm (with family binomial and logit link) calculates the deviance the same way as I did in my first example in #1:
      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . glm foreign mpg, family(binomial) link(logit)
      
      Iteration 0:   log likelihood = -39.301251  
      Iteration 1:   log likelihood = -39.288644  
      Iteration 2:   log likelihood =  -39.28864  
      
      Generalized linear models                         No. of obs      =         74
      Optimization     : ML                             Residual df     =         72
                                                        Scale parameter =          1
      Deviance         =  78.57728016                   (1/df) Deviance =   1.091351
      Pearson          =  73.08219269                   (1/df) Pearson  =    1.01503
      
      Variance function: V(u) = u*(1-u)                 [Bernoulli]
      Link function    : g(u) = ln(u/(1-u))             [Logit]
      
                                                        AIC             =   1.115909
      Log likelihood   = -39.28864008                   BIC             =  -231.3154
      
      ------------------------------------------------------------------------------
                   |                 OIM
           foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               mpg |   .1597621   .0525876     3.04   0.002     .0566922     .262832
             _cons |  -4.378865   1.211295    -3.62   0.000     -6.75296    -2.00477
      ------------------------------------------------------------------------------
      
      . predict dev, dev
      
      . gen dev2 = dev^2
      
      . qui sum dev2
      
      . di "Deviance = " r(sum) " = " -2*e(ll)
      Deviance = 78.577279 = 78.57728
      Last edited by Dirk Enzmann; 13 Apr 2016, 15:26.



      • #4
        Correct! The issue here is how you define the saturated model. Under glm (with family binomial and logit link), when calculating the saturated model you impose that its maximized log likelihood is equal to zero. Hosmer & Lemeshow's deviance statistic takes the covariate patterns into account, and if there are far fewer patterns than observations, the maximized log likelihood of the saturated model may be far from zero, so the two statistics will differ. Stata has a predict option that identifies each observation's covariate pattern (a sequential pattern number), which I show below.

        Code:
        . sysuse auto
        (1978 Automobile Data)
        
        . logit foreign mpg
        
        Iteration 0:   log likelihood =  -45.03321  
        Iteration 1:   log likelihood = -39.380959  
        Iteration 2:   log likelihood = -39.288802  
        Iteration 3:   log likelihood =  -39.28864  
        Iteration 4:   log likelihood =  -39.28864  
        
        Logistic regression                             Number of obs     =         74
                                                        LR chi2(1)        =      11.49
                                                        Prob > chi2       =     0.0007
        Log likelihood =  -39.28864                     Pseudo R2         =     0.1276
        
        ------------------------------------------------------------------------------
             foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |   .1597621   .0525876     3.04   0.002     .0566922     .262832
               _cons |  -4.378866   1.211295    -3.62   0.000    -6.752961   -2.004771
        ------------------------------------------------------------------------------
        
        . predict d, de
        
        *\ IDENTIFY EACH OBSERVATION'S COVARIATE PATTERN (SEQUENTIAL PATTERN NUMBER)
        . predict n, n
        
        . generate d2 = (d)^2
        
        . sort n
        
        . quietly by n: generate j = _n
        
        . quietly summarize d2 if j == 1
        
        . return list
        
        *\ r(N) GIVES YOU THE TOTAL NUMBER OF COVARIATE PATTERNS IN THE DATA and r(sum) IS THE DEVIANCE STATISTIC (CODE IS DERIVED FROM BELLOCCO and ALGERI'S PRESENTATION: FIND URL IN #2)
        scalars:
                          r(N) =  21
                      r(sum_w) =  21
                       r(mean) =  1.49747977050997
                        r(Var) =  2.843557137082056
                         r(sd) =  1.686285010631968
                        r(min) =  .0850921124219894
                        r(max) =  6.636515140533447
                        r(sum) =  31.44707518070936
        The two statistics will approach each other when the number of covariate patterns is close to the number of observations. I illustrate this with a randomly generated continuous variable that has unique values for each observation.

        Code:
        . clear
        
        . sysuse auto
        (1978 Automobile Data)
        
        *\ GENERATE VARIABLE WITH UNIQUE VALUES FOR EACH OBSERVATION
        . gen random = rnormal(7.5, 2.5)
        
        . logit foreign random
        
        Iteration 0:   log likelihood =  -45.03321  
        Iteration 1:   log likelihood = -44.970405  
        Iteration 2:   log likelihood = -44.970384  
        Iteration 3:   log likelihood = -44.970384  
        
        Logistic regression                             Number of obs     =         74
                                                        LR chi2(1)        =       0.13
                                                        Prob > chi2       =     0.7230
        Log likelihood = -44.970384                     Pseudo R2         =     0.0014
        
        ------------------------------------------------------------------------------
             foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              random |   .0341981   .0964951     0.35   0.723    -.1549289     .223325
               _cons |  -1.128897    .804122    -1.40   0.160    -2.704947    .4471535
        ------------------------------------------------------------------------------
        
        . predict d, de
        
        . predict n, n
        
        . generate d2 = (d)^2
        
        . sort n
        
        . quietly by n: generate j = _n
        
        . quietly summarize d2 if j == 1
        
        . return list
        
        scalars:
                          r(N) =  74
                      r(sum_w) =  74
                       r(mean) =  1.215415772554037
                        r(Var) =  .6313875009624242
                         r(sd) =  .7945989560541998
                        r(min) =  .6165417432785034
                        r(max) =  2.719594717025757
                        r(sum) =  89.94076716899872
        
        . glm foreign random, family(binomial n) link(logit)
        
        Iteration 0:   log likelihood = -52.668627  
        Iteration 1:   log likelihood = -49.910897  
        Iteration 2:   log likelihood = -49.885571  
        Iteration 3:   log likelihood = -49.885551  
        Iteration 4:   log likelihood = -49.885551  
        
        Generalized linear models                         No. of obs      =         74
        Optimization     : ML                             Residual df     =         72
                                                          Scale parameter =          1
        Deviance         =  92.72457792                   (1/df) Deviance =   1.287841
        Pearson          =  76.63132543                   (1/df) Pearson  =   1.064324
        
        Variance function: V(u) = u*(1-u/n)               [Binomial]
        Link function    : g(u) = ln(u/(n-u))             [Logit]
        
                                                          AIC             =   1.402312
        Log likelihood   = -49.88555133                   BIC             =  -217.1681
        
        ------------------------------------------------------------------------------
                     |                 OIM
             foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              random |  -.2617495   .0975049    -2.68   0.007    -.4528556   -.0706434
               _cons |  -2.569458   .8078877    -3.18   0.001    -4.152889   -.9860274
        ------------------------------------------------------------------------------



        • #5
          I think we had this discussion several years ago. If you really care, you can work your way through this thread:

          http://www.stata.com/statalist/archi.../msg00205.html

          The main conclusion was that logit makes this adjustment for covariate patterns and glm does not.

          See especially

          http://www.stata.com/statalist/archi.../msg00218.html

          http://www.stata.com/statalist/archi.../msg00220.html

          http://www.stata.com/statalist/archi.../msg00231.html

          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Thanks, Richard, for the references! That was a bit before I became interested in this kind of thing...



            • #7
              Thanks to Andrew for the explanation and demonstration and to Richard for the link to the very interesting and helpful 2004 debate!

               If you want to obtain the deviance residuals based on the individual observations (unadjusted for covariate patterns) after logit (or logistic), that is, without going through glm, you can do so with two lines of code:
              Code:
              . sysuse auto
              (1978 Automobile Data)
              
              . qui logit foreign mpg
              
               . * Calculate the deviance residuals based on individual observations, unadjusted for covariate patterns:
              . predict double pr, pr
              . gen double d_ind = sign(`e(depvar)'-pr)*sqrt(abs(2*(`e(depvar)'*ln(pr)+(1-`e(depvar)')*ln(1-pr))))
              
              . * Demonstration that they are identical (up to 5 decimal digits) to the deviance residuals using -glm-:
              . qui glm foreign mpg, family(binomial) link(logit)
              . predict double d_glm, dev
              
              . sum d_ind d_glm
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                     d_ind |         74   -.1150062    1.031016  -1.644577   2.122827
                     d_glm |         74   -.1150062    1.031016  -1.644576   2.122827
              
              . assert round(d_ind,.00001) == round(d_glm,.00001)  // identical up to 5 decimal digits
              
              . assert round(d_ind,.000001) == round(d_glm,.000001)
              3 contradictions in 74 observations
              assertion is false
               Astonishingly, the deviance residuals of glm seem to be less precise (if you trust logit and predict after logit).
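               For what it is worth, the size of those discrepancies can be checked directly (continuing the session above; diff is my own variable name):
               Code:
               * continuing the code above: how large are the logit-vs-glm differences?
               gen double diff = abs(d_ind - d_glm)
               sum diff   // the maximum shows the largest absolute discrepancy
               Differences in the sixth decimal place are of the same order as the difference between the two sets of reported coefficients (e.g. _cons of -4.378866 vs -4.378865), so they may simply reflect where each command's maximizer stops rather than the residual formula itself.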
              Last edited by Dirk Enzmann; 14 Apr 2016, 12:35.

