  • Deviance residuals after logit

    Why do the deviance residuals produced by -predict- (with the deviance option) after -logit- differ from the deviance residuals I calculate following the formula in the [R] Base Reference Manual (p. 11 or 1273)?

    See the following example:
    Code:
    sysuse auto
    logit foreign mpg
    predict pr, pr
    predict dev, dev
    
    * Calculate the deviance residuals "manually" according to the formula given in the manual:
    gen d = -sqrt(2*abs(ln(1-pr))) if foreign==0
    replace d = sqrt(2*abs(ln(pr))) if foreign==1
    
    * If everything is correct, the sum of the squared deviance residuals should equal -2*e(ll):
    gen d2 = d^2
    qui sum d2
    di "Deviance = " r(sum) " ~= " -2*e(ll)
    
    * However, the deviance residuals produced by -predict dev, dev- are different and (seemingly?) wrong:
    gen dev2 = dev^2
    qui sum dev2
    di "Deviance = " r(sum) " ~= " -2*e(ll)
    The deviance residuals calculated "manually" seem to be OK, whereas those produced by -predict- do not. Any ideas?
    Last edited by Dirk Enzmann; 12 Apr 2016, 13:57.

  • #2
    In the Methods and formulas section (p. 11 of the manual), you have

    Let j index observations. Define Mj for each observation as the total number of observations sharing j’s covariate pattern. Define Yj as the total number of positive responses among observations sharing j’s covariate pattern.

    What is immediately apparent to me is that in your manual calculations you are imposing that Mj (the total number of observations sharing j's covariate pattern) is equal to one, which is not correct. Even for a continuous variable such as mpg, observations share j's covariate pattern whenever they have the same mpg value. We can take an example (a more general sketch follows the output below):

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . logit foreign mpg
    
    Iteration 0:   log likelihood =  -45.03321  
    Iteration 1:   log likelihood = -39.380959  
    Iteration 2:   log likelihood = -39.288802  
    Iteration 3:   log likelihood =  -39.28864  
    Iteration 4:   log likelihood =  -39.28864  
    
    Logistic regression                             Number of obs     =         74
                                                    LR chi2(1)        =      11.49
                                                    Prob > chi2       =     0.0007
    Log likelihood =  -39.28864                     Pseudo R2         =     0.1276
    
    ------------------------------------------------------------------------------
         foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |   .1597621   .0525876     3.04   0.002     .0566922     .262832
           _cons |  -4.378866   1.211295    -3.62   0.000    -6.752961   -2.004771
    ------------------------------------------------------------------------------
    
    . predict pr, pr
    
    . predict dev, dev
    
    . list mpg foreign pr dev in 1
    
         +---------------------------------------+
         | mpg    foreign         pr         dev |
         |---------------------------------------|
      1. |  22   Domestic   .2964837   -1.875271 |
         +---------------------------------------+
    
    . count if mpg==22
      5
    
    . scalar d1= -sqrt(5*2*abs(ln(1-.2964837)))
    
    . di d1
    -1.8752713

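     More generally, if I read the manual's formula correctly, you can reproduce what -predict, dev- returns for every observation from Mj and Yj. Here is a minimal sketch (M_j, Y_j, h_j, d_j and diff are my own variable names, not anything Stata creates):

     Code:
     * Sketch: rebuild the deviance residuals from the covariate-pattern formula
     sysuse auto, clear
     qui logit foreign mpg
     predict double pr, pr
     predict double dev, dev
     bysort mpg: gen M_j = _N               // observations sharing the covariate pattern
     by mpg: egen Y_j = total(foreign)      // positive responses in that pattern
     gen double h_j = cond(Y_j==0, 0, Y_j*ln(Y_j/(M_j*pr))) + cond(Y_j==M_j, 0, (M_j-Y_j)*ln((M_j-Y_j)/(M_j*(1-pr))))
     gen double d_j = sign(Y_j - M_j*pr)*sqrt(abs(2*h_j))
     gen double diff = abs(d_j - dev)
     sum diff                               // should be essentially zero throughout
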
     Now, the log likelihood of the saturated model will be equal to zero ("perfect fit") if the saturated model has as many parameters as there are observations, and in that case you can reduce the deviance statistic to -2*e(ll), as you do above. However, that condition is not satisfied here, and the deviance statistic is therefore given by 2*(e(lls) - e(ll)), where e(lls) is the maximized log likelihood of the saturated model, defined as the largest model we can fit, the one that predicts the outcome of interest as closely as we can achieve.

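     To make e(lls) concrete: in this setting the saturated model fits one probability per covariate pattern, namely Yj/Mj, so its log likelihood can be computed directly. A minimal sketch along those lines (again with my own variable names):

     Code:
     * Sketch: saturated log likelihood over covariate patterns, and the resulting deviance
     sysuse auto, clear
     qui logit foreign mpg
     bysort mpg: gen M_j = _N
     by mpg: egen Y_j = total(foreign)
     gen double ll_j = cond(Y_j==0, 0, Y_j*ln(Y_j/M_j)) + cond(Y_j==M_j, 0, (M_j-Y_j)*ln(1 - Y_j/M_j))
     by mpg: gen byte tag = (_n == 1)       // count each covariate pattern once
     qui sum ll_j if tag
     di "lls = " r(sum) "   2*(lls - ll) = " 2*(r(sum) - e(ll))
     This deviance counts each covariate pattern once, so it should match the sum of the squared -predict, dev- residuals taken over patterns (one per pattern) rather than over all 74 observations.
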
     Bellocco and Algeri gave an excellent presentation on these issues at the 2011 Italian Stata Users Group meeting, which can be accessed from the link below:

    HTML Code:
    http://www.stata.com/meeting/italy11/abstracts/italy11_bellocco.pdf



    • #3
      Thanks a lot!

      If I follow your explanation, I can calculate the deviance as follows:
      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . logit foreign mpg
      
      Iteration 0:   log likelihood =  -45.03321  
      Iteration 1:   log likelihood = -39.380959  
      Iteration 2:   log likelihood = -39.288802  
      Iteration 3:   log likelihood =  -39.28864  
      Iteration 4:   log likelihood =  -39.28864  
      
      Logistic regression                             Number of obs     =         74
                                                      LR chi2(1)        =      11.49
                                                      Prob > chi2       =     0.0007
      Log likelihood =  -39.28864                     Pseudo R2         =     0.1276
      
      ------------------------------------------------------------------------------
           foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               mpg |   .1597621   .0525876     3.04   0.002     .0566922     .262832
             _cons |  -4.378866   1.211295    -3.62   0.000    -6.752961   -2.004771
      ------------------------------------------------------------------------------
      
      . predict dev, dev
      
      . * Using the deviance residuals produced by -predict dev- I can calculate D as:
      . qui tabulate dev, matrow(cdev)
      
      . mata: st_numscalar("D",colsum(st_matrix("cdev"):^2))
      
      . di "Deviance = " D
      Deviance = 31.447075
      which corresponds to the way the deviance is calculated in Hosmer & Lemeshow (1989, p. 138) [Hosmer, D. W. & Lemeshow, S. (1989). Applied Logistic Regression. New York: Wiley]. (Note that the -tabulate, matrow()- step sums each distinct residual value once, so it relies on every covariate pattern producing a distinct deviance residual, which happens to be the case here.)

      Note that glm (with family binomial and logit link) calculates the deviance the same way as I did in my first example in #1:
      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . glm foreign mpg, family(binomial) link(logit)
      
      Iteration 0:   log likelihood = -39.301251  
      Iteration 1:   log likelihood = -39.288644  
      Iteration 2:   log likelihood =  -39.28864  
      
      Generalized linear models                         No. of obs      =         74
      Optimization     : ML                             Residual df     =         72
                                                        Scale parameter =          1
      Deviance         =  78.57728016                   (1/df) Deviance =   1.091351
      Pearson          =  73.08219269                   (1/df) Pearson  =    1.01503
      
      Variance function: V(u) = u*(1-u)                 [Bernoulli]
      Link function    : g(u) = ln(u/(1-u))             [Logit]
      
                                                        AIC             =   1.115909
      Log likelihood   = -39.28864008                   BIC             =  -231.3154
      
      ------------------------------------------------------------------------------
                   |                 OIM
           foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               mpg |   .1597621   .0525876     3.04   0.002     .0566922     .262832
             _cons |  -4.378865   1.211295    -3.62   0.000     -6.75296    -2.00477
      ------------------------------------------------------------------------------
      
      . predict dev, dev
      
      . gen dev2 = dev^2
      
      . qui sum dev2
      
      . di "Deviance = " r(sum) " = " -2*e(ll)
      Deviance = 78.577279 = 78.57728
      Last edited by Dirk Enzmann; 13 Apr 2016, 15:26.



      • #4
        Correct! The issue here is how you define the saturated model. Under glm (with family binomial and logit link), when calculating the saturated model you impose that its maximized log likelihood is equal to zero. Hosmer & Lemeshow's deviance statistic takes the covariate patterns into account, and if there are far fewer patterns than observations, the maximized log likelihood of the saturated model may be far from zero, so the two statistics will differ. Stata has a predict option that identifies each observation's covariate pattern (a sequential pattern number), which I show below.

        Code:
        . sysuse auto
        (1978 Automobile Data)
        
        . logit foreign mpg
        
        Iteration 0:   log likelihood =  -45.03321  
        Iteration 1:   log likelihood = -39.380959  
        Iteration 2:   log likelihood = -39.288802  
        Iteration 3:   log likelihood =  -39.28864  
        Iteration 4:   log likelihood =  -39.28864  
        
        Logistic regression                             Number of obs     =         74
                                                        LR chi2(1)        =      11.49
                                                        Prob > chi2       =     0.0007
        Log likelihood =  -39.28864                     Pseudo R2         =     0.1276
        
        ------------------------------------------------------------------------------
             foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |   .1597621   .0525876     3.04   0.002     .0566922     .262832
               _cons |  -4.378866   1.211295    -3.62   0.000    -6.752961   -2.004771
        ------------------------------------------------------------------------------
        
        . predict d, de
        
        *\ IDENTIFY EACH OBSERVATION'S COVARIATE PATTERN (SEQUENTIAL PATTERN NUMBER)
        . predict n, n
        
        . generate d2 = (d)^2
        
        . sort n
        
        . quietly by n: generate j = _n
        
        . quietly summarize d2 if j == 1
        
        . return list
        
        *\ r(N) GIVES YOU THE TOTAL NUMBER OF COVARIATE PATTERNS IN THE DATA and r(sum) IS THE DEVIANCE STATISTIC (CODE IS DERIVED FROM BELLOCCO and ALGERI'S PRESENTATION: FIND URL IN #2)
        scalars:
                          r(N) =  21
                      r(sum_w) =  21
                       r(mean) =  1.49747977050997
                        r(Var) =  2.843557137082056
                         r(sd) =  1.686285010631968
                        r(min) =  .0850921124219894
                        r(max) =  6.636515140533447
                        r(sum) =  31.44707518070936
        The two statistics will approach each other when the number of covariate patterns is close to the number of observations. I illustrate this with a randomly generated continuous variable that has unique values for each observation.

        Code:
        . clear
        
        . sysuse auto
        (1978 Automobile Data)
        
        *\ GENERATE VARIABLE WITH UNIQUE VALUES FOR EACH OBSERVATION
        . gen random = rnormal(7.5, 2.5)
        
        . logit foreign random
        
        Iteration 0:   log likelihood =  -45.03321  
        Iteration 1:   log likelihood = -44.970405  
        Iteration 2:   log likelihood = -44.970384  
        Iteration 3:   log likelihood = -44.970384  
        
        Logistic regression                             Number of obs     =         74
                                                        LR chi2(1)        =       0.13
                                                        Prob > chi2       =     0.7230
        Log likelihood = -44.970384                     Pseudo R2         =     0.0014
        
        ------------------------------------------------------------------------------
             foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              random |   .0341981   .0964951     0.35   0.723    -.1549289     .223325
               _cons |  -1.128897    .804122    -1.40   0.160    -2.704947    .4471535
        ------------------------------------------------------------------------------
        
        . predict d, de
        
        . predict n, n
        
        . generate d2 = (d)^2
        
        . sort n
        
        . quietly by n: generate j = _n
        
        . quietly summarize d2 if j == 1
        
        . return list
        
        scalars:
                          r(N) =  74
                      r(sum_w) =  74
                       r(mean) =  1.215415772554037
                        r(Var) =  .6313875009624242
                         r(sd) =  .7945989560541998
                        r(min) =  .6165417432785034
                        r(max) =  2.719594717025757
                        r(sum) =  89.94076716899872
        
        . glm foreign random, family(binomial n) link(logit)
        
        Iteration 0:   log likelihood = -52.668627  
        Iteration 1:   log likelihood = -49.910897  
        Iteration 2:   log likelihood = -49.885571  
        Iteration 3:   log likelihood = -49.885551  
        Iteration 4:   log likelihood = -49.885551  
        
        Generalized linear models                         No. of obs      =         74
        Optimization     : ML                             Residual df     =         72
                                                          Scale parameter =          1
        Deviance         =  92.72457792                   (1/df) Deviance =   1.287841
        Pearson          =  76.63132543                   (1/df) Pearson  =   1.064324
        
        Variance function: V(u) = u*(1-u/n)               [Binomial]
        Link function    : g(u) = ln(u/(n-u))             [Logit]
        
                                                          AIC             =   1.402312
        Log likelihood   = -49.88555133                   BIC             =  -217.1681
        
        ------------------------------------------------------------------------------
                     |                 OIM
             foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              random |  -.2617495   .0975049    -2.68   0.007    -.4528556   -.0706434
               _cons |  -2.569458   .8078877    -3.18   0.001    -4.152889   -.9860274
        ------------------------------------------------------------------------------



        • #5
          I think we had this discussion several years ago. If you really care, you can work your way through this thread:

          http://www.stata.com/statalist/archi.../msg00205.html

          The main conclusion was that logit makes this adjustment for covariate patterns and glm does not.

          See especially

          http://www.stata.com/statalist/archi.../msg00218.html

          http://www.stata.com/statalist/archi.../msg00220.html

          http://www.stata.com/statalist/archi.../msg00231.html

          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Thanks, Richard, for the references! That was a bit before I became interested in this kind of thing...



            • #7
              Thanks to Andrew for the explanation and demonstration and to Richard for the link to the very interesting and helpful 2004 debate!

               If you want to obtain the deviance residuals based on the individual observations (unadjusted for covariate patterns) after logit (or logistic), that is, without going through glm, you can do so with two lines of code:
              Code:
              . sysuse auto
              (1978 Automobile Data)
              
              . qui logit foreign mpg
              
               . * Calculate the deviance residuals based on individual observations, unadjusted for covariate patterns:
              . predict double pr, pr
              . gen double d_ind = sign(`e(depvar)'-pr)*sqrt(abs(2*(`e(depvar)'*ln(pr)+(1-`e(depvar)')*ln(1-pr))))
              
              . * Demonstration that they are identical (up to 5 decimal digits) to the deviance residuals using -glm-:
              . qui glm foreign mpg, family(binomial) link(logit)
              . predict double d_glm, dev
              
              . sum d_ind d_glm
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                     d_ind |         74   -.1150062    1.031016  -1.644577   2.122827
                     d_glm |         74   -.1150062    1.031016  -1.644576   2.122827
              
              . assert round(d_ind,.00001) == round(d_glm,.00001)  // identical up to 5 decimal digits
              
              . assert round(d_ind,.000001) == round(d_glm,.000001)
              3 contradictions in 74 observations
              assertion is false
               Astonishingly, the deviance residuals of glm seem to be less precise (if you trust logit and predict after logit).
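               For what it is worth, the size of those discrepancies can be checked directly (continuing the session above; diff is my own variable name):
               Code:
               * continuing the code above: how large are the logit-vs-glm differences?
               gen double diff = abs(d_ind - d_glm)
               sum diff   // the maximum shows the largest absolute discrepancy
               Differences in the sixth decimal place are of the same order as the difference between the two sets of reported coefficients (e.g. _cons of -4.378866 vs -4.378865), so they may simply reflect where each command's maximizer stops rather than the residual formula itself.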
              Last edited by Dirk Enzmann; 14 Apr 2016, 12:35.

