estat classification after logit

Niels Henrik Bruun

Join Date: Aug 2014
Posts: 555

20 Jan 2021, 07:03

There have been some problems getting correct/usable results from -estat classification-, e.g.:

I guess that in most cases users want output similar to that of -diagt- (use findit diagt to install):

Code:

. webuse lbw
(Hosmer & Lemeshow data)

. diagt low smoke

           |     smoked during
birthweigh |       pregnancy
   t<2500g |      Pos.       Neg. |     Total
-----------+----------------------+----------
  Abnormal |        30         29 |        59
    Normal |        44         86 |       130
-----------+----------------------+----------
     Total |        74        115 |       189
True abnormal diagnosis defined as low = 1


                                                  [95% Confidence Interval]
---------------------------------------------------------------------------
Prevalence                         Pr(A)     31.2%     24.7%      38.3%
---------------------------------------------------------------------------
Sensitivity                      Pr(+|A)     50.8%     37.5%     64.1%
Specificity                      Pr(-|N)     66.2%     57.3%     74.2%
ROC area               (Sens. + Spec.)/2      0.59      0.51      0.66
---------------------------------------------------------------------------
Likelihood ratio (+)     Pr(+|A)/Pr(+|N)      1.50      1.06      2.13
Likelihood ratio (-)     Pr(-|A)/Pr(-|N)      0.74      0.56      0.99
Odds ratio                   LR(+)/LR(-)      2.02      1.08      3.77
Positive predictive value        Pr(A|+)     40.5%     29.3%     52.6%
Negative predictive value        Pr(N|-)     74.8%     65.8%     82.4%
---------------------------------------------------------------------------

With the default cutoff (0.5), -estat classification- classifies no observation as positive, which is of little value for most users:

Code:

. quietly logit low i.smoke, or

. estat classification

Logistic model for low

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         0             0  |          0
     -     |        59           130  |        189
-----------+--------------------------+-----------
   Total   |        59           130  |        189

Classified + if predicted Pr(D) >= .5
True D defined as low != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)    0.00%
Specificity                     Pr( -|~D)  100.00%
Positive predictive value       Pr( D| +)       .%
Negative predictive value       Pr(~D| -)   68.78%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    0.00%
False - rate for true D         Pr( -| D)  100.00%
False + rate for classified +   Pr(~D| +)       .%
False - rate for classified -   Pr( D| -)   31.22%
--------------------------------------------------
Correctly classified                        68.78%
--------------------------------------------------
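
A sketch of why this happens, assuming the same lbw data and model as above (the variable name phat is my own choice): with a single binary predictor the model can only produce two distinct fitted probabilities, one for smokers and one for non-smokers. From the -diagt- table these should be about 30/74 = 0.41 and 29/115 = 0.25, so neither reaches 0.5 and nobody is classified as positive.

Code:

webuse lbw, clear
quietly logit low i.smoke
* fitted Pr(low = 1); pr is the default statistic after logit
predict double phat, pr
* only two distinct values of phat are expected, one per smoking group
tabulate phat smoke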

What most users would expect (in my opinion) is to have the exponentiated regression constant as the cutoff:

Code:

. estat classification, cutoff(`=exp(_b[_cons])')

Logistic model for low

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        30            44  |         74
     -     |        29            86  |        115
-----------+--------------------------+-----------
   Total   |        59           130  |        189

Classified + if predicted Pr(D) >= .3372093
True D defined as low != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   50.85%
Specificity                     Pr( -|~D)   66.15%
Positive predictive value       Pr( D| +)   40.54%
Negative predictive value       Pr(~D| -)   74.78%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   33.85%
False - rate for true D         Pr( -| D)   49.15%
False + rate for classified +   Pr(~D| +)   59.46%
False - rate for classified -   Pr( D| -)   25.22%
--------------------------------------------------
Correctly classified                        61.38%
--------------------------------------------------

This reproduces the main values of the -diagt- output.

Last edited by Niels Henrik Bruun; 20 Jan 2021, 07:06.

Kind regards

nhb

Niels Henrik Bruun

Join Date: Aug 2014

Posts: 555
#2

01 Jun 2021, 03:58

Post #1 needs a clarifying comment.

All cutoff values between (I have used round this way to get a value just above invlogit(_b[_cons])):

Code:

`=round(invlogit(_b[_cons]), 0.0001) + 0.0001'

and (I have used round this way to get a value just below invlogit(_b[_cons] + _b[1.smoke]); the coefficient is named 1.smoke because the model in #1 uses i.smoke):

Code:

`=round(invlogit(_b[_cons] + _b[1.smoke]), 0.0001) - 0.0001'

will give the correct table.

As long as we are looking at relatively rare events, the cutoff used in #1:

Code:

`=exp(_b[_cons])'

will lie between the two limits above.
However, this value is the estimated odds of low birthweight for non-smokers, i.e. not a probability.
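
A quick check of the three quantities, as a sketch under the model from #1 (the coefficient name 1.smoke assumes the model was fitted with i.smoke). From the table in #1 the values should be close to 29/115 = 0.2522, 29/86 = 0.3372 and 30/74 = 0.4054:

Code:

webuse lbw, clear
quietly logit low i.smoke
* fitted risk for non-smokers (the lower limit before rounding)
display invlogit(_b[_cons])
* odds for non-smokers (the cutoff used in #1)
display exp(_b[_cons])
* fitted risk for smokers (the upper limit before rounding)
display invlogit(_b[_cons] + _b[1.smoke])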

The lower limit above will always give the correct/wanted table as long as it is below the upper limit above, i.e. as long as the estimated risk is higher for smokers than for non-smokers.
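
One way to avoid the odds altogether, offered only as an illustration and not as the one right choice, is to take the midpoint of the two fitted probabilities as the cutoff (the local macro name cut is hypothetical, and the coefficient name 1.smoke assumes the model was fitted with i.smoke):

Code:

webuse lbw, clear
quietly logit low i.smoke
* midpoint between the fitted risks for non-smokers and smokers
local cut = (invlogit(_b[_cons]) + invlogit(_b[_cons] + _b[1.smoke])) / 2
estat classification, cutoff(`cut')

Any other value strictly between the two fitted probabilities should give the same table.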

Sorry for not being precise.

Kind regards

nhb