  • estat classification after logit

    There have been some problems getting correct/usable results from -estat classification-. I guess that in most cases users want an outcome similar to that of -diagt- (use findit diagt to install):
    Code:
    . webuse lbw
    (Hosmer & Lemeshow data)
    
    . diagt low smoke
    
               |     smoked during
    birthweigh |       pregnancy
       t<2500g |      Pos.       Neg. |     Total
    -----------+----------------------+----------
      Abnormal |        30         29 |        59
        Normal |        44         86 |       130
    -----------+----------------------+----------
         Total |        74        115 |       189
    True abnormal diagnosis defined as low = 1
    
    
                                                      [95% Confidence Interval]
    ---------------------------------------------------------------------------
    Prevalence                         Pr(A)     31.2%     24.7%      38.3%
    ---------------------------------------------------------------------------
    Sensitivity                      Pr(+|A)     50.8%     37.5%     64.1%
    Specificity                      Pr(-|N)     66.2%     57.3%     74.2%
    ROC area               (Sens. + Spec.)/2      0.59      0.51      0.66
    ---------------------------------------------------------------------------
    Likelihood ratio (+)     Pr(+|A)/Pr(+|N)      1.50      1.06      2.13
    Likelihood ratio (-)     Pr(-|A)/Pr(-|N)      0.74      0.56      0.99
    Odds ratio                   LR(+)/LR(-)      2.02      1.08      3.77
    Positive predictive value        Pr(A|+)     40.5%     29.3%     52.6%
    Negative predictive value        Pr(N|-)     74.8%     65.8%     82.4%
    ---------------------------------------------------------------------------
    The default cutoff (0.5) returns something of little value for most users, since every observation ends up classified as negative:
    Code:
    . quietly logit low i.smoke, or
    
    . estat classification
    
    Logistic model for low
    
                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |         0             0  |          0
         -     |        59           130  |        189
    -----------+--------------------------+-----------
       Total   |        59           130  |        189
    
    Classified + if predicted Pr(D) >= .5
    True D defined as low != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)    0.00%
    Specificity                     Pr( -|~D)  100.00%
    Positive predictive value       Pr( D| +)       .%
    Negative predictive value       Pr(~D| -)   68.78%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)    0.00%
    False - rate for true D         Pr( -| D)  100.00%
    False + rate for classified +   Pr(~D| +)       .%
    False - rate for classified -   Pr( D| -)   31.22%
    --------------------------------------------------
    Correctly classified                        68.78%
    --------------------------------------------------
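    A minimal sketch of why the + row is empty, assuming the model is refitted with smoke as a plain 0/1 covariate; phat is just a variable name I made up for illustration. With a single binary covariate the model has only two distinct predicted probabilities, and a cutoff above both of them classifies everybody as negative:

    ```stata
    * Refit the model and store the predicted Pr(low = 1)
    webuse lbw, clear
    quietly logit low smoke
    predict double phat, pr

    * One predicted probability per smoking group
    tabstat phat, by(smoke) statistics(mean) format(%6.4f)
    ```

    The two group means should match the observed proportions in the -diagt- table, 29/115 (about 0.2522) for non-smokers and 30/74 (about 0.4054) for smokers, both below 0.5.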
    What most users would expect (in my opinion) is to have the exponentiated regression constant as the cutoff:
    Code:
    . estat classification, cutoff(`=exp(_b[_cons])')
    
    Logistic model for low
    
                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |        30            44  |         74
         -     |        29            86  |        115
    -----------+--------------------------+-----------
       Total   |        59           130  |        189
    
    Classified + if predicted Pr(D) >= .3372093
    True D defined as low != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   50.85%
    Specificity                     Pr( -|~D)   66.15%
    Positive predictive value       Pr( D| +)   40.54%
    Negative predictive value       Pr(~D| -)   74.78%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)   33.85%
    False - rate for true D         Pr( -| D)   49.15%
    False + rate for classified +   Pr(~D| +)   59.46%
    False - rate for classified -   Pr( D| -)   25.22%
    --------------------------------------------------
    Correctly classified                        61.38%
    --------------------------------------------------
    This reproduces the main values of the -diagt- output.
    Last edited by Niels Henrik Bruun; 20 Jan 2021, 07:06.
    Kind regards

    nhb

  • #2
    Post #1 needs a clarifying comment.

    All cutoff values between this lower bound (I have used round this way to get a ceiling value of invlogit(_b[_cons]) at four decimals):
    Code:
    `=round(invlogit(_b[_cons]), 0.0001) + 0.0001'
    and this upper bound (I have used round this way to get a floor value of invlogit(_b[_cons] + _b[smoke]) at four decimals):
    Code:
    `=round(invlogit(_b[_cons] + _b[smoke]), 0.0001) - 0.0001'
    will give the correct table.
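
    A rough sketch of how the two limits could be checked, under the assumption that the model is refitted with smoke as a plain 0/1 covariate so that _b[smoke] applies as written (with i.smoke as in #1 the coefficient would normally be addressed as _b[1.smoke]). The local macro names lo and hi are mine, not part of any command:

    ```stata
    webuse lbw, clear
    quietly logit low smoke

    * Lower and upper limits as defined in this post
    local lo = round(invlogit(_b[_cons]), 0.0001) + 0.0001
    local hi = round(invlogit(_b[_cons] + _b[smoke]), 0.0001) - 0.0001
    display "lower limit: " `lo' "   upper limit: " `hi'

    * Classification table at each end of the range
    estat classification, cutoff(`lo')
    estat classification, cutoff(`hi')
    ```

    Both calls should return the same 2x2 table as the second table in #1, since any cutoff in this range classifies smokers as + and non-smokers as -.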

    As long as we are looking at relatively rare events, the cutoff used in #1:
    Code:
    `=exp(_b[_cons])'
    will lie between the two above limits.
    However, this value is the estimated odds of low birthweight for non-smokers, i.e. it is not a probability.
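
    To see the difference between the odds and the probabilities, here is a small sketch, again assuming smoke is entered as a plain 0/1 covariate:

    ```stata
    webuse lbw, clear
    quietly logit low smoke

    * Odds versus probability for non-smokers, and probability for smokers
    display "odds, non-smokers:        " %6.4f exp(_b[_cons])
    display "probability, non-smokers: " %6.4f invlogit(_b[_cons])
    display "probability, smokers:     " %6.4f invlogit(_b[_cons] + _b[smoke])
    ```

    From the -diagt- table these should be 29/86 (about 0.3372), 29/115 (about 0.2522) and 30/74 (about 0.4054), so in this example the odds happen to fall between the two predicted probabilities.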

    The lower bound above will always give the correct/wanted table as long as it is below the upper bound above.

    Sorry for not being precise.
    Kind regards

    nhb
