A new command, -classtabi-, now available for download from SSC

Ariel Linden

Join Date: Apr 2014

Posts: 170
#1

01 Jan 2016, 08:11

Thanks to Kit Baum, a new command called classtabi is now available on SSC.

classtabi is an immediate command that reports various summary statistics, including a 2 x 2 table, for discrete classification data. classtabi is helpful in cases where only summarized data are available. For example, data-mining software generally produces a 2 x 2 classification table (referred to as a confusion matrix) as part of its output. Those values can then be entered into classtabi to produce additional classification statistics (such as ROC area, effect strength for sensitivity, etc.).

As always, I welcome comments and suggestions from users.

Ariel

Adam Ross Nelson

Join Date: Aug 2015
Posts: 37

02 Oct 2017, 13:08

Hello, I am a fan of this command. I was recently attempting to use it with the following approach:

Code:

. tab prUnd isUnd, matcell(confmat1)

 Predicted |
     Under |  Actual Under
           |         0          1 |     Total
-----------+----------------------+----------
         0 |     5,738         70 |     5,808 
         1 |        28         42 |        70 
-----------+----------------------+----------
     Total |     5,766        112 |     5,878 


. classtabi confmat1[1,1] confmat1[1,2] confmat1[2,1] confmat1[2,2]
invalid '2' 
r(198);

The goal was to avoid using local or global macros. The following worked. Any suggestions on how to make this more efficient?

Code:

. tab prUnd isUnd, row matcell(confmat1)

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

 Predicted |
     Under |  Actual Under
           |         0          1 |     Total
-----------+----------------------+----------
         0 |     5,738         70 |     5,808 
           |     98.79       1.21 |    100.00 
-----------+----------------------+----------
         1 |        28         42 |        70 
           |     40.00      60.00 |    100.00 
-----------+----------------------+----------
     Total |     5,766        112 |     5,878 
           |     98.09       1.91 |    100.00 


. local trueneg = confmat1[1,1]

. local falseneg = confmat1[1,2]

. local falspos = confmat1[2,1]

. local truepos = confmat1[2,2]

. classtabi `trueneg' `falseneg' `falspos' `truepos', rowlabel(Predicted) collabel(Actual)

           |        Actual
 Predicted |         0          1 |     Total
-----------+----------------------+----------
         0 |     5,738         70 |     5,808 
         1 |        28         42 |        70 
-----------+----------------------+----------
     Total |     5,766        112 |     5,878 



-------------------------------------------------
Sensitivity                     D/(C+D)   60.00%      
Specificity                     A/(A+B)   98.79%      
Positive predictive value       D/(B+D)   37.50%      
Negative predictive value       A/(A+C)   99.51%      
-------------------------------------------------
False positive rate             B/(A+B)    1.21%      
False negative rate             C/(C+D)   40.00%      
-------------------------------------------------
Correctly classified      A+C/(A+B+C+D)   98.33%      
-------------------------------------------------
Effect strength for sensitivity           58.79%      
-------------------------------------------------
ROC area                                  0.7940      
-------------------------------------------------

Nick Cox

Join Date: Mar 2014
Posts: 35697

02 Oct 2017, 16:10

Adam: You don't give a data example (FAQ Advice #12) but as it happens we can reproduce the relevant part of your data.

There's an easier way to use your matrix elements. Note that, so far as I can see, whatever you do here you would have to type the matrix name four times, so a short name helps.

For the syntax used here see

Code:

help macro
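The syntax in question is the =exp form of macro expansion, which evaluates an expression and substitutes the result into the command line before the command runs. A minimal sketch, assuming a matrix t1 typed in by hand to stand in for what matcell() would leave behind:

```stata
* Stand-in for the matrix that tabulate, matcell() would store
* (values typed by hand for illustration only)
matrix t1 = (5738, 70 \ 28, 42)

* The expression is evaluated first and its result is substituted
display `=t1[1,1]'

* Four elements expanded inline, as they would be passed to a command
display "`=t1[1,1]' `=t1[1,2]' `=t1[2,1]' `=t1[2,2]'"
```

No local or global macro needs to be defined beforehand, which is what makes this form convenient for immediate commands.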

Code:

 
. clear

. tabi 5738 70 \ 28 42

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |     5,738         70 |     5,808
         2 |        28         42 |        70
-----------+----------------------+----------
     Total |     5,766        112 |     5,878

           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000

. rename (row col) (prUnd isUnd)

. expand pop
(5,874 observations created)

. tab prUnd isUnd, matcell(t1)

           |         isUnd
     prUnd |         1          2 |     Total
-----------+----------------------+----------
         1 |     5,738         70 |     5,808
         2 |        28         42 |        70
-----------+----------------------+----------
     Total |     5,766        112 |     5,878

. classtabi `=t1[1,1]' `=t1[1,2]' `=t1[2,1]' `=t1[2,2]'

           |          col
       row |         0          1 |     Total
-----------+----------------------+----------
         0 |     5,738         70 |     5,808
         1 |        28         42 |        70
-----------+----------------------+----------
     Total |     5,766        112 |     5,878



-------------------------------------------------
Sensitivity                     D/(C+D)   60.00%
Specificity                     A/(A+B)   98.79%
Positive predictive value       D/(B+D)   37.50%
Negative predictive value       A/(A+C)   99.51%
-------------------------------------------------
False positive rate             B/(A+B)    1.21%
False negative rate             C/(C+D)   40.00%
-------------------------------------------------
Correctly classified      A+C/(A+B+C+D)   98.33%
-------------------------------------------------
Effect strength for sensitivity           58.79%
-------------------------------------------------
ROC area                                  0.7940
-------------------------------------------------

Anders Alexandersson

Join Date: Apr 2014
Posts: 203

03 Oct 2017, 06:55

Ariel Linden, I think I have discovered a bug in classtabi: when I enter 0 for a cell, the table displays 1 instead. Why is that? I am using Stata/MP 15.0 for Windows (64-bit x86-64) on Windows 7.

This is an attempt to reproduce an example from the R package RecordLinkage, using the classification table on page 64 of the published article at https://journal.r-project.org/archiv...riyar+Borg.pdf :

Code:

. classtabi 1172 0 3 46

           |          col
       row |         0          1 |     Total
-----------+----------------------+----------
         0 |     1,172          1 |     1,173
         1 |         3         46 |        49
-----------+----------------------+----------
     Total |     1,175         47 |     1,222



-------------------------------------------------
Sensitivity                     D/(C+D)   93.88%      
Specificity                     A/(A+B)  100.00%      
Positive predictive value       D/(B+D)  100.00%      
Negative predictive value       A/(A+C)   99.74%      
-------------------------------------------------
False positive rate             B/(A+B)    0.00%      
False negative rate             C/(C+D)    6.12%      
-------------------------------------------------
Correctly classified      A+C/(A+B+C+D)   99.75%      
-------------------------------------------------
Effect strength for sensitivity           93.88%      
-------------------------------------------------
ROC area                                  0.9690      
-------------------------------------------------

Edit: The problem above is shown for #b (false positives). The example on page 65, classtabi 1159 13 0 49, with 0 for #c (false negatives), also wrongly displays 1.

Last edited by Anders Alexandersson; 03 Oct 2017, 07:06.

Bruce Weaver

Join Date: May 2014
Posts: 1132

03 Oct 2017, 07:39

Following up on Anders in #4:

Code:

clear
input a b c d
1172 0 3 46
1172 1 3 46
end

generate N    = a+b+c+d
generate sens = d/(c+d)
generate spec = a/(a+b)
generate ppv  = d/(b+d)
generate npv  = a/(a+c)
generate fpr  = b/(a+b)
generate fnr  = c/(c+d)
generate acc  = (a+d)/N
format sens-acc %8.4f

classtabi 1172 0 3 46
list a-N
list sens-acc

Output:

Code:

. classtabi 1172 0 3 46

           |          col
       row |         0          1 |     Total
-----------+----------------------+----------
         0 |     1,172          1 |     1,173
         1 |         3         46 |        49
-----------+----------------------+----------
     Total |     1,175         47 |     1,222



-------------------------------------------------
Sensitivity                     D/(C+D)   93.88%      
Specificity                     A/(A+B)  100.00%      
Positive predictive value       D/(B+D)  100.00%      
Negative predictive value       A/(A+C)   99.74%      
-------------------------------------------------
False positive rate             B/(A+B)    0.00%      
False negative rate             C/(C+D)    6.12%      
-------------------------------------------------
Correctly classified      A+C/(A+B+C+D)   99.75%      
-------------------------------------------------
Effect strength for sensitivity           93.88%      
-------------------------------------------------
ROC area                                  0.9690      
-------------------------------------------------

. list a-N

     +--------------------------+
     |    a   b   c    d      N |
     |--------------------------|
  1. | 1172   0   3   46   1221 |
  2. | 1172   1   3   46   1222 |
     +--------------------------+

. list sens-acc

     +--------------------------------------------------------------+
     |   sens     spec      ppv      npv      fpr      fnr      acc |
     |--------------------------------------------------------------|
  1. | 0.9388   1.0000   1.0000   0.9974   0.0000   0.0612   0.9975 |
  2. | 0.9388   0.9991   0.9787   0.9974   0.0009   0.0612   0.9967 |
     +--------------------------------------------------------------+

The results from classtabi match those in the first row of output, where b=0. So perhaps it's just a problem with displaying the cell counts properly?

Note that the output from classtabi shows classification accuracy = A+C/(A+B+C+D). It should be (A+D)/(A+B+C+D).
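A quick arithmetic check with the counts from #2 (A=5738, B=70, C=28, D=42) suggests that only the label is wrong and that the reported 98.33% is (A+D)/(A+B+C+D). This is a sketch with hypothetical local macro names, not code taken from classtabi itself:

```stata
* Cell counts from the table in #2, held in local macros
local a 5738
local b 70
local c 28
local d 42
local n = `a' + `b' + `c' + `d'

* (A+D)/N, the proportion correctly classified
display %6.2f (100 * (`a' + `d') / `n')

* (A+C)/N, what the printed label suggests
display %6.2f (100 * (`a' + `c') / `n')
```

The first line displays 98.33, which matches the classtabi output; the second displays 98.09, which does not.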

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)

Nick Cox

Join Date: Mar 2014
Posts: 35697

03 Oct 2017, 07:46

Sorry, just noticed that #3 is messed up. Let's try again:

Code:

. clear  

. tabi 5738 70 \ 28 42      

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |     5,738         70 |     5,808
         2 |        28         42 |        70
-----------+----------------------+----------
     Total |     5,766        112 |     5,878

           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000

. rename (row col) (prUnd isUnd)

. tab prUnd isUnd, matcell(t1)  

           |         isUnd
     prUnd |         1          2 |     Total
-----------+----------------------+----------
         1 |         1          1 |         2
         2 |         1          1 |         2
-----------+----------------------+----------
     Total |         2          2 |         4


. classtabi `=t1[1,1]' `=t1[1,2]' `=t1[2,1]' `=t1[2,2]', rowlabel(Predicted) collabel(Actual)

           |        Actual
 Predicted |         0          1 |     Total
-----------+----------------------+----------
         0 |         1          1 |         2
         1 |         1          1 |         2
-----------+----------------------+----------
     Total |         2          2 |         4



-------------------------------------------------
Sensitivity                     D/(C+D)   50.00%      
Specificity                     A/(A+B)   50.00%      
Positive predictive value       D/(B+D)   50.00%      
Negative predictive value       A/(A+C)   50.00%      
-------------------------------------------------
False positive rate             B/(A+B)   50.00%      
False negative rate             C/(C+D)   50.00%      
-------------------------------------------------
Correctly classified      A+C/(A+B+C+D)   50.00%      
-------------------------------------------------
Effect strength for sensitivity            0.00%      
-------------------------------------------------
ROC area                                  0.5000      
-------------------------------------------------

daniel klein

Join Date: Mar 2014
Posts: 3849

03 Oct 2017, 09:50

Originally posted by Bruce Weaver

Following up on Anders in #4:

[...]

So perhaps it's just a problem with displaying the cell counts properly?

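No, it is a bug. classtabi creates a dataset from tabi internally and then expands it, but expand leaves observations with a zero count in place rather than deleting them, so Ariel needs to drop the observations with zero frequency first. Here is the relevant part of what happens inside classtabi: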

Code:

. // create the dataset from contingency table
. tabi 1172 0 \ 3 46 , replace

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |     1,172          0 |     1,172
         2 |         3         46 |        49
-----------+----------------------+----------
     Total |     1,175         46 |     1,221

           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000

.
. // what does it look like
. list

     +------------------+
     | row   col    pop |
     |------------------|
  1. |   1     1   1172 |
  2. |   1     2      0 |
  3. |   2     1      3 |
  4. |   2     2     46 |
     +------------------+

.
. // now expand and tabulate again (this is what -classtabi- does)
. expand pop
(1 zero count ignored; observation not deleted)
(1218 observations created)

. tabulate row col

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |     1,172          1 |     1,173
         2 |         3         46 |        49
-----------+----------------------+----------
     Total |     1,175         47 |     1,222


.
. // this is what should be done
. drop if !pop
(1 observation deleted)

. tabulate row col

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |     1,172          0 |     1,172
         2 |         3         46 |        49
-----------+----------------------+----------
     Total |     1,175         46 |     1,221


.
. // alternatively use frequency weights
. tabi 1172 0 \ 3 46 , replace

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |     1,172          0 |     1,172
         2 |         3         46 |        49
-----------+----------------------+----------
     Total |     1,175         46 |     1,221

           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000

. tabulate row col [ fweight = pop ]

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |     1,172          0 |     1,172
         2 |         3         46 |        49
-----------+----------------------+----------
     Total |     1,175         46 |     1,221

Note that Nick's code in #6 should have used frequency weights in the tab call (or expanded the data in memory first).
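As a sketch of how the frequency-weight route could be combined with matcell(), so that no expand step is needed at all (the matrix name t1 is borrowed from #3, and the sensitivity line is only an illustrative check):

```stata
* Recreate the 2 x 2 table as a four-observation dataset (row, col, pop)
tabi 1172 0 \ 3 46, replace

* Tabulate with frequency weights and store the cell counts in t1
tabulate row col [fweight = pop], matcell(t1)
matrix list t1

* Sensitivity D/(C+D) computed straight from the stored matrix
display %6.4f (t1[2,2] / (t1[2,1] + t1[2,2]))
```

With these counts the zero cell stays zero in t1, and the last line shows 0.9388, in line with the sens value listed in #5. This relies on every row and column having at least one nonzero cell; a row or column consisting only of zeros would drop out of the tabulation.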

Best
Daniel

Last edited by daniel klein; 03 Oct 2017, 09:52.

Nick Cox

Join Date: Mar 2014
Posts: 35697

03 Oct 2017, 11:03

Daniel's right. Sorry about that.

Code:

. clear  

. tabi 5738 70 \ 28 42      

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |     5,738         70 |     5,808 
         2 |        28         42 |        70 
-----------+----------------------+----------
     Total |     5,766        112 |     5,878 

           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000

. rename (row col) (prUnd isUnd)

. expand pop 
(5,874 observations created)

. tab prUnd isUnd, matcell(t1)  

           |         isUnd
     prUnd |         1          2 |     Total
-----------+----------------------+----------
         1 |     5,738         70 |     5,808 
         2 |        28         42 |        70 
-----------+----------------------+----------
     Total |     5,766        112 |     5,878 


. classtabi `=t1[1,1]' `=t1[1,2]' `=t1[2,1]' `=t1[2,2]', rowlabel(Predicted) collabel(Actual)

           |        Actual
 Predicted |         0          1 |     Total
-----------+----------------------+----------
         0 |     5,738         70 |     5,808 
         1 |        28         42 |        70 
-----------+----------------------+----------
     Total |     5,766        112 |     5,878 



-------------------------------------------------
Sensitivity                     D/(C+D)   60.00%      
Specificity                     A/(A+B)   98.79%      
Positive predictive value       D/(B+D)   37.50%      
Negative predictive value       A/(A+C)   99.51%      
-------------------------------------------------
False positive rate             B/(A+B)    1.21%      
False negative rate             C/(C+D)   40.00%      
-------------------------------------------------
Correctly classified      A+C/(A+B+C+D)   98.33%      
-------------------------------------------------
Effect strength for sensitivity           58.79%      
-------------------------------------------------
ROC area                                  0.7940      
-------------------------------------------------

Bruce Weaver

Join Date: May 2014

Posts: 1132
#9

03 Oct 2017, 12:05

daniel klein, thanks for clarifying (in #7).

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Ariel Linden

Join Date: Apr 2014

Posts: 170
#10

03 Oct 2017, 12:18

Thank you all for identifying the bug in -classtabi-, as well as for the code for passing values from a matrix. I'll revise the command and let you know when it's up on SSC.

Ariel
Ariel Linden

Join Date: Apr 2014

Posts: 170
#11

09 Oct 2017, 13:26

Hi All,

A revised version of -classtabi- is now available for download from SSC (type: ssc install classtabi, replace).

I would like to acknowledge those on this list who contributed to this revision: Adam Ross Nelson suggested that classtabi accept matrix arguments, and Nicholas J. Cox supplied some elegant code to do just that! Anders Alexandersson found a bug that occurs when the user enters a zero into any cell, and Daniel Klein found the source of the bug and provided a simple fix. Bruce Weaver noted an error in the output label for "Correctly classified".

Thank you all for your support!

Ariel