  • A new command, -classtabi-, now available for download from SSC

    Thanks to Kit Baum, a new command called classtabi is now available on SSC.

    classtabi is an immediate command that reports various summary statistics, including a 2 x 2 table, for discrete classification data. It is helpful when only summarized data are available. For example, data-mining software generally produces a 2 x 2 classification table (referred to as a confusion matrix) as part of its output. Those values can then be entered into classtabi to produce additional classification statistics (such as ROC area, effect strength for sensitivity, etc.).
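
    A minimal sketch of the immediate syntax, using cell counts that appear later in this thread. The argument order (four counts entered row by row) and the rowlabel() and collabel() options are taken from the examples further down; the install line assumes the package is fetched from SSC with internet access.

    ```stata
    * install once from SSC
    ssc install classtabi, replace

    * four cell counts, entered row by row: a b c d
    classtabi 5738 70 28 42, rowlabel(Predicted) collabel(Actual)
    ```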

    As always, I welcome users to share comments and suggestions.

    Ariel

  • #2
    Hello, I am a fan of this command. I was recently attempting to use it with the following approach:


    Code:
    . tab prUnd isUnd, matcell(confmat1)
    
     Predicted |
         Under |  Actual Under
               |         0          1 |     Total
    -----------+----------------------+----------
             0 |     5,738         70 |     5,808 
             1 |        28         42 |        70 
    -----------+----------------------+----------
         Total |     5,766        112 |     5,878 
    
    
    . classtabi confmat1[1,1] confmat1[1,2] confmat1[2,1] confmat1[2,2]
    invalid '2' 
    r(198);
    The goal was to avoid using local or global macros. The approach below worked. Any suggestions on how to make this more efficient?


    Code:
    . tab prUnd isUnd, row matcell(confmat1)
    
    +----------------+
    | Key            |
    |----------------|
    |   frequency    |
    | row percentage |
    +----------------+
    
     Predicted |
         Under |  Actual Under
               |         0          1 |     Total
    -----------+----------------------+----------
             0 |     5,738         70 |     5,808 
               |     98.79       1.21 |    100.00 
    -----------+----------------------+----------
             1 |        28         42 |        70 
               |     40.00      60.00 |    100.00 
    -----------+----------------------+----------
         Total |     5,766        112 |     5,878 
               |     98.09       1.91 |    100.00 
    
    
    . local trueneg = confmat1[1,1]
    
    . local falseneg = confmat1[1,2]
    
    . local falsepos = confmat1[2,1]
    
    . local truepos = confmat1[2,2]
    
    . classtabi `trueneg' `falseneg' `falsepos' `truepos', rowlabel(Predicted) collabel(Actual)
    
               |        Actual
     Predicted |         0          1 |     Total
    -----------+----------------------+----------
             0 |     5,738         70 |     5,808 
             1 |        28         42 |        70 
    -----------+----------------------+----------
         Total |     5,766        112 |     5,878 
    
    
    
    -------------------------------------------------
    Sensitivity                     D/(C+D)   60.00%      
    Specificity                     A/(A+B)   98.79%      
    Positive predictive value       D/(B+D)   37.50%      
    Negative predictive value       A/(A+C)   99.51%      
    -------------------------------------------------
    False positive rate             B/(A+B)    1.21%      
    False negative rate             C/(C+D)   40.00%      
    -------------------------------------------------
    Correctly classified      A+C/(A+B+C+D)   98.33%      
    -------------------------------------------------
    Effect strength for sensitivity           58.79%      
    -------------------------------------------------
    ROC area                                  0.7940      
    -------------------------------------------------


    • #3
      Adam: You don't give a data example (FAQ Advice #12) but as it happens we can reproduce the relevant part of your data.

      There's an easier way to use your matrix elements. So far as I can see, whatever you do here you would have to retype the matrix name four times, so a short name helps.

      For the syntax used here see

      Code:
      help macro

      Code:
       
       . clear   . tabi 5738 70 \ 28 42              |          col        row |         1          2 |     Total -----------+----------------------+----------          1 |     5,738         70 |     5,808           2 |        28         42 |        70  -----------+----------------------+----------      Total |     5,766        112 |     5,878              Fisher's exact =                 0.000    1-sided Fisher's exact =                 0.000  . rename (row col) (prUnd isUnd)  . expand pop  (5,874 observations created)  . tab prUnd isUnd, matcell(t1)             |         isUnd      prUnd |         1          2 |     Total -----------+----------------------+----------          1 |     5,738         70 |     5,808           2 |        28         42 |        70  -----------+----------------------+----------      Total |     5,766        112 |     5,878    .  . classtabi `=t1[1,1]' `=t1[1,2]' `=t1[2,1]' `=t1[2,2]'              |          col        row |         0          1 |     Total -----------+----------------------+----------          0 |     5,738         70 |     5,808           1 |        28         42 |        70  -----------+----------------------+----------      Total |     5,766        112 |     5,878     ------------------------------------------------- Sensitivity                     D/(C+D)   60.00%       Specificity                     A/(A+B)   98.79%       Positive predictive value       D/(B+D)   37.50%       Negative predictive value       A/(A+C)   99.51%       ------------------------------------------------- False positive rate             B/(A+B)    1.21%       False negative rate             C/(C+D)   40.00%       ------------------------------------------------- Correctly classified      A+C/(A+B+C+D)   98.33%       ------------------------------------------------- Effect strength for sensitivity           58.79%       ------------------------------------------------- ROC area                                  0.7940       
-------------------------------------------------
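
      A minimal sketch of why the `=exp' form works where the bare matrix references in #2 did not: classtabi presumably expects literal numbers, so confmat1[1,1] reaches it as unevaluated text, whereas `=exp' makes Stata evaluate the expression and substitute the result before the command runs. The matrix name t1 follows the post above; here it is typed in by hand only to keep the sketch self-contained.

      ```stata
      * build the 2 x 2 matrix directly (stands in for matcell() output)
      matrix t1 = (5738, 70 \ 28, 42)

      * each `=exp' is evaluated first, so display receives plain numbers
      display "`=t1[1,1]' `=t1[1,2]' `=t1[2,1]' `=t1[2,2]'"

      * the expansion can be used wherever a number is expected
      display `=t1[1,1]' + `=t1[2,2]'
      ```

      The same pattern applies to any command that wants numbers typed on the command line, which is why no intermediate local macros are needed.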


      • #4
        Ariel Linden, I think I have discovered a bug in classtabi: when I enter 0 for a cell, the output table shows 1 instead. Why is that? I am using Stata/MP 15.0 for Windows (64-bit x86-64) on Windows 7.

        I was attempting to reproduce an example from the R package RecordLinkage, using the classification table on page 64 of the published article at https://journal.r-project.org/archiv...riyar+Borg.pdf :

        Code:
        . classtabi 1172 0 3 46
        
                   |          col
               row |         0          1 |     Total
        -----------+----------------------+----------
                 0 |     1,172          1 |     1,173
                 1 |         3         46 |        49
        -----------+----------------------+----------
             Total |     1,175         47 |     1,222
        
        
        
        -------------------------------------------------
        Sensitivity                     D/(C+D)   93.88%      
        Specificity                     A/(A+B)  100.00%      
        Positive predictive value       D/(B+D)  100.00%      
        Negative predictive value       A/(A+C)   99.74%      
        -------------------------------------------------
        False positive rate             B/(A+B)    0.00%      
        False negative rate             C/(C+D)    6.12%      
        -------------------------------------------------
        Correctly classified      A+C/(A+B+C+D)   99.75%      
        -------------------------------------------------
        Effect strength for sensitivity           93.88%      
        -------------------------------------------------
        ROC area                                  0.9690      
        -------------------------------------------------
        Edit: The problem above is shown for #b (false positives). The example on page 65, classtabi 1159 13 0 49, with 0 for #c (false negatives), also wrongly displays 1.
        Last edited by Anders Alexandersson; 03 Oct 2017, 08:06.


        • #5
          Following up on Anders in #4:

          Code:
          clear
          input a b c d
          1172 0 3 46
          1172 1 3 46
          end
          
          generate N    = a+b+c+d
          generate sens = d/(c+d)
          generate spec = a/(a+b)
          generate ppv  = d/(b+d)
          generate npv  = a/(a+c)
          generate fpr  = b/(a+b)
          generate fnr  = c/(c+d)
          generate acc  = (a+d)/N
          format sens-acc %8.4f
          
          classtabi 1172 0 3 46
          list a-N
          list sens-acc
          Output:
          Code:
          . classtabi 1172 0 3 46
          
                     |          col
                 row |         0          1 |     Total
          -----------+----------------------+----------
                   0 |     1,172          1 |     1,173
                   1 |         3         46 |        49
          -----------+----------------------+----------
               Total |     1,175         47 |     1,222
          
          
          
          -------------------------------------------------
          Sensitivity                     D/(C+D)   93.88%      
          Specificity                     A/(A+B)  100.00%      
          Positive predictive value       D/(B+D)  100.00%      
          Negative predictive value       A/(A+C)   99.74%      
          -------------------------------------------------
          False positive rate             B/(A+B)    0.00%      
          False negative rate             C/(C+D)    6.12%      
          -------------------------------------------------
          Correctly classified      A+C/(A+B+C+D)   99.75%      
          -------------------------------------------------
          Effect strength for sensitivity           93.88%      
          -------------------------------------------------
          ROC area                                  0.9690      
          -------------------------------------------------
          
          . list a-N
          
               +--------------------------+
               |    a   b   c    d      N |
               |--------------------------|
            1. | 1172   0   3   46   1221 |
            2. | 1172   1   3   46   1222 |
               +--------------------------+
          
          . list sens-acc
          
               +--------------------------------------------------------------+
               |   sens     spec      ppv      npv      fpr      fnr      acc |
               |--------------------------------------------------------------|
            1. | 0.9388   1.0000   1.0000   0.9974   0.0000   0.0612   0.9975 |
            2. | 0.9388   0.9991   0.9787   0.9974   0.0009   0.0612   0.9967 |
               +--------------------------------------------------------------+
          The results from classtabi match those in the first row of output, where b=0. So perhaps it's just a problem with displaying the cell counts properly?

          Note that the output from classtabi shows classification accuracy = A+C/(A+B+C+D). It should be (A+D)/(A+B+C+D).
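
          A quick arithmetic check of that label, as a sketch using the counts from #4 (the local macro names a, b, c, d and N are chosen here to match the cell letters in the output). The value classtabi reports, 99.75%, agrees with (A+D)/N and not with (A+C)/N, so only the label appears to be wrong.

          ```stata
          * cell counts from the example in #4
          local a 1172
          local b 0
          local c 3
          local d 46
          local N = `a' + `b' + `c' + `d'

          * correctly classified: both diagonal cells over the total
          display "(A+D)/N = " %6.4f (`a' + `d')/`N'

          * what the printed label would give if read as (A+C)/N
          display "(A+C)/N = " %6.4f (`a' + `c')/`N'
          ```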


          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)


          • #6
            Sorry, just noticed that #3 is messed up. Let's try again:

            Code:
            . clear  
            
            . tabi 5738 70 \ 28 42      
            
                       |          col
                   row |         1          2 |     Total
            -----------+----------------------+----------
                     1 |     5,738         70 |     5,808
                     2 |        28         42 |        70
            -----------+----------------------+----------
                 Total |     5,766        112 |     5,878
            
                       Fisher's exact =                 0.000
               1-sided Fisher's exact =                 0.000
            
            . rename (row col) (prUnd isUnd)
            
            . tab prUnd isUnd, matcell(t1)  
            
                       |         isUnd
                 prUnd |         1          2 |     Total
            -----------+----------------------+----------
                     1 |         1          1 |         2
                     2 |         1          1 |         2
            -----------+----------------------+----------
                 Total |         2          2 |         4
            
            
            . classtabi `=t1[1,1]' `=t1[1,2]' `=t1[2,1]' `=t1[2,2]', rowlabel(Predicted) collabel(Actual)
            
                       |        Actual
             Predicted |         0          1 |     Total
            -----------+----------------------+----------
                     0 |         1          1 |         2
                     1 |         1          1 |         2
            -----------+----------------------+----------
                 Total |         2          2 |         4
            
            
            
            -------------------------------------------------
            Sensitivity                     D/(C+D)   50.00%      
            Specificity                     A/(A+B)   50.00%      
            Positive predictive value       D/(B+D)   50.00%      
            Negative predictive value       A/(A+C)   50.00%      
            -------------------------------------------------
            False positive rate             B/(A+B)   50.00%      
            False negative rate             C/(C+D)   50.00%      
            -------------------------------------------------
            Correctly classified      A+C/(A+B+C+D)   50.00%      
            -------------------------------------------------
            Effect strength for sensitivity            0.00%      
            -------------------------------------------------
            ROC area                                  0.5000      
            -------------------------------------------------


            • #7
              Originally posted by Bruce Weaver
              Following up on Anders in #4:

              [...]

              So perhaps it's just a problem with displaying the cell counts properly?
              No, it is a bug. classtabi creates a dataset with tabi and then expands it internally; because expand leaves observations with a zero count in place, Ariel needs to drop those observations. Here is the relevant part of what happens inside classtabi:

              Code:
              . // create the dataset from contingency table
              . tabi 1172 0 \ 3 46 , replace
              
                         |          col
                     row |         1          2 |     Total
              -----------+----------------------+----------
                       1 |     1,172          0 |     1,172
                       2 |         3         46 |        49
              -----------+----------------------+----------
                   Total |     1,175         46 |     1,221
              
                         Fisher's exact =                 0.000
                 1-sided Fisher's exact =                 0.000
              
              .
              . // how does it look like
              . list
              
                   +------------------+
                   | row   col    pop |
                   |------------------|
                1. |   1     1   1172 |
                2. |   1     2      0 |
                3. |   2     1      3 |
                4. |   2     2     46 |
                   +------------------+
              
              .
              . // now expand and tabulate again (this is what -classtabi- does)
              . expand pop
              (1 zero count ignored; observation not deleted)
              (1218 observations created)
              
              . tabulate row col
              
                         |          col
                     row |         1          2 |     Total
              -----------+----------------------+----------
                       1 |     1,172          1 |     1,173
                       2 |         3         46 |        49
              -----------+----------------------+----------
                   Total |     1,175         47 |     1,222
              
              
              .
              . // this is what should be done
              . drop if !pop
              (1 observation deleted)
              
              . tabulate row col
              
                         |          col
                     row |         1          2 |     Total
              -----------+----------------------+----------
                       1 |     1,172          0 |     1,172
                       2 |         3         46 |        49
              -----------+----------------------+----------
                   Total |     1,175         46 |     1,221
              
              
              .
              . // alternatively use frequency weights
              . tabi 1172 0 \ 3 46 , replace
              
                         |          col
                     row |         1          2 |     Total
              -----------+----------------------+----------
                       1 |     1,172          0 |     1,172
                       2 |         3         46 |        49
              -----------+----------------------+----------
                   Total |     1,175         46 |     1,221
              
                         Fisher's exact =                 0.000
                 1-sided Fisher's exact =                 0.000
              
              . tabulate row col [ fweight = pop ]
              
                         |          col
                     row |         1          2 |     Total
              -----------+----------------------+----------
                       1 |     1,172          0 |     1,172
                       2 |         3         46 |        49
              -----------+----------------------+----------
                   Total |     1,175         46 |     1,221
              Note that Nick's code above should have included frequency weights in the middle part (or expanded the data in memory).

              Best
              Daniel
              Last edited by daniel klein; 03 Oct 2017, 10:52.
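
              The frequency-weight remedy can also be checked non-interactively. This is only a sketch: the matrix name m is arbitrary, and it relies on tabi's replace option leaving the variables row, col and pop in memory, as the listing above shows.

              ```stata
              * recreate the table with a zero cell; row, col, pop stay in memory
              quietly tabi 1172 0 \ 3 46, replace

              * weight by the counts instead of expanding the data
              quietly tabulate row col [fweight = pop], matcell(m)

              * the total should be 1,221, not 1,222
              assert r(N) == 1221

              * the zero cell should still be zero
              assert m[1,2] == 0
              matrix list m
              ```

              If either assert fails, Stata stops with an error, which makes the check usable inside a do-file or certification script.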


              • #8
                Daniel's right. Sorry about that.

                Code:
                . clear  
                
                . tabi 5738 70 \ 28 42      
                
                           |          col
                       row |         1          2 |     Total
                -----------+----------------------+----------
                         1 |     5,738         70 |     5,808 
                         2 |        28         42 |        70 
                -----------+----------------------+----------
                     Total |     5,766        112 |     5,878 
                
                           Fisher's exact =                 0.000
                   1-sided Fisher's exact =                 0.000
                
                . rename (row col) (prUnd isUnd)
                
                . expand pop 
                (5,874 observations created)
                
                . tab prUnd isUnd, matcell(t1)  
                
                           |         isUnd
                     prUnd |         1          2 |     Total
                -----------+----------------------+----------
                         1 |     5,738         70 |     5,808 
                         2 |        28         42 |        70 
                -----------+----------------------+----------
                     Total |     5,766        112 |     5,878 
                
                
                . classtabi `=t1[1,1]' `=t1[1,2]' `=t1[2,1]' `=t1[2,2]', rowlabel(Predicted) collabel(Actual)
                
                           |        Actual
                 Predicted |         0          1 |     Total
                -----------+----------------------+----------
                         0 |     5,738         70 |     5,808 
                         1 |        28         42 |        70 
                -----------+----------------------+----------
                     Total |     5,766        112 |     5,878 
                
                
                
                -------------------------------------------------
                Sensitivity                     D/(C+D)   60.00%      
                Specificity                     A/(A+B)   98.79%      
                Positive predictive value       D/(B+D)   37.50%      
                Negative predictive value       A/(A+C)   99.51%      
                -------------------------------------------------
                False positive rate             B/(A+B)    1.21%      
                False negative rate             C/(C+D)   40.00%      
                -------------------------------------------------
                Correctly classified      A+C/(A+B+C+D)   98.33%      
                -------------------------------------------------
                Effect strength for sensitivity           58.79%      
                -------------------------------------------------
                ROC area                                  0.7940      
                -------------------------------------------------


                • #9
                  daniel klein, thanks for clarifying (in #7).
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Version: Stata/MP 18.5 (Windows)


                  • #10
                    Thank you all for identifying the bug in -classtabi-, as well as for the code for passing values from a matrix. I'll revise the command and let you know when it's up on SSC.

                    Ariel


                    • #11
                      Hi All,

                      A revised version of -classtabi- is now available for download from SSC (type: ssc install classtabi, replace)

                      I would like to acknowledge those individuals on this list who contributed to this revision: Adam Ross Nelson suggested that classtabi accept matrix arguments, and Nicholas J. Cox supplied some elegant code to do just that! Anders Alexandersson found a bug when the user enters a zero into any cell, and Daniel Klein found the source of the bug and provided a simple fix. Bruce Weaver noted an error in the output label for "Correctly classified".

                      Thank you all for your support!

                      Ariel

