Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tabulating 2x2 with no data in one line

    Dear Statlisters,

    May I ask for some advice please.

    Background: I am running a loop that tabulates different subsets of two binary variables (A and B) each with levels called (for the sake of argument) "2" and "3" which should result in a number of 2x2 tables. I am exporting the frequency tables via putexcel.
    Some of the tabulations contain no data points for one of the crosstabulation rows eg variable A contains only 2's in that subset of data.
    Stata then very sensibly provides me with a 1x2 frequency table of results - why should Stata know that it should add an (essentially arbitrary) line of zeros labelled with "3"?

    Question 1) How do I ask stata to present me with a 2x2 matrix (with two adjacent horizontal/vertical cells containing zeros) for all occurrences this happens in my loop?
    I gather in other languages, one can define the data type as binary and give labels to the levels to suggest to that language that it should present both levels even if one level does not appear. In stata, I have been unable to figure out if going down this route is possible/sensible (as far as I can make out, all the data types are different numeric types without one specifically being binary).

    Any help solving this problem would be hugely appreciated.

    Follow up questions:
    I am using these tabulations to work out correlation coefficients (as the data is binary I am using Kramer's V as my correlation coefficient, which, handily, in this case, gives the same numeric value as if I were to use Pearson's correlation coefficient, thus allowing me to use pwcorr to produce a correlation matrix which I can turn into a heatmap using the command heatplot, overlaying the correlation coefficient and p-value).

    Question 2a) Since the Kramer's V statistic is calculated from the Chi2 statistic, the Chi2 p-value, would be valid to use as an indicator for statistical significance of Kramers V statistic - am I correct in my thinking? The correlation is assessed by Chi2 and a p-value produced, but I am choosing to use a transformation of the Chi2 statistic to Kramers to give a more accurate measure of the strength of correlation.

    Question 2b) There is a risk that a high correlation coefficient can be driven by most of the results falling into one cell of the 2x2 matrix. The only way I can think of to give confidence to the result is also to show the 2x2 frequency table for each cell in the correlation matrix heatplot. Rather than presenting a long list of 2x2 frequency matrices, it would be easier to the reader to present these 2x2 frequency tables in a similar layout to the correlation matrix which contains the correlation coefficients and p-values. I am at a complete loss as to how it might be possible to do this - any advice would be hugely appreciated.

    Kind regards
    Robert Shaw

  • #2
    Probably best to ask different questions in different threads. The longer is your post, the less likely that people will be willing to answer. Also, it should not be difficult to provide a data example of your first question, which I encourage you to do. Include one group with both levels and one without to compare the created matrices.
    Last edited by Andrew Musau; 30 Jun 2023, 09:23.

    Comment


    • #3
      Dear Andrew,
      Thank you for your response and your advice. Please find the first part of this question with data (20 observations only) via dataex to give an idea of data structure, followed by code with result on the complete dataset:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2)
      1 1 0 1
      1 1 0 1
      1 1 0 0
      1 1 0 1
      1 1 0 0
      1 1 0 0
      1 1 0 1
      1 1 1 0
      1 1 1 0
      1 1 0 1
      1 1 0 1
      1 1 0 1
      1 1 0 0
      1 0 1 0
      1 1 0 1
      1 1 1 0
      1 1 0 1
      1 1 0 0
      1 0 0 0
      1 1 0 1
      end
      Code:
      . tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 ,matcell(cellcounts)
      
      TrendB_D0_ |
      D56_1_1_1l |  TrendB_D0_D56_1_1_2
            abel |         0          1 |     Total
      -----------+----------------------+----------
               1 |         4        103 |       107 
      -----------+----------------------+----------
           Total |         4        103 |       107 
      
      . matrix list cellcounts
      
      cellcounts[1,2]
           c1   c2
      r1    4  103
      
      . tab TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2 ,matcell(cellcounts)
      
      TrendB_D56 | TrendB_D56_D84_2_2_2
      _D84_1_1_1 |         0          1 |     Total
      -----------+----------------------+----------
               0 |        18         47 |        65 
               1 |        18         24 |        42 
      -----------+----------------------+----------
           Total |        36         71 |       107 
      
      . matrix list cellcounts
      
      cellcounts[2,2]
          c1  c2
      r1  18  47
      r2  18  24
      
      .
      The first table displays only one line (1x2 matrix) with a value of 1, because in the subset of data being tabulated, there is no 0, whereas the second table displays a 2x2 matrix
      Any guidance you could provide on this would be appreciated. I will post the other questions elsewhere

      Kind regards
      Robert Shaw

      Comment


      • #4
        Run the program _2rowmat after each tabulation.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2)
        1 1 0 1
        1 1 0 1
        1 1 0 0
        1 1 0 1
        1 1 0 0
        1 1 0 0
        1 1 0 1
        1 1 1 0
        1 1 1 0
        1 1 0 1
        1 1 0 1
        1 1 0 1
        1 1 0 0
        1 0 1 0
        1 1 0 1
        1 1 1 0
        1 1 0 1
        1 1 0 0
        1 0 0 0
        1 1 0 1
        end
        
        
        capture program drop _2rowmat
        program define _2rowmat, rclass
        
        if r(r)==1 & k[1,1]==0{
            mat cellcounts= cellcounts\ [0,0]
            mat rowname cellcounts= r1 r2
        }
        if r(r)==1 & k[1,1]==1{ 
            mat cellcounts= [0,0]\cellcounts
            mat rowname cellcounts= r1 r2
        }
        end
        
        tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)
        _2rowmat
        mat list cellcounts
        
        tab TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2 ,matcell(cellcounts) matrow(k)
        _2rowmat
        mat l cellcounts
        Res.:

        Code:
        . tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)
        
        TrendB_D0_ |  TrendB_D0_D56_1_1_2
         D56_1_1_1 |         0          1 |     Total
        -----------+----------------------+----------
                 1 |         2         18 |        20 
        -----------+----------------------+----------
             Total |         2         18 |        20 
        
        . 
        . _2rowmat
        
        . 
        . mat list cellcounts
        
        cellcounts[2,2]
            c1  c2
        r1   0   0
        r2   2  18
        
        . 
        . 
        . 
        . tab TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2 ,matcell(cellcounts) matrow(k)
        
        TrendB_D56 | TrendB_D56_D84_2_2_2
        _D84_1_1_1 |         0          1 |     Total
        -----------+----------------------+----------
                 0 |         6         10 |        16 
                 1 |         4          0 |         4 
        -----------+----------------------+----------
             Total |        10         10 |        20 
        
        . 
        . _2rowmat
        
        . 
        . mat l cellcounts
        
        cellcounts[2,2]
            c1  c2
        r1   6  10
        r2   4   0
        
        .

        Comment


        • #5
          Thank you Andrew! This looks fantastic! I will implement now. I will try tonight to make an equivalent _2colmat for the cases where the transverse case occurs with only 1 column.
          Rob

          Comment


          • #6
            Dear Andrew,
            Many thanks for this. May I ask for further help please :

            The output I'm sure is what I want, but I am having difficulty modifying my original code to get the correct column and row labels.
            Code:
            capture program drop _2rowmat
            program define _2rowmat, rclass
            
            if r(r)==1 & k[1,1]==0{
                mat cellcounts= cellcounts\ [0,0]
                mat rowname cellcounts= r1 r2
            }
            if r(r)==1 & k[1,1]==1{ 
                mat cellcounts= [0,0]\cellcounts
                mat rowname cellcounts= r1 r2
            }
            end
            
            putexcel set "example.xlsx", modify
            
            local counter=0
            local a `" "TrendB_D0_D56" "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182" "'
            foreach d of local a {
                local b `" "_1_1_1" "_1_1_2" "_1_2_1" "_1_2_2" "_3_1_1" "_3_1_2" "_3_2_1" "_3_2_2" "'
                local c `" "_1_1_1" "_1_1_2" "_1_2_1" "_1_2_2" "_3_1_1" "_3_1_2" "_3_2_1" "_3_2_2" "'
                foreach e of local b {
                    local c: list c - e
                    foreach f of local c {
                        tab `d'`e' `d'`f' if "`e'"!="`f'" & SerostatusMSD==0 , matcell(cellcounts) matrow(k)
                        _2rowmat
                        levelsof `d'`e', local(g)
                        levelsof `d'`f', local(h)
                        matrix rownames cellcounts = `g'
                        matrix colnames cellcounts = `h'
                        local matlocn=3+`counter'
                        local matlocn="B"+"`matlocn'"
                        putexcel `matlocn' = matrix(cellcounts), names
                        local rowlocn=2+`counter'
                        local rowlocn="A"+"`rowlocn'"
                        putexcel `rowlocn' = "`: variable label `d'`e''"
                        local collocn=1+`counter'
                        local collocn="B"+"`collocn'"
                        putexcel `collocn' = "`: variable label `d'`f''"
                        local counter=`counter'+6
                    }
                }    
            }
            the output of my above code incorrectly labels the rows because I am still extracting the row names from the original cellcounts matrix (I believe), but I haven't quite been able to figure out how to get the new row names:
            TrendB_D0_D56_1_1_2
            TrendB_D0_D56_1_1_1
            0 1
            1 0 0
            1 4 82

            Additionally, in order to prepare for the eventuality that I would need to perform the same code for columns, I naively adapted your code and came up with something non-functional - I am completely unversed in creating stata programs

            Code:
            capture program drop _2colmat
            program define _2colmat, rclass
            
            if c(c)==1 & k[1,1]==0{
                mat cellcounts= cellcounts\ [0,0]
                mat colname cellcounts= c1 c2
            }
            if c(c)==1 & k[1,1]==1{ 
                mat cellcounts= [0,0]\cellcounts
                mat colname cellcounts= c1 c2
            }
            end
            this did not work - may I ask for your help in troubleshooting this?

            Comment


            • #7
              See

              Code:
              help matrix operators
              Here, you want to join columns. You define a column as the transpose of the row vector and use the tabulate option -matcol()- instead of -matrow()-. You can set a condition which selects whether you need to add a column or a row. Do you know this beforehand?

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2)
              1 1 0 1
              1 1 0 1
              1 1 0 0
              1 1 0 1
              1 1 0 0
              1 1 0 0
              1 1 0 1
              1 1 1 0
              1 1 1 0
              1 1 0 1
              1 1 0 1
              1 1 0 1
              1 1 0 0
              1 0 1 0
              1 1 0 1
              1 1 1 0
              1 1 0 1
              1 1 0 0
              1 0 0 0
              1 1 0 1
              end
              
              
              capture program drop _2colmat
              program define _2colmat, rclass
              
              if r(c)==1 & k[1,1]==0{
                  mat cellcounts= cellcounts,([0,0])'
                  mat colname cellcounts= c1 c2
              }
              if r(c)==1 & k[1,1]==1{ 
                  mat cellcounts= ([0,0])',cellcounts
                  mat colname cellcounts= c1 c2
              }
              end
              
              tab TrendB_D0_D56_1_1_2 TrendB_D0_D56_1_1_1 , matcell(cellcounts) matcol(k)
              _2colmat
              mat list cellcounts
              
              tab TrendB_D56_D84_2_2_2 TrendB_D56_D84_1_1_1,matcell(cellcounts) matcol(k)
              _2colmat
              mat l cellcounts
              Res.:

              Code:
              . tab TrendB_D0_D56_1_1_2 TrendB_D0_D56_1_1_1 , matcell(cellcounts) matcol(k)
              
                         | TrendB_D0_
              TrendB_D0_ | D56_1_1_1
               D56_1_1_2 |         1 |     Total
              -----------+-----------+----------
                       0 |         2 |         2 
                       1 |        18 |        18 
              -----------+-----------+----------
                   Total |        20 |        20 
              
              . 
              . _2colmat
              
              . 
              . mat list cellcounts
              
              cellcounts[2,2]
                  c1  c2
              r1   0   2
              r2   0  18
              
              . 
              . 
              . 
              . tab TrendB_D56_D84_2_2_2 TrendB_D56_D84_1_1_1,matcell(cellcounts) matcol(k)
              
              TrendB_D56 | TrendB_D56_D84_1_1_1
              _D84_2_2_2 |         0          1 |     Total
              -----------+----------------------+----------
                       0 |         6          4 |        10 
                       1 |        10          0 |        10 
              -----------+----------------------+----------
                   Total |        16          4 |        20 
              
              . 
              . _2colmat
              
              . 
              . mat l cellcounts
              
              cellcounts[2,2]
                  c1  c2
              r1   6   4
              r2  10   0
              
              .

              Comment


              • #8
                On the names of rows and columns, r1 and c1 correspond to the level 0 and r2 and c2 correspond to the level 1. I do not follow the question. The order in tabulate defines the row variable and the column variable. Perhaps use the data example in #3 to illustrate what you want.

                Comment


                • #9
                  Dear Andrew,
                  I was being unclear earlier. Thanks for this code. However, in the case of this loop, I am agnostic as to what may be missing and not missing. There are some cases where there is a first row missing, or a second row missing, or a first column missing or a second column missing. Additionally there are cases which are missing both a column and a row (ie only one of the four quadrants is populated).
                  I have manufactured a short bit of data below to demonstrate this:

                  Code:
                  clear
                  input float (A B C D E)
                  0 0 1 1 0
                  0 0 1 1 1
                  0 0 1 1 0
                  0 0 1 1 1
                  0 0 1 1 0
                  0 0 1 1 1
                  0 0 1 1 0
                  0 0 1 1 1
                  0 0 1 1 0
                  0 0 1 1 1
                  0 0 1 1 0
                  0 0 1 1 1
                  end
                  Below are all the potential tabulation outputs that may result. Is there a way to account for all of these in advance?
                  Code:
                  . tab A B
                  
                             |     B
                           A |         0 |     Total
                  -----------+-----------+----------
                           0 |        12 |        12 
                  -----------+-----------+----------
                       Total |        12 |        12 
                  
                  . tab A C
                  
                             |     C
                           A |         1 |     Total
                  -----------+-----------+----------
                           0 |        12 |        12 
                  -----------+-----------+----------
                       Total |        12 |        12 
                  
                  . tab C A
                  
                             |     A
                           C |         0 |     Total
                  -----------+-----------+----------
                           1 |        12 |        12 
                  -----------+-----------+----------
                       Total |        12 |        12 
                  
                  . tab C D
                  
                             |     D
                           C |         1 |     Total
                  -----------+-----------+----------
                           1 |        12 |        12 
                  -----------+-----------+----------
                       Total |        12 |        12 
                  
                  . tab A E
                  
                             |           E
                           A |         0          1 |     Total
                  -----------+----------------------+----------
                           0 |         6          6 |        12 
                  -----------+----------------------+----------
                       Total |         6          6 |        12 
                  
                  . tab C E
                  
                             |           E
                           C |         0          1 |     Total
                  -----------+----------------------+----------
                           1 |         6          6 |        12 
                  -----------+----------------------+----------
                       Total |         6          6 |        12 
                  
                  . tab E A
                  
                             |     A
                           E |         0 |     Total
                  -----------+-----------+----------
                           0 |         6 |         6 
                           1 |         6 |         6 
                  -----------+-----------+----------
                       Total |        12 |        12 
                  
                  . tab E C
                  
                             |     C
                           E |         1 |     Total
                  -----------+-----------+----------
                           0 |         6 |         6 
                           1 |         6 |         6 
                  -----------+-----------+----------
                       Total |        12 |        12 
                  
                  . tab E E
                  
                             |           E
                           E |         0          1 |     Total
                  -----------+----------------------+----------
                           0 |         6          0 |         6 
                           1 |         0          6 |         6 
                  -----------+----------------------+----------
                       Total |         6          6 |        12
                  with regards to my comment regarding row and column names, I was trying to make the code generic to rename the matrix rows and columns as per the levels of the cell counts matrix after it had been acted on by the program you had written. I got myself tied into knots, but then realised, I could just hardcode the 0 and 1 as I know this will be the desired output in this case:
                  Code:
                  matrix rownames cellcounts = 0 1
                  matrix colnames cellcounts = 0 1
                  using the data in 3 to demonstrate what I was trying to achieve:
                  Code:
                  . tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)
                  
                  TrendB_D0_ |  TrendB_D0_D56_1_1_2
                   D56_1_1_1 |         0          1 |     Total
                  -----------+----------------------+----------
                           1 |         2         18 |        20 
                  -----------+----------------------+----------
                       Total |         2         18 |        20 
                  
                  . _2rowmat
                  
                  . mat list cellcounts
                  
                  cellcounts[2,2]
                      c1  c2
                  r1   0   0
                  r2   2  18
                  
                  . levelsof(TrendB_D0_D56_1_1_1), local(a)
                  1
                  
                  . levelsof(TrendB_D0_D56_1_1_2), local(b)
                  0 1
                  
                  . matrix rownames cellcounts = `a'
                  
                  . matrix colnames cellcounts = `b'
                  
                  . mat list cellcounts
                  
                  cellcounts[2,2]
                      0   1
                  1   0   0
                  1   2  18
                  I was unable to name the first row as "0"

                  Comment


                  • #10
                    See

                    Code:
                    help naming conventions
                    As with Stata names, matrix column names cannot start with numbers. So add a leading letter or underscore.

                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input float(TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2)
                    1 1 0 1
                    1 1 0 1
                    1 1 0 0
                    1 1 0 1
                    1 1 0 0
                    1 1 0 0
                    1 1 0 1
                    1 1 1 0
                    1 1 1 0
                    1 1 0 1
                    1 1 0 1
                    1 1 0 1
                    1 1 0 0
                    1 0 1 0
                    1 1 0 1
                    1 1 1 0
                    1 1 0 1
                    1 1 0 0
                    1 0 0 0
                    1 1 0 1
                    end
                    
                    capture program drop _2rowmat
                    program define _2rowmat, rclass
                    
                    if r(r)==1 & k[1,1]==0{
                        mat cellcounts= cellcounts\ [0,0]
                    }
                    if r(r)==1 & k[1,1]==1{ 
                        mat cellcounts= [0,0]\cellcounts
                    }
                    end
                    
                    tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)
                    _2rowmat
                    mat rownames cellcounts= _0 _1
                    mat colnames cellcounts=_0 _1
                    mat list cellcounts
                    Res.:

                    Code:
                    . 
                    . tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)
                    
                    TrendB_D0_ |  TrendB_D0_D56_1_1_2
                     D56_1_1_1 |         0          1 |     Total
                    -----------+----------------------+----------
                             1 |         2         18 |        20 
                    -----------+----------------------+----------
                         Total |         2         18 |        20 
                    
                    . 
                    . _2rowmat
                    
                    . 
                    . mat rownames cellcounts= _0 _1
                    
                    . 
                    . mat colnames cellcounts=_0 _1
                    
                    . 
                    . mat list cellcounts
                    
                    cellcounts[2,2]
                        _0  _1
                    _0   0   0
                    _1   2  18
                    
                    .

                    Comment


                    • #11
                      Below are all the potential tabulation outputs that may result. Is there a way to account for all of these in advance?
                      The singletons need more work. I will write something that captures all scenarios when I get the time.

                      Comment


                      • #12
                        Much obliged Andrew. I very much appreciate the time you are giving to help me. I will try reading up on stata programming in the interim fully to understand the above code you have written.
                        Kind regards
                        Robert Shaw

                        Comment


                        • #13
                          Code:
                          clear
                          input float (A B C D E)
                          0 0 1 1 0
                          0 0 1 1 1
                          0 0 1 1 0
                          0 0 1 1 1
                          0 0 1 1 0
                          0 0 1 1 1
                          0 0 1 1 0
                          0 0 1 1 1
                          0 0 1 1 0
                          0 0 1 1 1
                          0 0 1 1 0
                          0 0 1 1 1
                          end
                          
                          capture program drop mat2x2
                          program define mat2x2, rclass
                          if r(r)==1 & r(c)==1 & c[1,1]==0 & r[1,1]==0{
                              mat cellcounts= cellcounts, 0\ [0,0]
                          }
                          if r(r)==1 & r(c)==1 & c[1,1]==1 & r[1,1]==0{
                              mat cellcounts=  0, cellcounts\ [0,0]
                          }
                          if r(r)==1 & r(c)==1 & c[1,1]==1 & r[1,1]==1{
                              mat cellcounts= [0,0]\ 0, cellcounts
                          }
                          if r(r)==1 & r(c)==1 & c[1,1]==0 & r[1,1]==1{
                              mat cellcounts= [0,0]\cellcounts, 0
                          } 
                          if r(r)==1 & r(c)==2 & r[1,1]==0{
                              mat cellcounts= cellcounts\ [0,0]
                          }
                          if r(r)==1 & r(c)==2 & r[1,1]==1{
                              mat cellcounts= [0,0]\cellcounts
                          }
                          if r(r)==2 & r(c)==1 & c[1,1]==0{
                              mat cellcounts= cellcounts,([0,0])'
                          }
                          if r(r)==2 & r(c)==1 & c[1,1]==1{
                              mat cellcounts= ([0,0])',cellcounts
                          }
                          mat colnames cellcounts= _0 _1
                          mat rownames cellcounts= _0 _1
                          mat list cellcounts
                          end
                          
                          tab A B, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          
                          
                          tab A C, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          
                          
                          tab C A, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          
                          
                          tab C D, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          
                          
                          tab A E, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          
                          
                          tab C E, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          
                          
                          tab E A, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          
                          
                          tab E C, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          
                          
                          tab E E, matcell(cellcounts) matcol(c) matrow(r)
                          mat2x2
                          Res.:

                          Code:
                          . tab A B, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |     B
                                   A |         0 |     Total
                          -----------+-----------+----------
                                   0 |        12 |        12 
                          -----------+-----------+----------
                               Total |        12 |        12 
                          
                          . 
                          . mat2x2
                          
                          symmetric cellcounts[2,2]
                              _0  _1
                          _0  12
                          _1   0   0
                          
                          . 
                          . 
                          . 
                          . 
                          . 
                          . tab A C, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |     C
                                   A |         1 |     Total
                          -----------+-----------+----------
                                   0 |        12 |        12 
                          -----------+-----------+----------
                               Total |        12 |        12 
                          
                          . 
                          . mat2x2
                          
                          cellcounts[2,2]
                              _0  _1
                          _0   0  12
                          _1   0   0
                          
                          . 
                          . 
                          . 
                          . 
                          . 
                          . tab C A, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |     A
                                   C |         0 |     Total
                          -----------+-----------+----------
                                   1 |        12 |        12 
                          -----------+-----------+----------
                               Total |        12 |        12 
                          
                          . 
                          . mat2x2
                          
                          cellcounts[2,2]
                              _0  _1
                          _0   0   0
                          _1  12   0
                          
                          . 
                          . 
                          . 
                          . 
                          . 
                          . tab C D, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |     D
                                   C |         1 |     Total
                          -----------+-----------+----------
                                   1 |        12 |        12 
                          -----------+-----------+----------
                               Total |        12 |        12 
                          
                          . 
                          . mat2x2
                          
                          symmetric cellcounts[2,2]
                              _0  _1
                          _0   0
                          _1   0  12
                          
                          . 
                          . 
                          . 
                          . 
                          . 
                          . tab A E, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |           E
                                   A |         0          1 |     Total
                          -----------+----------------------+----------
                                   0 |         6          6 |        12 
                          -----------+----------------------+----------
                               Total |         6          6 |        12 
                          
                          . 
                          . mat2x2
                          
                          cellcounts[2,2]
                              _0  _1
                          _0   6   6
                          _1   0   0
                          
                          . 
                          . 
                          . 
                          . 
                          . 
                          . tab C E, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |           E
                                   C |         0          1 |     Total
                          -----------+----------------------+----------
                                   1 |         6          6 |        12 
                          -----------+----------------------+----------
                               Total |         6          6 |        12 
                          
                          . 
                          . mat2x2
                          
                          cellcounts[2,2]
                              _0  _1
                          _0   0   0
                          _1   6   6
                          
                          . 
                          . 
                          . 
                          . 
                          . 
                          . tab E A, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |     A
                                   E |         0 |     Total
                          -----------+-----------+----------
                                   0 |         6 |         6 
                                   1 |         6 |         6 
                          -----------+-----------+----------
                               Total |        12 |        12 
                          
                          . 
                          . mat2x2
                          
                          cellcounts[2,2]
                              _0  _1
                          _0   6   0
                          _1   6   0
                          
                          . 
                          . 
                          . 
                          . 
                          . 
                          . tab E C, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |     C
                                   E |         1 |     Total
                          -----------+-----------+----------
                                   0 |         6 |         6 
                                   1 |         6 |         6 
                          -----------+-----------+----------
                               Total |        12 |        12 
                          
                          . 
                          . mat2x2
                          
                          cellcounts[2,2]
                              _0  _1
                          _0   0   6
                          _1   0   6
                          
                          . 
                          . 
                          . 
                          . 
                          . 
                          . tab E E, matcell(cellcounts) matcol(c) matrow(r)
                          
                                     |           E
                                   E |         0          1 |     Total
                          -----------+----------------------+----------
                                   0 |         6          0 |         6 
                                   1 |         0          6 |         6 
                          -----------+----------------------+----------
                               Total |         6          6 |        12 
                          
                          . 
                          . mat2x2
                          
                          symmetric cellcounts[2,2]
                              _0  _1
                          _0   6
                          _1   0   6
                          
                          .

                          Comment


                          • #14
                            Dear Andrew,
                            many thanks for this. I am away from my computer for a few days, but reading through the code, it is instructive to me as well as solving my problem. Many thanks. I will post the finished code and results covering all the aspects you’ve helped me with over the last few days for the reference of other statalisters.
                            kind regards and thanks again
                            Robert Shaw

                            Comment


                            • #15
                              Dear Andrew,
                              Your code worked excellently. Thank you. I have, below, prepared a summary of my problem, your help and my comments on it, should it be of help to others on statalist in the future. Many thanks again.

                              Background: I am running a loop that cross-tabulates different combinations binary variables, which should result in a number of 2x2 tables. I want to use these tabulations to work out correlation coefficients. Since the data is binary I am using Kramer's V as my correlation coefficient, which, handily, in this case, gives the same numeric value as if I were to use Pearson's correlation coefficient, thus allowing me to use pwcorr to produce a correlation matrix (rather than cross tabulating a whole series of binary variables as originally planned) which I can turn into a heatmap using the command heatplot, overlaying the correlation coefficient and p-value). See helpful advice by Andrew Musau here: https://www.statalist.org/forums/for...about-heatplot

                              There is, however, a risk that a high correlation coefficient can be driven by most of the results falling into one cell of the 2x2 matrix. The only way I can think of to give confidence to the result is also to show the 2x2 frequency table for each cell in a separate, but parallel 'frequencies' correlation matrix heatplot, as it will be easier to read with the 2x2 matrices in the same orientation as the 'results' heatplot.

                              This step was complicated by the fact that some of the 2x2 cross-tabulations are 'incomplete'. By incomplete, I mean that there are some cases where the first row is missing (as the horizontal cross-tabulation variable does not contain any '0's'), or a second row missing (as the horizontal cross-tabulation variable does not contain any '1's'), or a first column missing (same for the vertical variable) or a second column missing. Additionally there are cases which are missing both a column and a row (ie only one of the four quadrants is populated). In this case, Stata very sensibly provides a 1x2 matrix (or 1x1 matrix) of frequency results - after all, why should Stata know that it should add an (essentially arbitrary) line of zeros labelled with "0" or "1"? In order to demonstrate the data clearly, it would be helpful to have the 2x2 matrices 'completed' with lines of columns of zeros as appropriate.

                              Working backwards, Andrew kindly took the time to make a small program that manipulates the output matrices from tabulate to create 'completed' matrices:

                              Code:
                              capture program drop mat2x2
                              program define mat2x2, rclass
                              if r(r)==1 & r(c)==1 & c[1,1]==0 & r[1,1]==0 {
                                  mat cellcounts = cellcounts, 0 \ [0,0]
                              }
                              if r(r)==1 & r(c)==1 & c[1,1]==1 & r[1,1]==0 {
                                  mat cellcounts = 0, cellcounts \ [0,0]
                              }
                              if r(r)==1 & r(c)==1 & c[1,1]==1 & r[1,1]==1 {
                                  mat cellcounts = [0,0] \ 0, cellcounts
                              }
                              if r(r)==1 & r(c)==1 & c[1,1]==0 & r[1,1]==1 {
                                  mat cellcounts = [0,0] \ cellcounts, 0
                              }
                              if r(r)==1 & r(c)==2 & r[1,1]==0 {
                                  mat cellcounts = cellcounts \ [0,0]
                              }
                              if r(r)==1 & r(c)==2 & r[1,1]==1 {
                                  mat cellcounts = [0,0] \ cellcounts
                              }
                              if r(r)==2 & r(c)==1 & c[1,1]==0 {
                                  mat cellcounts = cellcounts, ([0,0])'
                              }
                              if r(r)==2 & r(c)==1 & c[1,1]==1 {
                                  mat cellcounts = ([0,0])', cellcounts
                              }
                              mat colnames cellcounts= 0 1
                              mat rownames cellcounts= 0 1
                              mat list cellcounts
                              end
                              r(r) is the returned result for the number of rows, r(c) is the returned result for the number of columns, the r[1,1] and c[1,1] refer to the rownames and colnames that have been prospectively specified in the tabulate command options ie is the value of the row/column label equal to 0 or 1. The matcell options outputs a separate returned matrix result (in this case called cellcounts) which is what I am trying to access:

                              Code:
                              tab `a' `b' matcell(cellcounts) matrow(r) matcol(c)
                              In order to get these 2x2 matrices into the heatmap format, the output cell counts matrix is then subsetted by line, and put into the cell as text, with a line break to make things clear. Below please find the code for producing both the 'results' heatmap and the 'frequencies' heatmap:

                              Code:
                              local a `" "TrendB_D0_D56" "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182" "'
                              foreach var1 of local a {
                                  pwcorr `var1'_1_2_2 `var1'_1_2_1 `var1'_1_1_2 `var1'_1_1_1 `var1'_3_2_2 `var1'_3_2_1 `var1'_3_1_2 `var1'_3_1_1, sig
                                  matrix C = r(C)
                                  matrix sig = r(sig)
                              
                                  // apply heatplot without displaying a graph, just to collect information;
                                  // option generate stores the information (coordinates etc) as variables
                                  heatplot C, values(label(sig)) lower nodraw generate nodiagonal
                                  // significance stars
                                  gen str sig = string(_Mlab, "%5.4f")
                                  replace sig = string(_Mlab, "%5.4f")+"*" if _Mlab<0.0004464 // 0.05 / ((7+6+5+4+3+2+1) * 4 timepoint intervals) - this is a manual Bonferroni correction
                                  // second call to heatplot useing addplot to print the marker labels
                                  local b=subinstr("`var1'", "TrendB_", "", .)
                                  heatplot C, colors(blue white red, ipolate(110)) cuts(-1.02(0.02)1.02) lower nodiagonal aspectratio(1) xlabel(,labsize(small) angle(45)) ylabel(,labsize(small)) legend(off) ///
                                  addplot(scatter _Y _X if _Y!=_X, msym(i) mlab(_Z) mlabf(%9.2f) mlabpos(0) mlabc(black) ///
                                  || scatter _Y _X if _Y==_X, msym(i) mlab(_Z) mlabf(%9.0f) mlabpos(0) mlabc(black) ///
                                  || scatter _Y _X if _Y!=_X, msym(i) mlab(sig) mlabpos(6) mlabgap(2) mlabc(black) mlabsize(vsmall))
                                  drop _Z-sig
                              }
                              and the companion frequencies heatplot:

                              Code:
                              local a `" "TrendB_D0_D56" "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182" "'
                              foreach d of local a {
                                  local counter=1
                                  qui corr `d'_1_2_2 `d'_1_2_1 `d'_1_1_2 `d'_1_1_1 `d'_3_2_2 `d'_3_2_1 `d'_3_1_2 `d'_3_1_1
                                  matrix W = r(C)
                                  local b `" "_1_2_2" "_1_2_1" "_1_1_2" "_1_1_1" "_3_2_2" "_3_2_1" "_3_1_2" "_3_1_1" "'
                                  local i `" "_1_2_2" "_1_2_1" "_1_1_2" "_1_1_1" "_3_2_2" "_3_2_1" "_3_1_2" "_3_1_1" "'
                                  foreach e of local b {
                                      local i: list i - e //to make an all against all cross tabulation without repeats
                                      foreach f of local i {
                                          tab `d'`e' `d'`f' if "`e'"!="`f'" , matcell(cellcounts) matrow(r) matcol(c) //ensure tabulation options correctly specified
                                          mat2x2 //instigate previously defined program
                                          local g=cellcounts[1,1]
                                          local h=cellcounts[2,1]
                                          local a`counter'= "`g'" + "  " + "`h'" //make local for first line of string insertion
                                          local g=cellcounts[1,2]
                                          local h=cellcounts[2,2]
                                          local b`counter'= "`g'" + "  " + "`h'" //make local for second line of string insertion
                                          local counter=`counter'+1
                                      }
                                  }
                                  local offset 0
                                  heatplot W,  colors(blue white red, ipolate(110)) cuts(-1.02(0.02)1.02) lower nodiagonal aspectratio(1) xlabel(,labsize(small) angle(45)) ylabel(,labsize(small))legend(off) ///
                                  text(`=2-`offset'' 1 "`a1'" " " "`b1'", size(2)) ///
                                  text(`=3-`offset'' 1 "`a2'" " " "`b2'", size(2)) ///
                                  text(`=4-`offset'' 1 "`a3'" " " "`b3'", size(2)) ///
                                  text(`=5-`offset'' 1 "`a4'" " " "`b4'", size(2)) ///
                                  text(`=6-`offset'' 1 "`a5'" " " "`b5'", size(2)) ///
                                  text(`=7-`offset'' 1 "`a6'" " " "`b6'", size(2)) ///
                                  text(`=8-`offset'' 1 "`a7'" " " "`b7'", size(2)) ///
                                  text(`=3-`offset'' 2 "`a8'" " " "`b8'", size(2)) ///
                                  text(`=4-`offset'' 2 "`a9'" " " "`b9'", size(2)) ///
                                  text(`=5-`offset'' 2 "`a10'" " " "`b10'", size(2)) ///
                                  text(`=6-`offset'' 2 "`a11'" " " "`b11'", size(2)) ///
                                  text(`=7-`offset'' 2 "`a12'" " " "`b12'", size(2)) ///
                                  text(`=8-`offset'' 2 "`a13'" " " "`b13'", size(2)) ///
                                  text(`=4-`offset'' 3 "`a14'" " " "`b14'", size(2)) ///
                                  text(`=5-`offset'' 3 "`a15'" " " "`b15'", size(2)) ///
                                  text(`=6-`offset'' 3 "`a16'" " " "`b16'", size(2)) ///
                                  text(`=7-`offset'' 3 "`a17'" " " "`b17'", size(2)) ///
                                  text(`=8-`offset'' 3 "`a18'" " " "`b18'", size(2)) ///
                                  text(`=5-`offset'' 4 "`a19'" " " "`b19'", size(2)) ///
                                  text(`=6-`offset'' 4 "`a20'" " " "`b20'", size(2)) ///
                                  text(`=7-`offset'' 4 "`a21'" " " "`b21'", size(2)) ///
                                  text(`=8-`offset'' 4 "`a22'" " " "`b22'", size(2)) ///
                                  text(`=6-`offset'' 5 "`a23'" " " "`b23'", size(2)) ///
                                  text(`=7-`offset'' 5 "`a24'" " " "`b24'", size(2)) ///
                                  text(`=8-`offset'' 5 "`a25'" " " "`b25'", size(2)) ///
                                  text(`=7-`offset'' 6 "`a26'" " " "`b26'", size(2)) ///
                                  text(`=8-`offset'' 6 "`a27'" " " "`b27'", size(2)) ///
                                  text(`=8-`offset'' 7 "`a28'" " " "`b28'", size(2)) ///
                                  plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)
                              }




                              ​​​​​​​
                              Click image for larger version

Name:	statalist_results_D0_D56.png
Views:	1
Size:	244.0 KB
ID:	1720649


                              Click image for larger version

Name:	statalist_frequency_TrendB_D0_D56.png
Views:	1
Size:	240.1 KB
ID:	1720650

                              Comment

                              Working...
                              X