Tabulating 2x2 with no data in one line

Robert Shaw

Join Date: Nov 2021

Posts: 37
#1

Tabulating 2x2 with no data in one line

30 Jun 2023, 07:34

Dear Statlisters,

May I ask for some advice please.

Background: I am running a loop that tabulates different subsets of two binary variables (A and B) each with levels called (for the sake of argument) "2" and "3" which should result in a number of 2x2 tables. I am exporting the frequency tables via putexcel.
Some of the tabulations contain no data points for one of the crosstabulation rows eg variable A contains only 2's in that subset of data.
Stata then very sensibly provides me with a 1x2 frequency table of results - why should Stata know that it should add an (essentially arbitrary) line of zeros labelled with "3"?

Question 1) How do I ask stata to present me with a 2x2 matrix (with two adjacent horizontal/vertical cells containing zeros) for all occurrences this happens in my loop?
I gather in other languages, one can define the data type as binary and give labels to the levels to suggest to that language that it should present both levels even if one level does not appear. In stata, I have been unable to figure out if going down this route is possible/sensible (as far as I can make out, all the data types are different numeric types without one specifically being binary).

Any help solving this problem would be hugely appreciated.

Follow up questions:
I am using these tabulations to work out correlation coefficients (as the data is binary I am using Kramer's V as my correlation coefficient, which, handily, in this case, gives the same numeric value as if I were to use Pearson's correlation coefficient, thus allowing me to use pwcorr to produce a correlation matrix which I can turn into a heatmap using the command heatplot, overlaying the correlation coefficient and p-value).

Question 2a) Since the Kramer's V statistic is calculated from the Chi2 statistic, the Chi2 p-value, would be valid to use as an indicator for statistical significance of Kramers V statistic - am I correct in my thinking? The correlation is assessed by Chi2 and a p-value produced, but I am choosing to use a transformation of the Chi2 statistic to Kramers to give a more accurate measure of the strength of correlation.

Question 2b) There is a risk that a high correlation coefficient can be driven by most of the results falling into one cell of the 2x2 matrix. The only way I can think of to give confidence to the result is also to show the 2x2 frequency table for each cell in the correlation matrix heatplot. Rather than presenting a long list of 2x2 frequency matrices, it would be easier to the reader to present these 2x2 frequency tables in a similar layout to the correlation matrix which contains the correlation coefficients and p-values. I am at a complete loss as to how it might be possible to do this - any advice would be hugely appreciated.

Kind regards
Robert Shaw
Tags: None
Andrew Musau

Join Date: Oct 2014

Posts: 9945
#2

30 Jun 2023, 09:21

Probably best to ask different questions in different threads. The longer is your post, the less likely that people will be willing to answer. Also, it should not be difficult to provide a data example of your first question, which I encourage you to do. Include one group with both levels and one without to compare the created matrices.

Last edited by Andrew Musau; 30 Jun 2023, 09:23.
Comment

Robert Shaw

Join Date: Nov 2021
Posts: 37

30 Jun 2023, 10:10

Dear Andrew,
Thank you for your response and your advice. Please find the first part of this question with data (20 observations only) via dataex to give an idea of data structure, followed by code with result on the complete dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2)
1 1 0 1
1 1 0 1
1 1 0 0
1 1 0 1
1 1 0 0
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 0
1 1 0 1
1 1 0 1
1 1 0 1
1 1 0 0
1 0 1 0
1 1 0 1
1 1 1 0
1 1 0 1
1 1 0 0
1 0 0 0
1 1 0 1
end

Code:

. tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 ,matcell(cellcounts)

TrendB_D0_ |
D56_1_1_1l |  TrendB_D0_D56_1_1_2
      abel |         0          1 |     Total
-----------+----------------------+----------
         1 |         4        103 |       107 
-----------+----------------------+----------
     Total |         4        103 |       107 

. matrix list cellcounts

cellcounts[1,2]
     c1   c2
r1    4  103

. tab TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2 ,matcell(cellcounts)

TrendB_D56 | TrendB_D56_D84_2_2_2
_D84_1_1_1 |         0          1 |     Total
-----------+----------------------+----------
         0 |        18         47 |        65 
         1 |        18         24 |        42 
-----------+----------------------+----------
     Total |        36         71 |       107 

. matrix list cellcounts

cellcounts[2,2]
    c1  c2
r1  18  47
r2  18  24

.

The first table displays only one line (1x2 matrix) with a value of 1, because in the subset of data being tabulated, there is no 0, whereas the second table displays a 2x2 matrix
Any guidance you could provide on this would be appreciated. I will post the other questions elsewhere

Kind regards
Robert Shaw

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 9945

30 Jun 2023, 10:54

Run the program _2rowmat after each tabulation.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2)
1 1 0 1
1 1 0 1
1 1 0 0
1 1 0 1
1 1 0 0
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 0
1 1 0 1
1 1 0 1
1 1 0 1
1 1 0 0
1 0 1 0
1 1 0 1
1 1 1 0
1 1 0 1
1 1 0 0
1 0 0 0
1 1 0 1
end


capture program drop _2rowmat
program define _2rowmat, rclass

if r(r)==1 & k[1,1]==0{
    mat cellcounts= cellcounts\ [0,0]
    mat rowname cellcounts= r1 r2
}
if r(r)==1 & k[1,1]==1{ 
    mat cellcounts= [0,0]\cellcounts
    mat rowname cellcounts= r1 r2
}
end

tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)
_2rowmat
mat list cellcounts

tab TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2 ,matcell(cellcounts) matrow(k)
_2rowmat
mat l cellcounts

Res.:

Code:

. tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)

TrendB_D0_ |  TrendB_D0_D56_1_1_2
 D56_1_1_1 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2         18 |        20 
-----------+----------------------+----------
     Total |         2         18 |        20 

. 
. _2rowmat

. 
. mat list cellcounts

cellcounts[2,2]
    c1  c2
r1   0   0
r2   2  18

. 
. 
. 
. tab TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2 ,matcell(cellcounts) matrow(k)

TrendB_D56 | TrendB_D56_D84_2_2_2
_D84_1_1_1 |         0          1 |     Total
-----------+----------------------+----------
         0 |         6         10 |        16 
         1 |         4          0 |         4 
-----------+----------------------+----------
     Total |        10         10 |        20 

. 
. _2rowmat

. 
. mat l cellcounts

cellcounts[2,2]
    c1  c2
r1   6  10
r2   4   0

.

Comment

Robert Shaw

Join Date: Nov 2021

Posts: 37
#5

30 Jun 2023, 11:22

Thank you Andrew! This looks fantastic! I will implement now. I will try tonight to make an equivalent _2colmat for the cases where the transverse case occurs with only 1 column.
Rob
Comment

Robert Shaw

Join Date: Nov 2021
Posts: 37

30 Jun 2023, 17:34

Dear Andrew,
Many thanks for this. May I ask for further help please :

The output I'm sure is what I want, but I am having difficulty modifying my original code to get the correct column and row labels.

Code:

capture program drop _2rowmat
program define _2rowmat, rclass

if r(r)==1 & k[1,1]==0{
    mat cellcounts= cellcounts\ [0,0]
    mat rowname cellcounts= r1 r2
}
if r(r)==1 & k[1,1]==1{ 
    mat cellcounts= [0,0]\cellcounts
    mat rowname cellcounts= r1 r2
}
end

putexcel set "example.xlsx", modify

local counter=0
local a `" "TrendB_D0_D56" "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182" "'
foreach d of local a {
    local b `" "_1_1_1" "_1_1_2" "_1_2_1" "_1_2_2" "_3_1_1" "_3_1_2" "_3_2_1" "_3_2_2" "'
    local c `" "_1_1_1" "_1_1_2" "_1_2_1" "_1_2_2" "_3_1_1" "_3_1_2" "_3_2_1" "_3_2_2" "'
    foreach e of local b {
        local c: list c - e
        foreach f of local c {
            tab `d'`e' `d'`f' if "`e'"!="`f'" & SerostatusMSD==0 , matcell(cellcounts) matrow(k)
            _2rowmat
            levelsof `d'`e', local(g)
            levelsof `d'`f', local(h)
            matrix rownames cellcounts = `g'
            matrix colnames cellcounts = `h'
            local matlocn=3+`counter'
            local matlocn="B"+"`matlocn'"
            putexcel `matlocn' = matrix(cellcounts), names
            local rowlocn=2+`counter'
            local rowlocn="A"+"`rowlocn'"
            putexcel `rowlocn' = "`: variable label `d'`e''"
            local collocn=1+`counter'
            local collocn="B"+"`collocn'"
            putexcel `collocn' = "`: variable label `d'`f''"
            local counter=`counter'+6
        }
    }    
}

the output of my above code incorrectly labels the rows because I am still extracting the row names from the original cellcounts matrix (I believe), but I haven't quite been able to figure out how to get the new row names:

	TrendB_D0_D56_1_1_2
TrendB_D0_D56_1_1_1
		0	1
	1	0	0
	1	4	82

Additionally, in order to prepare for the eventuality that I would need to perform the same code for columns, I naively adapted your code and came up with something non-functional - I am completely unversed in creating stata programs

Code:

capture program drop _2colmat
program define _2colmat, rclass

if c(c)==1 & k[1,1]==0{
    mat cellcounts= cellcounts\ [0,0]
    mat colname cellcounts= c1 c2
}
if c(c)==1 & k[1,1]==1{ 
    mat cellcounts= [0,0]\cellcounts
    mat colname cellcounts= c1 c2
}
end

this did not work - may I ask for your help in troubleshooting this?

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 9945

30 Jun 2023, 18:46

See

Code:

help matrix operators

Here, you want to join columns. You define a column as the transpose of the row vector and use the tabulate option -matcol()- instead of -matrow()-. You can set a condition which selects whether you need to add a column or a row. Do you know this beforehand?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2)
1 1 0 1
1 1 0 1
1 1 0 0
1 1 0 1
1 1 0 0
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 0
1 1 0 1
1 1 0 1
1 1 0 1
1 1 0 0
1 0 1 0
1 1 0 1
1 1 1 0
1 1 0 1
1 1 0 0
1 0 0 0
1 1 0 1
end


capture program drop _2colmat
program define _2colmat, rclass

if r(c)==1 & k[1,1]==0{
    mat cellcounts= cellcounts,([0,0])'
    mat colname cellcounts= c1 c2
}
if r(c)==1 & k[1,1]==1{ 
    mat cellcounts= ([0,0])',cellcounts
    mat colname cellcounts= c1 c2
}
end

tab TrendB_D0_D56_1_1_2 TrendB_D0_D56_1_1_1 , matcell(cellcounts) matcol(k)
_2colmat
mat list cellcounts

tab TrendB_D56_D84_2_2_2 TrendB_D56_D84_1_1_1,matcell(cellcounts) matcol(k)
_2colmat
mat l cellcounts

Res.:

Code:

. tab TrendB_D0_D56_1_1_2 TrendB_D0_D56_1_1_1 , matcell(cellcounts) matcol(k)

           | TrendB_D0_
TrendB_D0_ | D56_1_1_1
 D56_1_1_2 |         1 |     Total
-----------+-----------+----------
         0 |         2 |         2 
         1 |        18 |        18 
-----------+-----------+----------
     Total |        20 |        20 

. 
. _2colmat

. 
. mat list cellcounts

cellcounts[2,2]
    c1  c2
r1   0   2
r2   0  18

. 
. 
. 
. tab TrendB_D56_D84_2_2_2 TrendB_D56_D84_1_1_1,matcell(cellcounts) matcol(k)

TrendB_D56 | TrendB_D56_D84_1_1_1
_D84_2_2_2 |         0          1 |     Total
-----------+----------------------+----------
         0 |         6          4 |        10 
         1 |        10          0 |        10 
-----------+----------------------+----------
     Total |        16          4 |        20 

. 
. _2colmat

. 
. mat l cellcounts

cellcounts[2,2]
    c1  c2
r1   6   4
r2  10   0

.

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 9945
#8

30 Jun 2023, 18:56

On the names of rows and columns, r1 and c1 correspond to the level 0 and r2 and c2 correspond to the level 1. I do not follow the question. The order in tabulate defines the row variable and the column variable. Perhaps use the data example in #3 to illustrate what you want.
Comment

Robert Shaw

Join Date: Nov 2021
Posts: 37

01 Jul 2023, 16:25

Dear Andrew,
I was being unclear earlier. Thanks for this code. However, in the case of this loop, I am agnostic as to what may be missing and not missing. There are some cases where there is a first row missing, or a second row missing, or a first column missing or a second column missing. Additionally there are cases which are missing both a column and a row (ie only one of the four quadrants is populated).
I have manufactured a short bit of data below to demonstrate this:

Code:

clear
input float (A B C D E)
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
end

Below are all the potential tabulation outputs that may result. Is there a way to account for all of these in advance?

Code:

. tab A B

           |     B
         A |         0 |     Total
-----------+-----------+----------
         0 |        12 |        12 
-----------+-----------+----------
     Total |        12 |        12 

. tab A C

           |     C
         A |         1 |     Total
-----------+-----------+----------
         0 |        12 |        12 
-----------+-----------+----------
     Total |        12 |        12 

. tab C A

           |     A
         C |         0 |     Total
-----------+-----------+----------
         1 |        12 |        12 
-----------+-----------+----------
     Total |        12 |        12 

. tab C D

           |     D
         C |         1 |     Total
-----------+-----------+----------
         1 |        12 |        12 
-----------+-----------+----------
     Total |        12 |        12 

. tab A E

           |           E
         A |         0          1 |     Total
-----------+----------------------+----------
         0 |         6          6 |        12 
-----------+----------------------+----------
     Total |         6          6 |        12 

. tab C E

           |           E
         C |         0          1 |     Total
-----------+----------------------+----------
         1 |         6          6 |        12 
-----------+----------------------+----------
     Total |         6          6 |        12 

. tab E A

           |     A
         E |         0 |     Total
-----------+-----------+----------
         0 |         6 |         6 
         1 |         6 |         6 
-----------+-----------+----------
     Total |        12 |        12 

. tab E C

           |     C
         E |         1 |     Total
-----------+-----------+----------
         0 |         6 |         6 
         1 |         6 |         6 
-----------+-----------+----------
     Total |        12 |        12 

. tab E E

           |           E
         E |         0          1 |     Total
-----------+----------------------+----------
         0 |         6          0 |         6 
         1 |         0          6 |         6 
-----------+----------------------+----------
     Total |         6          6 |        12

with regards to my comment regarding row and column names, I was trying to make the code generic to rename the matrix rows and columns as per the levels of the cell counts matrix after it had been acted on by the program you had written. I got myself tied into knots, but then realised, I could just hardcode the 0 and 1 as I know this will be the desired output in this case:

Code:

matrix rownames cellcounts = 0 1
matrix colnames cellcounts = 0 1

using the data in 3 to demonstrate what I was trying to achieve:

Code:

. tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)

TrendB_D0_ |  TrendB_D0_D56_1_1_2
 D56_1_1_1 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2         18 |        20 
-----------+----------------------+----------
     Total |         2         18 |        20 

. _2rowmat

. mat list cellcounts

cellcounts[2,2]
    c1  c2
r1   0   0
r2   2  18

. levelsof(TrendB_D0_D56_1_1_1), local(a)
1

. levelsof(TrendB_D0_D56_1_1_2), local(b)
0 1

. matrix rownames cellcounts = `a'

. matrix colnames cellcounts = `b'

. mat list cellcounts

cellcounts[2,2]
    0   1
1   0   0
1   2  18

I was unable to name the first row as "0"

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 9945

#10

02 Jul 2023, 13:03

See

Code:

help naming conventions

As with Stata names, matrix column names cannot start with numbers. So add a leading letter or underscore.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 TrendB_D56_D84_1_1_1 TrendB_D56_D84_2_2_2)
1 1 0 1
1 1 0 1
1 1 0 0
1 1 0 1
1 1 0 0
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 0
1 1 0 1
1 1 0 1
1 1 0 1
1 1 0 0
1 0 1 0
1 1 0 1
1 1 1 0
1 1 0 1
1 1 0 0
1 0 0 0
1 1 0 1
end

capture program drop _2rowmat
program define _2rowmat, rclass

if r(r)==1 & k[1,1]==0{
    mat cellcounts= cellcounts\ [0,0]
}
if r(r)==1 & k[1,1]==1{ 
    mat cellcounts= [0,0]\cellcounts
}
end

tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)
_2rowmat
mat rownames cellcounts= _0 _1
mat colnames cellcounts=_0 _1
mat list cellcounts

Res.:

Code:

. 
. tab TrendB_D0_D56_1_1_1 TrendB_D0_D56_1_1_2 , matcell(cellcounts) matrow(k)

TrendB_D0_ |  TrendB_D0_D56_1_1_2
 D56_1_1_1 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2         18 |        20 
-----------+----------------------+----------
     Total |         2         18 |        20 

. 
. _2rowmat

. 
. mat rownames cellcounts= _0 _1

. 
. mat colnames cellcounts=_0 _1

. 
. mat list cellcounts

cellcounts[2,2]
    _0  _1
_0   0   0
_1   2  18

.

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 9945
#11

02 Jul 2023, 13:20

Below are all the potential tabulation outputs that may result. Is there a way to account for all of these in advance?

The singletons need more work. I will write something that captures all scenarios when I get the time.
Comment
Robert Shaw

Join Date: Nov 2021

Posts: 37
#12

02 Jul 2023, 14:00

Much obliged Andrew. I very much appreciate the time you are giving to help me. I will try reading up on stata programming in the interim fully to understand the above code you have written.
Kind regards
Robert Shaw
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 9945

#13

03 Jul 2023, 01:49

Code:

clear
input float (A B C D E)
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
0 0 1 1 0
0 0 1 1 1
end

capture program drop mat2x2
program define mat2x2, rclass
if r(r)==1 & r(c)==1 & c[1,1]==0 & r[1,1]==0{
    mat cellcounts= cellcounts, 0\ [0,0]
}
if r(r)==1 & r(c)==1 & c[1,1]==1 & r[1,1]==0{
    mat cellcounts=  0, cellcounts\ [0,0]
}
if r(r)==1 & r(c)==1 & c[1,1]==1 & r[1,1]==1{
    mat cellcounts= [0,0]\ 0, cellcounts
}
if r(r)==1 & r(c)==1 & c[1,1]==0 & r[1,1]==1{
    mat cellcounts= [0,0]\cellcounts, 0
} 
if r(r)==1 & r(c)==2 & r[1,1]==0{
    mat cellcounts= cellcounts\ [0,0]
}
if r(r)==1 & r(c)==2 & r[1,1]==1{
    mat cellcounts= [0,0]\cellcounts
}
if r(r)==2 & r(c)==1 & c[1,1]==0{
    mat cellcounts= cellcounts,([0,0])'
}
if r(r)==2 & r(c)==1 & c[1,1]==1{
    mat cellcounts= ([0,0])',cellcounts
}
mat colnames cellcounts= _0 _1
mat rownames cellcounts= _0 _1
mat list cellcounts
end

tab A B, matcell(cellcounts) matcol(c) matrow(r)
mat2x2


tab A C, matcell(cellcounts) matcol(c) matrow(r)
mat2x2


tab C A, matcell(cellcounts) matcol(c) matrow(r)
mat2x2


tab C D, matcell(cellcounts) matcol(c) matrow(r)
mat2x2


tab A E, matcell(cellcounts) matcol(c) matrow(r)
mat2x2


tab C E, matcell(cellcounts) matcol(c) matrow(r)
mat2x2


tab E A, matcell(cellcounts) matcol(c) matrow(r)
mat2x2


tab E C, matcell(cellcounts) matcol(c) matrow(r)
mat2x2


tab E E, matcell(cellcounts) matcol(c) matrow(r)
mat2x2

Res.:

Code:

. tab A B, matcell(cellcounts) matcol(c) matrow(r)

           |     B
         A |         0 |     Total
-----------+-----------+----------
         0 |        12 |        12 
-----------+-----------+----------
     Total |        12 |        12 

. 
. mat2x2

symmetric cellcounts[2,2]
    _0  _1
_0  12
_1   0   0

. 
. 
. 
. 
. 
. tab A C, matcell(cellcounts) matcol(c) matrow(r)

           |     C
         A |         1 |     Total
-----------+-----------+----------
         0 |        12 |        12 
-----------+-----------+----------
     Total |        12 |        12 

. 
. mat2x2

cellcounts[2,2]
    _0  _1
_0   0  12
_1   0   0

. 
. 
. 
. 
. 
. tab C A, matcell(cellcounts) matcol(c) matrow(r)

           |     A
         C |         0 |     Total
-----------+-----------+----------
         1 |        12 |        12 
-----------+-----------+----------
     Total |        12 |        12 

. 
. mat2x2

cellcounts[2,2]
    _0  _1
_0   0   0
_1  12   0

. 
. 
. 
. 
. 
. tab C D, matcell(cellcounts) matcol(c) matrow(r)

           |     D
         C |         1 |     Total
-----------+-----------+----------
         1 |        12 |        12 
-----------+-----------+----------
     Total |        12 |        12 

. 
. mat2x2

symmetric cellcounts[2,2]
    _0  _1
_0   0
_1   0  12

. 
. 
. 
. 
. 
. tab A E, matcell(cellcounts) matcol(c) matrow(r)

           |           E
         A |         0          1 |     Total
-----------+----------------------+----------
         0 |         6          6 |        12 
-----------+----------------------+----------
     Total |         6          6 |        12 

. 
. mat2x2

cellcounts[2,2]
    _0  _1
_0   6   6
_1   0   0

. 
. 
. 
. 
. 
. tab C E, matcell(cellcounts) matcol(c) matrow(r)

           |           E
         C |         0          1 |     Total
-----------+----------------------+----------
         1 |         6          6 |        12 
-----------+----------------------+----------
     Total |         6          6 |        12 

. 
. mat2x2

cellcounts[2,2]
    _0  _1
_0   0   0
_1   6   6

. 
. 
. 
. 
. 
. tab E A, matcell(cellcounts) matcol(c) matrow(r)

           |     A
         E |         0 |     Total
-----------+-----------+----------
         0 |         6 |         6 
         1 |         6 |         6 
-----------+-----------+----------
     Total |        12 |        12 

. 
. mat2x2

cellcounts[2,2]
    _0  _1
_0   6   0
_1   6   0

. 
. 
. 
. 
. 
. tab E C, matcell(cellcounts) matcol(c) matrow(r)

           |     C
         E |         1 |     Total
-----------+-----------+----------
         0 |         6 |         6 
         1 |         6 |         6 
-----------+-----------+----------
     Total |        12 |        12 

. 
. mat2x2

cellcounts[2,2]
    _0  _1
_0   0   6
_1   0   6

. 
. 
. 
. 
. 
. tab E E, matcell(cellcounts) matcol(c) matrow(r)

           |           E
         E |         0          1 |     Total
-----------+----------------------+----------
         0 |         6          0 |         6 
         1 |         0          6 |         6 
-----------+----------------------+----------
     Total |         6          6 |        12 

. 
. mat2x2

symmetric cellcounts[2,2]
    _0  _1
_0   6
_1   0   6

.

Comment

Robert Shaw

Join Date: Nov 2021

Posts: 37
#14

03 Jul 2023, 03:37

Dear Andrew,
many thanks for this. I am away from my computer for a few days, but reading through the code, it is instructive to me as well as solving my problem. Many thanks. I will post the finished code and results covering all the aspects you’ve helped me with over the last few days for the reference of other statalisters.
kind regards and thanks again
Robert Shaw
Comment

Robert Shaw

Join Date: Nov 2021
Posts: 37

#15

15 Jul 2023, 16:11

Dear Andrew,
Your code worked excellently. Thank you. I have, below, prepared a summary of my problem, your help and my comments on it, should it be of help to others on statalist in the future. Many thanks again.

Background: I am running a loop that cross-tabulates different combinations binary variables, which should result in a number of 2x2 tables. I want to use these tabulations to work out correlation coefficients. Since the data is binary I am using Kramer's V as my correlation coefficient, which, handily, in this case, gives the same numeric value as if I were to use Pearson's correlation coefficient, thus allowing me to use pwcorr to produce a correlation matrix (rather than cross tabulating a whole series of binary variables as originally planned) which I can turn into a heatmap using the command heatplot, overlaying the correlation coefficient and p-value). See helpful advice by Andrew Musau here: https://www.statalist.org/forums/for...about-heatplot

There is, however, a risk that a high correlation coefficient can be driven by most of the results falling into one cell of the 2x2 matrix. The only way I can think of to give confidence to the result is also to show the 2x2 frequency table for each cell in a separate, but parallel 'frequencies' correlation matrix heatplot, as it will be easier to read with the 2x2 matrices in the same orientation as the 'results' heatplot.

This step was complicated by the fact that some of the 2x2 cross-tabulations are 'incomplete'. By incomplete, I mean that there are some cases where the first row is missing (as the horizontal cross-tabulation variable does not contain any '0's'), or a second row missing (as the horizontal cross-tabulation variable does not contain any '1's'), or a first column missing (same for the vertical variable) or a second column missing. Additionally there are cases which are missing both a column and a row (ie only one of the four quadrants is populated). In this case, Stata very sensibly provides a 1x2 matrix (or 1x1 matrix) of frequency results - after all, why should Stata know that it should add an (essentially arbitrary) line of zeros labelled with "0" or "1"? In order to demonstrate the data clearly, it would be helpful to have the 2x2 matrices 'completed' with lines of columns of zeros as appropriate.

Working backwards, Andrew kindly took the time to make a small program that manipulates the output matrices from tabulate to create 'completed' matrices:

Code:

capture program drop mat2x2
program define mat2x2, rclass
if r(r)==1 & r(c)==1 & c[1,1]==0 & r[1,1]==0 {
    mat cellcounts = cellcounts, 0 \ [0,0]
}
if r(r)==1 & r(c)==1 & c[1,1]==1 & r[1,1]==0 {
    mat cellcounts = 0, cellcounts \ [0,0]
}
if r(r)==1 & r(c)==1 & c[1,1]==1 & r[1,1]==1 {
    mat cellcounts = [0,0] \ 0, cellcounts
}
if r(r)==1 & r(c)==1 & c[1,1]==0 & r[1,1]==1 {
    mat cellcounts = [0,0] \ cellcounts, 0
}
if r(r)==1 & r(c)==2 & r[1,1]==0 {
    mat cellcounts = cellcounts \ [0,0]
}
if r(r)==1 & r(c)==2 & r[1,1]==1 {
    mat cellcounts = [0,0] \ cellcounts
}
if r(r)==2 & r(c)==1 & c[1,1]==0 {
    mat cellcounts = cellcounts, ([0,0])'
}
if r(r)==2 & r(c)==1 & c[1,1]==1 {
    mat cellcounts = ([0,0])', cellcounts
}
mat colnames cellcounts= 0 1
mat rownames cellcounts= 0 1
mat list cellcounts
end

r(r) is the returned result for the number of rows, r(c) is the returned result for the number of columns, the r[1,1] and c[1,1] refer to the rownames and colnames that have been prospectively specified in the tabulate command options ie is the value of the row/column label equal to 0 or 1. The matcell options outputs a separate returned matrix result (in this case called cellcounts) which is what I am trying to access:

Code:

tab `a' `b' matcell(cellcounts) matrow(r) matcol(c)

In order to get these 2x2 matrices into the heatmap format, the output cell counts matrix is then subsetted by line, and put into the cell as text, with a line break to make things clear. Below please find the code for producing both the 'results' heatmap and the 'frequencies' heatmap:

Code:

local a `" "TrendB_D0_D56" "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182" "'
foreach var1 of local a {
    pwcorr `var1'_1_2_2 `var1'_1_2_1 `var1'_1_1_2 `var1'_1_1_1 `var1'_3_2_2 `var1'_3_2_1 `var1'_3_1_2 `var1'_3_1_1, sig
    matrix C = r(C)
    matrix sig = r(sig)

    // apply heatplot without displaying a graph, just to collect information;
    // option generate stores the information (coordinates etc) as variables
    heatplot C, values(label(sig)) lower nodraw generate nodiagonal
    // significance stars
    gen str sig = string(_Mlab, "%5.4f")
    replace sig = string(_Mlab, "%5.4f")+"*" if _Mlab<0.0004464 // 0.05 / ((7+6+5+4+3+2+1) * 4 timepoint intervals) - this is a manual Bonferroni correction
    // second call to heatplot useing addplot to print the marker labels
    local b=subinstr("`var1'", "TrendB_", "", .)
    heatplot C, colors(blue white red, ipolate(110)) cuts(-1.02(0.02)1.02) lower nodiagonal aspectratio(1) xlabel(,labsize(small) angle(45)) ylabel(,labsize(small)) legend(off) ///
    addplot(scatter _Y _X if _Y!=_X, msym(i) mlab(_Z) mlabf(%9.2f) mlabpos(0) mlabc(black) ///
    || scatter _Y _X if _Y==_X, msym(i) mlab(_Z) mlabf(%9.0f) mlabpos(0) mlabc(black) ///
    || scatter _Y _X if _Y!=_X, msym(i) mlab(sig) mlabpos(6) mlabgap(2) mlabc(black) mlabsize(vsmall))
    drop _Z-sig
}

and the companion frequencies heatplot:

Code:

local a `" "TrendB_D0_D56" "TrendB_D56_D84" "TrendB_D84_D112" "TrendB_D112_D182" "'
foreach d of local a {
    local counter=1
    qui corr `d'_1_2_2 `d'_1_2_1 `d'_1_1_2 `d'_1_1_1 `d'_3_2_2 `d'_3_2_1 `d'_3_1_2 `d'_3_1_1
    matrix W = r(C)
    local b `" "_1_2_2" "_1_2_1" "_1_1_2" "_1_1_1" "_3_2_2" "_3_2_1" "_3_1_2" "_3_1_1" "'
    local i `" "_1_2_2" "_1_2_1" "_1_1_2" "_1_1_1" "_3_2_2" "_3_2_1" "_3_1_2" "_3_1_1" "'
    foreach e of local b {
        local i: list i - e //to make an all against all cross tabulation without repeats
        foreach f of local i {
            tab `d'`e' `d'`f' if "`e'"!="`f'" , matcell(cellcounts) matrow(r) matcol(c) //ensure tabulation options correctly specified
            mat2x2 //instigate previously defined program
            local g=cellcounts[1,1]
            local h=cellcounts[2,1]
            local a`counter'= "`g'" + "  " + "`h'" //make local for first line of string insertion
            local g=cellcounts[1,2]
            local h=cellcounts[2,2]
            local b`counter'= "`g'" + "  " + "`h'" //make local for second line of string insertion
            local counter=`counter'+1
        }
    }
    local offset 0
    heatplot W,  colors(blue white red, ipolate(110)) cuts(-1.02(0.02)1.02) lower nodiagonal aspectratio(1) xlabel(,labsize(small) angle(45)) ylabel(,labsize(small))legend(off) ///
    text(`=2-`offset'' 1 "`a1'" " " "`b1'", size(2)) ///
    text(`=3-`offset'' 1 "`a2'" " " "`b2'", size(2)) ///
    text(`=4-`offset'' 1 "`a3'" " " "`b3'", size(2)) ///
    text(`=5-`offset'' 1 "`a4'" " " "`b4'", size(2)) ///
    text(`=6-`offset'' 1 "`a5'" " " "`b5'", size(2)) ///
    text(`=7-`offset'' 1 "`a6'" " " "`b6'", size(2)) ///
    text(`=8-`offset'' 1 "`a7'" " " "`b7'", size(2)) ///
    text(`=3-`offset'' 2 "`a8'" " " "`b8'", size(2)) ///
    text(`=4-`offset'' 2 "`a9'" " " "`b9'", size(2)) ///
    text(`=5-`offset'' 2 "`a10'" " " "`b10'", size(2)) ///
    text(`=6-`offset'' 2 "`a11'" " " "`b11'", size(2)) ///
    text(`=7-`offset'' 2 "`a12'" " " "`b12'", size(2)) ///
    text(`=8-`offset'' 2 "`a13'" " " "`b13'", size(2)) ///
    text(`=4-`offset'' 3 "`a14'" " " "`b14'", size(2)) ///
    text(`=5-`offset'' 3 "`a15'" " " "`b15'", size(2)) ///
    text(`=6-`offset'' 3 "`a16'" " " "`b16'", size(2)) ///
    text(`=7-`offset'' 3 "`a17'" " " "`b17'", size(2)) ///
    text(`=8-`offset'' 3 "`a18'" " " "`b18'", size(2)) ///
    text(`=5-`offset'' 4 "`a19'" " " "`b19'", size(2)) ///
    text(`=6-`offset'' 4 "`a20'" " " "`b20'", size(2)) ///
    text(`=7-`offset'' 4 "`a21'" " " "`b21'", size(2)) ///
    text(`=8-`offset'' 4 "`a22'" " " "`b22'", size(2)) ///
    text(`=6-`offset'' 5 "`a23'" " " "`b23'", size(2)) ///
    text(`=7-`offset'' 5 "`a24'" " " "`b24'", size(2)) ///
    text(`=8-`offset'' 5 "`a25'" " " "`b25'", size(2)) ///
    text(`=7-`offset'' 6 "`a26'" " " "`b26'", size(2)) ///
    text(`=8-`offset'' 6 "`a27'" " " "`b27'", size(2)) ///
    text(`=8-`offset'' 7 "`a28'" " " "`b28'", size(2)) ///
    plotregion(color(white) margin(zero)) xlab(, noticks) ylab(, noticks)
}

Announcement