Dear Statalisters,
May I ask for some advice, please?
Background: I am running a loop that tabulates different subsets of two binary variables (A and B) each with levels called (for the sake of argument) "2" and "3" which should result in a number of 2x2 tables. I am exporting the frequency tables via putexcel.
Some of the tabulations contain no data points for one of the crosstabulation rows, e.g. variable A contains only 2's in that subset of data.
Stata then very sensibly provides me with a 1x2 frequency table of results - why should Stata know that it should add an (essentially arbitrary) line of zeros labelled with "3"?
Question 1) How do I ask Stata to present me with a 2x2 matrix (with two adjacent horizontal/vertical cells containing zeros) whenever this happens in my loop?
I gather that in other languages one can define the data type as binary and give labels to the levels, which tells that language to present both levels even if one level does not appear. In Stata, I have been unable to figure out whether going down this route is possible/sensible (as far as I can make out, all the data types are different numeric types, without one specifically being binary).
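To make the target concrete, here is a minimal sketch of the output I am after, built by counting each cell explicitly rather than relying on tabulate. It rests on assumptions: A and B are numeric and coded 2 and 3, insubset is a hypothetical 0/1 indicator for the current subset, the file name tables.xlsx and the cell A1 are placeholders, and the putexcel syntax is the Stata 14 or later form.

```stata
* Sketch only: A, B, insubset and tables.xlsx are placeholder names.
* Start from a 2x2 matrix of zeros so absent levels keep a zero row/column.
matrix T = J(2, 2, 0)
local i = 0
foreach a of numlist 2 3 {
    local ++i
    local j = 0
    foreach b of numlist 2 3 {
        local ++j
        * count leaves the number of matching observations in r(N)
        quietly count if A == `a' & B == `b' & insubset == 1
        matrix T[`i', `j'] = r(N)
    }
}
matrix rownames T = A_2 A_3
matrix colnames T = B_2 B_3
matrix list T

* Export the fixed-shape table (Stata 14+ putexcel syntax assumed).
putexcel set "tables.xlsx", replace
putexcel A1 = matrix(T), names
```

With this layout the exported matrix always has two rows and two columns, whichever levels are present in the subset.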
Any help solving this problem would be hugely appreciated.
Follow up questions:
I am using these tabulations to work out correlation coefficients. As the data are binary, I am using Cramér's V as my correlation coefficient. Handily, for a 2x2 table this gives the same numeric value (in absolute value) as Pearson's correlation coefficient, which allows me to use pwcorr to produce a correlation matrix that I can turn into a heatmap with the command heatplot, overlaying the correlation coefficient and p-value.
Question 2a) Since Cramér's V is calculated from the Chi2 statistic, would the Chi2 p-value be valid to use as an indicator of the statistical significance of Cramér's V - am I correct in my thinking? The association is assessed by the Chi2 test, which produces a p-value, but I am choosing to transform the Chi2 statistic into Cramér's V to give a more accurate measure of the strength of correlation.
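For reference, the relationship I have in mind is the standard definition of Cramér's V from the uncorrected Pearson chi-squared statistic. The symbols are my own notation: n is the number of observations in the table, r and c are the numbers of rows and columns, and phi is the phi coefficient, which for two binary variables equals Pearson's correlation.

```latex
V = \sqrt{\frac{\chi^2}{n \,\min(r-1,\; c-1)}}
\qquad\text{so for a } 2\times 2 \text{ table}\qquad
V = \sqrt{\frac{\chi^2}{n}} = |\phi|
```

For a fixed n, V is therefore a monotone function of the Chi2 statistic, which is why I expect the two to share a p-value.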
Question 2b) There is a risk that a high correlation coefficient can be driven by most of the results falling into one cell of the 2x2 matrix. The only way I can think of to give confidence in the result is also to show the 2x2 frequency table for each cell in the correlation matrix heatplot. Rather than presenting a long list of 2x2 frequency matrices, it would be easier for the reader if these 2x2 frequency tables were presented in a layout similar to the correlation matrix that contains the correlation coefficients and p-values. I am at a complete loss as to how it might be possible to do this - any advice would be hugely appreciated.
Kind regards
Robert Shaw