
  • Correlation between two categorical variables

    I have two questions regarding the correlation between two categorical variables.

    1) If using Cramer's V, I know that you can interpret the strength and direction of the relationship. However, how do you interpret, say, a positive relationship between two categorical variables that are both yes/no or 1/0 variables?

    2) I just want to confirm: if the two categorical variables have two levels each (yes/no or 1/0), is Cramer's V appropriate, or should I be using a different test?

    Thank you!

  • #2
    There are many measures of association one could use for a 2x2 table. Here are some common ones:
    • Risk ratio
    • Odds ratio
    • Risk difference
    • Phi coefficient (i.e., Pearson r computed on two dichotomies)
    Which one(s) you choose will likely depend on the context, including the discipline and whether one variable is an outcome and the other explanatory (versus a simple association between variables with no such clear roles).
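
    For readers who want to see how the four measures listed above relate to the cell counts, here is a minimal sketch in Python (not Stata). The function name, the cell labels a, b, c, d and the counts are made up for illustration; rows are taken to be the explanatory variable and columns the outcome.

```python
from math import sqrt

def two_by_two_measures(a, b, c, d):
    """Association measures for a 2x2 table of counts (illustrative layout).

                 outcome=1  outcome=0
    exposed=1        a          b
    exposed=0        c          d
    """
    risk_exposed = a / (a + b)      # P(outcome=1 | exposed=1)
    risk_unexposed = c / (c + d)    # P(outcome=1 | exposed=0)
    return {
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        "risk_difference": risk_exposed - risk_unexposed,
        # Phi equals Pearson's r computed on the two 0/1 variables.
        "phi": (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d)),
    }

# Hypothetical counts, not data from this thread.
measures = two_by_two_measures(30, 70, 10, 90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

    The sketch gives point estimates only; it does not produce tests or confidence intervals, and it will fail with a division by zero if a margin or an off-diagonal cell is empty.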

    HTH.
    --
    Bruce Weaver
    Version: Stata/MP 18.5 (Windows)

    • #3
      Cramér's V is a measure of association corresponding to a chi-square test. If you want a test, use the chi-square test or Fisher's exact test. The orthodox position seems to be that Fisher's test is more focused on the specific problem, but I've seen push-back against that. A different kind of problem is that the chi-square test is always easily computable, but that's not necessarily true of Fisher's exact test.
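
      To make the link between the chi-square statistic and Cramér's V concrete, here is a minimal sketch in Python (not Stata) for an r x c table of counts. The function names and the counts are hypothetical. Because V is the square root of a scaled chi-square, it cannot be negative, so on its own it carries strength but not direction; in a 2x2 table it equals the absolute value of phi.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))

# Hypothetical 2x2 counts, not data from this thread.
table = [[30, 70],
         [10, 90]]
print(f"chi-square = {chi_square(table):.3f}")
print(f"Cramér's V = {cramers_v(table):.3f}")
```

      The same two functions accept a 2x4 table unchanged, since nothing in them assumes two columns.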

      If you have two binary variables, the sign of any relationship just depends on conventions about which state is coded 0 and which 1. There is a grey area between a convention being natural and it being familiar. If anything is even a smidgen towards being causal, it seems usual to code both binaries to yield positive association. So being a smoker and getting lung cancer would both be coded 1, and their opposites 0, but I guess associating with other code choices would at worst be thought awkward rather than wrong. And there are plenty of negative associations too.
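
      The point about coding conventions can be checked directly: swapping which state is coded 1 on one variable flips the sign of the correlation and leaves its magnitude unchanged. A minimal sketch in Python (not Stata), with invented data and hypothetical variable names:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation; on two 0/1 variables this is the phi coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Invented data: 100 smokers (30 with cancer), 100 non-smokers (10 with cancer).
smoker = [1] * 100 + [0] * 100
cancer = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90

# Same information, opposite coding convention for the first variable.
nonsmoker = [1 - s for s in smoker]

phi = pearson_r(smoker, cancer)
phi_recoded = pearson_r(nonsmoker, cancer)
print(f"smoker coded 1:     phi = {phi:+.3f}")
print(f"non-smoker coded 1: phi = {phi_recoded:+.3f}")
```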

      • #4
        I so appreciate all of your responses; this is very helpful! As a follow-up question, could I use Cramer's V if I have one variable that has two categories and another variable that has four categories? So a 2x4 table. If not, what test of association would be appropriate?

        • #5
          Again, as a test of association chi-square and Fisher's test could both be used. If the 4-category variable is ordered, there are more tests on offer.

          • #6
            Yes, I understand the chi-square and Fisher's test can be used. However, my student is wanting to follow up on that to assess the strength and direction of the relationship.

            • #7
              Here are some notes that may be helpful.
              --
              Bruce Weaver
              Version: Stata/MP 18.5 (Windows)

              • #8
                Seems to me that your resources include non-parametric methods books and introductory categorical data analysis books. You may have to shop around to find which is most congenial.

                • #9
                  The mention of "direction" here would imply that the 4-category variable is ordered, and 2 categories are always ordered, so if one of these variables is regarded as explanatory and the other as response, I would strongly recommend Somers' D, about which see -ssc describe somersd-. D is a measure of association for which a test and CI exist. If there is no such explanatory/response distinction, I'd use Goodman and Kruskal's gamma, -tabulate Y X, gamma-, which is the original statistic on which D was based. If both variables are nominal, and the explanatory/response distinction holds, I'd strongly recommend Goodman and Kruskal's tau, which is known (but not well) to be an explained variation measure based on Simpson's measure of nominal variation. There's no Stata program for G & K's tau, but it's only mildly tedious to calculate by hand, and the calculation can be aided by -entropyetc- (from SSC).
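
                  As a rough illustration of how gamma and Somers' D are built from concordant and discordant pairs, here is a minimal sketch in Python (not Stata, and not the algorithm used by -somersd- or -tabulate-). It assumes rows are the ordered explanatory variable X and columns the ordered response Y; the function names and the 2x4 counts are invented. It returns point estimates only, with no test or CI.

```python
def pair_counts(table):
    """Count pairs of observations that differ on X (the row variable).

    Returns (concordant, discordant, tied_on_y_only).
    """
    concordant = discordant = tied_on_y_only = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # second observation has higher X
                for m in range(n_cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:
                        concordant += pairs     # higher X goes with higher Y
                    elif m < j:
                        discordant += pairs     # higher X goes with lower Y
                    else:
                        tied_on_y_only += pairs
    return concordant, discordant, tied_on_y_only

def gk_gamma(table):
    """Goodman and Kruskal's gamma: ties are ignored entirely."""
    c, d, _ = pair_counts(table)
    return (c - d) / (c + d)

def somers_d_y_given_x(table):
    """Somers' D(Y|X): pairs tied on Y but not on X stay in the denominator."""
    c, d, t = pair_counts(table)
    return (c - d) / (c + d + t)

# Invented 2x4 counts: two groups (rows) by a four-level ordered response (columns).
table = [[20, 15, 10, 5],
         [5, 10, 15, 20]]
print(f"gamma       = {gk_gamma(table):.3f}")
print(f"Somers' D   = {somers_d_y_given_x(table):.3f}")
```

                  Both statistics share the numerator; D is never larger in magnitude than gamma because its denominator also counts the pairs tied on the response.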

                  The locus classicus is:
                  Goodman, L. A., and W. H. Kruskal. 1954. Measures of association for cross classifications. Journal of the American Statistical Association 49: 732–764.
                  Sociological statistics texts in the 1960s and 70s commonly treated the G & K statistics.
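
                  For G & K's tau, the hand calculation mentioned above can be sketched as follows in Python (not Stata). This is an illustration under stated assumptions, not a substitute for a vetted routine: rows are taken as the explanatory variable X, columns as the response Y, nominal variation is measured as 1 minus the sum of squared proportions, and the function names and counts are invented.

```python
def nominal_variation(counts):
    """Simpson-type variation: 1 - sum of squared category proportions."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gk_tau_y_given_x(table):
    """Goodman and Kruskal's tau for predicting Y (columns) from X (rows).

    tau = (V(Y) - E[V(Y | X)]) / V(Y): the proportional reduction in the
    nominal variation of Y once the row category is known.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    total_variation = nominal_variation(col_totals)
    # Average the within-row variation, weighting each row by its share of n.
    within_variation = sum(
        (sum(row) / n) * nominal_variation(row) for row in table
    )
    return (total_variation - within_variation) / total_variation

# Invented counts, not data from this thread.
table = [[30, 70],
         [10, 90]]
print(f"G & K tau (Y | X) = {gk_tau_y_given_x(table):.4f}")
```

                  The measure is asymmetric: transposing the table gives tau for predicting X from Y, which generally differs once either variable has more than two categories.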


                  Finally, I'd note that there is also an ordinal x nominal measure of association, similar in concept to G & K's tau, about which see the article cited in my -r2o- package, -ssc describe r2o-.

