  • appropriate correlation for binary and county variables

    Hello,

    I have two variables whose correlation I want to test. One is binary; the other is a county variable that goes up to three. My understanding is that, generally speaking, you use Pearson's for continuous variables (code: pwcorr), Spearman's for ordinal variables (code: spearman), and tetrachoric for binary variables. Based on this understanding, I was going to go with spearman, but I'm not sure it is appropriate when one of my variables is binary and the other only goes up to three. What is the best way forward? Thanks!
    Last edited by Nora Romeo; 08 Mar 2022, 10:57.

  • #2
    Nora:
    I would go with Spearman's or Kendall's correlations (see -help spearman-).
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      What's the county variable coded 1 2 3? If it's just an arbitrary code, it's not clear to me that any correlation makes sense. You might just look at the mean of your binary variable by each county.
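
      As a rough sketch of what "the mean of your binary variable by each county" could look like, here is one way to do it on the auto data. The variables are only stand-ins: foreign plays the binary variable and rep78 plays the grouping variable (it has five levels rather than three).

      ```stata
      sysuse auto, clear
      * Mean of the 0/1 variable (i.e., the share coded 1) within each group
      tabstat foreign, by(rep78) statistics(mean n)
      * The same information as row percentages in a two-way table
      tabulate rep78 foreign, row
      ```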



      • #4
        Just to check, by "county", did you mean (ordered) "categorical"? If it is, Spearman is acceptable.
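
        If the variable is indeed ordered, a minimal illustration on the auto data might look like the following. Here foreign and rep78 are stand-ins for the binary and ordered variables, not the poster's actual data.

        ```stata
        sysuse auto, clear
        * Spearman's rho, with the number of observations and the p-value
        spearman foreign rep78, stats(rho obs p)
        ```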

        Tetrachoric and polychoric correlations are a specialist thing. People interested in psychometrics/measurement would use these, but I don't think they are necessary for a general audience. If you are doing factor analysis, then I think they are necessary, but again that's a specialist application.
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



        • #5
          Given that Spearman's rho is a Pearson's r computed on ranks, and considering that with ordered categorical data as in the current situation ranks don't seem very meaningful, I'd recommend some categorical measure of association here. If there is no sense of a causal ordering between the two variables, I'd use Kendall's tau-b or gamma (-help ktau- or -tabulate-); if there is a meaningful causal ordering, I'd use Somers' D (-ssc describe somersd-), which is a too-little recognized but very useful measure. All of these measures are functions of the number of concordant and discordant pairs and their sense does not rest on assigning ranks. (In this context, I'd say that the reference in -help ktau- to "Kendall's rank correlation coefficients" is unfortunate.) To wit:
          Code:
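          sysuse auto, clear
          ktau foreign rep78                // Kendall's tau-a and tau-b
          tabulate foreign rep78, gamma     // Goodman and Kruskal's gamma
          * somersd is user-written; install it once with: ssc install somersd
          somersd foreign rep78 // presumes predictor first, response variable(s) following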
          One afterthought: I suppose the county variable might *not* be ordered (although "up to three" implies it is). In that case, there are nominal measures of association to be considered.
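
          For the unordered case, one possible sketch, again with foreign and rep78 standing in for the actual variables, is a chi-squared test together with Cramér's V. This is just one of several nominal measures that could be used.

          ```stata
          sysuse auto, clear
          * Pearson chi-squared test of independence plus Cramér's V
          tabulate foreign rep78, chi2 V
          ```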
