  • Measuring overlap of two dichotomous variables

    Not specifically a Stata issue, but there are so many experts in this friendly Forum:

    I am looking for a measure of overlap of two dichotomous variables that is symmetric (no causality claims can be made) and that is independent from the marginal distribution or the prevalence of either variable.

    The context: How do two variables measuring the prevalence of victimization overlap and how do groups (countries) differ in this respect? Ideally I would like to investigate (and describe) what factors are associated with these differences.

    I could describe the percentage of respondents who were victims of both kinds of victimization and compare that percentage across groups. However, this percentage is affected by the prevalence of victimization in these groups. Another measure could be a simple correlation coefficient (Pearson), but, as can be shown, the size of the correlation depends on the base rate. This is not the case with association measures based on the odds ratio, such as Yule’s Y. But I wonder whether there are better alternatives.
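    To make the base-rate point concrete, here is a small illustrative sketch in Python (standard library only). The cell counts are hypothetical, and the function names are chosen just for this example. The second table is the first one with its top row scaled down by a factor of ten, so the odds ratio is unchanged while the prevalence of the first kind of victimization drops.

```python
import math

def phi(a, b, c, d):
    """Pearson (phi) correlation for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def yule_y(a, b, c, d):
    """Yule's Y, which depends on the table only through the odds ratio."""
    root_or = math.sqrt((a * d) / (b * c))
    return (root_or - 1) / (root_or + 1)

# Hypothetical counts: a = both, b = first kind only, c = second kind only, d = neither.
group_a = (40, 10, 10, 40)   # odds ratio 16, both prevalences 50%
group_b = (4, 1, 10, 40)     # odds ratio 16, first kind much rarer

for name, table in (("A", group_a), ("B", group_b)):
    print(name, round(phi(*table), 3), round(yule_y(*table), 3))
# Group A: phi = 0.6, Y = 0.6
# Group B: phi is about 0.40, Y = 0.6
```

    Both functions assume that no cell is zero. Phi drops when one victimization type becomes rarer, while Y stays the same.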

    Bonett & Price suggest a generalized Yule coefficient (Bonett, D. G., & Price, R. M. (2007). Statistical inference for generalized Yule coefficients in 2 × 2 contingency tables. Sociological Methods & Research, 35(3), 429–446. https://doi.org/10.1177/0049124106292358). Although I could write Stata syntax that calculates this coefficient and replicates their examples, I was not able to calculate (or reproduce) the standard error they report, partly because I do not fully understand their procedure. Perhaps there is someone out there who has done this already or can assist?
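    As a possible starting point, the sketch below (Python, standard library only) computes coefficients of the form (OR^k − 1)/(OR^k + 1) for a fixed exponent k, together with a textbook delta-method standard error built from the usual large-sample variance of the log odds ratio, 1/a + 1/b + 1/c + 1/d. This is an assumption-laden illustration and not the Bonett & Price procedure: it treats k as a known constant, requires all cells to be nonzero, and is not expected to reproduce the standard errors in their paper. The function name and the example counts are hypothetical.

```python
import math

def yule_family(a, b, c, d, exponent):
    """Return ((OR**k - 1) / (OR**k + 1), delta-method SE) for k = exponent.

    exponent = 1 gives Yule's Q, exponent = 0.5 gives Yule's Y.
    Assumes all four cells are nonzero and that the exponent is fixed in advance.
    """
    log_or = math.log((a * d) / (b * c))
    # (OR**k - 1) / (OR**k + 1) is the same as tanh(k * log(OR) / 2).
    coef = math.tanh(exponent * log_or / 2)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Delta method: d coef / d log(OR) = (k / 2) * (1 - coef**2).
    se = (exponent / 2) * (1 - coef ** 2) * se_log_or
    return coef, se

# Hypothetical table with odds ratio 16.
y, se_y = yule_family(40, 10, 10, 40, exponent=0.5)
q, se_q = yule_family(40, 10, 10, 40, exponent=1.0)
print(round(y, 3), round(se_y, 3))   # 0.6 0.08
print(round(q, 3), round(se_q, 3))
```

    If the exponent is itself estimated from the data (for example, from the marginal proportions), this standard error ignores that extra source of variability.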

    Matthijs Warrens (2008) published an article discussing various measures of association for dichotomous variables (Warrens, M. J. (2008). On association coefficients for 2×2 tables and properties that do not depend on the marginal distributions. Psychometrika, 73(4), 777–789. https://doi.org/10.1007/s11336-008-9070-3). Based on this, Loevinger’s H might be another alternative. It equals one minus the ratio of observed to expected Guttman errors, counted in cell (0,1) or (1,0) depending on the ranking of the items by difficulty (= prevalence). I am not sure whether this would be a better alternative than Yule’s Y or the generalized Yule coefficient.
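    For two dichotomous items, Loevinger's H can be sketched as follows (Python, standard library only). The function name, the argument order, and the counts are hypothetical, and expected errors are computed under independence of the two items.

```python
def loevinger_h(n11, n10, n01, n00):
    """Loevinger's H for two dichotomous items.

    nXY is the number of respondents with item 1 = X and item 2 = Y.
    Assumes neither item has prevalence 0 or 1.
    """
    n = n11 + n10 + n01 + n00
    p1 = (n11 + n10) / n   # prevalence of item 1
    p2 = (n11 + n01) / n   # prevalence of item 2
    # A Guttman error is scoring 1 on the rarer item but 0 on the more common one.
    if p1 >= p2:
        observed = n01
        expected = n * (1 - p1) * p2
    else:
        observed = n10
        expected = n * p1 * (1 - p2)
    return 1 - observed / expected

print(round(loevinger_h(40, 10, 10, 40), 3))   # 0.6
print(round(loevinger_h(4, 1, 10, 40), 3))     # 0.732
```

    Both example tables have the same odds ratio (16), yet H differs between them, so H and Yule's Y do not rank tables in the same way.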

    Any ideas or suggestions?


  • #2
    What about treating this as an *agreement* problem, and using some kind of margin-adjusted agreement measure?



    • #3
      Thank you for the suggestion! Do you have a specific measure in mind? Perhaps one candidate is the tetrachoric correlation? As for measures of inter-rater reliability, the problem here is that I have dichotomous variables and that the "ratings" are not independent.



      • #4
        I was thinking of (Cohen's) kappa, to start with, which compares the level of observed agreement between the two diagnostic "judges" with the amount of agreement that would be expected by chance based on the marginal distributions. That might or might not be the kind of adjustment for marginals that's relevant for your problem, and my knowledge here is not deep enough to go further in that direction. The best Stata source here is Daniel Klein's -kappaetc- package (-search kappaetc-) because it offers a lot of alternatives and gives excellent documentation of the relevant literature. Although tetrachoric correlation would presumably be a relevant measure, I think the logic of measures like kappa would be more desirable.
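        To show what the chance correction in kappa does, here is a minimal sketch in plain Python (not -kappaetc-; the function name and the counts are hypothetical). It computes Cohen's kappa for a 2x2 table from the observed agreement and the agreement expected from the margins.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for the 2x2 table [[a, b], [c, d]].

    a and d are the agreeing cells (both yes, both no).
    """
    n = a + b + c + d
    p_obs = (a + d) / n
    # Chance agreement from the row and column margins.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

print(round(cohen_kappa(40, 10, 10, 40), 3))   # 0.6
print(round(cohen_kappa(4, 1, 10, 40), 3))     # 0.331
```

        The two hypothetical tables share the same odds ratio (16) and the same observed agreement (80%), but kappa changes with the margins, which may or may not be the behaviour wanted here.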
