Measuring overlap of two dichotomous variables

Dirk Enzmann

Join Date: Apr 2014

Posts: 516
#1


17 Jan 2025, 13:43

Not specifically a Stata issue, but there are so many experts in this friendly Forum:

I am looking for a measure of overlap of two dichotomous variables that is symmetric (no causality claims can be made) and that is independent of the marginal distribution, i.e. the prevalence, of either variable.

The context: How do two variables measuring the prevalence of victimization overlap and how do groups (countries) differ in this respect? Ideally I would like to investigate (and describe) what factors are associated with these differences.

I could describe the percentage of respondents who were victims of both kinds of victimization and compare that percentage across groups. However, this percentage is affected by the prevalence of victimization in these groups. Another measure could be a simple correlation coefficient (Pearson), but it can be shown that the size of the correlation depends on the base rate. This is not the case with association measures based on the odds ratio, such as Yule’s Y. But I wonder whether there are better alternatives.
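To make the base-rate point concrete, here is a minimal sketch in Python (not Stata, purely for illustration). The cell counts are invented, and the labelling a = neither, b = second only, c = first only, d = both is an assumption of this example. Both tables have the same odds ratio but very different prevalences.

```python
import math

def phi(a, b, c, d):
    """Phi (Pearson r for two 0/1 variables) from 2x2 cell counts."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def yule_y(a, b, c, d):
    """Yule's Y, which depends on the table only through the odds ratio."""
    root_or = math.sqrt((a * d) / (b * c))  # requires b > 0 and c > 0
    return (root_or - 1) / (root_or + 1)

# Hypothetical counts: a = neither, b = second only, c = first only, d = both.
balanced = (40, 10, 10, 40)    # prevalence 50% for both, odds ratio 16
rare = (4000, 100, 100, 40)    # prevalence about 3% for both, odds ratio 16

for name, table in [("balanced", balanced), ("rare", rare)]:
    print(name, round(phi(*table), 3), round(yule_y(*table), 3))
```

With these made-up counts Y is 0.6 in both tables, while phi drops from 0.6 to about 0.26 when both outcomes become rare.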

Bonett & Price suggest a generalized Yule coefficient (Bonett, D. G., & Price, R. M. (2007). Statistical inference for generalized Yule coefficients in 2 × 2 contingency tables. Sociological Methods & Research, 35(3), 429–446. https://doi.org/10.1177/0049124106292358). Although I could write Stata syntax that calculates this coefficient and replicates their examples, I was not able to calculate (or reproduce) the standard error they report, partly because I do not fully understand their procedure. Perhaps someone out there has already done this or can assist?

Matthijs Warrens (2008) published an article discussing various measures of association for dichotomous variables (Warrens, M. J. (2008). On association coefficients for 2×2 tables and properties that do not depend on the marginal distributions. Psychometrika, 73(4), 777–789. https://doi.org/10.1007/s11336-008-9070-3). Based on this, Loevinger’s H might be another alternative. It is one minus the ratio of observed to expected Guttman errors, i.e. the count in cell 0,1 or 1,0 (depending on the ranking of difficulty, that is, prevalence, of the items) relative to the count expected under independence. I am not sure whether this would be a better alternative than Yule’s Y or the generalized Yule coefficient.
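For a single 2x2 table, Loevinger's H can be sketched as follows (Python rather than Stata, for illustration only). The function name, the cell labelling (a = neither, b = second only, c = first only, d = both) and the counts are assumptions of this example; the Guttman error cell is taken to be "positive on the rarer item, negative on the more common one".

```python
def loevinger_h(a, b, c, d):
    """Loevinger's H for a 2x2 table: 1 - observed / expected Guttman errors."""
    n = a + b + c + d
    n_first = c + d    # positives on the first variable
    n_second = b + d   # positives on the second variable
    if n_first <= n_second:
        # First variable is rarer: an error is first = 1, second = 0 (cell c).
        observed = c
        expected = n_first * (n - n_second) / n
    else:
        # Second variable is rarer: an error is first = 0, second = 1 (cell b).
        observed = b
        expected = n_second * (n - n_first) / n
    return 1 - observed / expected  # undefined if expected is 0

# Hypothetical counts: everyone positive on the rarer item (20%) is also
# positive on the more common one (50%), so there are no Guttman errors.
nested = (50, 30, 0, 20)
print(loevinger_h(*nested))
```

In this invented table H reaches 1 even though the two prevalences differ, because the rarer outcome is completely nested in the more common one.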

Any ideas or suggestions?
Mike Lacy

Join Date: Apr 2014

Posts: 2400
#2

17 Jan 2025, 20:12

What about treating this as an *agreement* problem, and using some kind of margin-adjusted agreement measure?
Dirk Enzmann

Join Date: Apr 2014

Posts: 516
#3

18 Jan 2025, 06:02

Thank you for the suggestion! Do you have a specific measure in mind? Perhaps one candidate is the tetrachoric correlation? If you think about measures of inter-rater reliability: The problem here is that I have dichotomous variables and that the "ratings" are not independent.
Mike Lacy

Join Date: Apr 2014

Posts: 2400
#4

18 Jan 2025, 07:59

I was thinking of (Cohen's) kappa, to start with, which compares the level of observed agreement between the two diagnostic "judges" with the amount of agreement that would be expected by chance based on the marginal distributions. That might or might not be the kind of adjustment for marginals that's relevant for your problem, and my knowledge here is not deep enough to go further in that direction. The best Stata source here is Daniel Klein's -kappaetc- package (-search kappaetc-) because it offers a lot of alternatives and gives excellent documentation of the relevant literature. Although tetrachoric correlation would presumably be a relevant measure, I think the logic of measures like kappa would be more desirable.
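As a rough illustration of the chance correction described above, here is a minimal sketch of Cohen's kappa for a 2x2 table, written in Python rather than with -kappaetc-. The function name, the cell labelling (a = both negative, b and c = the two disagreement cells, d = both positive) and the counts are assumptions of this example.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table of counts (a and d are the agreement cells)."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the margins: both negative plus both positive.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Two hypothetical tables with the same odds ratio (16) but different prevalence.
balanced = (40, 10, 10, 40)
rare = (4000, 100, 100, 40)
print(round(cohen_kappa(*balanced), 3), round(cohen_kappa(*rare), 3))
```

For these invented counts kappa is 0.6 in the balanced table and about 0.26 in the rare-outcome table, so this particular correction still moves with prevalence when the odds ratio is held fixed.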
Dirk Enzmann

Join Date: Apr 2014

Posts: 516
#5

15 Feb 2025, 09:15

In the end I decided to use Loevinger's H (see also here) as a measure of overlap because it is "more directed to positive association than to negative association" (Warrens, 2008, p. 787) -- note that I am interested in comparing the percentage in cell d of a 2x2 table across groups. Otherwise I could also have used the tetrachoric correlation. In this context, Loevinger (1948, p. 524) states: "The tetrachoric coefficient will be unity in the case of items in a test homogeneous as to content but not as to difficulty."

References:
Loevinger, J. A. (1948). The technique of homogeneous tests compared with some aspects of scale analysis and factor analysis. Psychological Bulletin, 45, 507–530.

Warrens, M. J. (2008). On association coefficients for 2×2 tables and properties that do not depend on the marginal distributions. Psychometrika, 73(4), 777–789. https://doi.org/10.1007/s11336-008-9070-3

Last edited by Dirk Enzmann; 15 Feb 2025, 09:18.
