Not specifically a Stata issue, but there are so many experts in this friendly forum:
I am looking for a measure of overlap between two dichotomous variables that is symmetric (no causal claims can be made) and independent of the marginal distributions, that is, of the prevalence of either variable.
The context: how do two variables measuring the prevalence of victimization overlap, and how do groups (countries) differ in this respect? Ideally I would like to investigate (and describe) which factors are associated with these differences.
I could report the percentage of respondents who were victims of both kinds of victimization and compare that percentage across groups. However, this percentage is affected by the prevalence of victimization in each group. Another option is a simple (Pearson) correlation coefficient, i.e. the phi coefficient for two binary variables, but its size can be shown to depend on the base rates. This is not the case for association measures based on the odds ratio, such as Yule's Y. But I wonder whether there are better alternatives.
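To illustrate the base-rate point, here is a small sketch (in Python rather than Stata, with hypothetical counts and an assumed cell layout a, b, c, d). Multiplying one row of a 2×2 table by a constant changes the prevalence of that variable but leaves the odds ratio untouched, so phi moves while Yule's Q and Y do not.

```python
import math

def table_measures(a, b, c, d):
    """Association measures for a 2x2 table of counts.

    Assumed cell layout (x = first victimization type, y = second):
        a: x=1, y=1   b: x=1, y=0
        c: x=0, y=1   d: x=0, y=0
    All cells must be positive so the odds ratio is finite.
    """
    n = a + b + c + d
    both = a / n  # share of respondents who are victims of both kinds
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    yule_q = (odds_ratio - 1) / (odds_ratio + 1)
    root = math.sqrt(odds_ratio)
    yule_y = (root - 1) / (root + 1)
    return {"both": both, "phi": phi, "OR": odds_ratio, "Q": yule_q, "Y": yule_y}

# Hypothetical counts: the second table multiplies the x=1 row by 4,
# raising the prevalence of x while keeping the odds ratio fixed.
t1 = table_measures(20, 30, 40, 410)
t2 = table_measures(80, 120, 40, 410)
for key in ("both", "phi", "OR", "Q", "Y"):
    print(f"{key:>4}: {t1[key]:.3f}  {t2[key]:.3f}")
```

With these made-up counts, phi rises from about 0.29 to about 0.37 while OR, Q, and Y stay the same. Swapping b and c (i.e. exchanging the roles of x and y) leaves all of these measures unchanged, so they are symmetric in the sense asked for.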
Bonett & Price suggest a generalized Yule coefficient (Bonett, D. G., & Price, R. M. (2007). Statistical inference for generalized Yule coefficients in 2 × 2 contingency tables. Sociological Methods & Research, 35(3), 429–446. https://doi.org/10.1177/0049124106292358). Although I was able to write Stata syntax that calculates this coefficient and replicates their examples, I could not reproduce the standard errors they report, partly because I do not fully understand their procedure. Has anyone done this already, or could someone help?
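On the standard-error question, one possible route is sketched below. It is not necessarily Bonett & Price's exact procedure. It assumes the coefficient has the form Y* = (OR^c − 1)/(OR^c + 1), which equals tanh(c·ln(OR)/2), so that c = 1 gives Yule's Q and c = 1/2 gives Yule's Y. The standard error then follows from the delta method applied to Woolf's standard error of ln(OR). The exponent c is left as an argument because the rule the paper uses to choose it is not reproduced here, and the optional 0.5 added to each cell is an assumed small-sample adjustment. The same delta-method steps could be carried over to Stata.

```python
import math

def generalized_yule(a, b, c, d, exponent, add=0.5):
    """Generalized Yule coefficient Y* = (OR**exponent - 1) / (OR**exponent + 1).

    Cell layout as before: a = (1,1), b = (1,0), c = (0,1), d = (0,0).
    `add` is added to every cell (assumed correction; use add=0 for raw counts).
    Returns (Y*, delta-method SE, 95% CI transformed from the log(OR) scale).
    """
    a, b, c, d = (x + add for x in (a, b, c, d))
    log_or = math.log(a * d / (b * c))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's SE of log(OR)

    # Y* = tanh(exponent * log(OR) / 2), hence dY*/dlog(OR) = exponent / 2 * (1 - Y*^2)
    y_star = math.tanh(exponent * log_or / 2)
    se = exponent / 2 * (1 - y_star ** 2) * se_log_or

    # Interval built for log(OR) and transformed back, so it stays inside (-1, 1)
    z = 1.959963984540054  # 97.5% quantile of the standard normal
    lo = math.tanh(exponent * (log_or - z * se_log_or) / 2)
    hi = math.tanh(exponent * (log_or + z * se_log_or) / 2)
    return y_star, se, (lo, hi)
```

For example, `generalized_yule(20, 30, 40, 410, exponent=0.5, add=0)` returns Yule's Y for that hypothetical table together with its delta-method SE and interval. Transforming the interval from the log-odds-ratio scale, rather than using Y* ± 1.96·SE, keeps the bounds within (−1, 1).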
Matthijs Warrens (2008) published an article discussing various measures of association for dichotomous variables (Warrens, M. J. (2008). On association coefficients for 2×2 tables and properties that do not depend on the marginal distributions. Psychometrika, 73(4), 777–789. https://doi.org/10.1007/s11336-008-9070-3). Based on this article, Loevinger's H might be another alternative. It is one minus the ratio of observed to expected Guttman errors in cell (0,1) or (1,0), depending on the ranking of the items by difficulty (= prevalence). I am not sure whether it would be a better alternative than Yule's Y or the generalized Yule coefficient.
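A minimal sketch of Loevinger's H for two items, using the same assumed cell layout (a = both 1, b = x only, c = y only, d = neither). The Guttman error is taken to be "rarer item = 1, more prevalent item = 0", and its expected count is computed under independence. Algebraically, this equals the covariance divided by the largest covariance the two marginals allow, i.e. (ad − bc)/min((a+b)(b+d), (a+c)(c+d)). The function name is hypothetical, and it is intended for positively associated items.

```python
def loevinger_h(a, b, c, d):
    """Loevinger's H for two dichotomous items (hypothetical helper).

    a: x=1,y=1   b: x=1,y=0   c: x=0,y=1   d: x=0,y=0
    H = 1 - observed / expected Guttman errors, where the expected count
    assumes independence given the observed marginals.
    """
    n = a + b + c + d
    p_x = (a + b) / n  # prevalence of x
    p_y = (a + c) / n  # prevalence of y
    if p_x >= p_y:
        # y is the rarer ("harder") item: error = y present without x
        observed = c
        expected = n * (1 - p_x) * p_y
    else:
        # x is the rarer item: error = x present without y
        observed = b
        expected = n * p_x * (1 - p_y)
    return 1 - observed / expected
```

For the hypothetical table (20, 30, 40, 410), `loevinger_h` returns roughly 0.318, and it returns exactly 1 when the rarer kind of victimization never occurs without the more common one.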
Any ideas or suggestions?