Dear Statalisters,
I have a cross-sectional survey dataset (N ~ 500,000) with approximately 50% non-response. I have demographic data on almost all individuals in the original ~ 500,000 sample, whether they responded to the survey or not. I've run a logistic regression to examine which characteristics (e.g., ethnicity) are associated with non-response:
logistic response i.ethnicity
As an example, this gives me an odds ratio of 0.83 (i.e., ethnicity x was less likely than ethnicity y to respond to the survey).
I want to incorporate inverse probability weights to generate a new set of adjusted odds ratios that take into account the higher rate of non-response in certain demographic groups, following the methods described by Hoefler et al., 2005 (https://pubmed.ncbi.nlm.nih.gov/15834780/).
I've calculated weights as follows:
logistic response i.ethnicity
predict ipw
replace ipw = 1 - ipw if response == 1
replace ipw = 1 / (1-ipw) if response == 0
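Written out, and assuming `predict` returns the fitted probability of response (its default after `logistic`), which I'll call $\hat{p}_i$ (my own notation), the two `replace` lines assign the following weight to each observation:

```latex
w_i =
\begin{cases}
1 - \hat{p}_i, & \text{if } \texttt{response}_i = 1 \\[4pt]
\dfrac{1}{1 - \hat{p}_i}, & \text{if } \texttt{response}_i = 0
\end{cases}
```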
However, when I run the adjusted regression:
logistic response i.ethnicity [pw = ipw]
The output returns an odds ratio of 1 for all ethnic groups, which means I've almost certainly misunderstood how to apply the weights correctly. The weights themselves look OK to me (i.e., they are larger for observations from ethnic groups that are less preponderant, and vice versa). What's going on here? I'd expect the OR to be similar to the unweighted estimate.
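To try to pin this down, here is a rough check of the arithmetic. It assumes the model is saturated in ethnicity, so that everyone in group $g$ gets the same fitted probability $p_g$, equal to that group's observed response rate; $n_g$ (group size) and $p_g$ are my own notation. With my weights (responders get $1 - p_g$, non-responders get $1/(1 - p_g)$), the weighted odds of response in group $g$ are the summed weights of responders over the summed weights of non-responders:

```latex
\begin{align}
\text{weighted odds}_g
  &= \frac{n_g \, p_g \, (1 - p_g)}{n_g \, (1 - p_g) \cdot \dfrac{1}{1 - p_g}}
   = p_g (1 - p_g) \\[6pt]
\text{OR}^{\,w}_{x \text{ vs } y}
  &= \frac{p_x (1 - p_x)}{p_y (1 - p_y)}
\end{align}
```

Since $p(1 - p)$ is nearly flat around $p = 0.5$, this ratio is close to 1 whenever the group response rates are all near 50%. For illustration, if $p_y = 0.5$ (an assumed value, not taken from my data) and the unweighted OR is 0.83, then $p_x \approx 0.454$ and the ratio is about 0.99, which would display as roughly 1. If that reasoning is right, the weights are cancelling the very association I am trying to estimate, but I am not sure whether that is expected behaviour or a mistake in how I built them.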
Any advice would be greatly appreciated.