Inverse probability weighted logistic regression

Konrad Heller

Join Date: Jun 2022

Posts: 9
#1

Inverse probability weighted logistic regression

01 Sep 2022, 11:43

Dear Statalisters,

I have a cross-sectional survey dataset (N ~ 500,000) with approximately 50% non-response. I have demographic data on almost all individuals in the original ~ 500,000 sample, whether they responded to the survey or not. I've run a logistic regression to examine which characteristics (e.g., ethnicity) are associated with non-response:

logistic response i.ethnicity

As an example, this gives me an odds ratio of 0.83 (i.e., ethnicity x was less likely than ethnicity y to respond to the survey)

I want to incorporate inverse probability weights to generate a new set of adjusted odds ratios that take into account the increased rate of non-response in certain demographics, following methods described by Hoefler et al., 2005 (https://pubmed.ncbi.nlm.nih.gov/15834780/).

I've calculated weights as follows:

logistic response i.ethnicity
predict ipw

replace ipw = 1 - ipw if response == 1
replace ipw = 1 / (1-ipw) if response == 0

However, when I run the adjusted regression:

logistic response i.ethnicity [pw = ipw]

The output returns an odds ratio of 1 for all ethnic groups, which means I've almost certainly misunderstood how to apply the weights correctly. The weights themselves look OK to me (i.e., they are larger for observations from ethnic groups that are less preponderant, and vice versa). What's going on here? I'd expect the OR to be similar to the unweighted estimate.

Any advice would be greatly appreciated.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

01 Sep 2022, 12:04

You calculated the weights using the wrong outcome variable. What you need here are weights that reflect the inverse probability of having a non-missing response. So:

Code:

gen byte responded = !missing(response)
logistic responded i.ethnicity
predict ipw
replace ipw = 1/ipw if responded == 1
replace ipw = 1 / (1-ipw) if responded == 0
logistic response i.ethnicity [pw = ipw]

Note that the weights are calculated from the new variable responded, not the original response variable. Note also that when responded == 1, the weight should be 1/ipw, not 1-ipw.
Konrad Heller

Join Date: Jun 2022

Posts: 9
#3

01 Sep 2022, 12:52

Thanks a lot Clyde for your help, but I'm not sure I understand.

The variable response (my outcome variable) is a binary indicator (1 if they responded, 0 otherwise). It doesn't contain any missing values. I'm sorry that I wasn't clear about this initially. Thus, when I run the code you suggested:

Code:

gen byte responded = !missing(response)

The new responded variable consequently has the value 1 for all observations in the dataset, which means that it can't run the logistic regression:

Code:

logistic responded i.ethnicity

As the outcome doesn't vary. Or am I missing something?

Note also that when responded == 1, the weight should be 1/ipw, not 1-ipw

Thanks, that was a typo on my part. It's correct in my Stata environment, which I access through a VPN that prevents me from copying over lines of code.

Is there more information that I can provide to add clarity?

Many thanks again.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

01 Sep 2022, 12:55

Then what did you mean when you said

I have demographic data on almost all individuals in the original ~ 500,000 sample, whether they responded to the survey or not.

For the ones who did not respond to the survey, the response variable would be missing, right?
Konrad Heller

Join Date: Jun 2022

Posts: 9
#5

01 Sep 2022, 14:08

I've probably complicated things by trying to simplify them! Basically, I have a dataset A, which I've merged with a dataset B (N ~ 500,000). Around 50% of observations from B match records in A. I'm trying to generate a list of predictors for successful matching. Since individuals with certain characteristics might be underrepresented in the matched portion of dataset B, I thought I could correct for that using weights.

I've generated an equivalent example using simulated data. Perhaps that will clarify where I'm going wrong with this:

Code:

clear
set obs 1000
set seed 11
gen response = runiformint(0,1)
set seed 24
gen sex = runiformint(0,1)
* Massaging the data here a bit so that there's an effect of sex
replace response = 0 in 150/300
replace sex = 0 in 150/300
label def sexlab 0 "Male" 1 "Female"
label val sex sexlab
logistic response i.sex
predict ipw
replace ipw = 1/ipw if response == 1
replace ipw = 1 / (1-ipw) if response == 0
logistic response i.sex [pw = ipw]

This is essentially equivalent to what my data looks like, although it contains some missing data for some of the demographic variables that I want to use as predictors. Perhaps this will clarify where I'm going wrong and where I'm misunderstanding things.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

01 Sep 2022, 16:55

Originally posted by Konrad Heller

...

Code:

logistic response i.ethnicity
predict ipw
replace ipw = 1 - ipw if response == 1
replace ipw = 1 / (1-ipw) if response == 0
logistic response i.ethnicity [pw = ipw]

The output returns an odds ratio of 1 for all ethnic groups, which means I've almost certainly misunderstood how to apply the weights correctly. The weights themselves look OK to me (i.e., they are larger for observations from ethnic groups that are less preponderant, and vice versa). What's going on here? I'd expect the OR to be similar to the unweighted estimate.

Any advice would be greatly appreciated.

Let's take a step back. We might choose IPW to correct for non-response bias when we fit a model on a different outcome.

Here, you are creating an IPW model using race. You're then applying the race-based weights to a model for the probability of response - which is the same thing you based the weights on. I don't know how to explain it exactly, but intuitively, the weights are based on race, and they're cancelling out the effect of race in the regression.
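One way to make the cancellation concrete is the short derivation below. It is a sketch, assuming that the weight model and the outcome model are the same logistic regression on a single categorical predictor, and that the weights are 1/p for responders and 1/(1-p) for non-responders. The symbols are introduced here for illustration only: n_g is the number of observations in group g, r_g the number of responders among them, and \hat p_g the fitted response probability.

```latex
\hat p_g = \frac{r_g}{n_g}
\qquad\Longrightarrow\qquad
\underbrace{r_g \cdot \frac{1}{\hat p_g}}_{\text{weighted responders}} = n_g ,
\qquad
\underbrace{(n_g - r_g) \cdot \frac{1}{1 - \hat p_g}}_{\text{weighted non-responders}} = n_g

\text{weighted odds in group } g = \frac{n_g}{n_g} = 1
\qquad\Longrightarrow\qquad
\mathrm{OR}_{g \text{ vs. } g'} = \frac{1}{1} = 1
```

Under these assumptions, every group ends up with equal weighted counts of responders and non-responders, so the weighted odds are 1 in each group and every odds ratio is 1, whatever the unweighted estimates were.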

If you apply those IPWs to a regression for a different outcome, you're fine. It's just that you are applying IPWs for non-response based on race to a regression on the probability of response. This is not an interesting question. If you were interested in the probability of response by race, you should have stopped at the first regression, or built on it.
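As a rough sketch of that more conventional use, the simulated example below fits the response model on everyone, builds weights of 1/p for responders only, and applies them to a regression for a different outcome that is observed only among responders. The variables exposure and outcome, the variable name p_resp, and all of the probabilities are hypothetical and chosen purely for illustration; nothing here comes from the original data.

```stata
* Simulated data: all variable names and probabilities are hypothetical
clear
set seed 12345
set obs 5000
gen byte ethnicity = runiformint(1,3)
gen byte exposure  = runiform() < 0.5

* Response probability differs by ethnicity
gen byte response = runiform() < cond(ethnicity == 1, 0.6, cond(ethnicity == 2, 0.5, 0.4))

* The outcome of interest is observed only for responders
gen byte outcome = runiform() < 0.3 if response == 1

* Step 1: model the probability of response on the full sample
logistic response i.ethnicity
predict double p_resp

* Step 2: responders are weighted by the inverse of their response probability
gen double ipw = 1/p_resp if response == 1

* Step 3: weighted model for the different outcome (non-responders drop out)
logistic outcome i.exposure [pw = ipw]
```

Here the weights and the final model answer different questions, so the weights do not cancel the coefficients of interest.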

I have a feeling that if you pick any Stata dataset and you try to recreate this type of example, you'll get similar results.
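For instance, a minimal sketch of such a recreation, assuming the nlsw88 dataset that ships with Stata (an arbitrary choice, with union and race standing in for response and ethnicity, and p_union a name made up here):

```stata
* Sketch: recreate the cancellation with a shipped dataset (arbitrary choice)
sysuse nlsw88, clear

* Unweighted model: union membership by race
logistic union i.race

* Weights from that same model: 1/p if union == 1, 1/(1 - p) if union == 0
predict double p_union if e(sample)
gen double ipw = cond(union == 1, 1/p_union, 1/(1 - p_union)) if e(sample)

* Weighted refit of the same model
logistic union i.race [pw = ipw]
```

Because the weights come from the same model that is being refit, the weighted odds ratios for race should all come out as 1, up to numerical precision.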

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.