Logistic Regression Pairwise Deletion

Jenny Guadamuz

Join Date: Nov 2015

Posts: 18
#1

06 Apr 2016, 23:30

Hello Statalist Experts,

I was wondering if it is possible to run a logistic regression with pairwise deletion of missing observations instead of casewise deletion. I am running a logistic regression that has a lot of missing information in several of the independent variables (10%-20% among 6 key variables). The dependent variable also has a lot of missingness: 20% among individuals from one region and less than 1% in the second region. I know I can impute these variables; however, I am unsure if I should impute some variables and not others. For example, I am not comfortable imputing the dependent variable.

Background: I am using two health surveys from two different countries to examine the use of medicines among individuals who are indicated to use said medicine. All of the analyses need to account for the survey weights.

Thanks for the advice!
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#2

07 Apr 2016, 02:21

Pairwise deletion is technically possible in linear regression, but it will lead to biased results. So that is not a good solution. You might be able to get a similarly biased estimate if you have a fully saturated model, but a) a fully saturated model is not that interesting, and b) biased estimates are not that interesting. So that is an even worse solution. Instead, you should really look into multiple imputation. Imputing the dependent variable is not a problem (actually not imputing the dependent variable is a problem). A short readable text on this is: Paul D. Allison (2002) Missing Data. Thousand Oaks: Sage.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#3

07 Apr 2016, 08:39

[quote]Pairwise deletion is technically possible in linear regression, [/quote]
Maarten Buis: I'm curious what you mean by that. One could calculate a pairwise cross-product matrix and then attempt to solve the "normal equations" using that. But pairwise cross-product matrices can fail to be positive definite, so I don't know where you would go from there. How is it technically possible, except in lucky cases?
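
A small made-up data set illustrates the problem. This is only a sketch: the variables x1, x2, x3 and their values are invented for illustration, and it assumes that pwcorr leaves the pairwise correlation matrix in r(C). Each pair of variables is observed on a different set of three cases, so the pairwise correlations are 1, 1, and -1, a combination that no complete data set could produce.

```stata
* Toy data: every pair of variables is jointly observed on different rows
clear
input x1 x2 x3
1 1 .
2 2 .
3 3 .
1 . 1
2 . 2
3 . 3
. 1 3
. 2 2
. 3 1
end

* Pairwise correlations: r(x1,x2) = 1, r(x1,x3) = 1, r(x2,x3) = -1
pwcorr x1 x2 x3
matrix C = r(C)

* Eigenvalues of the pairwise matrix; a negative one means C is not
* positive (semi)definite
matrix symeigen V lambda = C
matrix list lambda
```

For this matrix the eigenvalues work out to 2, 2, and -1, so the determinant is -4 and the normal equations built from it do not correspond to any least-squares problem.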
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#4

07 Apr 2016, 08:43

That is exactly what I meant. I haven't tried it myself; I only read about it in Allison (2002) (reference in #2).

Richard Williams

Join Date: Apr 2014

Posts: 4945
#5

07 Apr 2016, 11:38

[quote]One could calculate a pairwise cross-product matrix and then attempt to solve the "normal equations" using that. But pairwise cross-product matrices can fail to be positive definite, so I don't know where you would go from there. How is it technically possible, except in lucky cases?[/quote]

I don't know that you have to be that lucky, e.g. I suspect that you can often/usually get a positive definite matrix with pairwise deletion. But actually, you may be more unlucky if that does happen, because you may get nonsensical results but it won't be obvious to you that that is the case.

Anyway, if, in Stata, somebody really wanted to use pairwise deletion, I think you could create the correlation matrix with pwcorr and then use corr2data to create a data set with the specified correlations. I don't know how you decide what N is though -- maybe use the smallest N for any of the correlations?
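
The pwcorr and corr2data idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommendation: it uses the auto data shipped with Stata (rep78 has 5 missing values out of 74), assumes pwcorr leaves the pairwise matrix in r(C), and takes N to be the smallest pairwise N (69 here), which is only one of several arguable choices. It applies to linear regression only; corr2data generates continuous variables, so it cannot reproduce a binary outcome for a logistic model.

```stata
sysuse auto, clear

* Pairwise-deletion correlations; each entry uses all cases
* available for that pair
pwcorr price mpg rep78
matrix C = r(C)

* Check the eigenvalues first: corr2data needs a positive
* semidefinite matrix
matrix symeigen V lambda = C
matrix list lambda

* Replace the data with 69 artificial observations whose correlations
* match C (69 = number of nonmissing rep78, the smallest pairwise N)
corr2data price mpg rep78, n(69) corr(C) clear

* Variables have mean 0 and sd 1 by default, so these are
* standardized coefficients
regress price mpg rep78
```

The standard errors depend entirely on the N passed to n(), which is one reason the results are hard to interpret.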

Anyway, I agree that pairwise deletion is generally a bad idea and I would recommend multiple imputation instead. Or maybe even just listwise deletion, depending on how big the hit is. Paul Allison says "If listwise deletion still leaves you with a large sample, you might reasonably prefer it over maximum likelihood or multiple imputation. At the least, you should think carefully about the relative advantages and disadvantages of these methods, and not dismiss listwise deletion out of hand." See

http://statisticalhorizons.com/listw...n-its-not-evil

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Jenny Guadamuz

Join Date: Nov 2015

Posts: 18
#6

07 Apr 2016, 12:53

Thank you Maarten, Clyde, and Richard for your advice.

You've sold me on not using pairwise deletion. However, I cannot use listwise deletion because it drops my sample from approximately 4000 observations (about 2000 from each country) to 1800 complete cases: 1200 from one country and only 600 from the second.
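
Counts like these can be tabulated along the following lines. This is a minimal sketch with hypothetical variable names (usemed for the outcome, x1-x6 for the six key predictors, country for the survey), none of which come from the actual data.

```stata
* Hypothetical names: usemed (outcome), x1-x6 (key predictors), country

* Missing-value counts per variable, then the joint missingness patterns
misstable summarize usemed x1 x2 x3 x4 x5 x6
misstable patterns usemed x1 x2 x3 x4 x5 x6, frequency

* Flag complete cases and count them by country
egen byte nmiss = rowmiss(usemed x1 x2 x3 x4 x5 x6)
generate byte complete = (nmiss == 0)
tabulate country complete, row
```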

Do any of you have an opinion on adding a category to indicate that the variable is missing in these observations?

In the meantime, I am going to start reading Missing Data to determine how to carry out multiple imputation correctly.

Last edited by Jenny Guadamuz; 07 Apr 2016, 13:05.
Jenny Guadamuz

Join Date: Nov 2015

Posts: 18
#7

07 Apr 2016, 13:09

[quote]Do any of you have an opinion on adding a category to indicate that the variable is missing in these observations?[/quote]

After starting to read Missing Data, I understand why this is probably not appropriate: it produces biased coefficients.

I will update you all after I try to run multiple imputation on the dependent and independent variables while accounting for the survey weights.
Richard Williams

Join Date: Apr 2014

Posts: 4945
#8

07 Apr 2016, 13:26

Multiple imputation on the dependent variable tends to gain you little or nothing.

Allison's book is excellent. Other possible references include

https://www.ssc.wisc.edu/sscc/pubs/stata_mi_intro.htm

http://www.statalist.org/forums/foru...vey-data/page2

http://www3.nd.edu/~rwilliam/xsoc73994/MD02.pdf

If you want to use mi and svy together, the correct sequence is

mi estimate: svy: estimation_command ...
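
A fuller workflow might look something like the sketch below. Every variable name is hypothetical (usemed for the binary outcome, bmi and income for incomplete predictors, age, female, and country for complete ones, psu, stratum, and wt for the survey design), and the imputation model, the number of imputations, and the seed are placeholders rather than recommendations. It imputes the outcome along with the predictors, as planned in #7.

```stata
* Sketch only; all variable names are hypothetical
mi set wide
mi register imputed usemed bmi income
mi register regular age female country

* Declare the survey design for the mi data
mi svyset psu [pweight = wt], strata(stratum)

* Chained equations: logit for the binary variable,
* regress for the continuous ones
mi impute chained (logit) usemed (regress) bmi income ///
    = age i.female i.country, add(20) rseed(2016)

* mi estimate comes first, then svy, then the estimation command;
* the or option asks mi estimate to report odds ratios
mi estimate, or: svy: logistic usemed bmi income age i.female i.country
```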
