  • Logistic Regression Pairwise Deletion

    Hello Statalist Experts,

    I was wondering whether it is possible to run a logistic regression with pairwise deletion of missing observations instead of casewise deletion. My logistic regression has a lot of missing information in several of the independent variables (10%-20% among 6 key variables). The dependent variable also has a lot of missingness: 20% among individuals from one region and less than 1% in the second region. I know I can impute these variables; however, I am unsure whether I should impute some variables and not others. For example, I am not comfortable imputing the dependent variable.

    Background: I am using two health surveys from two different countries to examine the use of medicines among individuals for whom those medicines are indicated. All the analyses need to account for the survey weights.

    Thanks for the advice!

  • #2
    Pairwise deletion is technically possible in linear regression, but it will lead to biased results. So that is not a good solution. You might be able to get a similarly biased estimate if you have a fully saturated model, but a) a fully saturated model is not that interesting, and b) biased estimates are not that interesting. So that is an even worse solution. Instead, you should really look into multiple imputation. Imputing the dependent variable is not a problem (actually not imputing the dependent variable is a problem). A short readable text on this is: Paul D. Allison (2002) Missing Data. Thousand Oaks: Sage.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      [quote]Pairwise deletion is technically possible in linear regression,[/quote]
      Maarten Buis: I'm curious what you mean by that. One could calculate a pairwise cross-product matrix and then attempt to solve the "normal equations" using that. But pairwise cross-product matrices can fail to be positive definite, so I don't know where you would go from there. How is it technically possible, except in lucky cases?

      • #4
        That is exactly what I meant. I haven't tried it myself; I only read about it in Allison (2002) (reference in #2).
        ---------------------------------
        Maarten L. Buis


        • #5
          [quote]One could calculate a pairwise cross-product matrix and then attempt to solve the "normal equations" using that. But pairwise cross-product matrices can fail to be positive definite, so I don't know where you would go from there. How is it technically possible, except in lucky cases?[/quote]
          I don't know that you have to be that lucky; I suspect that you can often, or even usually, get a positive definite matrix with pairwise deletion. But you may actually be unluckier when that happens, because you may get nonsensical results without it being obvious that they are nonsensical.

          Anyway, if, in Stata, somebody really wanted to use pairwise deletion, I think you could create the correlation matrix with pwcorr and then use corr2data to create a data set with the specified correlations. I don't know how you decide what N is though -- maybe use the smallest N for any of the correlations?
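
A rough sketch of how that pwcorr/corr2data route might look, for illustration only and not as a recommendation. It uses the auto data shipped with Stata, where only rep78 has missing values; the choice of variables, the eigenvalue check, and the rule of taking the smallest pairwise N are all assumptions made for this example. It only covers linear regression, and it is meant to be run as a do-file.

```stata
* Sketch only, not a recommendation: regression from pairwise correlations.
sysuse auto, clear                 // shipped example data; rep78 has missing values
local vars price mpg rep78         // illustrative variable choice (assumption)

pwcorr `vars'                      // pairwise-deleted correlations
matrix C = r(C)                    // correlation matrix left behind by pwcorr

* A pairwise matrix need not be positive definite; inspect the smallest eigenvalue.
matrix symeigen V lambda = C       // eigenvalues are returned largest first
display "smallest eigenvalue: " lambda[1, colsof(lambda)]

* One arbitrary choice of N (assumption): the smallest pairwise N over all pairs.
local n = .
foreach a of local vars {
    foreach b of local vars {
        quietly count if !missing(`a', `b')
        if r(N) < `n' local n = r(N)
    }
}

* Replace the data with artificial data that reproduce C, then regress.
corr2data `vars', n(`n') corr(C) clear
regress price mpg rep78            // variables are standardized (mean 0, sd 1)
```

Because corr2data builds standardized variables by default, the coefficients here are standardized ones, and the standard errors depend entirely on the N that was picked, which is exactly the arbitrary part.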

          Anyway, I agree that pairwise deletion is generally a bad idea and I would recommend multiple imputation instead. Or maybe even just listwise deletion, depending on how big the hit is. Paul Allison says "If listwise deletion still leaves you with a large sample, you might reasonably prefer it over maximum likelihood or multiple imputation. At the least, you should think carefully about the relative advantages and disadvantages of these methods, and not dismiss listwise deletion out of hand." See

          http://statisticalhorizons.com/listw...n-its-not-evil
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam

          • #6
            Thank you Maarten, Clyde, and Richard for your advice.

            You've sold me on not using pairwise deletion. However, I cannot use listwise deletion, because it drops my sample from approximately 4000 observations (about 2000 from each country) to 1800 observations: 1200 from one country and only 600 complete cases from the second.

            Do any of you have an opinion on adding a categorical to indicate that the variable is missing in these observations?

            In the meantime, I am going to start reading Missing Data to determine how to carry out multiple imputation correctly.
            Last edited by Jenny Guadamuz; 07 Apr 2016, 13:05.

            • #7
              [quote]Do any of you have an opinion on adding a categorical to indicate that the variable is missing in these observations?[/quote]
              After starting to read Missing Data I understand why this is probably not appropriate due to the biased coefficients that are produced.

              I will update you all after I try to run multiple imputation on the dependent and independent variables while accounting for the survey weights.

              • #8
                Multiple imputation on the dependent variable tends to gain you little or nothing.

                Allison's book is excellent. Other possible references include

                https://www.ssc.wisc.edu/sscc/pubs/stata_mi_intro.htm

                http://www.statalist.org/forums/foru...vey-data/page2

                http://www3.nd.edu/~rwilliam/xsoc73994/MD02.pdf

                If you want to use mi and svy together, the correct sequence is

                mi estimate: svy: estimation_command ...
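
To show where that line sits in a whole workflow, here is a minimal sketch on simulated data. Every variable name (uses_med, income, age, female, psu, stratum, pw), the missingness rates, the imputation model, and the choice of 5 imputations are made up for illustration. The imputation step here ignores the weights and design variables, which is a simplification and not something this thread settles. Run it as a do-file.

```stata
* Sketch only: simulated survey data with hypothetical variable names.
clear
set seed 2016
set obs 500
gen psu     = ceil(_n/25)               // 20 primary sampling units
gen stratum = ceil(psu/2)               // 10 strata with 2 PSUs each
gen pw      = 1 + runiform()            // made-up sampling weight
gen age     = 20 + floor(60*runiform())
gen byte female = runiform() < 0.5
gen income  = rnormal(10, 1)
gen byte uses_med = runiform() < invlogit(-2 + 0.03*age + 0.3*female)

* Punch holes in one predictor and in the outcome (rates are assumptions).
replace income   = . if runiform() < 0.15
replace uses_med = . if runiform() < 0.10

* Declare the mi structure and which variables are to be imputed.
mi set wide
mi register imputed uses_med income
mi register regular age female

* Declare the survey design through mi, so that svy: works under mi estimate.
mi svyset psu [pweight = pw], strata(stratum)

* Chained equations: logit for the binary outcome, regress for income.
mi impute chained (logit) uses_med (regress) income = age female, ///
    add(5) rseed(2016)

* The sequence from the post: mi estimate first, then svy, then the command.
mi estimate: svy: logit uses_med age female income
```

With real data, the imputation model would normally need more thought than this (more imputations, and a decision about how the design variables enter it).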
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
