
  • How to handle missing values for skipped (skip-logic) questions in a cross-sectional dataset

    Hi,

    I am currently working with an HIV dataset and have a question: how should I handle missing values for skipped (skip-logic) questions? I searched Google but could not find an answer to this specific question. The details are as follows.

    The two survey questions are:

    1. Eversex: Have you ever had sexual intercourse?
       Yes = 1
       No = 0
       No response = 9

    2. Condom_use: Did you and/or your partner use condoms at last sex?
       Yes = 1
       No = 0

    Participants were asked the Condom_use question only if Eversex = 1.

    So Condom_use has many missing values. If I delete them, I will lose more than 70% of the sample, which will reduce statistical power.
    I need to put Eversex and Condom_use in one model, but when I do, Stata reports an error unless I handle the missing values in Condom_use.

    I have heard about some missing data techniques, such as multiple imputation, but I do not know which method is best for handling the missing values in Condom_use that arise from a skipped (skip-logic) question.

    Can I simply recode the missing values of Condom_use as 2, then ignore the odds ratio (OR) for Condom_use = 2 and focus only on the OR for Condom_use = 1?

    If I leave them as missing, I still run into problems when I include Condom_use in logistic models.

    My main problem is with my multivariable logistic regression model:
    logistic Eversex Age Condom_use x1 x2 x3 x4 x5

    Stata reports an error when I run this model.

    My second problem is that whenever I include Condom_use in any other model, the number of observations drops to 215 out of 760.

    Could you please give me some advice?

    Thanks very much.

  • #2
    Hui:
    welcome to this forum.
    Questions regarding intimacy are often skipped by respondents.
    Most questionnaires offer guidance on how to deal with missing domains.
    Otherwise, if there's no chance in heaven (and/or elsewhere, where the temperature is usually higher) of retrieving the missing data, -mi- is the way to go, with the caveat that >70% missing data is kind of overkill.
    Last edited by Carlo Lazzaro; 29 Mar 2022, 01:52.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      I slightly and respectfully disagree with Carlo. Multiple imputation assumes (in a substantive rather than a purely statistical sense) that the missing values "mask" some underlying "true" but unknown value. Now, there are two kinds of missing values for Condom_use: those of respondents who report never having had sex, and those of respondents who refused to answer the question (about sex and/or condom use). Only for the latter group is the answer unknown. The true value for the first group is known (given that they really did not have sex): No, they have never used a condom. There is no uncertainty here, so you can simply plug in the correct, known value. Whether including this group in the analyses is reasonable, I cannot tell. The model you show predicts having sex as a function of condom use, which makes little sense to me.
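      A minimal Stata sketch of the recoding described above, assuming the coding shown in the question (Eversex: 1 = Yes, 0 = No, 9 = No response; Condom_use missing when skipped). The toy observations are hypothetical. Turning 9 into a missing value is an added assumption: otherwise -logistic- would treat 9 as a positive outcome, as its error message in #5 notes ("all other nonmissing values = positive outcome").

```stata
* Hypothetical toy data: Eversex 9 = "No response"; Condom_use is . when skipped
clear
input Eversex Condom_use
1 1
1 0
1 .
0 .
0 .
9 .
end

* Assumption: "No response" (9) is a genuinely unknown value, so make it missing
mvdecode Eversex, mv(9)

* Skipped because they never had sex: the answer is known, so plug in 0 (No)
replace Condom_use = 0 if Eversex == 0

* What remains missing in Condom_use is truly unknown (refusals, or Eversex missing)
tabulate Eversex Condom_use, missing
```

      Note that after this recode Condom_use = 1 occurs only among respondents with Eversex = 1, so a model such as -logistic Eversex Condom_use- would likely still hit a perfect-prediction problem; the recode settles the missing-data question, not the modeling one.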



      • #4
        Daniel is obviously correct, and I was not precise in my previous reply.
        Hui is actually dealing with two types of unreported values: only the values of those who refused to reply are truly missing.
        As far as the -logistic- regression is concerned, Hui did not report the error/warning message thrown by Stata.
        Does it depend on missing values (and the related casewise deletion)? Is it a perfect prediction issue?
        Last edited by Carlo Lazzaro; 29 Mar 2022, 04:13.
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Hi Dr. Lazzaro and Dr. Klein,

          Thanks for your reply.

          When I ran the model, Stata reported the following error:



          . logistic Eversex Age Condom_use
          outcome does not vary; remember:
          0 = negative outcome,
          all other nonmissing values = positive outcome

          r(2000);

          end of do-file

          r(2000);



          • #6
            Hi Dr. Klein,

            Do you mean that Eversex and Condom_use should not be put in the same model?

            And that I should recode the missing Condom_use values to 0 for respondents who reported never having had sex?

            Thanks.



            • #7
              Hui:
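              the error message is self-explanatory.
              Probably because of missing values (which Stata excludes from -logistic- via listwise deletion), you ended up with a subsample of your original dataset in which the regressand does not vary: since Condom_use was only asked when Eversex = 1, every observation with nonmissing Condom_use has Eversex = 1.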
              Therefore, the -logistic- machinery cannot work.
              Kind regards,
              Carlo
              (StataNow 18.5)
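              The listwise-deletion mechanism behind r(2000) can be reproduced on a few hypothetical observations. Because Condom_use is only observed when Eversex = 1, the estimation sample contains only Eversex = 1, and -logistic- stops with "outcome does not vary". The toy data below are made up for illustration; on the real data, the same -tabulate- line should show the same pattern.

```stata
* Hypothetical toy data with the same skip pattern as the survey
clear
input Eversex Condom_use Age
1 1 19
1 0 22
1 1 25
0 . 17
0 . 18
end

* Outcome values that survive listwise deletion once Condom_use enters the model
tabulate Eversex if !missing(Condom_use, Age)

* Fails with "outcome does not vary": every remaining observation has Eversex == 1
capture noisily logistic Eversex Age Condom_use
display "return code: " _rc
```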

              Comment


              • #8
                Dr. Lazzaro,

                My boss asked me to run the model: logistic Eversex Age Condom_use

                I think it does not make sense to put Eversex and Condom_use in the same model, but I do not know how to give my boss a reasonable explanation. So I think my current task is to handle the missing values that arise because respondents reported never having had sex.

                Thanks.



                • #9
                  Quoting #8: "I think it does not make sense to put Eversex and Condom_use in the same model, but I do not know how to give my boss a reasonable explanation. So I think my current task is to handle the missing values that arise because respondents reported never having had sex."
                  And what will you do if your boss asks you to buy a bus ticket from New York to London? When a boss asks for something that is mathematically or physically impossible, your responsibility is to educate your boss about the problem and propose an alternative. I think what we have here is an X-Y problem. Your boss's goal is X, and he or she mistakenly thinks that doing Y will accomplish that goal. In this case it will not, because Y is impossible. So you need to find a way to accomplish X, different from Y, that is actually feasible.

                  Moreover, as Daniel Klein has helpfully pointed out, modeling Eversex as the outcome variable with condom use as a predictor makes no sense. I cannot help thinking that some other outcome is intended, with both Eversex and Condom use as predictors of that outcome.

                  So I think you first need to clarify what outcome is actually under study. Then, to capture the effects of both condom use and having ever had sex as predictors, and since condom use is not possible without having had sex, I would combine those two variables into a single three-category variable:
                  0 = Eversex No (condom use response irrelevant)
                  1 = Eversex Yes, condom use response No
                  2 = Eversex Yes, condom use response Yes
                  For the combination of Eversex = Yes and a missing condom use response, this variable should itself be missing.

                  As you have not provided example data and there is clearly much confusion as to what is going on here, I'm not going to try to write code for this at this time. If you need more specific advice, when posting back with questions please show example data, and use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
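                  A minimal Stata sketch of the three-category variable proposed above, assuming Eversex's 9 (No response) has already been recoded to missing. The variable name sex_condom, its value labels, and outcome_var in the last comment are hypothetical; the toy data only illustrate the coding, not the real survey.

```stata
* Hypothetical toy data (Eversex: 1 = Yes, 0 = No, . = no response)
clear
input Eversex Condom_use
0 .
1 0
1 1
1 .
. .
end

* 0 = never had sex; 1 = had sex, no condom; 2 = had sex, condom
* Stays missing if Eversex is missing, or if Eversex == 1 but Condom_use is missing
generate byte sex_condom = .
replace sex_condom = 0 if Eversex == 0
replace sex_condom = 1 if Eversex == 1 & Condom_use == 0
replace sex_condom = 2 if Eversex == 1 & Condom_use == 1

label define sex_condom_lbl 0 "Never had sex" 1 "Sex, no condom" 2 "Sex, condom"
label values sex_condom sex_condom_lbl
tabulate sex_condom, missing

* Use it as a factor-variable predictor of the outcome actually under study, e.g.
* logistic outcome_var Age i.sex_condom
```

                  With i.sex_condom, Stata uses the lowest level (0, never had sex) as the base category by default, so each odds ratio compares one sexually active group with respondents who never had sex.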
