
  • Effect of missing data and outliers in logistic regression model

    Hello

    First, I am using survey data for a logistic regression analysis. I noticed that my predictor variable, as well as some confounders, has some outliers and missing values. The missing values are less than 10% for every predictor, but I am concerned that the missing values and outliers could have a significant effect on the overall result of the model. I have run the logistic regression adjusting for these confounders, and the results were non-significant.

    Second, after fitting the model with
    Code:
    svy: logistic outcome x y z a b
    I used -estat gof- to check the fit of my model, and the result was:
    F(9,41) = 0.55
    Prob > F = 0.8326
    Could someone please interpret this? When can we say that the model fits well, and at what p-value? Also, am I expected to run the command with the -svy- prefix, or just as I have done?

  • #2
    Chinonso:
    welcome to this forum.
    Unfortunately, questions like the one you posted are at high risk of being left unreplied.
    We do not know your data and we cannot see the output that Stata gave you back.
    You state that you have adjusted your -logit- for those confounders, but I cannot understand from your description whether you dealt with the missing data or not (and, if so, how).
    Setting aside obvious mistakes in data entry, outliers are often a matter of fact: some variables have long tails.
    The result of the -gof- test (Prob > F = 0.8326) gives no evidence of lack of fit, i.e., your model appears to fit your data well.
    Kind regards,
    Carlo
    (Stata 19.0)
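A minimal Stata sketch of how the reported p-value relates to the F statistic, using only the numbers posted above (F = 0.55 with 9 and 41 degrees of freedom) and Stata's built-in Ftail() function. The null hypothesis of the test is that the model fits, so a large p-value means no evidence of lack of fit (it does not prove the model is correct); the 0.05 cut-off below is a convention, not a requirement. As far as I know, typing -estat gof- without any prefix after -svy: logistic- already gives the design-based, F-adjusted version of the test, which is consistent with the output reporting an F rather than a chi-squared statistic.

```stata
* Reproduce the p-value reported by -estat gof- from the posted F statistic.
* F = 0.55 is rounded in the output, so the recomputed p-value is only
* approximately equal to the posted 0.8326.
local F   = 0.55
local df1 = 9      // numerator degrees of freedom, as posted
local df2 = 41     // denominator degrees of freedom, as posted
local p   = Ftail(`df1', `df2', `F')
display "Prob > F = " %6.4f `p'

* H0: the model fits. The 5% threshold is a convention, not a rule.
if `p' < 0.05 {
    display "Evidence of lack of fit at the 5% level"
}
else {
    display "No evidence of lack of fit at the 5% level"
}
```

With the posted numbers the second message is displayed, matching the reading that the test finds no evidence of poor fit.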



    • #3
      Many thanks for your reply, Carlo.

      I am using NHANES data (which are survey data). My predictor, urinary phthalate concentration (about six different phthalates), has 242 missing values out of 7,765 observations. Some of my confounders, such as poverty-to-income ratio, waist circumference, cotinine level and urinary creatinine, also have missing data. How should I handle the missing data before fitting the logistic regression model, given that I am using survey data? For example, see the tail of the tabulation below:

      (last rows of the -tabulate- output; columns are Freq., Percent, Cum.)
            651.4 |          1        0.01       96.87
           730.25 |          1        0.01       96.88
                . |        242        3.12      100.00
      ------------+-----------------------------------
            Total |      7,765      100.00




      • #4
        Chinonso:
        any missing-data issue requires investigating first the mechanism and the pattern underlying the missingness and, only then, deciding how to deal with it.
        You can start from the -mi- entry in the Stata .pdf manual, which also lists some useful references.
        Kind regards,
        Carlo
        (Stata 19.0)
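A hedged sketch of the workflow described above (look at the amount and pattern of missingness first, then, only if multiple imputation is judged appropriate, impute while respecting the survey design). It runs on simulated data: the variable names (strata, psu, wt, phth, waist, pir, y) and all values are hypothetical stand-ins for an NHANES-style extract, and the missingness is generated completely at random purely for illustration. The commands themselves (-misstable-, -mi set-, -mi svyset-, -mi impute chained-, -mi estimate: svy:-) are standard Stata; a real analysis would also need to justify the imputation model, for instance whether to include the design variables and weights in it.

```stata
* Simulated survey-style data; all names and values are hypothetical
clear
set seed 20240501
set obs 2000
generate strata = ceil(_n/200)              // 10 strata of 200 observations
generate psu    = mod(_n, 2) + 1            // 2 PSUs within each stratum
generate wt     = 0.5 + 1.5*runiform()      // sampling weights
generate phth   = rnormal(3, 1)             // exposure
generate waist  = rnormal(95, 12)           // confounder
generate pir    = 5*runiform()              // poverty-to-income ratio
generate byte y = runiform() < invlogit(-1 + 0.2*phth + 0.01*(waist - 95))

* Make some confounder values missing completely at random (illustration only)
replace waist = . if runiform() < 0.08
replace pir   = . if runiform() < 0.05

* Step 1: how much is missing, and in which combinations?
misstable summarize phth waist pir y
misstable patterns  phth waist pir y

* Step 2: declare the mi data, the survey design and the variables to impute
mi set wide
mi svyset psu [pweight = wt], strata(strata)
mi register imputed waist pir

* Step 3: chained-equations imputation, including the outcome as a predictor
mi impute chained (regress) waist pir = y phth, add(10) rseed(20240501)

* Step 4: fit the survey logistic model on each imputed dataset and combine
mi estimate: svy: logistic y phth waist pir
```

In the wide style used here the imputed values live in new variables such as _1_waist through _10_waist, while the original waist keeps its missing values.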



        • #5
          I am a beginner with Stata. When I run a logistic regression, I get the following message:
          outcome does not vary; remember:
          0 = negative outcome,
          all other nonmissing values = positive outcome
          I have also tried to find an answer in the forum discussions, but



          • #6
            Zerihun:
            welcome to this forum.
            The Stata message means exactly what it says: your dependent variable (coded 0/1) does not vary across the observations used in the estimation, which makes -logit- or -logistic- estimation infeasible.
            Just:
            Code:
            table <depvar>
            and take a look at what's the matter with your data.
            As an aside, for the future, please start a new thread with an informative subject, as your query has nothing to do with the original one. Thanks.
            Kind regards,
            Carlo
            (Stata 19.0)
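A small sketch of a common cause of this message: the outcome varies in the full dataset, but -logit- uses only observations with no missing values in any of the variables in the model, and among those complete cases the outcome may take a single value. The toy data below are hypothetical (only the variable names echo those used later in this thread); the check is simply to tabulate the outcome restricted to the complete cases.

```stata
* Hypothetical toy data: AWARNESS varies overall, but every observation
* with complete predictors has AWARNESS == 0
clear
input byte AWARNESS byte age_group byte RELIGION
0 1 1
0 2 1
0 3 2
0 1 2
1 . 1
1 2 .
1 . .
1 3 .
end

* The outcome varies in the full data ...
tabulate AWARNESS, missing

* ... but -logit- keeps only complete cases (listwise deletion)
misstable summarize age_group RELIGION
tabulate AWARNESS if !missing(age_group, RELIGION)

* Reproduces the error; -capture noisily- displays it without stopping the do-file
capture noisily logit AWARNESS i.age_group i.RELIGION
```

If the restricted tabulation shows a single value, the problem lies in the missing values of the predictors, not in the outcome itself.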



            • #7
              Many thanks for your prompt reply, Carlo.
              table AWARNESS

              ----------------------
              Awarness  |
              of the    |
              two       |
              Woreda    |      Freq.
              ----------+-----------
                      0 |        325
                      1 |        193

              The output looks like this.
              Regards



              • #8
                Zerihun:
                can you please share an excerpt/example of your data via -dataex-? Thanks.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Many thanks for your patience, Carlo.
                  . dataex AWARNESS

                  ----------------------- copy starting from the next line -----------------------
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input byte AWARNESS
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  1
                  1
                  1
                  1
                  1
                  1
                  1
                  1
                  1
                  1
                  1
                  1
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  1
                  1
                  1
                  1
                  1
                  1
                  1
                  1
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  0
                  1
                  1
                  1
                  0
                  end
                  ------------------ copy up to and including the previous line ------------------

                  Listed 100 out of 518 observations
                  Use the count() option to list more





                  • #10
                    Zerihun:
                    my bad, I was probably unclear.
                    The excerpt of your data should include predictors, too. Thanks.
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Here is some of my data (attached as a file).
                      Thank you



                      • #12
                        Zerihun:
                        That's what I got running -logit- with a handful of your predictors:
                        Code:
                        . logit AWARNESS i.age_group i.WOREDA i.RELIGION i.MSTATUS i.OSTATUS
                        
                        note: 1.age_group != 0 predicts failure perfectly
                              1.age_group dropped and 3 obs not used
                        
                        note: 6.age_group != 0 predicts failure perfectly
                              6.age_group dropped and 7 obs not used
                        
                        note: 2.RELIGION != 0 predicts success perfectly
                              2.RELIGION dropped and 1 obs not used
                        
                        note: 3.RELIGION != 1 predicts failure perfectly
                              3.RELIGION dropped and 6 obs not used
                        
                        note: 3.age_group != 0 predicts success perfectly
                              3.age_group dropped and 4 obs not used
                        
                        note: 7.age_group != 0 predicts success perfectly
                              7.age_group dropped and 1 obs not used
                        
                        note: 2.MSTATUS != 1 predicts failure perfectly
                              2.MSTATUS dropped and 1 obs not used
                        
                        note: 2.age_group != 0 predicts success perfectly
                              2.age_group dropped and 3 obs not used
                        
                        note: 5.age_group omitted because of collinearity
                        note: 1.WOREDA omitted because of collinearity
                        note: 4.RELIGION omitted because of collinearity
                        note: 4.MSTATUS omitted because of collinearity
                        note: 2.OSTATUS omitted because of collinearity
                        note: 4.OSTATUS omitted because of collinearity
                        Iteration 0:   log likelihood = -2.2493406
                        Iteration 1:   log likelihood = -2.2493406
                        
                        Logistic regression                             Number of obs     =          4
                                                                        LR chi2(0)        =       0.00
                                                                        Prob > chi2       =          .
                        Log likelihood = -2.2493406                     Pseudo R2         =     0.0000
                        
                        ------------------------------------------------------------------------------------
                                  AWARNESS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------------+----------------------------------------------------------------
                                 age_group |
                                        1  |          0  (empty)
                                        2  |          0  (empty)
                                        3  |          0  (empty)
                                        5  |          0  (omitted)
                                        6  |          0  (empty)
                                        7  |          0  (empty)
                                           |
                                    WOREDA |
                        Comparison Woreda  |          0  (omitted)
                                           |
                                  RELIGION |
                                   MUSLIM  |          0  (empty)
                                 ORTHODOX  |          0  (omitted)
                              PROTENSTANT  |          0  (empty)
                                           |
                                   MSTATUS |
                                   SINGLE  |          0  (empty)
                                           |
                                   OSTATUS |
                                EMPLOYEED  |          0  (empty)
                                   FARMER  |          0  (omitted)
                                 MARCHANT  |          0  (empty)
                                           |
                                     _cons |   1.098612   1.154701     0.95   0.341    -1.164559    3.361784
                        ------------------------------------------------------------------------------------
                        
                        .
                        Your dataset has some critical issues:
                        - perfect prediction of the regressand by many independent variables;
                        - missing values (together with the observations dropped for perfect prediction, only 4 observations are left in the estimation sample);
                        - collinearity (this holds particularly true for categorical variables).

                        You may want to deal with the missing values (see the -mi- suite of commands in the Stata .pdf manual) and to reduce the number of levels of the categorical variables that show perfect collinearity (see -estat vce, corr-).
                        Kind regards,
                        Carlo
                        (Stata 19.0)
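A hedged sketch of how perfect prediction shows up and of one possible remedy, merging sparse levels of a categorical predictor. The toy data are hypothetical (only the variable names echo this thread), and whether two levels can sensibly be merged is a substantive decision, not a statistical rule. A cross-tabulation of each categorical predictor against the outcome reveals the problem as a row with an empty cell.

```stata
* Hypothetical toy data: nobody in age_group 1 is aware (AWARNESS == 0),
* so that level predicts failure perfectly
clear
input byte AWARNESS byte age_group
0 1
0 1
0 1
1 2
0 2
1 2
0 3
1 3
1 3
0 3
end

* A row with a zero cell signals perfect prediction
tabulate age_group AWARNESS

* -logit- drops the offending level and its observations (see the notes)
logit AWARNESS i.age_group

* One possible remedy: merge the sparse level with a neighbouring one
recode age_group (1 2 = 1 "groups 1-2") (3 = 2 "group 3"), generate(age2)
tabulate age2 AWARNESS
logit AWARNESS i.age2
```

After merging, both levels contain aware and unaware respondents, so all 10 observations stay in the estimation sample.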



                        • #13
                          Thank you very much, Carlo.
                          I will get back to you after studying the -mi- suite of commands in Stata.
                          With Kind Regards,
                          Zerihun



                          • #14
                            Carlo,
                            Thank you for your advice last time.
                            I have tried to clean and analyze my data, but I have a problem saving -mrtab- output with -putdocx-.

                            With Kind Regards,
                            Zerihun



                            • #15
                              Carlo
                              My other question: which statistical test can I use for a quasi-experimental design with a comparison group and a posttest only?

                              With Kind Regards,
                              Zerihun
