Effect of missing data and outliers in logistic regression model

Chinonso Odebeatu

Join Date: Jul 2018

#1


19 Jul 2018, 10:25

Hello

First, I am using survey data for a logistic regression analysis. I noticed that my predictor variable, as well as some confounders, have outliers and missing values. The missing values are less than 10% for every predictor, but my concern is whether the missing values and outliers have a significant effect on the overall result of the model. I have run the logistic regression adjusting for these confounders, and the results were non-significant.

Second, after fitting the model with -svy: logistic outcome x y z a b-, I used -estat gof- to check the fit of my model, and the result was:
F(9,41) = 0.55
Prob > F = 0.8326
Can someone please interpret this? When can we say that the model fits well, and at what p-value? Also, am I expected to use the command with the -svy- prefix, or just the way I have done it?
Carlo Lazzaro

Join Date: Apr 2014

#2

19 Jul 2018, 11:02

Chinonso:
welcome to this forum.
Unfortunately, questions like the one you posted are at high risk of going unanswered.
We do not know your data and we cannot see the output that Stata gave you.
You state that you have adjusted your -logit- for those confounders, but I cannot tell from your description whether you dealt with the missing data or not (and if so, how).
Setting aside obvious data-entry mistakes, outliers are often a matter of fact: some variables simply have long tails.
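A minimal sketch of how one might inspect a long-tailed variable before treating its extreme values as outliers. The data are simulated, and `conc` is a hypothetical stand-in for a right-skewed measure such as a urinary concentration; the log transformation is one common option for such variables, assumed here for illustration rather than recommended in this thread.

```stata
* Simulated, hypothetical right-skewed (log-normal) variable standing in
* for a concentration measure; replace it with your own variable
clear
set seed 2018
set obs 1000
generate conc = exp(rnormal(3, 1))

* Percentiles, skewness and kurtosis: very large values in a long right
* tail are expected for a variable like this, not necessarily errors
summarize conc, detail

* A log transformation is one common way to tame a long right tail
generate ln_conc = ln(conc)
summarize ln_conc, detail
```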
The outcome of the -estat gof- test (Prob > F = 0.8326) gives no evidence of lack of fit, i.e. your model fits your data adequately.
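A sketch of the same workflow on Stata's example NHANES II extract (-webuse nhanes2-, which needs an internet connection and is already -svyset-); the covariates are illustrative, not the original poster's. After -svy: logistic-, -estat gof- is typed without the -svy:- prefix and reports the design-adjusted F test. A large Prob > F, like the 0.83 above, means the test finds no evidence of lack of fit; a small one (conventionally below 0.05) would point to poor fit. No p-value proves that a model fits "very well".

```stata
* Stata's example NHANES II extract; it is already -svyset-
webuse nhanes2, clear
svyset

* Survey-weighted logistic model (illustrative covariates)
svy: logistic highbp height weight age female

* Design-adjusted goodness-of-fit test (F statistic and Prob > F).
* No -svy:- prefix here: -estat gof- uses the survey design of the
* preceding -svy- estimation
estat gof
```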

Kind regards,
Carlo
(Stata 19.0)
Chinonso Odebeatu

Join Date: Jul 2018

#3

20 Jul 2018, 03:22

Many thanks for your reply, Carlo.

I am using the NHANES data (which are survey data). The predictor (urinary phthalate concentrations, about six different phthalates) had 242 missing values out of 7,765 measured concentrations. Some of my confounders, such as the poverty-to-income ratio, waist circumference, cotinine level and urinary creatinine, also have missing data. How should I deal with the missing data before fitting the logistic regression model, given that I am using survey data? For example, see below:

      651.4 |          1        0.01       96.87
     730.25 |          1        0.01       96.88
          . |        242        3.12      100.00
------------+-----------------------------------
      Total |      7,765      100.00

Carlo Lazzaro

Join Date: Apr 2014

#4

20 Jul 2018, 04:22

Chinonso:
any missing data issue requires investigating the mechanism and the pattern underlying the missingness, and only then deciding how to deal with it.
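A minimal sketch of a first look at the pattern of missingness, on simulated data. The variable names (`age`, `pir`, `waist`) are hypothetical stand-ins, and the missingness mechanism is built in by construction. A clear association between a missingness indicator and an observed variable is evidence against missing completely at random (MCAR); the data alone cannot tell missing at random (MAR) from missing not at random (MNAR).

```stata
* Simulated, hypothetical data: three covariates, two with missing values
clear
set seed 2018
set obs 1000
generate age   = 20 + floor(60*runiform())
generate pir   = 5*runiform()        // stand-in for a poverty-income ratio
generate waist = rnormal(95, 12)

* By construction: pir is more often missing among older respondents,
* waist is missing completely at random
replace pir   = . if runiform() < 0.02 + 0.002*(age - 20)
replace waist = . if runiform() < 0.03

* How much is missing, and in which combinations?
misstable summarize age pir waist
misstable patterns age pir waist

* Is missingness in pir related to an observed variable?
generate byte miss_pir = missing(pir)
logit miss_pir age
```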
You can start from the -mi- entry in the Stata .pdf manual, which also lists some useful references.
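A sketch of multiple imputation combined with the survey design, assuming Stata's example NHANES II extract (-webuse nhanes2-) and an artificially deleted 5% of -weight- so that there is something to impute. The single imputed variable, the imputation model and the 10 imputations are illustrative choices; with several incomplete confounders, -mi impute chained- is the usual extension, and which variables (including the design variables) enter the imputation model deserves thought case by case.

```stata
* Stata's example NHANES II extract (already -svyset-)
webuse nhanes2, clear

* Illustration only: delete about 5% of -weight- at random so there is
* something to impute (real data already have their own missing values)
set seed 2018
replace weight = . if runiform() < 0.05

* Declare the data as -mi- and record the survey design for -mi estimate-
mi set wide
mi svyset psu [pweight=finalwgt], strata(strata)

* Impute -weight- by linear regression; the outcome (highbp) is included
* among the predictors of the imputation model
mi register imputed weight
mi impute regress weight = highbp height age female, add(10) rseed(2018)

* Fit the survey logistic model in each imputed dataset and combine the
* estimates with Rubin's rules; -or- reports odds ratios
mi estimate, or: svy: logistic highbp height weight age female
```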

Kind regards,
Carlo
(Stata 19.0)
Zerihun Hordofa

Join Date: May 2020

#5

21 May 2020, 09:52

I am a beginner with Stata. When I run a logistic regression, I get the following message: "outcome does not vary; remember: 0 = negative outcome, all other nonmissing values = positive outcome". I have also tried searching the forum discussions, but
Carlo Lazzaro

Join Date: Apr 2014

#6

21 May 2020, 09:58

Zerihun:
welcome to this forum.
Stata's message means exactly what it says: your dependent variable (coded 0/1) does not vary across the observations used in the estimation, which makes -logit- or -logistic- estimation unfeasible.
Just:

Code:

table <depvar>

and take a look at what's the matter with your data.
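A toy example, with made-up variables `y` and `x`, of one common way this message arises even when -table- shows both 0s and 1s: -logit- uses only observations with no missing values in the outcome or any predictor, and in that subsample the outcome may not vary.

```stata
* Hypothetical toy data: y has both 0s and 1s overall, but every
* observation with y == 1 is missing the predictor x
clear
input byte y byte x
0 1
0 2
0 3
1 .
1 .
end

* Overall, the outcome varies...
tabulate y, missing

* ...but in the complete cases y is always 0
* (with several predictors, use !missing(x1, x2, x3))
tabulate y if !missing(x)

* -logit- therefore stops with "outcome does not vary"
capture noisily logit y x
scalar logit_rc = _rc
```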
As an aside, for the future, please start a new thread with an informative subject, as your query has nothing to do with the original one. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Zerihun Hordofa

Join Date: May 2020

#7

21 May 2020, 10:06

Many thanks for your prompt reply, Carlo.
table AWARNESS

----------------------
 Awarness |
   of the |
      two |
   Woreda |      Freq.
----------+-----------
        0 |        325
        1 |        193

The output looks like this.
Regards
Carlo Lazzaro

Join Date: Apr 2014

#8

21 May 2020, 10:11

Zerihun:
can you please share an excerpt/example of your data via -dataex-? Thanks.

Kind regards,
Carlo
(Stata 19.0)
Zerihun Hordofa

Join Date: May 2020

#9

21 May 2020, 10:54

Many thanks for your patience, Carlo.
. dataex AWARNESS

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte AWARNESS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 end

------------------ copy up to and including the previous line ------------------

Listed 100 out of 518 observations
Use the count() option to list more
Carlo Lazzaro

Join Date: Apr 2014

#10

21 May 2020, 11:14

Zerihun:
my bad, I was probably unclear.
The excerpt of your data should include predictors, too. Thanks.
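For reference, a sketch of the syntax on Stata's shipped auto data (the variable list is only an example): list the outcome together with the predictors used in the model. -dataex- is part of official Stata in recent releases; in older ones it can be installed with -ssc install dataex-.

```stata
* Stata's shipped example dataset, used here only to show the syntax
sysuse auto, clear

* Outcome first, then the predictors used in the model; count() sets how
* many observations are listed (100 by default)
dataex foreign mpg weight rep78, count(20)
```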

Kind regards,
Carlo
(Stata 19.0)
Zerihun Hordofa

Join Date: May 2020

#11

21 May 2020, 11:18

Here is some of my data
Thank you
Attached file: draft.dta (159.3 KB)

Carlo Lazzaro

Join Date: Apr 2014

#12

21 May 2020, 11:30

Zerihun:
That's what I got running -logit- with a handful of your predictors:

Code:

. logit AWARNESS i.age_group i.WOREDA i.RELIGION i.MSTATUS i.OSTATUS

note: 1.age_group != 0 predicts failure perfectly
      1.age_group dropped and 3 obs not used

note: 6.age_group != 0 predicts failure perfectly
      6.age_group dropped and 7 obs not used

note: 2.RELIGION != 0 predicts success perfectly
      2.RELIGION dropped and 1 obs not used

note: 3.RELIGION != 1 predicts failure perfectly
      3.RELIGION dropped and 6 obs not used

note: 3.age_group != 0 predicts success perfectly
      3.age_group dropped and 4 obs not used

note: 7.age_group != 0 predicts success perfectly
      7.age_group dropped and 1 obs not used

note: 2.MSTATUS != 1 predicts failure perfectly
      2.MSTATUS dropped and 1 obs not used

note: 2.age_group != 0 predicts success perfectly
      2.age_group dropped and 3 obs not used

note: 5.age_group omitted because of collinearity
note: 1.WOREDA omitted because of collinearity
note: 4.RELIGION omitted because of collinearity
note: 4.MSTATUS omitted because of collinearity
note: 2.OSTATUS omitted because of collinearity
note: 4.OSTATUS omitted because of collinearity
Iteration 0:   log likelihood = -2.2493406
Iteration 1:   log likelihood = -2.2493406

Logistic regression                             Number of obs     =          4
                                                LR chi2(0)        =       0.00
                                                Prob > chi2       =          .
Log likelihood = -2.2493406                     Pseudo R2         =     0.0000

------------------------------------------------------------------------------------
          AWARNESS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
         age_group |
                1  |          0  (empty)
                2  |          0  (empty)
                3  |          0  (empty)
                5  |          0  (omitted)
                6  |          0  (empty)
                7  |          0  (empty)
                   |
            WOREDA |
Comparison Woreda  |          0  (omitted)
                   |
          RELIGION |
           MUSLIM  |          0  (empty)
         ORTHODOX  |          0  (omitted)
      PROTENSTANT  |          0  (empty)
                   |
           MSTATUS |
           SINGLE  |          0  (empty)
                   |
           OSTATUS |
        EMPLOYEED  |          0  (empty)
           FARMER  |          0  (omitted)
         MARCHANT  |          0  (empty)
                   |
             _cons |   1.098612   1.154701     0.95   0.341    -1.164559    3.361784
------------------------------------------------------------------------------------

.

Your dataset has some critical issues:
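- perfect prediction of the regressand by many independent variables;
- missing values (together with the observations dropped for perfect prediction, they leave only 4 observations in the estimation sample);
- collinearity (this holds particularly true for categorical variables).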

You may want to deal with the missing values (see the -mi- suite of commands in the Stata .pdf manual) and reduce the categorical variables that show perfect collinearity (see -estat vce, corr-).
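A toy illustration, with hypothetical variables `aware` and `agegrp`, of how a cross-tabulation reveals the empty cells behind perfect prediction, and of one possible way to act on the advice above: merging a sparse level with a neighbouring one. The merge is an assumption for illustration; which levels can sensibly be combined depends on the substance of the variable.

```stata
* Hypothetical toy data: nobody in age group 3 is aware (aware == 0)
clear
input byte aware byte agegrp
0 1
1 1
0 1
1 1
1 2
0 2
1 2
1 2
0 3
0 3
0 3
end

* A cross-tabulation shows the empty cell before any model is fitted
tabulate agegrp aware

* -logit- drops 3.agegrp and its observations (perfect prediction)
logit aware i.agegrp
estat vce, correlation

* One possible remedy: merge the sparse level with a neighbouring one
recode agegrp (3 = 2), generate(agegrp2)
tabulate agegrp2 aware
logit aware i.agegrp2
```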

Kind regards,
Carlo
(Stata 19.0)


Zerihun Hordofa

Join Date: May 2020

#13

21 May 2020, 12:27

Thank you very much, Carlo.
I will get back to you after working through the -mi- suite of commands in Stata.
With Kind Regards,
Zerihun
Zerihun Hordofa

Join Date: May 2020

#14

30 May 2020, 12:25

Carlo
Thank you for your advice last time.
I tried to clean my data and run the analysis, but I ran into a problem saving -mrtab- results with -putdocx-.

With Kind Regards,
Zerihun
Zerihun Hordofa

Join Date: May 2020

#15

30 May 2020, 12:36

Carlo
My other question: which statistical test can I use for a quasi-experimental design with a comparison group and a posttest-only measurement?

With Kind Regards,
Zerihun