Unusual patterns for Multiple Logistic Regression using DHS data

Sumit Karn

Join Date: Feb 2018

Posts: 3
#1

28 May 2018, 14:00

Hi,

I am relatively new to Stata. I am using Stata 15.1 on a Mac.

I'm trying to explore the determinants of a binary outcome (stunting, yes/no) using DHS data. There are more than 50 independent variables. I ran a univariate regression analysis for each independent variable. I then categorized the independent variables into 5 groups and ran a multiple logistic regression for each group, with stunting as the dependent variable. When I include all the independent variables in one logistic regression model, the results (OR, 95% CI) show an unusual pattern, for example:

_cons 1.200048 6.594145 0.03 0.974 .000023 62572.93

I'd like to know:

1) Is this model the right approach for answering my research question?
2) How do I know which model fits best in a logistic regression analysis?
3) How do I test the statistical significance of differences in fit between models?

Your guidance and help would be much appreciated.

Thank you so much
Sumit
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

28 May 2018, 14:29

Your questions presume that the reader knows what, specifically, your research goals are, what regression command you actually used to create that output, and what exactly you consider unusual about that one line of output you showed.

To get a helpful response to question 1, you need to provide much more information. At the very least, show the exact command you ran, and the full output you got from Stata, and explain what bothers you about the results for the constant term.

As for question 2, logistic regression differs from linear regression in that there are two distinct aspects of model fit: discrimination and calibration. So choosing a "best fit" model may be a compromise between those: among a set of models, the one with the best discrimination is not necessarily the one with the best calibration. This decision therefore requires a judgment based on your understanding of the importance of discrimination and calibration for your research questions, and it cannot be made by an outsider. The -lroc- command will give you the area under the ROC curve, which measures discrimination, and -estat gof, group(#) table- (where you specify an appropriate value for #) will give you the Hosmer-Lemeshow calibration statistics. If you are not familiar with these statistics, consult a textbook on logistic regression.
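As a rough illustration of the two commands mentioned above, here is a minimal sketch. It uses the auto dataset that ships with Stata, not the DHS data from this thread; the model, the covariates, and the choice of 10 groups are assumptions made purely for illustration. It also assumes an ordinary -logistic- fit without the svy: prefix.

```stata
* Illustrative only: auto.dta ships with Stata and is NOT the poster's DHS data
sysuse auto, clear

* Ordinary (non-survey) logistic model; outcome and covariates chosen arbitrarily
logistic foreign mpg weight

* Discrimination: area under the ROC curve (nograph suppresses the plot)
lroc, nograph

* Calibration: Hosmer-Lemeshow test; 10 groups is an assumed, conventional choice
estat gof, group(10) table
```

Note that this sketch deliberately omits the svy: prefix. The Stata documentation describes -lroc- and this form of -estat gof- as not appropriate after svy: estimation, so a survey-weighted fit would need the survey postestimation tools instead.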
Sumit Karn

Join Date: Feb 2018

Posts: 3
#3

28 May 2018, 14:48

Dear Clyde,

Thank you so much for your response!

To run the model, I used the following command:
. svy:logistic stunted i.age_mc i.bsize i.age_mw i.bmi_cat_women i.ht_women ///
> i.b_intrvl b_order i.wdds i.wsmoke i.treat_water i.ODF i.handwash i.cooking_fuel i.access_hf i.p_delivery i.anc ///
> i.s653c i.exp_media i.radio_health i.v025 i.v024 i.v190 i.secoreg i.v106 i.m_occupation i.hh_size ///
> i.foodsec i.ethnicity i.int_usew i.v169a i.decide i.EIBF i.EBF i.MMF if hw1 < 60 & hv103==1 & hw70 < 9990

and the output is:
Number of strata = 14
Number of PSUs = 157
Number of obs = 235
Population size = 227.937422
Design df = 143
F(58, 86) = 1.36
Prob > F = 0.0963

---------------------------------------------------------------------------------------------
| Linearized
stunted | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
age_mc |
12-17 | 2.017029 1.289555 1.10 0.274 .5699876 7.137707
18-23 | 5.900952 3.381117 3.10 0.002 1.901251 18.3149
24-35 | 1 (empty)
|
bsize |
Normal | 1.349742 1.188813 0.34 0.734 .2366677 7.697731
Larger | .6039572 .4938866 -0.62 0.538 .1199484 3.041011
|
age_mw |
25-34 | 4.070497 1.934415 2.95 0.004 1.591036 10.41394
35-49 | 1.888084 1.771321 0.68 0.499 .295559 12.06142
|
bmi_cat_women |
Normal | .8060038 .4934095 -0.35 0.725 .240332 2.703103
Over-weight/Obese | .7124681 .5463428 -0.44 0.659 .1564827 3.243877
|
ht_women |
Normal Height | .1276058 .1108724 -2.37 0.019 .0229075 .7108252
2.b_intrvl | .2583964 .155609 -2.25 0.026 .0785794 .8496969
b_order | .9273665 1.213641 -0.06 0.954 .0697878 12.32319
|
wdds |
Yes | .9963103 .5701445 -0.01 0.995 .3214638 3.087857
|
wsmoke |
Does not smoke | 3.433101 3.336068 1.27 0.206 .5029025 23.43632
|
treat_water |
Water treated | .6969647 .4657874 -0.54 0.590 .1859904 2.611747
1.ODF | .6039407 .6105564 -0.50 0.619 .0818698 4.455174
1.handwash | .5817551 .4421034 -0.71 0.477 .1295241 2.612942
|
cooking_fuel |
Solid fuel | .5671581 .3844455 -0.84 0.404 .148524 2.165767
|
access_hf |
30-60 minutes | .9245167 .5460138 -0.13 0.894 .28768 2.971117
60+ minutes | .3924719 .361534 -1.02 0.312 .0635351 2.424396
|
p_delivery |
Health Facilities | 1.413626 .9945139 0.49 0.623 .3518731 5.679142
|
anc |
1-3 ANC Visit | 18.60609 37.79989 1.44 0.152 .3354334 1032.058
4+ ANC Visit | 13.70322 26.87426 1.33 0.184 .2839418 661.3264
|
s653c |
yes | .8206634 .4767046 -0.34 0.734 .2603163 2.587192
|
exp_media |
1 | .2550635 .2039232 -1.71 0.090 .0525176 1.238774
2 | .2907852 .2374019 -1.51 0.133 .0579037 1.460288
|
1.radio_health | 2.184033 1.684558 1.01 0.313 .4754652 10.03228
|
v025 |
rural | 1.126197 .5962387 0.22 0.823 .3954753 3.207077
|
v024 |
province 2 | .1987846 .1923115 -1.67 0.097 .0293678 1.345534
province 3 | 1.628694 1.99975 0.40 0.692 .1438116 18.44529
province 4 | 2.235545 2.684912 0.67 0.504 .2081405 24.01101
province 5 | 2.844037 2.242557 1.33 0.187 .5984447 13.51594
province 6 | 8.740124 10.87519 1.74 0.084 .747037 102.257
province 7 | 1.704679 1.288971 0.71 0.482 .3824055 7.599078
|
v190 |
poorer | .9211403 .979187 -0.08 0.939 .112658 7.53164
middle | .2865868 .2944614 -1.22 0.226 .0376012 2.184293
richer | .9089295 1.072379 -0.08 0.936 .0882448 9.362056
richest | .0700056 .0926425 -2.01 0.046 .0051177 .9576183
|
secoreg |
hill | .2445847 .2860631 -1.20 0.231 .0242314 2.468769
terai | .8369094 .8070718 -0.18 0.854 .1243993 5.630395
|
v106 |
primary | .4375976 .3207962 -1.13 0.261 .1027415 1.86382
secondary | 1.274379 .8788347 0.35 0.726 .3260503 4.980954
higher | .4546286 .4008257 -0.89 0.373 .0795766 2.597337
|
m_occupation |
Non agricultural | 2.045075 1.47392 0.99 0.323 .492037 8.500033
Agricultural self employed | 1.187476 .6808507 0.30 0.765 .3823096 3.688373
|
hh_size |
More than 4 | 2.362742 1.39428 1.46 0.147 .7359113 7.585897
|
foodsec |
Mildy food insecure | 1.171193 .8326693 0.22 0.824 .2872729 4.774878
Moderately food insecure | 1.466394 1.032713 0.54 0.588 .3644792 5.899683
Severely food insecure | .6072398 .8385066 -0.36 0.718 .039623 9.306212
|
ethnicity |
Terai Other Caste | 1.391769 1.376773 0.33 0.739 .1969477 9.835212
Dalit | 4.052073 3.897078 1.45 0.148 .6054075 27.12106
Newar | 1 (empty)
Janajati | .5714239 .4179498 -0.77 0.445 .1346048 2.425807
Muslim | 3.810826 4.593549 1.11 0.269 .3517453 41.28668
|
int_usew |
Not used in last 12 months | .7915789 .5040145 -0.37 0.714 .224849 2.786746
|
v169a |
yes | 1.190233 .7436998 0.28 0.781 .3461237 4.092912
1.decide | 2.440345 1.500539 1.45 0.149 .7237531 8.228336
2.EIBF | 1.605054 .7764488 0.98 0.330 .6168849 4.176139
|
EBF |
No EBF | .0467964 .0773093 -1.85 0.066 .0017865 1.225829
1.MMF | 1.845237 1.238318 0.91 0.363 .4897164 6.952796
_cons | 1.200048 6.594145 0.03 0.974 .000023 62572.93
---------------------------------------------------------------------------------------------

I would like to know why the confidence intervals are so wide for some of the variables (for example anc, ethnicity), unlike in my previous models, which each included only one set of independent variables.

Thanks
SAK
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

28 May 2018, 15:01

Wide confidence intervals just mean that your data identify the parameters you are estimating very imprecisely. Given that you have only 235 observations and are fitting a model with 58 parameters, I'm actually amazed that the confidence intervals are as narrow as they are: I would expect much worse with such a poor observations-to-variables ratio. There just isn't enough information in the data to give you precise estimates.
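To make that ratio concrete: the posted output reports 235 observations and a model F test with 58 numerator degrees of freedom, which works out to roughly four observations per estimated coefficient. A quick check in Stata, using only those two figures (the scalar names are made up for this sketch):

```stata
* Figures taken from the posted output; scalar names are hypothetical
scalar n_obs    = 235   // "Number of obs"
scalar n_params = 58    // numerator df of the model F test

* Observations available per estimated coefficient (about 4)
display "Observations per parameter: " %4.2f n_obs/n_params
```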
Sumit Karn

Join Date: Feb 2018

Posts: 3
#5

28 May 2018, 15:17

Thank you once again! What would you suggest about how to proceed with the analysis? When I dropped some of the variables that were restricting the analysis to 235 observations, the model ran on 640 observations and the CIs were narrower.

Please bear with my naive questions!