
  • Unusual patterns for Multiple Logistic Regression using DHS data

    Hi,

    I am relatively new to Stata. I am using Stata 15.1 on a Mac.

    I'm trying to explore the determinants of an outcome of interest (stunting, yes/no) using DHS data. There are more than 50 independent variables. I first ran a univariate regression analysis for each independent variable. I then grouped the independent variables into 5 sets and ran a multiple logistic regression for each set, with stunting as the dependent variable. When I include all the independent variables in one logistic regression model, the results (OR, 95% CI) show an unusual pattern, for example:

    _cons 1.200048 6.594145 0.03 0.974 .000023 62572.93

    I'd like to know:

    1) Is this model the right approach for answering my research question?
    2) How do I know which model fits best in a logistic regression analysis?
    3) How do I test the statistical significance of different models to find the best fit?

    Your guidance and help would be much appreciated.

    Thank you so much
    Sumit

  • #2
    Your questions presume that the reader knows what, specifically, your research goals are, what regression command you actually used to create that output, and what exactly you consider unusual about that one line of output you showed.

    To get a helpful response to question 1, you need to provide much more information. At the very least, show the exact command you ran, and the full output you got from Stata, and explain what bothers you about the results for the constant term.

    As for question 2, unlike linear regression models, there are two distinct aspects of model fit with logistic regression: discrimination and calibration. So choosing a "best fit" model may be a compromise between those: among a set of models, the one with the best discrimination is not necessarily the one with the best calibration. This is a decision that requires judgment based on your understanding of how important discrimination and calibration are for your research questions, and it cannot be made by an outsider. The -lroc- command will give you the area under the ROC curve, which measures discrimination, and -estat gof, group(#) table- (where you specify an appropriate value for #) will give you the Hosmer-Lemeshow calibration statistics. If you are not familiar with these statistics, consult a textbook on logistic regression.
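A minimal sketch of those two checks, run on Stata's shipped example dataset (lbw) rather than the DHS data, so the variable names below are illustrative assumptions only. It uses plain -logistic- without the svy: prefix; as far as I am aware, -lroc- is not supported after svy: estimation, and -estat gof- has a separate survey version, so check the help files for your Stata version before applying this to survey data.

```stata
* Illustrative only: example dataset, not the DHS data from this thread
webuse lbw, clear

* Fit an ordinary (non-survey) logistic regression
logistic low age lwt i.race smoke ptl ht ui

* Discrimination: area under the ROC curve (suppress the graph)
lroc, nograph

* Calibration: Hosmer-Lemeshow test using 10 groups, with the group table
estat gof, group(10) table
```

Running the same two postestimation commands after each candidate model gives one discrimination number and one calibration test per model to compare.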



    • #3
      Dear Clyde,

      Thank you so much for your response !

      To run the model, I used the following command:
      . svy:logistic stunted i.age_mc i.bsize i.age_mw i.bmi_cat_women i.ht_women ///
      > i.b_intrvl b_order i.wdds i.wsmoke i.treat_water i.ODF i.handwash i.cooking_fuel i.access_hf i.p_delivery i.anc ///
      > i.s653c i.exp_media i.radio_health i.v025 i.v024 i.v190 i.secoreg i.v106 i.m_occupation i.hh_size ///
      > i.foodsec i.ethnicity i.int_usew i.v169a i.decide i.EIBF i.EBF i.MMF if hw1 < 60 & hv103==1 & hw70 < 9990

      and the output is:
      Number of strata = 14 Number of obs = 235
      Number of PSUs = 157 Population size = 227.937422
      Design df = 143
      F( 58, 86) = 1.36
      Prob > F = 0.0963

      ---------------------------------------------------------------------------------------------
      | Linearized
      stunted | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
      ----------------------------+----------------------------------------------------------------
      age_mc |
      12-17 | 2.017029 1.289555 1.10 0.274 .5699876 7.137707
      18-23 | 5.900952 3.381117 3.10 0.002 1.901251 18.3149
      24-35 | 1 (empty)
      |
      bsize |
      Normal | 1.349742 1.188813 0.34 0.734 .2366677 7.697731
      Larger | .6039572 .4938866 -0.62 0.538 .1199484 3.041011
      |
      age_mw |
      25-34 | 4.070497 1.934415 2.95 0.004 1.591036 10.41394
      35-49 | 1.888084 1.771321 0.68 0.499 .295559 12.06142
      |
      bmi_cat_women |
      Normal | .8060038 .4934095 -0.35 0.725 .240332 2.703103
      Over-weight/Obese | .7124681 .5463428 -0.44 0.659 .1564827 3.243877
      |
      ht_women |
      Normal Height | .1276058 .1108724 -2.37 0.019 .0229075 .7108252
      2.b_intrvl | .2583964 .155609 -2.25 0.026 .0785794 .8496969
      b_order | .9273665 1.213641 -0.06 0.954 .0697878 12.32319
      |
      wdds |
      Yes | .9963103 .5701445 -0.01 0.995 .3214638 3.087857
      |
      wsmoke |
      Does not smoke | 3.433101 3.336068 1.27 0.206 .5029025 23.43632
      |
      treat_water |
      Water treated | .6969647 .4657874 -0.54 0.590 .1859904 2.611747
      1.ODF | .6039407 .6105564 -0.50 0.619 .0818698 4.455174
      1.handwash | .5817551 .4421034 -0.71 0.477 .1295241 2.612942
      |
      cooking_fuel |
      Solid fuel | .5671581 .3844455 -0.84 0.404 .148524 2.165767
      |
      access_hf |
      30-60 minutes | .9245167 .5460138 -0.13 0.894 .28768 2.971117
      60+ minutes | .3924719 .361534 -1.02 0.312 .0635351 2.424396
      |
      p_delivery |
      Health Facilities | 1.413626 .9945139 0.49 0.623 .3518731 5.679142
      |
      anc |
      1-3 ANC Visit | 18.60609 37.79989 1.44 0.152 .3354334 1032.058
      4+ ANC Visit | 13.70322 26.87426 1.33 0.184 .2839418 661.3264
      |
      s653c |
      yes | .8206634 .4767046 -0.34 0.734 .2603163 2.587192
      |
      exp_media |
      1 | .2550635 .2039232 -1.71 0.090 .0525176 1.238774
      2 | .2907852 .2374019 -1.51 0.133 .0579037 1.460288
      |
      1.radio_health | 2.184033 1.684558 1.01 0.313 .4754652 10.03228
      |
      v025 |
      rural | 1.126197 .5962387 0.22 0.823 .3954753 3.207077
      |
      v024 |
      province 2 | .1987846 .1923115 -1.67 0.097 .0293678 1.345534
      province 3 | 1.628694 1.99975 0.40 0.692 .1438116 18.44529
      province 4 | 2.235545 2.684912 0.67 0.504 .2081405 24.01101
      province 5 | 2.844037 2.242557 1.33 0.187 .5984447 13.51594
      province 6 | 8.740124 10.87519 1.74 0.084 .747037 102.257
      province 7 | 1.704679 1.288971 0.71 0.482 .3824055 7.599078
      |
      v190 |
      poorer | .9211403 .979187 -0.08 0.939 .112658 7.53164
      middle | .2865868 .2944614 -1.22 0.226 .0376012 2.184293
      richer | .9089295 1.072379 -0.08 0.936 .0882448 9.362056
      richest | .0700056 .0926425 -2.01 0.046 .0051177 .9576183
      |
      secoreg |
      hill | .2445847 .2860631 -1.20 0.231 .0242314 2.468769
      terai | .8369094 .8070718 -0.18 0.854 .1243993 5.630395
      |
      v106 |
      primary | .4375976 .3207962 -1.13 0.261 .1027415 1.86382
      secondary | 1.274379 .8788347 0.35 0.726 .3260503 4.980954
      higher | .4546286 .4008257 -0.89 0.373 .0795766 2.597337
      |
      m_occupation |
      Non agricultural | 2.045075 1.47392 0.99 0.323 .492037 8.500033
      Agricultural self employed | 1.187476 .6808507 0.30 0.765 .3823096 3.688373
      |
      hh_size |
      More than 4 | 2.362742 1.39428 1.46 0.147 .7359113 7.585897
      |
      foodsec |
      Mildy food insecure | 1.171193 .8326693 0.22 0.824 .2872729 4.774878
      Moderately food insecure | 1.466394 1.032713 0.54 0.588 .3644792 5.899683
      Severely food insecure | .6072398 .8385066 -0.36 0.718 .039623 9.306212
      |
      ethnicity |
      Terai Other Caste | 1.391769 1.376773 0.33 0.739 .1969477 9.835212
      Dalit | 4.052073 3.897078 1.45 0.148 .6054075 27.12106
      Newar | 1 (empty)
      Janajati | .5714239 .4179498 -0.77 0.445 .1346048 2.425807
      Muslim | 3.810826 4.593549 1.11 0.269 .3517453 41.28668
      |
      int_usew |
      Not used in last 12 months | .7915789 .5040145 -0.37 0.714 .224849 2.786746
      |
      v169a |
      yes | 1.190233 .7436998 0.28 0.781 .3461237 4.092912
      1.decide | 2.440345 1.500539 1.45 0.149 .7237531 8.228336
      2.EIBF | 1.605054 .7764488 0.98 0.330 .6168849 4.176139
      |
      EBF |
      No EBF | .0467964 .0773093 -1.85 0.066 .0017865 1.225829
      1.MMF | 1.845237 1.238318 0.91 0.363 .4897164 6.952796
      _cons | 1.200048 6.594145 0.03 0.974 .000023 62572.93
      ---------------------------------------------------------------------------------------------

      I would like to know why the CIs are so wide for some of the variables (for example anc, ethnicity), unlike in my previous models, where I included only one set of independent variables.

      Thanks
      SAK



      • #4
        Wide confidence intervals just mean that your data identify the parameters you are estimating very imprecisely. Given that you have only 235 observations and are fitting a model with 58 parameters, I'm actually amazed that the confidence intervals are as narrow as they are: I would expect much worse with such a poor observations to variables ratio. There just isn't enough information in the data to give you precise estimates.
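The ratio mentioned above can be checked directly. This is a rough sketch, assuming it is run immediately after the -svy: logistic- command shown in #3, so that e(sample) marks the estimation sample; the figures 235 and 58 are taken from the output and from the post above.

```stata
* Number of observations actually used in the model (235 in the output in #3)
count if e(sample)

* How those observations split between stunted and not stunted
tabulate stunted if e(sample)

* Observations per estimated parameter: roughly 4
display 235/58
```

The -tabulate- line matters because, for a binary outcome, the size of the smaller outcome group limits precision more than the total sample size does.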



        • #5
          Thank you once again! So, what would you suggest about how to proceed with the analysis? I dropped some of the variables that were restricting the analysis to 235 observations; the model now runs with 640 observations and gives narrower CIs.
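One way to see which variables shrink the sample is to count missing values within the same subsample, since Stata drops any observation that is missing on any variable in the model. A sketch for a do-file, assuming the variable names and the if condition from the command in #3; only a subset of the variables is listed here for brevity.

```stata
* Count missing values per variable within the analysis subsample.
* misstable takes plain variable names (no i. prefixes).
misstable summarize age_mc bsize age_mw bmi_cat_women ht_women b_intrvl ///
    b_order wdds wsmoke anc EIBF EBF MMF ///
    if hw1 < 60 & hv103==1 & hw70 < 9990
```

Variables with many missing values in this table are the likely reason the full model was estimated on far fewer observations than the reduced one.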

          Please bear with my naive questions !
