  • Logistic regression for categorical variables

    When running a logistic regression, I'm unable to see all the categories of my variable. For example, for the variable race I can only see 6 of the 7 groups I have. Without all 7, I can't state the odds ratio for each.

    Any help is appreciated!

    Thank you

  • #2
    If I understand your question correctly, the constant, or some linear combination including the constant, will give you the estimate for the missing category. It would be helpful if you posted some example code and briefly described the data.
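    A minimal sketch of the linear combinations meant here. Since no data were posted, this assumes the nhanes2f example dataset used in #6, with race coded 1 = White (the default base), 2 = Black, 3 = Other, matching the value labels shown there. The `or` option makes lincom exponentiate the combination, so `_cons` alone gives the odds in the base category and `_cons` plus a level's coefficient gives the odds in that level (the column header may still read "Odds ratio").

```stata
* Assumption: nhanes2f example data, race coded 1 = White, 2 = Black, 3 = Other
webuse nhanes2f, clear
logit diabetes i.race, or nolog

* Odds of diabetes in the base category (White): exp(_b[_cons])
lincom _cons, or

* Odds in each other category: exp(_b[_cons] + that level's coefficient)
lincom _cons + 2.race, or
lincom _cons + 3.race, or
```

    Dividing the odds for Black by the odds for White reproduces the Black odds ratio in the logit table, which is why nothing is lost by omitting the base row.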

    • #3
      BE STATA is a good way to approach quant issues, but it conflicts with our preference for real given and family names (please see the FAQ and re-register accordingly, if you want; thanks. Obviously, this is by no means mandatory; it's only the preferred way to introduce yourself to other listers. Had I chosen Spiderman as a username, I would probably have been considered a peculiar (old) guy).
      That said, it is expected, and safe, that Stata reports n-1 levels of a categorical predictor with n levels, so that you're sheltered from the so-called dummy variable trap (https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).
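      A short worked illustration of the trap, assuming a predictor with K categories (K = 7 for race in the original question) entered as indicators D_1, ..., D_K alongside an intercept beta_0 (notation introduced here for illustration). Because every observation falls in exactly one category, the indicators always sum to one, so the intercept can be traded off against the category coefficients without changing any fitted value:

```latex
\sum_{k=1}^{K} D_{ik} = 1
\quad\Longrightarrow\quad
\beta_0 + \sum_{k=1}^{K} \beta_k D_{ik}
  = (\beta_0 + c) + \sum_{k=1}^{K} (\beta_k - c)\, D_{ik}
\quad \text{for every } c \in \mathbb{R}.
```

      The likelihood cannot distinguish between these parameter values, so the coefficients are not identified. Fixing one category's coefficient at zero (the base) removes the free c, which is what reporting n-1 levels amounts to.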
      Kind regards,
      Carlo
      (Stata 19.0)

      • #4
        The odds ratio for the category you are not seeing is 1, which is why it is not shown. This is exactly analogous to why logit won't show you a coefficient for the base category: that coefficient is identically zero, and exp(0) = 1.
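        The arithmetic behind this, sketched for a categorical predictor with K levels where level 1 is the base, p is the probability of the outcome, and D_k indicates level k (notation introduced here for illustration):

```latex
\begin{aligned}
\log\frac{p}{1-p} &= \beta_0 + \sum_{k=2}^{K} \beta_k D_k
  &&\Rightarrow\quad \text{odds}_1 = e^{\beta_0},\quad \text{odds}_k = e^{\beta_0 + \beta_k} \\
\text{OR}_k &= \frac{\text{odds}_k}{\text{odds}_1} = e^{\beta_k}
  &&\Rightarrow\quad \text{OR}_1 = e^{\beta_1} = e^{0} = 1 \quad (\beta_1 \equiv 0)
\end{aligned}
```

        So the base row of an odds-ratio table is always 1, and the exponentiated constant is the odds in the base category.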

        Code:
        help factor variable
        The help file explains how flexibly you can select a different category as the base. It is usually a good idea to choose a common category as the base, but your research problem may imply otherwise.
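        A minimal sketch of choosing the base with the ib. factor-variable operator, assuming the nhanes2f example dataset used in #6 (race coded 1 = White, 2 = Black, 3 = Other). Each model fits the same data; only the reference category, and therefore which odds ratio is fixed at 1, changes.

```stata
* Assumption: nhanes2f example data, race coded 1 = White, 2 = Black, 3 = Other
webuse nhanes2f, clear

* Level 2 (Black) as the base category
logit diabetes ib2.race, or nolog

* Most frequently occurring level as the base (a "common category")
logit diabetes ib(freq).race, or nolog

* Last level (Other) as the base
logit diabetes ib(last).race, or nolog
```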


        Please note our longstanding request to use full real names here. See https://www.statalist.org/forums/help#realnames

        • #5
          Thank you all!

          • #6
            If you like, you can make Stata show you the otherwise omitted base categories. It can be helpful if it isn't obvious what the base is. Indeed, I find it helpful when tables explicitly include the base category so I don't have to refer back to the text to see what it is. Example:

            Code:
            . webuse nhanes2f, clear
            
            . logit diabetes i.race, allbase or nolog
            
            Logistic regression                                     Number of obs = 10,335
                                                                    LR chi2(2)    =  21.79
                                                                    Prob > chi2   = 0.0000
            Log likelihood = -1988.1717                             Pseudo R2     = 0.0055
            
            ------------------------------------------------------------------------------
                diabetes | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    race |
                  White  |          1  (base)
                  Black  |   1.840272   .2270245     4.94   0.000      1.44502    2.343636
                  Other  |   1.008307   .3477382     0.02   0.981     .5129039    1.982209
                         |
                   _cons |   .0467322   .0023787   -60.18   0.000      .042295    .0516349
            ------------------------------------------------------------------------------
            Note: _cons estimates baseline odds.
            
            .
            Other sometimes-useful display options are documented at

            Code:
            help estimation options##display_options
            For example,

            Code:
            . logit diabetes i.race, allbase or nolog cformat(%8.3f)
            
            Logistic regression                                     Number of obs = 10,335
                                                                    LR chi2(2)    =  21.79
                                                                    Prob > chi2   = 0.0000
            Log likelihood = -1988.1717                             Pseudo R2     = 0.0055
            
            ------------------------------------------------------------------------------
                diabetes | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    race |
                  White  |      1.000  (base)
                  Black  |      1.840      0.227     4.94   0.000        1.445       2.344
                  Other  |      1.008      0.348     0.02   0.981        0.513       1.982
                         |
                   _cons |      0.047      0.002   -60.18   0.000        0.042       0.052
            ------------------------------------------------------------------------------
            Note: _cons estimates baseline odds.
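            If you want base levels shown without adding an option to every command, Stata also has a session-wide setting. A sketch, assuming the same nhanes2f data; `set showbaselevels all` is intended here as the counterpart of the allbaselevels display option used above.

```stata
* Display base levels in all subsequent estimation tables this session
set showbaselevels all

webuse nhanes2f, clear
logit diabetes i.race, or nolog

* To keep the setting across sessions: set showbaselevels all, permanently
```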
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)

            WWW: https://www3.nd.edu/~rwilliam

            • #7
              Thank you!!
