Logistic regression for categorical variables

BE STATA

Join Date: Jul 2021

Posts: 5
#1


12 Jul 2021, 23:58

When running a logistic regression, I'm unable to see all the categories of my variable. For example, for the variable race I can't see all 7 groups that I had; I can see only 6. Without all 7, I can't state the odds ratio for each.

Any help is appreciated!

Thank you
Tags: categorical, regression
Al Perez

Join Date: Oct 2020

Posts: 10
#2

13 Jul 2021, 00:15

If I understand your question correctly, the constant, or some linear combination including the constant, will give you the estimate for the missing category. It would be helpful if you posted some example code and briefly described the data.
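A minimal sketch of that idea, assuming the nhanes2f example dataset that appears later in this thread (race coded 1 = White, 2 = Black, 3 = Other, with White as the default base). If you fit the model on the log-odds scale and pass the constant, or the constant plus one category coefficient, to lincom with the eform option, you should see the estimated odds of the outcome for each category, including the base:

```stata
* Sketch: recover the odds for every race category, including the base,
* from the constant and from linear combinations that include it.
webuse nhanes2f, clear
logit diabetes i.race, nolog

* Base category (White): exp(_cons) is its estimated odds of diabetes
lincom _cons, eform

* Other categories: exp(_cons + coefficient) is that category's odds
lincom _cons + 2.race, eform
lincom _cons + 3.race, eform
```

If you divide any of these odds by the base-category odds, you get back that category's odds ratio. This is why the base category's own odds ratio is 1.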
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#3

13 Jul 2021, 00:16

BE STATA is a good way to approach quant issues, but it conflicts with our preference for real given and family names (please see the FAQ and re-register accordingly, if you want; thanks. Obviously, this is by no means mandatory; it's only a preferable way to introduce yourself to other listers. Probably, if I had chosen Spiderman as a username, I would have been considered a peculiar (old) guy).
That said, it is expected, and safe, that Stata reports n-1 levels of your categorical predictors, so that you're sheltered from the so-called dummy variable trap (https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).
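To see the trap concretely, suppose the model has an intercept plus one 0/1 indicator $D_{ij}$ for each of the $k$ categories (notation introduced here for illustration). Every observation falls in exactly one category, so the indicators always add up to the intercept's column of ones:

$$\sum_{j=1}^{k} D_{ij} = 1 \quad \text{for every observation } i$$

The columns of the design matrix are therefore linearly dependent, and the constant and all $k$ category coefficients cannot be identified together. For any number $c$, adding $c$ to every category coefficient and subtracting it from the intercept leaves every linear predictor unchanged:

$$\beta_0 + \sum_{j=1}^{k}\beta_j D_{ij} = (\beta_0 - c) + \sum_{j=1}^{k}(\beta_j + c)\, D_{ij}$$

Dropping one indicator (the base category) removes the dependence and pins the remaining coefficients down as contrasts with that base.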

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 35432
#4

13 Jul 2021, 00:19

The odds ratio for the category you are not seeing is 1. That is why it is not shown. This is on all fours with why logit won't show you a coefficient for the base category, as it is identically zero.
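In symbols (notation introduced here for illustration), for a predictor with base category 1 and indicators $D_{ij}$ for categories $j = 2, \dots, k$, the model and the odds ratio for category $j$ against the base are:

$$\ln\frac{p_i}{1-p_i} = \beta_0 + \sum_{j=2}^{k}\beta_j D_{ij}, \qquad \mathrm{OR}_j = \frac{e^{\beta_0+\beta_j}}{e^{\beta_0}} = e^{\beta_j}$$

The base category effectively has its coefficient fixed at zero. Its odds ratio is therefore $e^{0} = 1$ by construction, and there is nothing to estimate.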

Code:

help factor variable

explains the flexibility of selecting different categories as base. It is usually a good idea to choose a common category as base, but your research problem may imply otherwise.
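For instance, take the nhanes2f data used later in this thread (an assumption for illustration; race is coded 1 = White, 2 = Black, 3 = Other). The ib. operator and fvset should let you pick the base directly. A sketch:

```stata
* Sketch: choosing the base category of a factor variable
webuse nhanes2f, clear

* Use level 2 (Black) as the base for this command only
logit diabetes ib2.race, or nolog

* Use the most frequent level as the base
logit diabetes ib(freq).race, or nolog

* Set the base once; later commands using plain i.race pick it up
fvset base 2 race
logit diabetes i.race, or nolog

* Restore the default base (lowest level) afterwards
fvset clear race
```

With Black as the base, the odds ratio for White should be the reciprocal of the Black-versus-White odds ratio shown later in the thread (1/1.840, about 0.543).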

Please note our longstanding request to use full real names here. See https://www.statalist.org/forums/help#realnames
BE STATA

Join Date: Jul 2021

Posts: 5
#5

13 Jul 2021, 01:26

Thank you all!

Richard Williams

Join Date: Apr 2014
Posts: 4945

13 Jul 2021, 06:47

If you like, you can make Stata show you the otherwise omitted base categories. It can be helpful if it isn't obvious what the base is. Indeed, I find it helpful when tables explicitly include the base category so I don't have to refer back to the text to see what it is. Example:

Code:

. webuse nhanes2f, clear

. logit diabetes i.race, allbase or nolog

Logistic regression                                     Number of obs = 10,335
                                                        LR chi2(2)    =  21.79
                                                        Prob > chi2   = 0.0000
Log likelihood = -1988.1717                             Pseudo R2     = 0.0055

------------------------------------------------------------------------------
    diabetes | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        race |
      White  |          1  (base)
      Black  |   1.840272   .2270245     4.94   0.000      1.44502    2.343636
      Other  |   1.008307   .3477382     0.02   0.981     .5129039    1.982209
             |
       _cons |   .0467322   .0023787   -60.18   0.000      .042295    .0516349
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

.

Other sometimes-useful display options are documented at

Code:

help estimation options##display_options

For example,

Code:

. logit diabetes i.race, allbase or nolog cformat(%8.3f)

Logistic regression                                     Number of obs = 10,335
                                                        LR chi2(2)    =  21.79
                                                        Prob > chi2   = 0.0000
Log likelihood = -1988.1717                             Pseudo R2     = 0.0055

------------------------------------------------------------------------------
    diabetes | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        race |
      White  |      1.000  (base)
      Black  |      1.840      0.227     4.94   0.000        1.445       2.344
      Other  |      1.008      0.348     0.02   0.981        0.513       1.982
             |
       _cons |      0.047      0.002   -60.18   0.000        0.042       0.052
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam


BE STATA

Join Date: Jul 2021

Posts: 5
#7

13 Jul 2021, 10:53

Thank you!!
