  • Stata omitted a group in a categorical variable

    Hi everyone, I have a question that I have not been able to answer on my own. I am running models with melogit that include the interaction of two categorical variables, and the Stata output shows that some groups are (empty) and some are (omitted). Does anyone know why this happens? Here is part of the output. Thanks!
    Columns: odds ratio, std. err., z, P>|z|, 95% conf. interval
    1#Europe 102.5953 263.8209 1.80 0.072 .6641806 15847.79
    1#LAC .3899114 .1499569 -2.45 0.014 .1834849 .8285743
    1#Middle East .0625043 .0707475 -2.45 0.014 .0067991 .5746059
    1#North America .0139065 .0177014 -3.36 0.001 .0011474 .1685395
    1#Oceania .0034114 .0045219 -4.29 0.000 .0002539 .0458395
    1#South Asia .0138068 .0112544 -5.25 0.000 .0027941 .0682244
    1#Southeast Asia .212798 .1172752 -2.81 0.005 .0722539 .62672
    2#Africa 1 (empty)
    2#Central Asia 1 (empty)
    2#East Asia .5425233 .3912497 -0.85 0.396 .1319958 2.229854
    2#Europe 1560.857 4028.473 2.85 0.004 9.918933 245618.5
    2#LAC 7.766363 4.845364 3.29 0.001 2.286445 26.37999
    2#Middle East .247236 .3453624 -1.00 0.317 .0159983 3.82076
    2#North America .0307444 .0447312 -2.39 0.017 .0017755 .5323688
    2#Oceania .3679295 .3906408 -0.94 0.346 .045922 2.947871
    2#South Asia 1 (empty)
    2#Southeast Asia 3.689175 4.898811 0.98 0.326 .2732934 49.80002
    3#Africa 1 (empty)
    3#Central Asia 1 (empty)
    3#East Asia 1.342666 .9926419 0.40 0.690 .3152599 5.718301
    3#Europe 3895.888 10095.15 3.19 0.001 24.26308 625557.2
    3#LAC 1 (empty)
    3#Middle East .7352943 1.130309 -0.20 0.841 .0361391 14.96045
    3#North America .0787567 .1034587 -1.93 0.053 .0059994 1.033874
    3#Oceania .3854522 .3676957 -1.00 0.318 .0594266 2.500116
    3#South Asia 1 (empty)
    3#Southeast Asia 1 (empty)
    4#Africa 1 (empty)
    4#Central Asia 1 (empty)
    4#East Asia 4.872969 3.743884 2.06 0.039 1.080983 21.96689
    4#Europe 16537.12 43665.43 3.68 0.000 93.52002 2924254
    4#LAC 1 (empty)
    4#Middle East .6835902 1.047137 -0.25 0.804 .0339554 13.76205
    4#North America .0698741 .0931031 -2.00 0.046 .0051303 .9516821
    4#Oceania 1 (omitted)
    4#South Asia 1 (empty)
    4#Southeast Asia 1 (empty)
    ruanumber .7499106 .1107974 -1.95 0.051 .5613666 1.00178
    _cons 4.50413 5.879984 1.15 0.249 .3486559 58.18684

  • #2
    The ones designated as (empty) are combinations that simply do not occur in the estimation sample. Run -tab var1 var2 if e(sample)- to see this directly. Remember that in any Stata estimation command, any observation that has a missing value for any variable mentioned in the command is excluded from the estimation sample. So, even if your data set has, for example, some observations with var1 = 4 and var2 = Southeast Asia, it may be that all of those observations have missing values for something else mentioned in the -melogit- command.
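
    The check described above can be sketched on a data set that ships with Stata. This is only an illustration under stated assumptions: it uses -sysuse auto- and -regress- in place of the poster's data and -melogit- command, which are not shown in the thread, and the variables rep78 and foreign stand in for the two categorical variables.

    ```stata
    * Illustration only: auto.dta and -regress- stand in for the poster's
    * data and -melogit- model, which are not shown in the thread.
    sysuse auto, clear

    * Interaction of two categorical variables; some rep78/foreign
    * combinations do not occur in these data.
    regress price i.rep78##i.foreign

    * Cell counts restricted to the estimation sample: cells with a
    * count of zero are the ones reported as (empty).
    tabulate rep78 foreign if e(sample)

    * Same table on the full data set, showing missing values, for comparison.
    tabulate rep78 foreign, missing

    * Number of observations excluded from the estimation sample.
    count if !e(sample)
    ```

    Comparing the two tables shows whether a cell is empty in the data themselves or only in the estimation sample because of missing values elsewhere in the command.
    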

    As for 4#Oceania being omitted, remember that with any representation of categorical variables by indicator ("dummy") variables (and interactions of categorical variables are included here) there is always some reference category that is omitted. Failure to do that would lead to collinearity of all those indicators with the constant term in the model, and the model would be unidentifiable and no estimates would be provided. You can select the reference category to be omitted yourself (read -help fvvarlist- to see how) if you prefer, or you can let Stata do it for you. Alternatively, you can add the -noconstant- option to your -melogit- command, and that will resolve the collinearity problem by omitting the constant term without omitting any of your indicators. It makes no difference in terms of any estimable statistics derived from the model, though sometimes it is more convenient to have some specific category (categories in the case of interaction) omitted.
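
    A minimal sketch of the options mentioned above, again on auto.dta with -regress- rather than the poster's -melogit- model (an assumption made so the example runs on any Stata installation). The ib3. and ibn. operators are described in -help fvvarlist-. Note that the sketch pairs -noconstant- with ibn., on the understanding that Stata otherwise still drops a base level.

    ```stata
    * Illustration only: auto.dta and -regress- stand in for the poster's model.
    sysuse auto, clear

    * Default: Stata chooses the base (reference) level, the lowest value.
    regress price i.rep78

    * Choose the base level yourself: level 3 becomes the reference.
    regress price ib3.rep78
    predict double xb_base3

    * No base level (ibn.) together with -noconstant-: every level
    * gets its own coefficient and the constant is dropped instead.
    regress price ibn.rep78, noconstant
    predict double xb_nobase

    * The two parameterizations give the same fitted values.
    assert reldif(xb_base3, xb_nobase) < 1e-6 if e(sample)
    ```

    The -assert- passing silently illustrates the point that the choice of omitted category changes how the coefficients are expressed, not what the model predicts.
    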
