Stata omitted a group in a categorical variable

You Zhang

Join Date: Dec 2021
Posts: 7

15 Dec 2021, 13:23

Hi everyone, I have a question that I have not been able to answer on my own. I am fitting models with melogit, and I included the interaction of two categorical variables. The Stata output shows some groups as (empty) and one as (omitted). Does anyone know why this happens? Here is part of the output. Thanks!

1#Europe	102.5953	263.8209	1.80	0.072	.6641806	15847.79
1#LAC	.3899114	.1499569	-2.45	0.014	.1834849	.8285743
1#Middle East	.0625043	.0707475	-2.45	0.014	.0067991	.5746059
1#North America	.0139065	.0177014	-3.36	0.001	.0011474	.1685395
1#Oceania	.0034114	.0045219	-4.29	0.000	.0002539	.0458395
1#South Asia	.0138068	.0112544	-5.25	0.000	.0027941	.0682244
1#Southeast Asia	.212798	.1172752	-2.81	0.005	.0722539	.62672
2#Africa	1	(empty)
2#Central Asia	1	(empty)
2#East Asia	.5425233	.3912497	-0.85	0.396	.1319958	2.229854
2#Europe	1560.857	4028.473	2.85	0.004	9.918933	245618.5
2#LAC	7.766363	4.845364	3.29	0.001	2.286445	26.37999
2#Middle East	.247236	.3453624	-1.00	0.317	.0159983	3.82076
2#North America	.0307444	.0447312	-2.39	0.017	.0017755	.5323688
2#Oceania	.3679295	.3906408	-0.94	0.346	.045922	2.947871
2#South Asia	1	(empty)
2#Southeast Asia	3.689175	4.898811	0.98	0.326	.2732934	49.80002
3#Africa	1	(empty)
3#Central Asia	1	(empty)
3#East Asia	1.342666	.9926419	0.40	0.690	.3152599	5.718301
3#Europe	3895.888	10095.15	3.19	0.001	24.26308	625557.2
3#LAC	1	(empty)
3#Middle East	.7352943	1.130309	-0.20	0.841	.0361391	14.96045
3#North America	.0787567	.1034587	-1.93	0.053	.0059994	1.033874
3#Oceania	.3854522	.3676957	-1.00	0.318	.0594266	2.500116
3#South Asia	1	(empty)
3#Southeast Asia	1	(empty)
4#Africa	1	(empty)
4#Central Asia	1	(empty)
4#East Asia	4.872969	3.743884	2.06	0.039	1.080983	21.96689
4#Europe	16537.12	43665.43	3.68	0.000	93.52002	2924254
4#LAC	1	(empty)
4#Middle East	.6835902	1.047137	-0.25	0.804	.0339554	13.76205
4#North America	.0698741	.0931031	-2.00	0.046	.0051303	.9516821
4#Oceania	1	(omitted)
4#South Asia	1	(empty)
4#Southeast Asia	1	(empty)

ruanumber	.7499106	.1107974	-1.95	0.051	.5613666	1.00178
_cons	4.50413	5.879984	1.15	0.249	.3486559	58.18684

Clyde Schechter

Join Date: Apr 2014

Posts: 29805
#2

15 Dec 2021, 14:24

The ones designated as (empty) are combinations that simply do not occur in the estimation sample. Run -tab var1 var2 if e(sample)- to see this directly. Remember that in any Stata estimation command, any observation that has a missing value for any variable mentioned in the command is excluded from the estimation sample. So, even if your data set has, for example, some observations with var1 = 4 and var2 = Southeast Asia, it may be that all of those observations have missing values for something else mentioned in the -melogit- command.
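
To make the point about empty cells concrete, here is a minimal sketch using Stata's bundled auto data rather than the original poster's data. The variables rep78 and foreign stand in for var1 and var2, and -regress- stands in for -melogit- only to keep the example short; e(sample) behaves the same way after any estimation command. In this data set, no foreign car has rep78 equal to 1 or 2, so those interaction cells should be reported as (empty).

```stata
* Illustration only: auto.dta, rep78 and foreign are stand-ins for var1 and var2
sysuse auto, clear

* Interaction of two categorical variables; rep78 has missing values,
* so those observations are dropped from the estimation sample
regress price i.rep78#i.foreign

* Cross-tabulate the two variables within the estimation sample only;
* cells with a zero count correspond to the (empty) rows in the output
tabulate rep78 foreign if e(sample)

* Compare the estimation sample size with the full data set
count if e(sample)
count
```

If a combination shows observations in a plain -tab var1 var2- but not in the version restricted to e(sample), some other variable in the model is missing for all of those observations.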

As for 4#Oceania being omitted: with any representation of categorical variables by indicator ("dummy") variables (and interactions of categorical variables are included here), there is always some reference category that is omitted. Failure to do that would lead to collinearity of all those indicators with the constant term in the model, and the model would be unidentifiable and no estimates would be provided. You can select the reference category to be omitted yourself (read -help fvvarlist- to see how) if you prefer, or you can let Stata do it for you. Alternatively, you can add the -noconstant- option to your -melogit- command, and that will resolve the collinearity problem by omitting the constant term without omitting any of your indicators. It makes no difference in terms of any estimable statistics derived from the model, though sometimes it is more convenient to have some specific category (categories in the case of interaction) omitted.
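
The options described above can be sketched as follows, again with the bundled auto data and -regress- as stand-ins for the original data and -melogit-. The ib3. and ibn. prefixes are factor-variable operators documented in -help fvvarlist-. One assumption to flag: as far as I know, -noconstant- alone leaves the base level omitted, so the sketch pairs it with ibn. ("no base level") to keep every indicator.

```stata
* Illustration only: auto.dta, rep78 is a stand-in categorical variable
sysuse auto, clear

* Default: Stata picks the lowest level of rep78 as the omitted reference
regress price i.rep78
predict double xb_default if e(sample), xb

* Choose the reference category yourself: level 3 as base
regress price ib3.rep78
predict double xb_base3 if e(sample), xb

* Keep every indicator and drop the constant instead
regress price ibn.rep78, noconstant
predict double xb_nocons if e(sample), xb

* The three parameterizations give the same fitted values
assert reldif(xb_default, xb_base3) < 1e-8 if !missing(xb_default)
assert reldif(xb_default, xb_nocons) < 1e-8 if !missing(xb_default)
```

The coefficients differ across the three fits because each is measured against a different reference, but the fitted values agree, which is the sense in which the choice makes no difference to estimable quantities.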