Categorical variable with no baseline

Katerina Novakova

Join Date: Feb 2024

Posts: 12
#1

24 Feb 2024, 02:56

Dear Stata Forum,
I am asking for your knowledge and expertise once again (don't blame me, you are just very helpful :D).

I wanted to ask whether it is possible to use a categorical variable and not have one of the values omitted (and used as a base level).

Let's say I have this code

Code:

xtreg c.DV i.IV i.Control, re vce(robust)

The Control is a categorical variable from 1-4 representing gender (1=male 2=female 3=other 4=prefer not to say).
When running this regression, I find that i1.Control (male) is being used as the base level (and thus not showing in the final regression output).

My question is, is it necessary to have a base level? What exactly does a base level do?

This might simply be a visual matter, as I researched around and found that the results do not change in any meaningful way whichever baseline is used. Nevertheless, I would like to see all the categories of the control in my output table. Is that somehow possible? And if so, what does it mean?

I am able to switch which value is used as the baseline (as per the Stata manual). I thought that changing the code to ibn.Control would help the situation (once again, as per the Stata manual). But it only made Stata take i4.Control as the baseline (thus omitting it in the results table).

Thank you for your help.

Katerina
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

24 Feb 2024, 04:17

Katerina:
no: with a constant in the model, keeping all four indicators would make the regressors perfectly collinear, because the four indicators sum to one in every observation, which is exactly the constant.
Stata omits one level of the categorical variable to shelter you from the so-called dummy variable trap (see "Dummy variable trap in OLS with multiple indicator variables" on Cross Validated, stackexchange.com).
Unfortunately, you cannot omit the constant with -xtreg-, which would be the only way to make the coefficient of the otherwise omitted reference category appear in the regression table.
The trick would work if you were coding an -fe- panel data regression, though (but this is not your case): switching from -xtreg, fe- to -regress- with -panelid- as a categorical predictor and the constant suppressed would replace the _cons coefficient with the one for the reference category.
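As a rough sketch of the points above (not run on your data): the -nlswork- dataset that ships with Stata stands in for your panel, with -race- (three levels) playing the role of Control, and -ln_wage- and -age- as placeholder outcome and covariate. Please note that the last command is pooled OLS with cluster-robust standard errors, not your random-effects model, so it only illustrates the mechanics of suppressing the constant.

```stata
* Stand-in data (an assumption): nlswork, with race in place of Control
webuse nlswork, clear
xtset idcode year

* Default: level 1 of race is the base and does not appear in the table
xtreg ln_wage i.race age, re vce(robust)

* Same model and same estimates; the base level is now listed as a row
* flagged (base), which is a display change only
xtreg ln_wage i.race age, re vce(robust) baselevels

* Pooled OLS without a constant: ibn. keeps every level, and each
* coefficient is that level's own intercept, not a contrast with a base
regress ln_wage ibn.race age, noconstant vce(cluster idcode)
```

If the aim is only to have every category visible in the output table, the -baselevels- display option is probably the least invasive route, as it leaves the model untouched.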

Kind regards,
Carlo
(Stata 19.0)
Katerina Novakova

Join Date: Feb 2024

Posts: 12
#3

24 Feb 2024, 04:28

Hi,

Thank you for your quick response.
Now that you mention it, it makes sense that it is due to the dummy trap.
Do you think there is any value in setting one of the other values of the Control (e.g., i4.Control = prefer not to say) as the baseline? Or would you advise leaving it at the "normal setting" (taking i1.Control = male)?
It might be due to me staring at this dataset for way too long by now, but I cannot think of what the "best" way to go about it would be. Or what implications it would have.

Thank you for your help!

Katerina
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

24 Feb 2024, 07:58

Katerina:
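no, there is not.
The differences between the levels of your categorical predictor would be the same whichever level you choose as the baseline; only the category against which each coefficient is compared changes.
Therefore, there is no such thing as "the best way" to go about that.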
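A minimal sketch of how you could check this yourself, again using the -nlswork- dataset as a stand-in (with -race- in place of Control; the variable names are placeholders, not yours): the pairwise contrasts reported by -pwcompare- should be identical under either base level, and -margins- reports an adjusted prediction for every level, the base included.

```stata
* Stand-in data (an assumption): nlswork, with race in place of Control
webuse nlswork, clear
xtset idcode year

* Base = level 1
xtreg ln_wage ib1.race age, re vce(robust)
pwcompare race, effects

* Base = level 3: the coefficients are re-expressed against a different
* reference, but the pairwise contrasts between levels are unchanged
xtreg ln_wage ib3.race age, re vce(robust)
pwcompare race, effects

* Adjusted predictions for all levels, including the base level
margins race
```

In your model, ib4.Control would make "prefer not to say" the reference in the same way.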

Kind regards,
Carlo
(Stata 19.0)