Categorical Variables as Non-linear Interaction Terms

Jeff Thompson

Join Date: Feb 2018

Posts: 30
#1


04 May 2018, 07:33

Hi All,

// Stata-MP 14.2 - (user-written commands not possible)
// Large, "very short" (N>>T) longitudinal data of firm employment and payroll

I want to look at the effects of employment change (emp_change) on average wages (avg_wage), while taking into account size by using a categorical variable (size_cat).
There's good reason to suspect that the relationship between size_cat and emp_change and/or avg_wage is non-linear. However:

Code:

xtreg avg_wage emp_change##i.size_cat##i.size_cat

and

Code:

xtreg avg_wage emp_change##i.size_cat#i.size_cat

and

Code:

xtreg avg_wage emp_change##(i.size_cat i.size_cat#i.size_cat)

and

Code:

xtreg avg_wage emp_change##(size_cat##size_cat) i.size_cat

all yield the same results, and there is no emp_change#size_cat#size_cat in the regression output.

I'd much appreciate it if anyone has an idea of how to keep size_cat as a categorical variable while making it a non-linear interaction term. The reason is that the end goal is to use size_cat to divide/decompose the effect of emp_change in the margins command:

Code:

margins size_cat, dydx(emp_change)

-Jeff
Eric de Souza

Join Date: Mar 2014

Posts: 587
#2

04 May 2018, 08:10

If you type a##b##c you will get a b c a#b a#c b#c a#b#c
Since b and c are the same variable in your case, and since b is a categorical variable, i.b = i.c, a#i.b = a#i.c, i.b#i.c = i.b, and a#i.b#i.c = a#i.b = a#i.c.
So what you will get is: a i.b a#i.b
Is that what you are getting?

Last edited by Eric de Souza; 04 May 2018, 08:28. Reason: For future reference, my notation was not precise. I was thinking in terms of indicator variables. I have edited it
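Eric's identities can be checked directly on the 0/1 indicator variables that i.size_cat stands for. The sketch below is only an illustration: the data are simulated, and the seed, sample size and helper variables d1-d4, d2_x_d3 and d2_x_d2 are hypothetical. It uses built-in commands only, so it should not need any user-written add-ons:

```stata
* Hypothetical simulated data; only size_cat mirrors the thread's names
clear
set seed 2018
set obs 1000
generate byte size_cat = 1 + floor(4*runiform())   // levels 1, 2, 3, 4

* One 0/1 indicator per level (d1-d4), the building blocks of i.size_cat
tabulate size_cat, generate(d)

* Indicators for two different levels are never both 1, so their product is 0
generate byte d2_x_d3 = d2*d3
assert d2_x_d3 == 0

* An indicator times itself is unchanged (1*1 = 1, 0*0 = 0)
generate byte d2_x_d2 = d2*d2
assert d2_x_d2 == d2
```

Every level-by-level product is either a constant 0 or a copy of an existing indicator. So i.size_cat#i.size_cat adds no new information, which is consistent with all four specifications in #1 giving the same fit.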
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#3

04 May 2018, 08:16

I infer from both the name, size_cat, and your use of the i. prefix, that size_cat is a discrete variable. If so, what you are doing does not make sense.

Your concern is that there is a non-linear relationship between size_cat and your outcome variable, so you have been tempted to use a quadratic model of size_cat. But that is wrong on two levels.

1. If you just use i.emp_change##i.size_cat, Stata will give you separate estimates for each level of the size_cat variable; there is no restriction to a linear relationship with whatever numbers happen to represent size_cat. If the outcome initially rises and then falls as you go up the levels of size_cat, the corresponding estimates in the regression will reflect that. That's because in a categorical variable, the numbers representing the levels are considered to be arbitrary. You could replace them with any other non-negative integers and the fitted model would be exactly the same (at most, a different level might become the base category). The values of those numbers have no meaning at all in this context, so there is no linearity; the very concept of linearity is inapplicable.

2. Attempting to take an interaction of a categorical variable with itself goes nowhere. Think about it. The value of, say, 2.size_cat#3.size_cat is always 0 because it is impossible for the value of size_cat to be both 2 and 3 in the same observation! (And 2.size_cat#2.size_cat is just 2.size_cat again.) So all of these "interaction terms" are constant zeroes or duplicates and are therefore omitted from the analysis, leading to what you see.
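Point 1 (the codes of a categorical variable are arbitrary labels) can be checked by relabeling the groups and refitting. This is a sketch on simulated panel data, not the poster's data: the panel identifiers firm and year, the data-generating process, and the relabeled copy size_alt are all hypothetical. emp_change is entered with c. because, as comes out later in the thread, it is continuous:

```stata
* Hypothetical simulated panel: 500 firms observed for 4 periods
clear
set seed 2018
set obs 2000
generate int firm = ceil(_n/4)
bysort firm: generate byte year = _n
xtset firm year

* Size class 1-4 and a continuous employment change (both simulated)
generate byte size_cat = 1 + floor(4*runiform())
generate double emp_change = 10*runiform()

* Outcome rises from class 1 to 2 and then falls: not monotone in the codes
generate double avg_wage = 30 + 3*(size_cat == 2) - 2*(size_cat == 3) + 0.5*emp_change + rnormal()

* Fit with the original codes and keep the linear predictions
xtreg avg_wage c.emp_change##i.size_cat
predict double xb_orig, xb

* Give the same four groups arbitrary non-negative integer codes and refit
recode size_cat (1 = 7) (2 = 3) (3 = 20) (4 = 5), generate(size_alt)
xtreg avg_wage c.emp_change##i.size_alt
predict double xb_alt, xb

* Only the labels (and hence the base category) changed, so the fitted values agree
assert reldif(xb_orig, xb_alt) < 1e-8
```

The individual coefficients differ between the two fits because the base category moves (class 2 is coded 3, the lowest new code). The predictions, and therefore anything -margins- reports from them, do not.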

So just proceed with i.emp_cat##i.size_cat. I also suggest that you follow your regression with

Code:

margins emp_cat#size_cat
marginsplot

so you can see exactly what's going on.

Edit: Crossed with #2.
Jeff Thompson

Join Date: Feb 2018

Posts: 30
#4

04 May 2018, 08:54

Thanks Eric and Clyde,

Eric: Yes, I think that's clear logic explaining what Stata is doing in my case and why the terms are being dropped.

Clyde: Correct, size_cat is a discrete variable. It hadn't occurred to me that it's incapable of being treated as either linear or non-linear.

Just to follow up and be clearer: emp_change (sorry, not "emp_cat") is a continuous variable, and when I tried your code I got the error "only factor variables and their interactions are allowed".
Then, would the margins code I posted originally be the most analogous alternative?
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#5

04 May 2018, 09:06

So if emp_change is continuous, then you want to use c.emp_change##i.size_cat. The -margins- code will be a bit different. First you have to pick some "interesting" values of emp_change. Usually those would be values that more or less span the range of observed values of emp_change in your data. So let's say, for the sake of illustration, that those values are 1, 3, 5, 7, and 9. Then the -margins- command you want is:

Code:

margins i.size_cat, at(emp_change = (1 3 5 7 9))
marginsplot

Note that -marginsplot- allows pretty much all options available with -graph twoway-, so you can customize the appearance of the graph however you wish. Also, if you get the "wrong" variable on the x-axis from -marginsplot-, specify the -xdimension()- option with the variable you want there.
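The workflow in #5 can be tried end to end on simulated data before running it on the real panel. In this sketch the data-generating process, the identifiers firm and year, and the matrix name M are hypothetical. The at() values 1 3 5 7 9 are the illustrative ones from the post:

```stata
* Hypothetical simulated panel: 500 firms observed for 4 periods
clear
set seed 2018
set obs 2000
generate int firm = ceil(_n/4)
bysort firm: generate byte year = _n
xtset firm year
generate byte size_cat = 1 + floor(4*runiform())
generate double emp_change = 10*runiform()

* Each size class gets its own intercept and its own emp_change slope
generate double avg_wage = 30 + 2*size_cat + (0.5 + 0.2*size_cat)*emp_change + rnormal()

* c. marks emp_change as continuous; i. marks size_cat as categorical
xtreg avg_wage c.emp_change##i.size_cat

* Predicted avg_wage for each size class at five values of emp_change
margins i.size_cat, at(emp_change = (1 3 5 7 9))
matrix M = r(b)          // 4 size classes x 5 at() values = 20 margins

* Default plot: the at() variable emp_change on the x-axis, one line per class
marginsplot

* Same margins with size_cat on the x-axis instead
marginsplot, xdimension(size_cat)
```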
Jeff Thompson

Join Date: Feb 2018

Posts: 30
#6

04 May 2018, 09:56

Yes, I'd forgotten the c. prefix was necessary; I thought it was assumed to be continuous unless otherwise stated.

I had tried this approach once before, but it gave "boring" linear lines and was a bit backwards from what I'm looking for. However, I just discovered that grouping by size_cat rather than by the default emp_change in -marginsplot- gives a much clearer look at the different elements at play than dydx() could (at least, plot #2 appears to be a disaggregation of plot #3). Now the "Hallelujah" song is stuck in my head.

Code:

margins i.size_cat, at(emp_change = (.5(.5)2))
marginsplot

Code:

margins i.size_cat, at(emp_change = (.5(.5)2))
marginsplot, xdimension(size_cat)

Code:

margins size_cat, dydx(emp_change)
marginsplot
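The straight lines, and the sense that the at() plots disaggregate the dydx() plot, follow from the model being linear in emp_change within each size class. The dydx() margin for a class is just the slope of that class's line. The sketch below checks this for size class 2 on simulated data (the data, seed, identifiers firm and year, and matrix names S and P are hypothetical):

```stata
* Hypothetical simulated panel: 500 firms observed for 4 periods
clear
set seed 2018
set obs 2000
generate int firm = ceil(_n/4)
bysort firm: generate byte year = _n
xtset firm year
generate byte size_cat = 1 + floor(4*runiform())
generate double emp_change = 10*runiform()
generate double avg_wage = 30 + 2*size_cat + (0.5 + 0.2*size_cat)*emp_change + rnormal()

xtreg avg_wage c.emp_change##i.size_cat

* Slope of predicted avg_wage with respect to emp_change in each size class
margins size_cat, dydx(emp_change)
matrix S = r(b)          // columns: size classes 1, 2, 3, 4

* Predictions for size class 2 at emp_change = 1 and emp_change = 2
margins, at(emp_change = (1 2) size_cat = 2)
matrix P = r(b)

* A one-unit step along class 2's line equals class 2's dydx()
assert reldif(el(S, 1, 2), el(P, 1, 2) - el(P, 1, 1)) < 1e-8
```

dydx() summarizes each class's line by one number, while the at() plot draws the lines themselves. Both come from the same fitted model.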
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#7

04 May 2018, 10:00

Originally posted by Jeff Thompson: "Yes, I'd forgotten the c. prefix was necessary; I thought it was assumed to be continuous unless otherwise stated."

Yes, somewhat confusingly, variables that are not part of interactions are assumed continuous unless you specify i., but variables in interaction terms are considered discrete unless you specify c.
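The default-prefix rule can be seen by fitting the same variables with and without prefixes. This sketch uses simulated cross-sectional data and plain -regress- only to keep it short; the data and the local macro name rc_plain are hypothetical:

```stata
* Hypothetical simulated data
clear
set seed 2018
set obs 1000
generate byte size_cat = 1 + floor(4*runiform())
generate double emp_change = 10*runiform()
generate double avg_wage = 30 + 2*size_cat + 0.5*emp_change + rnormal()

* Outside an interaction, no prefix means continuous: one slope for size_cat
regress avg_wage emp_change size_cat
display e(df_m)          // 2 model degrees of freedom

* Inside an interaction, no prefix means factor, so the noninteger
* emp_change is rejected and regress stops with an error
capture regress avg_wage emp_change##size_cat
local rc_plain = _rc
display `rc_plain'       // nonzero return code

* With c. on emp_change it runs; size_cat is still treated as a factor
regress avg_wage c.emp_change##size_cat
display e(df_m)          // 1 slope + 3 class shifts + 3 slope shifts = 7
```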