
  • Categorical Variables as Non-linear Interaction Terms

    Hi All,

    // Stata-MP 14.2 - (user-written commands not possible)
    // Large, "very short" (N>>T) longitudinal data of firm employment and payroll



    I want to look at the effects of employment change (emp_change) on average wages (avg_wage), while taking into account size by using a categorical variable (size_cat).
    There's good reason to suspect that the relationship between size_cat and emp_change and/or avg_wage is non-linear. However:


    Code:
     xtreg avg_wage emp_change##i.size_cat##i.size_cat
    and
    Code:
     xtreg avg_wage emp_change##i.size_cat#i.size_cat
    and
    Code:
     xtreg avg_wage emp_change##(i.size_cat i.size_cat#i.size_cat)
    and
    Code:
     xtreg avg_wage emp_change##(size_cat##size_cat) i.size_cat
    all yield the same results, and there is no emp_change#size_cat#size_cat in the regression output.



    I would much appreciate any ideas on how to keep size_cat as a categorical variable while making it a non-linear interaction term. The end goal is to use it as a divider/decomposer for emp_change in the margins command:

    Code:
     margins size_cat, dydx(emp_change)
    -Jeff



  • #2
    If you type a##b##c you will get a b c a#b a#c b#c a#b#c
    Since b and c are the same variable in your case, and since b is a categorical variable, i.b = i.c, a#i.b = a#i.c, i.b#i.c = i.b, and a#i.b#i.c = a#i.b = a#i.c
    So what you will get is a i.b a#i.b
    Is that what you are getting?
    Last edited by Eric de Souza; 04 May 2018, 09:28. Reason: For future reference, my notation was not precise. I was thinking in terms of indicator variables. I have edited it
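
    The expansion rule above can be checked on any dataset. The sketch below is only an illustration: it assumes the auto dataset that ships with Stata, with price, mpg, and foreign standing in for avg_wage, emp_change, and size_cat, and it uses regress rather than xtreg because auto is not panel data.

    ```stata
    * Hypothetical stand-in data: the auto dataset shipped with Stata
    sysuse auto, clear

    * Full-factorial shorthand: a##b
    regress price c.mpg##i.foreign
    scalar r2_short = e(r2)

    * The same model written out term by term: a b a#b
    regress price c.mpg i.foreign c.mpg#i.foreign
    scalar r2_long = e(r2)

    * The two fits should be identical, so the difference should be zero
    display r2_short - r2_long
    ```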



    • #3
      I infer from both the name, size_cat, and your use of the i. prefix, that size_cat is a discrete variable. If so, what you are doing does not make sense.

      Your concern is that there is a non-linear relationship between size_cat and your outcome variable, so you have been tempted to use a quadratic model of size_cat. But that is wrong on two levels.

      1. If you just use i.emp_change##i.size_cat, Stata will give you separate estimates for each level of the size_cat variable--there is no restriction to a linear relationship with whatever numbers happen to represent size_cat. If the outcome initially rises and then falls as you go up the levels of size_cat, the corresponding estimates in the regression will reflect that. That's because in a categorical variable, the numbers representing the levels are considered to be arbitrary. You could replace them by any other distinct non-negative integers and your model's fit and predictions would be absolutely unchanged. The values of those numbers have no meaning at all in this context. So there is no linearity; the very concept of linearity is inapplicable.

      2. Attempting to take an interaction of a categorical variable with itself goes nowhere. Think about it. The value of, say, 2.size_cat#3.size_cat is always 0 because it is impossible for the value of size_cat to be both 2 and 3 in the same observation! So all of these "interaction terms" are constant zeroes and are, therefore, omitted from the analysis--leading to what you see.

      So just proceed with i.emp_cat##i.size_cat. I also suggest that you follow your regression with

      Code:
      margins emp_cat#size_cat
      marginsplot
      so you can see exactly what's going on.
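
      The point about self-interactions can be verified by building the indicator variables by hand. This is a minimal sketch, assuming the auto dataset shipped with Stata and using rep78 as a stand-in for the size category; the names d2, d3, d2_x_d3, and d2_x_d2 are made up for the illustration.

      ```stata
      * Hypothetical stand-in: rep78 plays the role of the size category
      sysuse auto, clear

      * Indicators for two different levels of the same categorical variable
      * (observations with missing rep78 get 0 in both)
      generate byte d2 = (rep78 == 2)
      generate byte d3 = (rep78 == 3)

      * Product of two different levels: no observation can be in both
      generate byte d2_x_d3 = d2 * d3
      assert d2_x_d3 == 0

      * Product of a level with itself: just the original indicator again
      generate byte d2_x_d2 = d2 * d2
      assert d2_x_d2 == d2
      ```

      Both assert commands pass silently, so the "interaction" of a categorical variable with itself adds nothing beyond the categorical variable's own indicators.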

      Edit: Crossed with #2.



      • #4
        Thanks Eric and Clyde,

        Eric: Yes, I think that's clear logic explaining what Stata is doing in my case and why the terms are being dropped.

        Clyde: Correct, size_cat is a discrete variable. It hadn't occurred to me that it's incapable of being treated as either linear or non-linear.

        Just to follow up and be clearer, emp_change (sorry, not "emp_cat") is a continuous variable, and I tried your code but got the error "only factor variables and their interactions are allowed".
        Then, would the margins code I posted originally be the most analogous alternative?



        • #5
          So if emp_change is continuous, then you want to use c.emp_change##i.size_cat. The -margins- code will be a bit different. First you have to pick some "interesting" values of emp_change. Usually those would be values that more or less span the range of observed values of emp_change in your data. So let's say for the sake of illustration that those values are 1, 3, 5, 7, and 9. Then the -margins- command you want is:

          Code:
          margins i.size_cat, at(emp_change = (1 3 5 7 9))
          marginsplot
          Note that -marginsplot- allows pretty much all options available with -graph twoway-, so you can customize the appearance of the graph however you wish. Also, if you get the "wrong" variable on the x-axis from -marginsplot-, specify the -xdimension()- option with the variable you want there.
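
          For anyone who wants to try the whole sequence end to end, here is a rough sketch on the auto dataset shipped with Stata. The variable mapping is an assumption made purely for illustration (price for avg_wage, mpg for emp_change, foreign for size_cat), and regress stands in for xtreg because auto is not panel data.

          ```stata
          * Hypothetical stand-ins: price ~ avg_wage, mpg ~ emp_change, foreign ~ size_cat
          sysuse auto, clear
          regress price c.mpg##i.foreign

          * Predicted outcome for each category at chosen values of the continuous variable
          margins foreign, at(mpg = (15(5)35))

          * Put the categorical variable on the x-axis instead of the at() values
          marginsplot, xdimension(foreign)

          * Slope of the continuous variable within each category
          margins foreign, dydx(mpg)
          ```

          The last command reports one slope per category, which is the same kind of result as margins size_cat, dydx(emp_change) in the original question.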



          • #6
            Yes, I'd forgotten the c. prefix was necessary, I thought it was assumed to be continuous unless otherwise stated.

            I had tried this approach once before but it gave "boring" linear lines and was a bit backwards of what I'm looking for. However, I just discovered that by grouping the variables by size_cat and not the default emp_change in -marginsplot-, it produces a much clearer look at the different elements at play than dydx() could (at least it appears that plot #2 is a disaggregate of plot #3). Now the "Hallelujah" song is stuck in my head.


            Code:
            margins i.size_cat, at(emp_change = (.5(.5)2))
            marginsplot
            [Image: Screen Shot 2018-05-04 at 5.42.54 PM.png]

            Code:
            margins i.size_cat, at(emp_change = (.5(.5)2))
            marginsplot, xdimension(size_cat)
            [Image: Screen Shot 2018-05-04 at 5.36.10 PM.png]

            Code:
             margins size_cat, dydx(emp_change)
            marginsplot

            [Image: Screen Shot 2018-05-04 at 5.22.18 PM.png]



            • #7
              Quoting #6: "Yes, I'd forgotten the c. prefix was necessary, I thought it was assumed to be continuous unless otherwise stated."
              Yes, somewhat confusingly, variables that are not part of interactions are assumed continuous unless you specify i., but variables in interaction terms are considered discrete unless you specify c.
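
              The difference in defaults shows up in the model degrees of freedom. The sketch below is an illustration only, assuming the auto dataset shipped with Stata; mpg happens to take non-negative integer values, so Stata will accept it as a factor variable when no prefix is given inside an interaction. The scalar names are made up for the example.

              ```stata
              sysuse auto, clear

              * Outside an interaction: mpg and foreign each enter as one continuous regressor
              regress price mpg foreign
              scalar df_plain = e(df_m)

              * Inside an interaction with no prefixes: both are treated as factor variables,
              * so the observed mpg-by-foreign cells get their own coefficients
              regress price mpg#foreign
              scalar df_factor = e(df_m)

              * With c., mpg stays continuous: one slope per level of foreign
              regress price c.mpg#i.foreign
              scalar df_cont = e(df_m)

              display df_plain " " df_factor " " df_cont
              ```

              The unprefixed interaction uses far more model degrees of freedom than the other two specifications, which is the symptom to look for when a c. prefix has been forgotten.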

