Interaction between first differenced and categorical variables

Romano Tarsia

Join Date: Jul 2022

Posts: 20
#1

15 Jan 2024, 07:56

Hi all,

I am writing about a syntax-related issue on which I would like some opinions, as I could not find any other posts addressing this specific question. The issue relates to the interaction between a first-differenced variable and a categorical variable in an OLS regression (using reghdfe from Sergio Correia, but I guess this is a general issue). When I try this I get the error "the 'D' operator is not allowed with factor variables". Specifically, I ran the following code:

Code:

reghdfe d.log_GO d.yearly_avg_t#i.category d.yearly_avg_t_sq#i.category

I have read this post by Jeff Wooldridge and I understand the Stata logic. However, in my case I am not trying to difference the categorical variable; I just want to interact it with a first-differenced variable.

Since I imagine this is due to the D. operator being distributed over the interaction, I have also tried the following alternative syntaxes:

Code:

reghdfe d.log_GO d.(yearly_avg_t)#i.category d.(yearly_avg_t_sq)#i.category

Which yields the same error as before and:

Code:

reghdfe d.log_GO (d.yearly_avg_t)#i.category (d.yearly_avg_t_sq)#i.category

Which yields the previous error plus: "invalid interaction specification".

Of course I can overcome this issue by manually FDing the "category" variable, but it would be interesting to know whether there is a more elegant solution and/or I am missing something here.

Thanks in advance
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 700
#2

15 Jan 2024, 09:15

Stata's varlist parsing code assumes factor variables when specified in an interaction, so you will have to use the c. operator to prevent that.

Try

Code:

reghdfe d.log_GO c.d.yearly_avg_t#i.category c.d.yearly_avg_t_sq#i.category

The extra dot is not necessary; you could use cd. instead.
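As a minimal sketch of the point in #2, the two spellings can be compared on a built-in dataset. This is only an illustration under stated assumptions: it uses regress instead of reghdfe, the grunfeld example data instead of the original poster's data, and grp is a made-up three-level category.

```stata
* Toy panel shipped with Stata; grp is a hypothetical category
webuse grunfeld, clear
xtset company year
generate byte grp = mod(company, 3) + 1

* c. marks the differenced variable as continuous inside the interaction
regress d.invest c.d.mvalue#i.grp

* Same specification written with the combined cd. prefix
regress d.invest cd.mvalue#i.grp
```

Both calls should report the same coefficients, one slope on the differenced variable per level of grp.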
Romano Tarsia

Join Date: Jul 2022

Posts: 20
#3

15 Jan 2024, 11:13

Dear Jeff Pitblado (StataCorp) thank you very much, I don't know why I did not think about it!

As a follow-up, I have another question, although it relates to a different command. Sorry about that. I ask it in this thread since it still relates to the "interaction between FD and categorical variables".

For some very specific reasons I need to use the command margins with the option expression. However, I am not able to estimate the coefficients at the different levels of the "category" variable.

I use:

Code:

margins i.category, expression(_b[d.c.yearly_avg_t] + ///
    2*_b[d.c.yearly_avg_t_sq]*c.d.yearly_avg_t) ///
    at(d.c.yearly_avg_t = (0(5)20)) level(95)

But although the resulting table lists the categories of i.category, the estimates are constant across categories within each of the five values in at(d.c.yearly_avg_t = (0(5)20)). It seems that the interactions with i.category are not added to the estimates (or are treated as 0): the reported estimates are those from the uninteracted variables. Would you have any idea how to solve this?

Thank you again!
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 700
#4

15 Jan 2024, 12:59

Your expression does not contain category.

Romano Tarsia

Join Date: Jul 2022
Posts: 20
#5

16 Jan 2024, 05:02

Hi Jeff Pitblado (StataCorp), I posted the previous code because I had already tried it with category and it delivered the same result:

Code:

margins, expression(_b[d.c.yearly_avg_t] +                                ///
                    2*_b[d.c.yearly_avg_t_sq]*c.d.yearly_avg_t +            ///
                    _b[1.category#d.c.yearly_avg_t] +                     ///
                    2*_b[1.category#d.c.yearly_avg_t_sq]*c.d.yearly_avg_t + ///
                    _b[2.category#d.c.yearly_avg_t] +                     ///
                    2*_b[2.category#d.c.yearly_avg_t_sq]*c.d.yearly_avg_t + ///
                    _b[3.category#d.c.yearly_avg_t] +                     ///
                    2*_b[3.category#d.c.yearly_avg_t_sq]*c.d.yearly_avg_t)  ///
        at(d.c.yearly_avg_t = (0(5)20) category = (1(1)3)) level(95)

I had also tried to use i.category in the main part of the command (between margins and the comma) but got the same result. I thought this was wrong, which is why I pasted the shorter version in my previous comment.

I could not find a solution in the Stata documentation. The closest thing I could find there is the example using "age^1.5" on page 1614. However, in my case I need to use "expression" because I need my estimates to vary over the c.d.yearly_avg_t support.
I would really appreciate any suggestions here, thanks in advance.


Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 700
#6

16 Jan 2024, 07:45

Assuming yearly_avg_t_sq was generated from yearly_avg_t*yearly_avg_t and that you want to compute the marginal effect of yearly_avg_t, I believe the expression should be

Code:

local exp // empty
local plus // empty
levelsof category if e(sample) , local(levels)
foreach i of local levels {
    local exp `exp' `plus' ///
        _b[d.yearly_avg_t#`i'.category]*`i'.category + ///
        2*_b[d.yearly_avg_t_sq#`i'.category]*d.yearly_avg_t*`i'.category
    local plus "+"
}
margins category, expression(`exp') at(d.yearly_avg_t=(0(5)20)) level(95)

The c. notation is not necessary in _b notation or expressions.


Romano Tarsia

Join Date: Jul 2022

Posts: 20
#7

16 Jan 2024, 08:46

Thank you very much Jeff Pitblado (StataCorp), I really appreciate your solution!
I have one last question on this, I would like to plot the estimates over the variable yearly_avg_t for the different categories of the variable category. Normally I would use mplotoffset, but I cannot here as I have different margins results. Moreover, "estimates store" + coefplot would not work here as the option "at" is not allowed (I guess because of the expression option in margins).

I could save the results in a matrix and plot them using coefplot but I was wondering whether there was a more intuitive option.
Thank you!
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 700
#8

16 Jan 2024, 10:38

Sorry Romano, after spending more time thinking about your model specification, I must conclude that my coding suggestion does not give you the intended marginal effect.

Your use of d.yearly_avg_t_sq complicates things beyond my ability to help.

Let's assume time is synonymous with _n, then we have

Code:

d.yearly_avg_t = yearly_avg_t[_n] - yearly_avg_t[_n-1]

and if

Code:

yearly_avg_t_sq = yearly_avg_t*yearly_avg_t

then

Code:

d.yearly_avg_t_sq = yearly_avg_t[_n]*yearly_avg_t[_n] - yearly_avg_t[_n-1]*yearly_avg_t[_n-1]

Thus my code does not compute the marginal effect with respect to d.yearly_avg_t as I assumed you wanted, because I do not know how to take the partial derivative of d.yearly_avg_t_sq with respect to d.yearly_avg_t. I rushed my suggestion without thinking about what happens to the lagged values, not realizing I was taking partial derivatives with respect to yearly_avg_t and ignoring the lagged values.

Things would be much simpler if your model specification was

Code:

reghdfe d.log_GO c.d.yearly_avg_t#i.category c.d.yearly_avg_t#c.d.yearly_avg_t#i.category

Then, the marginal effect with respect to d.yearly_avg_t at each level of category would be easy to produce via

Code:

margins category, dydx(d.yearly_avg_t) at(d.yearly_avg_t=(0(5)20))

As for your question in #7, I think you are falling into the same trap I did. You do not have a model or marginal effects that are directly related to the values of yearly_avg_t. Your model is specified in terms of a complicated linear combination of current and lagged values of yearly_avg_t. Even my previous model specification is defined in terms of d.yearly_avg_t instead of yearly_avg_t, so the only appropriate graph would have to be plotted over the values of d.yearly_avg_t, that is over the values 0, 5, 10, 15, 20 as you specify in your initial calls to margins.
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#9

16 Jan 2024, 23:28

Originally posted by Jeff Pitblado (StataCorp):

I do not know how to take the partial derivative of d.yearly_avg_t_sq with respect to d.yearly_avg_t

$$D.(x^2) = x^2 - L1.x^2 = \left(x+ L1.x\right) \times \left(x-L1.x\right) = x D.x + L1.x D.x.$$

so

$$\frac{\partial D.(x^2)}{\partial D.x}= \frac{\partial (x D.x + L1.x D.x)}{\partial D.x}= x+ L1.x.$$
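The factorization in #9 can be checked numerically. The following is a minimal sketch on simulated data; the variables t, x, and x_sq are made up for the illustration and are not from the original thread.

```stata
* Simulated time series; all names here are hypothetical
clear
set seed 12345
set obs 20
generate t = _n
tsset t
generate double x = rnormal(15, 5)
generate double x_sq = x^2

* D.(x^2) should equal (x + L.x)*D.x in every period after the first
assert reldif(d.x_sq, (x + l.x)*d.x) < 1e-12 if t > 1
```

assert is silent when the condition holds for every observation and stops with an error otherwise, so no output here means the identity holds up to rounding.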
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 700
#10

17 Jan 2024, 08:59

Thanks Andrew, that helps with the expression.

Code:

local e1 // empty
local e2 // empty
local plus // empty
levelsof category if e(sample) , local(levels)
foreach i of local levels {
    local e1 `e1' `plus' _b[d.yearly_avg_t#`i'.category]*`i'.category
    local e2 `e2' `plus' _b[d.yearly_avg_t_sq#`i'.category]*`i'.category
    local plus "+"
}
* parentheses keep the multiplication applied to the whole sum in e2
local exp (`e1') + (yearly_avg_t+l.yearly_avg_t)*(`e2')
margins category, expression(`exp') at(d.yearly_avg_t=(0(5)20)) level(95)

Since yearly_avg_t and l.yearly_avg_t are not in the model specification, margins will not allow us to fix their values, so the marginal effects are computed as averages over the observed values.

Revisiting #7, it is still not obvious how to plot the marginal effects over the variable yearly_avg_t and different levels of category, but you can use marginsplot to plot the marginal effects over the at() values of d.yearly_avg_t and different levels of category. Simply call marginsplot after the above call to margins.
Romano Tarsia

Join Date: Jul 2022

Posts: 20
#11

17 Jan 2024, 11:22

Hi Jeff Pitblado (StataCorp) and Andrew Musau thank you very much for your support!
As for the replies in comments #9 and #10, this is exactly the partial derivative of D.x^2 wrt D.x, I really appreciate the contribution. However, I don’t think this is what I am looking for. I should have been more clear but it was not clear to me either and I just understood this now after thinking about comments #9 and #10.

I think the case you mention would arise assuming that the estimates of X and L1.X differed and I needed to estimate D.X. In my case, I am still interested in the marginal effect "in levels" and use the FD operator only to remove non-stationarity in my variables. Said differently, I want to constrain the coefficients of X = (x + x^2) to be equal in magnitude and opposite in sign to the coefficients of L1.X - according to Newell et al. (2021) pages 7 and 8.

For these reasons, I think that I would need to estimate the model in FD, then plug the resulting estimates into "margins, expression()" as if it were a model in levels (e.g. using _b[d.yearly_avg_t] as if it were _b[yearly_avg_t]). Again because:

\[ \beta_1 T_{i,t} - \beta_1 T_{i,t-1} = \beta_1 (T_{i,t} - T_{i,t-1}) \]
Assuming that this is correct, I came up with the following code that builds on the code from answer #6.

Code:

foreach i of local levels {
    margins, expression(_b[d.c.yearly_avg_t] +                                  ///
                        2*_b[d.c.yearly_avg_t_sq]*c.d.yearly_avg_t +              ///
                        _b[d.c.yearly_avg_t#`i'.category] +                     ///
                        2*_b[d.c.yearly_avg_t_sq#`i'.category]*c.d.yearly_avg_t)  ///
            at(d.c.yearly_avg_t = (0(5)20)) level(95) saving(marg_exp_`i', replace)
}
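
The reasoning in #11 can be written out as a short derivation. This is only a sketch under assumptions that are not stated in the thread: a levels model with a unit effect \(\alpha_i\), category-specific coefficients \(\beta_{1c}\) and \(\beta_{2c}\) for category \(c\), and an error term \(\varepsilon_{i,t}\); all of these symbols are introduced here for illustration.

$$y_{i,t} = \alpha_i + \beta_{1c} T_{i,t} + \beta_{2c} T_{i,t}^2 + \varepsilon_{i,t}$$

Subtracting the same equation at \(t-1\) removes \(\alpha_i\) and leaves the coefficients unchanged:

$$y_{i,t} - y_{i,t-1} = \beta_{1c} (T_{i,t} - T_{i,t-1}) + \beta_{2c} (T_{i,t}^2 - T_{i,t-1}^2) + (\varepsilon_{i,t} - \varepsilon_{i,t-1})$$

so the coefficients estimated in first differences are the levels coefficients, and the marginal effect in levels for category \(c\) is

$$\frac{\partial y_{i,t}}{\partial T_{i,t}} = \beta_{1c} + 2 \beta_{2c} T_{i,t}.$$

Under this reading, the value that multiplies \(2\beta_{2c}\) is a level of the temperature variable, so the grid 0(5)20 passed to at() would be interpreted as levels of T rather than as changes in T.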

It would be great to get feedback on this, should you have some more time.

I have also realised that the SSC command combomarginsplot can be used here to plot the saved estimates.

Thanks a lot for your help, I really appreciate it!
