
  • Interaction between first differenced and categorical variables

    Hi all,

    I am writing about a syntax-related issue on which I would like some opinions, as I could not find any other posts addressing this specific question. The issue concerns the interaction between a first-differenced variable and a categorical variable in an OLS regression (I am using reghdfe by Sergio Correia, but I guess this is a general issue). When I try this, I get the error "the 'D' operator is not allowed with factor variables". Specifically, I ran the following code:

    Code:
     reghdfe d.log_GO d.yearly_avg_t#i.category d.yearly_avg_t_sq#i.category
    I have read this post by Jeff Wooldridge and I understand the Stata logic. However, in my case I am not trying to difference the categorical variable; I just want to interact it with first-differenced variables.

    Since I imagine this is due to the D. operator being distributed over the interaction, I have also tried the following alternative syntaxes:

    Code:
     reghdfe d.log_GO d.(yearly_avg_t)#i.category d.(yearly_avg_t_sq)#i.category
    Which yields the same error as before and:

    Code:
     reghdfe d.log_GO (d.yearly_avg_t)#i.category (d.yearly_avg_t_sq)#i.category
    Which yields the previous error plus: "invalid interaction specification".

    Of course I can overcome this issue by manually FDing the "category" variable, but it would be interesting to know whether there is a more elegant solution and/or I am missing something here.

    Thanks in advance

  • #2
    Stata's varlist parsing code assumes factor variables when specified in an interaction, so you will have to use the c. operator to prevent that.

    Try
    Code:
    reghdfe d.log_GO c.d.yearly_avg_t#i.category c.d.yearly_avg_t_sq#i.category
    The extra dot is not necessary; you could use cd. instead (for example, cd.yearly_avg_t#i.category).
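
    For comparison, the manual route mentioned in #1 (creating the differences by hand) might look like the sketch below. It rests on assumptions that are not in the thread: the panel identifiers id and year are hypothetical names, the data are assumed to be declared as a panel, and regress stands in for reghdfe to keep the example minimal.

    Code:
    * hypothetical panel identifiers; the thread does not name them
    xtset id year
    * store the first differences as ordinary variables
    generate double d_t    = D.yearly_avg_t
    generate double d_t_sq = D.yearly_avg_t_sq
    * ordinary variables only need c. inside the interaction
    regress D.log_GO c.d_t#i.category c.d_t_sq#i.category
    The operator form c.d.yearly_avg_t#i.category should fit the same terms without creating new variables.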


    • #3
      Dear Jeff Pitblado (StataCorp) thank you very much, I don't know why I did not think about it!

      As a follow-up, I have another question, although it relates to a different command. Sorry about that. I am asking it in this thread since it still concerns the "interaction between FD and categorical variables".

      For some very specific reasons I need to use the command margins with the option expression. However, I am not able to estimate the coefficients at the different levels of the "category" variable.

      I use:

      Code:
       margins i.category, expression(_b[d.c.yearly_avg_t] +             ///
                              2*_b[d.c.yearly_avg_t_sq]*c.d.yearly_avg_t)                 ///
                              at(d.c.yearly_avg_t  = (0(5)20)) level(95)
      Although the resulting table contains the various categories of i.category, the estimates do not vary across categories within each of the five values of at(d.c.yearly_avg_t = (0(5)20)). It seems that the interactions with i.category are not added to the estimates (or are treated as 0): the reported estimates are those of the uninteracted variables, constant across categories at each value of d.c.yearly_avg_t. Would you have any idea how to solve this?

      Thank you again!


      • #4
        Your expression does not contain category.


        • #5
          Hi Jeff Pitblado (StataCorp), I posted the shorter code because I had previously tried it with category, but it delivered the same result:


          Code:
          margins, expression(_b[d.c.yearly_avg_t] +                                     ///
                                  2*_b[d.c.yearly_avg_t_sq]*c.d.yearly_avg_t    +                ///
                                  _b[1.category#d.c.yearly_avg_t] +                                 ///
                                  2*_b[1.category#d.c.yearly_avg_t_sq]*c.d.yearly_avg_t +            ///
                                  _b[2.category#d.c.yearly_avg_t] +                                 ///
                                  2*_b[2.category#d.c.yearly_avg_t_sq]*c.d.yearly_avg_t +            ///
                                  _b[3.category#d.c.yearly_avg_t] +                                 ///
                                  2*_b[3.category#d.c.yearly_avg_t_sq]*c.d.yearly_avg_t)            ///                        
                                  at(d.c.yearly_avg_t  = (0(5)20) category = (1(1)3)) level(95)
          I had also tried to use i.category in the main part of the code (between margins and the comma) but had the same result. I thought this was wrong, that is why I pasted the shorter version in my previous comment.

          I could not find a solution in the Stata documentation. The closest thing I could find there is the example using "age^1.5" on page 1614. However, in my case I need to use "expression" because I need my estimates to vary over the support of c.d.yearly_avg_t.
          I would really appreciate any suggestions here, thanks in advance.


          • #6
            Assuming yearly_avg_t_sq was generated from yearly_avg_t*yearly_avg_t and that you want to compute the marginal effect of yearly_avg_t, I believe the expression should be
            Code:
            local exp // empty
            local plus // empty
            levelsof category if e(sample) , local(levels)
            foreach i of local levels {
                local exp `exp' `plus' ///
                    _b[d.yearly_avg_t#`i'.category]*`i'.category + ///
                    2*_b[d.yearly_avg_t_sq#`i'.category]*d.yearly_avg_t*`i'.category
                local plus "+"
            }
            margins category, expression(`exp') at(d.yearly_avg_t=(0(5)20)) level(95)
            The c. notation is not necessary in _b notation or expressions.


            • #7
              Thank you very much Jeff Pitblado (StataCorp), I really appreciate your solution!
              I have one last question on this: I would like to plot the estimates over the variable yearly_avg_t for the different categories of the variable category. Normally I would use mplotoffset, but I cannot here as I have different margins results. Moreover, "estimates store" + coefplot would not work here, as the option "at" is not allowed (I guess because of the expression option in margins).

              I could save the results in a matrix and plot them using coefplot but I was wondering whether there was a more intuitive option.
              Thank you!


              • #8
                Sorry Romano, after spending more time thinking about your model specification, I must conclude that my coding suggestion does not give you the intended marginal effect.

                Your use of d.yearly_avg_t_sq complicates things beyond my ability to help.

                Let's assume time is synonymous with _n, then we have
                Code:
                    d.yearly_avg_t = yearly_avg_t[_n] - yearly_avg_t[_n-1]
                and if
                Code:
                    yearly_avg_t_sq = yearly_avg_t*yearly_avg_t
                then
                Code:
                    d.yearly_avg_t_sq = yearly_avg_t[_n]*yearly_avg_t[_n] - yearly_avg_t[_n-1]*yearly_avg_t[_n-1]
                Thus my code does not compute the marginal effect with respect to d.yearly_avg_t as I assumed you wanted, because I do not know how to take the partial derivative of d.yearly_avg_t_sq with respect to d.yearly_avg_t. I rushed my suggestion without thinking about what happens to the lagged values, not realizing I was taking partial derivatives with respect to yearly_avg_t and ignoring the lagged values.

                Things would be much simpler if your model specification was
                Code:
                reghdfe d.log_GO c.d.yearly_avg_t#i.category c.d.yearly_avg_t#c.d.yearly_avg_t#i.category
                Then, the marginal effect with respect to d.yearly_avg_t at each level of category would be easy to produce via
                Code:
                margins category, dydx(d.yearly_avg_t) at(d.yearly_avg_t=(0(5)20))
                As for your question in #7, I think you are falling into the same trap I did. You do not have a model or marginal effects that are directly related to the values of yearly_avg_t. Your model is specified in terms of a complicated linear combination of current and lagged values of yearly_avg_t. Even my previous model specification is defined in terms of d.yearly_avg_t instead of yearly_avg_t, so the only appropriate graph would have to be plotted over the values of d.yearly_avg_t, that is over the values 0, 5, 10, 15, 20 as you specify in your initial calls to margins.
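
                For the simpler specification above, the quantity that margins with dydx() reports can be sketched by hand. The symbols \(\beta_{1k}\) and \(\beta_{2k}\) are notation introduced here (not from the thread) for the category-\(k\) coefficients on c.d.yearly_avg_t and on its square, \(D.x\) stands for d.yearly_avg_t, \(D.y\) for d.log_GO, and the constant and any absorbed effects are ignored:

                $$E[D.y \mid \text{category}=k] = \beta_{1k}\, D.x + \beta_{2k}\, (D.x)^2 \quad\Longrightarrow\quad \frac{\partial E[D.y \mid \text{category}=k]}{\partial D.x} = \beta_{1k} + 2\,\beta_{2k}\, D.x.$$

                At \(D.x = 5\), for example, this evaluates to \(\beta_{1k} + 10\,\beta_{2k}\), so the effect varies linearly over the at() values 0, 5, 10, 15, 20.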


                • #9
                  Originally posted by Jeff Pitblado (StataCorp)
                  I do not know how to take the partial derivative of d.yearly_avg_t_sq with respect to d.yearly_avg_t
                  $$D.(x^2) = x^2 - L1.x^2 = \left(x+ L1.x\right) \times \left(x-L1.x\right) = x D.x + L1.x D.x.$$

                  so

                  $$\frac{\partial D.(x^2)}{\partial D.x}= \frac{\partial (x D.x + L1.x D.x)}{\partial D.x}= x+ L1.x.$$
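
                  A quick numerical check of the factorization, using arbitrary illustrative values \(x = 3\) and \(L1.x = 1\) (so \(D.x = 2\)):

                  $$D.(x^2) = 3^2 - 1^2 = 8, \qquad (x + L1.x) \times D.x = (3 + 1) \times 2 = 8.$$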


                  • #10
                    Thanks Andrew, that helps with the expression.
                    Code:
                    local e1 // empty
                    local e2 // empty
                    local plus // empty
                    levelsof category if e(sample) , local(levels)
                    foreach i of local levels {
                        local e1 `e1' `plus' _b[d.yearly_avg_t#`i'.category]*`i'.category
                        local e2 `e2' `plus' _b[d.yearly_avg_t_sq#`i'.category]*`i'.category
                        local plus "+"
                    }
                    * parentheses so that the multiplier applies to every term collected in e2
                    local exp `e1' + (yearly_avg_t+l.yearly_avg_t)*(`e2')
                    margins category, expression(`exp') at(d.yearly_avg_t=(0(5)20)) level(95)
                    Since yearly_avg_t and l.yearly_avg_t are not in the model
                    specification, margins will not allow us to fix their values, so the
                    marginal effects are computed as averages over the observed values.

                    Revisiting #7, it is still not obvious how to plot the marginal effects over
                    the variable yearly_avg_t and different levels of category, but
                    you can use marginsplot to plot the marginal effects over the
                    at() values of d.yearly_avg_t and different levels of
                    category. Simply call marginsplot after the above call to
                    margins.
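
                    Written out, the quantity this expression is meant to evaluate for a single level \(k\) of category can be sketched as follows. The symbols are notation introduced here: \(\beta_{1k}\) for _b[d.yearly_avg_t#k.category], \(\beta_{2k}\) for _b[d.yearly_avg_t_sq#k.category], and \(x\) for yearly_avg_t. When margins sets category to \(k\), the indicator terms for all other levels are zero, leaving

                    $$\beta_{1k} + (x + L1.x)\,\beta_{2k},$$

                    which is then averaged over the observed values of \(x\) and \(L1.x\).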


                    • #11
                      Hi Jeff Pitblado (StataCorp) and Andrew Musau thank you very much for your support!
                      As for the replies in comments #9 and #10, this is exactly the partial derivative of D.x^2 with respect to D.x, and I really appreciate the contribution. However, I don’t think this is what I am looking for. I should have been clearer, but it was not clear to me either; I only understood it after thinking about comments #9 and #10.

                      I think the case you mention would arise assuming that the estimates of X and L1.X differed and I needed to estimate D.X. In my case, I am still interested in the marginal effect "in levels" and use the FD operator only to remove non-stationarity in my variables. Said differently, I want to constrain the coefficients of X = (x + x^2) to be equal in magnitude and opposite in sign to the coefficients of L1.X - according to Newell et al. (2021) pages 7 and 8.

                      For these reasons, I think that I would need to estimate the model in FD, then plug the resulting estimates into "margins, expression()" as if it were a model in levels (e.g. using _b[d.yearly_avg_t] as if it were _b[yearly_avg_t]). Again, because:

                      \[ \beta_1 T_{i,t} - \beta_1 T_{i,t-1} = \beta_1 (T_{i,t} - T_{i,t-1}) \]
                      Assuming that this is correct, I came up with the following code, which builds on the code from answer #6.

                      Code:
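                      * collect the estimation-sample levels of category, as in #6
                      levelsof category if e(sample), local(levels)
                      foreach i of local levels {        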
                      
                          margins, expression(_b[d.c.yearly_avg_t]                                     +         ///
                                            2*_b[d.c.yearly_avg_t_sq]*c.d.yearly_avg_t                +        ///
                                              _b[d.c.yearly_avg_t#`i'.category]                         +        ///
                                            2*_b[d.c.yearly_avg_t_sq#`i'.category]*c.d.yearly_avg_t)            ///
                                              at(d.c.yearly_avg_t  = (0(5)20)) level(95) saving(marg_exp_`i', replace)
                                                  
                      }

                      It would be great to have feedback on this, should you have some more time.

                      I have also realised that the SSC command combomarginsplot can be used here to plot the saved estimates.
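
                      A sketch of that plotting step, assuming category takes the levels 1, 2, and 3 (as in #5), so that the loop above has saved marg_exp_1, marg_exp_2, and marg_exp_3, and that combomarginsplot is installed from SSC. The labels are purely illustrative:

                      Code:
                      * one-time install from SSC
                      ssc install combomarginsplot
                      * overlay the saved margins results, one series per file
                      combomarginsplot marg_exp_1 marg_exp_2 marg_exp_3, ///
                          labels("Category 1" "Category 2" "Category 3")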


                      Thanks a lot for your help, I really appreciate it!
