  • Interpreting an interaction term when a growth rate is included

    Dear Statalist, I am wondering if you can help me interpret the magnitude of the following interaction term (cL.x#cL.newintra2 in the output below). Here "x" is the growth rate of a firm's capacity, "newintra2" is standardized, and the dependent variable is the growth rate of firms' sales. I understand that I only need the interaction to be significant (not the two coefficients separately, right?).

    In this case, the interpretation would it be: increasing 1 standard deviation of "newintra2" will increase the sales of firms by 0.95% for those firms increasing their capacity? Or would it be: increasing 1 standard deviation of "newintra2" will increase the sales of firms by 0.95 percentage points for those firms increasing their capacity? When I talk about firms increasing their capacity, is the explanation given in the last sentence enough? Or should I say: for firms increasing their capacity by 1% (as it is a growth rate)?

    Is this enough for interpreting the magnitude? Or should I include in the explanation the actual value of the standard deviation of "newintra2"? If so, should the SD be that of the unstandardized variable, or that of the standardized variable shown in the first table below?

    Thanks a lot for your help!

    Code:
    . sum newintra2
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
       newintra2 |     43,133   -.0562024    .9248564  -.9237571   4.713312
    Code:
    HDFE Linear regression                            Number of obs   =     37,865
    Absorbing 2 HDFE groups                           F(   9,     32) =       2.36
    Statistics robust to heteroskedasticity           Prob > F        =     0.0357
                                                      R-squared       =     0.0511
                                                      Adj R-squared   =    -0.0828
                                                      Within R-sq.    =     0.0003
    Number of clusters (sectors) =         33         Root MSE        =     3.8265
    
                                        (Std. Err. adjusted for 33 clusters in sectors)
    -----------------------------------------------------------------------------------
                      |               Robust
                    y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
                    x |
                  L1. |   .1697371   .2675454     0.63   0.530    -.3752349    .7147092
                      |
            newintra1 |
                  L1. |    .013568   .0385366     0.35   0.727    -.0649286    .0920645
                      |
            newintra2 |
                  L1. |  -.0750884   .0560337    -1.34   0.190    -.1892253    .0390486
                      |
            newinter1 |
                  L1. |   .1201949   .1601248     0.75   0.458    -.2059686    .4463585
                      |
            newinter2 |
                  L1. |    .106954   .1257308     0.85   0.401    -.1491513    .3630593
                      |
    cL.x#cL.newintra1 |  -.6916032   .3277542    -2.11   0.043    -1.359217   -.0239899
                      |
    cL.x#cL.newintra2 |   .9540062   .4114167     2.32   0.027     .1159778    1.792035
                      |
    cL.x#cL.newinter1 |   .2513634   .4661039     0.54   0.593     -.698059    1.200786
                      |
    cL.x#cL.newinter2 |  -.5283464   .2938855    -1.80   0.082    -1.126972    .0702787
                      |
                _cons |  -.1368248   .0107207   -12.76   0.000    -.1586621   -.1149875
    -----------------------------------------------------------------------------------

  • #2
    In this case, the interpretation would it be: increasing 1 standard deviation of "newintra2" will increase the sales of firms by 0.95% for those firms increasing their capacity? Or would it be: increasing 1 standard deviation of "newintra2" will increase the sales of firms by 0.95 percentage points for those firms increasing their capacity?
    Neither. You are looking at an interaction term. When you then ask what differences are associated with a difference in one of the variables, the answer is not expressible in terms of a difference in the level of the outcome variable at all. It is in terms of a difference in the effect of the other variable in the interaction on the outcome variable. So it would be that a 1 SD difference in the variable underlying newintra2 is associated with a 0.95 difference in the slope of the y:x relationship.

    Taking this a step farther, both y and x are growth rates. Though you do not say as much, I infer from what you do say that each of them is denominated as percent per year (or some other time period). So the slope of the y:x relationship will be dimensionless because y and x have the same units. So the difference of 0.95 is just that, it is the number 0.95, not 0.95 anything.

    Notice, too, that I have taken care to avoid using language that implies causality in these relationships: when you are working with observational data, causal inference should be, at best, used with extreme caution, or, in most situations, eschewed altogether.

    • #3
      Dear Clyde Schechter, thanks a lot for your help. You are right, these are growth rates from year to year: ln(x_t + 1) - ln(x_{t-1} + 1)

      However, I am very confused right now. I understand your first paragraph (it is a 0.95 difference with respect to the coefficient for "x" alone which is 0.16 but not significant). This would be saying that those increasing their capacity are in a better situation (0.95 + 0.16) than those not increasing the capacity.

      However, the second paragraph says that this 0.95 does not have a unit of measure? I thought that because both variables (x and y) are growth rates, the coefficient for "x" would be read as: increasing 1% the capacity will increase 0.16% the sales. If I am not wrong (very likely I am), this 0.95 on top of the 0.16 should also be read as %. I just need to understand why I am wrong.

      I am worried that I cannot offer an economic interpretation of my results.

      • #4
        If newintra2 were 0 (the mean value), then the slope of the y:x relationship would be 0.1697371 (0.17 to 2 decimal places). When newintra2 is 1 (1 standard deviation up), then the slope of the y:x relationship would be 0.17 + 0.95 = 1.12.
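
        As a rough illustration, the same arithmetic can be repeated for several values of newintra2, using the coefficients exactly as reported in the output in #1. The values -1, 0, 1, 2 are arbitrary choices made only for this sketch:

        ```stata
        * Slope of the y:x relationship at selected values of L.newintra2,
        * computed from the coefficients reported in the output in #1
        foreach z of numlist -1 0 1 2 {
            display "newintra2 = " %2.0f `z' "   y:x slope = " %7.4f (.1697371 + .9540062*`z')
        }
        ```

        At newintra2 = 1 this reproduces the 1.12 above; each further 1 SD difference in newintra2 shifts the slope by another 0.95.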

        You have now stated that these "growth rates" are actually ln(x_t + 1) - ln(x_{t-1} + 1). So they are not actually growth rates. Actual growth rates would be (x_t - x_{t-1})/x_{t-1}. By the way, I have never seen this logarithmic formula in relation to growth rates. There is a commonly used approximation formula: ln(x_t) - ln(x_{t-1}), which is like your formula but without the +1 terms. Perhaps this is what you meant? When the growth rate is small, ln(x_t) - ln(x_{t-1}) is a decent approximation to the actual growth rate. I don't quite understand why this formula is still in use. It made sense when calculations were done by hand and looking up logs in a table and subtracting was easier than dividing. But with computers, division is easier than taking logs. And, more important, while it is a good approximation of the actual growth rate when the growth rate is small, it is terrible when the growth rate is large. But I digress.
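
        To see how the approximation behaves, here is a small sketch with made-up values (100 growing to 105, and 100 growing to 300); the numbers are purely illustrative and have nothing to do with the data in this thread:

        ```stata
        * Small growth: 100 -> 105
        display "actual = " ((105 - 100)/100) "   log difference = " (ln(105) - ln(100))
        * Large growth: 100 -> 300
        display "actual = " ((300 - 100)/100) "   log difference = " (ln(300) - ln(100))
        ```

        The first pair is close (0.05 versus about 0.049); the second is not (2 versus about 1.10).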

        Be that as it may, let's look at the dimensions of these expressions. The logarithmic expression is dimensionless because logarithms themselves are dimensionless. As for the actual growth rate, the numerator is the difference over a year's time of quantities with the same dimensions. So the dimensions of the numerator are the units of x per year. (If these are monthly data, then just substitute month for year everywhere.) The dimensions of the denominator are just the dimensions of x itself. So the dimensions of their ratio, the growth rate, would be units of x per year / units of x = year^-1 (i.e. per year).

        Now, whichever version of "growth rate" x and y are, I assume you mean that both are in the same dimensions, either dimensionless altogether, or in units of per year. Now, the slope of the y:x relationship will have dimension units of y / units of x. But y and x have the same units. So the slope is dimensionless.


        I thought that because both variables (x and y) are growth rates, the coefficient for "x" would be read as: increasing 1% the capacity will increase 0.16% the sales.
        That is a correct interpretation of the coefficient for x conditional on newintra2 being 0. This is true because when newintra2 is 0, the interaction term is nullified. But if newintra2 changes to 1, that is associated with two different effects on y. First there is the direct change in y of -0.08 (units of y) coming from the coefficient of newintra2 itself, and then there is an additional change in y due to the fact that the y:x relationship has now been changed by 0.95 (dimensionless). That actually plays out as the change in the y:x slope times the value of x, which is not a fixed number but depends on x. This change, y:x slope * x, has the units of y (which is either dimensionless or per year as discussed in the third paragraph of this post). If you are trying to interpret just the interaction coefficient, which I took to be your purpose in #1, then the initial -0.08 would be ignored, and only the 0.95 would be mentioned.
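
        A minimal sketch of that dependence on x, using the coefficients of L.newintra2 (-.0750884) and of the interaction (.9540062) as they appear in the output in #1. The values of x are hypothetical, chosen only for illustration:

        ```stata
        * Difference in predicted y associated with newintra2 going from 0 to 1,
        * at a few hypothetical values of x (coefficients from the output in #1)
        foreach x of numlist -0.10 0 0.10 0.50 {
            display "x = " %5.2f `x' "   difference in y = " %7.4f (-.0750884 + .9540062*`x')
        }
        ```

        The part contributed by the interaction, 0.95*x, is what varies with x; the part contributed by newintra2 itself does not.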
        Last edited by Clyde Schechter; 13 Apr 2023, 13:04.

        • #5
          Dear Clyde Schechter, this is a very detailed explanation, thanks a lot for it. You are right about the log formula for the growth rate, however, I have seen in some papers from Economics that researchers include +1 to the log for dealing with zero values in the growth rate measures. I do not know other ways of dealing with zero values for a growth rate. And you are also right. My aim is to explain how the effect from newintra2 will be different depending on the type of firms (those increasing their capacity with respect to those not increasing it).

          I took the standardized values of the intra and inter variables because I thought that it would be better for comparing coefficients, but it seems I am in trouble if I cannot explain what this 0.95 is.

          Thanks anyway, this has been a very enlightening conversation!

          • #6
            You are right about the log formula for the growth rate, however, I have seen in some papers from Economics that researchers include +1 to the log for dealing with zero values in the growth rate measures. I do not know other ways of dealing with zero values for a growth rate.
            Although this is tangential to the original topic of this thread, I don't feel I can let this one go.

            [RANT]
            If a value starts at 0 and changes to, well, anything other than 0, the growth rate is not definable because you can't divide by 0. If you wanted to use language loosely, you could call it an infinite growth rate. But adding the 1's in there is just a really awful kludge. If you were to add 0.01, or 1000 instead of 1, the results would be very different. And, no matter what you add, given that the true growth rate is "infinite" the error involved is infinitely large.

            In other contexts, namely the log-transformation of variables that include 0 values, I have debated with others here about the appropriateness of log(1+x) as a substitute. While I still think it is not a good idea and would not use it myself, I have been persuaded that there are circumstances where it is not entirely unreasonable to use it. But in the context of calculating a growth rate, I really think it is utterly indefensible. I do not work in economics or finance, but even in my minimal exposure to those disciplines here on Statalist, I can't recall ever seeing that +1 variant of the formula used, so I suspect it is seldom used. But I feel strongly it should never be used. For disciplines that get so tightly focused, insisting on consistent or, when possible, unbiased estimation and highly refined calculations to get standard errors that are robust to all sorts of issues in the data, it should be an embarrassment to calculate a meaningless "growth rate" in a context where no actual growth rate can be defined, by applying a dreadful hack to a formula that is, itself, at best, an approximation to the real thing under limited circumstances! I'm truly gobsmacked!
            [/RANT]
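
            To make the sensitivity to the added constant concrete, here is a small sketch with a hypothetical series that goes from 0 to 50, comparing the three constants mentioned above (0.01, 1 and 1000):

            ```stata
            * "Growth rate" ln(x_t + c) - ln(x_{t-1} + c) for x going from 0 to 50,
            * under three different choices of the added constant c
            foreach c of numlist 0.01 1 1000 {
                display "c = " %7.2f `c' "   ln(50 + c) - ln(0 + c) = " %7.4f (ln(50 + `c') - ln(0 + `c'))
            }
            ```

            The three results are roughly 8.52, 3.93 and 0.05, so the answer is driven by the arbitrary constant rather than by the data.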

            • #7
              Clyde Schechter, thanks for your comment. I always learn from your posts. Maybe a possible solution would be to try absolute differences? The only drawback would be that a change from 50 to 100 would be treated the same as a change from 1000 to 1050. Thanks a lot again!

              • #8
                Dear all, I am struggling with how to interpret the effect of an interaction when the interaction is significant but the "main" effect is not. I have read several posts pointing out that it is irrelevant if the "main" (or separate) effects are not significant. However, interpreting the interaction numerically is another story. For instance, in the table below I can see that the interaction between X and Z1 is positive and significant (0.06). This would imply that the effect of Z1 is higher when X increases. Here Y is ln(Y + 1), while X is the absolute difference between two consecutive periods (an absolute growth rate) and Z1 is also standardized.

                However, given that the effect for Z1 when X is zero is highly negative (but not significant):

                I am wondering if the total effect of Z1 for those increasing X is still negative (-0.36 + 0.06 = -0.3). Or because the effect for Z1 when X is zero (-0.36) is not significant, this should not count in the computation, and then, the effect would be positive(0.06). Can someone please explain (if possible) why in the computation must (or must not) be included a value even though it is not significant?

                Thanks a lot in advance for your help!

                Code:
                ------------------------------------------------------------------------------
                             |               Robust
                           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                           X |
                         L1. |   .0355801     .01776     2.00   0.054    -.0005959    .0717561
                             |
                          Z1 |
                         L1. |  -.3615165   .2188498    -1.65   0.108     -.807299    .0842661
                             |
                          Z2 |
                         L1. |   .4437837   .1078224     4.12   0.000     .2241567    .6634107
                             |
                          Z3 |
                         L1. |    -.10062    .098738    -1.02   0.316    -.3017427    .1005027
                             |
                  cL.X#cL.Z1 |   .0640406   .0238706     2.68   0.011     .0154177    .1126635
                             |
                  cL.X#cL.Z2 |  -.0574914   .0181502    -3.17   0.003    -.0944622   -.0205206
                             |
                  cL.X#cL.Z3 |   .0096383   .0307078     0.31   0.756    -.0529115    .0721882
                             |
                       _cons |   3.207746   .0122647   261.54   0.000     3.182764    3.232729
                ------------------------------------------------------------------------------

                • #9
                  I am wondering if the total effect of Z1 for those increasing X is still negative (-0.36 + 0.06 = -0.3). Or because the effect for Z1 when X is zero (-0.36) is not significant, this should not count in the computation, and then, the effect would be positive(0.06). Can someone please explain (if possible) why in the computation must (or must not) be included a value even though it is not significant?
                  Yes, it absolutely, definitely must, must, must be included in the calculation, whether significant or not. This is just one of the many reasons that the notion of statistical significance is confusing at best. Even for those who take statistical significance seriously (I am not one), properly understood a non-significant result does not mean that the effect is zero or non-existent or any other such nullity. It just means that the results are too imprecise to determine its direction with the desired level of confidence. The mistaken yet widespread idea that non-significant results should be ignored or omitted is one of the most pernicious effects of the statistical significance mindset. And one of the strongest reasons some of us think that the notion of statistical significance itself should be retired from statistical practice.

                  Also, it is incorrect to refer to the sum -0.36 + 0.06 as "the total effect of Z1 for those increasing X." It is the total effect of Z1 when X = 1. There is no such thing in this model as "the total effect of Z1 for those increasing X." Z1 has a different effect for every single value of X. That value can be calculated as -0.36 + 0.06*X. Better still, use -margins- to do it: -margins, dydx(Z1) at (X = (insert_list_of_values_of_X_here))-.
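
                  A minimal, self-contained sketch of that -margins- call on simulated data. The variable names (x, z1, y), the seed, and the data-generating coefficients are invented for illustration and only loosely mimic the estimates in #8; lags are omitted to keep the example short:

                  ```stata
                  clear
                  set seed 2023
                  set obs 500
                  gen x  = rnormal()
                  gen z1 = rnormal()
                  gen y  = 3 + 0.04*x - 0.36*z1 + 0.06*x*z1 + rnormal()

                  regress y c.x##c.z1

                  * Marginal effect of z1 at several values of x: _b[z1] + _b[c.x#c.z1]*x
                  margins, dydx(z1) at(x = (-2(1)2))
                  marginsplot
                  ```

                  Each row of the -margins- output is the estimated slope of y on z1 at that value of x, with its own standard error and confidence interval.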
                  Last edited by Clyde Schechter; 08 May 2023, 07:59.

                  • #10
                    Dear Clyde Schechter, thank you once again. It is completely clear now. I remember from my econometrics classes the idea that a non-significant parameter is a nullity. But it seems it is not like this. I assume then, that the same would apply if I want to compare two coefficients (equality) and one of them is not significant. I thought that in such a case nothing had to be done, but following what you said, I think it has to be done through a test (independent of the significance of the parameters).

                    Very good suggestion about -margins-. Given that the two variables X (the absolute growth rate) and Z are standardized, X takes both negative and positive values, and depending on its magnitude, the effect could also be positive. Even so, it will be hard for it to be positive, since X = 3 implies 3 SD above the average (which is 0), and this covers around 99% of cases, if I am not wrong from my basic statistics. So the effect would still be negative. All this should be clear from the -marginsplot-, I assume.

                    Thanks a lot again.

                    • #11
                      I assume then, that the same would apply if I want to compare two coefficients (equality) and one of them is not significant.
                      Exactly. Just because one is significant and the other is not does not mean that you can automatically assume they are not equal. You have to explicitly test equality between coefficients regardless of their significance or lack of significance.

                      Here's an example you can run:
                      Code:
                      clear*
                      
                      set obs 100
                      
                      set seed 1234
                      
                      gen x1 = rnormal(0, 1)
                      gen x2 = rnormal(0, 0.01)
                      
                      gen y = 1 + 2*x1 + 2*x2 + rnormal(0, 0.5)
                      
                      regress y x1 x2
                      test x1 = x2
                      Note that by design, the true coefficients of x1 and x2 are both exactly 2. The difference between x1 and x2 is that x1 has a fair amount of variation, whereas x2 is nearly constant. As a result, the estimated coefficient of x1 is very very close to 2 and strongly statistically significant, whereas that of x2 is near 2, but not so much as x1, and not even close to statistically significant. A test fails to reject the hypothesis that they are equal.
                      Last edited by Clyde Schechter; 09 May 2023, 06:55.

                      • #12
                        Dear Clyde Schechter, now I am clear about the need to test statistically even though one parameter is not significant. Without making this thread too long or bothering you too much, can I ask about the computation Stata does when doing a linear test of the difference between two parameters? In the table below I have two coefficients, Z2 and Z22 (one not significant), for which I do a test (same as you show in #11). The test says that the parameters are not statistically different (F = 0.6). However, if I follow the formula to calculate the value, (B_Z2 - B_Z22)/[(SE_Z2)^2 + (SE_Z22)^2 - 2*COV(Z2, Z22)], I obtain a much higher number (1.74). It is very likely I am doing something wrong, but I do not understand such a difference, 0.6 vs 1.74.

                        Code:
                        HDFE Linear regression                            Number of obs   =     39,251
                        Absorbing 2 HDFE groups                           F(  11,     32) =       3.19
                        Statistics robust to heteroskedasticity           Prob > F        =     0.0050
                                                                          R-squared       =     0.5346
                                                                          Adj R-squared   =     0.4688
                                                                          Within R-sq.    =     0.0011
                        Number of clusters (sector1) =         33         Root MSE        =     3.4371
                        
                                                       (Std. Err. adjusted for 33 clusters in sector1)
                        ------------------------------------------------------------------------------
                                     |               Robust
                                   Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                   X |
                                 L1. |   .0322015   .0196845     1.64   0.112    -.0078944    .0722975
                                     |
                                  Z1 |
                                 L1. |   .0596753   .1748308     0.34   0.735    -.2964434     .415794
                                     |
                                 Z11 |
                                 L1. |  -.2043395   .2649355    -0.77   0.446    -.7439955    .3353165
                                     |
                                  Z2 |
                                 L1. |   .3652778    .153347     2.38   0.023     .0529202    .6776355
                                     |
                                 Z22 |
                                 L1. |   .0186922   .4285749     0.04   0.965    -.8542864    .8916708
                                     |
                                  Z3 |
                                 L1. |   -.060815   .1663623    -0.37   0.717    -.3996839     .278054
                                     |
                                 Z33 |
                                 L1. |  -.2419099   .2924218    -0.83   0.414    -.8375535    .3537338
                                     |
                                  C1 |
                                 L1. |  -4.020517   2.552846    -1.57   0.125    -9.220493    1.179459
                                     |
                                  C2 |
                                 L1. |  -3.053938   2.354964    -1.30   0.204    -7.850843    1.742967
                                     |
                                  C3 |
                                 L1. |   .9634577   1.174328     0.82   0.418    -1.428571    3.355486
                                     |
                                  C4 |
                                 L1. |  -.0010472   .0042462    -0.25   0.807    -.0096965     .007602
                                     |
                               _cons |   3.293579   .2106303    15.64   0.000      2.86454    3.722619
                        ------------------------------------------------------------------------------
                        
                        Absorbed degrees of freedom:
                        -----------------------------------------------------+
                         Absorbed FE | Categories  - Redundant  = Num. Coefs |
                        -------------+---------------------------------------|
                                year |        10           0          10     |
                               ident |      4839        4839           0    *|
                        -----------------------------------------------------+
                        * = FE nested within cluster; treated as redundant for DoF computation
                        
                        . 
                        end of do-file
                        
                        . test (_b[L.Z2] = _b[L.Z22])
                        
                         ( 1)  L.Z2 - L.Z22 = 0
                        
                               F(  1,    32) =    0.60
                                    Prob > F =    0.4430
                        
                        . estat vce
                        
                        Covariance matrix of coefficients of reghdfe model
                        
                                     |          L.          L.          L.          L.          L.          L.
                                e(V) |          X          Z1         Z11          Z2         Z22          Z3 
                        -------------+------------------------------------------------------------------------
                                 L.X |  .00038748                                                             
                                L.Z1 |  .00050341   .03056581                                                 
                               L.Z11 | -.00065029   .00930962   .07019084                                     
                                L.Z2 | -.00024767  -.01476616  -.00941136   .02351531                         
                               L.Z22 | -.00357924  -.01653082  -.02616403   .00403565   .18367648             
                                L.Z3 | -.00039759   .00316878   .00468565  -.00348075  -.00737159   .02767642 
                               L.Z33 | -.00234532  -.00932901  -.00867494   .00174343    .1199281  -.00249024 
                                L.C1 | -.00468507  -.07720241  -.05997288  -.08845163  -.01574868  -.15131779 
                                L.C2 |  .00953405   .10630977   .21519308  -.08612315  -.35992764   .01747702 
                                L.C3 |  .00068249   .00255426   .09587857  -.04244434  -.09543457  -.03531601 
                                L.C4 |  5.430e-06  -.00005719   5.576e-06   .00002273  -.00014242   .00014689 
                               _cons | -.00005457   .00533441   .00266053  -.00159308  -.00581798  -.00618888 
                        
                                     |          L.          L.          L.          L.          L.            
                                e(V) |        Z33          C1          C2          C3          C4       _cons 
                        -------------+------------------------------------------------------------------------
                               L.Z33 |  .08551048                                                             
                                L.C1 | -.03953488   6.5170202                                                 
                                L.C2 | -.25675292  -1.5446602   5.5458561                                     
                                L.C3 | -.05437419   .56495606    .2420706    1.379047                         
                                L.C4 | -2.541e-06  -.00099483  -.00195401   .00025957   .00001803             
                               _cons | -.00831684   .03875185   .12848626   .00504087   -.0008826   .04436511 
                        
                        . di (.3652778 - .0186922)/(.153347^2 + .4285749^2 - 2*.00403565)
                        1.7405827

                        • #13
                          Your formula for the F statistic in the hand-calculation isn't quite right. You got the hard part, the denominator, correct. But the numerator should be the square of what you show. If you fix that, you will see that it agrees with -test-'s calculation:
                          Code:
                          . di ((.3652778 - .0186922)^2)/(.153347^2 + .4285749^2 - 2*.00403565)
                          .60326089
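
                          The same check can also be scripted from the stored results rather than typed in from the output. A sketch using the simulated data from #11 (so the numbers will differ from those in #12); the scalar names F_test and F_hand are arbitrary:

                          ```stata
                          clear
                          set obs 100
                          set seed 1234
                          gen x1 = rnormal(0, 1)
                          gen x2 = rnormal(0, 0.01)
                          gen y = 1 + 2*x1 + 2*x2 + rnormal(0, 0.5)

                          regress y x1 x2
                          test x1 = x2
                          scalar F_test = r(F)

                          * Hand calculation: (b1 - b2)^2 / [Var(b1) + Var(b2) - 2*Cov(b1, b2)]
                          matrix V = e(V)
                          scalar F_hand = (_b[x1] - _b[x2])^2 / (V[1,1] + V[2,2] - 2*V[1,2])
                          display "F from -test- = " F_test "   F by hand = " F_hand
                          ```

                          The two values should agree up to rounding, because with a single linear restriction the Wald F statistic is just the squared difference divided by its estimated variance.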

                          • #14
                            Perfect, thanks a lot!
