
  • Interaction effect between independent variables (both time variant)

    Hi,
    This is my first post here, I hope it will work!
    I'm working with panel data. My dependent variables are the numbers of disclosures, patents, and licenses. For each of these, my main independent variables are industrial funding and federal funding, and my control variables are three dummy variables, all of which are time invariant.


    Example: disclosures_it = fed_fund_it + ind_fund_it + dummy1_i + dummy2_i + dummy3_i + e_it

    I'm trying different models, mostly count models (Poisson, negative binomial, negative binomial with robust standard errors, negative binomial with fixed effects...), and I want to understand whether the industrial funding and federal funding coefficients have different magnitudes and whether the two variables interact somehow.

    I have a few strong doubts:

    1) Should I run two different regressions, one with industrial funding and one with federal funding, or is it better to have them together in the same model?

    2) Is this the correct way to ask Stata for the interaction effect? Thank you so much in advance!


    xtreg disclosure c.L2.logfedexp#c.L2.logindexp licftes i.year, fe



    Fixed-effects (within) regression               Number of obs      =      3010
    Group variable: id                              Number of groups   =       229

    R-sq:  within  = 0.4673                         Obs per group: min =         1
           between = 0.8844                                        avg =      13.1
           overall = 0.7680                                        max =        22

                                                    F(24,2757)         =    100.77
    corr(u_i, Xb)  = 0.6853                         Prob > F           =    0.0000

    ---------------------------------------------------------------------------------------------
                     disclosure |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------------------+----------------------------------------------------------------
    cL2.logfedexp#cL2.logindexp |   .3689174   .0639286     5.77   0.000     .2435646    .4942701
                                |
                        licftes |   8.796895   .3210443    27.40   0.000     8.167383    9.426406
                          _cons |  -68.92083   18.03413    -3.82   0.000    -104.2826   -33.55907
    ----------------------------+----------------------------------------------------------------
                        sigma_u |  63.774985
                        sigma_e |  42.352111
                            rho |  .69395724   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------------------
    F test that all u_i=0:  F(228, 2757) = 14.43                            Prob > F = 0.0000

    Last edited by Roberta Pilgrim; 02 Dec 2016, 14:21. Reason: (I added tags!)

  • #2
    With the caveat that I am not an economist or finance specialist and may be unaware of non-statistical disciplinary issues that would dictate the opposite, my advice is to use the two expenditure categories in the same regression model and include the interaction term. The model you have shown, however, does not do it correctly. You need to include the main effects of those variables along with the interaction. The interaction term by itself does not provide a meaningful model here and does not accomplish your goals. All you need to do to fix that error is replace # by ## in your regression equation. See -help fvvarlist- for the distinction between # and ##.
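    For instance, applied to the command in your post, a minimal sketch of the corrected specification would be:

    Code:
    xtreg disclosure c.L2.logfedexp##c.L2.logindexp licftes i.year, fe
    The ## operator expands to the two lagged funding variables plus their product, so the main effects and the interaction all enter the model.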

    The reasons for including both variables and their interaction, as opposed to fitting separate models, are:

    1) The two levels of expenditure may well be correlated with each other. If so, you could never capture that in two separate models.

    2) One would expect that there could be both synergies at some levels, and interferences at other levels, between these funding sources. So an interaction term, at the least, is needed. Indeed, these considerations raise the question of whether even more complicated models might be warranted.



    • #3
      Clyde, Thank you so much for this response!

      I've been trying many models and it's been hard: in all the examples below, the test statistic is always too high and the standard error too low. In this literature, for this kind of research question, count models (mostly negative binomial) and fixed effects are used all the time, but for me they don't seem to be enough. I wonder what I am doing wrong; if you have any kind of advice I would really appreciate it!

      These are all my trials:

      POISSON
      poisson disclosure L2.logfedexp L2.logindexp licftes medschdummy pub_privdummy



      NEGATIVE BINOMIAL
      - nbreg disclosure L2.logfedexp L2.logindexp licftes medschdummy pub_privdummy
      - nbreg disclosure L2.logfedexp L2.logindexp licftes medschdummy pub_privdummy, vce(cluster id)
      - nbreg disclosure L1.logfedexp L1.logindexp licftes medschdummy pub_privdummy, vce(cluster id)
      - xtgee disclosure L1.logfedexp L1.logindexp licftes medschdummy pub_privdummy, family(nb)

      - xtgee disclosure L2.logfedexp L2.logindexp licftes medschdummy pub_privdummy, family(nb)

      RANDOM EFFECTS MODEL


      menbreg disclosure L2.logfedexp L2.logindexp licftes medschdummy pub_privdummy || id:

      FIXED EFFECTS MODEL

      nbreg disclosure logfedexp logindexp i.id if sumdisclosure~=0

      areg disclosure L2.logfedexp L2.logindexp licftes medschdummy pub_privdummy i.year, absorb(id)






      • #4
        But these are non-interaction models you are showing. And I don't know what you mean when you say that Z is too high and the standard error too low.

        Another problem is that the -nbreg- and -poisson- commands are not panel-data analysis commands, so they do not take your panel structure into account and are not valid estimators for this kind of data (unless you have evidence that observations are independent within panels). You need to use -xtnbreg- or -xtpoisson- for this. You attempted to mimic a panel model with -nbreg disclosure logfedexp logindexp i.id if sumdisclosure~=0-, but the trick of including indicator variables for the panels (i.id) that works for ordinary regression is not valid with this model.

        -areg- won't do anything for you that -xtreg, fe- doesn't do. (It is basically the same command for practical purposes, although it calculates some degrees of freedom differently.) So I wouldn't waste time with it. And include the interaction terms.

        It's panel data, so work with panel estimators: xtreg, xtpoisson, xtnbreg. (xtgee is OK too, but it estimates a different model.) I still recommend including an interaction term.
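        For instance, a minimal sketch of fixed-effects panel count models with the interaction, using the variable names from your earlier posts, would be:

        Code:
        xtset id year
        xtnbreg disclosure c.L2.logfedexp##c.L2.logindexp licftes i.year, fe
        xtpoisson disclosure c.L2.logfedexp##c.L2.logindexp licftes i.year, fe
        Adjust the lags and the covariate list to whatever model you actually intend to estimate.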

        Whether that will get you the results you are expecting, I do not know, as I don't know what you expect to find, and I don't know what expectations would even be reasonable for this kind of data--it's out of my area.



        • #5
          Thank you!

          1) I always use the command -xtset id year- to let Stata know that I have panel data, and I thought it was unnecessary to use xt in front of the command! Was I wrong? I just tried to run xtpoisson and the coefficients are slightly different, with the same p-values: really low standard errors, a test statistic that is too high (which I mistakenly called a z value before), and consequently p-values that are too low.

          2) Maybe by using the expression "results I am expecting" I wasn't clear at all. What I meant to say is that even though I see that all my variables are highly significant, I know that the results are not OK, and trying robust standard errors, GEE, random effects models, and fixed effects models is not really helping.

          3) The interaction effect between the two variables is important just for some of my hypotheses; this is why you don't see me using it in all those examples.

          4) About my research, maybe it would be helpful to clarify it: I am sure, and the literature confirms it too, that R&D expenditures affect research output (in this case I'm measuring disclosures). What I would like to ask the data is whether industrial and federal funding affect research output differently (I know they're both significant, and it would be really weird if they weren't, but I'm interested in the magnitudes of the coefficients.
          Example: does federal funding affect disclosures less than industrial funding does? How do these two kinds of funding interact with each other? Do I find more disclosures when there is a balance between the two sources of funding, or is this element completely unrelated to the research output?)


          Thank you so much for your help and for your patience,
          Roberta



          • #6
            Sorry, what I meant to say:

            3) The interaction effect between the two variables is important just for some of my hypotheses, which is why you don't see me using it in all those examples. But I have the same issues with or without the interaction effect (maybe I should have made two separate posts...)



            • #7
              1) I always use the command -xtset id year- to let Stata know that I have panel data, and I thought it was unnecessary to use xt in front of the command! Was I wrong?
              Yes, that's wrong. -xtset- does not commit Stata to using -xt- commands. You still have to specify them.

              And I still don't understand what you mean when you say that your standard errors are too small or your p-values too low. Perhaps you could show us an example of the output you're getting and then say what you think it should look like.

              Remember that statistics like standard errors depend on many aspects of the study, such as sample size and measurement variability. They can also vary across time periods and settings. The same is necessarily true for p-values, which also depend on the estimated effect size--which, in turn, can vary across times and settings.



              • #8
                Thank you, again and again.

                Here are a few examples where I understand that there is something wrong but I don't know how to fix it. Please keep in mind that the same dataset has been used in many different studies, with random/fixed effects and negative binomial models. Even if the research questions were a little bit different, I didn't think it was a problem of sample size or measurement variability, because no one ever mentioned that in the papers using this sample.

                I attached some screenshots with the results!
                Attached Files
                Last edited by Roberta Pilgrim; 03 Dec 2016, 17:04.



                • #9
                  When I use xtreg with fixed effects, it seems a little bit better, but the coefficients seem too high again and I don't think that's normal. Maybe I am missing something in the way I should interpret them and it's OK if they're so high?
                  Attached Files



                  • #10
                    OK, so it is the coefficients that are unexpectedly large here, not the standard errors and p-values.

                    Now, the first thing that strikes me is that in the -xtgee- model you are using L1.logfedexp and L1.logindexp, whereas in the other models you are using L2.logfedexp and L2.logindexp. While I wouldn't ordinarily expect a change from two lags to one lag to produce such a dramatic shift in coefficients, it is worth remembering that you are estimating a different model, so you can't expect the results to be the same. But, in this case, there is another difference which I think may be at the root of your concerns. By changing from L2 in the other models to L1 in the -xtgee- model, you are also changing the estimation sample. Notice that the sample size in the -xtgee- model is 3245 whereas in the other models it is 3010. The additional 235 observations are the observations occurring in the second year of your data (one for each id). They enter the model now because being in the second year they do have a first lag but don't have a second lag. I suspect if you explore the data in these newly added observations you will find some strange outlying data that is throwing things off in a big way. (There is yet a third difference, though I think its impact is small: -xtgee- estimates population averaged effects, whereas the other models you are using estimate individual-level effects. For non-linear models these will be different--but generally by a modest amount.)

                    The other model where you see some rather different coefficients is in the -xtreg, fe- model you show at the end. Here the difference is expected, however. Remember that in the -nbreg- and -poisson- models you are (implicitly) using a log link, whereas in -xtreg- it is linear. That is sufficient to account for the difference you see there.

                    So, to sum it up, the -xtreg, fe- coefficients look reasonable to me because they are estimated on a linear rather than a log scale. The -xtgee- estimates look troublesome, and I suspect the problem lies in problematic data among observations in the second year of the data.
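                    If you want to inspect those newly added second-year observations, a minimal sketch (assuming the data are -xtset id year-; the variable name newly_added is just for illustration) would be:

                    Code:
                    * observations with a first lag but no second lag of the funding variables
                    gen byte newly_added = !missing(L1.logfedexp) & missing(L2.logfedexp)
                    summarize disclosure L1.logfedexp L1.logindexp if newly_added
                    list id year disclosure L1.logfedexp L1.logindexp if newly_added, sepby(id)
                    Look there for the kind of outlying values I suspect are throwing off the -xtgee- results.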



                    • #11
                      Hi Clyde, it's me again!
                      I didn't know if it was better to start another post but I'll try here first.
                      I tried the interaction model but I don't understand why Stata reports some of the variables as omitted, and I can't find any clear explanation of how I should interpret the interaction coefficient ( c.x1##c.x2 ).
                      Thank you




                      Attached Files



                      • #12
                        This is not a problem. In your regression you specified L4.logfedexp and L4.logindexp and c.L4.logfedexp##c.L4.logindexp. When you use ## (instead of #), Stata expands that term to L4.logfedexp and L4.logindexp and c.L4.logfedexp#c.L4.logindexp. But since the L4.logfedexp and L4.logindexp terms were already there, the second copies are redundant and Stata just omits them. The same term can't appear twice in the same regression model. You can avoid this confusion either by using just c.L4.logfedexp##c.L4.logindexp (with ##), or by using all of L4.logfedexp, L4.logindexp, and c.L4.logfedexp#c.L4.logindexp (with just one #). Either way is equivalent and avoids the creation of redundant terms that get omitted. Do read the manual section on fvvarlist (-help fvvarlist- and then take the link to the documentation from there). It explains this and much more about factor variable notation.
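                        In other words, these two specifications are equivalent (a sketch shown with -xtreg, fe-, which may not be the exact command you ran; your other covariates would be listed as before):

                        Code:
                        xtreg disclosure c.L4.logfedexp##c.L4.logindexp licftes i.year, fe
                        xtreg disclosure L4.logfedexp L4.logindexp c.L4.logfedexp#c.L4.logindexp licftes i.year, fe
                        Either way, each main effect and the interaction appear exactly once, so nothing gets flagged as omitted.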

                        As for the interpretation of a continuous by continuous interaction, you can think of it this way. Imagine that you did the regression with only logfedexp (and your covariates licftes, etc.) but not logindexp included. But imagine that you want not a single coefficient for logfedexp; rather, you want the coefficient of logfedexp to depend linearly on logindexp. That is what the interaction model gives you. There is no such thing as an effect of logfedexp in this model. Rather, there are infinitely many effects of logfedexp, and which one applies to a given observation depends on that observation's value of logindexp through the formula: coefficient of logfedexp = 10.242 - 0.2745151*logindexp.

                        By the way, there is nothing special about logfedexp. You could also think of it as a model containing logindexp, where the coefficient of logindexp depends linearly on logfedexp.

                        But really the best way to understand how this model works is with graphs rather than words. So pick some interesting values of logfedexp (say they are 1, 1.5, 2, 2.5, and 3) and some interesting values of logindexp (say they are -0.2, 0, and 0.2) and run the following commands:

                        Code:
                        margins, at(logfedexp = (1(0.5)3) logindexp = (-0.2 0 0.2))
                        marginsplot
                        and Stata will give you a graph showing how the predicted outcome varies with different values of logfedexp and logindexp. (Obviously, replace the numbers in the -margins- command with actually interesting values of those variables.)



                        • #13
                          Hi Clyde,

                          I'm looking at the interaction effect between these two independent variables: INDUSTRIAL R&D and FEDERAL R&D. The dependent variable is the number of disclosures.

                          nbreg disclosure c.L1.logfedexp##c.L1.logindexp licftes L1.log_license_income L1.logotherfund loglegfee year i.id if sumdiscl~=0

                          I attached a screenshot with the regression: the coefficient of the interaction term is positive, but the individual coefficients of the two terms become negative (they were both positive when I did not use the interaction term).
                          How should I interpret it?
                          Industrial and Federal funding interact while affecting the number of yearly disclosures. But what about the negative coefficients?
                          Thank you in advance








                          Attached Files



                          • #14
                            There is nothing surprising or unusual in this. Remember that in an interaction model, the "main effects" no longer have the meaning of being main effects. By using an interaction model you have stipulated that there is no single effect of logfedexp or logindexp on disclosure. Rather there is a different effect of logfedexp corresponding to each value of logindexp (and vice versa). The statistic that shows up in the output as the coefficient of logfedexp is the effect of logfedexp on disclosure conditional on logindexp being zero. (And, again, vice versa.) More generally, at any given value of logindexp, the effect of logfedexp is -0.0757 + 0.0204*logindexp. So at larger values of logindexp, the effect of logfedexp is also larger. And it will be positive whenever logindexp exceeds 0.0757/0.0204, or about 3.71. The effect, conditional on the other being zero, may or may not be of any interest. If zero is not even in the range of observed values of logindexp, then the number is a hypothetical abstraction, an extrapolation from the data, and of little or no meaning or use. Even if zero is in the range of observed values, it may or may not be important enough to warrant spending much time thinking about. The importance is, of course, a matter of the context and what a zero value of that variable means in the real world.
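                            If you want one of those conditional effects as a number with a confidence interval, a sketch using -lincom- after your -nbreg- from #13 (the value 2 for L1.logindexp is only an illustration; pick values that occur in your data) would be:

                            Code:
                            lincom _b[L1.logfedexp] + 2*_b[c.L1.logfedexp#c.L1.logindexp]
                            Because of the log link, the result is on the log-count scale, as discussed just below.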

                            By contrast, when you do a non-interaction model, you are constraining the model to provide a single effect estimate for each of logfedexp and logindexp that does not depend on the other. That single effect will be something like an average effect (though I would argue that it is really just a fictitious number that is probably useless for most purposes). In any case, assuming that the distribution of logindexp has a substantial part of its mass to the right of 3.71, that "average" effect would be positive. So the difference between the interaction and non-interaction results you see here is quite unremarkable.

                            I should add that things are made slightly more complicated by the fact that you are using an -nbreg- model, which has a log link. So in the above, when I refer to the effect of something on disclosure, it is really the effect of that something on the expected value of the logarithm of the number of disclosures. In terms of the directions of effects this makes no difference, but it does make the interpretation of the coefficients that much more complicated.

                            It is difficult to really grasp the implications of an interaction model from its coefficient table, especially when you also have a log link in the mix. So I recommend examining it graphically. Choose a salient range of values of logfedexp and logindexp. (For purposes of illustration, I'll take the former to be 0, 1, 2, 3, and 4, and the latter to be -1, -0.5, 0, 0.5, and 1. But you should choose numbers that lie within the range of your data and are interesting in the context of your problem.) Then you can run:

                            Code:
                            margins, dydx(logfedexp logindexp) at(logfedexp = (0(1)4) logindexp = (-1(0.5)1))
                            marginsplot
                            to get a picture of how the marginal effects of each of these variables vary in relation to the levels of both variables. It is also useful to see how this translates into expected values of the disclosure outcome (and here I'm referring not to log disclosure but to disclosure itself).

                            Code:
                            margins, at(logfedexp = (0(1)4) logindexp = (-1(0.5)1))
                            marginsplot
                            The -marginsplot- command has several options that allow you to modify the appearance of the graph to suit your needs, and it also accepts nearly all of the options available in -graph twoway-, so you can really customize the plot.

                            I think this is a situation where a couple of pictures are truly worth several thousand words.



                            • #15
                              Clyde, your help is so important to me!
                              Thank you for helping me so much these past few days with my PhD dissertation.

