  • Linear regression & interaction terms

    Hey everybody,

    I have a linear regression model where I used interaction terms to see if my two treatments modify the relationship between originality of an answer and three other dimensions (fluency, flexibility, elaboration).

    Code:
    . reg originality treat1 treat2 fluency flexibility elaboration treat1_flu treat1_flex treat1_elab treat2_flu treat2_flex treat2_elab, robust
    
    Linear regression                               Number of obs     =        178
                                                    F(11, 166)        =       4.64
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2189
                                                    Root MSE          =     .06348
    
    ------------------------------------------------------------------------------
                 |               Robust
     originality |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          treat1 |   .0945984   .0417611     2.27   0.025     .0121469    .1770498
          treat2 |   .0959994   .0397926     2.41   0.017     .0174346    .1745642
         fluency |   .0011194   .0021622     0.52   0.605    -.0031495    .0053883
     flexibility |    .007972   .0051042     1.56   0.120    -.0021056    .0180495
     elaboration |   .0054839   .0029268     1.87   0.063    -.0002947    .0112626
      treat1_flu |   .0065834   .0044946     1.46   0.145    -.0022905    .0154574
     treat1_flex |  -.0180415   .0088497    -2.04   0.043     -.035514   -.0005691
     treat1_elab |  -.0003588   .0051295    -0.07   0.944    -.0104864    .0097687
      treat2_flu |    .003234   .0029645     1.09   0.277     -.002619     .009087
     treat2_flex |  -.0100783   .0060007    -1.68   0.095    -.0219259    .0017692
     treat2_elab |  -.0044916   .0038248    -1.17   0.242    -.0120432    .0030599
           _cons |   .7248947   .0313272    23.14   0.000     .6630435    .7867458
    ------------------------------------------------------------------------------
    Although there are statistically significant coefficients for some interaction terms in the model, when I test whether the interaction terms are jointly significant, they are not.

    Code:
    test treat1_flu treat2_flu treat1_flex treat2_flex treat1_elab treat2_elab
    
     ( 1)  treat1_flu = 0
     ( 2)  treat2_flu = 0
     ( 3)  treat1_flex = 0
     ( 4)  treat2_flex = 0
     ( 5)  treat1_elab = 0
     ( 6)  treat2_elab = 0
    
           F(  6,   166) =    1.15
                Prob > F =    0.3380
    How is that possible? Is my approach correct? Does this mean that I should leave all interaction terms out of the regression and only use the main effects? Like this:

    Code:
    reg originality treat1 treat2 fluency flexibility elaboration
    Thanks a lot in advance!

  • #2
    This situation is not uncommon, and it doesn't mean you did anything wrong in your modeling. The joint significance of a group of variables is not a simple function of the p-values of the individual variables. First note that only one of these variables has a p-value < 0.05 (the conventional significance level), and that one only barely so. The test of significance for a single variable basically looks at that variable's coefficient divided by its standard error and compares it to a threshold in the t distribution. But the test of significance for several variables jointly looks, more or less, at the sum of the squares of the coefficients and their cross-products, divided by a combined variance estimator, and compares that to a threshold in the F distribution. (Geometrically, the one-variable test asks whether a point falls in an interval; the multiple-variable test asks whether a point falls in a multi-dimensional ellipsoid, and an ellipsoid whose axes are usually oblique to the variable axes.) Even if one term in this sum is somewhat large, if the others are small, the net effect may be a small F statistic, as in your case.
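
    In symbols (just a sketch, in generic matrix notation rather than anything Stata-specific): the single-coefficient test is based on $t_j = \hat{\beta}_j / \widehat{\mathrm{se}}(\hat{\beta}_j)$, whereas the joint test of $q$ restrictions (here $q = 6$: all interaction coefficients equal to zero) compares
    $$
    F \;=\; \frac{1}{q}\,\hat{\beta}_R'\,\widehat{V}_R^{-1}\,\hat{\beta}_R
    $$
    to an $F(q,\ \text{residual df})$ distribution, where $\hat{\beta}_R$ stacks the six tested coefficients and $\widehat{V}_R$ is the corresponding block of the robust variance-covariance matrix. Because $\widehat{V}_R$ includes the covariances among the estimates, one moderately large coefficient can be swamped by the others, which is how you can end up with one individually "significant" interaction but a jointly insignificant set.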

    If the goal of the research was to identify whether either treatment modifies the effects of fluency, flexibility, and elaboration, then the job is done: the answer is no, and you don't need to do the second regression, as you already have your answer. If, on the other hand, you want to additionally identify the direct effects of the treatments, or of fluency etc., then the second regression would be appropriate.

    If you originally had two separate hypotheses to test, one about treatment 1 and the other about treatment 2, then instead of the omnibus test you ran, you should test them separately:
    Code:
    test treat1_flu treat1_flex treat1_elab
    test treat2_flu treat2_flex treat2_elab
    There may be other sets of tests that are germane. It really depends on what your research hypothesis was--you have to tailor the test to it.
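
    As a side note, if you use Stata's factor-variable notation instead of hand-built interaction variables, testparm can run these joint tests without your having to list the generated variable names. A sketch, assuming treat1 and treat2 are 0/1 indicators and reusing your variable names (run from a do-file because of the line continuations):
    Code:
    reg originality i.treat1 i.treat2 c.fluency c.flexibility c.elaboration      ///
        i.treat1#c.fluency i.treat1#c.flexibility i.treat1#c.elaboration         ///
        i.treat2#c.fluency i.treat2#c.flexibility i.treat2#c.elaboration, robust
    testparm i.treat1#c.fluency i.treat1#c.flexibility i.treat1#c.elaboration
    testparm i.treat2#c.fluency i.treat2#c.flexibility i.treat2#c.elaboration
    This should reproduce the same model as your hand-coded version, so the coefficients and the joint tests should match.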



    • #3
      Thanks a lot, that was really helpful! I have one last question, if you would be so kind?
      If the goal of the research was to identify whether either treatment modifies the effects of fluency, flexibility, and elaboration, then the job is done: the answer is no, and you don't need to do the second regression, as you already have your answer. If, on the other hand, you want to additionally identify the direct effects of the treatments, or of fluency etc., then the second regression would be appropriate.
      I actually started with the second regression, because I want to know the direct effects of the treatments and the three variables, and only then added the interaction terms, since I thought I should also check whether either treatment modifies the effects of the three variables. According to your statement, this would mean that for the direct effects I should look at the second regression without interaction terms and could not use the coefficients from the first regression with interaction terms. Is that because of the joint insignificance?
      Could I report it that way, arguing that the coefficients for the direct effects of the treatments, fluency, etc. from the first regression with interaction terms are not "valid" due to this joint insignificance, and therefore use the ones from the second regression without interaction terms? I'm not quite sure how to structure my report...

      Sorry for asking such beginner questions; this is the first time I'm writing an empirical paper.

      Thank you!



      • #4
        According to your statement, this would mean that for the direct effects I should look at the second regression without interaction terms and could not use the coefficients from the first regression with interaction terms. Is that because of the joint insignificance?
        Yes. Your data have demonstrated that they are consistent with the absence of effect modification. So the model without interaction terms would be used instead to estimate the treatment effects (and the covariate effects if that is part of the plan).
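
        For example (a sketch, again reusing your variable names and assuming treat1 and treat2 are 0/1 indicators), the no-interaction model and the average treatment effects could be obtained with:
        Code:
        reg originality i.treat1 i.treat2 c.fluency c.flexibility c.elaboration, robust
        margins, dydx(treat1 treat2)
        In a linear model without interactions the marginal effects from margins just reproduce the treatment coefficients, but keeping margins in the workflow is convenient if you ever reintroduce interactions.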

        Could I report it that way, arguing that the coefficients for the direct effects of the treatments, fluency, etc. from the first regression with interaction terms are not "valid" due to this joint insignificance, and therefore use the ones from the second regression without interaction terms? I'm not quite sure how to structure my report...
        The interaction term coefficients are valid; they're just not distinguishable from zero in your data, and collectively they do not signal the presence of any effect modification.

        So, based on your explanation, I would probably start the report with the no-interaction-terms regression and discuss the findings about the treatments (and the covariates, if that is part of the goal), and then simply add a comment that you did an additional analysis to see whether the treatment effects were modified by the covariates and found no evidence that they were. If you, or your intended audience, expected before the analysis that there would be effect modification, you might try to explain away the absence of an effect-modification signal as a matter of statistical power or noisy measurements--assuming, that is, that you really are underpowered or really do have noisy measurements. With 178 observations and 3 treatment groups, if they are of about equal size, you have about 60 observations per group, which should be enough to detect at least moderately large interaction effects if your measurements are precise, but probably not enough to detect small ones or to cope with low-reliability measurements.

