  • Interaction Term Interpretation/Multicollinearity/polychoric correlation.

    Dear Statalist,

    This is my first post, so please excuse the elementary question. I cannot find any similar posts that I am able to interpret. I am using Stata/MP 15.1.
    My problem is the following:

    I am writing my thesis about the relationship between cognitive empathy and altruistic sharing. I've collected data from 126 respondents who all played a dictator game and completed a survey on empathy using a 5-point Likert scale. There are two samples, a control group and a treatment group; the only difference between them is whether treatment was received. The treatment was an exercise in which participants had to focus on the cognitive components of a situation involving an individual in distress before playing the dictator game. With the help of a tutor, we transformed the ordinal data into continuous data using polychoric correlation.

    I am now running linear regressions on the data. The dependent variable is DG (amounts given in the dictator game), and the independent variables are Treatment, general empathy (GE, obtained from the survey), and an interaction term between the two called TxGE. This is my output:
                    (1)           (2)           (3)
    VARIABLES       Model 1       Model 2       Model 3
    GE              1.277***      1.234***      1.533***
                    (0.251)       (0.249)       (0.338)
    Treatment                     1.021**       4.949
                                  (0.475)       (3.062)
    TxGE                                        -0.646
                                                (0.497)
    Constant        -3.503**      -3.719**      -5.515***
                    (1.546)       (1.527)       (2.057)
    Observations    126           126           126
    R-squared       0.172         0.202         0.213
    In the third model the standard error of Treatment inflates greatly, and the coefficient of that variable changes considerably, so I concluded I should check whether multicollinearity is a problem in the model. However, my search only turns up answers I do not understand. Sometimes multicollinearity is a problem and sometimes not, and it seems to depend on whether it concerns an interaction term. My questions are: How do I interpret this model? Is multicollinearity a problem in this model? When I run the estat vif command I get the following results.

    Variable | VIF 1/VIF
    -------------+----------------------
    centered_T~E | 43.83 0.022818
    centered_t~t | 42.00 0.023811
    centered_GE | 1.88 0.533141
    -------------+----------------------
    Mean VIF | 29.23

    When I run the model without the interaction term, Treatment and GE are both significant. To me it seems that the amounts given in the dictator game depend on the level of general empathy and on whether the respondent received treatment. However, the change in the dependent variable cannot be explained through the interaction term, meaning that Stata cannot distinguish the effect of general empathy on the dictator game outcome based on the value of the treatment dummy.

    Thank you in advance. I am happy to receive comments on whether this is a valid question and whether I posted it properly.

    Greetings from the Netherlands,

    Tim Wilmink
    Last edited by Tim Wilmink; 04 Jun 2018, 05:39.

  • #2
    Strong correlations between an interaction variable and its components are expected and are baked into the calculation of the interaction term as a product. Looking at the statistical significance of the T and GE variables by themselves in an interaction model is inappropriate. Neither of these variables, in an interaction model, means what it appears to mean. The coefficient of T in the interaction model is not the effect of treatment. It is the effect of treatment among those with GE = 0. (And depending on how GE was measured, there may not even be any such people.) Similarly, the coefficient of GE does not represent the slope of the DG:GE relationship: it represents that slope only in the T = 0 group. Notice that in your second model, the coefficient of GE you get is 1.234. That is an overall, averaged estimate of the DG:GE slope in both treatment groups combined. In your third model, for the T = 0 group, that slope is estimated as the coefficient of GE, 1.533. And in the T = 1 group, that slope is estimated as the sum of the coefficient of GE and the coefficient of the interaction, 1.533 + (-0.646), or 0.887. So that makes sense; 1.234 is a mix of 1.533 and 0.887.
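
    As a worked illustration of the arithmetic above, the Model 3 point estimates can be written out as a fitted equation and split by group. This is only a sketch using the coefficients reported in the first post, and it assumes T is coded 0/1; standard errors are ignored here.

    ```latex
    \begin{align*}
    \widehat{DG} &= -5.515 + 1.533\,GE + 4.949\,T - 0.646\,(T \times GE) \\
    T = 0:\quad \widehat{DG} &= -5.515 + 1.533\,GE \\
    T = 1:\quad \widehat{DG} &= (-5.515 + 4.949) + (1.533 - 0.646)\,GE = -0.566 + 0.887\,GE \\
    \text{effect of } T \text{ at } GE = g:\quad & 4.949 - 0.646\,g
    \end{align*}
    ```

    The last line shows why 4.949 is the estimated treatment effect only at GE = 0: at any other value g of GE, the estimated effect also includes the interaction term.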

    To really understand this model, you need to use the -margins- command. You do not show the code for the regression: if you did not use factor-variable notation, then you must re-run it that way first. So it will look like this:

    Code:
    regress DG i.T##c.GE
    Then you need to set out a range of values that spans the observed range of GE. Let's say for the sake of illustration that GE ranges from 0 to 5. Then you would run
    Code:
    margins T, dydx(GE) // DG:GE SLOPE IN EACH GROUP
    margins, dydx(GE) // AVERAGE DG:GE SLOPE, IF THIS IS OF INTEREST
    margins T, at(GE = (0(1)5)) // PREDICTED VALUES OF DG BY T AND GE
    marginsplot
    So, yes, multicollinearity is inflating the standard errors of your T and GE coefficients. But that is expected, and those coefficients are not nearly as important in the interaction model as they were in the non-interaction model.
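
    If the quantity of interest is the treatment effect itself rather than the raw coefficient of T, one possible follow-up is sketched below. It assumes the variables are named T and GE as above, that T is coded 0/1, and that 0 to 5 is a reasonable range for GE; that range is the same illustrative assumption as before and should be replaced by the observed range.

    ```stata
    * Fit the interaction model with factor-variable notation
    regress DG i.T##c.GE

    * Estimated effect of treatment (T = 1 vs. T = 0) at each illustrative GE value
    margins, dydx(T) at(GE = (0(1)5))
    marginsplot

    * DG:GE slope in the T = 1 group, with a standard error and confidence interval
    lincom GE + 1.T#c.GE
    ```

    The -lincom- line reproduces the 1.533 + (-0.646) sum as a single estimate with its own standard error, which is not available from the coefficient table directly.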

    If you want significance tests of the effects of T and GE (which I do not recommend, but are commonly sought), the correct way to do that is not by looking at the regression output, but by the following commands:

    Code:
    test 1.T 1.T#c.GE // AGGREGATE TEST OF EFFECT OF T
    test GE 1.T#c.GE // AGGREGATE TEST OF EFFECT OF GE
    (Note: Above assumes T is coded 0/1. Modify accordingly as needed.)

    If you are not familiar with the -margins- command, I recommend you read Richard Williams' excellent https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It is very clearly written and contains several worked examples, including some interaction models.


    • #3
      Your comment is greatly appreciated. I was actually looking at an earlier post of yours about the same subject, which pointed me in the right direction. I hope this comment will do the same. I've racked my brain enough for today, but I'm going to study what you said with care tomorrow. To clarify, by the way, GE was obtained as follows: I collected data on 126 people who all responded to 21 questions on a Likert scale ranging from 1 to 5, after which my tutor used something called polychoric correlation to transform this ordinal data into continuous data. I regret I cannot share the code, as he asked me not to distribute it.
