  • Help with interpreting interaction results

    Hello all,

    My research investigates whether individuals with high education and high training earn more than those without sufficient training/education. I have broken participants down into four groups: (1) low education/low training; (2) high education/high training; (3) low education/high training; and (4) high education/low training. I created interactions between them to see which group earns the highest wages, and I have also included other determinants of income in my model. Below, I have included output for a couple of the groups to help develop my case.

    My questions are:
    1) Is this the correct way of using interactions? Is there a better/more efficient way for the purposes of the research question?
    2) Interestingly, when I include other variables in my model, the interaction between education and training becomes insignificant - what does that mean?
    3) I would be grateful if someone could also help me with the interpretation of these interaction results. For example, as the case below shows, the interaction between low_edu and low_training is -0.3568421. What does that mean?
    4) When I estimate interactions between gender and training, Stata only reports the results for females. What should I do for it to include males too (given that the variable 'gender' contains both)?

    Thank you very much in advance!

    Code:
    . xtreg wages i.low_edu##i.low_training, vce(cluster id)

    Random-effects GLS regression                   Number of obs     =    338,585
    Group variable: id                              Number of groups  =     89,185

    R-squared:                                      Obs per group:
         Within  = 0.0077                                         min =          1
         Between = 0.1314                                         avg =        3.8
         Overall = 0.1101                                         max =         10

                                                    Wald chi2(3)      =    9169.39
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                    (Std. err. adjusted for 89,185 clusters in id)
    --------------------------------------------------------------------------------------
                         |               Robust
                   wages | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
               1.low_edu |    -5.8737   .1188482   -49.42   0.000    -6.106638   -5.640761
          1.low_training |  -1.698064   .0974041   -17.43   0.000    -1.888972   -1.507155
                         |
    low_edu#low_training |
                    1 1  |  -.3568421   .1119215    -3.19   0.001    -.5762041   -.1374801
                         |
                   _cons |   12.68142   .1052855   120.45   0.000     12.47507    12.88778
    ---------------------+----------------------------------------------------------------
                 sigma_u |  6.1106458
                 sigma_e |   6.167227
                     rho |  .4953917   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------------

    Code:
    . xtreg wages i.low_edu i.low_training i.low_edu##i.low_training i.illness_disability i.sex i.children i.general_health i.marrital_status i.region i.age i.sector

    Random-effects GLS regression                   Number of obs     =     80,987
    Group variable: id                              Number of groups  =     45,184

    R-squared:                                      Obs per group:
         Within  = 0.0084                                         min =          1
         Between = 0.2277                                         avg =        1.8
         Overall = 0.2171                                         max =          4

                                                    Wald chi2(50)     =   13450.92
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

    ----------------------------------------------------------------------------------------------------
    wages | Coefficient Std. err. z P>|z| [95% conf. interval]
    -----------------------------------+----------------------------------------------------------------
    1.low_edu | -2.90262 .138659 -20.93 0.000 -3.174386 -2.630853
    1.low_training | -.596947 .1072609 -5.57 0.000 -.8071746 -.3867194
    |
    low_edu#low_training |
    1 1 | -.1727221 .135354 -1.28 0.202 -.4380111 .0925668
    |
    illness_disability |
    no | .0274173 .070342 0.39 0.697 -.1104504 .1652851
    |
    sex |
    female | -1.614785 .0852835 -18.93 0.000 -1.781937 -1.447632
    |
    children |
    1 | -.4176488 .1091449 -3.83 0.000 -.6315689 -.2037287
    2 | -.4066241 .1293634 -3.14 0.002 -.6601718 -.1530764
    3 | -1.318122 .2158384 -6.11 0.000 -1.741158 -.8950867
    4 | -2.118664 .4798341 -4.42 0.000 -3.059122 -1.178207
    5 | -2.447484 1.181438 -2.07 0.038 -4.76306 -.1319082
    6 | -5.246481 2.336251 -2.25 0.025 -9.825449 -.6675131
    |
    general_health |
    very good | -.2556415 .0720459 -3.55 0.000 -.3968488 -.1144342
    good | -.5585276 .0817225 -6.83 0.000 -.7187008 -.3983544
    fair | -.9761969 .1124904 -8.68 0.000 -1.196674 -.7557198
    or Poor? | -1.172676 .2144851 -5.47 0.000 -1.593059 -.7522932
    |
    marrital_status |
    married | 1.023875 .0938596 10.91 0.000 .8399137 1.207837
    civil partner (legal) | .7189542 .5583093 1.29 0.198 -.3753119 1.81322
    separated legally marr | .2603422 .2100295 1.24 0.215 -.151308 .6719925
    divorced | .4680171 .1413487 3.31 0.001 .1909787 .7450554
    widowed | .4130304 .307392 1.34 0.179 -.1894467 1.015508
    sep from civil partner | -1.898825 1.72616 -1.10 0.271 -5.282037 1.484387
    a former civil partner | -1.278494 3.659424 -0.35 0.727 -8.450834 5.893846
    surviving civil partner | 3.183246 3.74548 0.85 0.395 -4.157761 10.52425
    |
    region |
    North West | .2900335 .2244571 1.29 0.196 -.1498944 .7299613
    Yorkshire and the Humber | -.2500624 .2325673 -1.08 0.282 -.7058859 .2057611
    East Midlands | .029028 .2327893 0.12 0.901 -.4272308 .4852867
    West Midlands | .5222915 .2328438 2.24 0.025 .065926 .978657
    East of England | .9182685 .2279198 4.03 0.000 .471554 1.364983
    London | 1.488184 .2197212 6.77 0.000 1.057539 1.91883
    South East | 1.473484 .2180581 6.76 0.000 1.046098 1.90087
    South West | .0037702 .230934 0.02 0.987 -.4488522 .4563927
    Wales | -.2532327 .2355061 -1.08 0.282 -.7148162 .2083507
    Scotland | .6664761 .2257843 2.95 0.003 .223947 1.109005
    Northern Ireland | -.2878537 .2387056 -1.21 0.228 -.755708 .1800007
    |
    age |
    18-19 years old | .6488594 .2727242 2.38 0.017 .1143299 1.183389
    20-24 years old | 1.168957 .2601275 4.49 0.000 .6591165 1.678798
    25-29 years old | 2.011161 .2648456 7.59 0.000 1.492073 2.530249
    30-34 years old | 3.218317 .2671965 12.04 0.000 2.694622 3.742013
    35-39 years old | 4.004506 .2694368 14.86 0.000 3.47642 4.532593
    40-44 years old | 4.330584 .2686496 16.12 0.000 3.80404 4.857127
    45-49 years old | 4.369398 .2688791 16.25 0.000 3.842405 4.896392
    50-54 years old | 4.033248 .2711242 14.88 0.000 3.501854 4.564641
    55-59 years old | 3.776262 .2764009 13.66 0.000 3.234526 4.317998
    60-64 years old | 2.857164 .287798 9.93 0.000 2.293091 3.421238
    65 years or older | .3759601 .3172377 1.19 0.236 -.2458143 .9977346
    |
    sector |
    managerial & technical occupation | -.3781948 .1493594 -2.53 0.011 -.6709339 -.0854558
    skilled non-manual | -3.037687 .1624002 -18.70 0.000 -3.355985 -2.719388
    skilled manual | -7.110947 .1664965 -42.71 0.000 -7.437274 -6.78462
    partly skilled occupation | -4.504815 .170801 -26.37 0.000 -4.839579 -4.170051
    unskilled occupation | -5.152122 .2260784 -22.79 0.000 -5.595227 -4.709016
    |
    _cons | 13.98079 .3639008 38.42 0.000 13.26756 14.69402
    -----------------------------------+----------------------------------------------------------------
    sigma_u | 6.462475
    sigma_e | 5.1604593
    rho | .61063295 (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------------------

    Code:
    * HE & HT
    . xtreg wages i.high_edu##i.high_training, vce(cluster id)

    Random-effects GLS regression                   Number of obs     =    338,585
    Group variable: id                              Number of groups  =     89,185

    R-squared:                                      Obs per group:
         Within  = 0.0077                                         min =          1
         Between = 0.1314                                         avg =        3.8
         Overall = 0.1101                                         max =         10

                                                    Wald chi2(3)      =    9169.39
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                    (Std. err. adjusted for 89,185 clusters in id)
    ----------------------------------------------------------------------------------------
                           |               Robust
                     wages | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -----------------------+----------------------------------------------------------------
                1.high_edu |   6.230542    .076358    81.60   0.000     6.080883    6.380201
           1.high_training |   2.054906   .0555559    36.99   0.000     1.946018    2.163793
                           |
    high_edu#high_training |
                      1 1  |  -.3568421   .1119215    -3.19   0.001    -.5762041   -.1374801
                           |
                     _cons |   4.752816   .0257906   184.28   0.000     4.702267    4.803365
    -----------------------+----------------------------------------------------------------
                   sigma_u |  6.1106458
                   sigma_e |   6.167227
                       rho |  .4953917   (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------
    Code:
    * LE & HT
    . xtreg wages i.low_edu##i.high_training, vce(cluster id)

    Random-effects GLS regression                   Number of obs     =    338,585
    Group variable: id                              Number of groups  =     89,185

    R-squared:                                      Obs per group:
         Within  = 0.0077                                         min =          1
         Between = 0.1314                                         avg =        3.8
         Overall = 0.1101                                         max =         10

                                                    Wald chi2(3)      =    9169.39
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                    (Std. err. adjusted for 89,185 clusters in id)
    ---------------------------------------------------------------------------------------
                          |               Robust
                    wages | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    ----------------------+----------------------------------------------------------------
                1.low_edu |  -6.230542    .076358   -81.60   0.000    -6.380201   -6.080883
          1.high_training |   1.698064   .0974041    17.43   0.000     1.507155    1.888972
                          |
    low_edu#high_training |
                     1 1  |   .3568421   .1119215     3.19   0.001     .1374801    .5762041
                          |
                    _cons |   10.98336    .073165   150.12   0.000     10.83996    11.12676
    ----------------------+----------------------------------------------------------------
                  sigma_u |  6.1106458
                  sigma_e |   6.167227
                      rho |  .4953917   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------------

  • #2


    Assuming that there is no "intermediate" category for education or training, and that your low_ed, high_ed, low_training, and high_training are coded 1 for true and 0 for false, the four regressions you show are actually just algebraic transforms of the same regression. (Well, except for the second regression, which includes a bunch of additional variables.)

    I'm assuming at this point that the variable high_edu is coded 1 = high education and 0 = low education, and that the variable high_training is coded 1 = high training and 0 = low training. If so, that is the easiest to interpret. Just to make things crystal clear, I suggest you value label them:

    Code:
    label define low_high 0 "Low" 1 "High"
    label values high_edu high_training low_high
    And the way to answer your research question is to re-run that same regression and follow it with:

    Code:
    margins high_edu#high_training
    margins high_edu, dydx(high_training)
    margins high_training, dydx(high_edu)
    The first of these will produce a table showing you the expected wages in all four combinations of education and training categories. The second will give you the marginal effects of high training on wages conditional on low and high levels of education. The third will give you the marginal effects of high education on wages conditional on low and high levels of training. I would focus my attention on these -margins- outputs rather than the regression coefficients themselves.
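To connect the coefficients to actual predictions, remember that the linear prediction in each education/training cell is just a sum of coefficients. Here is a quick check by hand, in Python rather than Stata (coefficients copied from your first output; the function name is mine):

```python
# Linear prediction from:  wages = b0 + b_e*low_edu + b_t*low_training
#                                  + b_et*(low_edu * low_training)
# Coefficients copied from the first xtreg output in post #1.
b0, b_e, b_t, b_et = 12.68142, -5.8737, -1.698064, -0.3568421

def expected_wage(low_edu, low_training):
    """Predicted wage for one cell (each argument is 0 or 1)."""
    return b0 + b_e * low_edu + b_t * low_training + b_et * low_edu * low_training

for e in (0, 1):
    for t in (0, 1):
        print(f"low_edu={e}, low_training={t}: {expected_wage(e, t):.4f}")

# The interaction coefficient is exactly the difference-in-differences
# between the four cells:
did = (expected_wage(1, 1) - expected_wage(1, 0)) \
    - (expected_wage(0, 1) - expected_wage(0, 0))
print(f"difference-in-differences = {did:.7f}")
```

So -.3568421 is not a standalone effect for the low/low group: it says the wage gap associated with low training is about .36 larger among the low-education group than among the high-education group (and symmetrically for education across training groups). Note also that expected_wage(1, 1) reproduces, up to rounding, the _cons of your high_edu##high_training output, which is the same regression re-parameterized.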

    As for your results changing when you include the other variables, that is to be expected. (In fact, if that sort of thing didn't happen, there would be no point to adding them to the regression!) I suggest that you repeat this expanded regression, substituting the variables high_edu and high_training (and their interaction) for the low education and low training variables. The results from this regression, and the three -margins- commands shown above (which need not, and should not, mention the other variables), will give you results adjusted for the confounding effects of the additional variables.

    I do have concerns about a few of those added variables. Thinking about marital status, children, general health, and region, by including them as you have, you are viewing them as being potential causes of wages. But a case can be made that it is the other way around: higher wages cause good health, facilitate getting married, and generally lead people to live in more prosperous regions. There is also some reason to believe that higher wages reduce the number of children people choose to have. It is, I think, fair to say that education also has similar causal effects on health, marital status, children, and region. If I'm thinking about this correctly, this makes health, marital status, children, and region colliders of the education -> wages relationship. And that implies that they must be excluded from the model, or they will bias the education -> wages relationship. (If you are not familiar with collider bias, a clear and non-technical illustration of it can be found at https://observablehq.com/@herbps10/collider-bias.) This is a subtle problem because you can also argue that health and region, at least, may also cause higher wages. So we have bi-directional causality, which is a nightmare situation for regression models.
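If collider bias is new to you, the mechanism is easy to demonstrate with a small simulation. This is a deliberately artificial example, in Python rather than Stata, with made-up numbers (nothing from your data): health is generated as a consequence of both education and wages, and "adjusting" for it wrecks the estimate of the true education effect.

```python
import random

# Toy data-generating process (all numbers made up for illustration):
#   education -> wages            (true effect = 2.0)
#   education -> health <- wages  (health is a collider)
random.seed(1)
n = 200_000
edu    = [random.gauss(0, 1) for _ in range(n)]
wages  = [2.0 * e + random.gauss(0, 1) for e in edu]
health = [e + w + random.gauss(0, 1) for e, w in zip(edu, wages)]

def cov(x, y):
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# Simple regression of wages on edu: slope = cov / var.
b_unadjusted = cov(edu, wages) / cov(edu, edu)

# OLS with edu and health both included, solved from the 2x2 normal equations.
s_ee, s_eh, s_hh = cov(edu, edu), cov(edu, health), cov(health, health)
s_ew, s_hw = cov(edu, wages), cov(health, wages)
det = s_ee * s_hh - s_eh ** 2
b_adjusted = (s_hh * s_ew - s_eh * s_hw) / det  # edu slope, "controlling" for health

print(f"edu slope, health excluded: {b_unadjusted:.2f}")  # recovers ~2.0
print(f"edu slope, health included: {b_adjusted:.2f}")    # badly biased (~0.5 here)
```

Conditioning on the collider opens a spurious path between education and the error term in wages, so it is the "adjusted" estimate that is biased here, the opposite of what happens with a genuine confounder.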

    Given the lack of clarity about whether these variables are colliders, I would fit three models. One would include only the education and training variables and their interaction (an unadjusted analysis). The second would include those variables plus disability, age, sex, and sector, but not the others. The third would include all of the variables. If the second and third analyses disagree substantially, I would put greater credence in the second one. If they largely support the same conclusions, then we can just relax about it. Either the second or third model could end up disagreeing considerably with the first (even to the extent of opposite signs of marginal effects)--but that is just normal and demonstrates the importance of adjusting to reduce omitted variable bias.

    Comment


    • #3
      Thank you so much Clyde Schechter, I found this incredibly helpful and useful!

      When I used the -margins- analysis, I found that the marginal effect of high training on wages is the same for those with low education as for those with high education. Does that mean that education level has no impact on the marginal effect of training?

      Code:
      Average marginal effects                                Number of obs = 80,987
      Model VCE: Conventional
      
      Expression: Linear prediction, predict()
      dy/dx wrt:  1.training_level
      
      -----------------------------------------------------------------------------------
                        |            Delta-method
                        |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      ------------------+----------------------------------------------------------------
      0.training_level  |  (base outcome)
      ------------------+----------------------------------------------------------------
      1.training_level  |
              edu_level |
                   Low  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
                  High  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
      -----------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.

      Comment


      • #4
        This is very suspicious. It is extremely rare for the marginal effects to be exactly the same in both groups. If true, what is shown says that education level has no effect on the marginal effect of training. But exactly no effect is highly implausible. It is more likely that something has gone awry. Please show the output from the regression that preceded the -margins- outputs you are showing, and the Stata command that led to it.

        Comment


        • #5
          Thank you for your response Clyde Schechter, here is everything I have done so far:

          Code:
          label define low_high 0 "Low" 1 "High"
          
          label values high_edu high_training low_high
          
           rename high_edu edu_level
           rename high_training training_level
           
           * A NEW MODEL:
           
          * xtreg wages i.edu_level i.training_level i.illness_disability i.sex i.children i.general_health i.marrital_status i.region i.age i.sector 
          
          Random-effects GLS regression                   Number of obs     =     80,987
          Group variable: id                              Number of groups  =     45,184
          
          R-squared:                                      Obs per group:
               Within  = 0.0084                                         min =          1
               Between = 0.2277                                         avg =        1.8
               Overall = 0.2170                                         max =          4
          
                                                          Wald chi2(49)     =   13448.66
          corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
          
          ----------------------------------------------------------------------------------------------------
                                       wages | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -----------------------------------+----------------------------------------------------------------
                                   edu_level |
                                       High  |   3.039784    .087604    34.70   0.000     2.868083    3.211484
                                             |
                              training_level |
                                       High  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
                                             |
                          illness_disability |
                                         no  |   .0269743   .0703411     0.38   0.701    -.1108917    .1648403
                                             |
                                         sex |
                                     female  |   -1.61674   .0852713   -18.96   0.000    -1.783869   -1.449611
                                             |
                                    children |
                                          1  |  -.4153716   .1091311    -3.81   0.000    -.6292646   -.2014786
                                          2  |  -.4037609   .1293456    -3.12   0.002    -.6572737   -.1502482
                                          3  |   -1.31493   .2158262    -6.09   0.000    -1.737941   -.8919183
                                          4  |   -2.11243   .4798124    -4.40   0.000    -3.052845   -1.172015
                                          5  |  -2.446739   1.181447    -2.07   0.038    -4.762333   -.1311443
                                          6  |  -5.256411   2.336242    -2.25   0.024    -9.835361   -.6774604
                                             |
                              general_health |
                                  very good  |  -.2554821   .0720456    -3.55   0.000    -.3966889   -.1142753
                                       good  |  -.5584111   .0817225    -6.83   0.000    -.7185842   -.3982379
                                       fair  |  -.9755705   .1124893    -8.67   0.000    -1.196045   -.7550955
                                   or Poor?  |   -1.17296   .2144847    -5.47   0.000    -1.593342   -.7525773
                                             |
                             marrital_status |
                                    married  |   1.024838   .0938579    10.92   0.000     .8408795    1.208796
                      civil partner (legal)  |   .7185408   .5583165     1.29   0.198    -.3757393    1.812821
                     separated legally marr  |   .2621095   .2100265     1.25   0.212    -.1495348    .6737538
                                   divorced  |   .4690657   .1413484     3.32   0.001     .1920279    .7461036
                                    widowed  |   .4129976   .3073969     1.34   0.179    -.1894891    1.015484
                     sep from civil partner  |  -1.889693   1.726138    -1.09   0.274    -5.272862    1.493476
                     a former civil partner  |  -1.278073   3.659407    -0.35   0.727    -8.450379    5.894233
                    surviving civil partner  |   3.234842   3.745303     0.86   0.388    -4.105816     10.5755
                                             |
                                      region |
                                 North West  |   .2891474   .2244608     1.29   0.198    -.1507877    .7290824
                   Yorkshire and the Humber  |  -.2518902   .2325678    -1.08   0.279    -.7077147    .2039343
                              East Midlands  |   .0273693   .2327906     0.12   0.906     -.428892    .4836305
                              West Midlands  |   .5202431   .2328432     2.23   0.025     .0638788    .9766074
                            East of England  |   .9177246   .2279242     4.03   0.000     .4710014    1.364448
                                     London  |    1.48838   .2197257     6.77   0.000     1.057726    1.919035
                                 South East  |   1.471756   .2180585     6.75   0.000      1.04437    1.899143
                                 South West  |   .0022558   .2309359     0.01   0.992    -.4503703    .4548818
                                      Wales  |  -.2541682     .23551    -1.08   0.280    -.7157592    .2074229
                                   Scotland  |   .6654369   .2257876     2.95   0.003     .2229013    1.107972
                           Northern Ireland  |  -.2898318   .2387057    -1.21   0.225    -.7576864    .1780228
                                             |
                                         age |
                            18-19 years old  |    .653821   .2726963     2.40   0.017     .1193461    1.188296
                            20-24 years old  |   1.175185   .2600831     4.52   0.000      .665432    1.684939
                            25-29 years old  |   2.014949   .2648308     7.61   0.000      1.49589    2.534008
                            30-34 years old  |   3.222129   .2671816    12.06   0.000     2.698463    3.745796
                            35-39 years old  |   4.007818   .2694262    14.88   0.000     3.479752    4.535883
                            40-44 years old  |   4.334004   .2686382    16.13   0.000     3.807483    4.860525
                            45-49 years old  |   4.372021   .2688732    16.26   0.000     3.845039    4.899003
                            50-54 years old  |   4.035654   .2711197    14.89   0.000     3.504269    4.567039
                            55-59 years old  |   3.779382   .2763922    13.67   0.000     3.237663    4.321101
                            60-64 years old  |   2.859103   .2877963     9.93   0.000     2.295033    3.423174
                          65 years or older  |   .3752281     .31724     1.18   0.237    -.2465508     .997007
                                             |
                                      sector |
          managerial & technical occupation  |  -.3782077   .1493615    -2.53   0.011    -.6709508   -.0854646
                         skilled non-manual  |  -3.038697   .1624002   -18.71   0.000    -3.356995   -2.720398
                             skilled manual  |  -7.112368   .1664947   -42.72   0.000    -7.438691   -6.786044
                  partly skilled occupation  |   -4.50574   .1708015   -26.38   0.000    -4.840505   -4.170975
                       unskilled occupation  |  -5.156299   .2260567   -22.81   0.000    -5.599362   -4.713236
                                             |
                                       _cons |   10.31751   .3561231    28.97   0.000     9.619519     11.0155
          -----------------------------------+----------------------------------------------------------------
                                     sigma_u |  6.4628101
                                     sigma_e |  5.1604397
                                         rho |  .61065941   (fraction of variance due to u_i)
          ----------------------------------------------------------------------------------------------------
          
          margins edu_level#training_level 
          
          Predictive margins                                      Number of obs = 80,987
          Model VCE: Conventional
          
          Expression: Linear prediction, predict()
          
          ------------------------------------------------------------------------------------------
                                   |            Delta-method
                                   |     Margin   std. err.      z    P>|z|     [95% conf. interval]
          -------------------------+----------------------------------------------------------------
          edu_level#training_level |
                          Low#Low  |   10.51889   .0465658   225.89   0.000     10.42762    10.61016
                         Low#High  |   11.22386     .07135   157.31   0.000     11.08402     11.3637
                         High#Low  |   13.55868    .072141   187.95   0.000     13.41728    13.70007
                        High#High  |   14.26364    .088169   161.78   0.000     14.09083    14.43645
          ------------------------------------------------------------------------------------------
          
          
          margins edu_level, dydx(training_level)
          
          Average marginal effects                                Number of obs = 80,987
          Model VCE: Conventional
          
          Expression: Linear prediction, predict()
          dy/dx wrt:  1.training_level
          
          -----------------------------------------------------------------------------------
                            |            Delta-method
                            |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
          ------------------+----------------------------------------------------------------
          0.training_level  |  (base outcome)
          ------------------+----------------------------------------------------------------
          1.training_level  |
                  edu_level |
                       Low  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
                      High  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
          -----------------------------------------------------------------------------------
          Note: dy/dx for factor levels is the discrete change from the base level.
          
          
          
          margins training_level, dydx(edu_level)
          
          Average marginal effects                                Number of obs = 80,987
          Model VCE: Conventional
          
          Expression: Linear prediction, predict()
          dy/dx wrt:  1.edu_level
          
          --------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
          ---------------+----------------------------------------------------------------
          0.edu_level    |  (base outcome)
          ---------------+----------------------------------------------------------------
          1.edu_level    |
          training_level |
                    Low  |   3.039784    .087604    34.70   0.000     2.868083    3.211484
                   High  |   3.039784    .087604    34.70   0.000     2.868083    3.211484
          --------------------------------------------------------------------------------
          Note: dy/dx for factor levels is the discrete change from the base level.

          Comment


          • #6
            In fact, Clyde Schechter, another result above suggests that the marginal impact of high education is exactly the same for those with high training as for those with low training as well.

            Comment


            • #7
              Yes, because the regression didn't include an interaction between education and training! If you want to explore whether the marginal impact of either of these is affected by the other, you have to have an interaction term.
              Code:
              xtreg wages i.edu_level##i.training_level i.illness_disability i.sex i.children i.general_health i.marrital_status i.region i.age i.sector
              margins training_level, dydx(edu_level)
              margins edu_level, dydx(training_level)
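The algebra makes the symptom obvious. A sketch in Python (the .7049676 is the training coefficient from the output in #5; the interaction value is made up purely for illustration): in an additive model the derivative of wages with respect to training is a single constant, so -margins- must report the same number at every education level.

```python
# Marginal effect of training under two specifications.
b_t  = 0.7049676   # training_level coefficient from the additive model in #5
b_et = -0.17       # hypothetical interaction coefficient (illustration only)

def dydx_training_additive(edu_level):
    # wages = ... + b_t*training: the slope never involves edu_level
    return b_t

def dydx_training_interacted(edu_level):
    # wages = ... + b_t*training + b_et*(edu_level * training)
    return b_t + b_et * edu_level

print(dydx_training_additive(0) == dydx_training_additive(1))     # True
print(dydx_training_interacted(0), dydx_training_interacted(1))   # differ by b_et
```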

              Comment


              • #8
                right, I apologise for such a basic mistake and for consequently wasting your time! I am beyond grateful for all your help and prompt responses! Clyde Schechter

                Comment


                • #9
                  No worries.

                  Comment


                  • #10
                    Sorry, one last question (promise)! In post #2 you suggested three models to reveal the colliders. In model 2 I included only training, education, and their interaction, as well as disability, age, sex, and sector, but not the others. Model 3 had all variables. In model 2, the interaction between education and training had a negative sign and was significant. In model 3, the interaction was insignificant (with a negative sign). The other variables remained (more or less) the same, apart from the age variable, which in model 3 had a significant positive effect on wages (whereas in model 2 all of its levels were insignificant). My questions are:

                    1) the fact that interaction between education and training in model 3 became insignificant - does that reveal collider effect?
                    2) the fact that other variables became significant in model 3 (such as age), what does that mean, if anything?

                    Thank you very much in advance! Clyde Schechter

                    Comment


                    • #11
                      First, full disclosure: I am in the school of statistics that makes little use of statistical significance and believes that the concept should be discarded. Nevertheless, even if we take statistical significance seriously, it is critical to understand that the difference between statistically significant and statistically non-significant is, itself, not statistically significant. That is, even if you believe wholeheartedly in the validity of the concept of statistical significance, you should never draw any conclusion at all from the fact that one thing is statistically significant and another is not. It means nothing.

                      I would look at the actual values of the interaction coefficients in model 2 and model 3. Are they roughly similar, for practical purposes? Do the confidence intervals around them overlap extensively, only a little, or not at all? Those are the criteria that I would use to decide whether the estimated interaction effect is materially different in the two models.

                      If I concluded that the interaction effects in the two models were materially different, I would not be able to conclude that collider bias is operating here, though I would suspect that is the case. But other explanations are possible in this ambiguous situation. Collider effects are identified in advance of analysis by looking at a diagram of the causal relationships among all of the variables we are modeling (or considering modeling). Ideally, such a diagram contains no cycles, and in that case it is clear that we include confounders and exclude colliders--no variables are both confounders and colliders. But in the present situation, it is not clear in which direction causality runs, and perhaps it runs in both directions. In such a situation, it is not at all clear what is going on. So when including the set of suspected colliders results in a change, we cannot be sure if it is a true collision effect (which is bias and undesirable) or if it is an effect of adjusting for confounders (which is reduction in bias and is desirable). This is just a very difficult situation, and clear understanding is, I think, not possible. (Note: some or all of the "cycles" of causation in this data could be resolved with longitudinal data. Good health in youth would be causally contributory to higher wages in young adulthood, which would in turn be contributory to good general health in mid-life, etc. But you don't have longitudinal data, so...)

                      Regarding age, the same initial reasoning applies. Examine the coefficients and confidence intervals--ignore the p-values. But in this case, if there is a material difference in the age coefficients, we can be much more confident that the difference is a collider effect. The reason is that unlike education, training, or wages, it is not plausible to think of two-way causation between age and number of children, marital status, general health, or region. The causality in those relationships can only be age causing them: they are not causes of aging. So here we would be quite sure there is collision, not confounding.

                      Comment


                      • #12
                        Interesting! In fact, the coefficients of the education and training interaction are fairly similar (-.2175764 in model 2 and -.1727221 in model 3), and the confidence intervals overlap a little bit (model 2: -.4077302 to -.0274226; model 3: -.4380111 to .0925668)! Clyde Schechter does that mean I can conclude the difference is not materially different?

                        P.S. thank you for the age variable explanation, it makes a lot of sense!

                        Comment


                        • #13
                          Yes, I would interpret those interaction coefficients in models 2 and 3 as not being materially different. And I would say that the confidence intervals actually overlap substantially, not just a little bit.
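To make "overlap substantially" concrete: a quick check in Python rather than Stata (the helper function is mine), using the endpoints you quoted, shows that model 2's interval lies entirely inside model 3's.

```python
def overlap_fraction(a, b):
    """Length of the intersection of two intervals (lo, hi), as a fraction
    of the shorter interval's length. 0.0 means disjoint; 1.0 means the
    shorter interval is completely covered by the wider one."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if hi <= lo:
        return 0.0
    return (hi - lo) / min(a[1] - a[0], b[1] - b[0])

ci_model2 = (-0.4077302, -0.0274226)   # 95% CI for the interaction, model 2
ci_model3 = (-0.4380111,  0.0925668)   # 95% CI for the interaction, model 3

print(f"overlap = {overlap_fraction(ci_model2, ci_model3):.2f}")  # 1.00
```

Since the narrower interval is completely contained in the wider one, the two estimates are about as compatible as they could possibly be.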

                          Comment


                          • #14
                            perfect, thank you very much for everything! Clyde Schechter

                            Comment
