  • Help with interpreting interaction results

    Hello all,

    My research investigates whether individuals with high education and high training earn more than those without sufficient training/education. I have broken participants down into four groups: (1) low education/low training; (2) high education/high training; (3) low education/high training; and (4) high education/low training. I created interactions between them to see which group earns the highest wages, and I have also included other determinants of income in my model. Below, I have included output for a couple of the groups to help develop my case.

    My questions are:
    1) Is this the correct way of using interactions? Is there a better/more efficient way for the purposes of the research question?
    2) Interestingly, when I include other variables in my model, the interaction between education and training becomes insignificant - what does that mean?
    3) I would be grateful if someone could also help me with the interpretation of these interaction results. For example, as the case below shows, the interaction between low_edu and low_training is -0.3568421. What does that mean?
    4) When I estimate interactions between gender and training, Stata only reports the results for females. What should I do for it to include males too (given that the variable 'gender' contains both)?

    Thank you very much in advance!

    Code:
    . xtreg wages i.low_edu##i.low_training, vce(cluster id)

    Random-effects GLS regression                   Number of obs     =    338,585
    Group variable: id                              Number of groups  =     89,185

    R-squared:                                      Obs per group:
         Within  = 0.0077                                         min =          1
         Between = 0.1314                                         avg =        3.8
         Overall = 0.1101                                         max =         10

                                                    Wald chi2(3)      =    9169.39
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                    (Std. err. adjusted for 89,185 clusters in id)
    --------------------------------------------------------------------------------------
                         |               Robust
                   wages | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
               1.low_edu |    -5.8737   .1188482   -49.42   0.000    -6.106638   -5.640761
          1.low_training |  -1.698064   .0974041   -17.43   0.000    -1.888972   -1.507155
                         |
    low_edu#low_training |
                    1 1  |  -.3568421   .1119215    -3.19   0.001    -.5762041   -.1374801
                         |
                   _cons |   12.68142   .1052855   120.45   0.000     12.47507    12.88778
    ---------------------+----------------------------------------------------------------
                 sigma_u |  6.1106458
                 sigma_e |   6.167227
                     rho |  .4953917   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------------

    Code:
    . xtreg wages i.low_edu i.low_training i.low_edu##i.low_training i.illness_disability i.sex i.children i.general_health i.marrital_status i.region i.age i.sector

    Random-effects GLS regression                   Number of obs     =     80,987
    Group variable: id                              Number of groups  =     45,184

    R-squared:                                      Obs per group:
         Within  = 0.0084                                         min =          1
         Between = 0.2277                                         avg =        1.8
         Overall = 0.2171                                         max =          4

                                                    Wald chi2(50)     =   13450.92
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

    ----------------------------------------------------------------------------------------------------
    wages | Coefficient Std. err. z P>|z| [95% conf. interval]
    -----------------------------------+----------------------------------------------------------------
    1.low_edu | -2.90262 .138659 -20.93 0.000 -3.174386 -2.630853
    1.low_training | -.596947 .1072609 -5.57 0.000 -.8071746 -.3867194
    |
    low_edu#low_training |
    1 1 | -.1727221 .135354 -1.28 0.202 -.4380111 .0925668
    |
    illness_disability |
    no | .0274173 .070342 0.39 0.697 -.1104504 .1652851
    |
    sex |
    female | -1.614785 .0852835 -18.93 0.000 -1.781937 -1.447632
    |
    children |
    1 | -.4176488 .1091449 -3.83 0.000 -.6315689 -.2037287
    2 | -.4066241 .1293634 -3.14 0.002 -.6601718 -.1530764
    3 | -1.318122 .2158384 -6.11 0.000 -1.741158 -.8950867
    4 | -2.118664 .4798341 -4.42 0.000 -3.059122 -1.178207
    5 | -2.447484 1.181438 -2.07 0.038 -4.76306 -.1319082
    6 | -5.246481 2.336251 -2.25 0.025 -9.825449 -.6675131
    |
    general_health |
    very good | -.2556415 .0720459 -3.55 0.000 -.3968488 -.1144342
    good | -.5585276 .0817225 -6.83 0.000 -.7187008 -.3983544
    fair | -.9761969 .1124904 -8.68 0.000 -1.196674 -.7557198
    or Poor? | -1.172676 .2144851 -5.47 0.000 -1.593059 -.7522932
    |
    marrital_status |
    married | 1.023875 .0938596 10.91 0.000 .8399137 1.207837
    civil partner (legal) | .7189542 .5583093 1.29 0.198 -.3753119 1.81322
    separated legally marr | .2603422 .2100295 1.24 0.215 -.151308 .6719925
    divorced | .4680171 .1413487 3.31 0.001 .1909787 .7450554
    widowed | .4130304 .307392 1.34 0.179 -.1894467 1.015508
    sep from civil partner | -1.898825 1.72616 -1.10 0.271 -5.282037 1.484387
    a former civil partner | -1.278494 3.659424 -0.35 0.727 -8.450834 5.893846
    surviving civil partner | 3.183246 3.74548 0.85 0.395 -4.157761 10.52425
    |
    region |
    North West | .2900335 .2244571 1.29 0.196 -.1498944 .7299613
    Yorkshire and the Humber | -.2500624 .2325673 -1.08 0.282 -.7058859 .2057611
    East Midlands | .029028 .2327893 0.12 0.901 -.4272308 .4852867
    West Midlands | .5222915 .2328438 2.24 0.025 .065926 .978657
    East of England | .9182685 .2279198 4.03 0.000 .471554 1.364983
    London | 1.488184 .2197212 6.77 0.000 1.057539 1.91883
    South East | 1.473484 .2180581 6.76 0.000 1.046098 1.90087
    South West | .0037702 .230934 0.02 0.987 -.4488522 .4563927
    Wales | -.2532327 .2355061 -1.08 0.282 -.7148162 .2083507
    Scotland | .6664761 .2257843 2.95 0.003 .223947 1.109005
    Northern Ireland | -.2878537 .2387056 -1.21 0.228 -.755708 .1800007
    |
    age |
    18-19 years old | .6488594 .2727242 2.38 0.017 .1143299 1.183389
    20-24 years old | 1.168957 .2601275 4.49 0.000 .6591165 1.678798
    25-29 years old | 2.011161 .2648456 7.59 0.000 1.492073 2.530249
    30-34 years old | 3.218317 .2671965 12.04 0.000 2.694622 3.742013
    35-39 years old | 4.004506 .2694368 14.86 0.000 3.47642 4.532593
    40-44 years old | 4.330584 .2686496 16.12 0.000 3.80404 4.857127
    45-49 years old | 4.369398 .2688791 16.25 0.000 3.842405 4.896392
    50-54 years old | 4.033248 .2711242 14.88 0.000 3.501854 4.564641
    55-59 years old | 3.776262 .2764009 13.66 0.000 3.234526 4.317998
    60-64 years old | 2.857164 .287798 9.93 0.000 2.293091 3.421238
    65 years or older | .3759601 .3172377 1.19 0.236 -.2458143 .9977346
    |
    sector |
    managerial & technical occupation | -.3781948 .1493594 -2.53 0.011 -.6709339 -.0854558
    skilled non-manual | -3.037687 .1624002 -18.70 0.000 -3.355985 -2.719388
    skilled manual | -7.110947 .1664965 -42.71 0.000 -7.437274 -6.78462
    partly skilled occupation | -4.504815 .170801 -26.37 0.000 -4.839579 -4.170051
    unskilled occupation | -5.152122 .2260784 -22.79 0.000 -5.595227 -4.709016
    |
    _cons | 13.98079 .3639008 38.42 0.000 13.26756 14.69402
    -----------------------------------+----------------------------------------------------------------
    sigma_u | 6.462475
    sigma_e | 5.1604593
    rho | .61063295 (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------------------

    Code:
    * HE & HT
    . xtreg wages i.high_edu##i.high_training, vce(cluster id)

    Random-effects GLS regression                   Number of obs     =    338,585
    Group variable: id                              Number of groups  =     89,185

    R-squared:                                      Obs per group:
         Within  = 0.0077                                         min =          1
         Between = 0.1314                                         avg =        3.8
         Overall = 0.1101                                         max =         10

                                                    Wald chi2(3)      =    9169.39
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                    (Std. err. adjusted for 89,185 clusters in id)
    ----------------------------------------------------------------------------------------
                           |               Robust
                     wages | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -----------------------+----------------------------------------------------------------
                1.high_edu |   6.230542    .076358    81.60   0.000     6.080883    6.380201
           1.high_training |   2.054906   .0555559    36.99   0.000     1.946018    2.163793
                           |
    high_edu#high_training |
                      1 1  |  -.3568421   .1119215    -3.19   0.001    -.5762041   -.1374801
                           |
                     _cons |   4.752816   .0257906   184.28   0.000     4.702267    4.803365
    -----------------------+----------------------------------------------------------------
                   sigma_u |  6.1106458
                   sigma_e |   6.167227
                       rho |  .4953917   (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------
    Code:
    * LE & HT
    . xtreg wages i.low_edu##i.high_training, vce(cluster id)

    Random-effects GLS regression                   Number of obs     =    338,585
    Group variable: id                              Number of groups  =     89,185

    R-squared:                                      Obs per group:
         Within  = 0.0077                                         min =          1
         Between = 0.1314                                         avg =        3.8
         Overall = 0.1101                                         max =         10

                                                    Wald chi2(3)      =    9169.39
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                    (Std. err. adjusted for 89,185 clusters in id)
    ---------------------------------------------------------------------------------------
                          |               Robust
                    wages | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    ----------------------+----------------------------------------------------------------
                1.low_edu |  -6.230542    .076358   -81.60   0.000    -6.380201   -6.080883
          1.high_training |   1.698064   .0974041    17.43   0.000     1.507155    1.888972
                          |
    low_edu#high_training |
                     1 1  |   .3568421   .1119215     3.19   0.001     .1374801    .5762041
                          |
                    _cons |   10.98336    .073165   150.12   0.000     10.83996    11.12676
    ----------------------+----------------------------------------------------------------
                  sigma_u |  6.1106458
                  sigma_e |   6.167227
                      rho |  .4953917   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------------

  • #2


    Assuming that there is no "intermediate" category for education or training, and that your low_ed, high_ed, low_training, and high_training are coded 1 for true and 0 for false, the four regressions you show are actually just algebraic transforms of the same regression. (Well, except for the second regression, which includes a bunch of additional variables.)

    I'm assuming at this point that the variable high_edu is coded 1 = high education and 0 = low education, and that the variable high_training is coded 1 = high training and 0 = low training. If so, that is the easiest to interpret. Just to make things crystal clear, I suggest you value label them:

    Code:
    label define low_high 0 "Low" 1 "High"
    label values high_edu high_training low_high
    And the way to answer your research question is to re-run that same regression and follow it with:

    Code:
    margins high_edu#high_training
    margins high_edu, dydx(high_training)
    margins high_training, dydx(high_edu)
    The first of these will produce a table showing you the expected wages in all four combinations of education and training categories. The second will give you the marginal effects of high training on wages conditional on low and high levels of education. The third will give you the marginal effects of high education on wages conditional on low and high levels of training. I would focus my attention on these -margins- outputs rather than the regression coefficients themselves.
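To connect the coefficients to actual predictions, remember that the linear prediction in each education/training cell is just a sum of coefficients. Here is a quick check by hand, in Python rather than Stata (coefficients copied from your first output; the function name is mine):

```python
# Linear prediction from:  wages = b0 + b_e*low_edu + b_t*low_training
#                                  + b_et*(low_edu * low_training)
# Coefficients copied from the first xtreg output in post #1.
b0, b_e, b_t, b_et = 12.68142, -5.8737, -1.698064, -0.3568421

def expected_wage(low_edu, low_training):
    """Predicted wage for one cell (each argument is 0 or 1)."""
    return b0 + b_e * low_edu + b_t * low_training + b_et * low_edu * low_training

for e in (0, 1):
    for t in (0, 1):
        print(f"low_edu={e}, low_training={t}: {expected_wage(e, t):.4f}")

# The interaction coefficient is exactly the difference-in-differences
# between the four cells:
did = (expected_wage(1, 1) - expected_wage(1, 0)) \
    - (expected_wage(0, 1) - expected_wage(0, 0))
print(f"difference-in-differences = {did:.7f}")
```

So -.3568421 is not a standalone effect for the low/low group: it says the wage gap associated with low training is about .36 larger among the low-education group than among the high-education group (and symmetrically for education across training groups). Note also that expected_wage(1, 1) reproduces, up to rounding, the _cons of your high_edu##high_training output, which is the same regression re-parameterized.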

    As for your results changing when you include the other variables, that is to be expected. (In fact, if that sort of thing didn't happen, there would be no point to adding them to the regression!) I suggest that you repeat this expanded regression, substituting the variables high_edu and high_training (and their interaction) for the low education and low training variables. The results from this regression, and the three -margins- commands shown above (which need not, and should not, mention the other variables), will give you results adjusted for the confounding effects of the additional variables.

    I do have concerns about a few of those added variables. Thinking about marital status, children, general health, and region, by including them as you have, you are viewing them as being potential causes of wages. But a case can be made that it is the other way around: higher wages cause good health, facilitate getting married, and generally lead people to live in more prosperous regions. There is also some reason to believe that higher wages reduce the number of children people choose to have. It is, I think, fair to say that education also has similar causal effects on health, marital status, children, and region. If I'm thinking about this correctly, this makes health, marital status, children, and region colliders of the education -> wages relationship. And that implies that they must be excluded from the model, or they will bias the education -> wages relationship. (If you are not familiar with collider bias, a clear and non-technical illustration of it can be found at https://observablehq.com/@herbps10/collider-bias.) This is a subtle problem because you can also argue that health and region, at least, may also cause higher wages. So we have bi-directional causality, which is a nightmare situation for regression models.
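If collider bias is new to you, the mechanism is easy to demonstrate with a small simulation. This is a deliberately artificial example, in Python rather than Stata, with made-up numbers (nothing from your data): health is generated as a consequence of both education and wages, and "adjusting" for it wrecks the estimate of the true education effect.

```python
import random

# Toy data-generating process (all numbers made up for illustration):
#   education -> wages            (true effect = 2.0)
#   education -> health <- wages  (health is a collider)
random.seed(1)
n = 200_000
edu    = [random.gauss(0, 1) for _ in range(n)]
wages  = [2.0 * e + random.gauss(0, 1) for e in edu]
health = [e + w + random.gauss(0, 1) for e, w in zip(edu, wages)]

def cov(x, y):
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# Simple regression of wages on edu: slope = cov / var.
b_unadjusted = cov(edu, wages) / cov(edu, edu)

# OLS with edu and health both included, solved from the 2x2 normal equations.
s_ee, s_eh, s_hh = cov(edu, edu), cov(edu, health), cov(health, health)
s_ew, s_hw = cov(edu, wages), cov(health, wages)
det = s_ee * s_hh - s_eh ** 2
b_adjusted = (s_hh * s_ew - s_eh * s_hw) / det  # edu slope, "controlling" for health

print(f"edu slope, health excluded: {b_unadjusted:.2f}")  # recovers ~2.0
print(f"edu slope, health included: {b_adjusted:.2f}")    # badly biased (~0.5 here)
```

Conditioning on the collider opens a spurious path between education and the error term in wages, so it is the "adjusted" estimate that is biased here, the opposite of what happens with a genuine confounder.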

    Given the lack of clarity about whether these variables are colliders, I would fit three models. One would include only the education and training variables and their interaction (an unadjusted analysis). The second would include those variables plus disability, age, sex, and sector, but not the others. The third would include all of the variables. If the second and third analyses disagree substantially, I would put greater credence in the second one. If they largely support the same conclusions, then we can just relax about it. Either the second or third model could end up disagreeing considerably with the first (even to the extent of opposite signs of marginal effects)--but that is just normal and demonstrates the importance of adjusting to reduce omitted variable bias.

    Comment


    • #3
      Thank you so much Clyde Schechter, I found this incredibly helpful and useful!

      When I used the -margins- analysis, I found that the marginal effect of high training on wages is the same for those with low education as for those with high education. Does that mean that education level has no impact on the marginal effect of training?

      Code:
      Average marginal effects                                Number of obs = 80,987
      Model VCE: Conventional
      
      Expression: Linear prediction, predict()
      dy/dx wrt:  1.training_level
      
      -----------------------------------------------------------------------------------
                        |            Delta-method
                        |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      ------------------+----------------------------------------------------------------
      0.training_level  |  (base outcome)
      ------------------+----------------------------------------------------------------
      1.training_level  |
              edu_level |
                   Low  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
                  High  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
      -----------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.

      Comment


      • #4
        This is very suspicious. It is extremely rare for the marginal effects to be exactly the same in both groups. If true, what is shown says that education level has no effect on the marginal effect of training. But exactly no effect is highly implausible. It is more likely that something has gone awry. Please show the output from the regression that preceded the -margins- outputs you are showing, and the Stata command that led to it.

        Comment


        • #5
          Thank you for your response Clyde Schechter, here is everything I have done so far:

          Code:
          label define low_high 0 "Low" 1 "High"
          
          label values high_edu high_training low_high
          
           rename high_edu edu_level
           rename high_training training_level
           
           * A NEW MODEL:
           
          * xtreg wages i.edu_level i.training_level i.illness_disability i.sex i.children i.general_health i.marrital_status i.region i.age i.sector 
          
          Random-effects GLS regression                   Number of obs     =     80,987
          Group variable: id                              Number of groups  =     45,184
          
          R-squared:                                      Obs per group:
               Within  = 0.0084                                         min =          1
               Between = 0.2277                                         avg =        1.8
               Overall = 0.2170                                         max =          4
          
                                                          Wald chi2(49)     =   13448.66
          corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
          
          ----------------------------------------------------------------------------------------------------
                                       wages | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -----------------------------------+----------------------------------------------------------------
                                   edu_level |
                                       High  |   3.039784    .087604    34.70   0.000     2.868083    3.211484
                                             |
                              training_level |
                                       High  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
                                             |
                          illness_disability |
                                         no  |   .0269743   .0703411     0.38   0.701    -.1108917    .1648403
                                             |
                                         sex |
                                     female  |   -1.61674   .0852713   -18.96   0.000    -1.783869   -1.449611
                                             |
                                    children |
                                          1  |  -.4153716   .1091311    -3.81   0.000    -.6292646   -.2014786
                                          2  |  -.4037609   .1293456    -3.12   0.002    -.6572737   -.1502482
                                          3  |   -1.31493   .2158262    -6.09   0.000    -1.737941   -.8919183
                                          4  |   -2.11243   .4798124    -4.40   0.000    -3.052845   -1.172015
                                          5  |  -2.446739   1.181447    -2.07   0.038    -4.762333   -.1311443
                                          6  |  -5.256411   2.336242    -2.25   0.024    -9.835361   -.6774604
                                             |
                              general_health |
                                  very good  |  -.2554821   .0720456    -3.55   0.000    -.3966889   -.1142753
                                       good  |  -.5584111   .0817225    -6.83   0.000    -.7185842   -.3982379
                                       fair  |  -.9755705   .1124893    -8.67   0.000    -1.196045   -.7550955
                                   or Poor?  |   -1.17296   .2144847    -5.47   0.000    -1.593342   -.7525773
                                             |
                             marrital_status |
                                    married  |   1.024838   .0938579    10.92   0.000     .8408795    1.208796
                      civil partner (legal)  |   .7185408   .5583165     1.29   0.198    -.3757393    1.812821
                     separated legally marr  |   .2621095   .2100265     1.25   0.212    -.1495348    .6737538
                                   divorced  |   .4690657   .1413484     3.32   0.001     .1920279    .7461036
                                    widowed  |   .4129976   .3073969     1.34   0.179    -.1894891    1.015484
                     sep from civil partner  |  -1.889693   1.726138    -1.09   0.274    -5.272862    1.493476
                     a former civil partner  |  -1.278073   3.659407    -0.35   0.727    -8.450379    5.894233
                    surviving civil partner  |   3.234842   3.745303     0.86   0.388    -4.105816     10.5755
                                             |
                                      region |
                                 North West  |   .2891474   .2244608     1.29   0.198    -.1507877    .7290824
                   Yorkshire and the Humber  |  -.2518902   .2325678    -1.08   0.279    -.7077147    .2039343
                              East Midlands  |   .0273693   .2327906     0.12   0.906     -.428892    .4836305
                              West Midlands  |   .5202431   .2328432     2.23   0.025     .0638788    .9766074
                            East of England  |   .9177246   .2279242     4.03   0.000     .4710014    1.364448
                                     London  |    1.48838   .2197257     6.77   0.000     1.057726    1.919035
                                 South East  |   1.471756   .2180585     6.75   0.000      1.04437    1.899143
                                 South West  |   .0022558   .2309359     0.01   0.992    -.4503703    .4548818
                                      Wales  |  -.2541682     .23551    -1.08   0.280    -.7157592    .2074229
                                   Scotland  |   .6654369   .2257876     2.95   0.003     .2229013    1.107972
                           Northern Ireland  |  -.2898318   .2387057    -1.21   0.225    -.7576864    .1780228
                                             |
                                         age |
                            18-19 years old  |    .653821   .2726963     2.40   0.017     .1193461    1.188296
                            20-24 years old  |   1.175185   .2600831     4.52   0.000      .665432    1.684939
                            25-29 years old  |   2.014949   .2648308     7.61   0.000      1.49589    2.534008
                            30-34 years old  |   3.222129   .2671816    12.06   0.000     2.698463    3.745796
                            35-39 years old  |   4.007818   .2694262    14.88   0.000     3.479752    4.535883
                            40-44 years old  |   4.334004   .2686382    16.13   0.000     3.807483    4.860525
                            45-49 years old  |   4.372021   .2688732    16.26   0.000     3.845039    4.899003
                            50-54 years old  |   4.035654   .2711197    14.89   0.000     3.504269    4.567039
                            55-59 years old  |   3.779382   .2763922    13.67   0.000     3.237663    4.321101
                            60-64 years old  |   2.859103   .2877963     9.93   0.000     2.295033    3.423174
                          65 years or older  |   .3752281     .31724     1.18   0.237    -.2465508     .997007
                                             |
                                      sector |
          managerial & technical occupation  |  -.3782077   .1493615    -2.53   0.011    -.6709508   -.0854646
                         skilled non-manual  |  -3.038697   .1624002   -18.71   0.000    -3.356995   -2.720398
                             skilled manual  |  -7.112368   .1664947   -42.72   0.000    -7.438691   -6.786044
                  partly skilled occupation  |   -4.50574   .1708015   -26.38   0.000    -4.840505   -4.170975
                       unskilled occupation  |  -5.156299   .2260567   -22.81   0.000    -5.599362   -4.713236
                                             |
                                       _cons |   10.31751   .3561231    28.97   0.000     9.619519     11.0155
          -----------------------------------+----------------------------------------------------------------
                                     sigma_u |  6.4628101
                                     sigma_e |  5.1604397
                                         rho |  .61065941   (fraction of variance due to u_i)
          ----------------------------------------------------------------------------------------------------
          
          margins edu_level#training_level 
          
          Predictive margins                                      Number of obs = 80,987
          Model VCE: Conventional
          
          Expression: Linear prediction, predict()
          
          ------------------------------------------------------------------------------------------
                                   |            Delta-method
                                   |     Margin   std. err.      z    P>|z|     [95% conf. interval]
          -------------------------+----------------------------------------------------------------
          edu_level#training_level |
                          Low#Low  |   10.51889   .0465658   225.89   0.000     10.42762    10.61016
                         Low#High  |   11.22386     .07135   157.31   0.000     11.08402     11.3637
                         High#Low  |   13.55868    .072141   187.95   0.000     13.41728    13.70007
                        High#High  |   14.26364    .088169   161.78   0.000     14.09083    14.43645
          ------------------------------------------------------------------------------------------
          
          
          margins edu_level, dydx(training_level)
          
          Average marginal effects                                Number of obs = 80,987
          Model VCE: Conventional
          
          Expression: Linear prediction, predict()
          dy/dx wrt:  1.training_level
          
          -----------------------------------------------------------------------------------
                            |            Delta-method
                            |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
          ------------------+----------------------------------------------------------------
          0.training_level  |  (base outcome)
          ------------------+----------------------------------------------------------------
          1.training_level  |
                  edu_level |
                       Low  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
                      High  |   .7049676   .0658395    10.71   0.000     .5759246    .8340106
          -----------------------------------------------------------------------------------
          Note: dy/dx for factor levels is the discrete change from the base level.
          
          
          
          margins training_level, dydx(edu_level)
          
          Average marginal effects                                Number of obs = 80,987
          Model VCE: Conventional
          
          Expression: Linear prediction, predict()
          dy/dx wrt:  1.edu_level
          
          --------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
          ---------------+----------------------------------------------------------------
          0.edu_level    |  (base outcome)
          ---------------+----------------------------------------------------------------
          1.edu_level    |
          training_level |
                    Low  |   3.039784    .087604    34.70   0.000     2.868083    3.211484
                   High  |   3.039784    .087604    34.70   0.000     2.868083    3.211484
          --------------------------------------------------------------------------------
          Note: dy/dx for factor levels is the discrete change from the base level.

          Comment


          • #6
            In fact, Clyde Schechter, another result above suggests that the marginal impact of high education is exactly the same for those with high training as for those with low training as well.

            Comment


            • #7
              Yes, because the regression didn't include an interaction between education and training! If you want to explore whether the marginal impact of either of these is affected by the other, you have to have an interaction term.
              Code:
              xtreg wages i.edu_level##i.training_level i.illness_disability i.sex i.children i.general_health i.marrital_status i.region i.age i.sector
              margins training_level, dydx(edu_level)
              margins edu_level, dydx(training_level)
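The algebra makes the symptom obvious. A sketch in Python (the .7049676 is the training coefficient from the output in #5; the interaction value is made up purely for illustration): in an additive model the derivative of wages with respect to training is a single constant, so -margins- must report the same number at every education level.

```python
# Marginal effect of training under two specifications.
b_t  = 0.7049676   # training_level coefficient from the additive model in #5
b_et = -0.17       # hypothetical interaction coefficient (illustration only)

def dydx_training_additive(edu_level):
    # wages = ... + b_t*training: the slope never involves edu_level
    return b_t

def dydx_training_interacted(edu_level):
    # wages = ... + b_t*training + b_et*(edu_level * training)
    return b_t + b_et * edu_level

print(dydx_training_additive(0) == dydx_training_additive(1))     # True
print(dydx_training_interacted(0), dydx_training_interacted(1))   # differ by b_et
```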

              Comment


              • #8
                right, I apologise for such a basic mistake and for consequently wasting your time! I am beyond grateful for all your help and prompt responses! Clyde Schechter

                Comment


                • #9
                  No worries.

                  Comment


                  • #10
                    Sorry, one last question (promise)! In post #2 you suggested three models to reveal the colliders. In model 2 I included only training, education, and their interaction, as well as disability, age, sex, and sector, but not the others. Model 3 had all variables. In model 2, the interaction between education and training had a negative sign and was significant. In model 3, the interaction was insignificant (with a negative sign). The other variables remained (more or less) the same, apart from the age variable, which in model 3 had a significant positive effect on wages (whereas in model 2 all of its levels were insignificant). My questions are:

                    1) the fact that interaction between education and training in model 3 became insignificant - does that reveal collider effect?
                    2) the fact that other variables became significant in model 3 (such as age), what does that mean, if anything?

                    Thank you very much in advance! Clyde Schechter

                    Comment


                    • #11
                      First, full disclosure: I am in the school of statistics that makes little use of statistical significance and believes that the concept should be discarded. Nevertheless, even if we take statistical significance seriously, it is critical to understand that the difference between statistically significant and statistically non-significant is, itself, not statistically significant. That is, even if you believe wholeheartedly in the validity of the concept of statistical significance, you should never draw any conclusion at all from the fact that one thing is statistically significant and another is not. It means nothing.

                      I would look at the actual values of the interaction coefficients in model 2 and model 3. Are they roughly similar, for practical purposes? Do the confidence intervals around them overlap extensively, only a little, or not at all? Those are the criteria that I would use to decide whether the estimated interaction effect is materially different in the two models.

                      If I concluded that the interaction effects in the two models were materially different, I would not be able to conclude that collider bias is operating here, though I would suspect that is the case. But other explanations are possible in this ambiguous situation. Collider effects are identified in advance of analysis by looking at a diagram of the causal relationships among all of the variables we are modeling (or considering modeling). Ideally, such a diagram contains no cycles, and in that case it is clear that we include confounders and exclude colliders--no variables are both confounders and colliders. But in the present situation, it is not clear in which direction causality runs, and perhaps it runs in both directions. In such a situation, it is not at all clear what is going on. So when including the set of suspected colliders results in a change, we cannot be sure if it is a true collision effect (which is bias and undesirable) or if it is an effect of adjusting for confounders (which is reduction in bias and is desirable). This is just a very difficult situation, and clear understanding is, I think, not possible. (Note: some or all of the "cycles" of causation in this data could be resolved with longitudinal data. Good health in youth would be causally contributory to higher wages in young adulthood, which would in turn be contributory to good general health in mid-life, etc. But you don't have longitudinal data, so...)

                      Regarding age, the same initial reasoning applies. Examine the coefficients and confidence intervals--ignore the p-values. But in this case, if there is a material difference in the age coefficients, we can be much more confident that the difference is a collider effect. The reason is that unlike education, training, or wages, it is not plausible to think of two-way causation between age and number of children, marital status, general health, or region. The causality in those relationships can only be age causing them: they are not causes of aging. So here we would be quite sure there is collision, not confounding.

                      Comment


                      • #12
                        Interesting! In fact, the coefficients of the education and training interaction are fairly similar (-.2175764 in model 2 and -.1727221 in model 3), and the confidence intervals overlap a little bit (model 2: -.4077302 to -.0274226; model 3: -.4380111 to .0925668)! Clyde Schechter does that mean I can conclude the difference is not materially different?

                        P.S. thank you for the age variable explanation, it makes a lot of sense!

                        Comment


                        • #13
                          Yes, I would interpret those interaction coefficients in models 2 and 3 as not being materially different. And I would say that the confidence intervals actually overlap substantially, not just a little bit.
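To make "overlap substantially" concrete: a quick check in Python rather than Stata (the helper function is mine), using the endpoints you quoted, shows that model 2's interval lies entirely inside model 3's.

```python
def overlap_fraction(a, b):
    """Length of the intersection of two intervals (lo, hi), as a fraction
    of the shorter interval's length. 0.0 means disjoint; 1.0 means the
    shorter interval is completely covered by the wider one."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if hi <= lo:
        return 0.0
    return (hi - lo) / min(a[1] - a[0], b[1] - b[0])

ci_model2 = (-0.4077302, -0.0274226)   # 95% CI for the interaction, model 2
ci_model3 = (-0.4380111,  0.0925668)   # 95% CI for the interaction, model 3

print(f"overlap = {overlap_fraction(ci_model2, ci_model3):.2f}")  # 1.00
```

Since the narrower interval is completely contained in the wider one, the two estimates are about as compatible as they could possibly be.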

                          Comment


                          • #14
                            perfect, thank you very much for everything! Clyde Schechter

                            Comment
