Need help with interaction terms in the regression equation

Guest

22 Apr 2022, 11:34

Dear all,

I am estimating whether full-time employees receive more training than part-time employees, or whether employment type affects training hours. To do so, I created a dummy variable, job_type, coded 1 for full-time and 0 for part-time.

Initially, I estimated separate models for the two groups: "xtreg wages i.high_qual training_hrs i.illness_disability i.sex i.children i.general_health i.region i.age i.sector if job_type==1, re vce(robust)" for FT employees and "xtreg wages i.high_qual training_hrs i.illness_disability i.sex i.children i.general_health i.region i.age i.sector if job_type==0, re vce(robust)" for PT employees.

However, I now want to do the same thing with interaction terms, so that I get a single model instead of two.

My question is: is the following code the best way to do it?

Code:

xtreg wages i.high_qual training_hrs i.illness_disability i.sex i.children i.general_health i.region i.age i.sector i.job_type training_hrs##job_type, re vce(robust)

When I run this code, I run into two problems: 1) Stata runs for a very long time (over 30 minutes, and it still hasn't finished), and 2) as you can see below, a lot of variables are omitted.

Code:

xtreg wages i.high_qual training_hrs training_hrs##job_type i.illness_disability i.sex i.children i.general_health i.region i.age i.sector job_type, re vce(robust)
note: 5120.training_hrs omitted because of collinearity.
note: 85.training_hrs#0.job_type identifies no observations in the sample.
note: 85.training_hrs#1.job_type omitted because of collinearity.
note: 107.training_hrs#0.job_type identifies no observations in the sample.
note: 107.training_hrs#1.job_type omitted because of collinearity.
note: 109.training_hrs#0.job_type identifies no observations in the sample.
note: 109.training_hrs#1.job_type omitted because of collinearity.
note: 113.training_hrs#0.job_type identifies no observations in the sample.
note: 113.training_hrs#1.job_type omitted because of collinearity.
note: 121.training_hrs#0.job_type identifies no observations in the sample.
note: 121.training_hrs#1.job_type omitted because of collinearity.
note: 127.training_hrs#1.job_type identifies no observations in the sample.
note: 134.training_hrs#0.job_type identifies no observations in the sample.
note: 134.training_hrs#1.job_type omitted because of collinearity.
note: 139.training_hrs#0.job_type identifies no observations in the sample.
note: 139.training_hrs#1.job_type omitted because of collinearity.
note: 143.training_hrs#0.job_type identifies no observations in the sample.
note: 143.training_hrs#1.job_type omitted because of collinearity.
note: 146.training_hrs#0.job_type identifies no observations in the sample.
note: 146.training_hrs#1.job_type omitted because of collinearity.
note: 149.training_hrs#0.job_type identifies no observations in the sample.
note: 149.training_hrs#1.job_type omitted because of collinearity.
note: 159.training_hrs#0.job_type identifies no observations in the sample.
note: 159.training_hrs#1.job_type omitted because of collinearity.
note: 163.training_hrs#0.job_type identifies no observations in the sample.
note: 163.training_hrs#1.job_type omitted because of collinearity.
note: 166.training_hrs#0.job_type identifies no observations in the sample.
note: 166.training_hrs#1.job_type omitted because of collinearity.
note: 177.training_hrs#0.job_type identifies no observations in the sample.
note: 177.training_hrs#1.job_type omitted because of collinearity.
note: 186.training_hrs#0.job_type identifies no observations in the sample.
note: 186.training_hrs#1.job_type omitted because of collinearity.
note: 187.training_hrs#0.job_type identifies no observations in the sample.
note: 187.training_hrs#1.job_type omitted because of collinearity.
note: 191.training_hrs#0.job_type identifies no observations in the sample.
note: 191.training_hrs#1.job_type omitted because of collinearity.
note: 193.training_hrs#1.job_type identifies no observations in the sample.
note: 194.training_hrs#0.job_type identifies no observations in the sample.
note: 194.training_hrs#1.job_type omitted because of collinearity.
note: 205.training_hrs#0.job_type identifies no observations in the sample.
note: 205.training_hrs#1.job_type omitted because of collinearity.
note: 209.training_hrs#1.job_type identifies no observations in the sample.
note: 212.training_hrs#0.job_type identifies no observations in the sample.
note: 212.training_hrs#1.job_type omitted because of collinearity.
note: 215.training_hrs#0.job_type identifies no observations in the sample.
note: 215.training_hrs#1.job_type omitted because of collinearity.
note: 217.training_hrs#0.job_type identifies no observations in the sample.
note: 217.training_hrs#1.job_type omitted because of collinearity.
note: 219.training_hrs#0.job_type identifies no observations in the sample.
note: 219.training_hrs#1.job_type omitted because of collinearity.
note: 222.training_hrs#0.job_type identifies no observations in the sample.
note: 222.training_hrs#1.job_type omitted because of collinearity.
note: 223.training_hrs#1.job_type identifies no observations in the sample.
note: 231.training_hrs#1.job_type identifies no observations in the sample.
note: 232.training_hrs#0.job_type identifies no observations in the sample.
note: 232.training_hrs#1.job_type omitted because of collinearity.

How else can I estimate the impact of employment type on training hours using interaction terms? I include a data example below. I would be very grateful if you could help me with this problem!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(sex region age high_qual) float(training_hrs wages training job_type) int _freq
1 1 4 1 0         0 0 0 1
1 1 4 1 0 3.2342455 0 0 1
1 1 4 1 0  3.986507 0 0 1
1 1 4 1 0  5.193341 0 0 1
1 1 4 1 0  5.792361 0 0 1
1 1 4 1 0  5.979761 0 0 1
1 1 4 1 0  6.311906 0 0 1
1 1 4 1 0  6.525496 0 0 1
1 1 4 1 0  6.599126 0 0 1
1 1 4 1 0  6.899724 0 0 1
1 1 4 1 0  6.976388 0 0 1
1 1 4 1 0  7.163255 0 0 1
1 1 4 1 0  7.661569 0 0 1
1 1 4 1 0  7.664548 0 0 1
1 1 4 1 0  7.901599 0 0 1
1 1 4 1 0  8.284412 0 0 1
1 1 4 1 0  8.432996 0 0 1
1 1 4 1 0  9.966268 0 0 1
1 1 4 1 0  10.09832 0 0 1
1 1 4 1 0 16.360012 0 0 1
1 1 4 1 0  22.99908 0 0 1
2 1 4 1 0  3.425857 0 0 1
2 1 4 1 0 3.9366376 0 0 1
2 1 4 1 0  4.599816 0 0 1
2 1 4 1 0  6.342206 0 0 1
2 1 4 1 0  6.612236 0 0 1
2 1 4 1 0  7.007532 0 0 1
2 1 4 1 0  8.023076 0 0 1
2 1 4 1 0  8.333333 0 0 1
2 1 4 1 0  8.491968 0 0 1
2 1 4 1 0  8.624655 0 0 1
2 1 4 1 0  9.199632 0 0 1
2 1 4 1 0  9.380175 0 0 1
2 1 4 1 0  9.400874 0 0 1
2 1 4 1 0 11.234702 0 0 1
2 1 4 1 0 11.461246 0 0 1
2 1 4 1 0 11.710413 0 0 1
2 1 4 1 0  11.93499 0 0 1
2 1 4 1 0  11.95521 0 0 1
2 1 4 1 0 12.457797 0 0 1
2 1 4 1 0 13.080727 0 0 1
2 1 4 1 0 14.566084 0 0 1
1 1 5 1 0         0 0 0 3
1 1 5 1 0  .7153225 0 0 1
1 1 5 1 0   1.91659 0 0 1
1 1 5 1 0  5.366452 0 0 1
1 1 5 1 0  6.073195 0 0 1
1 1 5 1 0  6.976357 0 0 1
1 1 5 1 0  8.097592 0 0 1
1 1 5 1 0  8.145508 0 0 1
1 1 5 1 0  9.966191 0 0 1
1 1 5 1 0 14.627415 0 0 1
2 1 5 1 0  3.752082 0 0 1
2 1 5 1 0  7.328183 0 0 1
2 1 5 1 0  7.877185 0 0 1
2 1 5 1 0  8.840271 0 0 1
2 1 5 1 0  8.957537 0 0 1
2 1 5 1 0  8.969642 0 0 1
2 1 5 1 0 10.732904 0 0 1
2 1 5 1 0 15.639375 0 0 1
2 1 5 1 0 16.065336 0 0 1
2 1 5 1 0 17.888147 0 0 1
2 1 5 1 0 101.85307 0 0 2
1 1 6 1 0  5.398363 0 0 1
1 1 6 1 0  7.187212 0 0 1
1 1 6 1 0  7.225736 0 0 1
1 1 6 1 0  8.305198 0 0 1
1 1 6 1 0  8.624655 0 0 1
1 1 6 1 0  9.199632 0 0 1
1 1 6 1 0  9.722339 0 0 1
2 1 6 1 0  4.456072 0 0 1
2 1 6 1 0  6.470288 0 0 1
2 1 6 1 0  6.644243 0 0 1
2 1 6 1 0  7.704692 0 0 1
2 1 6 1 0  7.927679 0 0 1
2 1 6 1 0  8.213957 0 0 1
2 1 6 1 0   8.23367 0 0 1
2 1 6 1 0  8.305223 0 0 2
2 1 6 1 0  8.518178 0 0 1
2 1 6 1 0  8.624655 0 0 1
2 1 6 1 0  8.944087 0 0 1
2 1 6 1 0  9.045794 0 0 1
2 1 6 1 0  9.301748 0 0 1
2 1 6 1 0   9.58295 0 0 1
2 1 6 1 0   9.65487 0 0 1
2 1 6 1 0  9.774609 0 0 1
2 1 6 1 0  9.966268 0 0 1
2 1 6 1 0 10.732904 0 0 1
2 1 6 1 0 11.821527 0 0 1
2 1 6 1 0  11.95952 0 0 1
2 1 6 1 0 12.218262 0 0 1
2 1 6 1 0  12.34217 0 0 1
2 1 6 1 0 12.431935 0 0 1
2 1 6 1 0  12.90881 0 0 1
2 1 6 1 0 12.936982 0 0 1
2 1 6 1 0  13.41613 0 0 1
2 1 6 1 0 13.569457 0 0 1
2 1 6 1 0  13.64612 0 0 1
2 1 6 1 0 13.799448 0 0 1
2 1 6 1 0 14.503965 0 0 1
end
label values sex b_sex
label def b_sex 1 "male", modify
label def b_sex 2 "female", modify
label values region b_gor_dv
label def b_gor_dv 1 "North East", modify
label values age b_agegr13_dv
label def b_agegr13_dv 4 "20-24 years old", modify
label def b_agegr13_dv 5 "25-29 years old", modify
label def b_agegr13_dv 6 "30-34 years old", modify
label values high_qual b_hiqual_dv
label def b_hiqual_dv 1 "Degree", modify
label values job_type ptime_ftime
label def ptime_ftime 0 "Part time", modify


Clyde Schechter

Join Date: Apr 2014

Posts: 29814
#2

22 Apr 2022, 12:01

The problem is the way you have specified the interaction term. Your variable training_hrs is continuous. In an interaction written as X##Y, Stata assumes by default that X and Y are categorical variables, not continuous ones. So Stata is treating training_hrs as a categorical variable, creating a separate "dummy" (indicator) variable for each of its distinct values. Moreover, most of those values do not occur with both full-time and part-time jobs, which is why you get so many messages about empty and omitted terms. The solution is to use the c. prefix for that variable. And, although it is not necessary, I recommend you also use the i. prefix for job_type: it's always clearer to be explicit than to rely on default interpretations. Note that you also specified training_hrs by itself in the model. That is OK, but unnecessary (c.training_hrs##i.job_type already includes the main effect) and potentially a source of confusion. So rewrite the command as:

Code:

xtreg wages i.high_qual i.illness_disability i.sex i.children i.general_health i.region i.age i.sector i.job_type c.training_hrs##i.job_type, re vce(robust)
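A minimal, self-contained sketch of the corrected specification, run on simulated toy data. The variable names mirror the thread, but the data, the panel identifier `id`, and the coefficients used to simulate wages are all made up for illustration. After fitting the model, `margins` reports the slope of wages on training hours separately for part-time and full-time jobs, which is often easier to read than the raw interaction coefficient.

```stata
* Hypothetical simulated panel, NOT the poster's data: 50 people, 2 waves each
clear
set seed 12345
set obs 100
generate int id = ceil(_n/2)
* 1 = full time, 0 = part time (assumed coding, as in the thread)
generate byte job_type = runiform() < 0.6
generate double training_hrs = round(runiform()*40, 0.5)
generate double wages = 8 + 2*job_type + 0.05*training_hrs + 0.03*training_hrs*job_type + rnormal()
xtset id

* c. marks training_hrs as continuous: one slope, plus one slope shift for full time
xtreg wages c.training_hrs##i.job_type, re vce(robust)

* Slope of wages with respect to training hours within each job type
margins job_type, dydx(training_hrs)
```

Because the model is linear, the two `dydx` estimates are simply the training_hrs coefficient (part time) and that coefficient plus the 1.job_type#c.training_hrs coefficient (full time).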

Guest

22 Apr 2022, 12:12

Oh right! Thank you Clyde Schechter, it worked well! But do you think the approach below would also work for my purposes (where high training means training_hrs > 21 hours and low training means below 21 hours)?

Code:

xtreg wages i.high_qual training_hrs training_level##job_type i.illness_disability i.sex i.children i.general_health i.region i.age i.sector i.job_type, re vce(robust)

Random-effects GLS regression                   Number of obs     =     81,014
Group variable: id                              Number of groups  =     45,174

R-squared:                                      Obs per group:
     Within  = 0.0352                                         min =          1
     Between = 0.2319                                         avg =        1.8
     Overall = 0.2210                                         max =          4

                                                Wald chi2(46)     =   12401.44
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                                      (Std. err. adjusted for 45,174 clusters in id)
----------------------------------------------------------------------------------------------------
                                   |               Robust
                             wages | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-----------------------------------+----------------------------------------------------------------
                         high_qual |
              Other higher degree  |  -2.328943   .1287285   -18.09   0.000    -2.581246    -2.07664
                      A-level etc  |  -2.861423   .1151489   -24.85   0.000     -3.08711   -2.635735
                         GCSE etc  |  -3.798948   .1139943   -33.33   0.000    -4.022373   -3.575523
              Other qualification  |  -4.539611   .1462049   -31.05   0.000    -4.826168   -4.253055
                 No qualification  |   -5.50174   .1728681   -31.83   0.000    -5.840555   -5.162925
                                   |
                      training_hrs |   -.000675   .0003241    -2.08   0.037    -.0013102   -.0000397
                                   |
                    training_level |
                             High  |    .535811   .1900772     2.82   0.005     .1632665    .9083555
                                   |
                          job_type |
                        Full time  |  -2.862712   .1152421   -24.84   0.000    -3.088583   -2.636842
                                   |
           training_level#job_type |
                   High#Full time  |   .2860871   .1987586     1.44   0.150    -.1034727    .6756468
                                   |
                illness_disability |
                               no  |    .207743   .0701464     2.96   0.003     .0702585    .3452275
                                   |
                               sex |
                           female  |  -2.271416   .0897984   -25.29   0.000    -2.447418   -2.095414
                                   |
                          children |
                                1  |  -.8475629   .1000312    -8.47   0.000    -1.043621   -.6515053
                                2  |  -1.102295   .1331827    -8.28   0.000    -1.363328   -.8412614
                                3  |  -2.145631   .1992736   -10.77   0.000      -2.5362   -1.755062
                                4  |  -2.992585   .3185437    -9.39   0.000     -3.61692   -2.368251
                                5  |  -3.012423   .7865182    -3.83   0.000     -4.55397   -1.470876
                                6  |  -5.169554   1.542942    -3.35   0.001    -8.193664   -2.145444
                                   |
                    general_health |
                        very good  |  -.3796525   .0738245    -5.14   0.000    -.5243459   -.2349591
                         or Poor?  |  -.9176918   .1817519    -5.05   0.000    -1.273919   -.5614647
                                   |
                            region |
                       North West  |   .3039916   .1792045     1.70   0.090    -.0472428     .655226
         Yorkshire and the Humber  |  -.2867893   .1812841    -1.58   0.114    -.6420997    .0685211
                    East Midlands  |   .0635786   .1902627     0.33   0.738    -.3093294    .4364866
                    West Midlands  |   .5081194   .1968847     2.58   0.010     .1222325    .8940063
                  East of England  |   .9545191   .1930645     4.94   0.000     .5761197    1.332919
                           London  |   1.414972   .1889339     7.49   0.000     1.044669    1.785276
                       South East  |   1.512466   .1893075     7.99   0.000      1.14143    1.883502
                       South West  |  -.0486637   .1930369    -0.25   0.801     -.427009    .3296817
                            Wales  |  -.2450324   .1865062    -1.31   0.189    -.6105778    .1205131
                         Scotland  |   .5794995   .1790084     3.24   0.001     .2286495    .9303494
                 Northern Ireland  |  -.0599725   .1805486    -0.33   0.740    -.4138413    .2938963
                                   |
                               age |
                  18-19 years old  |   .8949354   .2158393     4.15   0.000      .471898    1.317973
                  20-24 years old  |   2.082983   .2184386     9.54   0.000     1.654851    2.511115
                  25-29 years old  |   3.579527   .2139906    16.73   0.000     3.160113     3.99894
                  30-34 years old  |   5.031552   .2137355    23.54   0.000     4.612638    5.450465
                  35-39 years old  |   5.966847   .2184446    27.32   0.000     5.538704    6.394991
                  40-44 years old  |   6.405754   .2149296    29.80   0.000       5.9845    6.827009
                  45-49 years old  |   6.525323   .2153492    30.30   0.000     6.103246    6.947399
                  50-54 years old  |   6.157413   .2132289    28.88   0.000     5.739492    6.575334
                  55-59 years old  |   5.825606   .2261325    25.76   0.000     5.382395    6.268818
                  60-64 years old  |   4.816733   .2503987    19.24   0.000      4.32596    5.307506
                65 years or older  |   2.119762   .3149281     6.73   0.000     1.502515     2.73701
                                   |
                            sector |
managerial & technical occupation  |   -.411535     .22952    -1.79   0.073     -.861386     .038316
               skilled non-manual  |  -3.377155   .2304428   -14.66   0.000    -3.828814   -2.925495
                   skilled manual  |  -6.791225    .236485   -28.72   0.000    -7.254727   -6.327723
        partly skilled occupation  |  -4.937016   .2350558   -21.00   0.000    -5.397717   -4.476315
             unskilled occupation  |  -5.285932   .2588535   -20.42   0.000    -5.793275   -4.778588
                                   |
                             _cons |   14.68235   .3437735    42.71   0.000     14.00857    15.35613
-----------------------------------+----------------------------------------------------------------
                           sigma_u |   6.479498
                           sigma_e |  5.0558683
                               rho |  .62156281   (fraction of variance due to u_i)
----------------------------------------------------------------------------------------------------

Also, when I try to standardize the coefficients for the above equation, I get an error:

Code:

  xtreg wages i.high_qual training_hrs training_level##job_type i.illness_disability i.sex i.children i.general_health i.region i.age i.sector i.job_type, re 
> vce(robust) beta
option beta not allowed
r(198);

Is there anything I can do to fix this? Thank you so much for your time!


Clyde Schechter

#4

22 Apr 2022, 12:25

I generally discourage creating dichotomies out of continuous variables. Unless there is something qualitatively different that happens at 21 hours, all you are doing is replacing information with noise. Is it really true that 1 hour of training is similar in its effect on wages to 21 hours of training but 22 hours of training is radically different? If so, then go ahead. If not, it's a really bad idea to do this.
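The thread does not show how training_level was built, so the coding below is an assumption, illustrated on a few made-up values. It makes the point above concrete: 1 hour and 21 hours land in the same category, while 22 hours lands in the other. The `if !missing()` guard matters because Stata treats missing (.) as larger than any number, so without it a missing training_hrs would be coded as "High".

```stata
* Hypothetical values, not from the poster's data
clear
input double training_hrs
0
1
21
22
40
.
end

* Assumed coding: 1 = "High" (> 21 hours), 0 = "Low"; missing stays missing
generate byte training_level = (training_hrs > 21) if !missing(training_hrs)
list training_hrs training_level
```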

Standardizing is another bad idea. Don't do it here. All of your variables are either categories (where standardizing makes no sense at all!) or have natural units like hours and wages. If you try to standardize, all you will do is obfuscate your results so that nobody can understand them. Standardizing is only useful with continuous variables that have no natural units, or whose natural units are not understandable to the audience for your study. I cannot imagine you will be publishing or presenting these results to people who do not understand dollars (or euros, yen, pounds, yuan, rubles, whatever) and hours.
Guest
#5

22 Apr 2022, 19:22

Clyde Schechter Thank you very much, I will follow your advice! You have helped me a lot and I really appreciate it.