  • Need help with interaction terms in the regression equation

    Dear all,

    I am estimating whether full-time employees receive more training than part-time employees - or if the employment type affects the training hours. As such, I created dummy variables where job_type==1 (full-time) and job_type==0 (part-time).

    Initially, I fitted separate models: one for FT employees using "xtreg wages i.high_qual training_hrs i.illness_disability i.sex i.children i.general_health i.region i.age i.sector if job_type==1, re vce(robust)" and another for PT employees using "xtreg wages i.high_qual training_hrs i.illness_disability i.sex i.children i.general_health i.region i.age i.sector if job_type==0, re vce(robust)".

    However, I now want to do the same, but using interaction terms to produce a single set of models.

    My question is: is the best way of doing it using the code

    Code:
    xtreg wages i.high_qual training_hrs i.illness_disability i.sex i.children i.general_health i.region i.age i.sector i.job_type training_hrs##job_type, re vce(robust)
    When I run this code, I encounter a couple of problems - 1) Stata runs the command for a very long time (over 30 mins and it is still not over) and 2) as you can see below, a lot of variables are omitted.

    Code:
    xtreg wages i.high_qual training_hrs training_hrs##job_type i.illness_disability i.sex i.children i.general_health i.region i.age i.sector job_type, re vce(robust)
    note: 5120.training_hrs omitted because of collinearity.
    note: 85.training_hrs#0.job_type identifies no observations in the sample.
    note: 85.training_hrs#1.job_type omitted because of collinearity.
    note: 107.training_hrs#0.job_type identifies no observations in the sample.
    note: 107.training_hrs#1.job_type omitted because of collinearity.
    note: 109.training_hrs#0.job_type identifies no observations in the sample.
    note: 109.training_hrs#1.job_type omitted because of collinearity.
    note: 113.training_hrs#0.job_type identifies no observations in the sample.
    note: 113.training_hrs#1.job_type omitted because of collinearity.
    note: 121.training_hrs#0.job_type identifies no observations in the sample.
    note: 121.training_hrs#1.job_type omitted because of collinearity.
    note: 127.training_hrs#1.job_type identifies no observations in the sample.
    note: 134.training_hrs#0.job_type identifies no observations in the sample.
    note: 134.training_hrs#1.job_type omitted because of collinearity.
    note: 139.training_hrs#0.job_type identifies no observations in the sample.
    note: 139.training_hrs#1.job_type omitted because of collinearity.
    note: 143.training_hrs#0.job_type identifies no observations in the sample.
    note: 143.training_hrs#1.job_type omitted because of collinearity.
    note: 146.training_hrs#0.job_type identifies no observations in the sample.
    note: 146.training_hrs#1.job_type omitted because of collinearity.
    note: 149.training_hrs#0.job_type identifies no observations in the sample.
    note: 149.training_hrs#1.job_type omitted because of collinearity.
    note: 159.training_hrs#0.job_type identifies no observations in the sample.
    note: 159.training_hrs#1.job_type omitted because of collinearity.
    note: 163.training_hrs#0.job_type identifies no observations in the sample.
    note: 163.training_hrs#1.job_type omitted because of collinearity.
    note: 166.training_hrs#0.job_type identifies no observations in the sample.
    note: 166.training_hrs#1.job_type omitted because of collinearity.
    note: 177.training_hrs#0.job_type identifies no observations in the sample.
    note: 177.training_hrs#1.job_type omitted because of collinearity.
    note: 186.training_hrs#0.job_type identifies no observations in the sample.
    note: 186.training_hrs#1.job_type omitted because of collinearity.
    note: 187.training_hrs#0.job_type identifies no observations in the sample.
    note: 187.training_hrs#1.job_type omitted because of collinearity.
    note: 191.training_hrs#0.job_type identifies no observations in the sample.
    note: 191.training_hrs#1.job_type omitted because of collinearity.
    note: 193.training_hrs#1.job_type identifies no observations in the sample.
    note: 194.training_hrs#0.job_type identifies no observations in the sample.
    note: 194.training_hrs#1.job_type omitted because of collinearity.
    note: 205.training_hrs#0.job_type identifies no observations in the sample.
    note: 205.training_hrs#1.job_type omitted because of collinearity.
    note: 209.training_hrs#1.job_type identifies no observations in the sample.
    note: 212.training_hrs#0.job_type identifies no observations in the sample.
    note: 212.training_hrs#1.job_type omitted because of collinearity.
    note: 215.training_hrs#0.job_type identifies no observations in the sample.
    note: 215.training_hrs#1.job_type omitted because of collinearity.
    note: 217.training_hrs#0.job_type identifies no observations in the sample.
    note: 217.training_hrs#1.job_type omitted because of collinearity.
    note: 219.training_hrs#0.job_type identifies no observations in the sample.
    note: 219.training_hrs#1.job_type omitted because of collinearity.
    note: 222.training_hrs#0.job_type identifies no observations in the sample.
    note: 222.training_hrs#1.job_type omitted because of collinearity.
    note: 223.training_hrs#1.job_type identifies no observations in the sample.
    note: 231.training_hrs#1.job_type identifies no observations in the sample.
    note: 232.training_hrs#0.job_type identifies no observations in the sample.
    note: 232.training_hrs#1.job_type omitted because of collinearity.
    How else can I estimate the impact of employment type on training hours using interaction terms? I include a data example below. I would be grateful for any help with this problem!


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(sex region age high_qual) float(training_hrs wages training job_type) int _freq
    1 1 4 1 0         0 0 0 1
    1 1 4 1 0 3.2342455 0 0 1
    1 1 4 1 0  3.986507 0 0 1
    1 1 4 1 0  5.193341 0 0 1
    1 1 4 1 0  5.792361 0 0 1
    1 1 4 1 0  5.979761 0 0 1
    1 1 4 1 0  6.311906 0 0 1
    1 1 4 1 0  6.525496 0 0 1
    1 1 4 1 0  6.599126 0 0 1
    1 1 4 1 0  6.899724 0 0 1
    1 1 4 1 0  6.976388 0 0 1
    1 1 4 1 0  7.163255 0 0 1
    1 1 4 1 0  7.661569 0 0 1
    1 1 4 1 0  7.664548 0 0 1
    1 1 4 1 0  7.901599 0 0 1
    1 1 4 1 0  8.284412 0 0 1
    1 1 4 1 0  8.432996 0 0 1
    1 1 4 1 0  9.966268 0 0 1
    1 1 4 1 0  10.09832 0 0 1
    1 1 4 1 0 16.360012 0 0 1
    1 1 4 1 0  22.99908 0 0 1
    2 1 4 1 0  3.425857 0 0 1
    2 1 4 1 0 3.9366376 0 0 1
    2 1 4 1 0  4.599816 0 0 1
    2 1 4 1 0  6.342206 0 0 1
    2 1 4 1 0  6.612236 0 0 1
    2 1 4 1 0  7.007532 0 0 1
    2 1 4 1 0  8.023076 0 0 1
    2 1 4 1 0  8.333333 0 0 1
    2 1 4 1 0  8.491968 0 0 1
    2 1 4 1 0  8.624655 0 0 1
    2 1 4 1 0  9.199632 0 0 1
    2 1 4 1 0  9.380175 0 0 1
    2 1 4 1 0  9.400874 0 0 1
    2 1 4 1 0 11.234702 0 0 1
    2 1 4 1 0 11.461246 0 0 1
    2 1 4 1 0 11.710413 0 0 1
    2 1 4 1 0  11.93499 0 0 1
    2 1 4 1 0  11.95521 0 0 1
    2 1 4 1 0 12.457797 0 0 1
    2 1 4 1 0 13.080727 0 0 1
    2 1 4 1 0 14.566084 0 0 1
    1 1 5 1 0         0 0 0 3
    1 1 5 1 0  .7153225 0 0 1
    1 1 5 1 0   1.91659 0 0 1
    1 1 5 1 0  5.366452 0 0 1
    1 1 5 1 0  6.073195 0 0 1
    1 1 5 1 0  6.976357 0 0 1
    1 1 5 1 0  8.097592 0 0 1
    1 1 5 1 0  8.145508 0 0 1
    1 1 5 1 0  9.966191 0 0 1
    1 1 5 1 0 14.627415 0 0 1
    2 1 5 1 0  3.752082 0 0 1
    2 1 5 1 0  7.328183 0 0 1
    2 1 5 1 0  7.877185 0 0 1
    2 1 5 1 0  8.840271 0 0 1
    2 1 5 1 0  8.957537 0 0 1
    2 1 5 1 0  8.969642 0 0 1
    2 1 5 1 0 10.732904 0 0 1
    2 1 5 1 0 15.639375 0 0 1
    2 1 5 1 0 16.065336 0 0 1
    2 1 5 1 0 17.888147 0 0 1
    2 1 5 1 0 101.85307 0 0 2
    1 1 6 1 0  5.398363 0 0 1
    1 1 6 1 0  7.187212 0 0 1
    1 1 6 1 0  7.225736 0 0 1
    1 1 6 1 0  8.305198 0 0 1
    1 1 6 1 0  8.624655 0 0 1
    1 1 6 1 0  9.199632 0 0 1
    1 1 6 1 0  9.722339 0 0 1
    2 1 6 1 0  4.456072 0 0 1
    2 1 6 1 0  6.470288 0 0 1
    2 1 6 1 0  6.644243 0 0 1
    2 1 6 1 0  7.704692 0 0 1
    2 1 6 1 0  7.927679 0 0 1
    2 1 6 1 0  8.213957 0 0 1
    2 1 6 1 0   8.23367 0 0 1
    2 1 6 1 0  8.305223 0 0 2
    2 1 6 1 0  8.518178 0 0 1
    2 1 6 1 0  8.624655 0 0 1
    2 1 6 1 0  8.944087 0 0 1
    2 1 6 1 0  9.045794 0 0 1
    2 1 6 1 0  9.301748 0 0 1
    2 1 6 1 0   9.58295 0 0 1
    2 1 6 1 0   9.65487 0 0 1
    2 1 6 1 0  9.774609 0 0 1
    2 1 6 1 0  9.966268 0 0 1
    2 1 6 1 0 10.732904 0 0 1
    2 1 6 1 0 11.821527 0 0 1
    2 1 6 1 0  11.95952 0 0 1
    2 1 6 1 0 12.218262 0 0 1
    2 1 6 1 0  12.34217 0 0 1
    2 1 6 1 0 12.431935 0 0 1
    2 1 6 1 0  12.90881 0 0 1
    2 1 6 1 0 12.936982 0 0 1
    2 1 6 1 0  13.41613 0 0 1
    2 1 6 1 0 13.569457 0 0 1
    2 1 6 1 0  13.64612 0 0 1
    2 1 6 1 0 13.799448 0 0 1
    2 1 6 1 0 14.503965 0 0 1
    end
    label values sex b_sex
    label def b_sex 1 "male", modify
    label def b_sex 2 "female", modify
    label values region b_gor_dv
    label def b_gor_dv 1 "North East", modify
    label values age b_agegr13_dv
    label def b_agegr13_dv 4 "20-24 years old", modify
    label def b_agegr13_dv 5 "25-29 years old", modify
    label def b_agegr13_dv 6 "30-34 years old", modify
    label values high_qual b_hiqual_dv
    label def b_hiqual_dv 1 "Degree", modify
    label values job_type ptime_ftime
    label def ptime_ftime 0 "Part time", modify


  • #2
    The problem is with the way you have specified the interaction term. Your variable training_hrs is a continuous variable. In interaction terms, if you write X##Y, Stata assumes that X and Y are categorical variables, not continuous. So Stata is trying to treat training_hrs as a categorical variable, resulting in a huge number of "dummy" variables, one for each distinct value. That is also why the command runs for so long. Moreover, most of those values will not occur with both full-time and part-time jobs, so you are getting this large number of messages about empty or collinear terms. The solution is to use the c. prefix for that variable. And, although it is not necessary, I recommend you use the i. prefix for job_type. It's always clearer to be explicit rather than rely on default interpretations. Note that you also specified training_hrs by itself in the model. That is OK, but unnecessary and potentially a source of confusion, because the ## operator already includes the main effects. So rewrite the command as:

    Code:
    xtreg wages i.high_qual i.illness_disability i.sex i.children i.general_health i.region i.age i.sector i.job_type c.training_hrs##i.job_type, re vce(robust)
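
    As a minimal sketch of how the interaction could then be read, the code below fits the same kind of model on a small simulated panel and asks for the slope of training_hrs within each job type. The simulated data, the seed, and the coefficient values are assumptions for illustration only; only the variable names wages, training_hrs, and job_type come from this thread, and the other covariates are left out to keep it short.

    ```stata
    * Simulated stand-in panel: 500 people, 3 waves each (illustrative values only)
    clear
    set seed 12345
    set obs 500
    generate id = _n
    * person-level random effect
    generate u = rnormal(0, 2)
    expand 3
    bysort id: generate wave = _n
    * 1 = full time, 0 = part time
    generate byte job_type = runiform() < 0.6
    * continuous training hours between 0 and 40
    generate training_hrs = 40 * runiform()
    generate wages = 10 + 2*job_type + 0.02*training_hrs + 0.03*job_type*training_hrs + u + rnormal(0, 3)
    xtset id wave

    * c. marks training_hrs as continuous; ## adds both main effects and the interaction
    xtreg wages c.training_hrs##i.job_type, re vce(robust)

    * Slope of wages on training_hrs, separately for part-time and full-time
    margins job_type, dydx(training_hrs)

    * Test whether the two slopes differ (this is the interaction coefficient)
    test 1.job_type#c.training_hrs
    ```

    With the real data, the same margins and test commands should work directly after the xtreg command above, since they only refer to training_hrs and job_type.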



    • #3
      Oh right! Thank you Clyde Schechter, it worked well! But do you think that the approach below will also be feasible for my purposes (where high training is training_hrs >21hrs and low training is below 21hrs)?

      Code:
      xtreg wages i.high_qual training_hrs training_level##job_type i.illness_disability i.sex i.children i.general_health i.region i.age i.sector i.job_type, re vce(robust)
      
      Random-effects GLS regression                   Number of obs     =     81,014
      Group variable: id                              Number of groups  =     45,174
      
      R-squared:                                      Obs per group:
           Within  = 0.0352                                         min =          1
           Between = 0.2319                                         avg =        1.8
           Overall = 0.2210                                         max =          4
      
                                                      Wald chi2(46)     =   12401.44
      corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
      
                                                            (Std. err. adjusted for 45,174 clusters in id)
      ----------------------------------------------------------------------------------------------------
                                         |               Robust
                                   wages | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -----------------------------------+----------------------------------------------------------------
                               high_qual |
                    Other higher degree  |  -2.328943   .1287285   -18.09   0.000    -2.581246    -2.07664
                            A-level etc  |  -2.861423   .1151489   -24.85   0.000     -3.08711   -2.635735
                               GCSE etc  |  -3.798948   .1139943   -33.33   0.000    -4.022373   -3.575523
                    Other qualification  |  -4.539611   .1462049   -31.05   0.000    -4.826168   -4.253055
                       No qualification  |   -5.50174   .1728681   -31.83   0.000    -5.840555   -5.162925
                                         |
                            training_hrs |   -.000675   .0003241    -2.08   0.037    -.0013102   -.0000397
                                         |
                          training_level |
                                   High  |    .535811   .1900772     2.82   0.005     .1632665    .9083555
                                         |
                                job_type |
                              Full time  |  -2.862712   .1152421   -24.84   0.000    -3.088583   -2.636842
                                         |
                 training_level#job_type |
                         High#Full time  |   .2860871   .1987586     1.44   0.150    -.1034727    .6756468
                                         |
                      illness_disability |
                                     no  |    .207743   .0701464     2.96   0.003     .0702585    .3452275
                                         |
                                     sex |
                                 female  |  -2.271416   .0897984   -25.29   0.000    -2.447418   -2.095414
                                         |
                                children |
                                      1  |  -.8475629   .1000312    -8.47   0.000    -1.043621   -.6515053
                                      2  |  -1.102295   .1331827    -8.28   0.000    -1.363328   -.8412614
                                      3  |  -2.145631   .1992736   -10.77   0.000      -2.5362   -1.755062
                                      4  |  -2.992585   .3185437    -9.39   0.000     -3.61692   -2.368251
                                      5  |  -3.012423   .7865182    -3.83   0.000     -4.55397   -1.470876
                                      6  |  -5.169554   1.542942    -3.35   0.001    -8.193664   -2.145444
                                         |
                          general_health |
                              very good  |  -.3796525   .0738245    -5.14   0.000    -.5243459   -.2349591
                               or Poor?  |  -.9176918   .1817519    -5.05   0.000    -1.273919   -.5614647
                                         |
                                  region |
                             North West  |   .3039916   .1792045     1.70   0.090    -.0472428     .655226
               Yorkshire and the Humber  |  -.2867893   .1812841    -1.58   0.114    -.6420997    .0685211
                          East Midlands  |   .0635786   .1902627     0.33   0.738    -.3093294    .4364866
                          West Midlands  |   .5081194   .1968847     2.58   0.010     .1222325    .8940063
                        East of England  |   .9545191   .1930645     4.94   0.000     .5761197    1.332919
                                 London  |   1.414972   .1889339     7.49   0.000     1.044669    1.785276
                             South East  |   1.512466   .1893075     7.99   0.000      1.14143    1.883502
                             South West  |  -.0486637   .1930369    -0.25   0.801     -.427009    .3296817
                                  Wales  |  -.2450324   .1865062    -1.31   0.189    -.6105778    .1205131
                               Scotland  |   .5794995   .1790084     3.24   0.001     .2286495    .9303494
                       Northern Ireland  |  -.0599725   .1805486    -0.33   0.740    -.4138413    .2938963
                                         |
                                     age |
                        18-19 years old  |   .8949354   .2158393     4.15   0.000      .471898    1.317973
                        20-24 years old  |   2.082983   .2184386     9.54   0.000     1.654851    2.511115
                        25-29 years old  |   3.579527   .2139906    16.73   0.000     3.160113     3.99894
                        30-34 years old  |   5.031552   .2137355    23.54   0.000     4.612638    5.450465
                        35-39 years old  |   5.966847   .2184446    27.32   0.000     5.538704    6.394991
                        40-44 years old  |   6.405754   .2149296    29.80   0.000       5.9845    6.827009
                        45-49 years old  |   6.525323   .2153492    30.30   0.000     6.103246    6.947399
                        50-54 years old  |   6.157413   .2132289    28.88   0.000     5.739492    6.575334
                        55-59 years old  |   5.825606   .2261325    25.76   0.000     5.382395    6.268818
                        60-64 years old  |   4.816733   .2503987    19.24   0.000      4.32596    5.307506
                      65 years or older  |   2.119762   .3149281     6.73   0.000     1.502515     2.73701
                                         |
                                  sector |
      managerial & technical occupation  |   -.411535     .22952    -1.79   0.073     -.861386     .038316
                     skilled non-manual  |  -3.377155   .2304428   -14.66   0.000    -3.828814   -2.925495
                         skilled manual  |  -6.791225    .236485   -28.72   0.000    -7.254727   -6.327723
              partly skilled occupation  |  -4.937016   .2350558   -21.00   0.000    -5.397717   -4.476315
                   unskilled occupation  |  -5.285932   .2588535   -20.42   0.000    -5.793275   -4.778588
                                         |
                                   _cons |   14.68235   .3437735    42.71   0.000     14.00857    15.35613
      -----------------------------------+----------------------------------------------------------------
                                 sigma_u |   6.479498
                                 sigma_e |  5.0558683
                                     rho |  .62156281   (fraction of variance due to u_i)
      ----------------------------------------------------------------------------------------------------
      Also, when I try to standardise the coefficients for the above equation, I get an error:

      Code:
        xtreg wages i.high_qual training_hrs training_level##job_type i.illness_disability i.sex i.children i.general_health i.region i.age i.sector i.job_type, re 
      > vce(robust) beta
      option beta not allowed
      r(198);
      Is there anything I can do to fix this? Thank you so much for your time!



      • #4
        I generally discourage creating dichotomies out of continuous variables. Unless there is something qualitatively different that happens at 21 hours, all you are doing is replacing information with noise. Is it really true that 1 hour of training is similar in its effect on wages to 21 hours of training but 22 hours of training is radically different? If so, then go ahead. If not, it's a really bad idea to do this.

        Standardizing is another bad idea. Don't do it here. All of your variables are either categories (where standardizing makes no sense at all!) or have natural units like hours and wages. If you try to standardize, all you will do is obfuscate your results so that nobody can understand them. Standardizing is only useful with continuous variables that have no natural units, or whose natural units are not understandable to the audience for your study. I cannot imagine you will be publishing or presenting these results to people who do not understand dollars (or euros, yen, pounds, yuan, rubles, whatever) and hours.
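
        One way to present results in natural units, sketched here under assumptions, is to report predicted wages at a few chosen values of training hours for each job type. The simulated panel, the seed, and the particular hour values (0, 10, 20, 40) are hypothetical choices for illustration; only the variable names come from this thread.

        ```stata
        * Simulated stand-in panel (illustrative values only, not the thread's data)
        clear
        set seed 2024
        set obs 500
        generate id = _n
        generate u = rnormal(0, 2)
        expand 3
        bysort id: generate wave = _n
        generate byte job_type = runiform() < 0.6
        generate training_hrs = 40 * runiform()
        generate wages = 10 + 2*job_type + 0.02*training_hrs + 0.03*job_type*training_hrs + u + rnormal(0, 3)
        xtset id wave
        xtreg wages c.training_hrs##i.job_type, re vce(robust)

        * Predicted wages (in wage units) at chosen training hours, by job type
        margins job_type, at(training_hrs=(0 10 20 40))

        * Optional plot of those predictions with confidence intervals
        marginsplot
        ```

        The output is in the same units as wages and hours, so no standardization is needed to interpret it.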



        • #5
          Clyde Schechter Thank you very much, I will follow your advice! You have helped me a lot and I really appreciate it.
