
  • Difference in specifying base level when including (#) vs. (##) interactions

    I'm estimating a model in which multiple variables are interacted many times. To spot mistakes more easily and to avoid output cluttered with the repeated variables that Stata omits, I would like to write out "a b a#b" instead of "a##b". However, I am having trouble setting base levels consistently.

    In the following example code (this isn't the actual regression I'm running), the second regression in both attempt 1 and attempt 2 omits race==3 instead of race==1. The only way I could get the ## and # regressions to match was to manually recode my preferred base level so that it is larger than all other values of race (attempt 3).

    I imagine I'm missing something obvious. Thank you!

    clear
    input float(y educ race age prg gdr)
    0 1 1 2 1 0
    0 1 2 3.2 1 0
    0 3 2 7 1 0
    0 2 2 6 0 0
    1 2 2 1 0 1
    1 2 2 45 0 1
    1 1 2 2 0 0
    1 1 2 1 0 0
    0 3 2 3 1 0
    0 3 2 2 1 1
    0 1 1 1 0 1
    0 2 1 43 0 1
    1 2 1 2 0 0
    1 2 1 1 1 0
    1 3 1 3 0 0
    1 1 1 2 0 1
    0 1 3 1 0 1
    0 2 3 43 0 1
    1 2 3 2 1 0
    1 2 3 1 0 0
    1 3 3 3 1 0
    1 1 3 2 0 1
    1 1 1 2 0 1
    0 1 3 1 0 1
    end

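    * attempt 1: set the base level of race with fvset
    fvset base 1 race
    reg y race##i1.prg
    reg y i.race prg race#i1.prg

    * attempt 2: set the base level inline with ib1.
    reg y ib1.race##i1.prg
    reg y ib1.race prg ib1.race#i1.prg

    * attempt 3: recode the preferred base level so that it is the largest value of race
    replace race = 6 if race==1
    fvset base 6 race
    reg y race##i1.prg
    reg y i.race prg race#i1.prg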

  • #2
    Krista, I don't understand why you chose #; personally I prefer ## to #, as shown below: both the code and the output are concise and well organized.

    Code:
    reg y b1.race##prg
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
              2  |   .1333333   .2794395     0.48   0.639    -.4537472    .7204139
              3  |  -.2666667   .2794395    -0.95   0.353    -.8537472    .3204139
                 |
           1.prg |  -.1666667   .3767961    -0.44   0.664    -.9582859    .6249526
                 |
        race#prg |
            2 1  |  -.6333333   .4876563    -1.30   0.210    -1.657861    .3911945
            3 1  |   .7666667   .5394899     1.42   0.172    -.3667596    1.900093
                 |
           _cons |   .6666667   .1883981     3.54   0.002      .270857    1.062476
    ------------------------------------------------------------------------------
    If you'd like to separate the three terms, it is important to tell Stata consistently, everywhere a variable appears, whether "race" or "prg" is a factor variable or a continuous variable. In the first command below, "prg" on its own is treated as a continuous variable, whereas inside "race#prg" it is treated as a factor variable. One solution (the second command) is to add "i." to the standalone "prg" so that "prg" is always a factor variable. Another solution (the third command) is to add "c." to "prg" in "race#prg" so that "prg" is always a continuous variable. The second solution is only technically correct; conceptually, the first solution is preferred because "prg" is in fact a discrete factor variable.

    Code:
    reg y i.race prg b1.race#prg
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
              2  |   .1333333   .2794395     0.48   0.639    -.4537472    .7204139
              3  |  -.2666667   .2794395    -0.95   0.353    -.8537472    .3204139
                 |
             prg |         .6   .3861011     1.55   0.138    -.2111684    1.411168
                 |
        race#prg |
            1 1  |  -.7666667   .5394899    -1.42   0.172    -1.900093    .3667596
            2 1  |       -1.4   .4948812    -2.83   0.011    -2.439707   -.3602932
            3 1  |          0  (omitted)
                 |
           _cons |   .6666667   .1883981     3.54   0.002      .270857    1.062476
    ------------------------------------------------------------------------------
    
    reg y i.race i.prg b1.race#prg
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
              2  |   .1333333   .2794395     0.48   0.639    -.4537472    .7204139
              3  |  -.2666667   .2794395    -0.95   0.353    -.8537472    .3204139
                 |
           1.prg |  -.1666667   .3767961    -0.44   0.664    -.9582859    .6249526
                 |
        race#prg |
            2 1  |  -.6333333   .4876563    -1.30   0.210    -1.657861    .3911945
            3 1  |   .7666667   .5394899     1.42   0.172    -.3667596    1.900093
                 |
           _cons |   .6666667   .1883981     3.54   0.002      .270857    1.062476
    ------------------------------------------------------------------------------
    
    reg y i.race prg b1.race#c.prg
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
              2  |   .1333333   .2794395     0.48   0.639    -.4537472    .7204139
              3  |  -.2666667   .2794395    -0.95   0.353    -.8537472    .3204139
                 |
             prg |  -.1666667   .3767961    -0.44   0.664    -.9582859    .6249526
                 |
      race#c.prg |
              2  |  -.6333333   .4876563    -1.30   0.210    -1.657861    .3911945
              3  |   .7666667   .5394899     1.42   0.172    -.3667596    1.900093
                 |
           _cons |   .6666667   .1883981     3.54   0.002      .270857    1.062476
    ------------------------------------------------------------------------------
    You may also set the base group with -fvset base- at the beginning, but, again, the regression commands need to be written as follows for the base-group setting to take effect.

    Code:
    reg y i.race i.prg race#prg
    reg y i.race prg race#c.prg
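
    One way to confirm that a written-out specification matches its ## counterpart is to store both fits and tabulate them side by side. This is only a sketch: it assumes the example data from post #1 are in memory with race still coded 1/2/3 (that is, before attempt 3), and the names "full" and "split" are arbitrary labels chosen for illustration.

    ```stata
    * assumes the post #1 data are loaded and race is still coded 1/2/3
    fvset base 1 race

    * compact form: ## expands to both main effects plus the interaction
    quietly regress y race##prg
    estimates store full

    * written-out form, with prg marked as a factor everywhere it appears
    quietly regress y i.race i.prg race#prg
    estimates store split

    * coefficients and standard errors side by side
    estimates table full split, b(%9.4f) se(%9.4f)
    ```

    If the two columns agree term by term, the written-out form reproduces the ## form.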


    • #3
      Thank you! This is very helpful. I agree that ## is better organized in this case. A better example of why I prefer # is the comparison between

      reg y race##c.age gdr##c.age
      reg y i.race i.gdr age race#c.age gdr#c.age
      in which the first regression omits the second occurrence of "age".


      • #4
        I see your point now. In this case, you may use the code below to avoid the omission issue.

        Code:
        reg y (race gdr)##c.age
        You may refer to "help fvvarlist" for more flexible syntax.
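
        A quick way to check that the parenthesized form fits the same model as the written-out form from post #3 is to compare the residual sum of squares and the model rank of the two regressions. This is a sketch under assumptions: the post #1 data are in memory with race coded 1/2/3, and the scalar names rss_compact and rank_compact are made up for illustration.

        ```stata
        * assumes the post #1 data are loaded (race coded 1/2/3)
        * fit the compact form and keep its residual sum of squares and rank
        quietly regress y (race gdr)##c.age
        scalar rss_compact  = e(rss)
        scalar rank_compact = e(rank)

        * fit the written-out form from post #3
        quietly regress y i.race i.gdr age race#c.age gdr#c.age

        * both checks pass silently when the two fits coincide
        assert reldif(e(rss), rss_compact) < 1e-8
        assert e(rank) == rank_compact
        ```

        The individual coefficients can still be labeled differently, because Stata may pick a different term to omit in each form; the check above only compares the overall fit.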
