  • Using # versus ## in regression with interaction terms

    I am trying to understand how using one # is different from using two ##.
    Could someone please explain the difference between these two regressions:

    Code:
    xtreg incarcerationRate treatmentOn i.year i.stateId#c.year, fe cluster(city)
    Code:
    xtreg incarcerationRate treatmentOn i.year i.stateId##c.year, fe cluster(city)
    Thank you!

  • #2
    When you use ##, Stata will "expand" the right-hand side of the regression to include stateId, c.year, and their interaction. If you use just the single #, you get only the interaction term, without stateId and c.year appearing. Now, it is possible to algebraically transform either of these models into the other. But that transformation breaks down in certain situations, such as when interactions are used to represent quadratic or higher-power terms, and can lead to an invalid model. While you can, with enough insight and experience, work around these difficulties, the safer approach is to always include the constituent ("main") effects and all lower-order interactions when including an interaction term in a model. You will never go wrong that way, and -margins- will help you with the interpretation. So I recommend always using ## unless you have a compelling reason to omit one of the constituents and know how to properly work with the results.
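
    To see the expansion concretely, here is a minimal sketch using Stata's auto dataset rather than your data:

    Code:
    sysuse auto, clear
    * ## includes both main effects plus their interaction ...
    regress price i.foreign##c.weight
    * ... which fits the same model as writing the terms out:
    regress price i.foreign c.weight i.foreign#c.weight
    * # alone enters only the interaction term:
    regress price i.foreign#c.weight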

    A couple of asides on your model. You don't say how you -xtset- your data. But assuming we are operating in an environment where cities are nested within states, the use of cluster(city) may be problematic. You should probably be clustering on stateId instead, unless you don't have enough states in the data to support that. I recommend that because when you specify the variable you cluster on, you are implicitly promising that observations that differ at any higher level of the nesting hierarchy are independent, but you will have dependency among observations from the same state.
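
    In other words, a sketch of the change (assuming stateId identifies states and there are enough of them):

    Code:
    xtreg incarcerationRate treatmentOn i.year i.stateId##c.year, fe cluster(stateId)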

    The other aside is that the use of yearly shocks and a time trend in the same model, although legal, will make it impossible for you to use the -margins- command. The reason is that -margins- does not know how to deal with the same variable being used as both continuous and factor-variable in the same regression. If you try to run -margins- after either of those regressions, it will complain about that and give you no answers. You can work around that by creating a clone of the year variable and using one of them with i. and the other with c., as sketched below, but you have to be careful: that approach will give you incorrect results if you try to condition any of the -margins- results you want on year (or its clone), or to estimate a marginal effect of year (or its clone). So I would give some serious thought to whether you really need both yearly shocks and a linear time trend in your model. If you just use year-specific shocks, you will capture all of the same information in the model; you just won't be able to separate out an average linear trend from those shocks. So only include both if your research goals specifically require an estimate of the linear time trend component and you also need to explicitly separate the year-specific component of the noise from the rest.
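
    A sketch of that workaround, folding in the clustering change from the first aside (the clone name yearTrend is mine); remember not to condition -margins- on year or yearTrend:

    Code:
    clonevar yearTrend = year
    xtreg incarcerationRate treatmentOn i.year i.stateId#c.yearTrend, fe cluster(stateId)
    * -margins- now runs for the other covariates, e.g.:
    margins, dydx(treatmentOn)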
    Last edited by Clyde Schechter; 06 Apr 2022, 18:27.



    • #3
      Clyde,

      I am curious as to what you mean by
      a compelling reason to omit one of the constituents and know how to properly work with the results.
      I receive very different results from my model when I use # and ## to measure mass (the treatment) and enabler (a variable hypothesized to enhance the treatment). Both are dichotomous, as is the dependent variable, Victory.

      For ##,

      Code:
      logit Victory mass##enabler territory_new funding_new joint_new troop_presence_new other_new unknown_new ActorStrength MilitaryIntervention 
      
      margins mass enabler enabler#mass
      
      margins, dydx(mass) at(enabler = (0 1))
      I get these results:


      Code:
      Logistic regression                                     Number of obs =    674
                                                              LR chi2(11)   =  34.67
                                                              Prob > chi2   = 0.0003
      Log likelihood = -403.13236                             Pseudo R2     = 0.0412
      
      --------------------------------------------------------------------------------------
                   Victory | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      ---------------------+----------------------------------------------------------------
                    1.mass |   .4767732    .265971     1.79   0.073    -.0445203    .9980667
                 1.enabler |   .9194111   1.059304     0.87   0.385    -1.156787    2.995609
                           |
              mass#enabler |
                      1 1  |  -.1841672   1.146962    -0.16   0.872    -2.432172    2.063837
                           |
             territory_new |  -1.148464   .3092765    -3.71   0.000    -1.754634   -.5422928
               funding_new |  -.1606015   .2711115    -0.59   0.554    -.6919703    .3707674
                 joint_new |   .2150985   .2796229     0.77   0.442    -.3329523    .7631492
        troop_presence_new |   .6038305   .6944154     0.87   0.385    -.7571987     1.96486
                 other_new |  -.3278876   .5079454    -0.65   0.519    -1.323442    .6676671
               unknown_new |  -1.189378   .9147618    -1.30   0.194    -2.982278    .6035224
             ActorStrength |  -.0178336   .1069252    -0.17   0.868    -.2274031    .1917359
      MilitaryIntervention |   -1.08499   .5843243    -1.86   0.063    -2.230245    .0602646
                     _cons |   -.686115   .1488068    -4.61   0.000    -.9777711    -.394459
      --------------------------------------------------------------------------------------
      Code:
      Expression: Pr(Victory), predict()
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
              mass |
                0  |   .2961328   .0215521    13.74   0.000     .2538914    .3383742
                1  |   .3964986   .0500642     7.92   0.000     .2983745    .4946227
                   |
           enabler |
                0  |   .3090185   .0178961    17.27   0.000     .2739427    .3440942
                1  |   .5073895   .1943466     2.61   0.009     .1264773    .8883018
                   |
      enabler#mass |
              0 0  |   .2882189   .0202234    14.25   0.000     .2485818     .327856
              0 1  |   .3896768   .0511843     7.61   0.000     .2893574    .4899961
              1 0  |   .4920958   .2455064     2.00   0.045     .0109121    .9732795
              1 1  |   .5600116   .1074918     5.21   0.000     .3493315    .7706916
      ------------------------------------------------------------------------------
      Code:
      Expression: Pr(Victory), predict()
      dy/dx wrt:  1.mass
      1._at: enabler = 0
      2._at: enabler = 1
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      0.mass       |  (base outcome)
      -------------+----------------------------------------------------------------
      1.mass       |
               _at |
                1  |   .1014579   .0579847     1.75   0.080      -.01219    .2151057
                2  |   .0679158   .2619651     0.26   0.795    -.4455264     .581358
      ------------------------------------------------------------------------------


      For #,

      Code:
      logit Victory mass#enabler territory_new funding_new joint_new troop_presence_new other_new unknown_new ActorStrength MilitaryIntervention 
      
      margins mass enabler enabler#mass
      
      margins, dydx(mass) at(enabler = (0 1))
      These are the results:

      Code:
      Logistic regression                                     Number of obs =    674
                                                              LR chi2(11)   =  34.67
                                                              Prob > chi2   = 0.0003
      Log likelihood = -403.13236                             Pseudo R2     = 0.0412
      
      --------------------------------------------------------------------------------------
                   Victory | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      ---------------------+----------------------------------------------------------------
              mass#enabler |
                      0 1  |   .9194111   1.059304     0.87   0.385    -1.156787    2.995609
                      1 0  |   .4767732    .265971     1.79   0.073    -.0445203    .9980667
                      1 1  |   1.212017   .4961716     2.44   0.015     .2395387    2.184496
                           |
             territory_new |  -1.148464   .3092765    -3.71   0.000    -1.754634   -.5422928
               funding_new |  -.1606015   .2711115    -0.59   0.554    -.6919703    .3707674
                 joint_new |   .2150985   .2796229     0.77   0.442    -.3329523    .7631492
        troop_presence_new |   .6038305   .6944154     0.87   0.385    -.7571987     1.96486
                 other_new |  -.3278876   .5079454    -0.65   0.519    -1.323442    .6676671
               unknown_new |  -1.189378   .9147618    -1.30   0.194    -2.982278    .6035224
             ActorStrength |  -.0178336   .1069252    -0.17   0.868    -.2274031    .1917359
      MilitaryIntervention |   -1.08499   .5843243    -1.86   0.063    -2.230245    .0602646
                     _cons |   -.686115   .1488068    -4.61   0.000    -.9777711    -.394459
      --------------------------------------------------------------------------------------

      Code:
      Expression: Pr(Victory), predict()
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
              mass |
                0  |   .2961328   .0215521    13.74   0.000     .2538914    .3383742
                1  |   .3964986   .0500642     7.92   0.000     .2983745    .4946227
                   |
           enabler |
                0  |   .3090185   .0178961    17.27   0.000     .2739427    .3440942
                1  |   .5073895   .1943466     2.61   0.009     .1264773    .8883018
                   |
      enabler#mass |
              0 0  |   .2882189   .0202234    14.25   0.000     .2485818     .327856
              0 1  |   .3896768   .0511843     7.61   0.000     .2893574    .4899961
              1 0  |   .4920958   .2455064     2.00   0.045     .0109121    .9732795
              1 1  |   .5600116   .1074918     5.21   0.000     .3493315    .7706916
      ------------------------------------------------------------------------------
      Code:
      Expression: Pr(Victory), predict()
      dy/dx wrt:  1.mass
      1._at: enabler = 0
      2._at: enabler = 1
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      0.mass       |  (base outcome)
      -------------+----------------------------------------------------------------
      1.mass       |
               _at |
                1  |   .1014579   .0579847     1.75   0.080      -.01219    .2151057
                2  |   .0679158   .2619651     0.26   0.795    -.4455264     .581358
      ------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.

      The coefficient on the interaction term points in opposite directions in the two models, with large differences in statistical significance, though I have read your convincing arguments in other posts that significance does not actually mean much. However, predicted margins and marginal effects remain similar. What do you think is the reason for these very different results in the logit model, and is either model more valid than the other? You mention that # models may run into trouble when dealing with quadratics or higher-order terms, but admittedly I am not sure whether this logit model fits that case.



      • #4
        The -margins- outputs are not merely similar: they are identical. Notice also that, apart from the mass, enabler, and mass#enabler results, the coefficients are the same in both regressions. That is because the two logistic regressions are just different ways of parameterizing the same model.

        The logistic regression outputs are algebraic transforms of each other. The coefficients that are labeled "mass#enabler" in the two models are, in fact, very different things and should not be compared to each other. In the # model, each mass#enabler coefficient represents the difference between the log odds of Victory in the designated combination of mass and enabler and the log odds of Victory in the omitted combination where mass = 0 and enabler = 0; the log odds of Victory at mass = 0 and enabler = 0 is seen instead in the constant term. In the ## model it works differently: the log odds of Victory at mass = m and enabler = e, relative to mass = 0 and enabler = 0, is given by the sum of the coefficients m.mass + e.enabler + m.mass#e.enabler (where 0.mass and 0.enabler, the omitted categories, are interpreted as 0). For example, looking at the ## model, if you add up .4767732, .9194111, and -.1841672, you get 1.212017, which is, indeed, the coefficient of 1.mass#1.enabler in the # model. All the other combinations work that way as well.
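
        That sum, with a standard error and confidence interval attached, can be obtained directly from the ## model with -lincom- (a sketch reusing the variable names from #3):

        Code:
        * after the ## model: logit Victory mass##enabler ...
        lincom 1.mass + 1.enabler + 1.mass#1.enabler
        * reproduces the # model's 1 1 mass#enabler coefficient, 1.212017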

        If your research question is whether or not mass and enabler interact, then the ## model is easier to use because the 1.mass#1.enabler coefficient by itself expresses the quantitative interaction effect. But if that is not part of the question and you are just estimating the relative effects of different combinations of mass and enabler, then those are given more transparently in the # model. Of course, either model can serve both purposes if you do the appropriate algebra--but that is tedious and error-prone. So pick the model that gives the direct answer to your question. If you are asking both questions, there is no harm in using both, applying each to its appropriate question.
        Last edited by Clyde Schechter; 12 Apr 2023, 09:21.



        • #5
          Thank you for your response. It definitely cleared up the difference. However, I would like to clarify my conceptualization of my variables.

          My layman's understanding of the interaction in the ## model is that the negative coefficient of 1.mass#1.enabler means it underperforms compared to the sum of its parts (mass and enabler). However, I do not conceptually expect enabler to modify the relative effect of mass to that degree. Whereas mass is a tangible object (like equipment), enabler captures the intangible human qualities (like being well trained) that should improve the effectiveness of mass to some degree when the two are combined, as indicated by the log odds of Victory. If my understanding is correct, the # model better captures this concept.



          • #6
            If my understanding is correct, the # model better captures this concept.
            No. The two models are simply different ways of looking at the same thing. Everything that is captured in either is captured in the other; it is just expressed differently, and you need to know how to translate from one to the other. The "underperformance" is easier to see in the ## model, but it is just as present in the # model, if you know where to look.

            Looking at the # model, the "sum of the parts" is .9194111 + .4767732 = 1.3961843. However, the joint effect is the 1 1 mass#enabler coefficient, 1.212017, which is, indeed, an "underperformance." In fact, it is an underperformance by exactly 0.1841673, which is, behold!, precisely the amount by which the ## model's 1.mass#1.enabler coefficient says it is an underperformance. (OK, they differ in the 7th decimal place--that is due to rounding error.) # and ## say the same thing about it.
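
            If you want Stata to do that translation for you, with a standard error and confidence interval attached, -lincom- after the # model will do it (a sketch reusing the names from #3):

            Code:
            * after the # model: logit Victory mass#enabler ...
            lincom 1.mass#1.enabler - 1.mass#0.enabler - 0.mass#1.enabler
            * reproduces the ## model's 1.mass#1.enabler coefficient, -.1841672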

            These results do not support your expectation that mass and enabler would be synergistic. That said, I would also note that the confidence interval (## model) around this estimate of the underperformance is very wide. Not only does it cross zero, it extends far into positive territory and also far into negative territory (-2.432172 to 2.063837). The appropriate conclusion here is that these data and this model are simply uninformative about the interaction between mass and enabler, even as to whether it is synergy or interference, and whether it is large or small. This is a textbook case of an inconclusive study, at least as far as the interaction between mass and enabler is concerned.
            Last edited by Clyde Schechter; 12 Apr 2023, 22:44.
