  • Comparing coefficients in all-terms-interacted model

    When the coefficient of a variable in Group A is significant, but the same coefficient in Group B is not, can they be compared? If not, when interpreting the result in Group A, is the reference group for this variable its own reference group? I am asking because the interaction effect is incremental to the main effects in a model with one interaction term. Thank you.

  • #2
    The difference between statistically significant and not statistically significant is, itself, not statistically significant.

    The observation that one thing is statistically significant and the other is not is about as uninformative an observation as you can ask for because statistical significance confounds so many different things into a single muddy statistic.

    If you have carried out two separate analyses, one on group A and one on group B, it may or may not be possible to put them together. Some kinds of analyses can be combined using the -suest- command, but many types of analyses are not compatible with -suest-.

    An approach that is more generally applicable is to carry out a single analysis on the combined groups and include a group#variable interaction term. If your goal is to determine whether the difference in coefficients between groups A and B is statistically significant, the output for the coefficient of group#variable settles the question. This approach also enables you to use the -margins- command to estimate the predicted outcomes in both groups (at various levels of the variable) and to calculate both group-specific and averaged marginal effects of model variables. Another advantage of this approach is that you have the freedom to decide which variables you want to constrain to be equal across groups, and which you wish to vary freely and examine. For all of these reasons, I usually lean towards this approach.

    Added: I want to emphasize that for the model to be correct, it must include the group term, the other variable term, and the interaction. All three must be present (although one or more is sometimes dropped by Stata due to collinearity with other variables in the model.) So the model must look like -regress y i.group i.x i.group#i.x-, or, more simply, -regress y i.group##i.x-. But -regress y i.group#i.x- is just a mis-specified model whose results are uninterpretable (unless other variables in the model are collinear with group and x, causing Stata to drop them). If the latter is what you mean by an "all terms interacted" model, then you have meaningless coefficients and there is nothing to compare.

    If not, when interpreting the result in Group A, is the reference group for this variable its own reference group?
    This question doesn't make any sense to me. Perhaps if you showed the code and the results you are referring to, it might be possible to find an interpretation for it. Look, if you have done an analysis on group A alone, it does not contain a group variable in the model. Even if you try to put a group variable into that model, with every observation in group A, it will be collinear with the constant and it will automatically be omitted. So there is no reference group in a model estimated on just one group. It takes two groups to support a group variable and a reference group.

    Or perhaps you are talking about the combined estimation with interaction term. So, if we have just two groups, A and B, there will be a single variable that indicates which observations fall into which group. For the sake of concreteness, let's say that this variable is 0 for group A observations and 1 for group B observations. Important: there is just one group variable for two groups. If there are n groups, there are n-1 variables. The number of variables is always one less than the number of groups. The reference group is the group for which no indicator variable takes on the value one; that is, the group for which all of the group indicators are zero. In the groups A and B instance here, group A is the reference group. And it is important to remember that you do not have separate variables for groups A and B. You have a single variable that distinguishes two groups by virtue of its two values.
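    To see this coding concretely, here is a small Python sketch (illustrative only, not part of the thread; the group labels and the helper function are made up) of how n groups are represented by n-1 indicator variables, with the reference group being the one whose indicators are all zero:

```python
def dummy_code(value, levels):
    """Return the n-1 indicator values for `value`; levels[0] is the reference."""
    return [1 if value == level else 0 for level in levels[1:]]

# Two groups, A and B: a single indicator distinguishes them.
print(dummy_code("A", ["A", "B"]))  # [0]  <- reference group: all indicators zero
print(dummy_code("B", ["A", "B"]))  # [1]

# Four groups need three indicators; the reference ("A") still gets all zeros.
print(dummy_code("C", ["A", "B", "C", "D"]))  # [0, 1, 0]
```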

    I am asking because the interaction effect is incremental to the main effects in a model with one interaction term.
    Again, I don't understand what, if anything, you're asking here. A model that interacts two dichotomous variables, call them X1 and X2, and, for concreteness, assume that each of them is coded 0/1, is interpreted as follows:

    The model contains terms X1, X2, and X1#X2. (More conveniently, you can code all three of these as just X1##X2)

    The coefficient of X1 represents the effect of X1 conditional on X2 = 0.
    The coefficient of X2 represents the effect of X2 conditional on X1 = 0.

    The effect of X1 conditional on X2 = 1 is given by the coefficient of X1 + the coefficient of X1#X2.
    The effect of X2 conditional on X1 = 1 is given by the coefficient of X2 + the coefficient of X1#X2.

    These latter effects can be calculated directly using the -lincom- command, or, I think it is easier to just get everything all at once with -margins X1#X2, dydx(X1 X2)-. (If there are other variables in the model, and the model is non-linear, this approach has the advantage of calculating the marginal effects adjusted for the other variables, which -lincom- cannot do by itself.)

    Added: Also, if variable X1, say, is interacted not only with X2, but also with another variable X3, then using -lincom- to calculate marginal effects of X1 gets complicated and requires more terms. The -lincom- approach quickly gets confusing, and even when you understand it clearly, the expressions involved are long and difficult to type without introducing errors. So the -margins- approach is far superior here because it can be used with minimal effort and spares the confusion and errors.
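    The coefficient arithmetic above can be checked with a short Python sketch (the coefficient values here are invented for illustration; with real estimates, this sum is what -lincom- computes):

```python
# Hypothetical coefficients from y = b0 + b1*X1 + b2*X2 + b12*X1*X2 (values invented).
b1, b2, b12 = 0.40, 0.25, 0.15

effect_x1_when_x2_is_0 = b1        # just the main-effect coefficient of X1
effect_x1_when_x2_is_1 = b1 + b12  # main effect plus interaction
effect_x2_when_x1_is_1 = b2 + b12  # likewise for X2

print(round(effect_x1_when_x2_is_1, 2))  # 0.55
```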

    Added: If you are not familiar with the -margins- command, you owe it to yourself to learn about it now. I think the best place to start is with the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It covers the basics, and it has worked examples that include interaction models. From there you can go to the -margins- chapter of the PDF manuals that are part of your Stata installation for some of the more advanced (and not often used) features.

    Last edited by Clyde Schechter; 01 Mar 2018, 00:48.



    • #3
      Thank you very much Dr. Schechter for your reply.
      The difference between statistically significant and not statistically significant is, itself, not statistically significant.
      I understand now. It is not meaningful to compare two coefficients if one or both are not significant.

      The all-terms-interacted model refers to what I learned from this thread https://www.statalist.org/forums/for...dent-variables
      The syntax of a simplified version of my all-terms-interacted model is:
      Code:
      xtlogit i.health (i.frequency i.sex c.income)#i.sex
      Is this correct?
      For the variable frequency, I have four categories, with category 1 being the reference group. For men, category 2 is significant with an odds ratio of 1.35. When I interpret it, can I say "Compared with men's reference group, men in category 2 are 1.35 times more likely to experience health issues."?

      For the rest of your reply, I will need to spend some more time to understand it. But thanks for the advice of using marginal effects. I will read further on it.



      • #4
        I understand now. It is not meaningful to compare two coefficients if one or both are not significant.
        No, that is not at all what I mean. It is not meaningful to draw any conclusions from the fact that one coefficient is statistically significant and the other is not.

        Whether coefficients are statistically significant or not has no bearing at all on how one might compare them. The problem of comparing coefficients is a difficult one and depends heavily on the particular context and the meaning of the variables. It is sometimes meaningful to do and sometimes not, but statistical significance has nothing at all to do with it.

        Code:
        xtlogit i.health (i.frequency i.sex c.income)#i.sex
        is probably a mis-specified model. You shouldn't do anything with its results. It is missing the main effects of frequency, sex, and income. Unless those variables are constant within your -xtset- groups, they have to be there. Even if they are constant within your -xtset- groups, for clarity about what you are doing, it is better to have them there and let Stata omit them. People make mistakes about this sort of thing, or sometimes have incorrect expectations about what is going on in their data. Stata always gets it right, and if the results are surprising, then it calls attention to the problems in the data. Anyway, the fix is easy. Just replace # with ## (which, by the way, is the same point made in the thread you linked to.)



        • #5
          Thank you again Dr. Schechter for your reply. I tried
          Code:
          xtlogit i.health (i.frequency i.sex c.income)##i.sex
          , however, the regression output only shows the main effects and the interaction terms for females. I also tried the model
          Code:
          xtlogit i.health i.frequency i.sex c.income i.frequency#i.sex
          , and the results likewise show only the main effects and the female interaction terms. I wonder what the problem could be.





          • #6
            Please show your output. I do not understand your description of what you are getting.



            • #7
              The output is something like this: (Because I am using confidential data, having results released takes a while and they may not be released.)
              No participation 1.00
              (0.00)
              Every day -0.5
              Once a week 1.5

              men 1.00
              (0.00)
              women 2.1

              no participation#female 1.00
              (0.00)
              every day#female 1.1
              once a week#female 2.0



              • #8
                The results shown in #7 are too incomplete to provide the information needed to answer your questions. I would need to see the complete output of -xtlogit-, not passed through any -estout- or -esttab- or -outreg-, and unabridged.

                Also the model that has #, not ##, is invalid and you should stop trying to do that.

                That said, I will speculate on what you mean when you say you only have estimates of females. Your interaction term outputs are labeled in what you show as no participation#female, every day #female, and once a week#female. It sounds to me like you are expecting to also see outputs labeled every day#male, etc. But that is not how interaction terms work, nor how indicator variables work.

                Let's take a simple example using the built-in auto.dta
                Code:
                . sysuse auto, clear
                (1978 Automobile Data)
                
                . regress mpg i.foreign
                
                      Source |       SS           df       MS      Number of obs   =        74
                -------------+----------------------------------   F(1, 72)        =     13.18
                       Model |  378.153515         1  378.153515   Prob > F        =    0.0005
                    Residual |  2065.30594        72  28.6848048   R-squared       =    0.1548
                -------------+----------------------------------   Adj R-squared   =    0.1430
                       Total |  2443.45946        73  33.4720474   Root MSE        =    5.3558
                
                ------------------------------------------------------------------------------
                         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     foreign |
                    Foreign  |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
                       _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
                ------------------------------------------------------------------------------
                
                . tab foreign
                
                   Car type |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                   Domestic |         52       70.27       70.27
                    Foreign |         22       29.73      100.00
                ------------+-----------------------------------
                      Total |         74      100.00
                From the -tab foreign- output we see that foreign is a dichotomous variable with values labeled Domestic and Foreign. But in the regression output there is only a Foreign line, no Domestic line. That's how it's supposed to be. Domestic is the reference category, and its "coefficient" is constrained to be zero. If we complicated things by adding an interaction term:

                Code:
                 
                . regress mpg i.foreign##c.weight
                
                      Source |       SS           df       MS      Number of obs   =        74
                -------------+----------------------------------   F(3, 70)        =     51.99
                       Model |  1686.54824         3  562.182746   Prob > F        =    0.0000
                    Residual |  756.911221        70  10.8130174   R-squared       =    0.6902
                -------------+----------------------------------   Adj R-squared   =    0.6770
                       Total |  2443.45946        73  33.4720474   Root MSE        =    3.2883
                
                ----------------------------------------------------------------------------------
                             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -----------------+----------------------------------------------------------------
                         foreign |
                        Foreign  |   9.271333   4.500409     2.06   0.043     .2955505    18.24711
                          weight |  -.0059751   .0006622    -9.02   0.000    -.0072958   -.0046544
                                 |
                foreign#c.weight |
                        Foreign  |  -.0044509   .0017846    -2.49   0.015    -.0080101   -.0008916
                                 |
                           _cons |   39.64696   2.243364    17.67   0.000     35.17272    44.12121
                ----------------------------------------------------------------------------------
                the same thing happens with the interaction: the reference category is omitted.

                So you are probably wondering, how will you find out what's going on in the reference category (male in your case). That is where the -margins- command comes in.

                Code:
                . margins foreign, at(weight = (2000 3000 4000))
                
                Adjusted predictions                            Number of obs     =         74
                Model VCE    : OLS
                
                Expression   : Linear prediction, predict()
                
                1._at        : weight          =        2000
                
                2._at        : weight          =        3000
                
                3._at        : weight          =        4000
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                 _at#foreign |
                 1#Domestic  |    27.6968   .9841848    28.14   0.000      25.7339    29.65969
                  1#Foreign  |   28.06638   .8749718    32.08   0.000     26.32131    29.81146
                 2#Domestic  |   21.72171   .5020333    43.27   0.000     20.72044    22.72299
                  2#Foreign  |   17.64042   1.332931    13.23   0.000     14.98198    20.29887
                 3#Domestic  |   15.74663   .6422001    24.52   0.000      14.4658    17.02746
                  3#Foreign  |   7.214467   2.877568     2.51   0.014     1.475338     12.9536
                ------------------------------------------------------------------------------
                As you can see, the -margins- command gives you results for both the domestic and foreign predicted values.

                Similar things will happen if you follow your -xtlogit- command with corresponding -margins- commands. Now, your margin involves a number of different variables, and I don't know what your research questions are, so I can't advise you what specific -margins- command to use: it depends on what predicted outcomes or marginal effects you need.

                Recommended reading: the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It is a very clear explanation of how the -margins- command works and has several worked examples, including some interaction models.



                • #9
                  Thanks again for the detailed explanation. I think
                  Code:
                  xtlogit i.health i.frequency i.sex c.income i.frequency#i.sex
                  is correct, although I am only using one #.

                  I will read Dr. Williams' article on margins.



                  • #10
                    Yes, that one's OK because in addition to i.frequency#i.sex it also has i.frequency and i.sex. (i.frequency##i.sex is a "shorthand" for the combination of the three. And you will get identical results either way.)



                    • #11
                      That said, I will speculate on what you mean when you say you only have estimates of females. Your interaction term outputs are labeled in what you show as no participation#female, every day #female, and once a week#female. It sounds to me like you are expecting to also see outputs labeled every day#male, etc.
                      So when I interpret the results, is the reference category for "no participation#female" "no participation#male", and for "every day#female" the reference is "every day#male", etc.?
                      Thanks.



                      • #12
                        So when I interpret the results, is the reference category for "no participation#female" "no participation#male", and for "every day#female" the reference is "every day#male", etc.?
                        Something like that.

                        I don't really understand the "output" you show in #7, as it is unlabeled. But it suggests that the reference category for frequency is no participation, and that for sex is men. In that case, the reference category for the participation#sex interaction would be no participation#men. But, actually you don't really need to worry about what the reference category for the interaction is, because you won't need to add interaction terms to each other. Interaction coefficients get added to main effect coefficients. And if you use the -margins- command as I have suggested, you don't have to give any thought to what gets added to what because Stata will figure it all out for you.



                        • #13
                          I tried
                          Code:
                           margins, dydx(frequency) at(sex=(0 1))
                          ,
                          the result shows something like this:
                           dydx w.r.t.: 1.frequency 2.frequency 3.frequency
                           1#male; 1#female
                           2#male; 2#female
                           3#male; 3#female

                          However, I have four categories in my frequency variable, with 0 indicating no participation. So now I lost one category for both men and women. What I get is a comparison between men and women in the other three categories. Is that correct?

                          I also tried
                          Code:
                          margins i.sex#i.frequency
                          , I wonder what the differences between these two commands are.
                          Thanks.



                          • #14
                            When you are taking the marginal effects (which is what the dydx() option gives you), those are differences between outcomes. You have four categories of frequency, of which one is the reference (or base) category. So there are only three other categories that are different from that. What you are getting is the expected difference in outcome of 1.frequency vs 0.frequency in men and in women, then 2.frequency vs 0.frequency in men and women, and 3.frequency vs 0.frequency in men and women. There is no marginal effect at 0.frequency because it is the base category, with reference to which all the other marginal effects are calculated.

                            Code:
                            margins i.sex#i.frequency
                            does not give you marginal effects. It gives you the expected outcome levels in all 8 combinations of sex and frequency.

                            If you have not already done so, please read https://www3.nd.edu/~rwilliam/stats/Margins01.pdf, the excellent Richard Williams' crystal clear explanation of the basics of the -margins- command. If you have done so, please read it again, more carefully this time. If you do not understand what marginal effects and predictive margins are and how they differ from each other, then you need to consult a basic statistics or econometrics textbook.
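                            To make the distinction concrete, here is a small Python sketch (not Stata, and not part of the thread; the coefficients are invented, and the model is collapsed to a single binary frequency contrast for brevity). Predictive margins correspond to predicted probabilities per cell, while dydx() marginal effects are differences between those probabilities:

```python
import math

# Invented logit coefficients for a toy model (not the poster's estimates):
#   logit(P(health)) = b0 + b_freq*freq + b_sex*female + b_int*freq*female
b0, b_freq, b_sex, b_int = -1.0, 0.5, 0.3, 0.2

def inv_logit(z):
    """Convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def prob(freq, female):
    return inv_logit(b0 + b_freq * freq + b_sex * female + b_int * freq * female)

# Predictive margins (the analogue of -margins sex#frequency-): one level per cell.
for female in (0, 1):
    for freq in (0, 1):
        print(f"female={female} freq={freq}: P = {prob(freq, female):.3f}")

# Marginal effects (the analogue of -margins, dydx(frequency) at(sex=(0 1))-):
# the change in probability relative to the base category, within each sex.
me_men = prob(1, 0) - prob(0, 0)
me_women = prob(1, 1) - prob(0, 1)
print(f"dydx(frequency) for men:   {me_men:.3f}")
print(f"dydx(frequency) for women: {me_women:.3f}")
```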



                            • #15
                              I have read Dr. Williams' article and I think I understand what marginal effects are. They are similar to regression coefficients if the dependent variable is continuous.
                              Do you mind if I confirm with you my interpretation of marginal effects coefficients in an -xtlogit- model? Suppose the model is
                              Code:
                              xtlogit health frequency##sex
                              and the marginal effects code is
                              Code:
                               margins, dydx(frequency) at(sex=(0 1))
                              Suppose the results are:
                               ------------------------------------------------------------------------------
                                            |            Delta-method
                                            |      dy/dx      P>|z|
                               -------------+----------------------------------------------------------------
                                1.frequency |
                                        _at |
                                       male |       0.02      0.000
                                     female |       0.03      0.000
                               -------------+----------------------------------------------------------------
                                2.frequency |
                                        _at |
                                       male |       0.05      0.234
                                     female |       0.10      0.000
                               -------------+----------------------------------------------------------------

                              My interpretation is: Compared to those who do not participate, the probability for male participants with frequency 1 to suffer from a health problem is 2 percentage points higher; whereas for women with this frequency it is 3 percentage points higher. For female participants with frequency 2, the probability is 10 percentage points higher, yet for male participants with this frequency, the result is not significant.

                              Thank you.
                              Last edited by Meng Yu; 09 Mar 2018, 00:43.

