  • Comparing coefficients across subgroups

    It's common to fit the same model to different subgroups and then make inferences about whether coefficients are similar across subgroups. What's the best way to do this in Stata? Here's an example that regresses price on mpg, separately for foreign and domestic cars:
    Code:
    sysuse auto, clear
    reg price mpg if foreign
    reg price mpg if !foreign
    How can I now test if the coefficient of mpg is similar for foreign and domestic cars?

    I'm aware, by the way, that I can accomplish something similar by estimating an interaction between foreign and mpg, like this:
    Code:
    reg price c.mpg##i.foreign
    But I'd like to know if Stata has a convenient way to compare the foreign and domestic regressions without an interaction.
    Last edited by paulvonhippel; 29 Sep 2024, 08:31.

  • #2
    The -sureg- and -suest- commands would be your best bet for simplicity.

    Code:
    sysuse auto, clear
    reg price mpg if foreign
    est sto M1a
    reg price mpg if !foreign
    est sto M1b
    
    suest M1a M1b
    The interaction approach does not allow the residual error variances to differ across groups, so the two are not quite the same, as you alluded to. The other possibility is to put all of this into an SEM framework, in which case all manner of constraints can be tested.
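    After -suest-, the equality of slopes can be tested explicitly with -test-. If I have the naming convention right, -suest- builds equation names from the stored estimation names (here M1a_mean and M1b_mean), so the comparison would be:
    Code:
    test [M1a_mean]mpg = [M1b_mean]mpg
    A significant result indicates that the mpg slopes differ between foreign and domestic cars.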



    • #3
      Thanks! That's a great answer to the question I asked, but the question I should have asked involves -xtreg-, and unfortunately -suest- doesn't support -xtreg-. Here's an example where I try to ask whether the relationship between age and log wages, controlling for education, is the same inside and outside the South:
      Code:
      webuse nlswork, clear
      xtset idcode
      xtreg ln_w grade c.age if south, be
      est sto b_south
      xtreg ln_w grade c.age if !south, be
      est sto b_north
      suest b_south b_north
      Error: xtreg is not supported by suest



      • #4
        What is the reluctance to use interactions? The convenience command is -suest-, but it is only available for estimators that support it. If your estimator is the between estimator, the model is a cross-sectional regression on group means, which can be estimated using -regress-.

        Code:
        webuse nlswork, clear
        xtset idcode
        xtreg ln_w grade c.age if south, be
        xtreg ln_w grade c.age if !south, be
        
        
        collapse ln_w grade age, by(south id)
        regress ln_w grade c.age if south
        regress ln_w grade c.age if !south

        Results:

        Code:
        . xtreg ln_w grade c.age if south, be
        
        Between regression (regression on group means)  Number of obs     =     11,675
        Group variable: idcode                          Number of groups  =      2,151
        
        R-squared:                                      Obs per group:
             Within  = 0.1054                                         min =          1
             Between = 0.3542                                         avg =        5.4
             Overall = 0.2504                                         max =         15
        
                                                        F(2,2148)         =     588.99
        sd(u_i + avg(e_i.)) = .3407131                  Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               grade |   .0840141   .0028702    29.27   0.000     .0783854    .0896428
                 age |   .0113972   .0013538     8.42   0.000     .0087422    .0140522
               _cons |   .1801492   .0452635     3.98   0.000     .0913843    .2689141
        ------------------------------------------------------------------------------
        
        . 
        . xtreg ln_w grade c.age if !south, be
        
        Between regression (regression on group means)  Number of obs     =     16,833
        Group variable: idcode                          Number of groups  =      3,101
        
        R-squared:                                      Obs per group:
             Within  = 0.0974                                         min =          1
             Between = 0.2745                                         avg =        5.4
             Overall = 0.2078                                         max =         15
        
                                                        F(2,3098)         =     586.18
        sd(u_i + avg(e_i.)) = .3662916                  Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               grade |   .0787267   .0028531    27.59   0.000     .0731325     .084321
                 age |   .0127707    .001221    10.46   0.000     .0103766    .0151649
               _cons |   .3284773   .0425135     7.73   0.000     .2451199    .4118348
        ------------------------------------------------------------------------------
        
        . 
        . 
        . 
        . 
        . 
        . collapse ln_w grade age, by(south id)
        
        . 
        . regress ln_w grade c.age if south
        
              Source |       SS           df       MS      Number of obs   =     2,153
        -------------+----------------------------------   F(2, 2150)      =    588.37
               Model |  136.735823         2  68.3679116   Prob > F        =    0.0000
            Residual |  249.827047     2,150  .116198626   R-squared       =    0.3537
        -------------+----------------------------------   Adj R-squared   =    0.3531
               Total |   386.56287     2,152  .179629586   Root MSE        =    .34088
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               grade |   .0840105   .0028711    29.26   0.000       .07838    .0896409
                 age |   .0113903   .0013535     8.42   0.000      .008736    .0140447
               _cons |   .1805665   .0452719     3.99   0.000     .0917851    .2693479
        ------------------------------------------------------------------------------
        
        . 
        . regress ln_w grade c.age if !south
        
              Source |       SS           df       MS      Number of obs   =     3,101
        -------------+----------------------------------   F(2, 3098)      =    586.14
               Model |  157.290459         2  78.6452297   Prob > F        =    0.0000
            Residual |  415.675854     3,098   .13417555   R-squared       =    0.2745
        -------------+----------------------------------   Adj R-squared   =    0.2741
               Total |  572.966313     3,100  .184827843   Root MSE        =     .3663
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               grade |   .0787344   .0028532    27.60   0.000     .0731401    .0843288
                 age |   .0127645   .0012211    10.45   0.000     .0103703    .0151587
               _cons |     .32857   .0425144     7.73   0.000     .2452107    .4119293
        ------------------------------------------------------------------------------
        
        .



        • #5
          I typically use interactions, but some people prefer a different approach and I wanted to see how it was done. From what I've seen so far, it looks more complicated, at least in Stata, than just tossing an interaction into the model.



          • #6
            I suppose that if the subgroups can be regarded as independent, then one could compare coefficients across subgroups using, essentially, a t-test. -suest- applies a kind of sandwich estimator to approximate the covariance matrix, which you would not have, but it could be a reasonable first approximation.
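            As a minimal sketch of that idea (my own illustration, ignoring any covariance between the two sets of estimates), one can compute a z-type statistic by hand from the stored coefficients and standard errors:
            Code:
            sysuse auto, clear
            reg price mpg if foreign
            local b1  = _b[mpg]
            local se1 = _se[mpg]
            reg price mpg if !foreign
            local z = (`b1' - _b[mpg]) / sqrt(`se1'^2 + _se[mpg]^2)
            di "z = " `z' "  p = " 2*normal(-abs(`z'))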



            • #7
              Yes. If the covariance between the regression estimates is small enough, it can be neglected. But how does -suest- estimate the covariance if you run the regressions separately, as you did?
              Code:
              sysuse auto, clear
              reg price mpg if foreign
              est sto M1a
              reg price mpg if !foreign
              est sto M1b  
              suest M1a M1b
              Last edited by paulvonhippel; 29 Sep 2024, 15:19.



              • #8
                The details are in the manual for -suest-, under Methods and formulas. The essence seems to be to convert the data to a stacked dataset and re-estimate the models using a cluster-robust sandwich estimator.
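                If I am reading that right, then for simple OLS models -suest- should be essentially equivalent to the fully interacted regression with a robust VCE, up to small-sample adjustments. A quick check (my sketch, not from the manual):
                Code:
                sysuse auto, clear
                reg price c.mpg##i.foreign, vce(robust)
                test 1.foreign#c.mpg = 0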



                • #9
                  What about writing a short program to feed to -bootstrap- or -permute-? As a sketch, I'm thinking of something like this:
                  Code:
                  prog comp, rclass
                  xtreg .... if south == 1
                  local b = _b[something]
                  xtreg ... if south != 1
                  return scalar diffsomething = _b[something] - `b'
                  end
                  //
                  bootstrap diff = r(diffsomething), reps(1000): comp
                  I haven't tried something like this, so there may be issues with how -bootstrap- works with -if- qualifiers, or perhaps some logical difficulty with what I'm suggesting, but this approach should avoid any problems with which estimation commands -suest- supports.
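                  For what it's worth, here is one way the sketch might be filled in for the -nlswork- example (untested; the variable choices are mine). With panel data, the bootstrap should resample whole panels, which the cluster() and idcluster() options arrange, and the program must -xtset- the rebuilt id:
                  Code:
                  webuse nlswork, clear
                  gen long newid = idcode
                  capture program drop comp
                  program define comp, rclass
                      xtset newid
                      xtreg ln_w grade age if south == 1, be
                      local b = _b[age]
                      xtreg ln_w grade age if south != 1, be
                      return scalar diffage = _b[age] - `b'
                  end
                  bootstrap diff = r(diffage), reps(1000) cluster(idcode) idcluster(newid): comp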



                  • #10
                    Thanks Mike. Yes, I think the bootstrap should work!

                    A couple of things strike me about this conversation:
                    • It seems harder, at least in Stata, to compare coefficients across groups than it is to run a group-by-treatment interaction. You can add a group-by-treatment interaction to just about any model, but comparing coefficients seems to require custom coding for some models.
                    • I haven't found an article that compares the two approaches. They don't make exactly the same assumptions, but I suspect they yield similar results most of the time. It surprises me if no one has looked into this.
                    By the way, my question concerns linear models. With nonlinear models like logistic regression, comparing coefficients across groups is much trickier, no matter how it is done.

                    Best,
                    Paul



                    • #11
                      Paul--Post back with some results if you go ahead with this approach. I'd be curious to hear whether it produces sensible results for you.



                      • #12
                        Originally posted by paulvonhippel View Post
                        Thanks Mike. Yes, I think the bootstrap should work!

                        A couple of things strike me about this conversation:
                        • It seems harder, at least in Stata, to compare coefficients across groups than it is to run a group-by-treatment interaction. You can add a group-by-treatment interaction to just about any model, but comparing coefficients seems to require custom coding for some models.
                        • I haven't found an article that compares the two approaches. They don't make exactly the same assumptions, but I suspect they yield similar results most of the time. It surprises me if no one has looked into this.

                        By the way, my question concerns linear models. With nonlinear models like logistic regression, comparing coefficients across groups is much trickier, no matter how it is done.

                        Best,
                        Paul
                        I would be surprised if this hasn't been investigated in the SEM literature, though it probably goes by different names like model or group invariance.



                        • #13
                          Originally posted by Leonardo Guizzetti View Post
                          The -sureg- and -suest- would be your best bet for simplicity.

                          Code:
                          sysuse auto, clear
                          reg price mpg if foreign
                          est sto M1a
                          reg price mpg if !foreign
                          est sto M1b
                          
                          suest M1a M1b
                          The interaction approach fails to allow residual error variances to differ, so not quite the same, as you alluded to. The other possibility is to put all of this into an SEM framework, in which case, all manner of constraints can be tested.


                          Interesting! In what scenarios would the -suest- approach be theoretically superior to the interaction approach? Can one test this before running the analysis to decide whether to use interactions or -suest-?
                          Best wishes

                          (Stata 16.1 MP)



                          • #14
                            Leonardo Guizzetti was right that my original interaction model assumed equal residual variances across the two groups:
                            Code:
                            sysuse auto, clear
                            reg price c.mpg##i.foreign

                            However, that assumption can be relaxed by using heteroskedasticity-consistent standard errors, like this:

                            Code:
                            sysuse auto, clear
                            reg price c.mpg##i.foreign, vce(hc3)
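                            With the model parameterized this way, the coefficient on 1.foreign#c.mpg is exactly the foreign-minus-domestic difference in the mpg slope, so testing it against zero is the comparison of interest:
                            Code:
                            test 1.foreign#c.mpg = 0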




                            • #15
                              Leonardo Guizzetti
                              I'm not sure I understand how suest solves the problem. Where in the output does suest test whether the coefficients of the two regressions are equal?
                              Code:
                               sysuse auto, clear
                               reg price mpg if foreign
                               est sto M1a
                               reg price mpg if !foreign
                               est sto M1b
                               suest M1a M1b

