  • Comparing coefficients across subgroups

    It's common to fit the same model to different subgroups and then make inferences about whether coefficients are similar across subgroups. What's the best way to do this in Stata? Here's an example that regresses price on mpg, separately for foreign and domestic cars:
    Code:
    sysuse auto, clear
    reg price mpg if foreign
    reg price mpg if !foreign
    How can I now test if the coefficient of mpg is similar for foreign and domestic cars?

    I'm aware, by the way, that I can accomplish something similar by estimating an interaction between foreign and mpg, like this:
    Code:
    reg price c.mpg##i.foreign
    But I'd like to know if Stata has a convenient way to compare the foreign and domestic regressions without an interaction.
    Last edited by paulvonhippel; 29 Sep 2024, 08:31.

  • #2
    The -sureg- and -suest- commands would be your best bet for simplicity.

    Code:
    sysuse auto, clear
    reg price mpg if foreign
    est sto M1a
    reg price mpg if !foreign
    est sto M1b
    
    suest M1a M1b
    The interaction approach does not allow the residual error variances to differ across groups, so the two are not quite the same, as you alluded to. The other possibility is to put all of this into an SEM framework, in which case all manner of constraints can be tested.
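    After -suest-, the equality of slopes can be tested explicitly with -test-. If I have the naming convention right, -suest- builds equation names from the stored estimation names (here M1a_mean and M1b_mean), so the comparison would be:
    Code:
    test [M1a_mean]mpg = [M1b_mean]mpg
    A significant result indicates that the mpg slopes differ between foreign and domestic cars.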



    • #3
      Thanks! That's a great answer to the question I asked, but the question I should have asked involves -xtreg-, and unfortunately -suest- doesn't support -xtreg-. Here's an example where I try to ask whether the relationship between age and log wages, controlling for education, is the same inside and outside the South:
      Code:
      webuse nlswork, clear
      xtset idcode
      xtreg ln_w grade c.age if south, be
      est sto b_south
      xtreg ln_w grade c.age if !south, be
      est sto b_north
      suest b_south b_north
      Error: xtreg is not supported by suest



      • #4
        What is the reluctance to use interactions? The convenience command is -suest-, but it is only available for estimators that support it. If your estimator is the between estimator, the model is a cross-sectional regression on group means, which can be estimated using -regress-.

        Code:
        webuse nlswork, clear
        xtset idcode
        xtreg ln_w grade c.age if south, be
        xtreg ln_w grade c.age if !south, be
        
        
        collapse ln_w grade age, by(south id)
        regress ln_w grade c.age if south
        regress ln_w grade c.age if !south

        Results:

        Code:
        . xtreg ln_w grade c.age if south, be
        
        Between regression (regression on group means)  Number of obs     =     11,675
        Group variable: idcode                          Number of groups  =      2,151
        
        R-squared:                                      Obs per group:
             Within  = 0.1054                                         min =          1
             Between = 0.3542                                         avg =        5.4
             Overall = 0.2504                                         max =         15
        
                                                        F(2,2148)         =     588.99
        sd(u_i + avg(e_i.)) = .3407131                  Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               grade |   .0840141   .0028702    29.27   0.000     .0783854    .0896428
                 age |   .0113972   .0013538     8.42   0.000     .0087422    .0140522
               _cons |   .1801492   .0452635     3.98   0.000     .0913843    .2689141
        ------------------------------------------------------------------------------
        
        . 
        . xtreg ln_w grade c.age if !south, be
        
        Between regression (regression on group means)  Number of obs     =     16,833
        Group variable: idcode                          Number of groups  =      3,101
        
        R-squared:                                      Obs per group:
             Within  = 0.0974                                         min =          1
             Between = 0.2745                                         avg =        5.4
             Overall = 0.2078                                         max =         15
        
                                                        F(2,3098)         =     586.18
        sd(u_i + avg(e_i.)) = .3662916                  Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               grade |   .0787267   .0028531    27.59   0.000     .0731325     .084321
                 age |   .0127707    .001221    10.46   0.000     .0103766    .0151649
               _cons |   .3284773   .0425135     7.73   0.000     .2451199    .4118348
        ------------------------------------------------------------------------------
        
        . 
        . 
        . 
        . 
        . 
        . collapse ln_w grade age, by(south id)
        
        . 
        . regress ln_w grade c.age if south
        
              Source |       SS           df       MS      Number of obs   =     2,153
        -------------+----------------------------------   F(2, 2150)      =    588.37
               Model |  136.735823         2  68.3679116   Prob > F        =    0.0000
            Residual |  249.827047     2,150  .116198626   R-squared       =    0.3537
        -------------+----------------------------------   Adj R-squared   =    0.3531
               Total |   386.56287     2,152  .179629586   Root MSE        =    .34088
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               grade |   .0840105   .0028711    29.26   0.000       .07838    .0896409
                 age |   .0113903   .0013535     8.42   0.000      .008736    .0140447
               _cons |   .1805665   .0452719     3.99   0.000     .0917851    .2693479
        ------------------------------------------------------------------------------
        
        . 
        . regress ln_w grade c.age if !south
        
              Source |       SS           df       MS      Number of obs   =     3,101
        -------------+----------------------------------   F(2, 3098)      =    586.14
               Model |  157.290459         2  78.6452297   Prob > F        =    0.0000
            Residual |  415.675854     3,098   .13417555   R-squared       =    0.2745
        -------------+----------------------------------   Adj R-squared   =    0.2741
               Total |  572.966313     3,100  .184827843   Root MSE        =     .3663
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               grade |   .0787344   .0028532    27.60   0.000     .0731401    .0843288
                 age |   .0127645   .0012211    10.45   0.000     .0103703    .0151587
               _cons |     .32857   .0425144     7.73   0.000     .2452107    .4119293
        ------------------------------------------------------------------------------
        
        .



        • #5
          I typically use interactions, but some people prefer a different approach and I wanted to see how it was done. From what I've seen so far, it looks more complicated, at least in Stata, than just tossing an interaction into the model.



          • #6
            I suppose that if the subgroups can be regarded as independent, then one could compare coefficients across subgroups using, essentially, a t-test. -suest- applies a kind of sandwich estimator to approximate the covariance matrix, which you would not have, but it could be a reasonable first approximation.
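            As a minimal sketch of that idea (my own illustration, ignoring any covariance between the two sets of estimates), one can compute a z-type statistic by hand from the stored coefficients and standard errors:
            Code:
            sysuse auto, clear
            reg price mpg if foreign
            local b1  = _b[mpg]
            local se1 = _se[mpg]
            reg price mpg if !foreign
            local z = (`b1' - _b[mpg]) / sqrt(`se1'^2 + _se[mpg]^2)
            di "z = " `z' "  p = " 2*normal(-abs(`z'))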



            • #7
              Yes. If the covariance between the regression estimates is small enough, it can be neglected. But how does -suest- estimate the covariance if you run the regressions separately, as you did?
              Code:
              sysuse auto, clear
              reg price mpg if foreign
              est sto M1a
              reg price mpg if !foreign
              est sto M1b  
              suest M1a M1b
              Last edited by paulvonhippel; 29 Sep 2024, 15:19.



              • #8
                The details are in the manual for -suest-, under Methods and formulas. The essence seems to be to convert the data to a stacked dataset and re-estimate the models using a cluster-robust sandwich estimator.
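                If I am reading that right, then for simple OLS models -suest- should be essentially equivalent to the fully interacted regression with a robust VCE, up to small-sample adjustments. A quick check (my sketch, not from the manual):
                Code:
                sysuse auto, clear
                reg price c.mpg##i.foreign, vce(robust)
                test 1.foreign#c.mpg = 0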



                • #9
                  What about writing a short program to feed to -bootstrap- or -permute-? As a sketch, I'm thinking of something like this:
                  Code:
                  prog comp, rclass
                  xtreg .... if south == 1
                  local b = _b[something]
                  xtreg ... if south != 1
                  return scalar diffsomething = _b[something] - `b'
                  end
                  //
                  bootstrap diff = r(diffsomething), reps(1000): comp
                  I haven't tried something like this, so there may be issues with how -bootstrap- works with -if- qualifiers, or perhaps some logical difficulty with what I'm suggesting, but this approach should avoid any problems with which estimation commands -suest- supports.
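                  For what it's worth, here is one way the sketch might be filled in for the -nlswork- example (untested; the variable choices are mine). With panel data, the bootstrap should resample whole panels, which the cluster() and idcluster() options arrange, and the program must -xtset- the rebuilt id:
                  Code:
                  webuse nlswork, clear
                  gen long newid = idcode
                  capture program drop comp
                  program define comp, rclass
                      xtset newid
                      xtreg ln_w grade age if south == 1, be
                      local b = _b[age]
                      xtreg ln_w grade age if south != 1, be
                      return scalar diffage = _b[age] - `b'
                  end
                  bootstrap diff = r(diffage), reps(1000) cluster(idcode) idcluster(newid): comp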



                  • #10
                    Thanks Mike. Yes, I think the bootstrap should work!

                    A couple of things strike me about this conversation:
                    • It seems harder, at least in Stata, to compare coefficients across groups than it is to run a group-by-treatment interaction. You can add a group-by-treatment interaction to just about any model, but comparing coefficients seems to require custom coding for some models.
                    • I haven't found an article that compares the two approaches. They don't make exactly the same assumptions, but I suspect they yield similar results most of the time. It surprises me if no one has looked into this.
                    By the way, my question concerns linear models. With nonlinear models like logistic regression, comparing coefficients across groups is much trickier, no matter how it is done.

                    Best,
                    Paul



                    • #11
                      Paul--Post back with some results if you go ahead with this approach. I'd be curious to hear whether it produces sensible results for you.



                      • #12
                        Originally posted by paulvonhippel View Post
                        Thanks Mike. Yes, I think the bootstrap should work!

                        A couple of things strike me about this conversation:
                        • It seems harder, at least in Stata, to compare coefficients across groups than it is to run a group-by-treatment interaction. You can add a group-by-treatment interaction to just about any model, but comparing coefficients seems to require custom coding for some models.
                        • I haven't found an article that compares the two approaches. They don't make exactly the same assumptions, but I suspect they yield similar results most of the time. It surprises me if no one has looked into this.

                        By the way, my question concerns linear models. With nonlinear models like logistic regression, comparing coefficients across groups is much trickier, no matter how it is done.

                        Best,
                        Paul
                        I would be surprised if this hasn't been investigated in the SEM literature, though it probably goes by different names like model or group invariance.



                        • #13
                          Originally posted by Leonardo Guizzetti View Post
                          The -sureg- and -suest- would be your best bet for simplicity.

                          Code:
                          sysuse auto, clear
                          reg price mpg if foreign
                          est sto M1a
                          reg price mpg if !foreign
                          est sto M1b
                          
                          suest M1a M1b
                          The interaction approach fails to allow residual error variances to differ, so not quite the same, as you alluded to. The other possibility is to put all of this into an SEM framework, in which case, all manner of constraints can be tested.


                          Interesting! In what scenarios would the -suest- approach be theoretically superior to the interaction approach? Can one test this before running the analysis to decide whether to use interactions or -suest-?
                          Best wishes

                          (Stata 16.1 MP)



                          • #14
                            Leonardo Guizzetti was right that my original interaction model assumed equal residual variances across the two groups:
                            Code:
                            sysuse auto, clear
                            reg price c.mpg##i.foreign

                            However, that assumption can be relaxed by using heteroskedasticity-consistent standard errors, like this:

                            Code:
                            sysuse auto, clear
                            reg price c.mpg##i.foreign, vce(hc3)
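                            With the model parameterized this way, the coefficient on 1.foreign#c.mpg is exactly the foreign-minus-domestic difference in the mpg slope, so testing it against zero is the comparison of interest:
                            Code:
                            test 1.foreign#c.mpg = 0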




                            • #15
                              Leonardo Guizzetti
                              I'm not sure I understand how suest solves the problem. Where in the output does suest test whether the coefficients of the two regressions are equal?
                              Code:
                               sysuse auto, clear
                               reg price mpg if foreign
                               est sto M1a
                               reg price mpg if !foreign
                               est sto M1b
                               suest M1a M1b

