  • Comparing regression estimates across different outcome models and testing whether there is heterogeneity

    Hi Statalisters,

    Suppose I am interested in examining two outcomes within my data set, and suppose these two outcomes represent related concepts. If I run two separate outcome models for a given set of explanatory variables, is there a way to compare whether the estimates are consistent across the models?

    For example, say I am interested in looking at length and weight as my two outcomes. I could run both models with the same set of explanatory variables and compare whether the estimates are in the same direction or of similar magnitude.

    Code:
    sysuse auto, clear
    
    /* dichotomize at median to generate binary outcome 1 */
    generate largeyn=1 if length>=192.5 & length!=.
    replace largeyn=0 if length<192.5
    tab largeyn, missing
    
    /* dichotomize at median to generate binary outcome 2 */
    generate heavyyn=1 if weight>=3190 & weight!=.
    replace heavyyn=0 if weight<3190
    tab heavyyn, missing
    
    /* manually compare estimates from each model */
    logistic largeyn trunk turn headroom
    logistic heavyyn trunk turn headroom
    
    
    . logistic largeyn trunk turn headroom
    
    Logistic regression                               Number of obs   =         74
                                                      LR chi2(3)      =      76.12
                                                      Prob > chi2     =     0.0000
    Log likelihood = -13.231694                       Pseudo R2       =     0.7420
    
    ------------------------------------------------------------------------------
         largeyn | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           trunk |   1.825171   .4418256     2.49   0.013     1.135665    2.933303
            turn |   1.956289   .3878218     3.38   0.001      1.32644    2.885215
        headroom |   1.163899   .9530395     0.19   0.853     .2338425    5.793053
           _cons |   6.51e-16   5.88e-15    -3.87   0.000     1.31e-23    3.23e-08
    ------------------------------------------------------------------------------
    
    . logistic heavyyn trunk turn headroom
    
    Logistic regression                               Number of obs   =         74
                                                      LR chi2(3)      =      71.06
                                                      Prob > chi2     =     0.0000
    Log likelihood = -15.762918                       Pseudo R2       =     0.6927
    
    ------------------------------------------------------------------------------
         heavyyn | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           trunk |   1.286904   .2192673     1.48   0.139     .9215427    1.797119
            turn |   2.036741   .3769406     3.84   0.000      1.41711    2.927306
        headroom |   1.690922   1.260841     0.70   0.481     .3921233    7.291629
           _cons |   4.56e-15   3.56e-14    -4.24   0.000     1.07e-21    1.95e-08
    ------------------------------------------------------------------------------
    
    /*
    COMPARE ESTIMATES ACROSS MODELS:
    for trunk: 1.83 versus 1.29
    for turn: 1.96 versus 2.04
    for headroom: 1.16 versus 1.69
    */
    However, this would be very subjective. Is there a way to generate a statistical test of heterogeneity similar to what a Q-statistic would give in a meta-analysis?
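
    One crude option would be to compute a Cochran-style Q by hand on the log-odds scale from the two models' coefficients and standard errors. Below is only a rough sketch: it treats the two sets of estimates as independent even though both models are fit on the same observations, so at best it is approximate.

    Code:
    /* crude Q-type heterogeneity check on the log-odds scale; this ignores
       the correlation between estimates fit on the same observations */
    quietly logistic largeyn trunk turn headroom
    estimates store m1
    quietly logistic heavyyn trunk turn headroom
    estimates store m2
    
    foreach v in trunk turn headroom {
        quietly estimates restore m1
        local b1 = _b[`v']
        local v1 = _se[`v']^2
        quietly estimates restore m2
        local b2 = _b[`v']
        local v2 = _se[`v']^2
        /* inverse-variance pooled log-odds and Cochran-style Q (1 df for 2 models) */
        local bp = (`b1'/`v1' + `b2'/`v2') / (1/`v1' + 1/`v2')
        local Q  = (`b1' - `bp')^2/`v1' + (`b2' - `bp')^2/`v2'
        display "`v': Q = " %6.3f `Q' "   p = " %6.4f chi2tail(1, `Q')
    }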

    Would it be possible to incorporate an interaction term in the models? For example, we could treat the two outcome models as "stratified" models from a pseudo-population that contains duplicate observations for each individual. Ignoring the fact that the observations are correlated, would something like the code below work?

    Code:
    sysuse auto, clear
    
    /* dichotomize at median to generate binary outcome 1 */
    generate largeyn=1 if length>=192.5 & length!=.
    replace largeyn=0 if length<192.5
    tab largeyn, missing
    
    /* dichotomize at median to generate binary outcome 2 */
    generate heavyyn=1 if weight>=3190 & weight!=.
    replace heavyyn=0 if weight<3190
    tab heavyyn, missing
    
    /* manually compare estimates from each model */
    logistic largeyn trunk turn headroom
    logistic heavyyn trunk turn headroom
    
    
    /* create pseudo dataset */
    
    preserve
    generate model=0
    generate outcome=largeyn
    count
    save pseudo1.dta, replace
    restore
    
    preserve
    generate model=1
    generate outcome=heavyyn
    count
    save pseudo2.dta, replace
    restore
    
    /* append the two datasets */
    
    clear
    use pseudo1.dta, clear
    append using pseudo2.dta
    count
    
    /* run stratified models using pseudo data; these should match the separate models above */
    logistic outcome trunk turn headroom if model==0
    logistic outcome trunk turn headroom if model==1
    
    /* instead of running stratified models, compare estimates using a statistical interaction test */
    logistic outcome c.trunk##i.model c.turn##i.model c.headroom##i.model
    
    . logistic outcome c.trunk##i.model c.turn##i.model c.headroom##i.model
    
    Logistic regression                               Number of obs   =        148
                                                      LR chi2(7)      =     147.18
                                                      Prob > chi2     =     0.0000
    Log likelihood = -28.994612                       Pseudo R2       =     0.7174
    
    ----------------------------------------------------------------------------------
             outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------+----------------------------------------------------------------
               trunk |   1.825171   .4418256     2.49   0.013     1.135665    2.933303
             1.model |   7.014033   83.71004     0.16   0.870     4.87e-10    1.01e+11
                     |
       model#c.trunk |
                  1  |   .7050868   .2087225    -1.18   0.238     .3946996    1.259559
                     |
                turn |   1.956289   .3878218     3.38   0.001      1.32644    2.885215
                     |
        model#c.turn |
                  1  |   1.041125   .2823573     0.15   0.882     .6118622    1.771545
                     |
            headroom |   1.163899   .9530395     0.19   0.853     .2338425    5.793053
                     |
    model#c.headroom |
                  1  |   1.452808   1.608938     0.34   0.736     .1657789    12.73171
                     |
               _cons |   6.51e-16   5.88e-15    -3.87   0.000     1.31e-23    3.23e-08
    ----------------------------------------------------------------------------------
    
    /* display stratum-specific estimates for largeyn outcome*/
    lincom _b[trunk]
    lincom _b[turn]
    lincom _b[headroom]
    
    . lincom _b[trunk]
    
     ( 1)  [outcome]trunk = 0
    
    ------------------------------------------------------------------------------
         outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   1.825171   .4418256     2.49   0.013     1.135665    2.933303
    ------------------------------------------------------------------------------
    
    . lincom _b[turn]
    
     ( 1)  [outcome]turn = 0
    
    ------------------------------------------------------------------------------
         outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   1.956289   .3878218     3.38   0.001      1.32644    2.885215
    ------------------------------------------------------------------------------
    
    . lincom _b[headroom]
    
     ( 1)  [outcome]headroom = 0
    
    ------------------------------------------------------------------------------
         outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   1.163899   .9530395     0.19   0.853     .2338425    5.793053
    ------------------------------------------------------------------------------
    
    /* display stratum-specific estimates for heavyyn outcome */
    lincom _b[trunk]+_b[1.model#c.trunk]
    lincom _b[turn]+_b[1.model#c.turn]
    lincom _b[headroom]+_b[1.model#c.headroom]
    
    . lincom _b[trunk]+_b[1.model#c.trunk]
    
     ( 1)  [outcome]trunk + [outcome]1.model#c.trunk = 0
    
    ------------------------------------------------------------------------------
         outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   1.286904   .2192673     1.48   0.139     .9215427    1.797119
    ------------------------------------------------------------------------------
    
    . lincom _b[turn]+_b[1.model#c.turn]
    
     ( 1)  [outcome]turn + [outcome]1.model#c.turn = 0
    
    ------------------------------------------------------------------------------
         outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   2.036741   .3769406     3.84   0.000      1.41711    2.927306
    ------------------------------------------------------------------------------
    
    . lincom _b[headroom]+_b[1.model#c.headroom]
    
     ( 1)  [outcome]headroom + [outcome]1.model#c.headroom = 0
    
    ------------------------------------------------------------------------------
         outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   1.690922   1.260841     0.70   0.481     .3921233    7.291629
    ------------------------------------------------------------------------------
    
    /*
    
    now, comparing the stratum-specific estimates calculated using the interaction term (same values as the stratified models):
    for trunk: 1.83 versus 1.29
    for turn: 1.96 versus 2.04
    for headroom: 1.16 versus 1.69
    
    but now we have a statistical test of heterogeneity (the p-value associated with the interaction term):
    trunk estimates not significantly different across outcome models (interaction p-value 0.238)
    turn estimates not significantly different across outcome models (interaction p-value 0.882)
    headroom estimates not significantly different across outcome models (interaction p-value 0.736)
    
    */
    
    /* remove data */
    rm pseudo1.dta
    rm pseudo2.dta
    Last edited by Jenny Williams; 13 Apr 2018, 20:38.

  • #2
    Originally posted by Jenny Williams:
    Suppose I am interested in examining two outcomes within my data set, and suppose these two outcomes represent related concepts. If I run two separate outcome models for a given set of explanatory variables, is there a way to compare whether the estimates are consistent across the models?
    If I follow you correctly, then you can get what you're after using gsem. See below (start at the Begin here comment).

    . version 15.1

    . 
    . clear *

    . 
    . quietly sysuse auto

    . 
    . quietly centile length

    . generate byte large = length > r(c_1)

    . 
    . quietly centile weight

    . generate byte heavy = weight > r(c_1)

    . 
    . *
    . * Begin here
    . *
    . 
    . // Wald test
    . gsem (large heavy <- c.(trunk turn headroom), logit), nodvheader nolog

    Generalized structural equation model           Number of obs     =         74
    Log likelihood = -28.994612

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    large        |
           trunk |   .6016738   .2420735     2.49   0.013     .1272184    1.076129
            turn |   .6710491   .1982437     3.38   0.001     .2824987      1.0596
        headroom |    .151776   .8188332     0.19   0.853    -1.453108    1.756659
           _cons |  -34.96852   9.041231    -3.87   0.000    -52.68901   -17.24803
    -------------+----------------------------------------------------------------
    heavy        |
           trunk |   .2522394   .1703835     1.48   0.139    -.0817062     .586185
            turn |   .7113509   .1850704     3.84   0.000     .3486196    1.074082
        headroom |    .525274   .7456529     0.70   0.481    -.9361789    1.986727
           _cons |   -33.0206   7.790509    -4.24   0.000    -48.28972   -17.75149
    ------------------------------------------------------------------------------

    . test [large]trunk = [heavy]trunk, notest

     ( 1)  [large]trunk - [heavy]trunk = 0

    . test [large]turn = [heavy]turn, notest accumulate

     ( 1)  [large]trunk - [heavy]trunk = 0
     ( 2)  [large]turn - [heavy]turn = 0

    . test [large]headroom = [heavy]headroom, accumulate

     ( 1)  [large]trunk - [heavy]trunk = 0
     ( 2)  [large]turn - [heavy]turn = 0
     ( 3)  [large]headroom - [heavy]headroom = 0

               chi2(  3) =    1.80
             Prob > chi2 =    0.6142

    . 
    . // Likelihood-ratio test
    . estimates store Unconstrained

    . 
    . constraint define 1 [large]trunk = [heavy]trunk

    . constraint define 2 [large]turn = [heavy]turn

    . constraint define 3 [large]headroom = [heavy]headroom

    . gsem (large heavy <- c.(trunk turn headroom), logit), ///
    >         constraints(1/3) nocnsreport nodvheader nolog

    Generalized structural equation model           Number of obs     =         74
    Log likelihood = -29.965072

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    large        |
           trunk |   .3993953    .136128     2.93   0.003     .1325894    .6662012
            turn |   .6713905    .128934     5.21   0.000     .4186844    .9240965
        headroom |   .3462579   .5404406     0.64   0.522    -.7129862    1.405502
           _cons |  -32.84459   5.596455    -5.87   0.000    -43.81344   -21.87574
    -------------+----------------------------------------------------------------
    heavy        |
           trunk |   .3993953    .136128     2.93   0.003     .1325894    .6662012
            turn |   .6713905    .128934     5.21   0.000     .4186844    .9240965
        headroom |   .3462579   .5404406     0.64   0.522    -.7129862    1.405502
           _cons |  -32.84459   5.596455    -5.87   0.000    -43.81344   -21.87574
    ------------------------------------------------------------------------------

    . lrtest Unconstrained

    Likelihood-ratio test                                 LR chi2(3)  =      1.94
    (Assumption: . nested in Unconstrained)               Prob > chi2 =    0.5848

    . 
    . exit

    end of do-file


    .

    • #3
      Professor Coveney, thank you very much for this wonderful answer. I also realized that I was doing a manual "suest" (without the clustering) by stacking the data.
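
      For my own reference, below is a rough sketch of what the stacked-data version with clustering might look like, using a made-up id variable for the original observation. I believe this should closely approximate what suest does, although I have not checked that the variances match exactly.

      Code:
      /* stacked data with SEs clustered on the original observation (rough
         sketch; id is a made-up identifier for each original record) */
      sysuse auto, clear
      generate long id = _n
      
      /* binary outcomes dichotomized at the medians, as in #1 (auto.dta has
         no missing length or weight, so the simple expressions are safe) */
      generate byte outcome1 = length >= 192.5
      generate byte outcome2 = weight >= 3190
      
      /* stack: one record per observation per outcome */
      reshape long outcome, i(id) j(model)
      
      /* fully interacted model as in #1, but clustering on id so the
         duplicated records are not treated as independent */
      logistic outcome i.model##c.(trunk turn headroom), vce(cluster id)
      testparm i.model#c.trunk i.model#c.turn i.model#c.headroom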

      • #4
        If you want to replicate the results that you get with suest by using gsem, then use the vce(robust) option. Run the do-file below and compare the results to prove to yourself that they are the same.

        You can also fit a seemingly unrelated bivariate probit regression model using biprobit. (You can fit this model, too, using gsem.)

        I'm not sure that I would go this route, but if you want to "stack" the data and test an interaction term, then you might want to look into something like the approach below with the xt (or corresponding me) commands.

        Code:
        version 15.1
        
        clear *
        
        quietly sysuse auto
        
        quietly centile length
        generate byte large = length > r(c_1)
        
        quietly centile weight
        generate byte heavy = weight > r(c_1)
        
        // -suest-
        quietly logit large c.(trunk turn headroom)
        estimates store L
        
        quietly logit heavy c.(trunk turn headroom)
        estimates store H
        
        suest L H
        test [L_large]trunk = [H_heavy]trunk, notest
        test [L_large]turn = [H_heavy]turn, notest accumulate
        test [L_large]headroom = [H_heavy]headroom, accumulate
        
        // cf.
        gsem (large heavy <- c.(trunk turn headroom), logit), vce(robust) nodvheader nolog
        test [large]trunk = [heavy]trunk, notest
        test [large]turn = [heavy]turn, notest accumulate
        test [large]headroom = [heavy]headroom, accumulate
        
        // fitting a bivariate-probit model
        biprobit large heavy c.(trunk turn headroom), nolog
        test [large]trunk = [heavy]trunk
        test [large]turn = [heavy]turn, accumulate
        test [large]headroom = [heavy]headroom, accumulate
        
        // stacking data, with interaction term
        encode make, generate(Make) label(Makes)
        keep large heavy trunk turn headroom Make
        rename large dep1
        rename heavy dep2
        quietly reshape long dep, i(Make) j(outcome)
        
        xtlogit dep i.outcome##c.(trunk turn headroom), i(Make) re nolog
        test 2.outcome#c.trunk 2.outcome#c.turn 2.outcome#c.headroom
        
        estimates store Full
        xtlogit dep i.outcome c.(trunk turn headroom), i(Make) re nolog
        lrtest Full
        
        // or
        xtprobit dep i.outcome##c.(trunk turn headroom), i(Make) re nolog
        test 2.outcome#c.trunk 2.outcome#c.turn 2.outcome#c.headroom
        
        exit

        • #5
          Thank you again Joseph for another excellent response. This summarizes what I have been researching over the past few days and is a handy reference. So now my question is this: if I am fitting a bivariate (or multivariate) logistic regression model, and I want to test whether the effects of my exposure variables (trunk, turn, headroom) differ across the outcome variables (large and heavy), will these models (in particular the SEM/suest simultaneous models above) let me test for those differences? I think the robust estimators should account for any correlation between the outcomes. But do I really need to bring in other methodological considerations, such as those in the Allison 1989 paper on unobserved heterogeneity, or compare the outcomes on the probability scale?
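
          For the last point, I was thinking of something like the code below for the probability scale: average marginal effects from each outcome's model fit separately (using the large and heavy variables defined in #4). This is only a descriptive comparison; it does not itself give a cross-outcome test.

          Code:
          /* average marginal effects on the probability scale, one model per
             outcome; descriptive comparison only, no formal cross-outcome test */
          quietly logit large c.(trunk turn headroom)
          margins, dydx(trunk turn headroom)
          
          quietly logit heavy c.(trunk turn headroom)
          margins, dydx(trunk turn headroom)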
