
  • Adding an interaction term to a model or stratifying the data: which method is preferable for analysing interactions?

    Hi Statalist,

    I hope this post finds you well. Why does stratification seem to be less preferable than adding an interaction term to the model directly? Is it because the p-values derived from each subgroup become less meaningful once the data are stratified, since statistical power is reduced by the smaller sample size in each stratum? If so, am I right to say that the interaction term is preferable to stratification because it retains the full sample size? For example, suppose there is an interaction between A and B on C. In a regression model that included the interaction term, the interaction between A and B3 on C was statistically significant. But when I looked at the association between A and C after stratifying the data into three B groups, B1 (n=100), B2 (n=75) and B3 (n=45), the association between A and C within B3 was no longer significant. Why is this so? Is it because of the smaller sample size?
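    As a rough illustration of the sample-size point, the sketch below (in Python rather than Stata, with a made-up effect size and spread; EFFECT and SD_PER_OBS are hypothetical) holds the underlying effect fixed and lets only n change. Because the standard error of an estimate shrinks roughly like 1/sqrt(n), the same effect can reach p < 0.05 with 220 observations but not with 45. This is only part of the story: in a model with an A×B interaction, the B3-specific effect of A is still estimated mainly from the B3 observations, and the interaction term tests whether the B3 effect differs from the B1 effect, while the stratified model tests whether the B3 effect differs from zero.

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value of a Wald z-test of estimate == 0 (normal approximation)."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical values: one fixed true effect on the log-odds scale and a fixed
# per-observation spread, so the standard error shrinks roughly like 1/sqrt(n).
EFFECT = 0.25
SD_PER_OBS = 1.5

def p_for_n(n):
    """p-value for the same EFFECT when its SE is SD_PER_OBS / sqrt(n)."""
    return two_sided_p(EFFECT, SD_PER_OBS / math.sqrt(n))

# Whole sample (220) versus the three subgroup sizes from the question.
for n in (220, 100, 75, 45):
    print(f"n = {n:3d}  SE = {SD_PER_OBS / math.sqrt(n):.3f}  p = {p_for_n(n):.3f}")
```

    With these assumed numbers, p falls below 0.05 only for n = 220, even though the effect itself never changes.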

    Any input and comments are much appreciated.

    Thank you in advance for any clarification.

    Em

  • #2
    Hi Emerald,

    It would help if you could post some of your regression output.

    Regarding splitting the sample into subgroups versus using interaction terms, you might check out:



    • #3
      Hi David,

      Thank you for the reply.

      Please find my regression output below. Ethnicity is coded as ethnic group_1 (Ref), ethnic group_2 and ethnic group_3. In the interaction model, the pre-eclampsia equation showed a statistically significant effect of plasma calcium (ug/L) in the reference group, ethnic group_1 (p=0.039), and a significant interaction between plasma calcium and ethnic group_3 (p=0.028). However, when I stratified the data by ethnicity to examine the effect of plasma Ca on pre-eclampsia within each group, the association between plasma Ca and pre-eclampsia in ethnic group_3 was no longer seen. I presume the discrepancy between the interaction model and the stratified analysis is due to the smaller sample size in each stratum. Is that right?

      Thanks heaps

      Em

      Code:
      . mlogit PIH c.Plasma_Ca##i.mo_eth_1 mo_age i.mo_parity i.alcohol_consumption_pp i.smoke_1 pp_bmicat_1 i.income_1, rrr
      
      Iteration 0:   log likelihood = -241.47676  
      Iteration 1:   log likelihood = -223.67012  
      Iteration 2:   log likelihood = -217.02303  
      Iteration 3:   log likelihood = -216.93448  
      Iteration 4:   log likelihood = -216.93422  
      Iteration 5:   log likelihood = -216.93422  
      
      Multinomial logistic regression                 Number of obs     =        844
                                                      LR chi2(24)       =      49.09
                                                      Prob > chi2       =     0.0018
      Log likelihood = -216.93422                     Pseudo R2         =     0.1016
      
      ------------------------------------------------------------------------------------------
                           PIH |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------------------+----------------------------------------------------------------
      1_Normal_Pregnancy       |  (base outcome)
      -------------------------+----------------------------------------------------------------
      2_Pre_eclampsia          |
                     Plasma_Ca |   .9558571   .0209082    -2.06   0.039     .9157438    .9977275
                               |
                      mo_eth_1 |
                            2  |   .1993895   .8472071    -0.38   0.704     .0000482    825.0144
                            3  |   .0000613   .0002597    -2.29   0.022     1.52e-08    .2474155
                               |
          mo_eth_1#c.Plasma_Ca |
                            2  |   1.008156   .0450634     0.18   0.856     .9235915    1.100463
                            3  |   1.098798    .046976     2.20   0.028     1.010478    1.194837
                               |
                        mo_age |   1.051421   .0464853     1.13   0.257     .9641472    1.146594
                   1.mo_parity |   .2285319   .1083575    -3.11   0.002     .0902305     .578816
      1.alcohol_consumption_pp |   1.105035   .5152004     0.21   0.830     .4431192    2.755698
                     2.smoke_1 |   .5774009   .2665656    -1.19   0.234     .2336188    1.427076
                   pp_bmicat_1 |   1.787243   .3393619     3.06   0.002     1.231849    2.593044
                               |
                      income_1 |
                            2  |   .9935594   .5927911    -0.01   0.991     .3085591    3.199259
                            3  |   .3388616   .2544233    -1.44   0.149     .0777902    1.476113
                               |
                         _cons |   .6481043   1.642568    -0.17   0.864     .0045117    93.09937
      -------------------------+----------------------------------------------------------------
      3_PIH                    |
                     Plasma_Ca |   1.015225   .0209663     0.73   0.464     .9749526    1.057162
                               |
                      mo_eth_1 |
                            2  |   2.375781   8.267534     0.25   0.804     .0025923    2177.318
                            3  |   1.859274   6.267002     0.18   0.854     .0025132    1375.514
                               |
          mo_eth_1#c.Plasma_Ca |
                            2  |   .9890741   .0335862    -0.32   0.746     .9253891    1.057142
                            3  |   .9928487   .0345416    -0.21   0.837      .927405     1.06291
                               |
                        mo_age |   1.031042    .043247     0.73   0.466     .9496704    1.119386
                   1.mo_parity |   .7913256   .3414627    -0.54   0.588     .3396687     1.84355
      1.alcohol_consumption_pp |   .9066601   .4457744    -0.20   0.842     .3458894    2.376576
                     2.smoke_1 |    .650226   .2884432    -0.97   0.332     .2725634    1.551176
                   pp_bmicat_1 |   1.766785   .3258855     3.09   0.002     1.230775    2.536231
                               |
                      income_1 |
                            2  |   1.093425   .6334337     0.15   0.877     .3513002    3.403297
                            3  |   .4949103   .3641333    -0.96   0.339     .1170172    2.093164
                               |
                         _cons |   .0013707   .0035856    -2.52   0.012     8.13e-06    .2310087
      ------------------------------------------------------------------------------------------
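      One way to read the interaction model is to combine its coefficients into a Plasma_Ca effect for each ethnic group. On the log scale the group-specific slope is the main Plasma_Ca coefficient plus the relevant mo_eth_1#c.Plasma_Ca term, so on the RRR scale the two RRRs multiply. The sketch below (Python, using only the point estimates copied from the 2_Pre_eclampsia equation above) does that arithmetic. It also shows why the p = 0.028 on 3.mo_eth_1#c.Plasma_Ca is not a test of whether plasma Ca matters within ethnic group_3: that term is the ratio of the group_3 RRR to the group_1 RRR. A standard error for the group_3 effect itself needs the coefficient covariance matrix (e.g., via Stata's lincom after fitting the model), which is not in the posted output, so no p-value is computed here.

```python
import math

# Point estimates copied from the 2_Pre_eclampsia equation of the interaction model.
RRR_CA_REF = 0.9558571                        # Plasma_Ca: slope in mo_eth_1 = 1 (reference)
RRR_INTERACTION = {2: 1.008156, 3: 1.098798}  # mo_eth_1#c.Plasma_Ca

def group_rrr(group):
    """Per-unit RRR of Plasma_Ca within one ethnic group implied by the interaction model."""
    log_b = math.log(RRR_CA_REF)
    if group != 1:
        # Interaction terms add on the log (coefficient) scale, i.e. RRRs multiply.
        log_b += math.log(RRR_INTERACTION[group])
    return math.exp(log_b)

for g in (1, 2, 3):
    print(f"mo_eth_1 = {g}: RRR per unit Plasma_Ca = {group_rrr(g):.4f}")
```

      With these numbers the implied per-unit RRR is about 0.956 in group_1, 0.964 in group_2 and 1.050 in group_3; the stratified group_3 fit below gives a broadly similar point estimate (1.067), although that model also lets every covariate vary by group.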

      Subgroup analysis output

      Code:
      . bysort mo_eth_1: mlogit PIH c.Plasma_Ca mo_age i.mo_parity i.alcohol_consumption_pp i.smoke_1 i.pp_bmicat_1 i.income_1, rrr
      
      -------------------------------------------------------------------------------------------------------------------
      -> mo_eth_1 = 1
      
      Iteration 0:   log likelihood = -137.86892  
      Iteration 1:   log likelihood = -128.72221  
      Iteration 2:   log likelihood = -120.56905  
      Iteration 3:   log likelihood = -120.27854  
      Iteration 4:   log likelihood = -120.21377  
      Iteration 5:   log likelihood = -120.19865  
      Iteration 6:   log likelihood = -120.19549  
      Iteration 7:   log likelihood = -120.19494  
      Iteration 8:   log likelihood = -120.19488  
      Iteration 9:   log likelihood = -120.19487  
      
      Multinomial logistic regression                 Number of obs     =        457
                                                      LR chi2(20)       =      35.35
                                                      Prob > chi2       =     0.0183
      Log likelihood = -120.19487                     Pseudo R2         =     0.1282
      
      ------------------------------------------------------------------------------------------
                           PIH |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------------------+----------------------------------------------------------------
      1_Normal_Pregnancy       |  (base outcome)
      -------------------------+----------------------------------------------------------------
      2_Pre_eclampsia          |
                     Plasma_Ca |   .9524892   .0214404    -2.16   0.031     .9113803    .9954524
                        mo_age |   1.115733   .0634488     1.93   0.054     .9980553    1.247285
                   1.mo_parity |    .313115   .1718681    -2.12   0.034     .1067779    .9181769
      1.alcohol_consumption_pp |    1.51249    .807993     0.77   0.439     .5308464    4.309393
                     2.smoke_1 |   .6512163   .3704523    -0.75   0.451     .2135546    1.985828
                               |
                   pp_bmicat_1 |
                            2  |   1.136585   .9701318     0.15   0.881     .2133335    6.055427
                            3  |   2.622092   1.560855     1.62   0.105     .8164934    8.420604
                            4  |   4.614635    3.66375     1.93   0.054     .9734941    21.87467
                               |
                      income_1 |
                            2  |    .702979   .5133265    -0.48   0.629     .1680279    2.941056
                            3  |   .2079117   .1819947    -1.79   0.073     .0373924    1.156045
                               |
                         _cons |   .2794893   .7764954    -0.46   0.646     .0012064    64.74839
      -------------------------+----------------------------------------------------------------
      3_PIH                    |
                     Plasma_Ca |   1.014452   .0209875     0.69   0.488     .9741401    1.056432
                        mo_age |   .9358733    .060687    -1.02   0.307     .8241772    1.062707
                   1.mo_parity |   .9286885    .551614    -0.12   0.901      .289922    2.974809
      1.alcohol_consumption_pp |   1.227892   .7388327     0.34   0.733     .3775596    3.993329
                     2.smoke_1 |   .8994071   .5453941    -0.17   0.861     .2740273    2.952017
                               |
                   pp_bmicat_1 |
                            2  |   9.82e-07   .0006086    -0.02   0.982            0           .
                            3  |   1.053633   .7445023     0.07   0.941     .2637707    4.208744
                            4  |   4.905853   3.690464     2.11   0.034     1.123031    21.43074
                               |
                      income_1 |
                            2  |    1.10081   .9435272     0.11   0.911     .2051775    5.906021
                            3  |   .6699985   .6497762    -0.41   0.680     .1001299    4.483156
                               |
                         _cons |   .0638385   .1859129    -0.94   0.345     .0002119    19.23024
      ------------------------------------------------------------------------------------------
      
      -------------------------------------------------------------------------------------------------------------------
      -> mo_eth_1 = 2
      
      Iteration 0:   log likelihood = -58.579481  
      Iteration 1:   log likelihood = -51.982386  
      Iteration 2:   log likelihood = -42.217803  
      Iteration 3:   log likelihood =  -40.76897  
      Iteration 4:   log likelihood = -40.394476  
      Iteration 5:   log likelihood = -40.320994  
      Iteration 6:   log likelihood = -40.302602  
      Iteration 7:   log likelihood = -40.298777  
      Iteration 8:   log likelihood = -40.297956  
      Iteration 9:   log likelihood = -40.297773  
      Iteration 10:  log likelihood = -40.297727  
      Iteration 11:  log likelihood = -40.297718  
      Iteration 12:  log likelihood = -40.297716  
      
      Multinomial logistic regression                 Number of obs     =        229
                                                      LR chi2(20)       =      36.56
                                                      Prob > chi2       =     0.0132
      Log likelihood = -40.297716                     Pseudo R2         =     0.3121
      
      ------------------------------------------------------------------------------------------
                           PIH |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------------------+----------------------------------------------------------------
      1_Normal_Pregnancy       |  (base outcome)
      -------------------------+----------------------------------------------------------------
      2_Pre_eclampsia          |
                     Plasma_Ca |   .9685247   .0381084    -0.81   0.416    .896641    1.046171
                        mo_age |   .9698363   .1168746    -0.25   0.799      .765809    1.228221
                   1.mo_parity |   .2949372   .3401705    -1.06   0.290     .0307597    2.827981
      1.alcohol_consumption_pp |   5.10e-08   .0002364    -0.00   0.997            0           .
                     2.smoke_1 |   1.459718   1.775007     0.31   0.756     .1346524    15.82428
                               |
                   pp_bmicat_1 |
                            2  |   1.418958   11703.96     0.00   1.000            0           .
                            3  |   9.97e+07   3.79e+11     0.00   0.996            0           .
                            4  |   7.65e+07   2.91e+11     0.00   0.996            0           .
                               |
                      income_1 |
                            2  |   .7498188   .9837905    -0.22   0.826     .0572991    9.812165
                            3  |   2.79e-08    .000263    -0.00   0.999            0           .
                               |
                         _cons |   7.00e-08   .0002666    -0.00   0.997            0           .
      -------------------------+----------------------------------------------------------------
      3_PIH                    |
                     Plasma_Ca |   1.016312   .0331934     0.50   0.620     .9532925    1.083497
                        mo_age |   1.109903   .0907694     1.28   0.202     .9455237    1.302859
                   1.mo_parity |   .1901403   .2082308    -1.52   0.130     .0222273    1.626527
      1.alcohol_consumption_pp |   3.46e-07   .0013172    -0.00   0.997            0           .
                     2.smoke_1 |    .282971   .2389071    -1.50   0.135     .0540865    1.480455
                               |
                   pp_bmicat_1 |
                            2  |   1.525352    8483.06     0.00   1.000            0           .
                            3  |   3.78e+07   1.02e+11     0.01   0.995            0           .
                            4  |   7.45e+07   2.02e+11     0.01   0.995            0           .
                               |
                      income_1 |
                            2  |   1.74e+07   5.04e+10     0.01   0.995            0           .
                            3  |   4.59e+07   1.33e+11     0.01   0.995            0           .
                               |
                         _cons |   4.40e-18   1.74e-14    -0.01   0.992            0           .
      ------------------------------------------------------------------------------------------
      Note: 130 observations completely determined.  Standard errors questionable.
      
      -------------------------------------------------------------------------------------------------------------------
      -> mo_eth_1 = 3
      
      Iteration 0:   log likelihood = -44.006864  
      Iteration 1:   log likelihood =  -34.18395  
      Iteration 2:   log likelihood = -27.585138  
      Iteration 3:   log likelihood = -25.400921  
      Iteration 4:   log likelihood = -24.835405  
      Iteration 5:   log likelihood = -24.683279  
      Iteration 6:   log likelihood = -24.649887  
      Iteration 7:   log likelihood = -24.642891  
      Iteration 8:   log likelihood = -24.641665  
      Iteration 9:   log likelihood = -24.641386  
      Iteration 10:  log likelihood = -24.641317  
      Iteration 11:  log likelihood = -24.641304  
      Iteration 12:  log likelihood = -24.641301  
      Iteration 13:  log likelihood =   -24.6413  
      
      Multinomial logistic regression                 Number of obs     =        158
                                                      LR chi2(20)       =      38.73
                                                      Prob > chi2       =     0.0072
      Log likelihood =   -24.6413                     Pseudo R2         =     0.4401
      
      ------------------------------------------------------------------------------------------
                           PIH |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------------------+----------------------------------------------------------------
      1_Normal_Pregnancy       |  (base outcome)
      -------------------------+----------------------------------------------------------------
      2_Pre_eclampsia          |
                     Plasma_Ca |   1.066985   .0708789     0.98   0.329     .9367289    1.215355
                        mo_age |   1.214964   .5834136     0.41   0.685     .4740505     3.11388
                   1.mo_parity |   8.26e-18   6.50e-14    -0.01   0.996            0           .
      1.alcohol_consumption_pp |   3.171918   6.833979     0.54   0.592     .0464929    216.4001
                     2.smoke_1 |   3.28e-17   3.06e-13    -0.00   0.997            0           .
                               |
                   pp_bmicat_1 |
                            2  |   1.12e+10   6.26e+13     0.00   0.997            0           .
                            3  |   1.34e-07   .0010327    -0.00   0.998            0           .
                            4  |   23.85841   82.06687     0.92   0.356      .028166    20209.62
                               |
                      income_1 |
                            2  |   8.53e+07   8.28e+11     0.00   0.998            0           .
                            3  |   1.33e+08   1.29e+12     0.00   0.998            0           .
                               |
                         _cons |   1.45e-15   1.41e-11    -0.00   0.997            0           .
      -------------------------+----------------------------------------------------------------
      3_PIH                    |
                     Plasma_Ca |   .9973177    .032705    -0.08   0.935     .9352335    1.063523
                        mo_age |   1.094547   .1114492     0.89   0.375     .8965263    1.336305
                   1.mo_parity |    2.10017   2.497156     0.62   0.533     .2042461    21.59509
      1.alcohol_consumption_pp |   3.09e-09   .0000491    -0.00   0.999            0           .
                     2.smoke_1 |   1.026335   1.108025     0.02   0.981     .1236907    8.516111
                               |
                   pp_bmicat_1 |
                            2  |   6.980294   11.56096     1.17   0.241        .2717     179.332
                            3  |    2.11557   2.747022     0.58   0.564       .16602    26.95841
                            4  |   4.359074   5.696967     1.13   0.260     .3364656    56.47391
                               |
                      income_1 |
                            2  |   .3606987   .3747826    -0.98   0.326      .047066    2.764282
                            3  |   1.05e-09   .0000135    -0.00   0.999            0           .
                               |
                         _cons |   .0026241   .0115825    -1.35   0.178     4.59e-07    14.99967
      ------------------------------------------------------------------------------------------
      Note: 55 observations completely determined.  Standard errors questionable.
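      The stratified tables report RRRs, but the z statistics refer to the coefficients on the log scale, with SE(RRR) = RRR × SE(b). The sketch below (Python; the numbers are the Plasma_Ca rows of the 2_Pre_eclampsia equation in the three fits above) converts back and reproduces the reported z values. It shows SE(b) rising from about 0.023 in ethnic group_1 (n = 457) to about 0.066 in group_3 (n = 158), close to a threefold increase, whereas 1/sqrt(n) scaling alone would predict roughly 1.7-fold. That is consistent with, though not proof of, sparse outcome cells in the smaller strata, which Stata also flags with its "observations completely determined. Standard errors questionable." notes.

```python
import math

# (n, RRR, Std. Err. of RRR) for Plasma_Ca in the 2_Pre_eclampsia equation,
# copied from the three stratified fits above.
STRATA = {
    1: (457, 0.9524892, 0.0214404),
    2: (229, 0.9685247, 0.0381084),
    3: (158, 1.066985, 0.0708789),
}

def to_log_scale(rrr, se_rrr):
    """Return (b, SE(b), z) from an RRR and its reported SE, using SE(RRR) = RRR * SE(b)."""
    b = math.log(rrr)
    se_b = se_rrr / rrr
    return b, se_b, b / se_b

for g, (n, rrr, se_rrr) in STRATA.items():
    b, se_b, z = to_log_scale(rrr, se_rrr)
    print(f"mo_eth_1 = {g}  n = {n:3d}  b = {b:+.4f}  SE(b) = {se_b:.4f}  z = {z:+.2f}")

# How much larger SE(b) is in the smallest stratum than in the largest, compared
# with what 1/sqrt(n) scaling alone would predict.
se_1 = to_log_scale(*STRATA[1][1:])[1]
se_3 = to_log_scale(*STRATA[3][1:])[1]
print(f"SE ratio {se_3 / se_1:.2f} vs sqrt(n) ratio {math.sqrt(457 / 158):.2f}")
```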



      • #4
        I'm not an expert in this technique, but you are clearly not interacting group with all of your variables. In the subgroup analysis you are estimating three times as many free parameters as in the interaction model (i.e., allowing more parameters to vary by subgroup). These are not comparable subgroup vs. interaction models.
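        To illustrate the point about comparable models, here is a minimal sketch in Python on simulated data (not the poster's data), using ordinary least squares instead of mlogit to keep it self-contained; the group effects, the sample sizes and the ols helper are all hypothetical. When group is interacted with every term (intercept, x and the covariate w), the pooled fit reproduces the separate per-group fits exactly; interacting group with x alone does not, because w's slope is then forced to be common across groups. The same holds for point estimates from a multinomial logit: if every parameter is group-specific, the log-likelihood splits into independent per-group pieces, so maximizing the pooled likelihood gives the stratified estimates.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations, solved by
    Gauss-Jordan elimination with partial pivoting (fine for small, well-conditioned X)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [a - f * c for a, c in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: two groups whose slopes on x AND on the covariate w differ.
rng = random.Random(1)
rows = []  # (group, x, w, y)
for g, (bx, bw) in {0: (0.5, 1.0), 1: (1.5, -0.5)}.items():
    for _ in range(200):
        x = rng.gauss(0, 1)
        w = 0.8 * x + rng.gauss(0, 1)  # covariate correlated with x
        rows.append((g, x, w, 1.0 + bx * x + bw * w + rng.gauss(0, 1)))

# Stratified analysis: y ~ 1 + x + w fitted separately within each group.
strat = {}
for g in (0, 1):
    sub = [r for r in rows if r[0] == g]
    strat[g] = ols([[1, x, w] for _, x, w, _ in sub], [y for *_, y in sub])

ys = [y for *_, y in rows]
# Fully interacted pooled model: group interacts with the intercept, x AND w.
full = ols([[1, x, w, g, g * x, g * w] for g, x, w, _ in rows], ys)
# Partially interacted pooled model: only x is interacted; w's slope is shared.
part = ols([[1, x, w, g, g * x] for g, x, w, _ in rows], ys)

print("group-1 x slope, stratified:        ", round(strat[1][1], 4))
print("group-1 x slope, fully interacted:  ", round(full[1] + full[4], 4))
print("group-1 x slope, x-only interaction:", round(part[1] + part[4], 4))
```

        Because w is correlated with x here, forcing a shared w slope visibly shifts the group-1 x slope away from the stratified estimate, which mirrors the difference between the interaction and subgroup models posted above.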
