
  • Comparing means across multiple groups using svy commands

    I am looking to test for differences in means between three groups. Normally I would just run a one-way ANOVA and call it a day. However, I am using survey data, and Stata does not allow the -anova- command with the -svy- prefix. My solution is to run an adjusted Wald test of the equality of means across the three groups. Would any of you approach this differently?

    Code:
    svy: mean age, over(race)
    test [age]1 = [age]2 = [age]3

  • #2
    Steven:
    there's nothing that -regress- does worse than -anova-.
    I would proceed as in the following toy example:
    Code:
    . use http://www.stata-press.com/data/r16/nhanes2.dta
    
    . svy: regress weight i.race
    (running regress on estimation sample)
    
    Survey: Linear regression
    
    Number of strata   =        31                Number of obs     =       10,351
    Number of PSUs     =        62                Population size   =  117,157,513
                                                  Design df         =           31
                                                  F(   2,     30)   =        21.04
                                                  Prob > F          =       0.0000
                                                  R-squared         =       0.0106
    
    ------------------------------------------------------------------------------
                 |             Linearized
          weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
          Black  |   3.308829   .8027919     4.12   0.000     1.671524    4.946134
          Other  |  -7.697939   1.637091    -4.70   0.000    -11.03681    -4.35907
                 |
           _cons |   71.77969   .1784263   402.29   0.000     71.41578    72.14359
    ------------------------------------------------------------------------------
    
    . mat list e(b)
    
    e(b)[1,4]
                1b.          2.          3.           
              race        race        race       _cons
    y1           0   3.3088294  -7.6979386   71.779686
    
    . test 1b.race=2.race=3.race
    
    Adjusted Wald test
    
     ( 1)  1b.race - 2.race = 0
     ( 2)  1b.race - 3.race = 0
    
           F(  2,    30) =   21.04
                Prob > F =    0.0000
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Steven:
      as a second approach, you may want to consider the following one (same Stata dataset):
      Code:
      . svy: mean  weight, over(race)
      (running mean on estimation sample)
      
      Survey: Mean estimation
      
      Number of strata =      31       Number of obs   =       10,351
      Number of PSUs   =      62       Population size =  117,157,513
                                       Design df       =           31
      
      ---------------------------------------------------------------
                    |             Linearized
                    |       Mean   Std. Err.     [95% Conf. Interval]
      --------------+------------------------------------------------
      c.weight@race |
             White  |   71.77969   .1784263      71.41578    72.14359
             Black  |   75.08852   .7304715      73.59871    76.57832
             Other  |   64.08175    1.61103      60.79603    67.36746
      ---------------------------------------------------------------
      
      . mat list e(b)
      
      e(b)[1,3]
           c.weight@  c.weight@  c.weight@
             1.race     2.race     3.race
      y1  71.779686  75.088516  64.081748
      
       . test c.weight@1.race=c.weight@2.race=c.weight@3.race
      
      Adjusted Wald test
      
        ( 1)  c.weight@1.race - c.weight@2.race = 0
        ( 2)  c.weight@1.race - c.weight@3.race = 0
      
             F(  2,    30) =   21.04
                  Prob > F =    0.0000
      
      .
       As expected, the -test- results are identical: the group means here are just _cons, _cons + 2.race, and _cons + 3.race from the regression in #2, so the two null hypotheses are the same linear restriction.
      Kind regards,
      Carlo
      (StataNow 18.5)

      • #4
        A minor variation on Carlo's advice in #2 is to swap -test- for -testparm-.

        Code:
        testparm i.race
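
        For context, a minimal sketch of where this fits, using the same nhanes2 setup as in #2 (output omitted; the joint test should reproduce the F(2, 30) = 21.04 reported there):

        Code:
        use http://www.stata-press.com/data/r16/nhanes2.dta
        svy: regress weight i.race
        // joint adjusted Wald test that all race coefficients are zero
        testparm i.race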

        • #5
          Thank you, Carlo and Leonardo. My approach yields results identical to both of these, which is reassuring since that is to be expected. I am attempting to address a reviewer's concern that my current approach does not account for multiple pairwise comparisons. I am not comparing many variables in my Table 1, so I am not especially worried about this. But I find it interesting that my Bonferroni-adjusted p-values are identical to the non-adjusted p-values; I guess this supports my notion that a correction is not needed.
          Last edited by Steven Spivack; 02 Sep 2020, 09:45.

          • #6
            Here is an example of one of my Table 1 variables.

            Code:
            . svy: mean q55_1, over(tablecat)
            (running mean on estimation sample)
            
            Survey: Mean estimation
            
            Number of strata =       1        Number of obs   =      1,178
            Number of PSUs   =     628        Population size = 3,266.6842
                                              Design df       =        627
            
                        1: tablecat = 1
                        2: tablecat = 2
                        3: tablecat = 3
            
            --------------------------------------------------------------
                         |             Linearized
                    Over |       Mean   Std. Err.     [95% Conf. Interval]
            -------------+------------------------------------------------
            q55_1        |
                       1 |   18.97772   2.897018      13.28869    24.66675
                       2 |   27.42266   4.367904      18.84517    36.00016
                       3 |   29.33308   4.657676      20.18654    38.47961
            --------------------------------------------------------------
            
            . test [q55_1]1 = [q55_1]2 = [q55_1]3, mtest(b)
            
            Adjusted Wald test
            
             ( 1)  [q55_1]1 - [q55_1]2 = 0
             ( 2)  [q55_1]1 - [q55_1]3 = 0
            
            ---------------------------------------
                   |   F(df,626)     df       p
            -------+-------------------------------
              (1)  |        2.58      1     0.2171 #
              (2)  |        3.56      1     0.1193 #
            -------+-------------------------------
              all  |        2.24      2     0.1068
            ---------------------------------------
                     # Bonferroni-adjusted p-values
            
            . test [q55_1]1 = [q55_1]2 = [q55_1]3
            
            Adjusted Wald test
            
             ( 1)  [q55_1]1 - [q55_1]2 = 0
             ( 2)  [q55_1]1 - [q55_1]3 = 0
            
                   F(  2,   626) =    2.24
                        Prob > F =    0.1068

            • #7
              Your Bonferroni-adjusted p-values (in this case) will be exactly twice the unadjusted p-values from the regression table (for example). If by Table 1 you mean the usual demographics table, then perhaps you could include the p-value from the overall test of differences (i.e., from the "all" line in the adjusted Wald test in post #6). Adjusting for multiple comparisons really only affects the pairwise contrasts.
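
              As a minimal sketch of that doubling: with k = 2 contrasts the Bonferroni rule is p_adj = min(1, k * p_unadj). The 0.1086 below is simply back-calculated from the 0.2171 reported in #6, so treat it as illustrative:

              Code:
              // per-contrast p-values without any adjustment
              test [q55_1]1 = [q55_1]2 = [q55_1]3, mtest(noadjust)
              // doubling an unadjusted p of about 0.1086 gives roughly the 0.2171 shown with mtest(b)
              display min(1, 2*0.1086)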

              • #8
                Thanks, Leonardo! I was presenting the p-value from the adjusted Wald test, but I will make that more explicit in my methods section. I appreciate all of your help.

                • #9
                  Thank you for this useful thread. I am also comparing means across 3 groups (accounting for the survey design). For ease, I will use the same variable names as in the examples above. My main question is about the line of code that tests for differences between means (including the Bonferroni correction):

                  test 1b.race=2.race=3.race, mtest(b)

                  This only tests the difference between groups 1b and 2, and between 1b and 3. How can I also add the comparison between groups 2 and 3? I realise I could run the 2 vs 3 comparison in a separate -test- command, but then it isn't included in the Bonferroni correction. I could run them all unadjusted and manually correct the p-values myself (multiply by the number of comparisons), but my group 2 vs 3 comparison gives p = 0.572, and multiplying by 3 would exceed 1, which does not seem right. My current workaround would be to adjust the alpha criterion for evaluating the unadjusted p-values (i.e. 0.05/3 = 0.0167), but I wondered if there is another solution using the -test- command that gives all 3 pairwise comparisons with their Bonferroni-adjusted p-values?

                  Thank you,
                  Jenny
                  Last edited by Jenny Fielder; 02 Feb 2023, 07:00.

                  • #10
                    Hi Jenny, welcome. In future, please start a new thread instead of resurrecting old and answered threads. This makes it easier for people to search for questions that may help them later on.

                    The PDF documentation for -test- shows some examples of the type of syntax you need when you want to use multiplicity adjustment. If you wish to see the adjusted differences, their confidence intervals and p-values, then you could use -margins-.

                    Code:
                    webuse census3, clear
                    
                    regress brate i.region
                     // just perform the multiple-comparison adjustment, showing only adjusted p-values and test statistics
                    test (1.region=2.region) (1.region=3.region) (1.region=4.region) (2.region=3.region) (2.region=4.region) (3.region=4.region), mtest(bonferroni)
                    
                    // alternative method, but show the estimate of each contrast
                    margins i.region, pwcompare(effect) mcomp(bonferroni)
                    Selected output

                    Code:
                    . regress brate i.region
                     (output omitted)
                    
                    ------------------------------------------------------------------------------
                           brate | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                          region |
                       NCentral  |   30.86111   9.785122     3.15   0.003     11.16468    50.55754
                          South  |   25.67361   9.246071     2.78   0.008     7.062236    44.28499
                           West  |   59.34188   9.622477     6.17   0.000     39.97284    78.71092
                                 |
                           _cons |   136.8889   7.396857    18.51   0.000     121.9998     151.778
                    ------------------------------------------------------------------------------
                    
                    . test (1.region=2.region) (1.region=3.region) (1.region=4.region) (2.region=3.region) (2.region=4.region) (3.region=4.region), mtest(bonferroni)
                    
                    (output omitted)
                    
                    --------------------------------------
                           |    F(df,46)     df      p > F
                    -------+------------------------------
                       (1) |        9.95      1     0.0170*
                       (2) |        7.71      1     0.0475*
                       (3) |       38.03      1     0.0000*
                       (4) |        0.37      1     1.0000*
                       (5) |       10.28      1     0.0147*
                       (6) |       16.51      1     0.0011*
                    -------+------------------------------
                       All |       13.23      3     0.0000
                    --------------------------------------
                    * Bonferroni-adjusted p-values
                    
                    . margins i.region, pwcompare(effect) mcomp(bonferroni)
                    
                    Pairwise comparisons of adjusted predictions                Number of obs = 50
                    Model VCE: OLS
                    
                    Expression: Linear prediction, predict()
                    
                    ---------------------------
                                 |    Number of
                                 |  comparisons
                    -------------+-------------
                          region |            6
                    ---------------------------
                    
                    ------------------------------------------------------------------------------------
                                       |            Delta-method    Bonferroni           Bonferroni
                                       |   Contrast   std. err.      t    P>|t|     [95% conf. interval]
                    -------------------+----------------------------------------------------------------
                                region |
                       NCentral vs NE  |   30.86111   9.785122     3.15   0.017     3.881821     57.8404
                          South vs NE  |   25.67361   9.246071     2.78   0.048     .1805786    51.16664
                           West vs NE  |   59.34188   9.622477     6.17   0.000     32.81103    85.87273
                    South vs NCentral  |    -5.1875   8.474164    -0.61   1.000    -28.55225    18.17725
                     West vs NCentral  |   28.48077   8.883337     3.21   0.015     3.987856    52.97368
                        West vs South  |   33.66827   8.285826     4.06   0.001      10.8228    56.51374
                    ------------------------------------------------------------------------------------
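
                    Since the underlying question involves survey data, the same pattern carries over after a -svy- estimation. A minimal sketch using the nhanes2 setup from #2 (output omitted):

                    Code:
                    use http://www.stata-press.com/data/r16/nhanes2.dta
                    svy: regress weight i.race
                    // all three pairwise contrasts, Bonferroni-adjusted
                    test (1.race=2.race) (1.race=3.race) (2.race=3.race), mtest(bonferroni)
                    // or, with each contrast shown alongside its adjusted CI and p-value
                    margins race, pwcompare(effect) mcomp(bonferroni)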

                    • #11
                      Thank you Leonardo!

                      • #12
                        Originally posted by Carlo Lazzaro View Post
                        Steven:
                        there's nothing that -regress- does worse than -anova-.
                        I would proceed as in the following toy example:
                        Code:
                        . use http://www.stata-press.com/data/r16/nhanes2.dta
                        . svy: regress weight i.race
                        (output omitted; see #2)
                        . mat list e(b)
                        (output omitted; see #2)
                        . test 1b.race=2.race=3.race
                        
                        Adjusted Wald test
                        
                         ( 1)  1b.race - 2.race = 0
                         ( 2)  1b.race - 3.race = 0
                        
                               F(  2,    30) =   21.04
                                    Prob > F =    0.0000
                        
                        Dear Carlo,
                        Thank you for this useful method. When applying it, I encountered some issues and would like to seek your advice.
                        Code:
                        svy: regress RIDAGEYR i.g4
                        mat list e(b)
                        test 1b.g4=2.g4=3.g4=4.g4=5.g4
                        After running the above three lines of code, I got Prob > F = 0.0014.
                        When I group the RIDAGEYR variable into a new categorical variable called 'age' and run -svy: tabulate age g4-, I get P = 0.1398.
                        The two tests of differences on essentially the same pair of variables give opposite conclusions. Does this make sense?

                        Thank you,
                        young

                        • #13
                          Young:
                          1) you're running two different kinds of tests;
                          2) the second test (based on -svy: tabulate twoway-) tests the homogeneity of the g4 distribution across -age- strata (I guess so, as I do not have an excerpt of your dataset). In your case, that test cannot reject the null of homogeneity.
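
                          A minimal sketch of the distinction, using the variable names from #12 (the dataset is not posted, so this is purely illustrative):

                          Code:
                          // models the mean of continuous age by group; the Wald test asks whether the group means are equal
                          svy: regress RIDAGEYR i.g4
                          test 1b.g4=2.g4=3.g4=4.g4=5.g4
                          // design-based test that the categorical age distribution is homogeneous across g4
                          svy: tabulate age g4
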
                          Kind regards,
                          Carlo
                          (StataNow 18.5)

                          • #14
                            I got it. Thank you Carlo!
