
  • Comparing means across multiple groups using svy commands

    I am looking to test for differences in means between three groups. Normally I would just run a one-way ANOVA and call it a day. However, I am using survey data, and Stata does not allow the -anova- command with the -svy- prefix. My solution is to run an adjusted Wald test of the equality of means across the three groups. Would any of you approach this differently?

    Code:
    svy: mean age, over(race)
    test [age]1 = [age]2 = [age]3

  • #2
    Steven:
    there's nothing that -regress- does worse than -anova-.
    I would proceed as in the following toy example:
    Code:
    . use http://www.stata-press.com/data/r16/nhanes2.dta
    
    . svy: regress weight i.race
    (running regress on estimation sample)
    
    Survey: Linear regression
    
    Number of strata   =        31                Number of obs     =       10,351
    Number of PSUs     =        62                Population size   =  117,157,513
                                                  Design df         =           31
                                                  F(   2,     30)   =        21.04
                                                  Prob > F          =       0.0000
                                                  R-squared         =       0.0106
    
    ------------------------------------------------------------------------------
                 |             Linearized
          weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
          Black  |   3.308829   .8027919     4.12   0.000     1.671524    4.946134
          Other  |  -7.697939   1.637091    -4.70   0.000    -11.03681    -4.35907
                 |
           _cons |   71.77969   .1784263   402.29   0.000     71.41578    72.14359
    ------------------------------------------------------------------------------
    
    . mat list e(b)
    
    e(b)[1,4]
                1b.          2.          3.           
              race        race        race       _cons
    y1           0   3.3088294  -7.6979386   71.779686
    
    . test 1b.race=2.race=3.race
    
    Adjusted Wald test
    
     ( 1)  1b.race - 2.race = 0
     ( 2)  1b.race - 3.race = 0
    
           F(  2,    30) =   21.04
                Prob > F =    0.0000
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Steven:
      as a second approach, you may want to consider the following one (same Stata dataset):
      Code:
      . svy: mean  weight, over(race)
      (running mean on estimation sample)
      
      Survey: Mean estimation
      
      Number of strata =      31       Number of obs   =       10,351
      Number of PSUs   =      62       Population size =  117,157,513
                                       Design df       =           31
      
      ---------------------------------------------------------------
                    |             Linearized
                    |       Mean   Std. Err.     [95% Conf. Interval]
      --------------+------------------------------------------------
      c.weight@race |
             White  |   71.77969   .1784263      71.41578    72.14359
             Black  |   75.08852   .7304715      73.59871    76.57832
             Other  |   64.08175    1.61103      60.79603    67.36746
      ---------------------------------------------------------------
      
      . mat list e(b)
      
      e(b)[1,3]
           c.weight@  c.weight@  c.weight@
             1.race     2.race     3.race
      y1  71.779686  75.088516  64.081748
      
       . test c.weight@1.race=c.weight@2.race=c.weight@3.race
      
      Adjusted Wald test
      
        ( 1)  c.weight@1.race - c.weight@2.race = 0
        ( 2)  c.weight@1.race - c.weight@3.race = 0
      
             F(  2,    30) =   21.04
                  Prob > F =    0.0000
      
      .
       As expected, the -test- results are identical: the group means here are just _cons, _cons + 2.race, and _cons + 3.race from the regression in #2, so the two null hypotheses are the same linear restriction.
      Kind regards,
      Carlo
      (StataNow 18.5)

      • #4
        A minor variation on Carlo's advice in #2 is to swap -test- for -testparm-.

        Code:
        testparm i.race
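
        For context, a minimal sketch of where this fits, using the same nhanes2 setup as in #2 (output omitted; the joint test should reproduce the F(2, 30) = 21.04 reported there):

        Code:
        use http://www.stata-press.com/data/r16/nhanes2.dta
        svy: regress weight i.race
        // joint adjusted Wald test that all race coefficients are zero
        testparm i.race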

        • #5
          Thank you, Carlo and Leonardo. My approach yields results identical to both of these, which is reassuring since that is to be expected. I am attempting to address a reviewer's concern that my current approach does not account for multiple pairwise comparisons. I am not comparing many variables in my Table 1, so I am not especially worried about this. But I find it interesting that my Bonferroni-adjusted p-values are identical to the non-adjusted p-values; I guess this supports my notion that a correction is not needed.
          Last edited by Steven Spivack; 02 Sep 2020, 09:45.

          • #6
            Here is an example of one of my Table 1 variables.

            Code:
            . svy: mean q55_1, over(tablecat)
            (running mean on estimation sample)
            
            Survey: Mean estimation
            
            Number of strata =       1        Number of obs   =      1,178
            Number of PSUs   =     628        Population size = 3,266.6842
                                              Design df       =        627
            
                        1: tablecat = 1
                        2: tablecat = 2
                        3: tablecat = 3
            
            --------------------------------------------------------------
                         |             Linearized
                    Over |       Mean   Std. Err.     [95% Conf. Interval]
            -------------+------------------------------------------------
            q55_1        |
                       1 |   18.97772   2.897018      13.28869    24.66675
                       2 |   27.42266   4.367904      18.84517    36.00016
                       3 |   29.33308   4.657676      20.18654    38.47961
            --------------------------------------------------------------
            
            . test [q55_1]1 = [q55_1]2 = [q55_1]3, mtest(b)
            
            Adjusted Wald test
            
             ( 1)  [q55_1]1 - [q55_1]2 = 0
             ( 2)  [q55_1]1 - [q55_1]3 = 0
            
            ---------------------------------------
                   |   F(df,626)     df       p
            -------+-------------------------------
              (1)  |        2.58      1     0.2171 #
              (2)  |        3.56      1     0.1193 #
            -------+-------------------------------
              all  |        2.24      2     0.1068
            ---------------------------------------
                     # Bonferroni-adjusted p-values
            
            . test [q55_1]1 = [q55_1]2 = [q55_1]3
            
            Adjusted Wald test
            
             ( 1)  [q55_1]1 - [q55_1]2 = 0
             ( 2)  [q55_1]1 - [q55_1]3 = 0
            
                   F(  2,   626) =    2.24
                        Prob > F =    0.1068

            • #7
              Your Bonferroni-adjusted p-values (in this case) will be exactly twice the unadjusted p-values from the regression table (for example). If by Table 1 you mean the usual demographics table, then perhaps you could include the p-value from the overall test of differences (i.e., from the "all" line in the adjusted Wald test in post #6). Adjusting for multiple comparisons really only affects the pairwise contrasts.
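
              As a minimal sketch of that doubling: with k = 2 contrasts the Bonferroni rule is p_adj = min(1, k * p_unadj). The 0.1086 below is simply back-calculated from the 0.2171 reported in #6, so treat it as illustrative:

              Code:
              // per-contrast p-values without any adjustment
              test [q55_1]1 = [q55_1]2 = [q55_1]3, mtest(noadjust)
              // doubling an unadjusted p of about 0.1086 gives roughly the 0.2171 shown with mtest(b)
              display min(1, 2*0.1086)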

              • #8
                Thanks, Leonardo! I was presenting the p-value from the adjusted Wald test, but I will make that more explicit in my methods section. I appreciate all of your help.

                • #9
                  Thank you for this useful thread. I am also comparing means across 3 groups (accounting for the survey design). For ease, I will use the same variable names as in the examples above. My main question is about the line of code that tests for differences between means (including the Bonferroni correction):

                  test 1b.race=2.race=3.race, mtest(b)

                  This only tests the difference between groups 1b and 2, and between 1b and 3. How can I also add the comparison between groups 2 and 3? I realise I could run the 2 vs 3 comparison in a separate -test- command, but then it isn't included in the Bonferroni correction. I could run them all unadjusted and manually correct the p-values myself (multiply by the number of comparisons), but my group 2 vs 3 comparison gives p = 0.572, and multiplying by 3 would exceed 1, which does not seem right. My current workaround would be to adjust the alpha criterion for evaluating the unadjusted p-values (i.e. 0.05/3 = 0.0167), but I wondered if there is another solution using the -test- command that gives all 3 pairwise comparisons with their Bonferroni-adjusted p-values?

                  Thank you,
                  Jenny
                  Last edited by Jenny Fielder; 02 Feb 2023, 07:00.

                  • #10
                    Hi Jenny, welcome. In future, please start a new thread instead of resurrecting old and answered threads. This makes it easier for people to search for questions that may help them later on.

                    The PDF documentation for -test- shows some examples of the type of syntax you need when you want to use multiplicity adjustment. If you wish to see the adjusted differences, their confidence intervals and p-values, then you could use -margins-.

                    Code:
                    webuse census3, clear
                    
                    regress brate i.region
                     // just perform the multiple-comparison adjustment, showing only adjusted p-values and test statistics
                    test (1.region=2.region) (1.region=3.region) (1.region=4.region) (2.region=3.region) (2.region=4.region) (3.region=4.region), mtest(bonferroni)
                    
                    // alternative method, but show the estimate of each contrast
                    margins i.region, pwcompare(effect) mcomp(bonferroni)
                    Selected output

                    Code:
                    . regress brate i.region
                     (output omitted)
                    
                    ------------------------------------------------------------------------------
                           brate | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                          region |
                       NCentral  |   30.86111   9.785122     3.15   0.003     11.16468    50.55754
                          South  |   25.67361   9.246071     2.78   0.008     7.062236    44.28499
                           West  |   59.34188   9.622477     6.17   0.000     39.97284    78.71092
                                 |
                           _cons |   136.8889   7.396857    18.51   0.000     121.9998     151.778
                    ------------------------------------------------------------------------------
                    
                    . test (1.region=2.region) (1.region=3.region) (1.region=4.region) (2.region=3.region) (2.region=4.region) (3.region=4.region), mtest(bonferroni)
                    
                    (output omitted)
                    
                    --------------------------------------
                           |    F(df,46)     df      p > F
                    -------+------------------------------
                       (1) |        9.95      1     0.0170*
                       (2) |        7.71      1     0.0475*
                       (3) |       38.03      1     0.0000*
                       (4) |        0.37      1     1.0000*
                       (5) |       10.28      1     0.0147*
                       (6) |       16.51      1     0.0011*
                    -------+------------------------------
                       All |       13.23      3     0.0000
                    --------------------------------------
                    * Bonferroni-adjusted p-values
                    
                    . margins i.region, pwcompare(effect) mcomp(bonferroni)
                    
                    Pairwise comparisons of adjusted predictions                Number of obs = 50
                    Model VCE: OLS
                    
                    Expression: Linear prediction, predict()
                    
                    ---------------------------
                                 |    Number of
                                 |  comparisons
                    -------------+-------------
                          region |            6
                    ---------------------------
                    
                    ------------------------------------------------------------------------------------
                                       |            Delta-method    Bonferroni           Bonferroni
                                       |   Contrast   std. err.      t    P>|t|     [95% conf. interval]
                    -------------------+----------------------------------------------------------------
                                region |
                       NCentral vs NE  |   30.86111   9.785122     3.15   0.017     3.881821     57.8404
                          South vs NE  |   25.67361   9.246071     2.78   0.048     .1805786    51.16664
                           West vs NE  |   59.34188   9.622477     6.17   0.000     32.81103    85.87273
                    South vs NCentral  |    -5.1875   8.474164    -0.61   1.000    -28.55225    18.17725
                     West vs NCentral  |   28.48077   8.883337     3.21   0.015     3.987856    52.97368
                        West vs South  |   33.66827   8.285826     4.06   0.001      10.8228    56.51374
                    ------------------------------------------------------------------------------------
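
                    Since the underlying question involves survey data, the same pattern carries over after a -svy- estimation. A minimal sketch using the nhanes2 setup from #2 (output omitted):

                    Code:
                    use http://www.stata-press.com/data/r16/nhanes2.dta
                    svy: regress weight i.race
                    // all three pairwise contrasts, Bonferroni-adjusted
                    test (1.race=2.race) (1.race=3.race) (2.race=3.race), mtest(bonferroni)
                    // or, with each contrast shown alongside its adjusted CI and p-value
                    margins race, pwcompare(effect) mcomp(bonferroni)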

                    • #11
                      Thank you Leonardo!

                      • #12
                        Originally posted by Carlo Lazzaro View Post
                        Steven:
                        there's nothing that -regress- does worse than -anova-.
                        I would proceed as in the following toy example:
                        Code:
                        . use http://www.stata-press.com/data/r16/nhanes2.dta
                        . svy: regress weight i.race
                        (output omitted; see #2)
                        . mat list e(b)
                        (output omitted; see #2)
                        . test 1b.race=2.race=3.race
                        
                        Adjusted Wald test
                        
                         ( 1)  1b.race - 2.race = 0
                         ( 2)  1b.race - 3.race = 0
                        
                               F(  2,    30) =   21.04
                                    Prob > F =    0.0000
                        
                        Dear Carlo,
                        Thank you for this useful method. When applying it, I encountered some issues and would like to seek your advice.
                        Code:
                        svy: regress RIDAGEYR i.g4
                        mat list e(b)
                        test 1b.g4=2.g4=3.g4=4.g4=5.g4
                        After running the above three lines of code, I got Prob > F = 0.0014.
                        When I group the RIDAGEYR variable into a new categorical variable called 'age' and run -svy: tabulate age g4-, I get P = 0.1398.
                        The two tests of differences on essentially the same pair of variables give opposite conclusions. Does this make sense?

                        Thank you,
                        young

                        • #13
                          Young:
                          1) you're running two different kinds of tests;
                          2) the second test (based on -svy: tabulate twoway-) tests the homogeneity of the g4 distribution across -age- strata (I guess so, as I do not have an excerpt of your dataset). In your case, that test cannot reject the null of homogeneity.
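
                          A minimal sketch of the distinction, using the variable names from #12 (the dataset is not posted, so this is purely illustrative):

                          Code:
                          // models the mean of continuous age by group; the Wald test asks whether the group means are equal
                          svy: regress RIDAGEYR i.g4
                          test 1b.g4=2.g4=3.g4=4.g4=5.g4
                          // design-based test that the categorical age distribution is homogeneous across g4
                          svy: tabulate age g4
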
                          Kind regards,
                          Carlo
                          (StataNow 18.5)

                          • #14
                            I got it. Thank you Carlo!
