difference-of-means test for overlapping groups

David Radwin

Join Date: Mar 2014
Posts: 368

24 Oct 2017, 10:45

I need to perform a difference-of-means test for overlapping groups, that is, groups where an individual can be in group A, group B, both, or neither. Using the nlsw88.dta dataset, for example, I want to test whether the mean wage of married individuals differs significantly from the mean wage of individuals living in the South. The tricky part is that some individuals are both married and living in the South, so the two group means are not independent.

Is the following approach using regress, suest, and test appropriate? If not, is there a better alternative?

Code:

. sysuse nlsw88
(NLSW, 1988 extract)

. tab married south // note overlap

           |    lives in south
   married |         0          1 |     Total
-----------+----------------------+----------
    single |       464        340 |       804
   married |       840        602 |     1,442
-----------+----------------------+----------
     Total |     1,304        942 |     2,246


. quietly regress wage married

. estimates store eq1

. quietly regress wage south

. estimates store eq2

. suest eq1 eq2, coeflegend

Simultaneous results for eq1, eq2

                                                Number of obs     =      2,246

------------------------------------------------------------------------------
             |      Coef.  Legend
-------------+----------------------------------------------------------------
eq1_mean     |
     married |  -.4887873  _b[eq1_mean:married]
       _cons |   8.080765  _b[eq1_mean:_cons]
-------------+----------------------------------------------------------------
eq1_lnvar    |
       _cons |   3.499106  _b[eq1_lnvar:_cons]
-------------+----------------------------------------------------------------
eq2_mean     |
       south |  -1.514791  _b[eq2_mean:south]
       _cons |   8.402271  _b[eq2_mean:_cons]
-------------+----------------------------------------------------------------
eq2_lnvar    |
       _cons |   3.483747  _b[eq2_lnvar:_cons]
------------------------------------------------------------------------------

. test [eq1_mean]married + [eq1_mean]_cons = [eq2_mean]south + [eq2_mean]_cons

 ( 1)  [eq1_mean]married + [eq1_mean]_cons - [eq2_mean]south - [eq2_mean]_cons = 0

           chi2(  1) =   17.47
         Prob > chi2 =    0.0000
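
The test command above reports only the chi-squared statistic. As a minimal sketch, assuming the same two regressions are stored and combined with suest, lincom can be applied to the same linear combination to show the estimated difference and its standard error as well. The equation names eq1_mean and eq2_mean are taken from the coefficient legend above.

```stata
sysuse nlsw88, clear
quietly regress wage married
estimates store eq1
quietly regress wage south
estimates store eq2
quietly suest eq1 eq2
* mean wage of married individuals minus mean wage of South residents
lincom [eq1_mean]married + [eq1_mean]_cons - [eq2_mean]south - [eq2_mean]_cons
```

The point estimate should agree with the difference implied by the legend above, (8.080765 - .4887873) - (8.402271 - 1.514791), which is roughly .7045.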

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him

David Radwin

27 Jun 2018, 16:16

For posterity's sake, I think I have an answer to my own question. A statistician colleague suggested a solution using the statistical package SUDAAN, which I was able to recreate in Stata.

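It is largely the same as the example above, but the code is slightly simpler and yields a t statistic instead of a chi-squared statistic. The p-value is very similar but not identical, because this test refers the statistic to a t distribution with 1,781 design degrees of freedom rather than to a chi-squared distribution with 1 degree of freedom.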

Code:

. sysuse nlsw88
(NLSW, 1988 extract)

. svyset _n

      pweight: <none>
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy: regress wage if married == 1
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs     =      1,442
Number of PSUs     =     1,442                  Population size   =      1,442
                                                Design df         =      1,441
                                                F(   0,   1441)   =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0000

------------------------------------------------------------------------------
             |             Linearized
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   7.591978   .1421835    53.40   0.000     7.313069    7.870887
------------------------------------------------------------------------------

. estimates store eq1

. svy: regress wage if south == 1
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs     =        942
Number of PSUs     =       942                  Population size   =        942
                                                Design df         =        941
                                                F(   0,    941)   =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0000

------------------------------------------------------------------------------
             |             Linearized
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |    6.88748   .1721331    40.01   0.000     6.549671    7.225289
------------------------------------------------------------------------------

. estimates store eq2

. suest eq1 eq2

Simultaneous survey results for eq1, eq2

Number of strata   =         1                  Number of obs     =      1,782
Number of PSUs     =     1,782                  Population size   =      1,782
                                                Design df         =      1,781

------------------------------------------------------------------------------
             |             Linearized
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
       _cons |   7.591978   .1421741    53.40   0.000     7.313132    7.870824
-------------+----------------------------------------------------------------
eq2          |
       _cons |    6.88748     .17209    40.02   0.000     6.549961       7.225
------------------------------------------------------------------------------

. lincom [eq1]_cons - [eq2]_cons, noci

 ( 1)  [eq1]_cons - [eq2]_cons = 0

-----------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|
-------------+---------------------------------------
         (1) |   .7044979   .1685593     4.18   0.000
-----------------------------------------------------
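
To see what the overlap does to the standard error, the figures printed in the output above can be plugged into a few scalar and display commands. This is only a back-of-the-envelope sketch that uses the rounded values shown above; the scalar names are illustrative and not part of the original analysis.

```stata
* standard errors of the two group means, copied from the suest output
scalar se_married = .1421741
scalar se_south   = .17209
* standard error of the difference, copied from the lincom output
scalar se_diff    = .1685593

* standard error the difference would have if the two groups were independent
scalar se_naive = sqrt(se_married^2 + se_south^2)
* covariance of the two means implied by Var(A - B) = Var(A) + Var(B) - 2*Cov(A, B)
scalar cov_implied = (se_married^2 + se_south^2 - se_diff^2) / 2

display "naive SE:           " se_naive
display "implied covariance: " cov_implied
* squared t statistic, for comparison with the chi-squared statistic from test
display "t squared:          " (.7044979 / se_diff)^2
```

The naive standard error comes out near .223, noticeably larger than the .169 reported by lincom, because the 602 individuals who are both married and living in the South induce a positive covariance between the two means. The squared t statistic is about 17.47, in line with the chi-squared statistic from the first approach.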

Disclosure: SUDAAN is produced by RTI International, my employer.
