Differences in standard errors using 'mean' or 'proportion' for indicator variables?

Paul Jasper

13 Jun 2014, 10:30

Dear Statalist,

My question is about the differences between the 'mean' and 'proportion' commands in Stata, and specifically whether the two commands calculate standard errors differently.

For example, with a survey dataset I would like to estimate the proportion of female individuals in the sample, where gender is coded as a dummy variable (say, 'gender', with 0 = male and 1 = female). I could then run either:

svy: mean gender

or

svy: proportion gender

Both give the same point estimate of the proportion of female individuals in the sample. The confidence intervals differ, though, while the standard errors seem not to.

From the Stata documentation (r.pdf, page 1684), I understand that the 'proportion' command applies a logit transformation to the estimated proportion so that the endpoints of the confidence interval lie between 0 and 1. My statistics knowledge is limited, but I think this does not affect the standard errors. Is that correct?
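As an illustration (an editor's sketch, not Stata's own code), the Python snippet below reproduces both intervals by hand. It assumes the setting of the auto-data example later in this thread: 22 foreign cars out of 74, `svyset _n` with no weights, so the linearized standard error reduces to sqrt(p(1-p)/(n-1)) and the design degrees of freedom are 73. The t quantile 1.9929971 is hardcoded as an assumption to avoid a SciPy dependency. The logit interval uses the delta-method standard error of logit(p), se/(p(1-p)), only to place the endpoints; the standard error that is reported stays on the proportion scale.

```python
import math

# Assumed inputs: auto data, 22 foreign cars out of 74, svyset _n
# (one stratum, one observation per PSU, no weights), so design df = n - 1 = 73.
N_OBS = 74
N_FOREIGN = 22
T_CRIT = 1.9929971  # assumed Student t 0.975 quantile with 73 df (hardcoded, no SciPy)


def linearized_se(k, n):
    """SE of a sample proportion under svyset _n: sqrt(p(1-p)/(n-1))."""
    p = k / n
    return math.sqrt(p * (1 - p) / (n - 1))


def inv_logit(x):
    """Map a value on the logit scale back to the (0, 1) scale."""
    return 1 / (1 + math.exp(-x))


def wald_ci(p, se, t=T_CRIT):
    """Symmetric interval on the proportion scale (the kind 'mean' reports)."""
    return p - t * se, p + t * se


def logit_ci(p, se, t=T_CRIT):
    """Interval built on the logit scale and mapped back (the kind 'proportion' reports).

    The delta method gives se(logit p) = se / (p(1-p)); only the endpoints are
    transformed, while the SE passed in (and reported) is left unchanged.
    """
    center = math.log(p / (1 - p))
    se_logit = se / (p * (1 - p))
    return inv_logit(center - t * se_logit), inv_logit(center + t * se_logit)


p = N_FOREIGN / N_OBS
se = linearized_se(N_FOREIGN, N_OBS)
print(f"p = {p:.7f}, se = {se:.7f}")
print("mean-style CI:      ", [round(x, 7) for x in wald_ci(p, se)])
print("proportion-style CI:", [round(x, 7) for x in logit_ci(p, se)])
```

With these inputs the sketch should match the intervals Stata prints in the reply below (roughly .1907 to .4039 for 'mean' and .2025 to .4134 for 'proportion'), both built from the same standard error of about .0535.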

In addition, if the standard errors do not differ, does this mean that hypothesis tests after estimating proportions are also unaffected?

Thank you!

Paul

PS: I am using Stata 13.0.

Jeff Pitblado (StataCorp)


13 Jun 2014, 12:49

Paul is correct: the logit transformation used in proportion only affects how
the confidence limits are computed; it does not affect the reported standard error.

Paul then asks

In addition, if standard errors do not differ, this means that hypothesis
testing after estimating proportions is also not affected, right?

This is also correct, and easy to verify.

Consider the following minimal example using the auto data.

Code:

. sysuse auto
(1978 Automobile Data)
        
. svyset _n

      pweight: <none>
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy: proportion foreign 
(running proportion on estimation sample)

Survey: Proportion estimation
Number of strata =       1          Number of obs    =      74
Number of PSUs   =      74          Population size  =      74
                                    Design df        =      73

--------------------------------------------------------------
             |             Linearized
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
foreign      | 
    Domestic |   .7027027   .0534958      .5865827    .7974684
     Foreign |   .2972973   .0534958      .2025316    .4134173
--------------------------------------------------------------

. test _b[Foreign] = 0.25

Adjusted Wald test

 ( 1)  [foreign]Foreign = .25

       F(  1,    73) =    0.78
            Prob > F =    0.3795

. svy: mean foreign
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      74
Number of PSUs   =      74          Population size  =      74
                                    Design df        =      73

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     foreign |   .2972973   .0534958      .1906803    .4039143
--------------------------------------------------------------

. test _b[foreign] = 0.25

Adjusted Wald test

 ( 1)  foreign = .25

       F(  1,    73) =    0.78
            Prob > F =    0.3795
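Because the adjusted Wald test uses only the point estimate and its standard error, the identical F statistics above follow directly. Here is a minimal Python sketch that reproduces them, assuming the printed estimate (.2972973), standard error (.0534958) and 73 design degrees of freedom. With a single restriction, the adjusted Wald F(1, df) equals the squared t ratio, and its p-value is the two-sided Student t tail probability. To stay within the standard library, the tail probability uses the standard closed-form series for odd degrees of freedom, swapped in here in place of a statistics library; the function names are hypothetical.

```python
import math

# Inputs copied from the auto-data output above (7-digit rounding assumed adequate).
P_HAT = 0.2972973   # same point estimate from 'mean' and 'proportion'
SE = 0.0534958      # same linearized standard error from both commands
DF = 73             # design degrees of freedom
H0 = 0.25           # null value used in 'test'


def adjusted_wald_f(b, se, h0):
    """F(1, df) for H0: b = h0; with one restriction this is the squared t ratio."""
    return ((b - h0) / se) ** 2


def t_two_sided_p(t, df):
    """P(|T| > |t|) for Student t; this closed-form series is valid only for odd df."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form needs an odd, positive df")
    theta = math.atan(abs(t) / math.sqrt(df))
    c = math.cos(theta)
    term = c            # first series term: cos(theta)
    total = 0.0
    for k in range((df - 1) // 2):  # cosine exponents 1, 3, ..., df - 2
        if k > 0:
            term *= (2 * k) / (2 * k + 1) * c * c
        total += term
    prob_inside = (2 / math.pi) * (theta + math.sin(theta) * total)
    return 1 - prob_inside


f_stat = adjusted_wald_f(P_HAT, SE, H0)
p_value = t_two_sided_p(math.sqrt(f_stat), DF)
print(f"F(1, {DF}) = {f_stat:.2f}, Prob > F = {p_value:.4f}")
```

Since neither input depends on whether the confidence interval was built on the logit scale, the sketch gives the same F and p-value for both commands.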


Paul Jasper


16 Jun 2014, 05:23

Hi Jeff,

Many thanks for your swift and clear reply!

Best

Paul