calculating the difference between two proportions (independent samples) and generating confidence intervals, from aggregate data

Joanna Davies

#1

20 Jan 2022, 03:58

Hello,

I have aggregate data for men and women in different social class categories (10 lines of data in total):
var1: proportion a
var2: sample size a
var3: proportion b
var4: sample size b
var5: proportion a - proportion b
The proportions are derived and saved from separate margins models, so they are predicted proportions adjusted for covariates. Sample a and sample b are independent.

I want to calculate confidence intervals for var5. I can do this in Excel with a formula, but is there a way to do this in Stata?

Thank you.

Carlo Lazzaro

#2

20 Jan 2022, 06:10

Joanna:
the following toy example comes to mind:

Code:

. prtesti 100 .10 100 .15

Two-sample test of proportions                     x: Number of obs =      100
                                                   y: Number of obs =      100
------------------------------------------------------------------------------
             |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |         .1        .03                      .0412011    .1587989
           y |        .15   .0357071                      .0800153    .2199847
-------------+----------------------------------------------------------------
        diff |       -.05   .0466369                     -.1414066    .0414066
             |  under H0:   .0467707    -1.07   0.285
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -1.0690
    H0: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.1425         Pr(|Z| > |z|) = 0.2850          Pr(Z > z) = 0.8575

.
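For reference, the two standard errors in this output come from different formulas. As a sketch of the standard large-sample (Wald) calculation, with p_x, p_y for the two proportions and n_x, n_y for the sample sizes (symbols introduced here for illustration), the interval uses the unpooled standard error, while the z test under H0 uses the pooled one:

```latex
\begin{align*}
\widehat{SE}_{\text{diff}} &= \sqrt{\frac{p_x(1-p_x)}{n_x} + \frac{p_y(1-p_y)}{n_y}}
  = \sqrt{\frac{0.10 \times 0.90}{100} + \frac{0.15 \times 0.85}{100}} \approx 0.0466369 \\
\text{95\% CI} &= (p_x - p_y) \pm z_{0.975}\,\widehat{SE}_{\text{diff}}
  = -0.05 \pm 1.959964 \times 0.0466369 \approx (-0.1414,\ 0.0414) \\
\widehat{SE}_{H_0} &= \sqrt{\bar{p}\,(1-\bar{p})\left(\frac{1}{n_x} + \frac{1}{n_y}\right)} \approx 0.0467707,
  \qquad \bar{p} = \frac{n_x p_x + n_y p_y}{n_x + n_y} = 0.125
\end{align*}
```

Dividing the difference by the pooled standard error gives z = -0.05 / 0.0467707, which is about -1.069, as reported.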

Kind regards,
Carlo
(Stata 19.0)


Joanna Davies

#3

20 Jan 2022, 06:34

Thanks Carlo,

I did wonder if immediate commands could help me, but I have many percentages to compare (not only the 10 lines I mention above; I have to repeat this for multiple aggregate tables), so I don't want to type each one manually. Any ideas on how to do this for the whole dataset and export or save the results to plot later?
Roman Mostazir

#4

20 Jan 2022, 06:51

Originally posted by Joanna Davies:

The proportions are derived and saved from separate margins models, so they are predicted proportions adjusted for covariates. Sample a and sample b are independent. I want to calculate confidence intervals for var5. I can do this in Excel with a formula, but is there a way to do this in Stata?

I think you should show the full commands that you used to obtain the -margins- results. I assume that by 'separate margins models' you mean Stata's -margins- suite run after some sort of regression command. If that is true, have you tried the -lincom- command, which can be used to derive differences with 95% CIs after a regression?
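A minimal sketch of that idea, assuming the two groups can be pooled into a single model: fit one regression with the grouping variable as a factor, post the margins, and let -lincom- return the difference with its delta-method SE and 95% CI. The auto data and the `heavy` indicator are stand-ins for illustration only (they are not from this thread); `heavy` plays the role of the year or sample indicator.

```stata
* Stand-in data: foreign is a 0/1 outcome; heavy is a made-up two-level group
sysuse auto, clear
generate byte heavy = weight > 3000

* One pooled model instead of two separate ones
poisson foreign i.heavy mpg, vce(robust)

* Adjusted predictions for each level of heavy, posted as estimation results
margins heavy, post

* Difference between the two adjusted predictions, with SE and 95% CI
lincom _b[1.heavy] - _b[0.heavy]
```

Because both predictions come from the same fit, the CI reflects the model's covariate adjustment, which a formula based only on two proportions and two sample sizes cannot do.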

Roman
Bruce Weaver

#5

20 Jan 2022, 06:53

In #1, Joanna Davies wrote:

The proportions are derived and saved from separate margins models, so they are predicted proportions adjusted for covariates. Sample a and sample b are independent.

I want to calculate confidence intervals for var5. I can do this in Excel with a formula, but is there a way to do this in Stata?

Hello Joanna. What equation are you using (or would you use) in Excel? I am wondering how (or if) it takes into account the adjustment for covariates that you described. Thanks for clarifying.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Joanna Davies

#6

20 Jan 2022, 08:46

Hi Roman and Bruce, here is the code:

Model 1

Code:

poisson y x1 x2 x3 if x4=1, vce(robust)
margins v1, atmeans over(v4) saving("`tmp'")

Model 2

Code:

poisson y x1 x2 x3 if x4=1, vce(robust)
margins v1, atmeans over(v4) saving("`tmp'")

Then I just merge the two files that contain the margins predicted probability estimates, i.e. the predicted adjusted percentages. The first model is for 2019 and the second for 2020. I want to produce a bar graph that shows the difference (the increase) between 2019 and 2020 for each category of v1.

I don't think that lincom can help, but maybe I'm missing something.

Bruce, the Excel formula would just use the sample size and the predicted %, similar to an online calculator (https://www.medcalc.org/calc/compari...roportions.php). It wouldn't take account of the adjustments. I just want the difference between the two adjusted proportions (which I can calculate easily), but I also want the CI, which is the bit I'm struggling to do in Stata without using the immediate approach that Carlo suggested. I basically want to use the immediate command 'prtesti', but apply it to several rows of data without manually typing in the values. I thought maybe a loop could help, but I'm stuck.

thanks

Andrew Musau

#7

20 Jan 2022, 09:17

A loop may look something like this.

Code:

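clear
input obs1 p1 obs2 p2
100 .10 100 .15
100 .09 300 .30
90 .10 20 .25
200 .01 200 .22
end

* placeholders for the difference and its confidence limits
gen diff=.
gen ll=.
gen ul=.
* run prtesti once per row, passing that row's sample sizes and proportions
forval i=1/`=_N'{
    prtesti `=obs1[`i']' `=p1[`i']' `=obs2[`i']' `=p2[`i']'
    * copy the results that prtesti leaves in r() into row `i'
    replace diff=  r(P_diff) in `i'
    replace ll=   r(lb_diff) in `i'
    replace ul=   r(ub_diff) in `i'
}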

Res.:

Code:

. l

     +--------------------------------------------------------+
     | obs1    p1   obs2    p2   diff          ll          ul |
     |--------------------------------------------------------|
  1. |  100    .1    100   .15   -.05   -.1414066    .0414066 |
  2. |  100   .09    300    .3   -.21   -.2863883   -.1336117 |
  3. |   90    .1     20   .25   -.15   -.3496375    .0496375 |
  4. |  200   .01    200   .22   -.21   -.2690434   -.1509566 |
     +--------------------------------------------------------+
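For many rows, the same interval can also be computed without a loop, since it only needs the unpooled standard error sqrt(p1(1-p1)/n1 + p2(1-p2)/n2). The following is a sketch under that assumption, reusing the toy data above; the `se` variable and the file name `prdiff_results` are placeholders introduced here, not anything from the thread.

```stata
clear
input obs1 p1 obs2 p2
100 .10 100 .15
100 .09 300 .30
90 .10 20 .25
200 .01 200 .22
end

* Difference and its unpooled (Wald) standard error, for all rows at once
generate double diff = p1 - p2
generate double se   = sqrt(p1*(1 - p1)/obs1 + p2*(1 - p2)/obs2)

* 95% limits; invnormal(0.975) is the usual 1.96 multiplier
generate double ll = diff - invnormal(0.975)*se
generate double ul = diff + invnormal(0.975)*se

list, noobs

* Keep the results for plotting later (placeholder file name)
save prdiff_results, replace
```

The limits agree with the looped -prtesti- results listed above. The saved file can then be opened for plotting, for example with -twoway bar- for the differences and -twoway rcap- for the limits.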


Joanna Davies

#8

20 Jan 2022, 09:35

Thank you Andrew! This looks like just what I need. Frustratingly, I can't try it until tomorrow afternoon. I'll let you know how I get on.
Roman Mostazir

#9

20 Jan 2022, 10:33

Re #6: I could not follow the commands you provided. v1 is not listed as a covariate in your regression command, so how would -margins- recognize it? You are also using the -if- qualifier incorrectly (a single = rather than ==), and Stata should throw an error message. Please read the forum policies on providing code and data examples.

Last edited by Roman Mostazir; 20 Jan 2022, 10:35.

Roman
Mike Lacy

#10

20 Jan 2022, 13:39

If I understand the question correctly, I see a distinct problem: The default values predicted by -margins- for -poisson- are documented as the "number of events," not "predicted probability estimates." You can get predicted probabilities in this context with the -pr(n)- option, where n is the particular number of events of interest. That issue aside, the formulae used by -prtest- (or those on the webpage mentioned, or in Excel) do not apply at all, as they presume a simple count of the number of successes, rather than the non-linear model, with covariates, involved in -poisson-. Unless I'm misunderstanding something, we're definitely going in the wrong direction here.

I think a different -poisson- model is needed, and a different -margins- command, but exactly what those might be would require seeing some example data.
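To illustrate the distinction between the two kinds of prediction, here is a small sketch using the auto data as a stand-in (it is not Joanna's data, and the model is chosen only for illustration). After -poisson-, -margins- with no options averages the default prediction, the expected number of events, whereas predict(pr(1)) averages the Poisson probability of exactly one event.

```stata
* Stand-in data and model, for illustration only
sysuse auto, clear
poisson foreign mpg, vce(robust)

* Default prediction: expected number of events (predict's n statistic)
margins

* Probability of exactly one event under the fitted Poisson distribution
margins, predict(pr(1))
```

The two results generally differ, because for a Poisson mean mu the probability of exactly one event is mu*exp(-mu) rather than mu itself.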
Bruce Weaver

#11

20 Jan 2022, 16:08

Adding to what Mike wrote in #10, if you want to report risk differences using a Poisson model, I think you would need to use -glm- with a Poisson distribution and an identity (rather than log) link. But I don't think the pr(n) option (for -margins-) that Mike mentioned is available following -glm-. So it could be a bit tricky. HTH.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Joanna Davies

#12

24 Jan 2022, 04:52

Hi all, thanks for your responses. Some updates and replies from me below:

Andrew, thank you. The loop worked and gave me just what I was after, and now I know how to use immediate commands with a loop!

Roman, sorry, there was an error in my code in #6: v1 should say x1, and model 2 should say if x4==2.

Mike and Bruce, I think what you are getting at is that (i) a simple comparison of proportions using prtest may not be appropriate when the proportions are derived from a model, and (ii) margins after poisson may not be giving me what I want. To give a bit more context, I want the proportion of deaths at home (versus deaths in hospital) for 2019 (model 1) and 2020 (model 2), stratified by sex and deprivation quintile, adjusted for age. I think margins after poisson gives me, in effect, the predicted adjusted proportion. Am I wrong?

Regarding the appropriateness of comparing the adjusted % for 2019 and 2020: the comparison is for descriptive purposes only, but I do take on board the problem you flag, Mike.
Bruce Weaver

#13

24 Jan 2022, 06:38

Hi Joanna. Are you able to provide a sample dataset (via -dataex-) showing exactly what data you do have, and more importantly what you do not have? I am wondering, for example if you have the SEs from the -margins- output you mentioned. If you have those SEs, and if the 2019 and 2020 estimates are independent, you can compute the SE of the difference.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Joanna Davies

#14

24 Jan 2022, 06:51

Hi Bruce,

There are restrictions on the data I'm using, so I can't post any real data. The data I have is all from the margins saving temp file, so I do have the SE of the margin estimate, as well as the p-value and CI. For descriptive purposes I'm leaning more towards just presenting the difference between the proportions and leaving out the CI; we may do something more analytical further down the line. But thank you for the helpful discussions.
Bruce Weaver

#15

24 Jan 2022, 07:12

If you have the two SEs, and if the two point estimates are independent, the SE of the difference = SQRT(SE₁² + SE₂²).
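As a sketch of how that could be applied to a merged file of saved margins, assuming one row per category with the 2019 and 2020 estimates and their SEs side by side: the variable names (`m2019`, `se2019`, `m2020`, `se2020`) and the numbers below are made up for illustration and are not Joanna's results.

```stata
clear
* Made-up margins and standard errors, for illustration only
input str8 group m2019 se2019 m2020 se2020
"Q1" .25 .010 .30 .012
"Q2" .22 .011 .29 .010
end

* Difference and its SE, valid only if the two estimates are independent
generate double diff    = m2020 - m2019
generate double se_diff = sqrt(se2019^2 + se2020^2)

* Normal-approximation 95% limits for the difference
generate double ll = diff - invnormal(0.975)*se_diff
generate double ul = diff + invnormal(0.975)*se_diff

list, noobs
```

Unlike the -prtesti- calculation, this uses the model-based SEs from -margins-, so the interval carries the covariate adjustment through to the difference.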

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)