  • calculating the difference between two proportions (independent samples) and generating confidence intervals, from aggregate data

    Hello,

    I have aggregate data for men and women in different social class categories (10 lines of data in total):
    var1: proportion a
    var2: sample size a
    var3: proportion b
    var4: sample size b
    var5: proportion a - proportion b
    The proportions are derived and saved from separate margins models, so they are predicted proportions adjusted for covariates. Sample a and sample b are independent.

    I want to calculate confidence intervals for var5. I can do this in Excel with a formula, but is there a way to do this in Stata?

    Thank you.

  • #2
    Joanna:
    the following toy example springs to mind:
    Code:
    . prtesti 100 .10 100 .15
    
    Two-sample test of proportions                     x: Number of obs =      100
                                                       y: Number of obs =      100
    ------------------------------------------------------------------------------
                 |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
               x |         .1        .03                      .0412011    .1587989
               y |        .15   .0357071                      .0800153    .2199847
    -------------+----------------------------------------------------------------
            diff |       -.05   .0466369                     -.1414066    .0414066
                 |  under H0:   .0467707    -1.07   0.285
    ------------------------------------------------------------------------------
            diff = prop(x) - prop(y)                                  z =  -1.0690
        H0: diff = 0
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(Z < z) = 0.1425         Pr(|Z| > |z|) = 0.2850          Pr(Z > z) = 0.8575
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)
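For reference, the interval in the diff row is the standard unpooled (Wald) interval for two independent proportions, while the "under H0" row uses the pooled proportion for the z test. This is the textbook formula, written here with x and y as in the output above; it is not taken from the Stata manual, but it reproduces the numbers shown.

```latex
\hat\Delta = \hat p_x - \hat p_y, \qquad
\mathrm{SE}(\hat\Delta) = \sqrt{\frac{\hat p_x(1-\hat p_x)}{n_x} + \frac{\hat p_y(1-\hat p_y)}{n_y}}, \qquad
\text{95\% CI: } \hat\Delta \pm z_{0.975}\,\mathrm{SE}(\hat\Delta)

\bar p = \frac{n_x\hat p_x + n_y\hat p_y}{n_x + n_y}, \qquad
\mathrm{SE}_0 = \sqrt{\bar p(1-\bar p)\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}, \qquad
z = \hat\Delta / \mathrm{SE}_0
```

With .10 and .15 on 100 observations each: SE = sqrt(.0009 + .001275) = sqrt(.002175), about .0466369, so -.05 +/- 1.959964 x .0466369 gives (-.1414066, .0414066). The pooled proportion is .125, giving SE0 of about .0467707 and z of about -1.0690, as in the output.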



    • #3
      Thanks Carlo,

      I did wonder whether immediate commands could help me, but I have many percentages to compare (not only the 10 lines I mention above; I have to repeat this for multiple aggregate tables), so I don't want to type each one manually. Any ideas how to do this for the whole dataset and export or save the results to plot later?



      • #4
        Originally posted by Joanna Davies
        The proportions are derived and saved from separate margins models, so they are predicted proportions adjusted for covariates. Sample a and sample b are independent. I want to calculate confidence intervals for var5. I can do this in excel with a formula but is there a way to do this in stata?
        I think you should show the full commands that you used to derive the -margins-. I assume that by 'separate margins models' you mean Stata's -margins- suite following some sort of regression command. If that is true, have you tried the -lincom- command, which can be used to derive differences with 95% CIs after a regression?
        Roman



        • #5
          In #1, Joanna Davies wrote:

          The proportions are derived and saved from separate margins models, so they are predicted proportions adjusted for covariates. Sample a and sample b are independent.

          I want to calculate confidence intervals for var5. I can do this in excel with a formula but is there a way to do this in stata?

          Hello Joanna. What equation are you using (or would you use) in Excel? I am wondering how (or if) it takes into account the adjustment for covariates that you described. Thanks for clarifying.
          --
          Bruce Weaver
          Version: Stata/MP 18.5 (Windows)



          • #6
            Hi Roman and Bruce, here is the code:

            Model 1
            Code:
            poisson y x1 x2 x3 if x4=1, vce(robust)
            margins v1, atmeans over(v4) saving("`tmp'")
            Model 2
            Code:
            poisson y x1 x2 x3 if x4=1, vce(robust)
            margins v1, atmeans over(v4) saving("`tmp'")
            Then I just merge the two files that contain the -margins- predicted probability estimates, i.e., the predicted adjusted percentages. The first model is for 2019 and the second for 2020. I want to produce a bar graph that shows the difference (or increase) between 2019 and 2020 for each category of v1.

            I don't think -lincom- can help, but maybe I'm missing something.

            Bruce, the Excel formula would just use the sample sizes and the predicted percentages, similar to an online calculator (https://www.medcalc.org/calc/compari...roportions.php). It wouldn't take account of the adjustments. I just want the difference between the two adjusted proportions (which I can calculate easily), but I also want the CI, which is the bit I'm struggling to do in Stata without the immediate approach Carlo suggested. I basically want to use the immediate command -prtesti-, but apply it to several rows of data without manually typing in the values. I thought a loop might help, but I'm stuck.

            Thanks



            • #7
              A loop may look something like this.

              Code:
              clear
              input obs1 p1 obs2 p2
              100 .10 100 .15
              100 .09 300 .30
              90 .10 20 .25
              200 .01 200 .22
              end
              
              gen diff=.
              gen ll=.
              gen ul=.
              forval i=1/`=_N'{
                  prtesti `=obs1[`i']' `=p1[`i']' `=obs2[`i']' `=p2[`i']'
                  replace diff=  r(P_diff) in `i'
                  replace ll=   r(lb_diff) in `i'
                  replace ul=   r(ub_diff) in `i'
              }
              Result:

              Code:
              . l
              
                   +--------------------------------------------------------+
                   | obs1    p1   obs2    p2   diff          ll          ul |
                   |--------------------------------------------------------|
                1. |  100    .1    100   .15   -.05   -.1414066    .0414066 |
                2. |  100   .09    300    .3   -.21   -.2863883   -.1336117 |
                3. |   90    .1     20   .25   -.15   -.3496375    .0496375 |
                4. |  200   .01    200   .22   -.21   -.2690434   -.1509566 |
                   +--------------------------------------------------------+
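Because -prtesti- builds the confidence interval for the difference from the unpooled standard error, the same limits can also be computed for every row at once with -generate-, without a loop. A minimal sketch on Andrew's toy data (the variable names obs1, p1, obs2, p2 are his); it should reproduce the diff, ll and ul columns listed above, and is offered as an alternative to the -prtesti- loop rather than a replacement for it.

```stata
* Toy data from the loop example above
clear
input obs1 p1 obs2 p2
100 .10 100 .15
100 .09 300 .30
 90 .10  20 .25
200 .01 200 .22
end

* difference in proportions (first minus second, as in -prtesti-)
generate double diff = p1 - p2
* unpooled standard error of the difference
generate double se_diff = sqrt(p1*(1 - p1)/obs1 + p2*(1 - p2)/obs2)
* 95% normal-approximation limits; invnormal(0.975) is about 1.959964
generate double ll = diff - invnormal(0.975)*se_diff
generate double ul = diff + invnormal(0.975)*se_diff
list, noobs
```

The resulting variables can be saved with -save- or plotted directly, which is what #3 asks for.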



              • #8
                Thank you, Andrew! This looks like just what I need. Frustratingly, I can't try it until tomorrow afternoon; I'll let you know how I get on.



                • #9
                  Re #6: I don't understand the commands you provided. v1 is not listed as a covariate in your regression command, so how would -margins- recognize it as one? You are also using the -if- qualifier incorrectly (x4=1 rather than x4==1), and Stata should throw an error message. Please read the forum policies on providing code and data examples.
                  Last edited by Roman Mostazir; 20 Jan 2022, 11:35.
                  Roman



                  • #10
                    If I understand the question correctly, I see a distinct problem: the default values predicted by -margins- after -poisson- are documented as the "number of events," not "predicted probability estimates." You can get predicted probabilities in this context with the -pr(n)- option, where n is the particular number of events of interest. That issue aside, the formulae used by -prtest- (or those on the webpage mentioned, or in Excel) do not apply at all, as they presume a simple count of the number of successes, rather than the nonlinear model, with covariates, involved in -poisson-. Unless I'm misunderstanding something, we're definitely going in the wrong direction here.

                    I think a different -poisson- model is needed, and a different -margins- command, but exactly what those might be would require seeing some example data.



                    • #11
                      Adding to what Mike wrote in #10, if you want to report risk differences using a Poisson model, I think you would need to use -glm- with a Poisson distribution and an identity (rather than log) link. But I don't think the pr(n) option (for -margins-) that Mike mentioned is available following -glm-. So it could be a bit tricky. HTH.
                      --
                      Bruce Weaver
                      Version: Stata/MP 18.5 (Windows)
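A minimal sketch of the -glm- route Bruce describes, run on simulated data because the real data cannot be shared. The variable names (y, group, age), the seed and the effect sizes are all hypothetical. The use of -margins r.group- (a contrast operator) to get the adjusted difference with its CI is my choice of tool, not something suggested in this thread.

```stata
* Hypothetical simulated data: binary outcome y, a two-level group,
* and one continuous covariate (age)
clear
set seed 2022
set obs 2000
generate byte group = runiform() < 0.5
generate double age = rnormal(75, 8)
* true risk is linear in group and age and stays well inside (0, 1)
generate byte y = runiform() < 0.30 + 0.10*group + 0.005*(age - 75)

* Poisson family with an identity link: coefficients are differences in
* risk rather than log rate ratios; robust SEs because y is binary
glm y i.group age, family(poisson) link(identity) vce(robust)

* adjusted proportion in each group
margins group
* difference between the two adjusted proportions, with its 95% CI
margins r.group
```

With an identity link the fitted risks are not constrained to (0, 1), so convergence can fail when some predictions approach 0; that is one way in which this route "could be a bit tricky."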



                      • #12
                        Hi all, thanks for your responses. Some updates and replies from me below:

                        Andrew, thank you: the loop worked and gave me just what I was after, and now I know how to use immediate commands with a loop!

                        Roman, sorry, there were errors in my code in #6: v1 should say x1, and model 2 should say if x4==2.

                        Mike and Bruce, I think what you are getting at is that (i) a simple comparison of proportions using -prtest- may not be appropriate when the proportions are derived from a model, and (ii) -margins- after -poisson- may not be giving me what I want. To give a bit more context, I want the proportion of deaths at home (vs. deaths in hospital) for 2019 (model 1) and 2020 (model 2), stratified by sex and deprivation quintile, adjusted for age. I think -margins- after -poisson- gives me, in effect, the predicted adjusted proportion. Am I wrong?

                        Regarding the appropriateness of comparing the adjusted percentages for 2019 and 2020: the comparison is for descriptive purposes only, but I do take on board the problem you flag, Mike.
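Putting the corrections above together (x1 in place of v1, and x4==2 for the 2020 model), a sketch of the workflow in #6 might save each year's -margins- results to its own tempfile, since reusing one `tmp' file would overwrite the 2019 results, and then merge them. Everything here is an assumption for illustration: the data are simulated, x1 stands for deprivation quintile and v4 for sex, x2 for age, and the only covariate is age. I expect -margins, saving()- to store the x1 level in _m1, the over() level in _by1, and the estimate and SE in _margin and _se_margin, but -describe- the saved file to confirm those names.

```stata
* Hypothetical simulated data: x4 = 1 (2019) or 2 (2020), x1 = deprivation
* quintile (1-5), v4 = sex (0/1), x2 = age, y = 1 if death at home
clear
set seed 12345
set obs 4000
generate byte x4 = 1 + (_n > 2000)
generate byte x1 = runiformint(1, 5)
generate byte v4 = runiform() < 0.5
generate double x2 = rnormal(80, 8)
generate byte y = runiform() < 0.20 + 0.05*(x4 == 2) + 0.02*x1

* one model per year; x1 enters as a factor so that -margins x1- is allowed,
* and each year's margins are saved to a separate tempfile
tempfile m2019 m2020
poisson y i.x1 x2 if x4 == 1, vce(robust)
margins x1, atmeans over(v4) saving("`m2019'")
poisson y i.x1 x2 if x4 == 2, vce(robust)
margins x1, atmeans over(v4) saving("`m2020'")

* keep the estimate and SE from each file and rename them by year
use "`m2019'", clear
keep _m1 _by1 _margin _se_margin
rename (_margin _se_margin) (p2019 se2019)
merge 1:1 _m1 _by1 using "`m2020'", keepusing(_margin _se_margin) nogenerate
rename (_margin _se_margin) (p2020 se2020)
list, noobs
```

With five quintiles and two sexes this gives the 10 rows described in #1, one per x1-by-v4 cell.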






                        • #13
                          Hi Joanna. Are you able to provide a sample dataset (via -dataex-) showing exactly what data you do have, and more importantly what you do not have? I am wondering, for example if you have the SEs from the -margins- output you mentioned. If you have those SEs, and if the 2019 and 2020 estimates are independent, you can compute the SE of the difference.
                          --
                          Bruce Weaver
                          Version: Stata/MP 18.5 (Windows)



                          • #14
                            Hi Bruce,

                            There are restrictions on the data I'm using, so I can't post any real data. The data I have all come from the -margins- saving() temp file, so I do have the SE of each margin estimate, as well as the p-value and CI. For descriptive purposes I'm leaning towards just presenting the difference between the proportions and leaving out the CI; we may do something more analytical further down the line. But thank you for the helpful discussion.



                            • #15
                              If you have the two SEs, and if the two point estimates are independent, the SE of the difference = SQRT(SE1^2 + SE2^2).
                              --
                              Bruce Weaver
                              Version: Stata/MP 18.5 (Windows)
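Applied row by row to merged -margins- output, Bruce's formula might look like the sketch below. The numbers are toy values, not real results, and the variable names p2019, se2019, p2020 and se2020 are hypothetical stand-ins for the adjusted proportion and its SE from each year. The interval uses a normal approximation and assumes, as Bruce says, that the two years' estimates are independent.

```stata
* Toy values only: one row per category, with each year's adjusted
* proportion and its SE (hypothetical variable names)
clear
input p2019 se2019 p2020 se2020
.20 .010 .26 .012
.25 .015 .28 .011
end

* difference (2020 minus 2019) and its SE, assuming independent estimates
generate double diff    = p2020 - p2019
generate double se_diff = sqrt(se2019^2 + se2020^2)
* normal-approximation 95% limits
generate double ll = diff - invnormal(0.975)*se_diff
generate double ul = diff + invnormal(0.975)*se_diff
list diff se_diff ll ul, noobs
```

For the planned bar graph, the ll and ul variables could be drawn as capped spikes with -twoway rcap-.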

