Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sample size calculation of proportions with 4 independent groups

    Hello all,

    I am trying to calculate (as a favour to a friend) sample size for a project with 4 independent groups. The outcome is a proportion.

    From the earlier work the proportions in the four groups would be:

    p1 = 0.27
    p2 = 0.34
    p3 = 0.28
    p4 = 0. 32

    using
    Code:
    power twoportions
    is only useful for two proportions and despite searching I can't find a way to generalize to 4 groups (as one would with anova if these were means).

    I don't know if there is a more general way to do this or to use a Bonferonni correction.

    Many thanks for any assistance anyone can provide.

    Sincerely,

    Chris

  • #2
    Hi, Chris.

    I would approach this power calculation using Monte Carlo simulation. Hopefully someone can suggest a bit of code used in previous projects. You will need to correct the alpha level, but this may depend on the hypothesis to be tested. Power can be calculated assuming different hypotheses in your case.

    All the best,

    Tiago
    Last edited by Tiago Pereira; 07 Sep 2024, 15:10.

    Comment


    • #3
      +1 to Tiago's suggestion. Can you clarify whether the same individual could have a value of 1 in more than one group or does each individual contribute a 0/1 to just a single group?

      Comment


      • #4
        Originally posted by Erik Ruzek View Post
        +1 to Tiago's suggestion. Can you clarify whether the same individual could have a value of 1 in more than one group or does each individual contribute a 0/1 to just a single group?
        Each individual can be in only one group. The groups are completely separate and independent.

        Any help would be appreciated. Coding Monte Carlo simulation in Stata might be a bit beyond me.

        Comment


        • #5
          It is possible that somebody here will take the time to show you how to code a Monte Carlo simulation for this problem. Or perhaps not. Unfortunately Stata's official power and sample size commands do not deal with this situation, nor do any of the user-written commands that I'm aware of. And, I'm also not aware of any other freeware sample-size calculators that handle it. (GPower, for example, does not.)

          There are more advanced power and sample size programs out there that do cover this, and many other situations that Stata and freeware calculators do not. These tend to be pretty expensive. Mostly these programs are purchased by professional statisticians who face this kind of problem with sufficient frequency to warrant the investment. If you do not get a taker on writing the Monte Carlo simulation, I suggest you ask around in your own institution to see if there is somebody who can do this for you. If they have advanced sample size software it would only take them a few minutes to enter the parameters of your problem into the software and get an almost instantaneous answer. If you are at an academic research institution, or any other organization that funds some or all of their research with grants, there are surely such people there: you can't get a grant funded without power/sample size calculations, and there are going to be at least some people using complicated designs that need this.

          By the way, whether you end up doing a Monte Carlo simulation, or seeking an answer from a nearby statistician with advanced power/sample size software, you absolutely must provide some key information that was left out of #1: are the four groups going to contain equal numbers of participants? If not, what are the ratios of the sizes of the four groups? What significance level will be used: .05 is the thoughtless conventional one, but any number between 0 and 1 is possible, and often 0.05 is simply not a reasonable choice for this. Finally, what level of statistical power is desired? Conventional values are 0.8 and 0.9, but, again, there may be good reasons to use a different value.

          Comment


          • #6
            Thanks Clyde. I was trying to help out a colleague but I will punt the problem back to them and tell them that they will have to seek out someone else to do their sample size calculation for them.

            Best wishes,

            Chris

            Comment


            • #7
              Chuck Huber (StataCorp) has posted 4 items on Monte Carlo simulations for power analysis to the Stata Blog. Meanwhile, you wrote in #1:
              From the earlier work the proportions in the four groups would be:

              p1 = 0.27
              p2 = 0.34
              p3 = 0.28
              p4 = 0. 32
              If you mean that those are the proportions that were observed in a previous study, I would caution you that sometimes, the smallest effect size of interest may not match exactly the observed effect size from an earlier study.

              Finally, those 4 proportions do not vary a whole lot, so I think you'll need a very large sample size. Unless I missed it, you did not specify whether you intend the group sizes to be the same. I assumed you did, and plugged those proportions into PASS 2021. The output is in the attached PDF. I hope this helps.

              Cheers,
              Bruce

              Chris Labos PASS Output.pdf
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 18.5 (Windows)

              Comment


              • #8
                Originally posted by Chris Labos View Post
                I was trying to help out a colleague but I will punt the problem back to them and tell them that they will have to seek out someone else to do their sample size calculation for them.
                Here.
                Code:
                version 18.0
                
                clear *
                
                // seedem
                set seed 139232804
                
                program define simEm, rclass
                    version 18.0
                    syntax , Means(name) [n(integer 100) Alpha(real 0.05)]
                
                    drop _all
                    set obs `=colsof(`means')'
                    generate byte grp = _n - 1
                
                    generate double pro = `means'[1, 1]
                    forvalues i = 2/`=_N' {
                        replace pro = `means'[1, `i'] in `i'
                    }
                
                    generate int trials = `n'
                    generate int successes = rbinomial(trials, pro)
                
                    glm successes i.grp, family(binomial trials) link(logit)
                    testparm i.grp
                    return scalar pos = r(p) < `alpha'
                end
                
                tempname Means
                matrix input `Means' = (0.27 0.34 0.28 0.32)
                
                power twoproportions 0.27 0.34
                local from = round(r(init) - 25, 25)
                local to = round(r(init) + 275, 25)
                
                frame create ForGraph int n double p
                
                forvalues n = `from'(50)`to' {
                    quietly simulate pos = r(pos), reps(3000): simEm , m(`Means') n(`n')
                    summarize pos, meanonly
                    display in smcl as text _newline(1) ///
                        "Sample size (per group) = `n' Power = " as result %04.2f r(mean)
                    frame post ForGraph (`n') (r(mean))
                }
                
                frame ForGraph: graph twoway connected p n, sort scheme(s2color) ///
                    lcolor(black) mcolor(black) mfcolor(white) msize(medium) ///
                    ytitle(Power) ylabel( , format(%04.2f) angle(horizontal) nogrid) ///
                    xtitle(Sample size per group)
                quietly graph export Pow4prop.png
                
                exit
                Complete do-file and log-file attached if you want to customize it, for example, for different levels of Type I error rate, group-size ratio and the like.

                Click image for larger version

Name:	Pow4prop.png
Views:	1
Size:	23.6 KB
ID:	1763343
                Attached Files

                Comment


                • #9
                  Bruce and Joseph have nailed it. Here's a slightly different approach that yields similar numbers. My results differ likely due to the randomness by which I assign members to groups in the gen group = runiform() line and then I introduce a little more randomness when generating the outcome variable. To the extent that one wanted to eliminate those sources of randomness in the simulation, they could do so by altering the code:
                  Code:
                  capture program drop simprop
                  program simprop, rclass
                      version 16.1
                      // Define input parameters and default values
                       syntax, n(integer)          ///  Sample size - required, all optional in []
                            [ alpha(real 0.05)    ///  Alpha level, default is .05
                              p1(real .27)        ///  Proportion in grp 1
                              p2(real .35)        ///  Proportion in grp 2
                              p3(real .29)         ///  Proportion in grp 3
                              p4(real .33)]        //  Proportion in grp 4
                      
                      //    Sample size initiation
                      set obs `n'
                      
                      // Generate approximately equal-sized groups
                      gen group = runiform()
                      replace group = 1 if group < 0.25  // 25% of observations in group 1
                      replace group = 2 if group >= 0.25 & group < 0.5  // 25% in group 2
                      replace group = 3 if group >= 0.5 & group < 0.75  // 25% in group 3
                      replace group = 4 if group >= 0.75 & group < 1  // 25% in group 4
                  
                      // Generate binary outcome with different proportions for each group
                      gen outcome = .
                      replace outcome = 1 if group == 1 & runiform() <= `p1'  
                      replace outcome = 1 if group == 2 & runiform() <= `p2'  
                      replace outcome = 1 if group == 3 & runiform() <= `p3'  
                      replace outcome = 1 if group == 4 & runiform() <= `p4'  
                      replace outcome = 0 if outcome == .
                  
                      // Test null hypothesis
                      tab outcome group, chi2
                      clear
                      
                      // Return results
                      return scalar reject = (r(p)<`alpha')    
                  end
                  
                  ** Run a single iteration
                  simprop, n(2000) alpha(.05)
                  return list
                  
                  
                  ** Run 500 simulations (reps(500)) for a given range of sample sizes - adjust accordingly
                  foreach samp_size of numlist 2000(100)2800 {
                      qui simulate reject=r(reject), reps(500) seed(12345):              ///
                          simprop, n(`samp_size') alpha(0.05)         // input parameters
                          di "sample size = "`samp_size'
                          sum reject
                           }
                  Last edited by Erik Ruzek; 09 Sep 2024, 14:34. Reason: Note about additional source of randomness; edited code

                  Comment


                  • #10
                    Originally posted by Erik Ruzek View Post
                    Bruce and Joseph have nailed it. Here's a slightly different approach that yields similar numbers. My results differ likely due to the randomness by which I assign members to groups in the gen group = runiform() line and then I introduce a little more randomness when generating the outcome variable. To the extent that one wanted to eliminate those sources of randomness in the simulation, they could do so by altering the code:
                    Hello Erik,

                    Thank you for the help back in September. I had one follow-up question on your code. I was trying to use it in a related but different project for the same friend. This one had different results and therefore a smaller sample size but when I tried to modify the code I kept getting the following error:

                    observation number out of range
                    Observation number must be between 500 and 2,147,483,619. (Observation numbers
                    are typed without commas.)
                    an error occurred when simulate executed simprop
                    r(198);
                    I realize it's something to do with the number of observations but I am at a loss as to why it is happening.

                    Thanks for the help,

                    Chris

                    Code:
                    capture program drop simprop
                    program simprop, rclass
                        version 16.1
                        // Define input parameters and default values
                         syntax, n(integer)          ///  Sample size - required, all optional in []
                              [ alpha(real 0.05)    ///  Alpha level, default is .05
                                p1(real .795)        ///  Proportion in grp 1
                                p2(real .3137)        ///  Proportion in grp 2
                                p3(real .5356)         ///  Proportion in grp 3
                                p4(real .0791)]        //  Proportion in grp 4
                        
                        //    Sample size initiation
                        set obs `n'
                        
                        // Generate approximately equal-sized groups
                        gen group = runiform()
                        replace group = 1 if group < 0.25  // 25% of observations in group 1
                        replace group = 2 if group >= 0.25 & group < 0.5  // 25% in group 2
                        replace group = 3 if group >= 0.5 & group < 0.75  // 25% in group 3
                        replace group = 4 if group >= 0.75 & group < 1  // 25% in group 4
                    
                        // Generate binary outcome with different proportions for each group
                        gen outcome = .
                        replace outcome = 1 if group == 1 & runiform() <= `p1'  
                        replace outcome = 1 if group == 2 & runiform() <= `p2'  
                        replace outcome = 1 if group == 3 & runiform() <= `p3'  
                        replace outcome = 1 if group == 4 & runiform() <= `p4'  
                        replace outcome = 0 if outcome == .
                    
                        // Test null hypothesis
                        tab outcome group, chi2
                        clear
                        
                        // Return results
                        return scalar reject = (r(p)<`alpha')    
                    end
                    
                    ** Run a single iteration
                    simprop, n(25) alpha(.05)
                    return list
                    
                    
                    ** Run 500 simulations (reps(500)) for a given range of sample sizes - adjust accordingly
                    foreach samp_size of numlist 25 30 35 40 {
                        qui simulate reject=r(reject), reps(500) seed(12345):              ///
                            simprop, n(`samp_size') alpha(0.05)         // input parameters
                            di "sample size = "`samp_size'
                            sum reject
                             }

                    Comment


                    • #11
                      Just throw in a
                      Code:
                      drop _all
                      before the line with
                      Code:
                      set obs `n'
                      and I think that should work.

                      Comment


                      • #12
                        Yup, that did it. Thanks!!

                        Comment

                        Working...
                        X