  • Statistical power to see a difference across two or three groups

    Hi Stata folks,

    I've asked this question before (https://www.statalist.org/forums/for...nd-sample-size) and am still struggling to find the right Stata code. I find power and sample size calculations hard to understand in Stata. I'm trying to estimate the power to detect a difference between groups, given a fixed sample size, for a proposal we're writing.

    I have a hypothetical 200-person cohort with a binary outcome (which I'll call "disease", yes/no) that affects 50% of the cohort (i.e., 100 are "disease=yes" and 100 are "disease=no"). I have another binary variable in the cohort which, for the sake of argument, we'll call age. I'll assume the age groups are similar in size (i.e., 100 young and 100 old). What code would you recommend to find out what statistical power we have to detect a difference in the yes/no outcome when comparing young to old?

    For the sake of argument, what's our power if the difference in our data is 10 percentage points (i.e., 45% of youngsters have the disease, while 55% of elders have the disease)?

    How could I graph the power, if we assume the difference is 20%, 15%, 10%, or 5%?

    How might this change if I set the binary outcome to a prevalence of 40%? 60%?

    Thanks in advance!

  • #2
    Might help get you started.

    Code:
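    clear all

    * 3 starting proportions x 4 differences = 12 rows
    * columns: starting proportion, difference, power
    matrix R = J(12,3,.)
    local i 1
    forv outer = 0.4(0.1)0.6 {
        forv inner = 0.05(0.05)0.20 {
            * proportion in the second group = starting proportion + difference
            local diff = `outer'+`inner'
            qui power twoproportions `outer' `diff', n(200) nratio(1)
            matrix R[`i',1] = `outer'
            matrix R[`i',2] = `inner'
            matrix R[`i',3] = r(power)
            di `i'
            local i = `i'+1
        }
    }
    capture drop R*
    * svmat turns the columns of R into variables R1, R2, R3
    svmat R
    * lgraph is a community-contributed command, not official Stata; install it before running this
    lgraph R3 R2, by(R1)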

    • #3
      Thank you for the fast reply. The coding is a bit beyond my skill level, but if I'm reading the resulting graph correctly, the y axis is power (ranging from about 0.1 up to around 0.8), the x axis is the hypothetical "true" difference between age groups, and the legend shows the overall prevalence of the condition: 40%, 50%, or 60%.

      I was expecting a bigger difference as we allow the overall prevalence to vary, but it seems the power is almost identical for the conditions where we set the prevalence to 40% or 50%, and only improves modestly where we set the prevalence to 60%.

      Can anyone recommend videos or reading material to help me learn about matrices? I would love to be able to follow this code better.

      • #4
        Here, the matrix is just a box to store stuff in. There are other ways to do it, some perhaps better, but I'm a creature of habit until someone here shows me a much better way to do it.

        You could just run the power command repeatedly, changing the values and writing the results down or entering them in Excel. The loop just automates that process and stores the results in the matrix R for later use in graphing.
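
        For instance, a single run for the scenario in the first post (45% of the young group versus 55% of the old group, 200 people in total) might look like the line below. Treat it as a sketch: it relies on the defaults (two-sided test at the 5% level, equal group sizes) and on n() being the total sample size, so check help power twoproportions against your own design.

        Code:
        * one scenario: 45% versus 55%, 200 people in total, split evenly
        power twoproportions 0.45 0.55, n(200)

        Rerunning it with 0.40 0.60, 0.425 0.575, and so on gives the other scenarios one at a time.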

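        Yes. The graph shows power (y axis) against the assumed difference in proportions (x axis). The three lines correspond to the three starting proportions, 0.4, 0.5, and 0.6. Note that the code treats each of these as the proportion in one group and adds the difference to get the proportion in the other group, so it is not quite the overall prevalence. Those values sit tightly around 0.5, so I wouldn't expect much difference in power between them. Big differences in power will come more from sample size than from the prevalence.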
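
        As a rough illustration of the sample-size point, power accepts a list of values in n() and can draw the graph itself. The sample sizes below (100 to 800 in steps of 100) are made-up values for illustration, not anything from the proposal.

        Code:
        * same 45% versus 55% scenario, at total sample sizes of 100, 200, ..., 800
        power twoproportions 0.45 0.55, n(100(100)800) graph

        The resulting curve shows how quickly power climbs as the cohort grows while the two proportions stay fixed.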

        I think twoproportions is what you want. But check to make sure.



        • #5
          Hi, I'm still trying to wrap my head around the matrix here.

          Above, we have
          matrix R = J(12,3,.) to allow us to look at three levels of prevalence (0.4, 0.5, 0.6) and 4 levels of presumed "true" difference (0.05, 0.10, 0.15, and 0.20)

          How do I change the matrix above if I want to look at four levels of prevalence (e.g., 0.3, 0.4, 0.5, and 0.6)? What about five levels of difference (e.g., 0.05, 0.10, 0.15, 0.20, and 0.25)? I changed matrix R = J(12,3,.) to matrix R = J(16,4,.) and the graph did not show the additional line. Any recommendations on how to change the code to allow flexibility in my assumptions?

          Thanks again for your help.

          • #6
            Code:
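            clear all

            * 4 starting proportions x 5 differences = 20 rows; the number of columns stays at 3
            matrix R = J(20,3,.)
            local i 1
            * the ranges in the two loops must match the row count above
            forv outer = 0.3(0.1)0.6 {
                forv inner = 0.05(0.05)0.25 {
                    * proportion in the second group = starting proportion + difference
                    local diff = `outer'+`inner'
                    qui power twoproportions `outer' `diff', n(200) nratio(1)
                    * round() tidies floating-point noise in the stepped values
                    matrix R[`i',1] = round(`outer',0.01)
                    matrix R[`i',2] = `inner'
                    matrix R[`i',3] = r(power)
                    di `i'
                    local i = `i'+1
                }
            }
            capture drop R*
            svmat R
            replace R1 = round(R1,0.01)
            lgraph R3 R2, by(R1)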
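
            The matrix needs one row per combination of starting proportion and difference, and always three columns, so the row count has to change whenever the loop ranges do. One possible variation that works the row count out for you is sketched below. It assumes all values are whole percents, so the loop counters are exact integers and no rounding is needed. The local names (pmin, pmax, pstep, dmin, dmax, dstep, np, nd) are made up for this example, and R1 and R2 end up in percent rather than as proportions.

            Code:
            clear all

            * starting proportions in percent: 30, 40, 50, 60
            local pmin 30
            local pmax 60
            local pstep 10
            * differences in percentage points: 5, 10, 15, 20, 25
            local dmin 5
            local dmax 25
            local dstep 5

            * rows = (number of starting proportions) x (number of differences)
            local np = (`pmax' - `pmin') / `pstep' + 1
            local nd = (`dmax' - `dmin') / `dstep' + 1
            matrix R = J(`np' * `nd', 3, .)

            local i 1
            forvalues p = `pmin'(`pstep')`pmax' {
                forvalues d = `dmin'(`dstep')`dmax' {
                    * convert percents back to proportions for power
                    local p1 = `p' / 100
                    local p2 = (`p' + `d') / 100
                    quietly power twoproportions `p1' `p2', n(200) nratio(1)
                    matrix R[`i', 1] = `p'
                    matrix R[`i', 2] = `d'
                    matrix R[`i', 3] = r(power)
                    local ++i
                }
            }
            svmat R
            * same graph command as in the code above
            lgraph R3 R2, by(R1)

            To try a different grid, change only the six locals at the top; the matrix size follows from them.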
