Statistical power to see a difference across two or three groups

Matt Price

21 Jun 2024, 13:20

Hi Stata folks,

I've asked this question before and am still struggling to find the right Stata code (https://www.statalist.org/forums/for...nd-sample-size). I find power and sample size calculations hard to understand in Stata. I'm trying to estimate our power to detect a difference across groups, given a fixed sample size, for a proposal we're writing.

I have a hypothetical 200-person cohort with a binary outcome (which I'll call "disease"; it's yes/no) that affects 50% of my cohort (i.e., 100 are "disease=yes" and 100 are "disease=no"). I have another binary variable in my cohort; for the sake of argument, we'll call it age. I'll assume the age groups are similar in size (i.e., 100 young and 100 old). What code would you recommend to see what statistical power we have to detect a difference in the yes/no outcome when comparing young to old?

For the sake of argument, what's our power if, in our data, the difference is 10 percentage points (i.e., 45% of youngsters have the disease, while 55% of elders have the disease)?
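Here is a minimal sketch of that single calculation. It assumes a two-sided test at the default 5% level and 100 people per age group (total n = 200, equal allocation). `power twoproportions` takes the two group proportions directly: 0.45 for the young group and 0.55 for the old group.

```stata
* Power to detect 45% vs 55% disease prevalence with 200 people in total,
* split equally between the two age groups (nratio(1) is the default)
power twoproportions 0.45 0.55, n(200)

* The computed power is also left behind in r(power)
display "power = " %5.3f r(power)
```

At this sample size, a 10-point difference gives power well below the conventional 80% target.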

How could I graph the power if we assume the difference is 20, 15, 10, or 5 percentage points?
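You can produce a graph like this without writing a loop by using the `graph` option of `power` together with a list of differences. The sketch below makes one assumption: the young group is held at 45% and only `diff()` (p2 - p1) varies. Because the baseline is fixed, the overall prevalence drifts upward as the difference grows. See `help power twoproportions` for the exact options.

```stata
* Power for differences of 5, 10, 15, and 20 percentage points,
* holding the young group's proportion at 0.45 and the total n at 200;
* -graph- plots power against the varying difference
power twoproportions 0.45, diff(0.05(0.05)0.20) n(200) graph
```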

How might this change if I set the binary outcome to a prevalence of 40%? 60%?
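One way to hold the overall prevalence fixed while the difference varies is to center the two groups on the prevalence p. This is an assumption about how to frame the question, and it is not the only option. With equal group sizes, p1 = p - d/2 and p2 = p + d/2 average to p. The sketch below loops over prevalences of 0.4, 0.5, and 0.6 and collects the results with `postfile`. The names `results` and `powerdata` are arbitrary.

```stata
clear all
* Collect one row per (prevalence, difference) combination
tempname results
tempfile powerdata
postfile `results' double prev double diff double power using `powerdata'

foreach p in 0.4 0.5 0.6 {                     // overall prevalence
    foreach d in 0.05 0.10 0.15 0.20 {         // difference p2 - p1
        local p1 = `p' - `d'/2                 // young group
        local p2 = `p' + `d'/2                 // old group
        quietly power twoproportions `p1' `p2', n(200)
        post `results' (`p') (`d') (r(power))
    }
}
postclose `results'

use `powerdata', clear
* One panel per overall prevalence; power on the y axis
twoway connected power diff, sort by(prev) ytitle("Power") xtitle("Difference (p2 - p1)")
```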

Thanks in advance!

George Ford

21 Jun 2024, 14:11

Might help get you started.

Code:

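clear all

* 3 values of p1 x 4 differences = 12 rows; 3 columns: p1, difference, power
matrix R = J(12,3,.)
local i 1
forv outer = 0.4(0.1)0.6 {                  // p1: proportion with disease in group 1
    forv inner = 0.05(0.05)0.20 {           // difference p2 - p1
        local diff = `outer'+`inner'        // note: this is p2 itself, not the difference
        qui power twoproportions `outer' `diff', n(200) nratio(1)   // total n = 200, equal groups
        matrix R[`i',1] = `outer'
        matrix R[`i',2] = `inner'
        matrix R[`i',3] = r(power)          // power from the command just run
        di `i'                              // progress counter
        local i = `i'+1                     // move to the next row
    }
}
capture drop R*
svmat R                                     // matrix columns become variables R1, R2, R3
lgraph R3 R2, by(R1)                        // community-contributed command (not official Stata)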
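For comparison, here is a sketch that should produce a similar set of power curves using only official commands. It relies on `power` accepting both a list of baseline proportions and a list of differences. The `graph()` suboptions below (`xdimension()` and `plotdimension()`) reflect my understanding of the documented power graph options; if they don't behave as expected, check `help power, graph`. As in the loop above, the values 0.4, 0.5, and 0.6 are the first group's proportion p1, not the overall prevalence.

```stata
* p1 in 0.4, 0.5, 0.6 crossed with differences of 0.05 to 0.20; total n = 200
* -table- lists every combination; -graph()- draws power vs. difference, one line per p1
power twoproportions (0.4 0.5 0.6), diff(0.05(0.05)0.20) n(200) ///
    table graph(xdimension(diff) plotdimension(p1))
```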

Matt Price

21 Jun 2024, 17:24

Thank you for the fast reply. The code is a bit beyond my skill level, but if I understand it correctly, the y axis of the resulting graph is power (ranging from about 0.1 up to around 0.8), the x axis is the hypothetical "true" difference between age groups, and the legend shows the overall prevalence of the condition: 40%, 50%, or 60%.

I was expecting a bigger difference as we allow the overall prevalence to vary, but it seems the power is almost identical for the conditions where we set the prevalence to 40% or 50%, and only improves modestly where we set the prevalence to 60%.
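There is one detail behind this pattern. In the loop, `outer` (R1 in the graph) is the proportion with disease in the first group, and `outer' + `inner' is the second group's proportion. In a cohort split 50/50, the overall prevalence is therefore the average of the two, not `outer` itself. For the 0.5 line with a 0.10 difference, that works out to 0.55 rather than 0.5. The arithmetic check below uses values from one iteration of the loop:

```stata
* Values taken from one iteration of the loop: outer = 0.5, inner = 0.10
local outer = 0.5
local inner = 0.10
local p2 = `outer' + `inner'              // second group's proportion
* With equal group sizes, overall prevalence is the mean of the two groups
display "overall prevalence = " (`outer' + `p2')/2
```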

Can anyone recommend videos or reading material to help me learn about matrices? I would love to be able to follow this code better.
George Ford

21 Jun 2024, 17:44

Here, the matrix is just a box to store stuff in. There are other ways to do it, some perhaps better, but I'm a creature of habit until someone here shows me a much better way to do it.

You could just run the power command repeatedly, changing the values each time and writing down the results or entering them in Excel. The loop just automates that process and stores the results in the matrix R for later use in graphing.
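The "box" idea works like this: you create a matrix with a fixed number of rows and columns, fill it one cell at a time with `matrix R[row,col] = value`, and then turn its columns into variables with `svmat`. The small example below does by hand, for two rows, what the loop does automatically. The input values are arbitrary.

```stata
clear all
matrix R = J(2,3,.)            // 2 rows x 3 columns, every cell missing (.)
quietly power twoproportions 0.4 0.45, n(200)
matrix R[1,1] = 0.4            // column 1: first group's proportion
matrix R[1,2] = 0.05           // column 2: difference
matrix R[1,3] = r(power)       // column 3: power from the command just run
quietly power twoproportions 0.4 0.50, n(200)
matrix R[2,1] = 0.4
matrix R[2,2] = 0.10
matrix R[2,3] = r(power)
matrix list R                  // inspect the box
svmat R                        // creates variables R1, R2, R3 (one observation per row)
list
```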

Yes. The graph shows the power (y axis) at various differences in proportions (x axis). The three lines correspond to different values of the first group's proportion (0.4, 0.5, and 0.6). That range sits tightly around 0.5, so I wouldn't expect much difference in power between them. Big differences in power will come more from sample size than from the baseline prevalence.
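Two quick checks support that reasoning. First, the variance of a proportion, p(1 - p), is 0.24 at p = 0.4 and 0.25 at p = 0.5, so the baseline matters little in this range. Second, sample size enters the standard error directly. The sketch below varies only the total n for the 45% vs 55% comparison, then asks `power` for the total n needed to reach 80% power. The n values are illustrative.

```stata
* Variance of a proportion at a few baselines
foreach p in 0.4 0.5 0.6 {
    display "p = `p'   p(1-p) = " `p'*(1-`p')
}

* Same 10-point difference, increasing total sample size
power twoproportions 0.45 0.55, n(200 400 800)

* Total sample size needed for 80% power (two-sided, alpha = 0.05)
power twoproportions 0.45 0.55, power(0.8)
```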

I think twoproportions is what you want. But check to make sure.
Matt Price

26 Jun 2024, 17:01

Hi, I'm still trying to wrap my head around the matrix here.

Above, we have matrix R = J(12,3,.), which lets us look at three levels of prevalence (0.4, 0.5, 0.6) and four levels of presumed "true" difference (0.05, 0.10, 0.15, and 0.20).

How do I change the matrix above if I want to look at four levels of prevalence (e.g., 0.3, 0.4, 0.5, and 0.6)? What about five levels of difference (e.g., 0.05, 0.10, 0.15, 0.20, and 0.25)? I changed matrix R = J(12,3,.) to matrix R = J(16,4,.), and the graph did not show the additional line. Any recommendations on how to change the code to allow flexibility in my assumptions?
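To make the loop flexible, list the proportion and difference values once in locals and let Stata count them to size the matrix. The number of rows is (number of p1 values) × (number of differences). The number of columns stays at 3, because each row still stores only p1, the difference, and the power. In this sketch, the local names `p1list` and `dlist` are made up, and the plot uses official `twoway` rather than `lgraph`.

```stata
clear all
* Edit these two lists; everything else adapts
local p1list 0.3 0.4 0.5 0.6
local dlist  0.05 0.10 0.15 0.20 0.25
local np : word count `p1list'
local nd : word count `dlist'
local nrows = `np' * `nd'

matrix R = J(`nrows', 3, .)       // rows = combinations; columns = p1, diff, power
matrix colnames R = p1 diff power
local i 1
foreach p of local p1list {
    foreach d of local dlist {
        local p2 = `p' + `d'
        quietly power twoproportions `p' `p2', n(200)
        matrix R[`i',1] = `p'
        matrix R[`i',2] = `d'
        matrix R[`i',3] = r(power)
        local ++i
    }
}

svmat double R, names(col)        // variables p1, diff, power in full precision

* One line per p1 value, built up in a loop, with matching legend labels
local plots
local labels
local k 0
foreach p of local p1list {
    local ++k
    local plots `plots' (connected power diff if abs(p1 - `p') < 1e-6, sort)
    local labels `"`labels' `k' "p1 = `p'""'
}
twoway `plots', legend(order(`labels')) ytitle("Power") xtitle("Difference (p2 - p1)")
```

Adding a fifth proportion or a sixth difference now only means editing one of the two lists.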

Thanks again for your help.

George Ford

27 Jun 2024, 11:31

Code:

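clear all

* 4 values of p1 x 5 differences = 20 rows; still 3 columns: p1, difference, power
matrix R = J(20,3,.)
local i 1
forv outer = 0.3(0.1)0.6 {                  // p1: proportion with disease in group 1
    forv inner = 0.05(0.05)0.25 {           // difference p2 - p1
        local diff = `outer'+`inner'        // p2, the second group's proportion
        qui power twoproportions `outer' `diff', n(200) nratio(1)
        matrix R[`i',1] = round(`outer',0.01)   // store p1 rounded to two decimals
        matrix R[`i',2] = `inner'
        matrix R[`i',3] = r(power)
        di `i'                              // progress counter
        local i = `i'+1
    }
}
capture drop R*
svmat R                                     // variables R1 (p1), R2 (difference), R3 (power)
replace R1 = round(R1,0.01)
lgraph R3 R2, by(R1)                        // community-contributed command (not official Stata)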
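Rounding is one way to deal with floating-point storage. Another is to keep full precision with `svmat double R`. By default, `svmat` creates float variables, and a float cannot hold a value such as 0.3 exactly. As a result, a later selection like `if R1 == 0.3` can miss observations even though R1 displays as .3. The sketch below shows the effect with two made-up variables, `xf` and `xd`:

```stata
clear
set obs 1
generate float  xf = 0.3     // float storage (svmat's default type)
generate double xd = 0.3     // double storage, as with -svmat double R-
display xf[1] == 0.3         // 0: float-stored 0.3 differs from the double constant 0.3
display xd[1] == 0.3         // 1
display xf[1] == float(0.3)  // 1: round the constant to float precision instead
```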
