Hi statalisters,
I’m using data from a pilot RCT to determine the sample size needed to detect a desired effect size in a two-mean comparison. I want to know whether I should use the mean and standard deviation of the difference in differences of the variable of interest (the change from T1 to T2, compared between treatment and control), the mean and standard deviation of the baseline levels of that variable for the total sample, or some alternative.
All analysis is done in Stata 13 (I believe the “power” command is not available in earlier versions of Stata). As an example, I use the blood pressure data that ships with Stata, pretending that the variable sex actually measures treatment/control.
It intuitively makes sense to me to use the parameters of the “Dif” variable, since I’m interested in differences between the control and treatment groups. However, the mean and SD of the difference always lead to very high estimated sample sizes because of the large variance in treatment effects. I therefore feel inclined to use the sample mean and SD of the levels instead, but I am not sure whether it is more logical to use estimates from T1 or T2, or from the control or the experimental group.
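For reference, here is my understanding of why the SD matters so much. Under the usual normal approximation for a two-sided, two-sample comparison with equal group sizes and a common standard deviation, the per-group sample size is roughly as sketched below. The symbols (n, σ, δ, α, β) are my own notation, not Stata output, and I believe “power twomeans” uses a t-based calculation by default, so its answer would differ slightly from this approximation.

```latex
\[
  n_{\text{per group}} \;\approx\;
  \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\delta^{2}}
\]
```

Here δ is the difference to detect (2 units in my example), σ is the common SD, α is the significance level, and 1−β is the power. Only the difference between the two means enters, not their levels, and n grows with σ². So using the SD of Dif (16.7136) rather than the SD of bp_before (11.38985) implies roughly (16.7136/11.38985)² ≈ 2.15 times as many subjects.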
Code:
sysuse bpwide.dta, clear
rename sex Treatment
label define sex 0 "control" 1 "treatment", modify
gen Dif = bp_after - bp_before
reg Dif Treatment // I believe this shows that there is no significant difference between the treatment and control groups

// My question is whether I should use the "Dif" variable or the "bp_before"/"bp_after" variables
// to estimate the sample size I need to detect a significant difference of 2 units.
sum Dif
power twomeans -5.091667 -3.091667, sd(16.7136) power(.8)
sum bp_before
power twomeans 156.45 154.45, sd(11.38985) power(.8)
sum bp_after
power twomeans 151.3583 149.3583, sd(14.17762) power(.8)
Additionally, if I wanted to calculate the sample size needed for a subgroup analysis (such as agegrp=49-59) to yield reasonable estimates of effects at a set power level, should I still use the mean and standard deviation from the total sample?
Any help would be greatly appreciated, and I would be happy to provide additional information as required.