Hello,
I'm running a simulation study looking at when a student t test becomes unreliable c.f. the welch t test. The main issue I'm having is that I would like to store the output from the student t test and welch test. I can store means for each sample, SDs, t values, p values, but I would also like to store the standard errors (I've embedded the code I've tried to use for one of the samples). On top of this I'd also like to store confidence intervals. I've been calculating these using the code below, but I plan on altering the sample size and SD quite frequently and was hoping to avoid readjusting the t value manually each time (I'm using the t distribution as that is what Stata uses for these tests). In terms of calculating the difference in means, again I can do this manually (and the CIs for the student test), but I'd also like to store the standard errors and confidence intervals produced by the student and welch test (the welch test in particularly as it is more difficult to calculate).
I've limited the reps to 20 just for speed of calculations here.
Thank you very much,
Don
I'm running a simulation study looking at when a student t test becomes unreliable c.f. the welch t test. The main issue I'm having is that I would like to store the output from the student t test and welch test. I can store means for each sample, SDs, t values, p values, but I would also like to store the standard errors (I've embedded the code I've tried to use for one of the samples). On top of this I'd also like to store confidence intervals. I've been calculating these using the code below, but I plan on altering the sample size and SD quite frequently and was hoping to avoid readjusting the t value manually each time (I'm using the t distribution as that is what Stata uses for these tests). In terms of calculating the difference in means, again I can do this manually (and the CIs for the student test), but I'd also like to store the standard errors and confidence intervals produced by the student and welch test (the welch test in particularly as it is more difficult to calculate).
I've limited the reps to 20 just for speed of calculations here.
Thank you very much,
Don
Code:
clear set seed 09806527 local a = 100 local c = 2 local b = 98 local d = 20 local obs1 = 200 local obs2 = 400 local loops = 20 postfile sim x_mean x_sd x_n x_se y_mean y_sd y_n t1 Pvalue1 CI1 ci_lb1 ci_ub1 t2 Pavlue2 CI2 ci_lb2 ci_ub2 meandiff s_p SE CID_lb CID_ub using simresults1, replace forvalues i=1/`loops' { drop _all set obs `obs1' generate x= rnormal(`a',`c') quietly summarize x scalar x_mean=r(mean) scalar x_sd=r(sd) scalar x_n=r(N) scalar x_se=r(se) set obs `obs2' generate y= rnormal(`b',`d') quietly summarize y scalar y_mean=r(mean) scalar y_sd=r(sd) scalar y_n=r(N) ttest x=y, unpaired scalar t1 = r(t) scalar Pvalue1 = r(p) scalar CI1 = r(level) ttest x=y, unpaired unequal scalar t2 = r(t) scalar Pvalue2 = r(p) scalar CI2 = r(level) scalar ci_lb1 = x_mean - 1.972*(x_sd/sqrt(`obs1')) scalar ci_ub1 = x_mean + 1.972*(x_sd/sqrt(`obs1')) scalar ci_lb2 = y_mean - 1.966*(y_sd/sqrt(`obs2')) scalar ci_ub2 = y_mean + 1.966*(y_sd/sqrt(`obs2')) scalar meandiff = x_mean - y_mean scalar s_p = sqrt(((`obs1'-1)*(x_sd)^2 + (`obs2'-1)*(y_sd)^2)/(`obs1' + `obs2' -2)) generate SE = s_p/10 scalar CID_lbt = meandiff - 1.964*s_p*sqrt(1/`obs1' + 1/`obs2') scalar CID_ubt = meandiff + 1.964*s_p*sqrt(1/`obs1' + 1/`obs2') post sim (x_mean) (x_sd) (x_n) (x_se) (y_mean) (y_sd) (y_n) (t1) (Pvalue1) (CI1) (ci_lb1) (ci_ub1) (t2) (Pvalue2) (CI2) (ci_lb2) (ci_ub2) (meandiff) (s_p) (SE) (CID_lb) (CID_ub) } postclose sim