Dear Statalists,
I would like to examine how sample size affects the standard errors of interaction effects. The model results from my survey data show a pattern of interaction effects, but these effects do not reach statistical significance. I would like to find out how large the sample would need to be for them to be statistically significant.
The idea is to generate a new data set with the same distributions, correlation matrix, and regression coefficients as the real data, but with a larger sample size at which the interaction effects of interest reach statistical significance.
I may need to take the complex survey design into account as well.
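As a rough first check before any simulation, the required sample size can be approximated from the current t statistic. This is only a back-of-envelope sketch: it assumes the point estimate and the design effect stay the same as the sample grows, so that the standard error shrinks in proportion to 1/sqrt(n). The numbers are the 1.x1#4.x2 estimate and standard error from the svy: logit output in this post; the scalar names are illustrative.

```stata
* Back-of-envelope only: assumes the coefficient and design effect are unchanged
* and that the standard error scales with 1/sqrt(n). Run from a do-file.
scalar b     = .3241205                 // 1.x1#4.x2 coefficient from the output
scalar se    = .2281414                 // its linearized standard error
scalar n0    = 7355                     // current number of observations
scalar tcrit = invttail(1602, 0.025)    // two-sided 5% critical value, design df = 1,602
scalar nreq  = n0 * (tcrit / (b/se))^2  // n at which |t| would just reach tcrit
display "approximate n required: " %9.0f nreq
```

With these inputs the required sample is a little under twice the current one, roughly 14,000 observations. This treats the observed coefficient as if it were the true value, so it is a rule of thumb and not a power calculation.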
I have considered the following options.
1. The command corr2data would have been ideal if it worked well with logistic regression and categorical variables.
2. Following Buis's discussion (M.L. Buis (2007), "Stata tip 48: Discrete uses for uniform()"), I was able to simulate a data set for logistic regression with the specified distributions, but I failed to replicate the regression coefficients: the coefficients in the simulated data set only approximate those specified. I cannot reproduce the correlation matrix either.
The approach is similar to the one in this post: https://www.stata.com/statalist/arch.../msg00018.html
3. A very vague idea is to use probit regression. I could simulate a data set using corr2data and then transform the outcome variable using the probit link function. It is a long shot, and I have not been able to figure out how to do it yet.
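A minimal sketch of what option 2 could look like when run as a power simulation over several sample sizes. It rests on strong simplifying assumptions that are not taken from the survey: x1 and x2 are both treated as binary with prevalence 0.5 and drawn independently, x3 and x4 are left out, and the survey design (strata, PSUs, weights) is ignored. The coefficients from the output are used as assumed true values, and the program name simlogit, the seed, the sample sizes, and the number of replications are all hypothetical choices.

```stata
* Hypothetical sketch: simlogit and all design choices below are assumptions.
capture program drop simlogit
program define simlogit, rclass
    syntax [, n(integer 7355)]
    drop _all
    set obs `n'
    * two independent binary predictors with assumed prevalence 0.5
    generate byte x1 = runiform() < 0.5
    generate byte x2 = runiform() < 0.5
    * linear predictor; coefficients from the output treated as true values
    generate double xb = -1.4128 + 0.3586926*x1 + 0.1623764*x2 + 0.3241205*x1*x2
    * Bernoulli outcome with success probability invlogit(xb)
    generate byte y = runiform() < invlogit(xb)
    logit y i.x1##i.x2
    * two-sided p-value of the interaction term (Wald z test)
    return scalar p = 2*normal(-abs(_b[1.x1#1.x2]/_se[1.x1#1.x2]))
end

set seed 12345
foreach n of numlist 7355 15000 30000 {
    quietly simulate p = r(p), reps(200): simlogit, n(`n')
    quietly count if p < 0.05
    display "n = `n'   estimated power = " %5.3f r(N)/_N
}
```

In each replication the estimates scatter around the assumed values, which is expected from sampling variation. The share of replications with p < 0.05 is the estimated power at that sample size.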
Any suggestions are appreciated.
I use Stata 15 on Windows (64-bit).
The logistic regression results I wish to simulate:
Code:
. svyset psu [pw=xw], strata(strata) singleunit(scaled)

      pweight: xw
          VCE: linearized
  Single unit: scaled
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

. svy: logit y i.x1##i.x2 i.x3 c.x4
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =     1,630                  Number of obs     =      7,355
Number of PSUs     =     3,232                  Population size   = 7,976.7239
                                                Design df         =      1,602
                                                F(  13,   1590)   =      12.36
                                                Prob > F          =     0.0000

------------------------------------------------------------------------------
             |             Linearized
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        1.x1 |   .3586926   .1260393     2.85   0.004     .1114734    .6059119
             |
          x2 |
          3  |   .0491903   .1609868     0.31   0.760    -.2665767    .3649573
          4  |   .1623764   .1383986     1.17   0.241     -.109085    .4338377
          7  |   .0937721   .1003796     0.93   0.350    -.1031171    .2906613
             |
       x1#x2 |
        1 3  |   .0348744   .2815246     0.12   0.901    -.5173208    .5870697
        1 4  |   .3241205   .2281414     1.42   0.156    -.1233666    .7716076
        1 7  |  -.0562443   .2055027    -0.27   0.784    -.4593267    .3468381
             |
          x3 |
          1  |    .109137   .1153704     0.95   0.344    -.1171559    .3354298
          2  |   .4191621   .1126151     3.72   0.000     .1982738    .6400505
          3  |   .4574391   .1259686     3.63   0.000     .2103585    .7045197
          4  |   .8478119   .1286161     6.59   0.000     .5955382    1.100085
          5  |   1.051244   .1458429     7.21   0.000     .7651811    1.337307
             |
          x4 |   .1644617   .0746806     2.20   0.028     .0179798    .3109435
       _cons |    -1.4128   .6050882    -2.33   0.020    -2.599648   -.2259526
------------------------------------------------------------------------------
Note: Variance scaled to handle strata with a single sampling unit.
Many thanks.
Min