Dear all,
Because my data violate the assumptions of homoscedasticity and normality, I think bootstrapping is an adequate solution. Unfortunately, I do not fully understand how to do it in Stata.
The regression part looks like this in the un-bootstrapped version:
I added the vce(bootstrap) option to get bootstrap standard errors (with 1000 replications and a seed for reproducibility):
Is this how it works? Does the note at the end of the output matter? As I understand it, it means that 18 of the 1000 replicates could not be estimated, which, in my opinion, should not be too much of a problem. Or, in other words, at what number of non-estimated replicates would it become problematic?
However, the first part of my analysis is to test the difference in means between two groups (t-test), and I do not understand how this is done in a bootstrapped version. The vce(bootstrap) option does not work here, so I tried to build the command using Stata's help for bootstrap, but I am nearly lost ...
This is the result of the "normal" t-test, with a preceding test for equal variances (sdtest, which is the variance-ratio F test rather than Levene's test):
Then I listed the stored results (return list) and tried to build the bootstrap command, with the following results:
But I really do not know whether this is what I want and whether the command is correct. After all, I need the group means and the p-value (significance) of the difference in means ... and here a separate significance is reported for each of the three estimates? All in all, this confuses me ...
The regression part looks like this in the un-bootstrapped version:
Code:
. reg ch_helpfreq_abs ///
>     i.ch_female i.ch_employment i.ch_partner c.ch_nrkids i.ch_coresiding i.ch_faraway i.ch_educhigh i.transfer_childpar i.transfer_parchild c.ch_age ///
>     c.nr_sons c.nr_daught ///
>     c.r_age i.r_female i.r_partner i.r_educhigh c.lnr_hhincome c.health_lim if sample_main==1

      Source |       SS           df       MS      Number of obs   =       229
-------------+----------------------------------   F(19, 209)      =      5.56
       Model |  390155.989        19  20534.5257   Prob > F        =    0.0000
    Residual |  771894.928       209  3693.27717   R-squared       =    0.3357
-------------+----------------------------------   Adj R-squared   =    0.2754
       Total |  1162050.92       228  5096.71455   Root MSE        =    60.772

-------------------------------------------------------------------------------------
    ch_helpfreq_abs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
          ch_female |
            1. Yes  |   13.01426    12.3828     1.05   0.294    -11.39694    37.42546
                    |
      ch_employment |
Employed part-time  |   10.28111   15.54928     0.66   0.509    -20.37241    40.93464
Employed full-time  |   4.257441   12.18111     0.35   0.727    -19.75615    28.27103
                    |
         ch_partner |
            1. Yes  |  -8.814668   10.92722    -0.81   0.421    -30.35637    12.72703
          ch_nrkids |  -.2130886   4.415956    -0.05   0.962    -8.918613    8.492435
                    |
      ch_coresiding |
            1. Yes  |   29.34034   11.27993     2.60   0.010     7.103317    51.57735
                    |
         ch_faraway |
            1. Yes  |  -11.79207   9.713842    -1.21   0.226    -30.94174      7.3576
                    |
        ch_educhigh |
            1. Yes  |   17.65725   9.575949     1.84   0.067    -1.220576    36.53508
                    |
  transfer_childpar |
            1. Yes  |   83.87799   32.69203     2.57   0.011      19.4296    148.3264
                    |
  transfer_parchild |
            1. Yes  |   -6.66581   10.32876    -0.65   0.519    -27.02772     13.6961
             ch_age |   1.341725   .9014449     1.49   0.138    -.4353653    3.118814
            nr_sons |   1.576808   6.353965     0.25   0.804    -10.94927    14.10288
          nr_daught |  -1.500502    5.74762    -0.26   0.794    -12.83124    9.830239
              r_age |  -.8002276    .908177    -0.88   0.379    -2.590589    .9901339
                    |
           r_female |
            1. Yes  |  -5.036884   8.952904    -0.56   0.574    -22.68646    12.61269
                    |
          r_partner |
            1. Yes  |  -8.088559   11.96358    -0.68   0.500    -31.67332    15.49621
                    |
         r_educhigh |
            1. Yes  |  -3.394952   10.96352    -0.31   0.757    -25.00822    18.21831
       lnr_hhincome |   7.603345   5.235145     1.45   0.148    -2.717113     17.9238
         health_lim |   10.04284   1.382689     7.26   0.000     7.317033    12.76864
              _cons |  -75.22747   60.65407    -1.24   0.216    -194.7997    44.34472
-------------------------------------------------------------------------------------
Code:
. // bootstrap
. reg ch_helpfreq_abs ///
>     i.ch_female i.ch_employment i.ch_partner c.ch_nrkids i.ch_coresiding i.ch_faraway i.ch_educhigh i.transfer_childpar i.transfer_parchild c.ch_age ///
>     c.nr_sons c.nr_daught ///
>     c.r_age i.r_female i.r_partner i.r_educhigh c.lnr_hhincome c.health_lim if sample_main==1, vce(bootstrap, reps(1000) seed(8086411))
(running regress on estimation sample)

Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
............................x.....................   100
......x...........................................   150
..................................................   200
............x.................x...................   250
........x.........................................   300
..x.......x.......................................   350
x......................x..........................   400
..................................................   450
..................................................   500
..................................................   550
x........................................x........   600
.................................................x   650
..................................................   700
..................................................   750
.......x..........................................   800
.x................................................   850
..................................................   900
......x..........x....................x...........   950
........x.........................................  1000

Linear regression                               Number of obs     =        229
                                                Replications      =        982
                                                Wald chi2(19)     =      21.81
                                                Prob > chi2       =     0.2939
                                                R-squared         =     0.3357
                                                Adj R-squared     =     0.2754
                                                Root MSE          =    60.7723

-------------------------------------------------------------------------------------
                    |   Observed   Bootstrap                         Normal-based
    ch_helpfreq_abs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
          ch_female |
            1. Yes  |   13.01426   12.65994     1.03   0.304    -11.79876    37.82728
                    |
      ch_employment |
Employed part-time  |   10.28111   15.54569     0.66   0.508    -20.18788    40.75011
Employed full-time  |   4.257441   11.95636     0.36   0.722     -19.1766    27.69148
                    |
         ch_partner |
            1. Yes  |  -8.814668   10.55624    -0.84   0.404    -29.50452    11.87518
          ch_nrkids |  -.2130886   5.190599    -0.04   0.967    -10.38648    9.960298
                    |
      ch_coresiding |
            1. Yes  |   29.34034    14.3009     2.05   0.040     1.311084    57.36959
                    |
         ch_faraway |
            1. Yes  |  -11.79207   6.482636    -1.82   0.069     -24.4978    .9136635
                    |
        ch_educhigh |
            1. Yes  |   17.65725   10.12226     1.74   0.081    -2.182018    37.49653
                    |
  transfer_childpar |
            1. Yes  |   83.87799   81.37532     1.03   0.303    -75.61471    243.3707
                    |
  transfer_parchild |
            1. Yes  |   -6.66581   11.79164    -0.57   0.572    -29.77699    16.44537
             ch_age |   1.341725   1.077577     1.25   0.213    -.7702883    3.453737
            nr_sons |   1.576808   5.564748     0.28   0.777    -9.329897    12.48351
          nr_daught |  -1.500502   4.458608    -0.34   0.736    -10.23921     7.23821
              r_age |  -.8002276   .9779172    -0.82   0.413     -2.71691    1.116455
                    |
           r_female |
            1. Yes  |  -5.036884   9.440082    -0.53   0.594    -23.53911    13.46534
                    |
          r_partner |
            1. Yes  |  -8.088559   13.75266    -0.59   0.556    -35.04328    18.86617
                    |
         r_educhigh |
            1. Yes  |  -3.394952   9.514389    -0.36   0.721    -22.04281    15.25291
       lnr_hhincome |   7.603345   6.883106     1.10   0.269    -5.887295    21.09399
         health_lim |   10.04284   2.891411     3.47   0.001     4.375774     15.7099
              _cons |  -75.22747   57.22399    -1.31   0.189    -187.3844    36.92949
-------------------------------------------------------------------------------------
Note: One or more parameters could not be estimated in 18 bootstrap replicates;
      standard-error estimates include only complete replications.
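Regarding the 18 replicates marked with an x: one possible way to see where they come from is sketched below. This is only a sketch under assumptions, not a definitive diagnosis. It assumes the failures arise because some resamples contain no cases of a rare category (the large jump in the bootstrap standard error of transfer_childpar hints at this, but the post does not confirm it), and it assumes the saved replicate file keeps failed replicates as missing values. The file name bsreps is hypothetical.

```stata
* Sketch only: variable names are taken from the post; bsreps is a
* hypothetical file name chosen for illustration.

* 1) Look for sparse categories. A resample that happens to contain no
*    cases of a category cannot identify that coefficient.
foreach v of varlist ch_female ch_employment ch_partner ch_coresiding ///
        ch_faraway ch_educhigh transfer_childpar transfer_parchild ///
        r_female r_partner r_educhigh {
    tabulate `v' if sample_main==1
}

* 2) Save the replicates and check which coefficients are missing
*    (assumption: failed replicates are stored as missing values).
bootstrap _b, reps(1000) seed(8086411) saving(bsreps, replace): ///
    regress ch_helpfreq_abs ///
        i.ch_female i.ch_employment i.ch_partner c.ch_nrkids ///
        i.ch_coresiding i.ch_faraway i.ch_educhigh ///
        i.transfer_childpar i.transfer_parchild c.ch_age ///
        c.nr_sons c.nr_daught ///
        c.r_age i.r_female i.r_partner i.r_educhigh ///
        c.lnr_hhincome c.health_lim if sample_main==1

preserve
use bsreps, clear
misstable summarize    // lists only variables that have missing values
restore
```

If a single dummy turns out to account for all missing replicates, the question of how many failed replicates are tolerable becomes a question about that one coefficient rather than about the whole model.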
This is the result of the "normal" t-test, with a preceding test for equal variances (sdtest, which is the variance-ratio F test rather than Levene's test):
Code:
. sdtest ch_helpfreq_abs if sample_main==1, by(ch_female)

Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   0. No |     107          14    4.962334    51.33078     4.16169    23.83831
  1. Yes |     122    24.58197     7.70499    85.10439    9.327908    39.83603
---------+--------------------------------------------------------------------
combined |     229    19.63755    4.717668    71.39128    10.34175    28.93336
------------------------------------------------------------------------------
    ratio = sd(0. No) / sd(1. Yes)                                f =   0.3638
Ho: ratio = 1                                    degrees of freedom = 106, 121

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.0000         2*Pr(F < f) = 0.0000           Pr(F > f) = 1.0000

.
. * t-Test
. ttest ch_helpfreq_abs if sample_main==1, by(ch_female) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   0. No |     107          14    4.962334    51.33078     4.16169    23.83831
  1. Yes |     122    24.58197     7.70499    85.10439    9.327908    39.83603
---------+--------------------------------------------------------------------
combined |     229    19.63755    4.717668    71.39128    10.34175    28.93336
---------+--------------------------------------------------------------------
    diff |           -10.58197    9.164694               -28.65247    7.488534
------------------------------------------------------------------------------
    diff = mean(0. No) - mean(1. Yes)                             t =  -1.1546
Ho: diff = 0                     Satterthwaite's degrees of freedom =  202.439

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1248         Pr(|T| > |t|) = 0.2496          Pr(T > t) = 0.8752
Code:
. return list

scalars:
              r(level) =  95
                 r(sd) =  71.39127781724939
               r(sd_2) =  85.10439288697191
               r(sd_1) =  51.33078079090336
                 r(se) =  9.16469442146575
                r(p_u) =  .8752014715105874
                r(p_l) =  .1247985284894127
                  r(p) =  .2495970569788254
                  r(t) =  -1.154644849732189
               r(df_t) =  202.4387773576472
               r(mu_2) =  24.58196721311475
                r(N_2) =  122
               r(mu_1) =  14
                r(N_1) =  107

.
. * t-Test (bootstrap)
. set seed 8086411

. bootstrap meanM=r(mu_1) meanF=r(mu_2) sig=r(p), reps(1000): ttest ch_helpfreq_abs if sample_main==1, by(ch_female) unequal
(running ttest on estimation sample)

Warning:  Because ttest is not an estimation command or does not set
          e(sample), bootstrap has no way to determine which observations are
          used in calculating the statistics and so assumes that all
          observations are used.  This means that no observations will be
          excluded from the resampling because of missing values or other
          reasons.

          If the assumption is not true, press Break, save the data, and drop
          the observations that are to be excluded.  Be sure that the dataset
          in memory contains only the relevant data.

Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500
..................................................   550
..................................................   600
..................................................   650
..................................................   700
..................................................   750
..................................................   800
..................................................   850
..................................................   900
..................................................   950
..................................................  1000

Bootstrap results                               Number of obs     =        229
                                                Replications      =      1,000

      command:  ttest ch_helpfreq_abs, by(ch_female) unequal
        meanM:  r(mu_1)
        meanF:  r(mu_2)
          sig:  r(p)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meanM |         14   5.098768     2.75   0.006     4.006598     23.9934
       meanF |   24.58197   7.851312     3.13   0.002     9.193678    39.97026
         sig |   .2495971   .2908453     0.86   0.391    -.3204492    .8196433
------------------------------------------------------------------------------

.
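A note on reading the table above: each row is tested against zero, so the z and P>|z| for sig describe the sampling variability of the p-value itself, not the test of the mean difference. A possible alternative, sketched here under assumptions rather than offered as the definitive approach, is to bootstrap the difference in means directly and judge it by its bootstrap confidence interval. The name diff, the strata(ch_female) option (resampling within each group so the group sizes stay fixed), and the choice of percentile and bias-corrected intervals are assumptions of this sketch, not part of the original commands. Because ttest does not set e(sample), the sketch first keeps only the analysis sample, as the warning suggests.

```stata
* Sketch only: diff, the strata() choice, and the interval types are
* assumptions made for illustration, not the original poster's commands.
preserve
keep if sample_main==1 & !missing(ch_helpfreq_abs, ch_female)

* Bootstrap the two group means and their difference (No minus Yes)
bootstrap diff=(r(mu_1)-r(mu_2)) meanM=(r(mu_1)) meanF=(r(mu_2)), ///
    reps(1000) seed(8086411) strata(ch_female): ///
    ttest ch_helpfreq_abs, by(ch_female) unequal

* Percentile and bias-corrected intervals; these do not rely on the
* normal approximation used in the default table
estat bootstrap, percentile bc
restore
```

If the interval for diff excludes zero, that would point to a difference in means at the corresponding level; the z test reported for diff in the main table is the normal-based counterpart.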
Thanks for any hints and help!