Hello! I have a coding that results in a point estimate. My goal is to use Monte Carlo simulation to repeat the procedur "n" times, obtain a mean of the estimation and thus a more robust value. To this concern I used the simulate command, create my program "myboot" and since i have the same sample always I relied on bsample to use a given percentage of the population with replacement. I am not pretty sure about it because I am having a quite high value of standard errors in the final output of the mean even when I use 99% of the original sample.
Code:
use "${gsdClean}/Pseudo_panel_yearly.dta", clear cd "${gsdOut}/3. Partial correlation coefficient" ************************************************************************************************************************************ ************************************************************************************************************************************ **** PROGRAM ****************************************************************************************************************** capture program drop myboot program myboot, rclass *** Regression inputs and other parameters local indvars = "f age age2 ed marital munder_3 tot4_24 older_65 i.region" local depvar = "pci" *** Age range local agevar = "age" scalar min_age = 23 scalar max_age = 55 *** Survey years local yearvar = "year" *** Other parameters local cohvar = "age region nivel_ed_a" local idvar = "id" *** Survey settings svyset _n [weight=weight] local weight = "weight" preserve bsample round(0.99*_N) local j = 1 local i=2003 local year1 = `i' local year2 = `i' + 1 local diff = `year2'-`year1' collapse `depvar', by(`cohvar' `yearvar') replace `agevar' = `agevar' - `diff' if `yearvar' == `year2' reshape wide `depvar', i(`cohvar') j(`yearvar') corr `depvar'`year1' `depvar'`year2' if `agevar' >= min_age & `agevar' <= max_age scalar rho_s = r(rho) restore, preserve **** DROP OBS OUTSIDE OF AGE RANGE drop if (`yearvar' == `year1' & `agevar' < min_age) | (`yearvar' == `year1' & `agevar' > max_age) drop if (`yearvar' == `year2' & `agevar' < (min_age +`diff')) | (`yearvar' == `year2' & `agevar' > (max_age +`diff' )) **** INCOME MODEL ************************************************************* **** POINT ESTIMATES OF CONDITIONAL AND UNCONDITIONAL PROBABLITIES *** regressions to obtain inputs for estimation of rho and probabilities bivariate normal distribution *** regression year 1 svy: reg `depvar' `indvars' if `yearvar' == `year1' predict fitinc`year1'_x`year1' if e(sample) == 1, xb predict fitinc`year1'_x`year2' if e(sample) == 0 & `yearvar' ==`year2', xb matrix b1 = e(b) predict res`year1' if e(sample), residuals sum res`year1' if `yearvar' == `year1' [aw=`weight'] scalar sd_res`year1' = r(sd) sum `depvar' if e(sample) [aw=`weight'] scalar vary1 = r(Var) *** regression year 2 svy: reg `depvar' `indvars' if `yearvar' == `year2' predict fitinc`year2'_x`year2' if e(sample) == 1, xb matrix b2 = e(b) predict res`year2' if e(sample), residuals sum res`year2' if `yearvar' == `year2' [aw=`weight'] scalar sd_res`year2' = r(sd) sum `depvar' if e(sample) [aw=`weight'] scalar vary2 = r(Var) ** calculate variance-covariance matrix for explanatory variables in year 2 matrix accum VARX2 = `indvars' if e(sample), deviations noconstant mat VARX2 = VARX2/(r(N)-1) mat zero= J(1, colsof(e(b))- 1, 0) mat VARX2= VARX2\zero mat zero= J(colsof(e(b)),1,0) mat VARX2= VARX2, zero mat li VARX2 ***** ESTIMATION OF PARTIAL CORRELATION COEFFICIENT RHO ********************** matrix RHO_HAT_`year1'_`year2' = (rho_s*sqrt(vary1*vary2)-b1*VARX2*b2')/(sd_res`year1'*sd_res`year2') // eq. 5 DL 2019 mat li RHO_HAT_`year1'_`year2' return scalar rho_hat = RHO_HAT_`year1'_`year2'[1,1] restore, preserve end simulate rho_bs = r(rho_hat), reps(15): myboot summarize
Comment