Hello,
I’m working on a Bayesian analysis with staged data collection and could use some guidance on implementing this in Stata. Here’s an outline of my approach:
- At the interim stage, I analyse the first batch of data (data == 0) with an uninformative prior.
- At the final stage, I use the posterior from this interim analysis as the prior for a Bayesian analysis of the additional data collected (data == 1).
- Is this staged approach equivalent to a single Bayesian analysis that uses an uninformative prior at both stages but applies it to all data available at each point (i.e., only data == 0 at the interim stage, and data == 0 plus data == 1 at the final stage)? My reading of the exact update is sketched just after this list.
- Could anyone advise on how best to implement this process in Stata?
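For reference, the exact sequential update I have in mind is (writing D0 for the data == 0 batch and D1 for the data == 1 batch):

p(theta | D0, D1) ∝ p(D1 | theta) p(D0 | theta) p(theta)
                  ∝ p(D1 | theta) p(theta | D0)

so carrying the interim posterior forward exactly should reproduce the single analysis of all the data. In my code below, however, the interim posterior is summarised by its mean and variance and passed on as a normal prior, and I am not sure how much difference that approximation makes in practice.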
Code:
clear
set seed 15
set obs 100

// Generate data indicator and treatment assignment
gen data = (_n <= 50)
generate trt = rbinomial(1, 0.5)

// Generate a binary outcome (highbp) based on treatment effect
generate xb = (ln(0.17) + ln(0.70) * trt)
generate highbp = rlogistic(xb, 1) > 0

// ##################################

// Step 1: Fit Bayesian model in data=0 using uninformative priors
bayes, prior({highbp:trt}, normal(0, 10000)) ///
    prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt if data == 0, or

// Step 2: Save posterior mean and variance for parameters trt and _cons
bayesstats summary // (you may specify particular parameters here if desired)

local trt_mean = r(summary)[1, 1]         // posterior mean of trt
local trt_variance = r(summary)[1, 2]^2   // square of posterior std dev for trt
local cons_mean = r(summary)[2, 1]        // posterior mean of _cons
local cons_variance = r(summary)[2, 2]^2  // square of posterior std dev for _cons

display `trt_mean'
display `trt_variance'
display `cons_mean'
display `cons_variance'

// ##################################

// Step 3: Fit Bayesian model in data=1, using posterior results from Step 1 as priors
bayes, prior({highbp:trt}, normal(`trt_mean', `trt_variance')) ///
    prior({highbp:_cons}, normal(`cons_mean', `cons_variance')) : logit highbp trt if data == 1, or

// ##################################

// Step 4: Compare with Bayesian analysis on full dataset (data = 0 + data = 1)
// This model uses uninformative priors, as in Step 1
bayes, prior({highbp:trt}, normal(0, 10000)) ///
    prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt, or

// Step 5: Compare with frequentist approach
logit highbp trt, or
This is the output:
. clear
. set seed 15
. set obs 100
Number of observations (_N) was 0, now 100.
.
. // Generate data indicator and treatment assignment
. gen data = (_n <= 50)
. generate trt = rbinomial(1, 0.5)
.
. // Generate a binary outcome (highbp) based on treatment effect
. generate xb = (ln(0.17) + ln(0.70) * trt)
. generate highbp = rlogistic(xb, 1) > 0
.
. // ##################################
.
. // Step 1: Fit Bayesian model in data=0 using uninformative priors
. bayes, prior({highbp:trt}, normal(0, 10000)) ///
> prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt if data == 0, or
Burn-in ...
Simulation ...
Model summary
------------------------------------------------------------------------------
Likelihood:
highbp ~ logit(xb_highbp)
Prior:
{highbp:trt _cons} ~ normal(0,10000) (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_highbp.
Bayesian logistic regression MCMC iterations = 12,500
Random-walk Metropolis–Hastings sampling Burn-in = 2,500
MCMC sample size = 10,000
Number of obs = 50
Acceptance rate = .224
Efficiency: min = .1033
avg = .1107
Log marginal-likelihood = -31.280054 max = .1182
------------------------------------------------------------------------------
| Equal-tailed
highbp |Odds ratio Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
trt | 3.563619 7.428146 .231171 1.936356 .3569163 15.21196
_cons | .1385592 .1091119 .003173 .1106197 .0158556 .4079769
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
.
. // Step 2: Save posterior mean and variance for parameters trt and _cons
. bayesstats summary // (you may specify particular parameters here if desired)
Posterior summary statistics MCMC sample size = 10,000
------------------------------------------------------------------------------
| Equal-tailed
highbp | Mean Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
trt | .715502 .9554699 .030759 .6608079 -1.030254 2.722082
_cons | -2.280574 .8439397 .027449 -2.201657 -4.144231 -.8965447
------------------------------------------------------------------------------
.
. local trt_mean = r(summary)[1, 1] // posterior mean of trt
. local trt_variance = r(summary)[1, 2]^2 // square of posterior std dev for trt
. local cons_mean = r(summary)[2, 1] // posterior mean of _cons
. local cons_variance = r(summary)[2, 2]^2 // square of posterior std dev for _cons
.
. display `trt_mean'
.71550199
. display `trt_variance'
.91292272
. display `cons_mean'
-2.2805745
. display `cons_variance'
.71223416
.
. // ##################################
.
. // Step 3: Fit Bayesian model in data=1, using posterior results from Step 1 as priors
. bayes, prior({highbp:trt}, normal(`trt_mean', `trt_variance')) ///
> prior({highbp:_cons}, normal(`cons_mean', `cons_variance')) : logit highbp trt if data == 1, or
Burn-in ...
Simulation ...
Model summary
------------------------------------------------------------------------------
Likelihood:
highbp ~ logit(xb_highbp)
Priors:
{highbp:trt} ~ normal(.7155019937060871,.9129227171295571) (1)
{highbp:_cons} ~ normal(-2.280574490467969,.71223415887348) (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_highbp.
Bayesian logistic regression MCMC iterations = 12,500
Random-walk Metropolis–Hastings sampling Burn-in = 2,500
MCMC sample size = 10,000
Number of obs = 50
Acceptance rate = .2595
Efficiency: min = .1442
avg = .1445
Log marginal-likelihood = -14.502944 max = .1447
------------------------------------------------------------------------------
| Equal-tailed
highbp |Odds ratio Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
trt | 1.666449 1.26166 .033219 1.34577 .3618972 4.878666
_cons | .0702522 .0396812 .001043 .0629117 .0192435 .1638918
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
.
. // ##################################
.
. // Step 4: Compare with Bayesian analysis on full dataset (data = 0 + data = 1)
. // This model uses uninformative priors, as in Step 1
. bayes, prior({highbp:trt}, normal(0, 10000)) ///
> prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt, or
Burn-in ...
Simulation ...
Model summary
------------------------------------------------------------------------------
Likelihood:
highbp ~ logit(xb_highbp)
Prior:
{highbp:trt _cons} ~ normal(0,10000) (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_highbp.
Bayesian logistic regression MCMC iterations = 12,500
Random-walk Metropolis–Hastings sampling Burn-in = 2,500
MCMC sample size = 10,000
Number of obs = 100
Acceptance rate = .2048
Efficiency: min = .1112
avg = .1189
Log marginal-likelihood = -45.624597 max = .1266
------------------------------------------------------------------------------
| Equal-tailed
highbp |Odds ratio Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
trt | 2.556934 2.864324 .085913 1.790165 .4753463 8.62226
_cons | .0893983 .0540235 .001518 .0773762 .0178525 .2237933
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
.
. // Step 5: Compare with frequentist approach
. logit highbp trt, or
Iteration 0: Log likelihood = -34.651534
Iteration 1: Log likelihood = -34.393309
Iteration 2: Log likelihood = -34.390499
Iteration 3: Log likelihood = -34.390498
Logistic regression Number of obs = 100
LR chi2(1) = 0.52
Prob > chi2 = 0.4700
Log likelihood = -34.390498 Pseudo R2 = 0.0075
------------------------------------------------------------------------------
highbp | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
trt | 1.648484 1.172543 0.70 0.482 .4089194 6.645567
_cons | .0882353 .0531426 -4.03 0.000 .0271005 .2872809
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
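In case it is relevant: one refinement I considered for Step 2 (not shown in the log above) is naming the parameters explicitly in bayesstats summary, so that the row order of r(summary) is fixed rather than relying on positional indexing, and adding rseed() so the MCMC runs are reproducible. This is only a sketch of how I would adjust Steps 1 and 2, assuming I have read the normal() prior as normal(mean, variance) correctly:

Code:
// Step 1 (variant): same model, with a reproducible MCMC run
bayes, rseed(15) prior({highbp:trt}, normal(0, 10000)) ///
    prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt if data == 0, or

// Step 2 (variant): request the parameters by name so that
// row 1 of r(summary) is trt and row 2 is _cons
bayesstats summary {highbp:trt} {highbp:_cons}
matrix S = r(summary)
local trt_mean      = S[1, 1]
local trt_variance  = S[1, 2]^2   // squared posterior std. dev. = variance
local cons_mean     = S[2, 1]
local cons_variance = S[2, 2]^2

(The seed value 15 is just the one from my example above.)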
Thank you for your help!