Hello,
I’m working on a Bayesian analysis with staged data collection and could use some guidance on implementing this in Stata. Here’s an outline of my approach:
- At the interim stage, I analyse the first batch of data (data == 0) with an uninformative prior.
- At the final stage, I use the posterior from this interim analysis as the prior for a Bayesian analysis of the additional data collected (data == 1).
- Is this staged approach equivalent to a single Bayesian analysis that uses an uninformative prior at both stages but applies it to all data available at each point (i.e., only data == 0 at the interim stage, and data == 0 plus data == 1 at the final stage)? My reading of the exact update is sketched just after this list.
- Could anyone advise on how best to implement this process in Stata?
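For reference, the exact sequential update I have in mind is (writing D0 for the data == 0 batch and D1 for the data == 1 batch):

p(theta | D0, D1) ∝ p(D1 | theta) p(D0 | theta) p(theta)
                  ∝ p(D1 | theta) p(theta | D0)

so carrying the interim posterior forward exactly should reproduce the single analysis of all the data. In my code below, however, the interim posterior is summarised by its mean and variance and passed on as a normal prior, and I am not sure how much difference that approximation makes in practice.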
Code:
clear
set seed 15
set obs 100

// Generate data indicator and treatment assignment
gen data = (_n <= 50)
generate trt = rbinomial(1, 0.5)

// Generate a binary outcome (highbp) based on treatment effect
generate xb = (ln(0.17) + ln(0.70) * trt)
generate highbp = rlogistic(xb, 1) > 0

// ##################################

// Step 1: Fit Bayesian model in data=0 using uninformative priors
bayes, prior({highbp:trt}, normal(0, 10000)) ///
    prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt if data == 0, or

// Step 2: Save posterior mean and variance for parameters trt and _cons
bayesstats summary // (you may specify particular parameters here if desired)

local trt_mean = r(summary)[1, 1]         // posterior mean of trt
local trt_variance = r(summary)[1, 2]^2   // square of posterior std dev for trt
local cons_mean = r(summary)[2, 1]        // posterior mean of _cons
local cons_variance = r(summary)[2, 2]^2  // square of posterior std dev for _cons

display `trt_mean'
display `trt_variance'
display `cons_mean'
display `cons_variance'

// ##################################

// Step 3: Fit Bayesian model in data=1, using posterior results from Step 1 as priors
bayes, prior({highbp:trt}, normal(`trt_mean', `trt_variance')) ///
    prior({highbp:_cons}, normal(`cons_mean', `cons_variance')) : logit highbp trt if data == 1, or

// ##################################

// Step 4: Compare with Bayesian analysis on full dataset (data = 0 + data = 1)
// This model uses uninformative priors, as in Step 1
bayes, prior({highbp:trt}, normal(0, 10000)) ///
    prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt, or

// Step 5: Compare with frequentist approach
logit highbp trt, or
This is the output:
. clear
. set seed 15
. set obs 100
Number of observations (_N) was 0, now 100.
.
. // Generate data indicator and treatment assignment
. gen data = (_n <= 50)
. generate trt = rbinomial(1, 0.5)
.
. // Generate a binary outcome (highbp) based on treatment effect
. generate xb = (ln(0.17) + ln(0.70) * trt)
. generate highbp = rlogistic(xb, 1) > 0
.
. // ##################################
.
. // Step 1: Fit Bayesian model in data=0 using uninformative priors
. bayes, prior({highbp:trt}, normal(0, 10000)) ///
> prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt if data == 0, or
Burn-in ...
Simulation ...
Model summary
------------------------------------------------------------------------------
Likelihood:
highbp ~ logit(xb_highbp)
Prior:
{highbp:trt _cons} ~ normal(0,10000) (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_highbp.
Bayesian logistic regression MCMC iterations = 12,500
Random-walk Metropolis–Hastings sampling Burn-in = 2,500
MCMC sample size = 10,000
Number of obs = 50
Acceptance rate = .224
Efficiency: min = .1033
avg = .1107
Log marginal-likelihood = -31.280054 max = .1182
------------------------------------------------------------------------------
| Equal-tailed
highbp |Odds ratio Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
trt | 3.563619 7.428146 .231171 1.936356 .3569163 15.21196
_cons | .1385592 .1091119 .003173 .1106197 .0158556 .4079769
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
.
. // Step 2: Save posterior mean and variance for parameters trt and _cons
. bayesstats summary // (you may specify particular parameters here if desired)
Posterior summary statistics MCMC sample size = 10,000
------------------------------------------------------------------------------
| Equal-tailed
highbp | Mean Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
trt | .715502 .9554699 .030759 .6608079 -1.030254 2.722082
_cons | -2.280574 .8439397 .027449 -2.201657 -4.144231 -.8965447
------------------------------------------------------------------------------
.
. local trt_mean = r(summary)[1, 1] // posterior mean of trt
. local trt_variance = r(summary)[1, 2]^2 // square of posterior std dev for trt
. local cons_mean = r(summary)[2, 1] // posterior mean of _cons
. local cons_variance = r(summary)[2, 2]^2 // square of posterior std dev for _cons
.
. display `trt_mean'
.71550199
. display `trt_variance'
.91292272
. display `cons_mean'
-2.2805745
. display `cons_variance'
.71223416
.
. // ##################################
.
. // Step 3: Fit Bayesian model in data=1, using posterior results from Step 1 as priors
. bayes, prior({highbp:trt}, normal(`trt_mean', `trt_variance')) ///
> prior({highbp:_cons}, normal(`cons_mean', `cons_variance')) : logit highbp trt if data == 1, or
Burn-in ...
Simulation ...
Model summary
------------------------------------------------------------------------------
Likelihood:
highbp ~ logit(xb_highbp)
Priors:
{highbp:trt} ~ normal(.7155019937060871,.9129227171295571) (1)
{highbp:_cons} ~ normal(-2.280574490467969,.71223415887348) (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_highbp.
Bayesian logistic regression MCMC iterations = 12,500
Random-walk Metropolis–Hastings sampling Burn-in = 2,500
MCMC sample size = 10,000
Number of obs = 50
Acceptance rate = .2595
Efficiency: min = .1442
avg = .1445
Log marginal-likelihood = -14.502944 max = .1447
------------------------------------------------------------------------------
| Equal-tailed
highbp |Odds ratio Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
trt | 1.666449 1.26166 .033219 1.34577 .3618972 4.878666
_cons | .0702522 .0396812 .001043 .0629117 .0192435 .1638918
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
.
. // ##################################
.
. // Step 4: Compare with Bayesian analysis on full dataset (data = 0 + data = 1)
. // This model uses uninformative priors, as in Step 1
. bayes, prior({highbp:trt}, normal(0, 10000)) ///
> prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt, or
Burn-in ...
Simulation ...
Model summary
------------------------------------------------------------------------------
Likelihood:
highbp ~ logit(xb_highbp)
Prior:
{highbp:trt _cons} ~ normal(0,10000) (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_highbp.
Bayesian logistic regression MCMC iterations = 12,500
Random-walk Metropolis–Hastings sampling Burn-in = 2,500
MCMC sample size = 10,000
Number of obs = 100
Acceptance rate = .2048
Efficiency: min = .1112
avg = .1189
Log marginal-likelihood = -45.624597 max = .1266
------------------------------------------------------------------------------
| Equal-tailed
highbp |Odds ratio Std. dev. MCSE Median [95% cred. interval]
-------------+----------------------------------------------------------------
trt | 2.556934 2.864324 .085913 1.790165 .4753463 8.62226
_cons | .0893983 .0540235 .001518 .0773762 .0178525 .2237933
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
.
. // Step 5: Compare with frequentist approach
. logit highbp trt, or
Iteration 0: Log likelihood = -34.651534
Iteration 1: Log likelihood = -34.393309
Iteration 2: Log likelihood = -34.390499
Iteration 3: Log likelihood = -34.390498
Logistic regression Number of obs = 100
LR chi2(1) = 0.52
Prob > chi2 = 0.4700
Log likelihood = -34.390498 Pseudo R2 = 0.0075
------------------------------------------------------------------------------
highbp | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
trt | 1.648484 1.172543 0.70 0.482 .4089194 6.645567
_cons | .0882353 .0531426 -4.03 0.000 .0271005 .2872809
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
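In case it is relevant: one refinement I considered for Step 2 (not shown in the log above) is naming the parameters explicitly in bayesstats summary, so that the row order of r(summary) is fixed rather than relying on positional indexing, and adding rseed() so the MCMC runs are reproducible. This is only a sketch of how I would adjust Steps 1 and 2, assuming I have read the normal() prior as normal(mean, variance) correctly:

Code:
// Step 1 (variant): same model, with a reproducible MCMC run
bayes, rseed(15) prior({highbp:trt}, normal(0, 10000)) ///
    prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt if data == 0, or

// Step 2 (variant): request the parameters by name so that
// row 1 of r(summary) is trt and row 2 is _cons
bayesstats summary {highbp:trt} {highbp:_cons}
matrix S = r(summary)
local trt_mean      = S[1, 1]
local trt_variance  = S[1, 2]^2   // squared posterior std. dev. = variance
local cons_mean     = S[2, 1]
local cons_variance = S[2, 2]^2

(The seed value 15 is just the one from my example above.)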
Thank you for your help!