
  • Bayesian Analysis in Stages with Stata

    Hello,

    I’m working on a Bayesian analysis with staged data collection and could use some guidance on implementing this in Stata. Here’s an outline of my approach:
    1. First, I analyse the subset with data == 0 using an uninformative prior.
    2. I then use the posterior from this first analysis as the prior for a Bayesian analysis of the subset with data == 1.
    This setup mirrors a trial with an interim analysis, where I:
    • Analyse the accumulated data at the interim stage with an uninformative prior.
    • Use the posterior from this interim analysis as the prior for the final analysis, incorporating the additional data collected (data=1).
    My questions are as follows:
    1. Is this staged approach equivalent to running a single Bayesian analysis with an uninformative prior on all data available at each point (i.e., only data == 0 at the interim stage, and data == 0 plus data == 1 at the final stage)?
    2. Could anyone provide insight into how to best implement this process in Stata?
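    For context, here is my (possibly mistaken) understanding of the algebra behind question 1. By Bayes' rule, the sequential update is

        p(theta | D0, D1) ∝ p(D1 | theta) p(theta | D0)
                          ∝ p(D1 | theta) p(D0 | theta) p(theta),

    so the staged analysis reproduces the single all-data analysis exactly only if the full joint stage-1 posterior is carried forward. My code below instead approximates that posterior with independent normal priors on {highbp:trt} and {highbp:_cons}, which discards their posterior correlation, so I would expect the two approaches to agree only approximately. If an exact (or closer) match requires a joint prior on both coefficients, pointers on how to specify that in bayes/bayesmh would also be welcome.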
    Here is my current simulated code:


    Code:
    clear
    set seed 15
    set obs 100
    
    // Generate data indicator and treatment assignment
    gen data = (_n <= 50)
    generate trt = rbinomial(1, 0.5)
    
    // Generate a binary outcome (highbp) based on treatment effect
    generate xb = (ln(0.17) + ln(0.70) * trt)
    generate highbp = rlogistic(xb, 1) > 0
    
    // ##################################
    
    // Step 1: Fit Bayesian model in data=0 using uninformative priors
    bayes, prior({highbp:trt}, normal(0, 10000)) ///
           prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt if data == 0, or
    
    // Step 2: Save posterior mean and variance for parameters trt and _cons
    bayesstats summary // (you may specify particular parameters here if desired)
    
    local trt_mean = r(summary)[1, 1]        // posterior mean of trt
    local trt_variance = r(summary)[1, 2]^2  // square of posterior std dev for trt
    local cons_mean = r(summary)[2, 1]       // posterior mean of _cons
    local cons_variance = r(summary)[2, 2]^2 // square of posterior std dev for _cons
    
    display `trt_mean'
    display `trt_variance'
    display `cons_mean'
    display `cons_variance'
    
    // ##################################
    
    // Step 3: Fit Bayesian model in data=1, using posterior results from Step 1 as priors
    bayes, prior({highbp:trt}, normal(`trt_mean', `trt_variance')) ///
           prior({highbp:_cons}, normal(`cons_mean', `cons_variance')) : logit highbp trt if data == 1, or
    
    // ##################################
    
    // Step 4: Compare with Bayesian analysis on full dataset (data = 0 + data = 1)
    // This model uses uninformative priors, as in Step 1
    bayes, prior({highbp:trt}, normal(0, 10000)) ///
           prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt, or
    
    // Step 5: Compare with frequentist approach
    logit highbp trt, or
    This is the output:

    HTML Code:
    . clear
    
    . set seed 15
    
    . set obs 100
    Number of observations (_N) was 0, now 100.
    
    . 
    . // Generate data indicator and treatment assignment
    . gen data = (_n <= 50)
    
    . generate trt = rbinomial(1, 0.5)
    
    . 
    . // Generate a binary outcome (highbp) based on treatment effect
    . generate xb = (ln(0.17) + ln(0.70) * trt)
    
    . generate highbp = rlogistic(xb, 1) > 0
    
    . 
    . // ##################################
    . 
    . // Step 1: Fit Bayesian model in data=0 using uninformative priors
    . bayes, prior({highbp:trt}, normal(0, 10000)) ///
    >        prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt if data == 0, or
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood: 
      highbp ~ logit(xb_highbp)
    
    Prior: 
      {highbp:trt _cons} ~ normal(0,10000)                                     (1)
    ------------------------------------------------------------------------------
    (1) Parameters are elements of the linear form xb_highbp.
    
    Bayesian logistic regression                     MCMC iterations  =     12,500
    Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         50
                                                     Acceptance rate  =       .224
                                                     Efficiency:  min =      .1033
                                                                  avg =      .1107
    Log marginal-likelihood = -31.280054                          max =      .1182
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
          highbp |Odds ratio   Std. dev.     MCSE     Median  [95% cred. interval]
    -------------+----------------------------------------------------------------
             trt |  3.563619   7.428146   .231171   1.936356   .3569163   15.21196
           _cons |  .1385592   .1091119   .003173   .1106197   .0158556   .4079769
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    . 
    . // Step 2: Save posterior mean and variance for parameters trt and _cons
    . bayesstats summary // (you may specify particular parameters here if desired)
    
    Posterior summary statistics                      MCMC sample size =    10,000
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
          highbp |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
    -------------+----------------------------------------------------------------
             trt |   .715502   .9554699   .030759   .6608079  -1.030254   2.722082
           _cons | -2.280574   .8439397   .027449  -2.201657  -4.144231  -.8965447
    ------------------------------------------------------------------------------
    
    . 
    . local trt_mean = r(summary)[1, 1]        // posterior mean of trt
    
    . local trt_variance = r(summary)[1, 2]^2  // square of posterior std dev for trt
    
    . local cons_mean = r(summary)[2, 1]       // posterior mean of _cons
    
    . local cons_variance = r(summary)[2, 2]^2 // square of posterior std dev for _cons
    
    . 
    . display `trt_mean'
    .71550199
    
    . display `trt_variance'
    .91292272
    
    . display `cons_mean'
    -2.2805745
    
    . display `cons_variance'
    .71223416
    
    . 
    . // ##################################
    . 
    . // Step 3: Fit Bayesian model in data=1, using posterior results from Step 1 as priors
    . bayes, prior({highbp:trt}, normal(`trt_mean', `trt_variance')) ///
    >        prior({highbp:_cons}, normal(`cons_mean', `cons_variance')) : logit highbp trt if data == 1, or
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood: 
      highbp ~ logit(xb_highbp)
    
    Priors: 
        {highbp:trt} ~ normal(.7155019937060871,.9129227171295571)             (1)
      {highbp:_cons} ~ normal(-2.280574490467969,.71223415887348)              (1)
    ------------------------------------------------------------------------------
    (1) Parameters are elements of the linear form xb_highbp.
    
    Bayesian logistic regression                     MCMC iterations  =     12,500
    Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         50
                                                     Acceptance rate  =      .2595
                                                     Efficiency:  min =      .1442
                                                                  avg =      .1445
    Log marginal-likelihood = -14.502944                          max =      .1447
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
          highbp |Odds ratio   Std. dev.     MCSE     Median  [95% cred. interval]
    -------------+----------------------------------------------------------------
             trt |  1.666449    1.26166   .033219    1.34577   .3618972   4.878666
           _cons |  .0702522   .0396812   .001043   .0629117   .0192435   .1638918
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    . 
    . // ##################################
    . 
    . // Step 4: Compare with Bayesian analysis on full dataset (data = 0 + data = 1)
    . // This model uses uninformative priors, as in Step 1
    . bayes, prior({highbp:trt}, normal(0, 10000)) ///
    >        prior({highbp:_cons}, normal(0, 10000)) : logit highbp trt, or
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood: 
      highbp ~ logit(xb_highbp)
    
    Prior: 
      {highbp:trt _cons} ~ normal(0,10000)                                     (1)
    ------------------------------------------------------------------------------
    (1) Parameters are elements of the linear form xb_highbp.
    
    Bayesian logistic regression                     MCMC iterations  =     12,500
    Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =        100
                                                     Acceptance rate  =      .2048
                                                     Efficiency:  min =      .1112
                                                                  avg =      .1189
    Log marginal-likelihood = -45.624597                          max =      .1266
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
          highbp |Odds ratio   Std. dev.     MCSE     Median  [95% cred. interval]
    -------------+----------------------------------------------------------------
             trt |  2.556934   2.864324   .085913   1.790165   .4753463    8.62226
           _cons |  .0893983   .0540235   .001518   .0773762   .0178525   .2237933
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    . 
    . // Step 5: Compare with frequentist approach
    . logit highbp trt, or
    
    Iteration 0:  Log likelihood = -34.651534  
    Iteration 1:  Log likelihood = -34.393309  
    Iteration 2:  Log likelihood = -34.390499  
    Iteration 3:  Log likelihood = -34.390498  
    
    Logistic regression                                     Number of obs =    100
                                                            LR chi2(1)    =   0.52
                                                            Prob > chi2   = 0.4700
    Log likelihood = -34.390498                             Pseudo R2     = 0.0075
    
    ------------------------------------------------------------------------------
          highbp | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             trt |   1.648484   1.172543     0.70   0.482     .4089194    6.645567
           _cons |   .0882353   .0531426    -4.03   0.000     .0271005    .2872809
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    

    Thank you for your help!