  • How to replicate this simple (Bayesian) calculation in Stata


    Dear Forum Members,

    I came across a quite simple example of Bayesian analysis (from the book Think Bayes, by Allen Downey, written for Python users) which can be done directly, I mean, just by using Bayes' theorem.

    In short, there are 2 baskets with cookies: basket 1 holds 30 vanilla cookies and 10 chocolate cookies, and basket 2 holds 20 vanilla cookies and 20 chocolate cookies.

    Below, I generate a dataset accordingly:

    Code:
    clear
    input basket str20 cookie freq
    1 vanilla 30
    1 chocolate 10
    2 vanilla 20
    2 chocolate 20
    end
    expand freq
    encode cookie, gen(vanilla)
    drop freq
    The question is: what is the probability that the cookie came from basket 1, given that I've got a vanilla cookie?

    Using Bayes' formula, we get: Posterior = (3/4 * 1/2) / (50/80) = 0.35/0.625 = 0.56 = 56%

    That said, I wish to perform the estimation in Stata.

    I tried hard with -bayesmh-, to no avail so far.

    Thank you in advance for any advice.

    Best regards,

    Marcos

  • #2
    Hi Marcos,

    The example serves to highlight that one may use Bayes' theorem directly to calculate the probability exactly. I'll take a shot at helping you out, though I am still a relative novice at Bayesian estimation. If someone finds a mistake, I will happily stand corrected.

    We can show that the posterior probability Pr(basket 1 | vanilla) may be computed as follows:

    Code:
    Pr(basket 1 | vanilla) = Pr(vanilla | basket 1) * Pr(basket 1) / Pr(vanilla)
    where Pr(vanilla) = Pr(vanilla | basket 1) * Pr(basket 1) + Pr(vanilla | basket 2) * Pr(basket 2), by the law of total probability
    = (30/40) * (40/80) / [(30/40)*(40/80) + (20/40)*(40/80)]
    = (3/4) * (1/2) / (5/8)
    = 3/5
    = 0.6
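
    As a quick arithmetic check, the same calculation can be evaluated directly in Stata with -display-:

    Code:
    . di (30/40)*(40/80) / ((30/40)*(40/80) + (20/40)*(40/80))
    .6
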
    A shortcut is to model the outcome as an indicator for basket 1, with vanilla as the single covariate. This is convenient for estimation, but it hides the learning process.

    Code:
    clear
    input basket str20 cookie freq
    1 vanilla 30
    1 chocolate 10
    2 vanilla 20
    2 chocolate 20
    end
    expand freq
    gen byte vanilla = cookie=="vanilla"
    drop freq
    gen byte b1 = basket==1
    
    bayes, rseed(2141): logit b1 i.vanilla
    
    (output omitted)
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
              b1 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
       1.vanilla |  1.131882   .4960571   .014503   1.116939    .184443   2.092777
           _cons | -.7280998   .3917349   .012005  -.7134492   -1.48863   .0107272
    ------------------------------------------------------------------------------
    Computing the linear prediction for vanilla cookies, then back-transforming gives the expected 0.60, even with default priors and sampling error.

    However, the above code doesn't directly reflect how the problem was posed, which was as two competing hypothetical models (or baskets of cookies). We can construct that model with -bayesmh-, where the type of cookie drawn from each basket is modeled as a Bernoulli random variable, each basket with its own probability-of-vanilla parameter, -theta-. I have placed flat priors on each of those parameters for no reason other than that they assume little about the prior distribution of cookies in baskets. The -block- statements make the estimation faster: the two models are orthogonal to one another, so their parameters can be sampled independently.

    Code:
    . bayesmh (M1: vanilla if basket==1, likelihood(dbernoulli({theta1}))) (M2: vanilla if basket==2, likelihood(dbernoulli({theta2}))), prior({theta1}, beta(1, 1)) prior({theta2}, beta(1, 1)) block({theta1}) block({theta2}) blocksummary
    
     (output omitted)
    
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
          theta1 |   .738461   .0648749   .001368   .7434887   .6021477   .8538606
          theta2 |  .5008934   .0767359   .001682   .5017964   .3494025    .648646
    ------------------------------------------------------------------------------
    The posterior means are reasonably close to their true values, which is reassuring that we have specified the right model. Now we can return to your original question, which was the conditional probability that a cookie came from basket 1, given that you pulled a vanilla cookie. To answer it, take the posterior draws and plug them directly into Bayes' theorem.

    Code:
    . tempfile memhold
    . bayesmh (M1: vanilla if basket==1, likelihood(dbernoulli({theta1}))) (M2: vanilla if basket==2, likelihood(dbernoulli({theta2}))), prior({theta1}, beta(1, 1)) prior({theta2}, beta(1, 1)) block({theta1}) block({theta2}) blocksummary saving(`memhold', replace)
    . est sto Model
    . frame create Res
    . frame change Res
    . use `memhold'
    . expand _freq
    . drop _*
    . rename (eq0_p*) (theta*)
    
    . gen double myprob = theta1 / (theta1 + theta2) // after factoring out common value for Pr(basket 1)=1/2
    . label var myprob "Pr[basket 1 | vanilla]"
    . summ myprob
    
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          myprob |     10,000    .5970368    .0431613   .4500239   .7766708
    And this does recover the expected 0.60 probability.
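
    Because the dataset of draws holds one value of myprob per retained MCMC iteration, the whole posterior is available for further summaries. As a minimal sketch, an equal-tailed 95% credible interval can be read off the draws with -centile-:

    Code:
    centile myprob, centile(2.5 97.5)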



    • #3
      Leonardo Guizzetti: thank you very much for the reply and for the insightful comments.

      Thank you also for noticing my mistyping (0.35 where it should be 0.375, which wrongly gave 0.56 instead of 0.60).

      I just have a question with regard to your comment:

      1)

      Computing the linear prediction for vanilla cookies, then back-transforming gives the expected 0.60, even with default priors and sampling error.
      By this, do you mean something under the Bayesian umbrella? If so, please let me know the command.

      Or just this, under the frequentist approach:

      Code:
      quiet logit basket1 i.vanilla
      margins vanilla
      
      Adjusted predictions                            Number of obs     =         80
      Model VCE    : OIM
      
      Expression   : Pr(basket1), predict()
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           vanilla |
                0  |   .3333333   .0860663     3.87   0.000     .1646465    .5020202
                1  |         .6    .069282     8.66   0.000     .4642097    .7357903
      ------------------------------------------------------------------------------
      Under the frequentist umbrella, I still get the proportion of 60%, but no Bayesian output.

      The same happens with the -bayesmh- command under the tempfile code.

      Just a small note: I'm still using Stata 15.1 (I should have mentioned this beforehand). Unfortunately, I cannot use the -frames- command for now.

      Thank you again.
      Last edited by Marcos Almeida; 08 Mar 2020, 07:19.
      Best regards,

      Marcos



      • #4
        Hi Marcos,

        Certainly I can clarify what I have written. The first code block you ask about (my shortcut) gives the regression table as follows:

        Code:
        ------------------------------------------------------------------------------
                     |                                                Equal-tailed
                  b1 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
        -------------+----------------------------------------------------------------
           1.vanilla |  1.131882   .4960571   .014503   1.116939    .184443   2.092777
               _cons | -.7280998   .3917349   .012005  -.7134492   -1.48863   .0107272
        ------------------------------------------------------------------------------
        This is logistic regression output, so the coefficients are on the log-odds scale and are interpreted just as if I had used -logit- (not -logistic-, which reports exp(b)); they may also be linearly combined in the same way. For concreteness, to get the probability that a vanilla cookie came from basket 1:

        Code:
        * add the posterior mean of _cons to that of 1.vanilla, yielding the linear prediction on the log-odds scale for vanilla cookies, then back-transform to the probability scale using the inverse-logit function.
        . di invlogit(-0.7280998 + 1.131882)
        .59959603
        Unfortunately, parameters from Bayesian estimation in Stata do not have the convenient -_b[]- notation, because there is no single estimate (there are many draws resulting from the MCMC chains). The means displayed in the table above are returned in the -e(means)- matrix, so you could access them programmatically if that suits your purposes. Note also that -margins- will not work after Bayesian estimation commands, though you have already recovered the same information in this toy example.
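
        As a minimal sketch of that programmatic route (assuming the -bayes: logit- fit above is the active estimation, and that the columns of -e(means)- are labeled b1:1.vanilla and b1:_cons, as in the table header):

        Code:
        * pull the posterior means out of e(means) by column name, then back-transform
        matrix M = e(means)
        di invlogit(M[1, colnumb(M, "b1:_cons")] + M[1, colnumb(M, "b1:1.vanilla")])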

        Second, let's look at the tempfile block. It can easily be modified to work without frames; I will do so here.

        Code:
        set seed 586575
        tempfile obsdata mcmcdata
        bayesmh (M1: vanilla if basket==1, likelihood(dbernoulli({theta1}))) (M2: vanilla if basket==2, likelihood(dbernoulli({theta2}))), prior({theta1}, beta(1, 1)) prior({theta2}, beta(1, 1)) block({theta1}) block({theta2}) blocksummary saving(`mcmcdata', replace)
        
        ------------------------------------------------------------------------------
                     |                                                Equal-tailed
                     |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
        -------------+----------------------------------------------------------------
              theta1 |  .7379666   .0667479    .00152   .7401983   .5977132   .8570518
              theta2 |  .4971312   .0756229   .001542   .4974695   .3508224   .6439335
        ------------------------------------------------------------------------------
        
        
        est sto Model
        save `obsdata', replace
        
        clear
        use `mcmcdata'
        expand _freq
        drop _*
        rename (eq0_p*) (theta*)
        
        gen double myprob = theta1 / (theta1 + theta2) // after factoring out common value for Pr(basket 1)=1/2
        label var myprob "Pr[basket 1 | vanilla]"
        summ myprob
        There's one key difference with the model parameterized this way using -bayesmh-. The cookies pulled from each basket are modeled as independent Bernoulli trials, with each basket having its own probability of being vanilla (theta1 and theta2). These are already probabilities, so there is no need to manipulate them further to understand their meaning. In this case, they are reasonably close to the maximum likelihood estimates (75% and 50%, respectively). Because they are already probabilities, I can plug each into Bayes' theorem to give a posterior distribution for Pr(basket 1 | vanilla). With these estimates, you now have a distribution that you can summarize as you would any other Bayesian parameter (find credible intervals, estimate the mean/median/mode, etc.).
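
        As a hedged aside, the same posterior summary can be obtained without exporting the MCMC dataset at all, by handing the expression to -bayesstats summary- immediately after the -bayesmh- fit:

        Code:
        * summarize Pr(basket 1 | vanilla) directly from the active fit;
        * the common Pr(basket 1) = 1/2 cancels out of the ratio
        bayesstats summary (pB1: {theta1}/({theta1} + {theta2}))
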
        Last edited by Leonardo Guizzetti; 08 Mar 2020, 07:56.



        • #5
          Leonardo Guizzetti Thank you very much for the clarification, the additional remarks and the code!
          Best regards,

          Marcos



          • #6
            This is just a follow-up to the discussion. Now that I have learned how to get the posterior for this simple cookie example, I'm still struggling to get the CrIs, acceptance rate, etc.

            For example, if I sample from the posterior distribution of the theta (concerning vanilla cookies), I get this:

            Code:
            . bayes, rseed(2141) saving(testing2): logit basket1 i.vanilla
              
            Burn-in ...
            Simulation ...
            
            file testing2.dta saved
            
            Model summary
            ------------------------------------------------------------------------------
            Likelihood:
              basket1 ~ logit(xb_basket1)
            
            Prior:
              {basket1:1.vanilla _cons} ~ normal(0,10000)                              (1)
            ------------------------------------------------------------------------------
            (1) Parameters are elements of the linear form xb_basket1.
            
            Bayesian logistic regression                     MCMC iterations  =     12,500
            Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                             MCMC sample size =     10,000
                                                             Number of obs    =         80
                                                             Acceptance rate  =       .249
                                                             Efficiency:  min =      .1065
                                                                          avg =      .1117
            Log marginal likelihood = -64.108202                          max =       .117
             
            ------------------------------------------------------------------------------
                         |                                                Equal-tailed
                 basket1 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
            -------------+----------------------------------------------------------------
               1.vanilla |  1.131882   .4960571   .014503   1.116939    .184443   2.092777
                   _cons | -.7280998   .3917349   .012005  -.7134492   -1.48863   .0107272
            ------------------------------------------------------------------------------
            Note: Default priors are used for model parameters.
            
            * If I try to get the CrIs according to the formula, I get this:
            
            . di invlogit(-0.7280998 + 1.131882)
            .59959603
            
            . di "CRIS are = " invlogit(-1.48863 + 0.184443) "   and   "  invlogit(0.0107272 + 2.09277) "   respectively"
            CRIS are = .2134612   and   .89124262   respectively
            
            . * Therefore, quite a large CrI
            
            . * Checking what happens if I try to sample from the posterior distribution of theta:
            . clear
            
            . use "C:\Users\Marcos\Documents\Statistics\ExercĂ­cios Stata\testing1.dta"
            
            . bayes: regress eq1_p2
              
            Burn-in ...
            Simulation ...
            
            Model summary
            ------------------------------------------------------------------------------
            Likelihood:
              eq1_p2 ~ regress({eq1_p2:_cons},{sigma2})
            
            Priors:
              {eq1_p2:_cons} ~ normal(0,10000)
                    {sigma2} ~ igamma(.01,.01)
            ------------------------------------------------------------------------------
            
            Bayesian linear regression                       MCMC iterations  =     12,500
            Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                             MCMC sample size =     10,000
                                                             Number of obs    =      5,933
                                                             Acceptance rate  =      .4293
                                                             Efficiency:  min =      .1946
                                                                          avg =      .2143
            Log marginal likelihood =  7136.9868                          max =       .234
             
            ------------------------------------------------------------------------------
                         |                                                Equal-tailed
                         |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
            -------------+----------------------------------------------------------------
            eq1_p2       |
                   _cons |   .597613   .0009722   .000022   .5976115   .5957271   .5994736
            -------------+----------------------------------------------------------------
                  sigma2 |  .0052507    .000097   2.0e-06   .0052499   .0050616   .0054452
            ------------------------------------------------------------------------------
            Note: Default priors are used for model parameters.
            In short, in spite of getting approximately the "correct" proportion (around 0.6), the credible intervals are rather short and seemingly wrong (yes, they come from the MCMC sampling; by "wrong", I mean the model's CrIs).
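
            As a minimal side sketch (assuming the -bayes, rseed(2141): logit basket1 i.vanilla- fit is still the active estimation), a credible interval for the back-transformed quantity itself can be requested from -bayesstats summary- with an expression, rather than by combining the endpoints of the coefficient intervals:

            Code:
            * ask for the posterior of the transformed quantity directly
            bayesstats summary (pB1: invlogit({basket1:_cons} + {basket1:1.vanilla}))
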


            On account of these issues, I decided to go a different way. I used -regress- with no baseline category for vanilla plus the -nocons- option:

            Code:
            . bayes, rseed(2141) : regress basket1 ibn.vanilla, nocons
              
            Burn-in ...
            Simulation ...
            
            Model summary
            ------------------------------------------------------------------------------
            Likelihood:
              basket1 ~ regress(xb_basket1,{sigma2})
            
            Priors:
              {basket1:i.vanilla} ~ normal(0,10000)                                    (1)
                         {sigma2} ~ igamma(.01,.01)
            ------------------------------------------------------------------------------
            (1) Parameters are elements of the linear form xb_basket1.
            
            Bayesian linear regression                       MCMC iterations  =     12,500
            Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                             MCMC sample size =     10,000
                                                             Number of obs    =         80
                                                             Acceptance rate  =      .3477
                                                             Efficiency:  min =     .09442
                                                                          avg =      .1274
            Log marginal likelihood = -75.067316                          max =      .1823
             
            ------------------------------------------------------------------------------
                         |                                                Equal-tailed
                         |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
            -------------+----------------------------------------------------------------
            basket1      |
                 vanilla |
                      0  |  .3311991   .0905553   .002947   .3287412   .1606548   .5059707
                      1  |  .5968769   .0696866   .002146   .5954611   .4584318    .731114
            -------------+----------------------------------------------------------------
                  sigma2 |  .2457587   .0409932    .00096   .2410068   .1786182   .3390889
            ------------------------------------------------------------------------------
            Note: Default priors are used for model parameters.
            By doing so, the CrIs are similar to those from the margins approach under the frequentist frame (shown in #3); hence we may have produced a complete (and probably correct) Bayesian output (not only the mean proportion, but also the median, CrIs, acceptance rate, etc.).

            To end, I wish to ask whether this can be taken as an appropriate approach, or not.
            Last edited by Marcos Almeida; 09 Mar 2020, 05:59.
            Best regards,

            Marcos



            • #7
              By the way, if we add Gibbs sampling, we get practically the same results shown in #3 (under the frequentist frame). Furthermore, the acceptance rate as well as the efficiency reach quite high values:

              Code:
              . bayes, rseed(2141) gibbs: regress basket1 ibn.vanilla, nocons
                
              Burn-in ...
              Simulation ...
              
              Model summary
              ------------------------------------------------------------------------------
              Likelihood:
                basket1 ~ normal(xb_basket1,{sigma2})
              
              Priors:
                {basket1:i.vanilla} ~ normal(0,10000)                                    (1)
                           {sigma2} ~ igamma(.01,.01)
              ------------------------------------------------------------------------------
              (1) Parameters are elements of the linear form xb_basket1.
              
              Bayesian linear regression                       MCMC iterations  =     12,500
              Gibbs sampling                                   Burn-in          =      2,500
                                                               MCMC sample size =     10,000
                                                               Number of obs    =         80
                                                               Acceptance rate  =          1
                                                               Efficiency:  min =      .9543
                                                                            avg =      .9848
              Log marginal likelihood = -75.067285                          max =          1
               
              ------------------------------------------------------------------------------
                           |                                                Equal-tailed
                           |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
              -------------+----------------------------------------------------------------
              basket1      |
                   vanilla |
                        0  |  .3340963   .0908025   .000908   .3336246   .1555174   .5137073
                        1  |  .6004836   .0701764   .000702   .6001525   .4622256    .738848
              -------------+----------------------------------------------------------------
                    sigma2 |  .2460556   .0404214   .000414   .2419893    .179176    .337767
              ------------------------------------------------------------------------------
              Note: Default priors are used for model parameters.
              Last edited by Marcos Almeida; 09 Mar 2020, 06:07.
              Best regards,

              Marcos



              • #8
                Hi Marcos,

                I think there is a misunderstanding that I may have accidentally helped create. With this Bayesian model, we have departed from the exact application of Bayes' theorem (in which there is precisely one answer for one set of observations and problem setup).

                Instead, we moved into the realm of simulation, in which we infer about the model from the observed data and a general idea of the model structure (two distinct Bernoulli random processes that generate vanilla or chocolate cookies). In doing so, we had to assign prior information (priors) to the values of the model parameters (theta1 and theta2). The assumption I made was a flat prior over [0,1], the support of the Bernoulli parameter, which computationally is expressed as a Beta(1,1) distribution. The Beta distribution is used for two very convenient reasons: first, the Beta prior is conjugate to the Bernoulli likelihood, which makes the math convenient to write out; second, Beta(1,1) gives a uniform prior over the support range [0,1].

                If one were to solve for the posterior by exact mathematical analysis, in a one-basket world, the givens are:
                * N identical Bernoulli trials with x successes -- say 30 vanilla and 10 chocolate cookies from basket 1
                * a Beta(a, b) prior on theta -- say Beta(1,1)
                * a posterior distribution given by Beta(x + a, N - x + b)
                (A proof of this calculation may be found in many places; the one I'm referring to comes from Kruschke's Doing Bayesian Data Analysis, 2nd edition.)

                What this means is that even though we started with a vague, low-information prior, it has influenced the posterior by essentially adding pseudo-observations to our data: in this case, one pseudo-observation to the vanilla cookies and one to the chocolate ones. The mean of this Beta(31, 11) posterior is a/(a+b) = 31/42 = 0.7381 (approx.). This is the value recovered in #4 for theta1 (again, up to sampling error), and it closely agrees with the exact proportion of 0.75 (30/40 vanilla) used in Bayes' theorem.
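
                As a minimal sketch, that analytic posterior can be summarized in Stata with -invibeta()-, the inverse cumulative Beta distribution:

                Code:
                * mean and equal-tailed 95% interval of the Beta(31, 11) posterior
                di "posterior mean = " 31/42
                di "95% CrI = [" invibeta(31, 11, 0.025) ", " invibeta(31, 11, 0.975) "]"

                These values should line up with the simulated summary of theta1 in #2 (mean .7385, equal-tailed CrI [.602, .854]).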

                With more data, say 10 times as many cookies, the data information will overpower the prior information: the posterior would be Beta(301, 101), with mean 301/402 ~= 0.75, essentially the maximum likelihood estimate. This is a good thing, and it is seen as a huge benefit of Bayesian analysis over frequentist/likelihoodist analysis, because it means that informative data can change prior beliefs. In this case, more information causes the results to converge asymptotically.

                In a general sense, there's no reason for Bayesian and frequentist results to agree exactly, or even closely. It depends heavily on the given information, the priors, and the likelihood function.



                • #9
                  Some additional comments:

                  * Gibbs sampling is generally more efficient than Metropolis-Hastings, though this depends on the shape of the posterior.

                  * the approach in #6 using -bayes, rseed(2141): regress basket1 ibn.vanilla, nocons- is not strictly appropriate, for the same reasons one would not use OLS regression for binary outcomes in the frequentist paradigm. The model happens to be well enough behaved because it's a toy model with probabilities close to the middle of their range. The expected value here also agrees with 60% because the prior adds a mean-zero value to the data and does not add pseudo-observations the way the Beta-Binomial model does. That doesn't make the model more correct, just differently wrong, per se. (The Beta-Binomial model is wrong in the sense that we have assumed a Bernoulli data-generating mechanism, which is not true of how one makes cookies, but it is a useful thought exercise.)

                  * to see the influence of the prior, try changing the priors under the -bayesmh- approach to, say, Beta(20, 2) and Beta(5, 15) for theta1 and theta2, respectively, and see what that does to your estimates of the thetas (a sketch follows below).
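
                  A minimal sketch of that sensitivity check, reusing the two-likelihood model from #2 with only the priors swapped:

                  Code:
                  * same model as before, but with informative priors on the thetas
                  bayesmh (M1: vanilla if basket==1, likelihood(dbernoulli({theta1})))   ///
                          (M2: vanilla if basket==2, likelihood(dbernoulli({theta2}))),  ///
                          prior({theta1}, beta(20, 2)) prior({theta2}, beta(5, 15))      ///
                          block({theta1}) block({theta2})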



                  • #10
                    Leonardo Guizzetti Thank you again for the reply and additional comments in #8 and #9.

                    Best regards,

                    Marcos



                    • #11
                      You're welcome, Marcos Almeida.

