  • How to replicate this simple (Bayesian) calculation in Stata


    Dear Forum Members,

    I came across a quite simple example of Bayesian analysis (from the book Think Bayes, by Allen Downey, written for Python users) which can be done directly, I mean, just by using Bayes' theorem.

    In short, there are 2 baskets with cookies: basket 1 holds 30 vanilla cookies and 10 chocolate cookies, and basket 2 holds 20 vanilla cookies and 20 chocolate cookies.

    Below, I generate a dataset accordingly:

    Code:
    clear
    input basket str20 cookie freq
    1 vanilla 30
    1 chocolate 10
    2 vanilla 20
    2 chocolate 20
    end
    expand freq
    encode cookie, gen(vanilla)
    drop freq
    The question is: what is the probability that the cookie came from basket 1, given that I've got a vanilla cookie?

    Using Bayes' formula, we get: Posterior = (3/4 * 1/2) / (50/80) = 0.35/0.625 = 0.56 = 56%

    That said, I wish to perform the estimation in Stata.

    I tried hard with -bayesmh-, to no avail so far.

    Thank you in advance for any advice.

    Best regards,

    Marcos

  • #2
    Hi Marcos,

    The example serves to highlight that one may use Bayes' theorem directly to calculate the probability exactly. I'll take a shot at helping you out, though I am still a relative novice at Bayesian estimation. If someone finds a mistake, I will happily stand corrected.

    We can show that the posterior probability Pr(basket 1 | vanilla) may be computed as follows:

    Code:
    Pr(basket 1 | vanilla) = Pr(vanilla | basket 1) * Pr(basket 1) / Pr(vanilla)
    where Pr(vanilla) = Pr(vanilla | basket 1) * Pr(basket 1) + Pr(vanilla | basket 2) * Pr(basket 2), by the law of total probability
    = (30/40) * (40/80) / [(30/40)*(40/80) + (20/40)*(40/80)]
    = (3/4) * (1/2) / (5/8)
    = 3/5
    = 0.6
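
    As a quick arithmetic check, the same calculation can be evaluated directly in Stata with -display-:

    Code:
    . di (30/40)*(40/80) / ((30/40)*(40/80) + (20/40)*(40/80))
    .6
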
    A shortcut is to model the outcome as an indicator for basket 1, with vanilla as the single covariate. This is convenient for estimation, but it hides the learning process.

    Code:
    clear
    input basket str20 cookie freq
    1 vanilla 30
    1 chocolate 10
    2 vanilla 20
    2 chocolate 20
    end
    expand freq
    gen byte vanilla = cookie=="vanilla"
    drop freq
    gen byte b1 = basket==1
    
    bayes, rseed(2141): logit b1 i.vanilla
    
    (output omitted)
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
              b1 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
       1.vanilla |  1.131882   .4960571   .014503   1.116939    .184443   2.092777
           _cons | -.7280998   .3917349   .012005  -.7134492   -1.48863   .0107272
    ------------------------------------------------------------------------------
    Computing the linear prediction for vanilla cookies, then back-transforming gives the expected 0.60, even with default priors and sampling error.

    However, the above code doesn't directly reflect how the problem was posed, which was as two competing hypothetical models (or baskets of cookies). We can construct that model with -bayesmh-, where the type of cookie drawn from each basket is modeled as a Bernoulli random variable, each basket with its own probability-of-vanilla parameter, -theta-. I have placed flat priors on each of those parameters for no reason other than that they assume little about the prior distribution of cookies in baskets. The -block- statements make the estimation faster: the two models are orthogonal to one another, so their parameters can be sampled independently.

    Code:
    . bayesmh (M1: vanilla if basket==1, likelihood(dbernoulli({theta1}))) (M2: vanilla if basket==2, likelihood(dbernoulli({theta2}))), prior({theta1}, beta(1, 1)) prior({theta2}, beta(1, 1)) block({theta1}) block({theta2}) blocksummary
    
     (output omitted)
    
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
          theta1 |   .738461   .0648749   .001368   .7434887   .6021477   .8538606
          theta2 |  .5008934   .0767359   .001682   .5017964   .3494025    .648646
    ------------------------------------------------------------------------------
    The posterior means are reasonably close to their true values, which is reassuring that we have specified the right model. Now we can return to your original question, which was the conditional probability that a cookie came from basket 1, given that you pulled a vanilla cookie. To answer it, take the posterior draws and plug them directly into Bayes' theorem.

    Code:
    . tempfile memhold
    . bayesmh (M1: vanilla if basket==1, likelihood(dbernoulli({theta1}))) (M2: vanilla if basket==2, likelihood(dbernoulli({theta2}))), prior({theta1}, beta(1, 1)) prior({theta2}, beta(1, 1)) block({theta1}) block({theta2}) blocksummary saving(`memhold', replace)
    . est sto Model
    . frame create Res
    . frame change Res
    . use `memhold'
    . expand _freq
    . drop _*
    . rename (eq0_p*) (theta*)
    
    . gen double myprob = theta1 / (theta1 + theta2) // after factoring out common value for Pr(basket 1)=1/2
    . label var myprob "Pr[basket 1 | vanilla]"
    . summ myprob
    
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          myprob |     10,000    .5970368    .0431613   .4500239   .7766708
    And this does recover the expected 0.60 probability.
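
    Because the dataset of draws holds one value of myprob per retained MCMC iteration, the whole posterior is available for further summaries. As a minimal sketch, an equal-tailed 95% credible interval can be read off the draws with -centile-:

    Code:
    centile myprob, centile(2.5 97.5)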



    • #3
      Leonardo Guizzetti: thank you very much for the reply and for the insightful comments.

      Thank you also for noticing my mistyping (0.35 where it should be 0.375, which wrongly gave 0.56 instead of 0.60).

      I just have a question with regard to your comment:

      1)

      Computing the linear prediction for vanilla cookies, then back-transforming gives the expected 0.60, even with default priors and sampling error.
      By this, do you mean something under the Bayesian umbrella? If so, please let me know the command.

      Or just this, under the frequentist approach:

      Code:
      quiet logit basket1 i.vanilla
      margins vanilla
      
      Adjusted predictions                            Number of obs     =         80
      Model VCE    : OIM
      
      Expression   : Pr(basket1), predict()
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           vanilla |
                0  |   .3333333   .0860663     3.87   0.000     .1646465    .5020202
                1  |         .6    .069282     8.66   0.000     .4642097    .7357903
      ------------------------------------------------------------------------------
      Under the frequentist umbrella, I still get the proportion of 60%, but no Bayesian output.

      The same happens with the -bayesmh- command under the tempfile code.

      Just a small note: I'm still using Stata 15.1 (I should have mentioned this beforehand). Unfortunately, I cannot use the -frames- command for now.

      Thank you again.
      Last edited by Marcos Almeida; 08 Mar 2020, 07:19.
      Best regards,

      Marcos



      • #4
        Hi Marcos,

        Certainly I can clarify what I have written. The first code block you ask about (my shortcut) gives the regression table as follows:

        Code:
        ------------------------------------------------------------------------------
                     |                                                Equal-tailed
                  b1 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
        -------------+----------------------------------------------------------------
           1.vanilla |  1.131882   .4960571   .014503   1.116939    .184443   2.092777
               _cons | -.7280998   .3917349   .012005  -.7134492   -1.48863   .0107272
        ------------------------------------------------------------------------------
        This is logistic regression output, so the coefficients are on the log-odds scale and are interpreted just as if I had used -logit- (not -logistic-, which reports exp(b)); they may also be linearly combined in the same way. For concreteness, to get the probability that a vanilla cookie came from basket 1:

        Code:
        * add the posterior mean of _cons to that of 1.vanilla, yielding the linear prediction on the log-odds scale for vanilla cookies, then back-transform to the probability scale using the inverse-logit function.
        . di invlogit(-0.7280998 + 1.131882)
        .59959603
        Unfortunately, parameters from Bayesian estimation in Stata do not have the convenient -_b[]- notation, because there is no single estimate (there are many draws resulting from the MCMC chains). The means displayed in the table above are returned in the -e(means)- matrix, so you could access them programmatically if that suits your purposes. Note also that -margins- will not work after Bayesian estimation commands, though you have already recovered the same information in this toy example.
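
        As a minimal sketch of that programmatic route (assuming the -bayes: logit- fit above is the active estimation, and that the columns of -e(means)- are labeled b1:1.vanilla and b1:_cons, as in the table header):

        Code:
        * pull the posterior means out of e(means) by column name, then back-transform
        matrix M = e(means)
        di invlogit(M[1, colnumb(M, "b1:_cons")] + M[1, colnumb(M, "b1:1.vanilla")])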

        Second, let's look at the tempfile block. It can easily be modified to work without frames; I will do so here.

        Code:
        set seed 586575
        tempfile obsdata mcmcdata
        bayesmh (M1: vanilla if basket==1, likelihood(dbernoulli({theta1}))) (M2: vanilla if basket==2, likelihood(dbernoulli({theta2}))), prior({theta1}, beta(1, 1)) prior({theta2}, beta(1, 1)) block({theta1}) block({theta2}) blocksummary saving(`mcmcdata', replace)
        
        ------------------------------------------------------------------------------
                     |                                                Equal-tailed
                     |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
        -------------+----------------------------------------------------------------
              theta1 |  .7379666   .0667479    .00152   .7401983   .5977132   .8570518
              theta2 |  .4971312   .0756229   .001542   .4974695   .3508224   .6439335
        ------------------------------------------------------------------------------
        
        
        est sto Model
        save `obsdata', replace
        
        clear
        use `mcmcdata'
        expand _freq
        drop _*
        rename (eq0_p*) (theta*)
        
        gen double myprob = theta1 / (theta1 + theta2) // after factoring out common value for Pr(basket 1)=1/2
        label var myprob "Pr[basket 1 | vanilla]"
        summ myprob
        There's one key difference with the model parameterized this way using -bayesmh-. The cookies pulled from each basket are modeled as independent Bernoulli trials, with each basket having its own probability of being vanilla (theta1 and theta2). These are already probabilities, so there is no need to manipulate them further to understand their meaning. In this case, they are reasonably close to the maximum likelihood estimates (75% and 50%, respectively). Because they are already probabilities, I can plug each into Bayes' theorem to give a posterior distribution for Pr(basket 1 | vanilla). With these estimates, you now have a distribution that you can summarize as you would any other Bayesian parameter (find credible intervals, estimate the mean/median/mode, etc.).
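
        As a hedged aside, the same posterior summary can be obtained without exporting the MCMC dataset at all, by handing the expression to -bayesstats summary- immediately after the -bayesmh- fit:

        Code:
        * summarize Pr(basket 1 | vanilla) directly from the active fit;
        * the common Pr(basket 1) = 1/2 cancels out of the ratio
        bayesstats summary (pB1: {theta1}/({theta1} + {theta2}))
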
        Last edited by Leonardo Guizzetti; 08 Mar 2020, 07:56.



        • #5
          Leonardo Guizzetti Thank you very much for the clarification, the additional remarks and the code!
          Best regards,

          Marcos



          • #6
            This is just a follow-up to the discussion. Now that I have learned how to get the posterior for this simple cookie example, I'm still struggling to get the CrIs, acceptance rate, etc.

            For example, if I sample from the posterior distribution of the theta (concerning vanilla cookies), I get this:

            Code:
            . bayes, rseed(2141) saving(testing2): logit basket1 i.vanilla
              
            Burn-in ...
            Simulation ...
            
            file testing2.dta saved
            
            Model summary
            ------------------------------------------------------------------------------
            Likelihood:
              basket1 ~ logit(xb_basket1)
            
            Prior:
              {basket1:1.vanilla _cons} ~ normal(0,10000)                              (1)
            ------------------------------------------------------------------------------
            (1) Parameters are elements of the linear form xb_basket1.
            
            Bayesian logistic regression                     MCMC iterations  =     12,500
            Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                             MCMC sample size =     10,000
                                                             Number of obs    =         80
                                                             Acceptance rate  =       .249
                                                             Efficiency:  min =      .1065
                                                                          avg =      .1117
            Log marginal likelihood = -64.108202                          max =       .117
             
            ------------------------------------------------------------------------------
                         |                                                Equal-tailed
                 basket1 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
            -------------+----------------------------------------------------------------
               1.vanilla |  1.131882   .4960571   .014503   1.116939    .184443   2.092777
                   _cons | -.7280998   .3917349   .012005  -.7134492   -1.48863   .0107272
            ------------------------------------------------------------------------------
            Note: Default priors are used for model parameters.
            
            * If I try to get the CrIs according to the formula, I get this:
            
            . di invlogit(-0.7280998 + 1.131882)
            .59959603
            
            . di "CRIS are = " invlogit(-1.48863 + 0.184443) "   and   "  invlogit(0.0107272 + 2.09277) "   respectively"
            CRIS are = .2134612   and   .89124262   respectively
            
            . * Therefore, quite a large CrI
            
            . * Checking what happens if I try to sample from the posterior distribution of theta:
            . clear
            
            . use "C:\Users\Marcos\Documents\Statistics\ExercĂ­cios Stata\testing1.dta"
            
            . bayes: regress eq1_p2
              
            Burn-in ...
            Simulation ...
            
            Model summary
            ------------------------------------------------------------------------------
            Likelihood:
              eq1_p2 ~ regress({eq1_p2:_cons},{sigma2})
            
            Priors:
              {eq1_p2:_cons} ~ normal(0,10000)
                    {sigma2} ~ igamma(.01,.01)
            ------------------------------------------------------------------------------
            
            Bayesian linear regression                       MCMC iterations  =     12,500
            Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                             MCMC sample size =     10,000
                                                             Number of obs    =      5,933
                                                             Acceptance rate  =      .4293
                                                             Efficiency:  min =      .1946
                                                                          avg =      .2143
            Log marginal likelihood =  7136.9868                          max =       .234
             
            ------------------------------------------------------------------------------
                         |                                                Equal-tailed
                         |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
            -------------+----------------------------------------------------------------
            eq1_p2       |
                   _cons |   .597613   .0009722   .000022   .5976115   .5957271   .5994736
            -------------+----------------------------------------------------------------
                  sigma2 |  .0052507    .000097   2.0e-06   .0052499   .0050616   .0054452
            ------------------------------------------------------------------------------
            Note: Default priors are used for model parameters.
            In short, in spite of getting approximately the "correct" proportion (around 0.6), the credible intervals are rather short and seemingly wrong (yes, they come from the MCMC sampling; by "wrong", I mean the model's CrIs).
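
            As a minimal side sketch (assuming the -bayes, rseed(2141): logit basket1 i.vanilla- fit is still the active estimation), a credible interval for the back-transformed quantity itself can be requested from -bayesstats summary- with an expression, rather than by combining the endpoints of the coefficient intervals:

            Code:
            * ask for the posterior of the transformed quantity directly
            bayesstats summary (pB1: invlogit({basket1:_cons} + {basket1:1.vanilla}))
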


            On account of these issues, I decided to go a different way. I used -regress- with no baseline category for vanilla plus the -nocons- option:

            Code:
            . bayes, rseed(2141) : regress basket1 ibn.vanilla, nocons
              
            Burn-in ...
            Simulation ...
            
            Model summary
            ------------------------------------------------------------------------------
            Likelihood:
              basket1 ~ regress(xb_basket1,{sigma2})
            
            Priors:
              {basket1:i.vanilla} ~ normal(0,10000)                                    (1)
                         {sigma2} ~ igamma(.01,.01)
            ------------------------------------------------------------------------------
            (1) Parameters are elements of the linear form xb_basket1.
            
            Bayesian linear regression                       MCMC iterations  =     12,500
            Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                             MCMC sample size =     10,000
                                                             Number of obs    =         80
                                                             Acceptance rate  =      .3477
                                                             Efficiency:  min =     .09442
                                                                          avg =      .1274
            Log marginal likelihood = -75.067316                          max =      .1823
             
            ------------------------------------------------------------------------------
                         |                                                Equal-tailed
                         |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
            -------------+----------------------------------------------------------------
            basket1      |
                 vanilla |
                      0  |  .3311991   .0905553   .002947   .3287412   .1606548   .5059707
                      1  |  .5968769   .0696866   .002146   .5954611   .4584318    .731114
            -------------+----------------------------------------------------------------
                  sigma2 |  .2457587   .0409932    .00096   .2410068   .1786182   .3390889
            ------------------------------------------------------------------------------
            Note: Default priors are used for model parameters.
            By doing so, the CrIs are similar to those from the margins approach under the frequentist frame (shown in #3); hence we may have produced a complete (and probably correct) Bayesian output (not only the mean proportion, but also the median, CrIs, acceptance rate, etc.).

            To end, I wish to ask whether this can be taken as an appropriate approach, or not.
            Last edited by Marcos Almeida; 09 Mar 2020, 05:59.
            Best regards,

            Marcos



            • #7
              By the way, if we add Gibbs sampling, we get practically the same results shown in #3 (under the frequentist frame). Furthermore, the acceptance rate as well as the efficiency reach quite high values:

              Code:
              . bayes, rseed(2141) gibbs: regress basket1 ibn.vanilla, nocons
                
              Burn-in ...
              Simulation ...
              
              Model summary
              ------------------------------------------------------------------------------
              Likelihood:
                basket1 ~ normal(xb_basket1,{sigma2})
              
              Priors:
                {basket1:i.vanilla} ~ normal(0,10000)                                    (1)
                           {sigma2} ~ igamma(.01,.01)
              ------------------------------------------------------------------------------
              (1) Parameters are elements of the linear form xb_basket1.
              
              Bayesian linear regression                       MCMC iterations  =     12,500
              Gibbs sampling                                   Burn-in          =      2,500
                                                               MCMC sample size =     10,000
                                                               Number of obs    =         80
                                                               Acceptance rate  =          1
                                                               Efficiency:  min =      .9543
                                                                            avg =      .9848
              Log marginal likelihood = -75.067285                          max =          1
               
              ------------------------------------------------------------------------------
                           |                                                Equal-tailed
                           |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
              -------------+----------------------------------------------------------------
              basket1      |
                   vanilla |
                        0  |  .3340963   .0908025   .000908   .3336246   .1555174   .5137073
                        1  |  .6004836   .0701764   .000702   .6001525   .4622256    .738848
              -------------+----------------------------------------------------------------
                    sigma2 |  .2460556   .0404214   .000414   .2419893    .179176    .337767
              ------------------------------------------------------------------------------
              Note: Default priors are used for model parameters.
              Last edited by Marcos Almeida; 09 Mar 2020, 06:07.
              Best regards,

              Marcos



              • #8
                Hi Marcos,

                I think there is a misunderstanding that I may have accidentally helped create. With this Bayesian model, we have departed from the exact application of Bayes' theorem (in which there is precisely one answer for one set of observations and problem setup).

                Instead, we moved into the realm of simulation, in which we infer about the model from the observed data and a general idea of the model structure (two distinct Bernoulli random processes that generate vanilla or chocolate cookies). In doing so, we had to assign prior information (priors) to the values of the model parameters (theta1 and theta2). The assumption I made was a flat prior over [0,1], the support of the Bernoulli parameter, which computationally is expressed as a Beta(1,1) distribution. The Beta distribution is used for two very convenient reasons: first, the Beta prior is conjugate to the Bernoulli likelihood, which makes the math convenient to write out; second, Beta(1,1) gives a uniform prior over the support range [0,1].

                If one were to solve for the posterior by exact mathematical analysis, in a one-basket world, the givens are:
                * N identical Bernoulli trials with x successes -- say 30 vanilla and 10 chocolate cookies from basket 1
                * a Beta(a, b) prior on theta -- say Beta(1,1)
                * a posterior distribution given by Beta(x + a, N - x + b)
                (A proof of this calculation may be found in many places; the one I'm referring to comes from Kruschke's Doing Bayesian Data Analysis, 2nd edition.)

                What this means is that even though we started with a vague, low-information prior, it has influenced the posterior by essentially adding pseudo-observations to our data: in this case, one pseudo-observation to the vanilla cookies and one to the chocolate ones. The mean of this Beta(31, 11) posterior is a/(a+b) = 31/42 = 0.7381 (approx.). This is the value recovered in #4 for theta1 (again, up to sampling error), and it closely agrees with the exact proportion of 0.75 (30/40 vanilla) used in Bayes' theorem.
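
                As a minimal sketch, that analytic posterior can be summarized in Stata with -invibeta()-, the inverse cumulative Beta distribution:

                Code:
                * mean and equal-tailed 95% interval of the Beta(31, 11) posterior
                di "posterior mean = " 31/42
                di "95% CrI = [" invibeta(31, 11, 0.025) ", " invibeta(31, 11, 0.975) "]"

                These values should line up with the simulated summary of theta1 in #2 (mean .7385, equal-tailed CrI [.602, .854]).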

                With more data, say 10 times as many cookies, the data information will overpower the prior information: the posterior would be Beta(301, 101), with mean 301/402 ~= 0.75, essentially the maximum likelihood estimate. This is a good thing, and it is seen as a huge benefit of Bayesian analysis over frequentist/likelihoodist analysis, because it means that informative data can change prior beliefs. In this case, more information causes the results to converge asymptotically.

                In a general sense, there's no reason for Bayesian and frequentist results to agree exactly, or even closely. It depends heavily on the given information, the priors, and the likelihood function.



                • #9
                  Some additional comments:

                  * Gibbs sampling is generally more efficient than Metropolis-Hastings, though this depends on the shape of the posterior.

                  * the approach in #6 using -bayes, rseed(2141): regress basket1 ibn.vanilla, nocons- is not strictly appropriate, for the same reasons one would not use OLS regression for binary outcomes in the frequentist paradigm. The model happens to be well enough behaved because it's a toy model with probabilities close to the middle of their range. The expected value here also agrees with 60% because the prior adds a mean-zero value to the data and does not add pseudo-observations the way the Beta-Binomial model does. That doesn't make the model more correct, just differently wrong, per se. (The Beta-Binomial model is wrong in the sense that we have assumed a Bernoulli data-generating mechanism, which is not true of how one makes cookies, but it is a useful thought exercise.)

                  * to see the influence of the prior, try changing the priors under the -bayesmh- approach to, say, Beta(20, 2) and Beta(5, 15) for theta1 and theta2, respectively, and see what that does to your estimates of the thetas (a sketch follows below).
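
                  A minimal sketch of that sensitivity check, reusing the two-likelihood model from #2 with only the priors swapped:

                  Code:
                  * same model as before, but with informative priors on the thetas
                  bayesmh (M1: vanilla if basket==1, likelihood(dbernoulli({theta1})))   ///
                          (M2: vanilla if basket==2, likelihood(dbernoulli({theta2}))),  ///
                          prior({theta1}, beta(20, 2)) prior({theta2}, beta(5, 15))      ///
                          block({theta1}) block({theta2})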



                  • #10
                    Leonardo Guizzetti Thank you again for the reply and additional comments in #8 and #9.

                    Best regards,

                    Marcos



                    • #11
                      You're welcome, Marcos Almeida.

