  • Bayesian approach: what to do when it differs from the frequentist approach?

    Dear Forum Members,

    I'm somewhat puzzled by the results from two down-to-earth models, one fitted with the - bayes - prefix and the other with the - regress - command.

    Below, a toy example:

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . su price mpg
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         74    6165.257    2949.496       3291      15906
             mpg |         74     21.2973    5.785503         12         41
    
    . regress price
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(0, 73)        =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |   635065396        73  8699525.97   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |   635065396        73  8699525.97   Root MSE        =    2949.5
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   6165.257   342.8719    17.98   0.000     5481.914      6848.6
    ------------------------------------------------------------------------------
    
    . regress mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(0, 73)        =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |  2443.45946        73  33.4720474   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |  2443.45946        73  33.4720474   Root MSE        =    5.7855
    
    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |    21.2973   .6725511    31.67   0.000      19.9569    22.63769
    ------------------------------------------------------------------------------
    
    . */ Let's check a Bayesian approach
    
    . bayes: regress mpg
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      mpg ~ regress({mpg:_cons},{sigma2})
    
    Priors:
      {mpg:_cons} ~ normal(0,10000)
         {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =       .421
                                                     Efficiency:  min =      .2193
                                                                  avg =      .2246
    Log marginal likelihood = -244.90098                          max =      .2299
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    mpg          |
           _cons |   21.3047   .6779059   .014139   21.30774   19.93476   22.61922
    -------------+----------------------------------------------------------------
          sigma2 |  34.26699   5.788511   .123607   33.69243   24.66202   47.59297
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    
    . */ so far, so good
    
    . */ let's check with another variable
    
    . bayes: regress price
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      price ~ regress({price:_cons},{sigma2})
    
    Priors:
      {price:_cons} ~ normal(0,10000)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =      .6707
                                                     Efficiency:  min =     .01234
                                                                  avg =     .09647
    Log marginal likelihood = -763.50183                          max =      .1806
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |  99.62768   100.5112   2.36511   102.1462  -98.38006   295.3294
    -------------+----------------------------------------------------------------
          sigma2 |  4.66e+07    7703121    693447   4.60e+07   3.36e+07   6.30e+07
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    Note: Adaptation tolerance is not met in at least one of the blocks.
    
    . */ oops, that's quite a difference! We see there is a problem with tolerance...
    
    . */ let's try to "fix" this with a block for the variance plus extra burn-in, etc.
    
    . bayes, block({sigma2}) burnin(10000) mcmcsize(20000) adaptation(tolerance(0.8)) gibbs: regress price
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      price ~ normal({price:_cons},{sigma2})
    
    Priors:
      {price:_cons} ~ normal(0,10000)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     30,000
    Metropolis-Hastings and Gibbs sampling           Burn-in          =     10,000
                                                     MCMC sample size =     20,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =          1
                                                     Efficiency:  min =    .001135
                                                                  avg =      .5006
    Log marginal likelihood =  -860.1099                          max =          1
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   483.244   95.47607   .675118   482.3642   295.6941    670.667
    -------------+----------------------------------------------------------------
          sigma2 |   8699171   90.86103    19.072    8699157    8699031    8699347
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    Note: There is a high autocorrelation after 500 lags.
    
    */ we see there is high autocorrelation. Let's try to tackle this issue
    
    . bayes, block({sigma2}) burnin(20000) mcmcsize(30000) adaptation(tolerance(0.8)) gibbs: regress price
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      price ~ normal({price:_cons},{sigma2})
    
    Priors:
      {price:_cons} ~ normal(0,10000)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     50,000
    Metropolis-Hastings and Gibbs sampling           Burn-in          =     20,000
                                                     MCMC sample size =     30,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =          1
                                                     Efficiency:  min =    .001019
                                                                  avg =      .5005
    Log marginal likelihood = -859.21765                          max =          1
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |  483.0878    96.4316   .556748   483.0736   291.8467   671.6999
    -------------+----------------------------------------------------------------
          sigma2 |   8699465    218.056   39.4317    8699483    8699049    8699774
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    Note: There is a high autocorrelation after 500 lags.
    
    */ I know I can increase the thinning interval. To reduce the analysis time, I selected a short MCMC size and burn-in period, but kept Gibbs sampling and a large thinning interval
    
    . bayes, block({sigma2}) burnin(250) mcmcsize(1000) adaptation(tolerance(0.8)) thinning(600) gibbs: regress price
    note: discarding every 599 sample observations; using observations 1 and 601
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      price ~ normal({price:_cons},{sigma2})
    
    Priors:
      {price:_cons} ~ normal(0,10000)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =    599,651
    Metropolis-Hastings and Gibbs sampling           Burn-in          =        250
                                                     MCMC sample size =      1,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =          1
                                                     Efficiency:  min =    .005903
                                                                  avg =       .503
    Log marginal likelihood = -858.13439                          max =          1
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   481.385   98.38691   3.11127   481.6254   294.3904   686.2036
    -------------+----------------------------------------------------------------
          sigma2 |   8699642   620.4574   255.378    8699604    8698622    8700712
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    Note: Adaptation continues during simulation.

    In short, with the variable mpg I got quite similar results. However, with the variable price, many attempts failed. I gather I could reach a better scenario by selecting informative priors.
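
    For instance (just a sketch of what I have in mind, not a run I report here; the prior mean of 6000 and variance of 1,000,000, i.e. a standard deviation of 1,000, are arbitrary values roughly on the scale of price), something along the lines of:

    Code:
    bayes, prior({price:_cons}, normal(6000, 1000000)) : regress price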

    That being said, with these commands and the variable price, I found a difference between - regress - and - bayes: regress - of more than 1,000%.

    To some extent, high autocorrelation is one of the culprits.
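
    For reference, the autocorrelation and the effective sample sizes can be inspected after - bayes - with the standard postestimation commands, for example:

    Code:
    bayesgraph diagnostics {price:_cons} {sigma2}
    bayesstats ess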

    But, according to the Stata Manual:


    Once convergence is established, the presence of high autocorrelation will typically mean low precision for some parameter estimates in the model. Depending on the magnitudes of the parameters and your research objective, you may be satisfied with the obtained precision, in which case you can ignore the reported note. If the level of precision is unacceptable, you may try to reduce autocorrelation in your model. We recommend you try to do it even if the level of precision is acceptable to you.

    Well, it's crystal clear that the model converged.

    This notwithstanding, the disparity is humongous, to say the least.

    I wonder why such a dismal disparity happened with price, but not mpg, in the very same data set.

    Also, I'd like to know what to do in such cases, for example, when we wish to compare several variables under an uninformative "default" prior.
    Best regards,

    Marcos

  • #2
    This is discussed in Example 4 in the PDF documentation of the - bayes - prefix. The most relevant part is:

    You should be aware that the default priors are provided for convenience and are not guaranteed
    to be uninformative in all cases. They are designed to have little effect on model parameters, the
    maximum likelihood estimates of which are of moderate size, say, less than 100 in absolute value.
    For large-scale parameters, as in this example, the default priors can become informative.
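
    To make the scale issue concrete (a back-of-the-envelope check, not part of the documentation): the default normal(0,10000) prior on {price:_cons} has a variance of 10,000, that is, a standard deviation of 100, so the - regress - estimate of the intercept, about 6,165, sits more than 60 prior standard deviations away from the prior mean of 0.

    Code:
    display sqrt(10000)            // sd implied by the default normal(0,10000) prior: 100
    display 6165.257/sqrt(10000)   // regress estimate of {price:_cons} in prior sd units: about 61.7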
    Try a flat prior

    Code:
    bayes , prior({price: } , flat) : regress price
    to see that the default is highly informative in your example. Or, just rescale the variable

    Code:
    generate price1000 = price/1000
    bayes : regress price1000
    By the way, I never really understood the idea of choosing a prior as least informative as possible; why not just stick with ML in the first place then?

    Edit

    The output

    Code:
    . bayes , prior({price: } , flat) : regress price
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood: 
      price ~ regress({price:_cons},{sigma2})
    
    Priors: 
      {price:_cons} ~ 1 (flat)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =      .4679
                                                     Efficiency:  min =       .177
                                                                  avg =      .2065
    Log marginal likelihood = -694.55318                          max =       .236
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |  6158.603   349.1515   7.18754   6155.259   5473.588    6856.03
    -------------+----------------------------------------------------------------
          sigma2 |   8926671    1503206   35727.2    8749958    6443051   1.23e+07
    ------------------------------------------------------------------------------
    Note: Default priors are used for some model parameters.
    Note: Adaptation tolerance is not met in at least one of the blocks.
    
    . generate price1000 = price/1000
    
    . bayes : regress price1000
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood: 
      price1000 ~ regress({price1000:_cons},{sigma2})
    
    Priors: 
      {price1000:_cons} ~ normal(0,10000)
               {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =      .4423
                                                     Efficiency:  min =      .2098
                                                                  avg =      .2263
    Log marginal likelihood = -195.72851                          max =      .2428
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price1000    |
           _cons |  6.172187   .3422885   .006947   6.169311    5.49067   6.846441
    -------------+----------------------------------------------------------------
          sigma2 |  8.860962   1.455718   .031782   8.716194   6.438895   12.14453
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
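
    Translated back to the original scale (just to make the comparison explicit), the posterior mean for price1000 is close to the - regress - estimate of 6165.257:

    Code:
    display 6.172187*1000   // roughly 6172 on the original price scale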
    Best
    Daniel



    • #3
      I never really understood the idea of choosing a prior as least informative as possible; why not just stick with ML in the first place then?
      TL;DR: That was likely a rhetorical question that called for no answer.

      Back in the day, arguing the philosophy of inference was a popular pastime among theoretical statisticians. Debates between followers of the frequentist, subjective Bayesian, empirical Bayesian, and likelihood schools, among others, brought "Game of Thrones" levels of intensity to conference halls, seminar rooms, and candidate interviews. At that time, arguments against maximum likelihood as a guiding principle pointed to unintuitive results under seemingly reasonable, if uncommon, circumstances. These unintuitive results did not arise in Bayesian formulations of the same problems, and thus the Bayesian approach came to be seen as a better principle for inference.

      That's a very short explanation of my understanding, refreshed by the two pages in Kadane's Principles of Uncertainty in which he dismisses maximum likelihood estimation as a competitor to his preferred subjective Bayesian approach. Berger and Wolpert's The Likelihood Principle discusses these issues at length and, if I recall correctly, attempts to ameliorate them with extensions to the likelihood principle.



      • #4
        Thank you daniel klein and William Lisowski for the helpful replies! I really appreciated both comments.

        With regard to the question:

        By the way, I never really understood the idea of choosing a prior as least informative as possible; why not just stick with ML in the first place then?

        Well, specifically in my case, I just wished to teach students how to produce a Bayesian analogue to a frequentist approach.

        I'm fully aware that Bayesian analysis is a world in itself. However, in order to create interest amongst a frequentist audience, I believe such comparisons with toy examples are a sound starting point.

        Thank you both again.
        Best regards,

        Marcos

