  • Bayesian approach: what to do when it differs from the frequentist approach?

    Dear Forum Members,

    I'm somewhat puzzled by the results from two down-to-earth models, one fitted with the - bayes - prefix and the other with the - regress - command.

    Below, a toy example:

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . su price mpg
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         74    6165.257    2949.496       3291      15906
             mpg |         74     21.2973    5.785503         12         41
    
    . regress price
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(0, 73)        =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |   635065396        73  8699525.97   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |   635065396        73  8699525.97   Root MSE        =    2949.5
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   6165.257   342.8719    17.98   0.000     5481.914      6848.6
    ------------------------------------------------------------------------------
    
    . regress mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(0, 73)        =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |  2443.45946        73  33.4720474   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |  2443.45946        73  33.4720474   Root MSE        =    5.7855
    
    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |    21.2973   .6725511    31.67   0.000      19.9569    22.63769
    ------------------------------------------------------------------------------
    
    . */ Let's check a Bayesian approach
    
    . bayes: regress mpg
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      mpg ~ regress({mpg:_cons},{sigma2})
    
    Priors:
      {mpg:_cons} ~ normal(0,10000)
         {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =       .421
                                                     Efficiency:  min =      .2193
                                                                  avg =      .2246
    Log marginal likelihood = -244.90098                          max =      .2299
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    mpg          |
           _cons |   21.3047   .6779059   .014139   21.30774   19.93476   22.61922
    -------------+----------------------------------------------------------------
          sigma2 |  34.26699   5.788511   .123607   33.69243   24.66202   47.59297
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    
    . */ so far, so good
    
    . */ let's check with another variable
    
    . bayes: regress price
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      price ~ regress({price:_cons},{sigma2})
    
    Priors:
      {price:_cons} ~ normal(0,10000)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =      .6707
                                                     Efficiency:  min =     .01234
                                                                  avg =     .09647
    Log marginal likelihood = -763.50183                          max =      .1806
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |  99.62768   100.5112   2.36511   102.1462  -98.38006   295.3294
    -------------+----------------------------------------------------------------
          sigma2 |  4.66e+07    7703121    693447   4.60e+07   3.36e+07   6.30e+07
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    Note: Adaptation tolerance is not met in at least one of the blocks.
    
    . */ oops, that's quite a difference! We see there is a problem with tolerance...
    
    . */ let's try to "fix" this with a block for the variance plus extra burn-in, etc.
    
    . bayes, block({sigma2}) burnin(10000) mcmcsize(20000) adaptation(tolerance(0.8)) gibbs: regress price
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      price ~ normal({price:_cons},{sigma2})
    
    Priors:
      {price:_cons} ~ normal(0,10000)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     30,000
    Metropolis-Hastings and Gibbs sampling           Burn-in          =     10,000
                                                     MCMC sample size =     20,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =          1
                                                     Efficiency:  min =    .001135
                                                                  avg =      .5006
    Log marginal likelihood =  -860.1099                          max =          1
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   483.244   95.47607   .675118   482.3642   295.6941    670.667
    -------------+----------------------------------------------------------------
          sigma2 |   8699171   90.86103    19.072    8699157    8699031    8699347
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    Note: There is a high autocorrelation after 500 lags.
    
    */ we see there is high autocorrelation. Let's try to tackle this issue
    
    . bayes, block({sigma2}) burnin(20000) mcmcsize(30000) adaptation(tolerance(0.8)) gibbs: regress price
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      price ~ normal({price:_cons},{sigma2})
    
    Priors:
      {price:_cons} ~ normal(0,10000)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     50,000
    Metropolis-Hastings and Gibbs sampling           Burn-in          =     20,000
                                                     MCMC sample size =     30,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =          1
                                                     Efficiency:  min =    .001019
                                                                  avg =      .5005
    Log marginal likelihood = -859.21765                          max =          1
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |  483.0878    96.4316   .556748   483.0736   291.8467   671.6999
    -------------+----------------------------------------------------------------
          sigma2 |   8699465    218.056   39.4317    8699483    8699049    8699774
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    Note: There is a high autocorrelation after 500 lags.
    
    */ I know I can increase the thinning interval. To reduce the analysis time, I selected a short MCMC size and burn-in period, but kept Gibbs sampling and a large thinning interval
    
    . bayes, block({sigma2}) burnin(250) mcmcsize(1000) adaptation(tolerance(0.8)) thinning(600) gibbs: regress price
    note: discarding every 599 sample observations; using observations 1 and 601
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood:
      price ~ normal({price:_cons},{sigma2})
    
    Priors:
      {price:_cons} ~ normal(0,10000)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =    599,651
    Metropolis-Hastings and Gibbs sampling           Burn-in          =        250
                                                     MCMC sample size =      1,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =          1
                                                     Efficiency:  min =    .005903
                                                                  avg =       .503
    Log marginal likelihood = -858.13439                          max =          1
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |   481.385   98.38691   3.11127   481.6254   294.3904   686.2036
    -------------+----------------------------------------------------------------
          sigma2 |   8699642   620.4574   255.378    8699604    8698622    8700712
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
    Note: Adaptation continues during simulation.

    In short, with the variable mpg I got quite similar results. However, with the variable price, many attempts failed. I gather I could reach a better scenario by selecting informative priors.
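
    For instance (just a sketch of what I have in mind, not a run I report here; the prior mean of 6000 and variance of 1,000,000, i.e. a standard deviation of 1,000, are arbitrary values roughly on the scale of price), something along the lines of:

    Code:
    bayes, prior({price:_cons}, normal(6000, 1000000)) : regress price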

    That being said, with these commands and the variable price, I found a difference between - regress - and - bayes: regress - of more than 1,000%.

    To some extent, high autocorrelation is one of the culprits.
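
    For reference, the autocorrelation and the effective sample sizes can be inspected after - bayes - with the standard postestimation commands, for example:

    Code:
    bayesgraph diagnostics {price:_cons} {sigma2}
    bayesstats ess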

    But, according to the Stata Manual:


    Once convergence is established, the presence of high autocorrelation will typically mean low precision for some parameter estimates in the model. Depending on the magnitudes of the parameters and your research objective, you may be satisfied with the obtained precision, in which case you can ignore the reported note. If the level of precision is unacceptable, you may try to reduce autocorrelation in your model. We recommend you try to do it even if the level of precision is acceptable to you.

    Well, it's crystal clear that the model converged.

    This notwithstanding, the disparity is humongous, to say the least.

    I wonder why such a dismal disparity happened with price, but not mpg, in the very same data set.

    Also, I'd like to know what to do in such cases, for example, when we wish to compare several variables under an uninformative "default" prior.
    Best regards,

    Marcos

  • #2
    This is discussed in Example 4 in the PDF documentation of the - bayes - prefix. The most relevant part is:

    You should be aware that the default priors are provided for convenience and are not guaranteed
    to be uninformative in all cases. They are designed to have little effect on model parameters, the
    maximum likelihood estimates of which are of moderate size, say, less than 100 in absolute value.
    For large-scale parameters, as in this example, the default priors can become informative.
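
    To make the scale issue concrete (a back-of-the-envelope check, not part of the documentation): the default normal(0,10000) prior on {price:_cons} has a variance of 10,000, that is, a standard deviation of 100, so the - regress - estimate of the intercept, about 6,165, sits more than 60 prior standard deviations away from the prior mean of 0.

    Code:
    display sqrt(10000)            // sd implied by the default normal(0,10000) prior: 100
    display 6165.257/sqrt(10000)   // regress estimate of {price:_cons} in prior sd units: about 61.7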
    Try a flat prior

    Code:
    bayes , prior({price: } , flat) : regress price
    to see that the default is highly informative in your example. Or, just rescale the variable

    Code:
    generate price1000 = price/1000
    bayes : regress price1000
    By the way, I never really understood the idea of choosing a prior as least informative as possible; why not just stick with ML in the first place then?

    Edit

    The output

    Code:
    . bayes , prior({price: } , flat) : regress price
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood: 
      price ~ regress({price:_cons},{sigma2})
    
    Priors: 
      {price:_cons} ~ 1 (flat)
           {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =      .4679
                                                     Efficiency:  min =       .177
                                                                  avg =      .2065
    Log marginal likelihood = -694.55318                          max =       .236
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price        |
           _cons |  6158.603   349.1515   7.18754   6155.259   5473.588    6856.03
    -------------+----------------------------------------------------------------
          sigma2 |   8926671    1503206   35727.2    8749958    6443051   1.23e+07
    ------------------------------------------------------------------------------
    Note: Default priors are used for some model parameters.
    Note: Adaptation tolerance is not met in at least one of the blocks.
    
    . generate price1000 = price/1000
    
    . bayes : regress price1000
      
    Burn-in ...
    Simulation ...
    
    Model summary
    ------------------------------------------------------------------------------
    Likelihood: 
      price1000 ~ regress({price1000:_cons},{sigma2})
    
    Priors: 
      {price1000:_cons} ~ normal(0,10000)
               {sigma2} ~ igamma(.01,.01)
    ------------------------------------------------------------------------------
    
    Bayesian linear regression                       MCMC iterations  =     12,500
    Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                     MCMC sample size =     10,000
                                                     Number of obs    =         74
                                                     Acceptance rate  =      .4423
                                                     Efficiency:  min =      .2098
                                                                  avg =      .2263
    Log marginal likelihood = -195.72851                          max =      .2428
     
    ------------------------------------------------------------------------------
                 |                                                Equal-tailed
                 |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
    -------------+----------------------------------------------------------------
    price1000    |
           _cons |  6.172187   .3422885   .006947   6.169311    5.49067   6.846441
    -------------+----------------------------------------------------------------
          sigma2 |  8.860962   1.455718   .031782   8.716194   6.438895   12.14453
    ------------------------------------------------------------------------------
    Note: Default priors are used for model parameters.
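
    Translated back to the original scale (just to make the comparison explicit), the posterior mean for price1000 is close to the - regress - estimate of 6165.257:

    Code:
    display 6.172187*1000   // roughly 6172 on the original price scale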
    Best
    Daniel



    • #3
      I never really understood the idea of choosing a prior as least informative as possible; why not just stick with ML in the first place then?
      TL;DR: That was likely a rhetorical question that called for no answer.

      Back in the day, arguing the philosophy of inference was a popular pastime among theoretical statisticians. Debates between followers of the frequentist, subjective Bayesian, empirical Bayesian, and likelihood schools, among others, brought "Game of Thrones" levels of intensity to conference halls, seminar rooms, and candidate interviews. At that time, arguments against maximum likelihood as a guiding principle pointed to unintuitive results under seemingly reasonable, if uncommon, circumstances. These unintuitive results did not arise in Bayesian formulations of the same problems, and thus the Bayesian approach came to be seen as a better principle for inference.

      That's a very short explanation of my understanding, refreshed by the two pages in Kadane's Principles of Uncertainty in which he dismisses maximum likelihood estimation as a competitor to his preferred subjective Bayesian approach. Berger and Wolpert's The Likelihood Principle discusses these issues at length and, if I recall correctly, attempts to ameliorate them with extensions to the likelihood principle.



      • #4
        Thank you daniel klein and William Lisowski for the helpful replies! I really appreciated both comments.

        With regard to the question:

        By the way, I never really understood the idea of choosing a prior as least informative as possible; why not just stick with ML in the first place then?

        Well, specifically in my case, I just wished to teach students how to produce a Bayesian analogue to a frequentist approach.

        I'm fully aware that Bayesian analysis is a world in itself. However, in order to create interest amongst a frequentist audience, I believe such comparisons with toy examples are a sound starting point.

        Thank you both again.
        Best regards,

        Marcos

