
  • Multilevel negative binomial modelling via bayesmh

    Dear Statalist members,

    I am interested in fitting a three-level negative binomial model (random intercepts) via bayesmh and have made some progress with a working example using melanoma.dta as the data source. Currently I am struggling with the implementation of the random intercepts: in short, a model with one random intercept for region takes very long to converge, and I am not sure how to speed up the estimation process or how to implement a second random intercept, as I am new to Bayesian analysis. Is there anything I can do to improve the syntax and implement a three-level approach?

    I've used the negative binomial likelihood evaluator provided by Houssein Assaad in this forum and prepared the data by creating a new variable for multivariate analysis:
    Code:
    use http://www.stata-press.com/data/r14/melanoma, replace
    set seed 12345
    generate random = uv*uniform()
    A non-multilevel model works fine, but I can't figure out how to add a second random intercept for nation to my multilevel model (regions nested in nations):
    Code:
    * Frequentist model
    nbreg deaths uv random
    
    * Bayesian model
    bayesmh deaths uv random, lleval(negbin, param({lnalpha})) ///
    prior({deaths:uv}{deaths:random}{deaths:_cons}, flat) ///
    prior({lnalpha},flat)
    
    * Frequentist model (First Random Intercept: Region):
    menbreg deaths uv random ||region:, nolog noheader
    
    * Bayesian model (First Random Intercept: Region):    
    bayesmh deaths uv random i.region, lleval(negbin, param({lnalpha})) ///
    prior({deaths:uv}{deaths:random}{deaths:_cons}, flat) ///
    prior({deaths:i.region}, flat) ///
    prior({var_region}, igamma(0.001, 0.001))                    ///
    prior({lnalpha}, flat) ///
    block({deaths:_cons}{deaths:uv}{deaths:random}{lnalpha}) ///
    block({deaths:i.region}) exclude({deaths:i.region}) ///
    burnin(50000) mcmcsize(50000) dots
    Any comment would be highly appreciated.

  • #2
    Daniel,
    to achieve more efficient sampling of a multilevel negative binomial model, you need to use some of the random-effects facilities of bayesmh; see the reffects and redefine options. Note, however, that these options are not supported with likelihood evaluators, so I suggest you program the negative binomial likelihood directly using the generic likelihood specification llf(). For an outcome variable y, linear predictor {xb}, and model parameter {lnalpha}, the log likelihood can be calculated using the expression
    Code:
    ln(nbinomialp(exp(-{lnalpha}), y, 1/(1+exp({lnalpha}+{xb}))))
    For example, the Bayesian formulation of
    Code:
    nbreg deaths uv random
    using flat priors is
    Code:
    bayesmh deaths, nocons likelihood(llf(                ///
        ln(nbinomialp(exp(-{lnalpha}), deaths,            ///
        1/(1+exp({lnalpha}+{deaths:uv random}+{deaths:_cons}))))))  ///
        prior({deaths:uv random _cons}{lnalpha}, flat)
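    As a quick, untested sanity check of this parameterization, you can evaluate the same expression at the nbreg ML estimates and compare its sum with the log likelihood reported by nbreg; the sketch below assumes that e(alpha) holds nbreg's dispersion estimate.
    Code:
    nbreg deaths uv random
    predict double xbhat, xb
    * per-observation log likelihood from the llf() expression above
    generate double llobs = ln(nbinomialp(1/e(alpha), deaths, ///
        1/(1+e(alpha)*exp(xbhat))))
    quietly summarize llobs
    display "sum of llobs = " %12.4f r(sum) "   e(ll) = " %12.4f e(ll)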
    Now, you can add random effects to the model. Say, you want a Bayesian formulation of the two-level model
    Code:
    menbreg deaths uv random ||region:, nolog noheader
    You can use the option redefine(reg:i.region) to define the random intercepts and the shortcut {reg:} to refer to them in the likelihood expression. To have a proper two-level specification, I suggest that you suppress the base level of region, exclude the constant term from the linear predictor, and use the prior prior({reg:i.region}, normal({deaths:_cons}, {var})) for the random intercepts; the parameter {deaths:_cons} will be the mean random effect. The full specification can look something like this

    Code:
    fvset base none region
    bayesmh deaths, nocons likelihood(llf(                   ///
        ln(nbinomialp(exp(-{lnalpha}), deaths,               ///
        1/(1+exp({lnalpha}+{reg:}+{deaths:uv random}))))))   ///
        redefine(reg:i.region)                               ///
        prior({deaths:uv random _cons}, normal(0, 100))      ///
        prior({reg:i.region}, normal({deaths:_cons}, {var})) ///
        prior({var}, igamma(0.001, 0.001))                   ///
        prior({lnalpha}, flat)                               ///
        block({deaths:_cons uv random}{lnalpha})             ///
        exclude({reg:i.region})
    Once you understand this specification, you can extend it to a three-level specification.
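    For instance, an untested sketch of one possible three-level extension (regions nested within nations in melanoma.dta), with nation intercepts {nat:} centered at {deaths:_cons} and region intercepts {reg:} centered at zero, could look like this; the priors and blocking will likely need tuning:
    Code:
    fvset base none nation region
    bayesmh deaths, nocons likelihood(llf(                        ///
        ln(nbinomialp(exp(-{lnalpha}), deaths,                    ///
        1/(1+exp({lnalpha}+{nat:}+{reg:}+{deaths:uv random})))))) ///
        redefine(nat:i.nation) redefine(reg:i.region)             ///
        prior({deaths:uv random _cons}, normal(0, 100))           ///
        prior({nat:i.nation}, normal({deaths:_cons}, {var_nat}))  ///
        prior({reg:i.region}, normal(0, {var_reg}))               ///
        prior({var_nat} {var_reg}, igamma(0.001, 0.001))          ///
        prior({lnalpha}, flat)                                    ///
        block({deaths:_cons uv random}{lnalpha})                  ///
        exclude({nat:i.nation}{reg:i.region})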

    Nikolay

    • #3
      Thank you, Nikolay, your example was very helpful - especially your suggestion regarding the generic likelihood specification! All my models run much faster now.

      • #4
        Hey guys,

        I found this topic online and would like to pose an additional question to you - I hope you don't mind. Is it possible to extend Nikolay's example to more complex models, e.g. zero-inflated, zero-truncated, or hurdle models? So far I've found several examples of multilevel zero-inflated Poisson models, like this one here, but almost all of them focus on SAS. Do you think it is possible to build such a model in Stata using the Bayesian framework?

        Thank you in advance. Sandra

        • #5
          Hi Sandra,

          Yes, the negative binomial example I've shown above can be extended to more complex models as long as the likelihood can be formulated as a single substitutable expression, which I believe is true for the zero-inflated and zero-truncated negative binomial and the hurdle models. Not all models, however, can be formulated using the llf() likelihood option; for example, it is not possible for likelihoods that involve iterations across observations. If you are interested in a particular model, you may continue this discussion or contact Stata technical support.

          Nikolay

          • #6
            Hey Nikolay,

            I'm very interested in extending your example to account for zero inflation, as most of my data sets contain strong evidence of both overdispersion and excess zeros. Unfortunately, I haven't figured out how to implement both parts of a zero-inflated model, the logit part and the count part, in the llf() option - in the SAS example I mentioned earlier, an if-else combination is used to achieve this task.

            Sandra

            • #7
              Including zero-inflation sounds tempting... Sadly, I can't provide a solution to your question; I've just started to read up on likelihood specifications for bayesmh. Having said that, is there any advanced literature on likelihood specifications in Stata?

              • #8

                Sandra,

                you can specify the zero-inflated Poisson regression model in bayesmh using the following likelihood option
                Code:
                llf(cond(y==0,                                                     ///
                    ln(1/(1+exp(-{infl:})) + 1/(1+exp({infl:}))*exp(-exp({xb:}))), ///
                    y*{xb:} - ln(1+exp({infl:}))                                   ///
                    - exp({xb:}) - lngamma(y+1) ))
                where y is the dependent variable, {xb:} is the linear predictor form in the Poisson model and {infl:} is the linear form that models the inflation at zero.

                In the link you have provided we see examples using the following data
                Code:
                import delimited http://statistics.ats.ucla.edu/stat/data/hdp.csv, clear
                encode smokinghx, gen(smoking)
                encode cancerstage, gen(cancer)
                drop smokinghx cancerstage
                The considered zero-inflated Poisson model can be fitted in Stata using the zip command
                Code:
                zip nmorphine age lengthofstay bmi i.cancer, inflate(age i.smoking i.cancer)
                A Bayesian formulation of this model would be
                Code:
                gen cons = 1
                bayesmh nmorphine, noconstant likelihood(llf(cond(nmorphine==0,    ///
                    ln(1/(1+exp(-{infl:})) + 1/(1+exp({infl:}))*exp(-exp({xb:}))), ///
                    nmorphine*{xb:} - ln(1+exp({infl:}))                           ///
                    - exp({xb:}) - lngamma(nmorphine+1) )))                        ///
                    xbdefine(xb:   age lengthofstay bmi i.cancer cons)             ///
                    xbdefine(infl: age i.smoking i.cancer cons)                    ///
                    prior({xb:} {infl:}, normal(0, 10))                            ///
                    block({xb:i.cancer}) block({xb:age lengthofstay bmi cons})     ///
                    block({infl:age cons}) block({infl:i.cancer}) dots
                You can amend the linear form {xb:} with random effects, as in the negative binomial example above, to implement a mixed effects zero-inflated Poisson model.
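                For instance, assuming the data contain a doctor identifier did, as in the linked SAS example, an untested sketch of a doctor-level random intercept added to the count part could be
                Code:
                fvset base none did
                bayesmh nmorphine, noconstant likelihood(llf(cond(nmorphine==0,           ///
                    ln(1/(1+exp(-{infl:})) + 1/(1+exp({infl:}))*exp(-exp({xb:}+{doc:}))), ///
                    nmorphine*({xb:}+{doc:}) - ln(1+exp({infl:}))                         ///
                    - exp({xb:}+{doc:}) - lngamma(nmorphine+1) )))                        ///
                    xbdefine(xb:   age lengthofstay bmi i.cancer cons)                    ///
                    xbdefine(infl: age i.smoking i.cancer cons)                           ///
                    redefine(doc:i.did)                                                   ///
                    prior({doc:i.did}, normal(0, {var_doc}))                              ///
                    prior({var_doc}, igamma(0.01, 0.01))                                  ///
                    prior({xb:} {infl:}, normal(0, 10))                                   ///
                    block({xb:i.cancer}) block({xb:age lengthofstay bmi cons})            ///
                    block({infl:age cons}) block({infl:i.cancer})                         ///
                    exclude({doc:i.did}) dots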

                I hope you find this helpful,
                Nikolay
                Last edited by Nikolay Balov (StataCorp); 04 Apr 2016, 10:02.

                • #9
                  Thank you so much, Nikolay, it works like a charm! As soon as possible I will include some random effects and post feedback here. Again, thank you so much!

                  • #10
                    Hello again,

                    Following Nikolay's example, I managed to implement some random effects in zero-inflated Poisson models and even a simple hurdle model. Unfortunately, I've been struggling with zero-inflated negative binomial models for quite some time now.

                    Using Stata's default example for zinb models, I came up with the following syntax based on Joseph Hilbe's log-likelihood functions (Joseph Hilbe (2011): Negative Binomial Regression, p. 372):

                    Code:
                    webuse fish
                    zinb count persons livebait, inflate(child camper)
                    
                    bayesmh count, noconstant likelihood(llf(cond(count==0, ///
                    ln(1/(1+exp(-{xb1:})))+1/(1+exp({xb1:}))*(1/(1+ {alpha}*exp({xb:})))^(1/{alpha}), ///
                    ln(1/(1+exp(-{xb1:}))) + ln((1/{alpha})+count) - ln(count+1) - ln(1/{alpha}) +            ///
                    (1/{alpha}) * ln(1/(1+{alpha}*exp({xb:}))) + count*ln(1-(1/(1+ {alpha}*exp({xb:})))))) ///
                    xbdefine(xb: persons livebait cons)        ///
                    xbdefine(xb1: child camper cons)          ///
                    prior({xb:} {xb1:}, normal(0, 10)))           ///
                    prior({alpha}, normal(0, 10))                  ///
                    block({xb: persons livebait cons})         ///
                    block({xb1: child camper cons}) dots
                    These lines result in the following error message:
                    option xbdefine(xb: persons livebait cons) xbdefine(xb1: child camper cons) prior({xb:} {xb1:}, normal(0, 10)) is not supported by likelihood model logdensity


                    Has anyone had a similar experience, or can anyone provide some insight into this error? Thank you!
                    Last edited by Daniel Weller; 23 Apr 2016, 03:33.

                    • #11
                      Nikolay, could you possibly extend your first example, posted on March 29th (negative binomial with random intercept), to also include a single random coefficient for either "random" or "uv"? Any guidance would be immensely appreciated.

                      • #12
                        Hello again, Daniel. Is there a bracket missing in your likelihood function? I've added an additional closing bracket, but it still results in an error regarding xbdefine for xb1. Is it at all possible to use xbdefine in that way?

                        • #13
                          Justin,

                          I will extend the negative binomial example posted on March 29th to include a random slope for the uv variable. The resulting model will be a Bayesian formulation of the following multilevel model fitted with the menbreg command
                          Code:
                          . menbreg deaths uv random ||region: uv
                          In addition to the random intercept parameters {reg:} I introduce random coefficients {reg_uv:}, one for each level of region, again using the redefine() option: redefine(reg_uv:i.region). In the log-likelihood expression for the model I use {reg_uv:} simply as a coefficient to the uv variable
                          Code:
                          ln(nbinomialp(exp(-{lnalpha}), deaths, 1/(1+exp({lnalpha}+{reg:}+{reg_uv:}*uv+{deaths:random}))))
                          I assign the random coefficients {reg_uv:} a normal prior with unknown mean {uv:_cons} and variance {uv:var}.

                          Because of the increased complexity of the model, you may need to think more carefully about specifying priors, blocks of parameters, etc. Below I show a full specification of the model where I use a somewhat more informative igamma(2,1) prior for the variance parameters.
                          Code:
                          . fvset base none region
                          . set seed 12345
                          * Bayesian model (Random Intercept: Region, Random uv slope: Region):
                          . bayesmh deaths, nocons likelihood(llf(                            ///
                              ln(nbinomialp(exp(-{lnalpha}), deaths,                          ///
                              1/(1+exp({lnalpha}+{reg:}+{reg_uv:}*uv+{deaths:random}))))))    ///
                              redefine(reg:i.region)                                          ///
                              redefine(reg_uv:i.region)                                       ///
                              prior({reg:i.region},    normal({deaths:_cons}, {deaths:var}))  ///
                              prior({reg_uv:i.region}, normal({uv:_cons}, {uv:var}))          ///
                              prior({deaths:var},      igamma(2, 1))                          ///
                              prior({uv:var},          igamma(2, 1))                          ///
                              prior({lnalpha},         flat)                                  ///
                              prior({deaths:random _cons}{uv:_cons}, normal(0, 100))          ///
                              block({deaths:_cons random}{lnalpha}) block({deaths:var})       ///
                              block({uv:var}{uv:_cons})                                       ///
                              exclude({reg:i.region}{reg_uv:i.region}) dots
                          Please, let me know if you have problems running this example.
                          Nikolay

                          • #14
                            Hi Daniel,
                            I am sorry for the late response. You have made really good progress in formulating the zero-inflated negative binomial model.

                            I see a syntax error in your specification (a misplaced parenthesis) that is causing the error message. Yes, it is not easy to keep track of so many closing parentheses. You should have
                            Code:
                            bayesmh count, noconstant likelihood(llf(cond(count==0, ///
                                ln(1/(1+exp(-{xb1:}))) + 1/(1+exp({xb1:}))*(1/(1+ {alpha}*exp({xb:})))^(1/{alpha}), ///
                                ln(1/(1+exp(-{xb1:}))) + ln((1/{alpha})+count) - ln(count+1) - ln(1/{alpha}) +            ///
                                (1/{alpha}) * ln(1/(1+{alpha}*exp({xb:}))) + count*ln(1-(1/(1+ {alpha}*exp({xb:}))))))) ///
                                xbdefine(xb: persons livebait cons)        ///
                                xbdefine(xb1: child camper cons)          ///
                                prior({xb:} {xb1:}, normal(0, 10))           ///
                                prior({alpha}, normal(0, 10))                  ///
                                block({xb: persons livebait cons})         ///
                                block({xb1: child camper cons}) dots
                            However, there is another, algebraic problem in the actual likelihood function and you still would not be able to run the above command. Below, I rewrote the log-likelihood following the formulas from the documentation of the zinb command.

                            Note that I am using {xb:} for the main linear predictor and {xg:} for the linear predictor at zero (inflation). Because {alpha} is a positive parameter, I specify lognormal(0, 10) prior for it and initialize it with a positive value.

                            Code:
                            . bayesmh count, noconstant                                    ///
                                likelihood(llf(cond(count<=0, ln(invlogit({xg:}) +         ///
                                (1-invlogit({xg:}))/(1+{alpha}*exp({xb:}))^(1/{alpha})),   ///
                                (ln(1-invlogit({xg:})) + lngamma(count+1/{alpha}) -       ///
                                lngamma(count+1) - lngamma(1/{alpha}) -                    ///
                                (count+1/{alpha})*ln(1+{alpha}*exp({xb:})) +               ///
                                count*(ln({alpha})+{xb:})) )))                             ///
                                xbdefine(xb: persons livebait cons)                        ///
                                xbdefine(xg: child camper cons)                            ///
                                prior({xb:} {xg:}, normal(0, 10))                          ///
                                prior({alpha},     lognormal(0, 10))                       ///
                                block({xb:}) block({xg:}) init({alpha} 1) dots
                            Note that the likelihood can also be reparameterized in terms of {lnalpha}=ln({alpha}), which is somewhat more efficient.
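                            For instance, an untested sketch of that reparameterization, substituting exp({lnalpha}) for {alpha} so that the sampled parameter is unconstrained (and assuming the constant variable cons exists, e.g., from gen cons = 1), could be
                            Code:
                            . bayesmh count, noconstant                                          ///
                                likelihood(llf(cond(count<=0, ln(invlogit({xg:}) +               ///
                                (1-invlogit({xg:}))/(1+exp({lnalpha}+{xb:}))^exp(-{lnalpha})),   ///
                                ln(1-invlogit({xg:})) + lngamma(count+exp(-{lnalpha})) -         ///
                                lngamma(count+1) - lngamma(exp(-{lnalpha})) -                    ///
                                (count+exp(-{lnalpha}))*ln(1+exp({lnalpha}+{xb:})) +             ///
                                count*({lnalpha}+{xb:}) )))                                      ///
                                xbdefine(xb: persons livebait cons)                              ///
                                xbdefine(xg: child camper cons)                                  ///
                                prior({xb:} {xg:}, normal(0, 10))                                ///
                                prior({lnalpha}, normal(0, 10))                                  ///
                                block({xb:}) block({xg:}) block({lnalpha}) dots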

                            Nikolay

                            • #15
                              Hello everybody! Thank you so much for your comments, Sandra and Nikolay! I'm pretty sure Nikolay's approach provides a solution to many problems a few of my colleagues encountered over the last couple of weeks.
