Predictions with and without BLUP after melogit

Steven Spivack

Join Date: Nov 2016

Posts: 27
#1

Predictions with and without BLUP after melogit

31 Mar 2020, 06:13

I am attempting to calculate predictions after a melogit model with and without best linear unbiased predictors. In SAS I would do this as follows:

Code:

PROC GLIMMIX DATA=&ANALYSIS NOCLPRINT MAXLMMUPDATE=100; CLASS PROVID; ODS OUTPUT PARAMETERESTIMATES=&EST (KEEP=EFFECT ESTIMATE STDERR); MODEL OUTCOME=&MODEL_VAR /D=B LINK=LOGIT SOLUTION; XBETA=_XBETA_; LINP=_LINP_; RANDOM INTERCEPT/SUBJECT=PROVID SOLUTION; OUTPUT OUT=OUTCOME PRED(BLUP ILINK)=PREDPROB PRED(NOBLUP ILINK)=EXPPROB;

For SAS the options above are:
BLUP=linear predictor
NOBLUP = marginal linear predictor
ILINK = predicted mean

However, in Stata I am not totally sure how to do this. Here is the code I am using. My understanding is that the predict command after running an melogit predicts the BLUP at the means. But I am not sure if the xb option would be calculating predictions at the means without the BLUP.

Code:

melogit outcome modelvars || secondlevel: predict predicted predict expected, xb
Tags: None

Scott Merryman

Join Date: Mar 2014
Posts: 895

31 Mar 2020, 07:10

From the SAS documentation:

Click image for larger version

Name: Capture.PNG
Views: 1
Size: 10.7 KB
ID: 1543871

So in Stata:

Code:

predict eta, eta  = default
predict xb, xb    = noblup
predict mu, mu  = ilink
predict mu_fix, mu condtional(fixedonly)  = noblup ilink

Note you can also estimate the empirical Bayes modes of random effects with the ebmodes option.
The various outputs:

Code:

. webuse bangladesh,clear
(Bangladesh Fertility Survey, 1989)

. qui melogit c_use i.urban age child* || district:

. predict mu, mu 
(predictions based on fixed effects and posterior means of random effects)
(using 7 quadrature points)

. predict eta, eta
(predictions based on fixed effects and posterior means of random effects)
(using 7 quadrature points)

. predict xb, xb

. predict re, reffect
(calculating posterior means of random effects)
(using 7 quadrature points)

. gen xb_re= xb+re

. gen inverse_xb_re = invlogit(xb_re)

. predict mu_fix, mu cond(fixedonly)

. gen inverse_xb = invlogit(xb)

. list mu-inverse_xb in 1, ab(14) noobs

  +--------------------------------------------------------------------------------------------------+
  |       mu         eta          xb          re       xb_re   inverse_xb_re     mu_fix   inverse_xb |
  |--------------------------------------------------------------------------------------------------|
  | .3037056   -.8297138   -.1016079   -.7281059   -.8297138        .3037056   .4746199     .4746199 |
  +--------------------------------------------------------------------------------------------------+

[INDENT=2] [/INDENT]

Comment

Steven Spivack

Join Date: Nov 2016

Posts: 27
#3

31 Mar 2020, 07:15

Wow thank you so much this is exactly what I was looking for.
Comment
Steven Spivack

Join Date: Nov 2016

Posts: 27
#4

23 Apr 2020, 13:10

I actually have a follow up to this. Doesn't the command predict mu_fix, mu condtional(fixedonly) calculate the predicted means for only the fixed portion of the model? What I really want are two different predictions.
Predictions accounting for fixed effects and an average of all the random intercepts

Predictions accounting for fixed effects and each observation's random intercept.

Correct me if I am wrong but I do not believe the code below is doing this.

Code:

melogit death c.age c.hccscore || providerID: predict predicted, mu predict expected, mu conditional(fixedonly)
Comment
Scott Merryman

Join Date: Mar 2014

Posts: 895
#5

23 Apr 2020, 14:43

Yes, it is the predicted means with only the fixed portion or since this is a logit model it is the predicted probability.

Given that the random effects are distributed ~ N(0, sigma2) , they will average to 0 over the providerIDs.

Code:

predict mu, mu

is the predicted probability accounting for fixed portion and each group's random intercept.

Code:

predict expected, mu conditional(fixedonly)

is the predicted probability accounting for the fixed portion (or with the average of the random effects == 0).
Comment
Steven Spivack

Join Date: Nov 2016

Posts: 27
#6

23 Apr 2020, 15:01

Thanks again Scott. I guess the reason I am confused is because I am recreating a model in which the authors state that "The expected number of deaths for each hospital is estimated using its patient mix and the average hospital-specific effect (that is, the average effect among all hospitals in the sample) ." If the average effect is 0, then why not explicitly say this? Is it due to the multilevel nature of the data? In other words, while the random effects will average to 0 at the provider level, this is not the case at the patient level. And since I am calculating patient-level predictions could this explain the disparity?
Comment

Scott Merryman

Join Date: Mar 2014
Posts: 895

23 Apr 2020, 17:43

I find "average hospital-specific effect (that is, the average effect among all hospitals in the sample) ." to be unclear.

Yes, the average random effect at the provider level will be 0, but if the sample size o the providers are different than the average of the random effect (over all individuals) may be zero.
In the example below, there are 60 districts but the number of individuals within districts varies:

Code:

. webuse bangladesh,clear
(Bangladesh Fertility Survey, 1989)

. qui melogit c_use  c.age##c.age   || district:

. predict mu, mu 
(predictions based on fixed effects and posterior means of random effects)
(using 7 quadrature points)

. predict re, reffect
(calculating posterior means of random effects)
(using 7 quadrature points)

. sort district age

. bys distr: gen tag = 1 if _n ==_N
(1,874 missing values generated)

. sum re

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          re |      1,934    .0738812    .4611981  -.9851072   .9998125

. sum re if tag ==1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          re |         60    3.09e-07    .4072956  -.9851072   .9998125

In this example below, the probability of contraceptive use is a function of individual (centered) age and district specific random effects. The probability of contraceptive use in district 4 is about 18 percentage points higher than district 1 (assuming 0 centered age).

Code:

line mu age if  dist < 10, c(L)  /// 
    || scatter mu age  if dist < 10 & tag == 1, mlabel(district)  /// 
     mlabcolor(black) ms(i) /// 
    || , legend(off) ytitle("Probability of Contraception Use" "By Age & District")

Click image for larger version

Name: Graph.png
Views: 1
Size: 55.3 KB
ID: 1548874

Comment

Steven Spivack

Join Date: Nov 2016

Posts: 27
#8

23 Apr 2020, 18:01

So I think the confusion may be in their terminology. Looking at the methods below it appears the hospital-effect is derived after bootstrapping.

Last edited by Steven Spivack; 23 Apr 2020, 18:57.
Comment

Announcement