Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Predictions with and without BLUP after melogit

    I am attempting to calculate predictions after a melogit model with and without best linear unbiased predictors. In SAS I would do this as follows:

    Code:
    PROC GLIMMIX DATA=&ANALYSIS NOCLPRINT MAXLMMUPDATE=100;
    CLASS PROVID;
    ODS OUTPUT PARAMETERESTIMATES=&EST (KEEP=EFFECT ESTIMATE STDERR);
    MODEL OUTCOME=&MODEL_VAR
            /D=B LINK=LOGIT SOLUTION;
    XBETA=_XBETA_;
    LINP=_LINP_;
    RANDOM INTERCEPT/SUBJECT=PROVID SOLUTION;
    OUTPUT OUT=OUTCOME
            PRED(BLUP ILINK)=PREDPROB PRED(NOBLUP ILINK)=EXPPROB;
    For SAS the options above are:
    BLUP=linear predictor
    NOBLUP = marginal linear predictor
    ILINK = predicted mean

    However, in Stata I am not totally sure how to do this. Here is the code I am using. My understanding is that the predict command after running an melogit predicts the BLUP at the means. But I am not sure if the xb option would be calculating predictions at the means without the BLUP.

    Code:
    melogit outcome modelvars  || secondlevel:
    predict predicted
    predict expected, xb

  • #2
    From the SAS documentation:
    Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	10.7 KB
ID:	1543871


    So in Stata:
    Code:
    predict eta, eta  = default
    predict xb, xb    = noblup
    predict mu, mu  = ilink
    predict mu_fix, mu condtional(fixedonly)  = noblup ilink
    Note you can also estimate the empirical Bayes modes of random effects with the ebmodes option.
    The various outputs:
    Code:
    . webuse bangladesh,clear
    (Bangladesh Fertility Survey, 1989)
    
    . qui melogit c_use i.urban age child* || district:
    
    . predict mu, mu 
    (predictions based on fixed effects and posterior means of random effects)
    (using 7 quadrature points)
    
    . predict eta, eta
    (predictions based on fixed effects and posterior means of random effects)
    (using 7 quadrature points)
    
    . predict xb, xb
    
    . predict re, reffect
    (calculating posterior means of random effects)
    (using 7 quadrature points)
    
    . gen xb_re= xb+re
    
    . gen inverse_xb_re = invlogit(xb_re)
    
    . predict mu_fix, mu cond(fixedonly)
    
    . gen inverse_xb = invlogit(xb)
    
    . list mu-inverse_xb in 1, ab(14) noobs
    
      +--------------------------------------------------------------------------------------------------+
      |       mu         eta          xb          re       xb_re   inverse_xb_re     mu_fix   inverse_xb |
      |--------------------------------------------------------------------------------------------------|
      | .3037056   -.8297138   -.1016079   -.7281059   -.8297138        .3037056   .4746199     .4746199 |
      +--------------------------------------------------------------------------------------------------+
    [INDENT=2] [/INDENT]

    Comment


    • #3
      Wow thank you so much this is exactly what I was looking for.

      Comment


      • #4
        I actually have a follow up to this. Doesn't the command predict mu_fix, mu condtional(fixedonly) calculate the predicted means for only the fixed portion of the model? What I really want are two different predictions.
        1. Predictions accounting for fixed effects and an average of all the random intercepts
        2. Predictions accounting for fixed effects and each observation's random intercept.
        Correct me if I am wrong but I do not believe the code below is doing this.

        Code:
        melogit death c.age c.hccscore || providerID:
        predict predicted, mu 
        predict expected, mu conditional(fixedonly)

        Comment


        • #5
          Yes, it is the predicted means with only the fixed portion or since this is a logit model it is the predicted probability.

          Given that the random effects are distributed ~ N(0, sigma2) , they will average to 0 over the providerIDs.
          Code:
          predict mu, mu
          is the predicted probability accounting for fixed portion and each group's random intercept.

          Code:
          predict expected, mu conditional(fixedonly)
          is the predicted probability accounting for the fixed portion (or with the average of the random effects == 0).

          Comment


          • #6
            Thanks again Scott. I guess the reason I am confused is because I am recreating a model in which the authors state that "The expected number of deaths for each hospital is estimated using its patient mix and the average hospital-specific effect (that is, the average effect among all hospitals in the sample) ." If the average effect is 0, then why not explicitly say this? Is it due to the multilevel nature of the data? In other words, while the random effects will average to 0 at the provider level, this is not the case at the patient level. And since I am calculating patient-level predictions could this explain the disparity?

            Comment


            • #7
              I find "average hospital-specific effect (that is, the average effect among all hospitals in the sample) ." to be unclear.

              Yes, the average random effect at the provider level will be 0, but if the sample size o the providers are different than the average of the random effect (over all individuals) may be zero.
              In the example below, there are 60 districts but the number of individuals within districts varies:
              Code:
              . webuse bangladesh,clear
              (Bangladesh Fertility Survey, 1989)
              
              . qui melogit c_use  c.age##c.age   || district:
              
              . predict mu, mu 
              (predictions based on fixed effects and posterior means of random effects)
              (using 7 quadrature points)
              
              . predict re, reffect
              (calculating posterior means of random effects)
              (using 7 quadrature points)
              
              . sort district age
              
              . bys distr: gen tag = 1 if _n ==_N
              (1,874 missing values generated)
              
              . sum re
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                        re |      1,934    .0738812    .4611981  -.9851072   .9998125
              
              . sum re if tag ==1
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                        re |         60    3.09e-07    .4072956  -.9851072   .9998125
              In this example below, the probability of contraceptive use is a function of individual (centered) age and district specific random effects. The probability of contraceptive use in district 4 is about 18 percentage points higher than district 1 (assuming 0 centered age).

              Code:
              line mu age if  dist < 10, c(L)  /// 
                  || scatter mu age  if dist < 10 & tag == 1, mlabel(district)  /// 
                   mlabcolor(black) ms(i) /// 
                  || , legend(off) ytitle("Probability of Contraception Use" "By Age & District")
              Click image for larger version

Name:	Graph.png
Views:	1
Size:	55.3 KB
ID:	1548874

              Comment


              • #8
                So I think the confusion may be in their terminology. Looking at the methods below it appears the hospital-effect is derived after bootstrapping.

                Click image for larger version

Name:	Screen Shot 2020-04-23 at 8.51.53 PM.png
Views:	1
Size:	164.0 KB
ID:	1548882
                Last edited by Steven Spivack; 23 Apr 2020, 18:57.

                Comment

                Working...
                X