
  • residual diagnostic for multilevel model

    Dear all,

    I'm doing residual diagnostics for a multilevel model with a continuous outcome in Stata, using the commands below. The level-2 residuals look somewhat flat, and I'm wondering what the next steps might be if I conclude that the level-2 residuals are not quite normally distributed. Any suggestions would be much appreciated! Thanks!

    mixed y c.x1##i.time || id: time, cov(un) nolog reml

    predict lev1, residuals            // level-1 (observation-level) residuals
    predict lev2a lev2b, reffects      // level-2 random effects (slope on time and intercept)

    histogram lev1, xtitle(Level 1 residuals) normal
    histogram lev2a, xtitle(Level 2 residuals) normal
    histogram lev2b, xtitle(Level 2 residuals) normal





  • #2
    Well, before we get into analysis of residuals, let's troubleshoot the model. You have time entered as a discrete variable in the bottom level of the model, but then you have a random slope for continuous time at the id: level. That makes no sense. If you want a random slope on time, then time as a continuous variable needs to be included among the fixed effects at the bottom level. You can also have i.time in the model, but you can't omit c.time. (Well, you can, but then you are constraining the average slope to be 0.)

    On the other hand, maybe what you are expecting is not a random slope for continuous time, but rather random effects defined at the levels of time. In that case, the effect at the id: level should be R.time, not time.

    Comment


    • #3
      Thanks for the reply. The random effect for time is in fact continuous (I should have specified this) as it is the days in the study. The fixed effect of time is discrete.

      Comment


      • #4
        You can't have the random slope for continuous time without also including continuous time in the fixed effects, unless you want to constrain your model to have the mean slope for time be zero. So your model should be:

        Code:
        mixed y c.x1##i.time c.time || id: time, cov(un) nolog reml
        Then you can calculate your residuals and explore them as you like.

        Comment


        • #5
          Thanks - just to clarify: the time indicator is 0, 1, 2, indicating baseline, post, and follow-up. Continuous time is the number of days from baseline to follow-up. Does it make sense to include both time variables in the fixed-effects model? They essentially capture the same thing. Running the model both ways did not appear to change the residual checks or the fixed-effect estimates much.

          Comment


          • #6
            Now you've got me completely confused. The only way I can make sense of this is if post occurred 1 day after baseline and follow-up occurred 2 days after baseline. Otherwise, the single variable time can't represent both of the things you described.

            If you believe that the relationship between y and continuous time is linear, then there is nothing gained by including i.time in the model. But if you think that linearity is not a reasonable assumption here, then using both i.time and c.time allows you to fit any arbitrary shape of the relationship. Then again, if there are actually only 3 time points, this is really splitting hairs, and you will probably just overfit noise in the model.

            I think it's probably more reasonable to choose only i.time or c.time. But then you have to specify the corresponding random effects accordingly. For c.time, id: time is appropriate. For i.time, id: R.time is appropriate.
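
            For concreteness, a rough sketch of those two alternatives, using the variable names from post #1 (this is only meant to illustrate the point above, not a recommendation of either model over the other):

            Code:
            * Option A: time treated as continuous (linear trend), with a matching random slope
            mixed y c.x1##c.time || id: time, cov(un) nolog reml
            * Option B: time treated as categorical, with random effects at the levels of time
            mixed y c.x1##i.time || id: R.time, nolog reml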

            Comment


            • #7
              The indicator variable of time is: 0 indicates measured at baseline, 1 indicates measured at post, and 2 indicates measured at follow-up. This way, the x-by-time interaction indicates how the change in y from time 0 to 1 (pre-to-post change), and from time 0 to 2 (pre-to-follow-up change), varies with x. The continuous variable of time is days from baseline to follow-up, included for the random effect of time. Does this clarify?

              What is "R.time"?

              Thanks for your input!

              Comment


              • #8
                I don't understand. You are describing two different variables: one is a trichotomy (0 = baseline, 1 = post, 2 = follow-up), and the other is a continuous measure of the number of days from baseline to follow-up. They can't have the same name. Or, if you have somehow tricked Stata into creating a data set with two variables that have a common name, I cannot even begin to predict what Stata will do when you then refer to that variable name in any commands. So what are the actual names of these two variables?

                To understand the R.time notation, see example 10 in the -mixed- chapter of the [ME] manual, where it is used to create crossed random effects. More generally, it creates random effects among the values of a categorical variable (p. 358 of the same chapter).
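
                To make the notation concrete, a minimal sketch (the variable names are only placeholders borrowed from earlier posts, and the crossed-effects line uses hypothetical grouping variables a and b):

                Code:
                * R.time asks for a separate random effect at each level of the categorical
                * variable time (identity covariance by default), here defined within id:
                mixed y c.x1##i.time || id: R.time, reml
                * the same notation at the _all: level is how crossed random effects are set up:
                * mixed y x1 || _all: R.a || b: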

                Comment


                • #9
                  These are the actual names of the time variables.

                  mixed y c.x##i.time || id: dayspretofu, cov(un) nolog

                  Comment


                  • #10
                    OK, then you just need to add dayspretofu to the fixed effects as well:

                    Code:
                    mixed y c.x##i.time dayspretofu || id: dayspretofu, cov(un) nolog
                    Without it you are imposing an implicit constraint that the mean slope on dayspretofu is 0. Unless there is a scientific basis for imposing that constraint, I wouldn't.

                    Comment


                    • #11
                      Hello, I was actually interested in the question in post #1 about how to check diagnostics after xtmelogit (residual plots, constant variance). Any suggestions on code, please? Or any other similar post on this topic? I can't seem to find one. Thanks!

                      Comment


                      • #12
                        Residual plots and homoscedasticity are issues for linear regression; they are not directly applicable to logistic models, and even less so to multilevel logistic models. There aren't a lot of pre-packaged diagnostics for these models. Usually people are most concerned about the calibration of the model's predictions. You can evaluate that using something like the Hosmer-Lemeshow statistic from one-level logistic regression. It works like this: run your model (by the way, -xtmelogit- is now called -meqrlogit-) and then do this:

                        Code:
                        predict phat, mu // GET PREDICTED PROBABILITIES
                        xtile quantile = phat, nq(10) // DECILES OF PREDICTED PROBABILITY
                        collapse (sum) phat outcome_variable, by(quantile)
                        list, noobs clean
                        graph twoway scatter outcome_variable phat || line phat phat, sort
                        To the extent that the scatter points are near the diagonal line in the graph, you have good fit. Note that you don't have to use deciles. You can use as many groups as is reasonable given your sample size. Generally you would like each group to contain no fewer than 30 and probably no more than 100 observations, which gives you some guidance on how many groups to request with the -xtile- command.
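
                        As a rough sketch of that last point (the roughly-50-per-group target below is just one way to apply the 30-100 guidance, not a rule), you could let the estimation sample size choose the number of groups, replacing the -xtile- line above before the -collapse-:

                        Code:
                        count if e(sample)                              // size of the estimation sample
                        local ngrp = max(2, min(20, floor(r(N)/50)))    // aim for roughly 50 obs per group
                        xtile quantile = phat if e(sample), nq(`ngrp')  // quantile groups of predicted probability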

                        Comment


                        • #13
                          Fantastic, thank you so much! Is this how the red line should look? Other than that, the dots seem to be pretty much on the diagonal.
                          Thanks for pointing out that the command is now called meqrlogit.
                          As far as I remember from some articles, there may be ways to check whether the assumptions also hold at level 2 (residuals and constant variance). I will look more into that.
                          [Attached image: graph.PNG]


                          Comment


                          • #14
                            No, there is something seriously wrong. The red line should be a diagonal in the plot. For some reason the summed values of outcome_variable (y-axis) are about 100 times greater than the summed predicted values (x-axis), so the red line is coming out flat.

                            Since both phat and the outcome variable should be bounded between 0 and 1, my first thought was that there are observations in your data set that have values of the outcome variable but are not getting predicted values calculated, for example, observations that are not in the estimation sample. But then, when the quantiles of phat are calculated, those observations would all be in a group with the quantile missing.

                            Perhaps instead your outcome variable is scaled from 0 to 100 (percent) as opposed to 0 to 1 (probability). Since Stata's logistic regression routines treat any nonzero outcome value as true and 0 as false, the regression would run but produce results like this. I suppose it's legal, but it's an odd thing to do (in part because it can produce strange results like this). So I'm really not sure what's going on here, but something is amiss.
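
                            If the 0-to-100 scaling is what happened, a quick sketch of how you might check and fix it (outcome_variable stands in for your actual outcome name):

                            Code:
                            tabulate outcome_variable, missing       // confirm the coding (e.g. 0/100 rather than 0/1)
                            * recode to 0/1 the same way the logistic routines read it: any nonzero value = 1
                            generate byte outcome01 = outcome_variable != 0 if !missing(outcome_variable)
                            * then redo the -collapse- and graph from post #12 with outcome01 in place of outcome_variable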

                            Comment


                            • #15
                              Ah! Spot on! I had the outcome variable coded as 0 / 100. My bad. Thanks for the hint! Looks good now.

                              Comment
