Mixed Linear Regression Postestimation

Hamed Akbarpoor

Join Date: Dec 2023

Posts: 2
#1

27 Dec 2023, 13:20

Hi all,
I am using Stata to model the relationship between a continuous dependent variable and several explanatory variables with a mixed linear regression. To check how well the model generalizes, I have split my data into two parts, train and test.
How can I apply the fitted model to my test data set so that the predictions include the random-effects part, not just the Xb portion?
P.S. The model is a random-intercept model.
Thank you all.
Erik Ruzek

Join Date: Oct 2017

Posts: 398
#2

27 Dec 2023, 15:46

I think what you want is to get the "fitted" prediction. Look in help mixed postestimation##predict, and you will find that the fitted prediction gives "fitted values, fixed-portion linear prediction plus contributions based on predicted random effects".

Just an observation about predictions from mixed-effects models: they are actually quite good at making predictions at the cluster level. What makes them good is partial pooling: for clusters with few observations, the cluster's prediction is pulled toward the weighted sample mean given the covariates. It seems that you are making predictions for cases within clusters, however, so this may not be particularly useful for you.
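
To make the partial-pooling idea concrete, here is the standard shrinkage result for a random-intercept model, written as a sketch. The notation is my own assumption and not taken from the Stata documentation: y_ij is the outcome for unit i in cluster j, n_j is the number of observations in cluster j, and sigma_u^2 and sigma_e^2 are the random-intercept and residual variances.

```latex
y_{ij} = x_{ij}'\beta + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

\hat{u}_j = \lambda_j \left( \bar{y}_j - \bar{x}_j'\hat{\beta} \right), \qquad
\lambda_j = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2 / n_j}
```

Because lambda_j lies between 0 and 1 and grows with n_j, a cluster with few observations has its predicted intercept shrunk strongly toward zero (that is, toward the fixed-portion prediction), while a large cluster keeps most of its own mean residual.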
Hamed Akbarpoor

Join Date: Dec 2023

Posts: 2
#3

27 Dec 2023, 23:53

Thank you for your response.
Yes, I want to get the "fitted" prediction. I have obtained the fitted values on my training data set. However, I don't know how to get the "fitted" values on my test data set to explore how well the model generalizes.

Erik Ruzek

Join Date: Oct 2017
Posts: 398

02 Jan 2024, 14:46

Sorry for not responding sooner. The answer depends on whether your hold-out/testing sample consists of (a) clusters that were not included in the original mixed model or (b) units within clusters for which other units in the same cluster were included in the model. In case (a), you cannot get fitted predictions. You have no information whatsoever about those clusters, so the most appropriate prediction for them is the population (fixed effect, xb in Stata) prediction. In case (b), you will get cluster-level predictions (random effects, reffects in Stata) for those clusters, and therefore fitted values for the held-out units.

Code:

use http://www.stata-press.com/data/r16/pig.dta, clear
*Hold out sample of clusters (ids)
splitsample, cluster(id) split(.85 .15) generate(hold_out_cl) rseed(834098)

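*Set the outcome to missing in the hold-out clusters so they are excluded from estimation
gen weight2 = weight
replace weight2 = . if hold_out_cl==2

qui mixed weight2 week || id: week, cov(un) reml

*Request predictions for all observations, including the hold-out clusters
predict weight2_fix, xb
predict weight2_mix, fitted
predict weight2_eb*, reffects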

sum weight2_* if hold_out_cl==2  // no predictions of random effects

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 weight2_fix |         63    50.34553     16.1147   25.57968   75.11138
 weight2_mix |          0
 weight2_eb1 |          0
 weight2_eb2 |          0

*Hold out sample of units (observations w/in clusters)
splitsample, split(.85 .15) generate(hold_out_unit) rseed(834098)

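*Set the outcome to missing in the hold-out observations only; their clusters stay in the model
gen weight3 = weight
replace weight3 = . if hold_out_unit==2

mixed weight3 week || id: week, cov(un) reml

*Random effects are now predicted for the hold-out observations as well
predict weight3_fix, xb
predict weight3_mix, fitted
predict weight3_eb*, reffects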

sum weight3_* if hold_out_unit==2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 weight3_fix |         65    49.19337    15.72399   25.53997   75.33659
 weight3_mix |         65    49.12059    16.25986   22.35899   83.86142
 weight3_eb1 |         65   -.0157374    .5312524   -1.42909   1.074583
 weight3_eb2 |         65   -.0441666    2.558147  -4.141885   7.773883
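
To turn these predictions into a check of generalization, one option is to compare the hold-out prediction error of the xb and fitted predictions. The following is only a sketch under the same setup as the second example (same data set, split, and model); the variable names pred_fix, pred_mix, se_fix, and se_mix are hypothetical names introduced for illustration, and RMSE is just one possible error measure.

```stata
* Self-contained sketch: hold-out RMSE for xb vs. fitted predictions
use http://www.stata-press.com/data/r16/pig.dta, clear
splitsample, split(.85 .15) generate(hold_out_unit) rseed(834098)

* Fit on the training rows only by blanking the outcome in the hold-out rows
gen weight3 = weight
replace weight3 = . if hold_out_unit==2
quietly mixed weight3 week || id: week, cov(un) reml

* Predictions for every row, including the hold-out rows (hypothetical names)
predict double pred_fix, xb
predict double pred_mix, fitted

* Squared prediction errors against the observed outcome in the hold-out rows
gen double se_fix = (weight - pred_fix)^2 if hold_out_unit==2
gen double se_mix = (weight - pred_mix)^2 if hold_out_unit==2

* Root mean squared error over the hold-out rows
quietly summarize se_fix
display "Hold-out RMSE, xb:     " %6.3f sqrt(r(mean))
quietly summarize se_mix
display "Hold-out RMSE, fitted: " %6.3f sqrt(r(mean))
```

If the hold-out sample were made of whole clusters, as in the first example, only the xb comparison would be available, because the fitted predictions are missing for clusters the model never saw.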
