Interpret linear mixed effect model

Sara Hansen

Join Date: Apr 2022
Posts: 30


17 Oct 2024, 09:17

I could really use some help figuring out whether I'm on the right track with 1) the model I've selected and 2) my interpretation of the model output.

I want to analyse the associations between respiratory function at follow-up and age, gender, smoking, and baseline respiratory function, specifically in a group of patients with a particular lung disease.

My hypothesis is that the decline in lung function over the years in this patient group can be explained by time (the older you get, the worse your lung function).

Each patient has a baseline lung function measurement followed by 1-3 follow-up visits at which lung function is measured again.

Example of my data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str4 recordid float(baseline_date date days_since_baseline) byte(sex age smoking)
"1"  21728 21728    0 1 62 0
"1"  21728 22719  991 1 62 0
"10" 20841 20841    0 1 34 0
"10" 20841 21577  736 1 34 0
"10" 20841 22305 1464 1 34 0
end
format %td baseline_date
format %td date

I used a linear mixed-effects model:

Code:

mixed lung_function smoking sex age  days_since_baseline || recordid: days_since_baseline

And got the following output:

Code:

Performing EM optimization ...

Performing gradient-based optimization: 
Iteration 0:  Log likelihood = -434.99454  
Iteration 1:  Log likelihood = -434.38122  
Iteration 2:  Log likelihood = -434.36409  
Iteration 3:  Log likelihood = -434.36264  
Iteration 4:  Log likelihood = -434.36251  
Iteration 5:  Log likelihood = -434.36251  

Computing standard errors ...

Mixed-effects ML regression                          Number of obs    =    118
Group variable: recordid                             Number of groups =     49
                                                     Obs per group:
                                                                  min =      1
                                                                  avg =    2.4
                                                                  max =      4
                                                     Wald chi2(4)     =  20.68
Log likelihood = -434.36251                          Prob > chi2      = 0.0004

------------------------------------------------------------------------------------------
           lung_function | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------------------+----------------------------------------------------------------
                 smoking |   -2.62267    2.20514    -1.19   0.234    -6.944665    1.699325
                     sex |  -.3625873   3.859876    -0.09   0.925    -7.927806    7.202632
                     age |  -.2484607   .1275995    -1.95   0.052    -.4985513    .0016298
     days_since_baseline |  -.0040639   .0010194    -3.99   0.000     -.006062   -.0020658
                   _cons |   113.3245   7.138893    15.87   0.000     99.33253    127.3165
------------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
recordid: Independent        |
    var(days_since_baseline) |   2.69e-13   3.76e-10             0           .
                  var(_cons) |   158.1431   34.86357      102.6592    243.6142
-----------------------------+------------------------------------------------
               var(Residual) |   32.42914   5.543997      23.19617    45.33718
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 89.32                 Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
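For reference, the model fitted by the -mixed- command above can be written out as follows. This is a sketch in illustrative notation that does not come from the thread: i indexes patients, j indexes visits, and t_ij is days_since_baseline. The "Independent" label in the random-effects table means the random intercept and random slope are assumed to be uncorrelated.

```latex
\begin{aligned}
\text{lung\_function}_{ij} &= \beta_0 + \beta_1\,\text{smoking}_{ij} + \beta_2\,\text{sex}_{ij} + \beta_3\,\text{age}_{ij} + \beta_4\,t_{ij} + u_{0i} + u_{1i}\,t_{ij} + \varepsilon_{ij},\\
u_{0i} &\sim N(0,\sigma^2_{u_0}), \qquad u_{1i} \sim N(0,\sigma^2_{u_1}), \qquad \operatorname{Cov}(u_{0i},u_{1i}) = 0, \qquad \varepsilon_{ij} \sim N(0,\sigma^2_{\varepsilon}).
\end{aligned}
```

In the output, beta_4 corresponds to the days_since_baseline coefficient, sigma^2_u0 to var(_cons), sigma^2_u1 to var(days_since_baseline), and sigma^2_epsilon to var(Residual).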




I interpret this as the following:
Fixed effects (estimated effects across all patients):
- Smoking, sex, and age do not show a significant effect on lung function (although age is borderline significant).
- The time variable (days since baseline) is significant: for each day since baseline, lung function decreases by 0.004 units, i.e., lung function declines over time.

Random effects (the variability within and between patients, identified by recordid):
- The random-slope variance for days since baseline is essentially 0, meaning there is no variability across patients in how lung function changes over time, i.e., the rate of lung function decline is similar across patients.

- The random intercept (recordid): there is substantial variation in baseline lung function between patients, i.e., the patients' baseline lung function measurements were not similar.

- Residual variance = 32.4. This means that approximately 30% of the variation in lung function is not explained by the fixed and random effects (i.e., not explained by the variables included in the model).


Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

17 Oct 2024, 10:29

You are on the right track here, and I agree with your interpretations, except for the one about residual variance.

The residual variance is indeed 32.4 squared lung function units. But I don't see anything you show that suggests that this is approximately 30% of the variation in lung function. For that to be true, the total variance in lung function would have to be about 110 squared units. But the recordid-level intercept component of the variance alone is about 158 squared units, so the total must be even larger.
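To make this concrete, the variance components from #1 can be combined by hand. This is a rough sketch under one stated simplification: it ignores var(days_since_baseline), which is essentially zero here, and treats the total as var(_cons) + var(Residual). With a non-negligible random slope, the total variance would change with time since baseline and this simple sum would not apply.

```stata
* Variance components copied from the -mixed- output in #1
scalar v_cons  = 158.1431
scalar v_resid = 32.42914
* Simplification: the random-slope variance (2.69e-13) is ignored as negligible
scalar v_total = scalar(v_cons) + scalar(v_resid)
display "Total variance:             " %8.2f scalar(v_total)
display "Residual (within) share:    " %6.3f scalar(v_resid)/scalar(v_total)
display "Between-patient share:      " %6.3f scalar(v_cons)/scalar(v_total)
```

On these numbers the residual accounts for roughly 17% of the total (about 190.6 squared units), not 30%, and most of the variation lies between patients.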

Also, I think it is not a good idea to interpret fixed-effects coefficients as if they were the exact effect. They are estimates, and there is a range of uncertainty around them. The 95% confidence interval around your days_since_baseline coefficient runs from -.006 to -.002. So instead of saying that lung function declines by 0.004 units per day, it would be better to say that lung function declines with time at an estimated rate of 0.004 units per day, with a 95% confidence interval from 0.002 to 0.006.
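Reporting the estimate with its confidence interval is often easier to read on a per-year scale. A minimal sketch, assuming 365.25 days per year, that rescales the point estimate and CI limits from #1 (multiplying a linear coefficient and its CI limits by a constant rescales them exactly):

```stata
* Per-day estimate and 95% CI limits for days_since_baseline, from #1
scalar b_day  = -.0040639
scalar lo_day = -.006062
scalar hi_day = -.0020658
* Assumption: one year = 365.25 days
scalar days_per_year = 365.25
display "Change per year: " %6.2f scalar(b_day)*scalar(days_per_year)
display "95% CI from " %6.2f scalar(lo_day)*scalar(days_per_year) " to " %6.2f scalar(hi_day)*scalar(days_per_year)
```

This works out to a decline of roughly 1.48 units per year (95% CI 0.75 to 2.21). With the data in memory, running `lincom 365.25*days_since_baseline` directly after -mixed- should give the same per-year estimate together with its standard error.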
Sara Hansen

Join Date: Apr 2022

Posts: 30
#3

17 Oct 2024, 10:42

Ah, I understand now—I had interpreted it somewhat like R-squared, which is, of course, incorrect. Thank you for also pointing out my overly black-and-white interpretation. I really appreciate you taking the time to help me; it's very helpful!
Erik Ruzek

Join Date: Oct 2017

Posts: 398
#4

17 Oct 2024, 10:43

Hi Sara,

Thanks for presenting a detailed explanation of your data (with example) and the model you ran.

Your interpretations generally look right to me and are consistent with the model results. I would be careful interpreting coefficients for variables that are not the primary focus of your study and are instead covariates included to account for potential selection factors. See Westreich and Greenland's (2013) paper on the Table 2 fallacy, which addresses this issue.

One thing you might think a little more about is how to interpret the variances of the constant and the residual. In mixed models, they tell you how much of the variance lies between persons versus within persons. You point out that the between-person variance is substantial, but you describe it as relating to baseline lung function. It is actually the variance in mean lung function (the outcome), after adjusting for covariates. The residual variance, by contrast, reflects unexplained within-person variation in the outcome, which should be time-varying.

My biggest questions are about the degree to which the associations of days since baseline and of age with lung function are linear. Have you plotted the lung_function variable (not in your dataex, btw) against these two variables and fit a lowess smoother to the data points (graph twoway will do this for you)? If those variables have a non-linear association with lung function, you will want to think about introducing polynomials or some other functional form for these predictors (e.g., splines).

I might also consider other ways to clock time in the time variable, given that it appears to have a very large range. Sometimes this results in sparsity, where you do not have much data in certain parts of the range, so converting days into weeks or months may be appropriate. This may ultimately be decided in terms of clinical relevance and disease progression. I am not an MD, so I have little to add about that.
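A minimal sketch of these checks, assuming the full dataset including lung_function is in memory (it is not in the dataex excerpt). The variable name months_since_baseline is hypothetical, the 30.4375-day average month is an assumption, and quadratic terms are only one of the functional forms mentioned; splines (e.g., via -mkspline-) would be another.

```stata
* Assumes the full analysis dataset (including lung_function) is in memory.
* 1) Check linearity: scatterplot plus lowess smoother for each predictor
twoway (scatter lung_function days_since_baseline) (lowess lung_function days_since_baseline), name(g_time, replace)
twoway (scatter lung_function age) (lowess lung_function age), name(g_age, replace)

* 2) Rescale time from days to months, assuming an average month of
*    365.25/12 = 30.4375 days (months_since_baseline is a hypothetical name)
generate double months_since_baseline = days_since_baseline/30.4375

* 3) One possible non-linear specification: quadratic terms for age and time
*    via factor-variable notation, with the random slope on the new time scale
mixed lung_function smoking sex c.age##c.age c.months_since_baseline##c.months_since_baseline || recordid: months_since_baseline
```

If the lowess curves look roughly straight, the linear terms in the original model may be adequate; if they bend, the quadratic or spline versions are worth comparing.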

Last edited by Erik Ruzek; 17 Oct 2024, 10:44. Reason: Crossed with post #2