  • Interpret linear mixed effect model

    I could really use some help checking whether I'm on the right track with 1) the model I've selected, and 2) the interpretation of the model output.



    I want to analyse the associations between respiratory function at follow-up and the variables age, gender, smoking, and baseline respiratory function, specifically for a group of patients with a specific lung disease.

    I have a theory that the decline in lung function over the years for this patient group can be explained by time (because the older you get --> the worse the lung function).

    Each patient has a baseline lung function measurement, followed by 1-3 follow-up visits at which lung function is measured again.

    Example of my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 recordid float(baseline_date date days_since_baseline) byte(sex age smoking)
    "1"  21728 21728    0 1 62 0
    "1"  21728 22719  991 1 62 0
    "10" 20841 20841    0 1 34 0
    "10" 20841 21577  736 1 34 0
    "10" 20841 22305 1464 1 34 0
    end
    format %td baseline_date
    format %td date

    I used a linear mixed effect model:

    Code:
    mixed lung_function smoking sex age  days_since_baseline || recordid: days_since_baseline


    And got the following output:

    Code:
    Performing EM optimization ...
    
    Performing gradient-based optimization: 
    Iteration 0:  Log likelihood = -434.99454  
    Iteration 1:  Log likelihood = -434.38122  
    Iteration 2:  Log likelihood = -434.36409  
    Iteration 3:  Log likelihood = -434.36264  
    Iteration 4:  Log likelihood = -434.36251  
    Iteration 5:  Log likelihood = -434.36251  
    
    Computing standard errors ...
    
    Mixed-effects ML regression                          Number of obs    =    118
    Group variable: recordid                             Number of groups =     49
                                                         Obs per group:
                                                                      min =      1
                                                                      avg =    2.4
                                                                      max =      4
                                                         Wald chi2(4)     =  20.68
    Log likelihood = -434.36251                          Prob > chi2      = 0.0004
    
    ------------------------------------------------------------------------------------------
               lung_function | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------------------+----------------------------------------------------------------
                     smoking |   -2.62267    2.20514    -1.19   0.234    -6.944665    1.699325
                         sex |  -.3625873   3.859876    -0.09   0.925    -7.927806    7.202632
                         age |  -.2484607   .1275995    -1.95   0.052    -.4985513    .0016298
         days_since_baseline |  -.0040639   .0010194    -3.99   0.000     -.006062   -.0020658
                       _cons |   113.3245   7.138893    15.87   0.000     99.33253    127.3165
    ------------------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    recordid: Independent        |
        var(days_since_baseline) |   2.69e-13   3.76e-10             0           .
                      var(_cons) |   158.1431   34.86357      102.6592    243.6142
    -----------------------------+------------------------------------------------
                   var(Residual) |   32.42914   5.543997      23.19617    45.33718
    ------------------------------------------------------------------------------
    LR test vs. linear model: chi2(2) = 89.32                 Prob > chi2 = 0.0000
    
    Note: LR test is conservative and provided only for reference.
    
    
    
    .

    I interpret this as the following:
    Fixed effects (estimated effects across all patients):
    - Smoking, sex, and age do not show a significant effect on lung function (however, age is borderline significant).
    - The time variable (days since baseline) is significant: For each day since baseline, lung function decreases by 0.004 units. I.e. lung function declines over time.



    Random effects (the variability within and between patients (record id)):
    - The random slope variance (days since baseline) is basically 0, meaning there is no detectable variability in how lung function changes over time across patients. I.e. the rate of lung function decline is similar across the patients.

    - The random intercept (recordid): there is substantial variation in the baseline lung function level between patients. I.e. the patients' baseline lung function measures were not similar.


    - Residual variance = 32.4. This means that approx 30% of the variation in lung function is not explained by the fixed and random effects (i.e. not explained by the variables included in the model).



  • #2
    You are on the right track here, and I agree with your interpretations, except for the one about residual variance.

    The residual variance is indeed 32.4 squared lung function units. But I don't see anything you show that suggests that this is approximately 30% of the variation in lung function. For that to be true, the total variance in lung function would have to be about 110 squared units. But the recordid-level intercept component of the variance alone is about 158 squared units. So the total must be even larger.
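
    As a rough check on that arithmetic, the two variance components from the output can be combined by hand. This is only a sketch: it ignores the random-slope variance (estimated at essentially zero) and it gives shares of the variance left over after the fixed effects, not of the raw variance in lung function. The scalar names are made up for illustration.

    ```stata
    * Variance components copied from the mixed output above
    scalar v_cons  = 158.1431
    scalar v_resid = 32.42914

    * Residual (within-patient) share of the random-part variance
    display "Residual share: " %5.3f (v_resid/(v_cons + v_resid))

    * Between-patient share (intraclass correlation)
    display "ICC: " %5.3f (v_cons/(v_cons + v_resid))
    ```

    This gives a residual share of about 0.17 and an intraclass correlation of about 0.83. Running estat icc after the model should report a comparable quantity, although with a random slope in the model it is presumably a conditional ICC.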

    Also, I think it is not a good idea to interpret fixed effects coefficients as if they were the exact effect. They are estimates, and there is a range of uncertainty around them. The confidence interval around your days_since_baseline coefficient is from -.006 to -.002. So instead of saying that lung function declines by 0.004 units per day, it would be better to say that lung function declines with time at an estimated rate of 0.004 units per day, with a 95% confidence interval from 0.002 to 0.006.
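
    One possible way to report the estimate and its interval on a more readable scale is lincom, assuming the mixed model above is still the active estimation result. The factor of 365.25 days per year is an assumption chosen for illustration, not something taken from the original analysis.

    ```stata
    * Run immediately after the -mixed- command shown earlier.
    * Rescales the per-day slope to a per-year slope, with its 95% CI.
    lincom 365.25*days_since_baseline
    ```

    With the coefficients shown, this should come out to a decline of roughly 1.5 units per year, with a 95% confidence interval of roughly 0.75 to 2.2 units per year.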



    • #3
      Originally posted by Clyde Schechter
      You are on the right track here, and I agree with your interpretations, except for the one about residual variance.

      The residual variance is indeed 32.4 squared lung function units. But I don't see anything you show that suggests that this is approximately 30% of the variation in lung function. For that to be true, the total variance in lung function would have to be about 110 squared units. But the recordid-level intercept component of the variance alone is about 158 squared units. So the total must be even larger.

      Also, I think it is not a good idea to interpret fixed effects coefficients as if they were the exact effect. They are estimates, and there is a range of uncertainty around them. The confidence interval around your days_since_baseline coefficient is from -.006 to -.002. So instead of saying that lung function declines by 0.004 units per day, it would be better to say that lung function declines with time at an estimated rate of 0.004 units per day, with a 95% confidence interval from 0.002 to 0.006.


      Ah, I understand now. I had interpreted it somewhat like R-squared, which is, of course, incorrect. Thank you for also pointing out my overly black-and-white interpretation. I really appreciate you taking the time to help me; it's very helpful!



      • #4
        Hi Sara,

        Thanks for presenting a detailed explanation of your data (with example) and the model you ran.

        Your interpretations generally look right to me and are consistent with the model results. I would be careful interpreting coefficients from variables that are not the primary focus of your study and are instead covariates to account for potential selection factors. See Westreich and Greenland's (2013) paper on the Table 2 fallacy, which addresses this issue.

        One thing you might think a little more about is how to interpret the variances of the constant and the residual. In mixed models, they tell you about the degree to which variance is between vs. within persons. You point out that the between-person variance is substantial, but you describe it as variation in baseline lung function. It is actually about variance in mean outcome lung function, after adjusting for covariates. The residual variance is about unexplained within-person variation in the outcome, that is, variation across a person's visits over time.

        I think my biggest questions are about the degree to which the days since baseline and the age slopes are linear in nature. Have you plotted the lung_function variable (not in your dataex, btw) against these two variables and fit a lowess smoother to the data points (graph twoway will do this for you)? If those variables have a non-linear association with lung function, you will want to think about introducing polynomials or some other functional form for these predictors (e.g., splines). I might also consider other ways to clock time in the time variable, given that it appears to have a very large range. Sometimes this results in sparsity, where you do not have a lot of data in certain parts of the range. So converting days into weeks or months may be appropriate. This may ultimately be decided in terms of clinical relevance and disease progression. I am not an MD, so I have little to add about that.
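
        A minimal sketch of those checks, written for a do-file (the /// line continuations will not work when typed interactively). The new variable name years_since_baseline, the 365.25 days-per-year factor, and the choice of a quadratic term are all assumptions for illustration; the plots and the clinical context should drive the actual functional form.

        ```stata
        * Scatter plus lowess smoother: lung function against time and against age
        twoway (scatter lung_function days_since_baseline) ///
               (lowess lung_function days_since_baseline)
        twoway (scatter lung_function age) ///
               (lowess lung_function age)

        * Re-clock time in years (365.25 days per year is an assumption)
        generate years_since_baseline = days_since_baseline/365.25

        * If the plots suggest curvature, one option is a quadratic term in time
        mixed lung_function smoking sex age ///
            c.years_since_baseline##c.years_since_baseline ///
            || recordid: years_since_baseline
        ```

        Rescaling time does not change the fit of the linear model, but it puts the slope and its random-effect variance on a scale that is easier to read and may be easier for the optimizer to handle.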
        Last edited by Erik Ruzek; 17 Oct 2024, 10:44. Reason: Crossed with post #2
