Comparing mixed effects models

Annabelle Coleman

25 Jun 2024, 10:56

Hello,

I was hoping someone could help me understand how to assess, both statistically and visually, which mixed-effects model is best suited to my data.

I am looking at concentrations of a protein, NfL, across CAG repeat lengths (40-46) and age in Huntington's disease. My data are cross-sectional.

The number of participants in each CAG group is listed below. Some CAG repeat lengths have very few data points, so I was wondering whether these groups should be combined if they are underpowered.

40 = 4 participants
41 = 22 participants
42 = 15 participants
43 = 9 participants
44 = 8 participants
45 = 1 participant
46 = 2 participants
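If the sparse groups are pooled, a minimal sketch of the mechanics follows. It assumes the variables CAG and disease_grp used in the code below. The new variable CAG_pooled and the cut-points (40 with 41, and 44 through 46) are hypothetical choices made only to illustrate, not a recommendation. Note that in the model below CAG enters as a continuous covariate (c.CAG), so the small groups do not each get their own parameter; their size matters mostly for the per-group plots.

```stata
* Sketch only: assumes CAG and disease_grp exist in the data in memory.
* Cut-points are hypothetical; choose groupings on substantive grounds.
tabulate CAG if disease_grp==1

* Pool 40 with 41 and 44-46 into one group, carriers only.
* With generate(), observations outside the -if- condition get missing.
recode CAG (40/41 = 41) (44/46 = 44) if disease_grp==1, generate(CAG_pooled)
label define cagpool 41 "40-41" 42 "42" 43 "43" 44 "44-46"
label values CAG_pooled cagpool

tabulate CAG_pooled if disease_grp==1
```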

The current code fits a mixed-effects model, computes predicted margins, and then plots the predicted margins with the observed data points overlaid as a scatter. See the code below:

Code:

/////////controls

     summarize Age if disease_grp==0, meanonly
     local min = round(r(min)) // lower end of the age grid for margins
     local max = round(r(max)) // upper end of the age grid for margins

     mixed NfL c.Age if disease_grp==0, stddeviations reml
     est store mixed_control
     local loglike_control : display %5.3f e(ll) // store log likelihood for control model

     // predictions across the observed age range in steps of 0.15 years;
     // -post- replaces e() with the margins results
     margins, at(Age=(`min'(0.15)`max')) post
     est store model_control

     coefplot (model_control, recast(line) lcolor("`INF_grey'") noci), ///
             at ytitle("NfL pg/mL controls") xtitle("Age, years") plotregion(lstyle(none)) xlabel(18(10)50) ylabel(0(1)4) ///
             name(`model'_control, replace) ///
             addplot(scatter NfL Age if disease_grp==0, mcolor("`INF_grey'"))


/////////gene expansion carriers

     summarize CAG if disease_grp==1, meanonly
     local min_cag = round(r(min)) // define for CAG loop below
     local max_cag = round(r(max)) // define for CAG loop below

     // the carrier model does not depend on the loop index, so fit it once
     mixed NfL c.Age c.CAG if disease_grp==1, stddeviations reml
     est store mixed_cag
     local loglike_cag : display %5.3f e(ll) // store log likelihood for carrier model

     forvalues e = `min_cag'/`max_cag' {
          summarize Age if CAG==`e' & disease_grp==1, meanonly
          local min = round(r(min)) // lower end of the age grid for this CAG
          local max = round(r(max)) // upper end of the age grid for this CAG

          // margins, post overwrites e(), so restore the fitted model each time
          est restore mixed_cag

          // predictions at this CAG value across its observed age range, in steps of 0.15 years
          margins, at(Age=(`min'(0.15)`max') CAG=`e') post
          est store model_cag`e'

          coefplot (model_cag`e', recast(line) lcolor("`INF_Red_Light'") noci), ///
                  at ytitle("NfL pg/mL CAG `e'") xtitle("Age, years") plotregion(lstyle(none)) xlabel(18(10)50) ylabel(0(1)4) ///
                  name(`model'_cag`e', replace) ///
                  addplot(scatter NfL Age if disease_grp==1 & CAG==`e', mcolor("`INF_Red_Light'"))
     }
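Because the carrier model does not depend on the loop index, a single fit can supply predictions for every CAG value at once. The sketch below assumes the same variables; the age grid 20(5)50 and the graph name carriers_by_cag are illustrative choices, not taken from the original code. With Age and CAG entered additively, the predicted lines for different CAG values are parallel, and predictions at ages outside a group's observed range are extrapolations.

```stata
* Sketch: one fit, predictions for all CAG values (assumes NfL, Age, CAG, disease_grp)
mixed NfL c.Age c.CAG if disease_grp==1, stddeviations reml

* Predicted NfL on a grid of ages for each CAG value from 40 to 46
margins, at(Age=(20(5)50) CAG=(40(1)46))

* One line per CAG value, age on the x axis, no confidence intervals
marginsplot, noci xdimension(Age) ///
    ytitle("Predicted NfL pg/mL") xtitle("Age, years") name(carriers_by_cag, replace)
```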

I am unsure how to assess whether these models are well suited to my data, and what I should plot: the residuals from the mixed-effects model or the predicted values from margins? Is it correct to fit a new model for each CAG value in order to plot each CAG group?
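Predicted values from margins show what the model implies, while residuals show where it misses the data, so the two answer different questions. A minimal sketch of standard residual checks follows, assuming the same variables; xb_cag and resid_cag are hypothetical new variable names. Residuals are computed by hand as observed minus linear prediction, since the model has no random part.

```stata
* Sketch: residual diagnostics for the carrier model (assumes NfL, Age, CAG, disease_grp)
mixed NfL c.Age c.CAG if disease_grp==1, stddeviations reml
predict xb_cag if e(sample), xb
generate resid_cag = NfL - xb_cag if e(sample)

* Residuals against fitted values: look for curvature or a fan shape
scatter resid_cag xb_cag, yline(0) name(resid_vs_fitted, replace)

* Normal quantile plot of the residuals
qnorm resid_cag, name(resid_qnorm, replace)

* Residuals against age, labelled by CAG, to spot groups the model fits poorly
scatter resid_cag Age, mlabel(CAG) yline(0) name(resid_vs_age, replace)
```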

I was also wondering whether a regression would be better suited to this analysis. A linear regression model treats each CAG group effect as fixed, estimating a separate coefficient for each group, whereas a mixed-effects model treats the CAG group effects as random, allowing for variability across groups and better generalization. How would I decide which is better to use?
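One way to compare these options is to fit each by maximum likelihood (REML likelihoods are not comparable across models with different fixed effects) and look at AIC/BIC, plus a likelihood-ratio test where models are nested. A linear CAG term is nested within i.CAG, because a linear trend is a constrained version of separate group means. The sketch assumes the same variables, and the stored names are hypothetical. With only seven CAG values, one of them represented by a single participant, neither the i.CAG coefficients for the sparse groups nor a between-CAG variance will be well estimated.

```stata
* Sketch: three ways of entering CAG, all fitted by ML (assumes NfL, Age, CAG, disease_grp)
mixed NfL c.Age c.CAG if disease_grp==1, mle
estimates store cag_linear

mixed NfL c.Age i.CAG if disease_grp==1, mle
estimates store cag_factor

* Random intercept for each CAG value
mixed NfL c.Age if disease_grp==1 || CAG:, mle
estimates store cag_random

* AIC/BIC for all three; LR test for the nested fixed-effects pair
estimates stats cag_linear cag_factor cag_random
lrtest cag_factor cag_linear
```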

I look forward to any responses, and I hope someone can help with at least some of my questions!

Very best wishes,
Annabelle Coleman


Erik Ruzek


25 Jun 2024, 11:15

Before addressing any questions about model comparison, I want to point out that your mixed syntax does not have a random intercept. If it did, I would expect to see something like || group_id: in the random part of the mixed statement. Instead, you just specify that one model is to be fit for disease_grp==0 and another for disease_grp==1. So, under the hood, mixed is simply fitting a linear regression by (restricted) maximum likelihood. Unless you have a strong preference for maximum likelihood, you can switch to OLS (regress) and the results should be almost identical. Nothing about your present analysis requires a mixed-effects model unless you do indeed have some sort of grouping variable (e.g., doctor or hospital) that you haven't specified in your syntax.

Last edited by Erik Ruzek; 25 Jun 2024, 11:17. Reason: Edited for clarity
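To see the point about regress directly, the sketch below fits the carrier model both ways and puts the estimates side by side. It assumes the variables from the original post, and the stored names are hypothetical. The commented-out line shows where a random intercept would go if a grouping variable existed; site is a placeholder name, not a variable in the data.

```stata
* Sketch: mixed without a random part versus regress (assumes NfL, Age, CAG, disease_grp)
mixed NfL c.Age c.CAG if disease_grp==1, reml
estimates store mixed_norandom

regress NfL c.Age c.CAG if disease_grp==1
estimates store ols

* Coefficients should match; mixed reports z statistics, regress reports t
estimates table mixed_norandom ols, b se

* With a real grouping variable (site is a placeholder), a random
* intercept would be added after ||, for example:
* mixed NfL c.Age c.CAG if disease_grp==1 || site:, reml
```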
