  • Comparing mixed effects models

    Hello,

    I was hoping someone could help me understand how to assess, both statistically and visually, which statistical model is best to use when working with mixed-effects models.

    I am looking at concentrations of a protein, NfL, in different CAG repeat lengths (40-46) across age in Huntington's disease. My data is cross-sectional.

    The number of data points for each CAG group is listed below. Some CAG repeat lengths have very few data points, and I was wondering whether those groups should be combined if they are underpowered.
    • 40 = 4 participants
    • 41 = 22 participants
    • 42 = 15 participants
    • 43 = 9 participants
    • 44 = 8 participants
    • 45 = 1 participant
    • 46 = 2 participants
    The current code fits a mixed-effects model, computes predicted margins, and then plots the predicted margins with the scatter points overlaid. See the code below:
    Code:
    // Controls

    summarize Age if disease_grp==0, meanonly
    local min = round(r(min)) // lower age bound for the margins grid
    local max = round(r(max)) // upper age bound for the margins grid

    mixed NfL c.Age if disease_grp==0, stddeviations reml
    est store mixed_control
    local loglike_control : display %5.3f e(ll) // store log restricted likelihood for the control model
    margins, at(Age=(`min'(0.15)`max')) post // predictions from min to max age in steps of 0.15 years
    est store model_control

    coefplot (model_control, recast(line) lcolor("`INF_grey'") noci), ///
        at ytitle("NfL pg/mL controls") xtitle("Age, years") plotregion(lstyle(none)) xlabel(18(10)50) ylabel(0(1)4) name(`model'_control) ///
        addplot(scatter NfL Age if disease_grp==0, mcolor("`INF_grey'"))


    // Gene expansion carriers

    summarize CAG if disease_grp==1, meanonly
    local min_cag = round(r(min)) // first CAG length for the loop below
    local max_cag = round(r(max)) // last CAG length for the loop below

    forvalues e = `min_cag'/`max_cag' {
        summarize Age if CAG==`e' & disease_grp==1, meanonly
        local min = round(r(min)) // lower age bound for the margins grid
        local max = round(r(max)) // upper age bound for the margins grid

        // the same model is refitted on every pass because margins, post overwrites the stored estimates
        mixed NfL c.Age c.CAG if disease_grp==1, stddeviations reml
        local loglike_`e' : display %5.3f e(ll) // log restricted likelihood (same value for every CAG, since the model is the same)

        margins, at(Age=(`min'(0.15)`max') CAG=`e') post // predicted margins over this CAG's age range, in steps of 0.15 years
        est store model_cag`e'

        coefplot (model_cag`e', recast(line) lcolor("`INF_Red_Light'") noci), ///
            at ytitle("NfL pg/mL CAG `e'") xtitle("Age, years") plotregion(lstyle(none)) xlabel(18(10)50) ylabel(0(1)4) ///
            name(`model'_cag`e', replace) ///
            addplot(scatter NfL Age if disease_grp==1 & CAG==`e', mcolor("`INF_Red_Light'"))
    }
    I am unsure how to assess whether these models are well suited to the data I have, and what to plot: the residuals from the mixed-effects model, or the predicted values from margins? Is it correct to fit a new model for each CAG length in order to plot each one?
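
    As a rough illustration of the residual-based option, the sketch below plots residuals against fitted values and draws a normal quantile plot. It runs on simulated stand-in data, not the real NfL measurements: the sample size, coefficients and noise level are all made up, and the names nfl_hat and nfl_res are hypothetical.

    ```stata
    * Simulated stand-in data; every number here is made up for illustration
    clear
    set seed 2024
    set obs 60
    generate CAG = 40 + mod(_n, 5)
    generate Age = 20 + 30*runiform()
    generate NfL = 0.5 + 0.05*Age + 0.2*(CAG - 40) + rnormal(0, 0.3)

    * Same fixed-effects specification as in the loop above
    mixed NfL c.Age c.CAG, reml

    * Fitted values from the fixed part, and residuals computed by hand
    predict double nfl_hat, xb
    generate double nfl_res = NfL - nfl_hat

    * Residuals against fitted values: look for curvature or fanning
    scatter nfl_res nfl_hat, yline(0) name(res_vs_fit, replace)

    * Normal quantile plot of the residuals
    qnorm nfl_res, name(res_qnorm, replace)
    ```

    The margins plot shows what the model predicts, while plots like these show where the model misses, so the two answer different questions.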

    I was also wondering whether a linear regression would be better suited to this analysis. A linear regression model treats each CAG group effect as fixed, estimating a separate coefficient for each group, whereas a mixed-effects model treats the CAG group effects as random, allowing for variability across groups and better generalization. How would I decide which to use?
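
    To make the two options concrete, here is a minimal sketch of each specification. It uses simulated stand-in data with made-up numbers, so the output says nothing about the real NfL data; only the syntax of the two commands is the point.

    ```stata
    * Simulated stand-in data; every number here is made up for illustration
    clear
    set seed 12345
    set obs 60
    generate CAG = 40 + mod(_n, 5)          // five hypothetical CAG groups, 12 rows each
    generate Age = 20 + 30*runiform()
    generate NfL = 0.5 + 0.05*Age + 0.2*(CAG - 40) + rnormal(0, 0.3)

    * CAG as a fixed effect: one coefficient per CAG level (relative to the base level)
    regress NfL c.Age i.CAG

    * CAG as a random effect: a single variance component for the CAG intercepts
    mixed NfL c.Age || CAG:, reml stddeviations
    ```

    With only a handful of CAG levels, the random-intercept variance is estimated from very few groups, which is one thing to weigh when choosing between the two.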

    I look forward to any responses and I hope someone can help with some of my questions at least!

    Very best wishes,
    Annabelle Coleman

  • #2
    Before addressing any questions about model comparison, I want to point out that your mixed syntax does not include a random intercept. If it did, I would expect to see something like || group_id: in the random part of the mixed statement. Instead, you just specify that one model is to be used for disease_grp==0 and another for disease_grp==1. So, under the hood, mixed is simply giving you (restricted) maximum likelihood parameter estimates. Unless you have a strong preference for maximum likelihood, you can switch to OLS (regress) and the results should be almost identical. Nothing about your present analysis requires a mixed-effects model unless you do have some sort of grouping variable (e.g., doctor or hospital) that you haven't specified in your syntax.
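
    A quick way to check this is to fit both commands to the same data and compare the slopes. The sketch below uses the auto dataset that ships with Stata as a stand-in, with price and mpg in place of NfL and Age; rep78 serves only as a placeholder grouping variable to show what a random intercept looks like.

    ```stata
    * Illustration only: the auto dataset stands in for the NfL data
    sysuse auto, clear

    * OLS fit
    regress price mpg
    scalar b_ols = _b[mpg]

    * mixed with no random-effects equation, fitted by REML
    mixed price mpg, reml
    scalar b_mixed = _b[price:mpg]

    * The two slopes should agree up to optimizer tolerance
    display "OLS slope:   " b_ols
    display "mixed slope: " b_mixed

    * A random intercept needs a grouping variable after ||
    * (rep78 is a placeholder; observations with missing rep78 are dropped)
    mixed price mpg || rep78:, reml
    ```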
    Last edited by Erik Ruzek; 25 Jun 2024, 11:17. Reason: Edited for clarity
