linear mixed models time variable

pierre martin

Join Date: Nov 2017

Posts: 63
#1


27 Feb 2025, 09:24

Hi all,

I am new to biostatistics, coming from an MD background in neurology, and I am trying to run analyses by myself without any mentoring, which is terribly hard to navigate. I really hope the community can provide some insights.

I am investigating the effect of air pollution (a continuous variable measured at baseline) on cognitive performance over time (cognition is measured as a composite score at baseline and then annually). Visit is 0, 1, 2 or 3 for baseline, year 1, year 2 and year 3. I was thinking I could do this:

mixed cognition i.visit##c.pollution age sex education || id: visit, residuals(ar 1, t(visit))

When I look at raw mean cognitive scores in my population before running any models, there is a slight increase at year 1 (a learning effect, I guess), then a decrease, then a slight increase again.

I don't know whether I should treat visit as categorical (i.visit, with levels 0, 1, 2 and 3) or as continuous (c.visit). Should I convert it to months (0, 12, 24 and 36)? I noticed that some papers, instead of using visit as the 'time' variable with baseline age as a covariate and possibly their interaction (since a one-year increase does not mean the same thing at age 60 as at age 80), use 'age' as the 'time' variable (I imagine by incrementing age every year). How would you recommend I proceed? Should I add a quadratic term in visit or age? I will of course compare models with LR tests or AIC/BIC etc., but I want to make sure I start in the right direction.

I know this is a very basic question but any help you might provide would be so incredibly valuable to me.

Thank you so much,

Cheers,

Pierre M.
George Ford

Join Date: Aug 2014

Posts: 3152
#2

27 Feb 2025, 18:26

i.visit. Otherwise you are treating visit as continuous. If you do that, I'd allow a nonlinear relationship (in fact, it's just a parametric approach to your non-parametric i.visit).

Changing to months makes no difference; they are perfectly correlated.

Age increments with year, and age is probably important and may get washed out by the FE. There could be a heterogeneous effect by age (c.age#c.pollution).
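
A minimal sketch of how the categorical and quadratic codings of visit could be compared, assuming the variable names used in this thread and assuming sex and education are categorical (hence the i. prefix, which is not in the original command). With four visits, the quadratic model is nested in the i.visit model, so a likelihood-ratio test is available when both are fit by maximum likelihood. The stored-estimate names cat_time and quad_time are arbitrary.

Code:

```stata
* Visit as categorical (non-parametric in time), fit by ML
mixed cognition i.visit##c.pollution c.age i.(sex education) || id: visit, mle
estimates store cat_time

* Visit as continuous with a quadratic term, same random part, fit by ML
mixed cognition c.visit##c.visit##c.pollution c.age i.(sex education) || id: visit, mle
estimates store quad_time

* LR test of the quadratic restriction, plus AIC/BIC side by side
lrtest cat_time quad_time
estimates stats cat_time quad_time
```

If the LR test does not reject, the quadratic coding loses little relative to i.visit; otherwise the categorical coding is the safer choice.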
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#3

28 Feb 2025, 07:47

George has some good thoughts. I want to commend you on trying to do the analysis yourself and note that you seem to be asking a lot of good questions.

One issue to consider is whether the visit variable accurately captures how much time elapsed between visits. That is, was there variability in the time between visits 0 and 1, 1 and 2, and/or 2 and 3 for different people? George's answer assumes that all people had subsequent visits at the same pre-specified time. But if that is not the case, then you probably should code a variable that captures this amount of time, especially if, all else equal, more time between visits is likely to be associated with differing levels of cognition. You can code the variable as days elapsed since visit 0, so that its value at visit 0 equals 0. If you do this, I would treat the new variable as continuous.

Note that in a lot of cases, people don't do this simply because the amount of time that elapsed between, say, visit 0 and visit 1 differed by only a few days across the sample. In that case, your model would work as is.
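
A rough sketch of coding elapsed time, assuming the dataset is in long form with one row per person-visit and contains a Stata daily-date variable, here given the hypothetical name visit_date (no such variable is mentioned in the thread). The new variable names days and years are also just illustrative.

Code:

```stata
* visit_date is a hypothetical daily-date variable; the first row per id
* after sorting on visit is assumed to be visit 0
bysort id (visit): generate days = visit_date - visit_date[1]

* Optional rescaling so that coefficients are per year rather than per day
generate double years = days / 365.25

* Check how much the spacing actually varies at each visit
tabstat days, by(visit) statistics(min p50 max)
```

If the minimum and maximum at each visit differ by only a few days, the integer visit variable carries essentially the same information.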
pierre martin

Join Date: Nov 2017

Posts: 63
#4

28 Feb 2025, 09:59

Hi George and Erik,

Thank you so much for taking the time to comment on my post. I do appreciate your feedback.

Yes, participants have been seen regularly every year with only a few days of difference, so I can treat the visits as 12 months apart. In that case, would you rather treat visit as a continuous variable (0, 1, 2, 3 or 0, 12, 24, 36, which indeed does not change anything) or as a categorical variable? It looks like cognition does not decline steadily over time. There is a slight improvement at year 1, then a decline, then again a slight improvement. If I treat visit as continuous I could add a quadratic term; does that seem appropriate to you?

I wanted to use visit as my time variable in the model and just adjust for age and possibly an interaction term between age and visit. However, some people told me that, instead of using visit, I could calculate the age of participants at each visit and then drop the visit variable. My 'time' would be the age of the participant and I could do this:

mixed cognition c.age##c.pollution sex education || id: age, residuals(ar 1, t(age))

Age would replace my visit variable and be calculated by adding 1 year at each annual visit,

instead of this: mixed cognition i.visit##c.pollution age sex education || id: visit, residuals(ar 1, t(visit))

What are your thoughts on this?

Again, thank you so much for your help. It is extremely valuable to me.

Kind regards,

Pierre
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#5

28 Feb 2025, 16:15

I think how you specify the model depends on your research question. Is there a particular hypothesis you want to test?

To my mind, visit is just a variable that tracks yearly increments of time. So when you model the following,

Code:

mixed cognition i.visit##c.pollution age sex education || id: visit, residuals(ar 1, t(visit))

you are testing whether yearly changes in cognition differ across persons and then whether cognition in a given year is partially a function of pollution levels (i.visit##c.pollution). All this is adjusted for age, sex, and education. Is that what interests you?

If you switch the time variable to age, as follows,

Code:

mixed cognition c.age##c.pollution sex education || id: age, residuals(ar 1, t(age))

then you are answering the question of whether there are between-person differences in how much cognition changes as people age (note that interpretation depends on how age is clocked: days, months, or years) and whether this association varies by the pollution level one is exposed to at a given age. That seems a more relevant set of questions to me, but this isn't my project.
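
A minimal sketch of building the age-as-time variable, assuming age in the dataset is baseline age in years (constant within person) and visit is 0, 1, 2, 3. The names age_now and age_c are hypothetical, and centering on the mean baseline age is an optional choice that is not part of the original commands; it is included only because it tends to make the intercept and random-slope variance easier to interpret.

Code:

```stata
* Current age at each visit = baseline age + years since baseline
generate double age_now = age + visit

* Optional: center on the mean baseline age
summarize age_now if visit == 0, meanonly
generate double age_c = age_now - r(mean)

* Age as the time variable, with a random slope on age
mixed cognition c.age_c##c.pollution sex education || id: age_c
```

Here the coefficient on c.age_c#c.pollution describes how the per-year-of-age change in cognition varies with baseline pollution.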
pierre martin

Join Date: Nov 2017

Posts: 63
#6

28 Feb 2025, 16:48

Hi Erik,
Thank you for your feedback. My project has a cross-sectional part, where I investigate whether pollution at baseline is associated with cognition measured at baseline (I fit a linear model for that), and a longitudinal part, where I am interested in the effect of pollution (still measured only at baseline) on changes in cognition over time.
What is hard for me to figure out is whether I should treat visit as a categorical variable (i.visit) or as a continuous variable (0, 1 year, 2 years..., so c.visit), in both cases adjusting for baseline age and other covariates.

Or whether I should drop visit and instead calculate a new age variable, incremented by 1 year at each visit, and use only that variable.

To me the two approaches are kind of similar (as of course people age over time), but I think this is what I don't understand very well: how the two approaches differ and which one I should prefer.

In the longitudinal part of my work, I want to know whether the effect of pollution on cognition differs over time (or as people age), which is kind of equivalent, no?

I think using age as the time variable would make more sense?

best,
Pierre
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#7

01 Mar 2025, 04:13

Originally posted by pierre martin

. . . I could do this :

mixed cognition c.age##c.pollution sex education || id: age, residuals(ar 1, t(age)). Age would replace my visit variable and be calculated by adding 1 year at each annual visit

instead of this: mixed cognition i.visit##c.pollution age sex education || id: visit, residuals(ar 1, t(visit))

What are your thoughts on this?

A few thoughts:

1. You don’t say much about your study’s setup, but I gather from the inclusion of education as a covariate that the participants are adults, and that the levels of pollution vary according to geography so that unless the participants have just moved into the neighborhood, then they’ve been exposed to their prevailing levels of pollution for perhaps decades.

I think that the first thing that I would do if it were me is to fit a model of age at baseline, pollution level and their interaction as predictors of cognition score at baseline,

Code:

regress cognition c.age##c.pollution if visit == 0

or to take a subset of your dataset of participants living in the highest quartile of pollution level and create a scatterplot of the baseline cognition score versus age.

Code:

centile pollution if visit == 0, centile(75)
graph twoway scatter cognition age if pollution >= r(c_1) & visit == 0

If you don’t see profound effects of pollution exposure over the range of participants’ current lifespans—potentially a range of several decades—on your baseline cognition performance, then I think that you don’t have a prayer to see anything with this “composite score” over a three-year observation span.

2. I see that you impose a first-order autoregressive covariance structure on the residuals presumably for the potential to increase efficiency of the estimator. With such a short time span, the software might have difficulty getting a decent enough estimate on the AR 1 parameter to make much difference. Have you determined the correlation coefficient matrix of the residuals to see whether you’re really justified in imposing that structure in your model? If not, then subtract each time point’s cell mean from its data in order to get the residuals and then use correlate on those residuals. If you don’t see a clear pattern over the three lags, then it might not be worth the bother. As an alternative, you could consider an unstructured covariance structure on the residuals, which would cover your bases regardless.

3. Your mixed syntax includes both a random slope and a covariance structure on the residuals. It’s fairly common to see either a random slope or a structured residual covariance matrix, but I don’t recall ever seeing a model fit with mixed that includes both. xtregar seems to do something like that for "cross-sectional time-series", but whenever I’ve seen Stata’s mixed command or SAS’s PROC MIXED used to fit a MANOVA-like model (that is, a model that concerns itself with the structure of the residual covariance), the random effects equation is left completely empty. So, if you’re going to include a structured residual covariance matrix, then maybe consider something like

Code:

mixed cognition i.visit##c.pollution c.age i.(sex education) || id: , noconstant residuals(unstructured, t(visit))

4. I’m not sure how kosher it is to have the visit variable as a factor variable (categorical, indicator, “dummy”) in the fixed effects equation while including it in the random effects equation as continuous, which is what you’re doing in your second example.

5. You mention that you don’t see much systematic change in cognitive performance over the three-year observation period. I’m guessing that your age-versus-visit issue is probably the least of your worries, and I would be surprised if your choice between them makes a day-versus-night difference in the ability to detect an interpretable change.
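
The residual check described in point 2 could be sketched roughly as follows, assuming a long-form dataset with id, visit (0 to 3) and cognition as named in this thread. The names visit_mean and resid are hypothetical. Note that correlate uses only participants observed at all four visits; pwcorr would use pairwise-complete observations instead.

Code:

```stata
preserve
keep id visit cognition

* Residual = score minus the mean score at that visit
bysort visit: egen double visit_mean = mean(cognition)
generate double resid = cognition - visit_mean

* One column of residuals per visit, then correlate across visits
keep id visit resid
reshape wide resid, i(id) j(visit)
correlate resid0 resid1 resid2 resid3
restore
```

Under an AR(1) pattern the correlations should fall off roughly geometrically with lag (lag 1 largest, lag 3 smallest); roughly equal correlations at all lags would point toward an exchangeable structure instead.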