Interpretation of covariate effects in a linear mixed model

Kjell Weyde

#1

19 May 2016, 04:56

Hi!
I am working on a project that will explore whether long-term averaged traffic noise exposure is associated with body mass index in children measured at different ages (0-7). I use linear mixed models, with which I have little experience. My model looks like this:

mixed zbmi centnoise age age2 c.centnoise#c.age i.gender c.age##i.gender c.age2##i.gender || PREG_ID_1569: alder_mnd, cov(unstr) mle

-The continuous noise variable is centered, but age is not (it includes 0). Is it good practice in general to center all continuous variables?

-In order to answer my question, I look at the interaction term between noise and age to see whether the slope is influenced by noise, and if it is significant, one can say that noise is associated with bmi longitudinally. Am I correct? (And does centnoise, the centered noise variable, indicate the average change in zbmi per one-unit increase in the noise variable?) In the above model, this interaction term turns out to be insignificant. However, both age-gender interactions are significant. Is it a good idea to proceed with gender-stratified analyses, or should I stop with the model shown?

-In addition, I would like to include the covariate "diet" in the model. However, diet is only measured at the last time point (age 7). Is it still OK to include diet if I restrict analysis to only those who have this diet information?

-I also have some time-varying categorical covariates that I am considering including. In order to make interpretation easier, is it OK to generate a variable that is the mean of a covariate's scores? (i.e., maternal smoking coded 0 or 1 at 5 different time points, made by (smoking0+smoking1+...+smoking4)/5)

Best regards,
Kjell V. Weyde
Clyde Schechter

#2

19 May 2016, 09:38

First, let me point out that you have only partially taken advantage of factor-variable notation in posing your model. The way you coded it, Stata's -margins- command, which will likely prove very helpful to you in interpreting your results, has no way to know that age2 is the square of age (which, I presume, it is). So you should redo this with complete use of factor variables. You can also use the algebraic properties of the ## operator to simplify it:

Code:

mixed zbmi centnoise c.age##c.age##i.gender c.centnoise#c.age || PREG_ID_1569: alder_mnd, cov(unstr) mle

Next, there are two anomalies in your model specification:

1. You have specified an interaction between centnoise and the linear term for age, but not the quadratic term. This may be right, but usually isn't. It implies that you believe centnoise does not modify the shape (curvature) of the parabolic zbmi-age relationship, but does shift the age at which it peaks (or plateaus). If this is what you intended, that's fine. But it's unusual. More commonly one expects the interaction to involve both the linear and quadratic terms.
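To make the point about the peak concrete, here is a minimal Python sketch using made-up coefficients (B1, B2 and B4 are hypothetical, not estimates from this model). For zbmi = b0 + b1*age + b2*age² + b3*noise + b4*noise*age, the peak sits at age = -(b1 + b4*noise)/(2*b2), which moves with noise, while the curvature 2*b2 does not involve noise at all.

```python
# Sketch of the model  zbmi = b0 + b1*age + b2*age**2 + b3*noise + b4*noise*age
# Only b1, b2 and b4 affect where the parabola peaks; b0 + b3*noise is constant
# in age, so it only shifts the whole curve up or down.
# All values below are hypothetical, chosen only for illustration.
B1 = 0.40   # linear age term
B2 = -0.05  # quadratic age term (negative, so the curve has a peak)
B4 = 0.03   # noise-by-linear-age interaction


def vertex_age(noise: float) -> float:
    """Age at which predicted zbmi peaks, for a given centered noise value."""
    return -(B1 + B4 * noise) / (2 * B2)


def curvature(noise: float) -> float:
    """Second derivative of zbmi with respect to age; noise does not enter it."""
    return 2 * B2


for noise in (-5.0, 0.0, 5.0):
    print(f"noise {noise:+.0f}: peak at age {vertex_age(noise):.2f}, "
          f"curvature {curvature(noise):.2f}")
```

Adding a noise-by-age-squared term b5*noise*age² would make the curvature 2*(b2 + b5*noise), so noise could then change the shape of the curve as well.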

2. You have specified random slopes at the PREG_ID_1569 level for a variable alder_mnd. But alder_mnd does not appear among the fixed effects. This means, whether you intended it or not, that your model constrains the average effect of alder_mnd to be zero. If that is not what you intended (it usually is not, in such circumstances) you must add alder_mnd to the fixed effects.

So I think your model will end up looking like this:

Code:

mixed zbmi centnoise c.age##c.age##(i.gender c.centnoise) alder_mnd || PREG_ID_1569: alder_mnd, cov(unstr) mle

To get a real understanding of what your model implies about your data, I strongly suggest you use the -margins- and -marginsplot- commands to explore predicted values for a representative range of values of centnoise, age, and gender, and marginal effects of centnoise at various ages, by gender.

All of that said, here are some answers to the specific questions you posed.

-The continuous noise variable is centered, but age is not (it includes 0). Is it good practice in general to center all continuous variables?

The decision to center or not center a continuous variable rests on a number of considerations, some related to modeling and some computational. In models with quadratic effects, centering the variable near its mean reduces the correlation between x and x² (and for a symmetric distribution eliminates it), which sometimes offers computational advantages (though, in my experience, not all that often). The most frequent reason for centering a variable is to facilitate interpretation of the model, and I don't see a compelling case here for centering age. There is one situation where paying attention to centering is very important: random slopes. Your variance component for the random slope of alder_mnd will have a very different interpretation depending on whether and where alder_mnd is centered. If you don't actually care about the interpretation of the random effects of alder_mnd, feel free to overlook this. But if you plan to look at them, and particularly if you plan to look at the correlation (covariance) between the intercept and the alder_mnd slope at the PREG_ID_1569 level, you absolutely must choose a centering that corresponds to what you want to interpret. I can't advise more concretely without knowing more about what you plan to do here.
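The computational point about x and x² can be checked with a toy example. The sketch below assumes a hypothetical, evenly spaced set of ages 0 to 7 and computes the Pearson correlation between the linear and squared terms, first without centering and then after centering at the mean age (3.5). Because the centered values are symmetric around zero, their correlation with their own squares is zero.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_linear_quadratic(xs, center):
    """Correlation between (x - center) and (x - center)**2."""
    shifted = [x - center for x in xs]
    return pearson(shifted, [s * s for s in shifted])


ages = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical, evenly spaced ages
raw = corr_linear_quadratic(ages, 0.0)
centered = corr_linear_quadratic(ages, sum(ages) / len(ages))
print(f"uncentered: {raw:.3f}, centered at the mean: {centered:.3f}")
```

With real, unevenly spaced or unbalanced follow-up data the centered correlation will usually be small rather than exactly zero.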

I look at the interaction term between noise and age to see whether the slope is influenced by noise,

Correct.

and if significant, one can say that noise is associated with bmi longitudinally.

No. The significance of the interaction term does not tell you that. It tells you whether the association between noise and bmi depends on age. In fact, in a model with interaction terms it is not possible to say that bmi "is" or "is not" associated with noise. For some values of age they may be strongly associated, and for others not. Running -margins, dydx(centnoise) at(age = (fill in interesting values of age here))- followed by -marginsplot- will help you visualize this. If you feel you must have a single summary test of the association of centnoise and bmi, it would be -test centnoise c.centnoise#c.age- (adding c.centnoise#c.age#c.age if the quadratic interaction is in the model). But really, single summary tests of variable associations like this are not very useful.
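For the point estimate, what -margins, dydx(centnoise)- reports at each age is just a linear combination of coefficients. A minimal sketch, with hypothetical coefficient values (not results from this model), of the slope of zbmi on centnoise at several ages; B_NOISE_AGE2 stands for a noise-by-age-squared interaction and is set to 0 to match the model as originally posted. -margins- additionally supplies standard errors, which this sketch does not attempt.

```python
# Hypothetical fixed-effect estimates (illustrative only):
#   B_NOISE      : coefficient of centnoise
#   B_NOISE_AGE  : coefficient of c.centnoise#c.age
#   B_NOISE_AGE2 : coefficient of c.centnoise#c.age#c.age (0 if not in the model)
B_NOISE = 0.010
B_NOISE_AGE = 0.004
B_NOISE_AGE2 = 0.0


def noise_slope(age: float) -> float:
    """d(zbmi)/d(centnoise) at a given age: changes with age in an interaction model."""
    return B_NOISE + B_NOISE_AGE * age + B_NOISE_AGE2 * age ** 2


for age in (0, 2, 4, 7):
    print(f"age {age}: slope of zbmi on noise = {noise_slope(age):.3f}")
```

The joint test -test centnoise c.centnoise#c.age- asks whether B_NOISE and B_NOISE_AGE are both zero, that is, whether this slope is zero at every age.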

(And does centnoise, the centered noise variable, indicate the average change in zbmi per one-unit increase in the noise variable?)

No. It indicates the average change in zbmi per one-unit increase in the noise variable conditional on age = 0. In an interaction model, there is no such thing as "the average change in zbmi per one-unit increase in the noise variable," because that effect is different for different values of age. If you believe that there is a single effect that does not depend on age, then you should not use an interaction model. Again, the results of -margins- and -marginsplot- will help clarify this.
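One way to see that the centnoise coefficient is the noise effect at age = 0 is to re-center age. In the sketch below (hypothetical coefficients; the centering value 3.5 is an arbitrary choice), rewriting the model in terms of (age - c) turns the centnoise coefficient into B_NOISE + B_NOISE_AGE*c, the noise slope at age c, while the fitted slope at every age is unchanged.

```python
# Hypothetical estimates from a model with age uncentered:
#   slope of zbmi on noise = B_NOISE + B_NOISE_AGE * age
B_NOISE = 0.010
B_NOISE_AGE = 0.004


def slope_uncentered(age: float) -> float:
    """Noise slope at a given age, using the uncentered parameterization."""
    return B_NOISE + B_NOISE_AGE * age


def recentered_coefficients(center: float):
    """Noise main effect and interaction if the model used (age - center) instead of age."""
    return B_NOISE + B_NOISE_AGE * center, B_NOISE_AGE


def slope_recentered(age: float, center: float) -> float:
    """Same slope, computed from the re-centered parameterization."""
    b_noise_c, b_noise_age_c = recentered_coefficients(center)
    return b_noise_c + b_noise_age_c * (age - center)


CENTER = 3.5
main_effect_at_center, _ = recentered_coefficients(CENTER)
print(f"centnoise coefficient with age centered at {CENTER}: {main_effect_at_center:.3f}")
```

Only the meaning of the "main effect" changes with centering; the model's predictions and age-specific noise slopes do not.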

In the above model, this interaction term turns out to be insignificant.

OK, but see how it turns out once the interaction of noise with the quadratic age term is also included in the model (unless you have concluded that you really want a model that precludes that kind of effect modification).

both age-gender interactions are significant. Is it a good idea to proceed with gender-stratified analyses, or should I stop with the model shown?

So the effect of age on bmi is gender-dependent, which is not surprising. Interestingly, you have not included an interaction between gender and noise in your model. Perhaps there is no good reason to think that gender would modify the noise-bmi relationship. But in that case, gender is more of a nuisance variable, and I don't see what you would gain by moving to gender-stratified analyses. I think the model that allows a gender-by-age interaction is probably quite adequate.

I would like to include the covariate "diet" in the model. However, diet is only measured at the last time point (age 7). Is it still OK to include diet if I restrict analysis to only those who have this diet information?

These are two separate issues. The fact that diet is measured only once limits your ability to make a fine-grained analysis of the association between diet and bmi, but it can still be of some use. I would probably include this variable by spreading its age-7 value to all of the person's observations. I would probably even rename the variable diet_at_age_7 so there is no confusion about what it means. As for restricting the analysis to only those who have this information, that depends on why only some people have it. If it was decided ahead of time to gather these data only from a randomly selected subset of study participants (perhaps due to the difficulty and expense of collecting them), then your sample will be smaller and have less power, but no bias is introduced. On the other hand, if diet information is missing for some people because they declined to provide it, then the potential for introducing bias into the analysis this way is considerable. You have to talk to the people who collected these data, find out why some people have this information and others don't, and then make a decision. To guide your decision making, you might want to review http://www3.nd.edu/~rwilliam/stats2/l12.pdf.
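A minimal sketch of spreading the age-7 diet value to all of a child's observations, written in Python on a few hypothetical long-format records (the field names mirror the thread, but the rows and the coding of diet are made up). Children with no age-7 diet value keep a missing value (None) on every row, so the question of whether to restrict the sample still has to be decided separately.

```python
# Hypothetical long-format data: one row per child per measurement occasion.
rows = [
    {"PREG_ID_1569": 1, "age": 0, "zbmi": 0.2, "diet": None},
    {"PREG_ID_1569": 1, "age": 3, "zbmi": 0.5, "diet": None},
    {"PREG_ID_1569": 1, "age": 7, "zbmi": 0.4, "diet": 2},
    {"PREG_ID_1569": 2, "age": 0, "zbmi": -0.1, "diet": None},
    {"PREG_ID_1569": 2, "age": 7, "zbmi": 0.0, "diet": None},  # diet not collected
]


def spread_diet_at_age_7(records):
    """Copy each child's age-7 diet value onto every row for that child."""
    diet7 = {}
    for r in records:
        if r["age"] == 7 and r["diet"] is not None:
            diet7[r["PREG_ID_1569"]] = r["diet"]
    spread = []
    for r in records:
        new = {k: v for k, v in r.items() if k != "diet"}
        # Renamed so nobody mistakes it for a contemporaneous measurement.
        new["diet_at_age_7"] = diet7.get(r["PREG_ID_1569"])
        spread.append(new)
    return spread


for r in spread_diet_at_age_7(rows):
    print(r)
```

In Stata itself, one common idiom for the same step is -bysort PREG_ID_1569: egen diet_at_age_7 = max(cond(age == 7, diet, .))-, assuming a variable named diet and that the age-7 occasion can be identified by age == 7.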

I also have some time-varying categorical covariates that I am considering including. In order to make interpretation easier, is it OK to generate a variable that is the mean of a covariate's scores? (i.e., maternal smoking coded 0 or 1 at 5 different time points, made by (smoking0+smoking1+...+smoking4)/5)

While I could contrive unusual circumstances where this would make sense, it is typically not a good idea. You are probably better off including a smoking variable that takes on the actual 0/1 values at the corresponding observation times.
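A small illustration of what averaging throws away, using hypothetical smoking histories at five time points: two mothers with opposite timing get the same mean score, whereas keeping the 0/1 value at each observation time preserves when the exposure happened. The measurement ages used here are made up.

```python
# Hypothetical maternal smoking indicators (0 = no, 1 = yes) at five time points
mother_a = [1, 1, 0, 0, 0]  # smoked early, then stopped
mother_b = [0, 0, 0, 1, 1]  # started smoking later


def mean_score(xs):
    """Average over all time points (divides by len(xs), i.e. 5 here)."""
    return sum(xs) / len(xs)


print(mean_score(mother_a), mean_score(mother_b))  # 0.4 0.4: timing is lost

# Time-varying alternative: pair each observation time with that time's value.
ages = [0, 1, 2, 4, 7]  # hypothetical measurement ages
long_a = list(zip(ages, mother_a))
long_b = list(zip(ages, mother_b))
print(long_a)
print(long_b)
```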