
  • Random slope and random intercept mixed linear model

    Can anyone please help with the following?

    I have a rather small but complex dataset consisting of these variables of interest:
    Var1 = a continuous variable ranging from 0 to approximately 7
    Var2 = a categorical variable with levels 1 to 3, where 1 serves as the reference for levels 2 and 3
    Var3 = a continuous variable ranging from 5 to approximately 45
    Var4 = the ID of the included patients, ranging from 1 to 10

    Can anyone please inform me
    • how to code xtmixed so that the model has both a random slope and a random intercept, with var2 nested within var4?
    • how to obtain the mixed linear regression equation, with a 95% confidence interval for the slope and the corresponding p-value, for each of these three equations (I know the random effects are not listed in the equations, but that is because I don't know how to calculate them)?
      • Var1 = intercept + var3 if var2 == 1
      • Var1 = intercept + var3 if var2 == 2
      • Var1 = intercept + var3 if var2 == 3
    • how to interpret the random effects parameters box?
    The output is as follows:

    xtmixed c.var1 c.var3 i.var2|| var4: || var2:

    Performing EM optimization:

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -186.78944
    Iteration 1: log likelihood = -186.29506
    Iteration 2: log likelihood = -186.29315
    Iteration 3: log likelihood = -186.29313

    Computing standard errors:

    Mixed-effects ML regression                     Number of obs     =        109

    -------------------------------------------------------------
                    |     No. of       Observations per Group
     Group Variable |     Groups    Minimum    Average    Maximum
    ----------------+--------------------------------------------
               Var4 |         10          5       10.9         19
               Var2 |         28          1        3.9          8
    -------------------------------------------------------------

                                                    Wald chi2(3)      =     100.09
    Log likelihood = -186.29313                     Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
            Var1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            Var3 |   .0451614    .014722     3.07   0.002     .0163068     .074016
                 |
            Var2 |
               2 |  -1.765215   .3077596    -5.74   0.000    -2.368413   -1.162018
               3 |  -2.999844   .3132985    -9.58   0.000    -3.613898    -2.38579
                 |
           _cons |   4.157045   .3764994    11.04   0.000      3.41912    4.894971
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    Var4: Identity               |
                       sd(_cons) |   .1781771    .248025      .0116404    2.727322
    -----------------------------+------------------------------------------------
    Var2: Identity               |
                       sd(_cons) |   1.67e-06   7.87e-06      1.62e-10    .0171642
    -----------------------------+------------------------------------------------
                    sd(Residual) |   1.325808   .0939577      1.153871    1.523364
    ------------------------------------------------------------------------------
    LR test vs. linear model: chi2(2) = 0.16                  Prob > chi2 = 0.9226

    Note: LR test is conservative and provided only for reference.




  • #2
    I find your post confusing, and you describe your desired model in terms that appear to contradict each other.

    For a starting point, the combination of three separate equations:
      • Var1 = intercept + var3 if var2 == 1
      • Var1 = intercept + var3 if var2 == 2
      • Var1 = intercept + var3 if var2 == 3
    has nothing to do with mixed models: it's just an interaction between var3 and var2 and can be captured simply as

    Code:
    regress var1 i.var2##c.var3
    However, since you have repeated measures, you need a model that properly accounts for that: a mixed model with a random intercept at the id level, since each id was measured on more than one occasion (or at least I think that's what you mean--is that right?). So that ups the ante a bit to

    Code:
    mixed var1 i.var2##c.var3 || var4:
    Note that since version 14, -xtmixed- has been renamed -mixed-.

    Your post title refers to wanting random slopes as well, but nothing in the text of your description explains why you want that, and I fear that you think those three equations I quoted above constitute random slopes. They do not. You don't need random slopes to get those three equations: you just need the interaction term I showed above. If you genuinely want the slope of var3 to be random across individuals in addition to depending on the level of var2, then the code is:

    Code:
    mixed var1 i.var2##c.var3 || var4: var3
    Be sure you understand the meaning of this model before you adopt it. It means that the slope of var1 on var3 has an expected value that depends on the value of var2. In addition, the individual value of that slope varies among the persons, coming from a normal distribution centered around the var2-dependent expected value, with a standard deviation to be estimated from the data. That takes things to a level of ramification beyond your original three equations.



    • #3
      Dear Clyde,

      Thank you for your great input, and sorry for my slightly confused terminology (I am not an expert in this field).

      It is correct that I have repeated measures. Within each of the three categories of var2, I have at least three repeated measures of var1. I have also noticed that xtmixed has been renamed mixed.

      Regarding the interaction, it is now clear that I need it in the model. When I look at the scattered data (var1 as a function of var3) and categorize the scatter points according to var2, the points in each category tend towards different slopes and definitely different intercepts. Is this a fair reason to have random intercepts and slopes? Var2 varies randomly among the patients in var4, and the slope of the continuous variable var3 seems to depend on var2.

      Given the above, can I use this code in Stata?


      Code:
      mixed var1 i.var2##c.var3 || var4: || var2: var3
      If yes, this is the result:




      HTML Code:
      . mixed var1 i.var2##c.var3 || var4: || var2: var3
      
      Performing EM optimization: 
      
      Performing gradient-based optimization: 
      
      Iteration 0:   log likelihood = -186.77849  
      Iteration 1:   log likelihood = -185.86858  
      Iteration 2:   log likelihood = -185.85497  
      Iteration 3:   log likelihood = -185.85373  
      Iteration 4:   log likelihood = -185.85373  
      
      Computing standard errors:
      
      Mixed-effects ML regression                     Number of obs     =        109
      
      -------------------------------------------------------------
                      |     No. of       Observations per Group
       Group Variable |     Groups    Minimum    Average    Maximum
      ----------------+--------------------------------------------
            pignumber |         10          5       10.9         19
           tissuetype |         28          1        3.9          8
      -------------------------------------------------------------
      
                                                      Wald chi2(5)      =      83.07
      Log likelihood = -185.85373                     Prob > chi2       =     0.0000
      
      -----------------------------------------------------------------------------------------
      var1|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ------------------------+----------------------------------------------------------------
      var2|
         2|  -1.794695   .7666552    -2.34   0.019    -3.297311    -.292078
         3|  -2.717368   .8179917    -3.32   0.001    -4.320602   -1.114133
          |
      var3|   .0481726   .0241009     2.00   0.046     .0009357    .0954095
          |
      i.var2#c.var3 |
          2|   .0010644   .0357765     0.03   0.976    -.0690562    .0711851
          3|  -.0122471   .0359003    -0.34   0.733    -.0826103    .0581162
          |
      _cons |   4.083168   .5320625     7.67   0.000     3.040344    5.125991
      -----------------------------------------------------------------------------------------
      
      ------------------------------------------------------------------------------
        Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
      -----------------------------+------------------------------------------------
      var4: Identity             |
                        var(_cons) |   6.50e-12   1.23e-10      5.45e-28    77606.65
      -----------------------------+------------------------------------------------
      var2: Independent      |
                          var(var3) |   .0002543     .00043      9.26e-06     .006989
                        var(_cons) |   1.41e-14          .             .           .
      -----------------------------+------------------------------------------------
                     var(Residual) |   1.656782   .2719505      1.201008    2.285519
      ------------------------------------------------------------------------------
      LR test vs. linear model: chi2(3) = 0.77                  Prob > chi2 = 0.8557



      • #4
        Regarding the interaction, it is now clear that I need it in the model. When I look at the scattered data (var1 as a function of var3) and categorize the scatter points according to var2, the points in each category tend towards different slopes and definitely different intercepts.
        No. This tells you that you need the interaction between var2 and var3. It doesn't really tell you whether random intercepts and slopes are helpful or not. Even if you ultimately decide they are, your code is not correct. See what I suggested in #2 for the correct code with random intercepts alone or with random slopes and intercepts.

        As for deciding whether to use random slopes and intercepts, if there is no scientific theory in your area to go on, you can run the models with and without them and then look at the LR test vs linear model that comes at the end of the output to inform your decision making.



        • #5
          If we assume that the presented results are correct - just hypothetically - how would I be able to get the equation for each of the categories of var2 and the corresponding p-value?



          • #6
            The intercepts of the equations for var2 = 1, 2, and 3 would be _b[_cons], _b[_cons] + _b[2.var2], and _b[_cons] + _b[3.var2].

            The slopes of the equations for var2 = 1, 2, and 3 would be _b[var3], _b[var3] + _b[2.var2#c.var3], and _b[var3] + _b[3.var2#c.var3].

            You can use the -lincom- command to calculate these.

            I don't know what you mean by "the corresponding p-value?" Equations are fortunate in not being burdened with p-values.
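To make the bookkeeping behind those combinations concrete, here is the same arithmetic sketched in Python rather than Stata (the coefficient values are simply copied from the fixed-effects table in #3, so the numbers are only illustrative; in Stata, -lincom- does this, with standard errors included):

```python
# Fixed-effect estimates copied from the output in #3 (var2 == 1 is the reference)
b = {
    "_cons": 4.083168,
    "2.var2": -1.794695,
    "3.var2": -2.717368,
    "var3": 0.0481726,
    "2.var2#c.var3": 0.0010644,
    "3.var2#c.var3": -0.0122471,
}

# Intercepts for var2 = 1, 2, 3: _b[_cons] plus the var2 shift
intercepts = {
    1: b["_cons"],
    2: b["_cons"] + b["2.var2"],
    3: b["_cons"] + b["3.var2"],
}

# Slopes for var2 = 1, 2, 3: _b[var3] plus the interaction term
slopes = {
    1: b["var3"],
    2: b["var3"] + b["2.var2#c.var3"],
    3: b["var3"] + b["3.var2#c.var3"],
}

for k in (1, 2, 3):
    print(f"var2 == {k}: var1 = {intercepts[k]:.4f} + {slopes[k]:.4f} * var3")
```

Note that this only reproduces the point estimates; -lincom- is still needed for the confidence intervals, since those require the full variance-covariance matrix of the estimates.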



            • #7
              To the best of my knowledge, the lincom command will give me the slopes within each category of var2. This is part of what I asked for at the beginning of this thread; however, how can I test whether these slopes, with random effects, are significantly different from each other?

              And how do I calculate the intercept for each of var2 = 1, 2, and 3?



              • #8
                how can I test whether these slopes with random effects are significantly different from each other?
                So, after running the random effects model you can run
                Code:
                test 1.var2#c.var3 = 2.var2#c.var3 = 3.var2#c.var3
                And how do I calculate the intercept of each var2=1, 2 and 3?
                Answered in #6.



                • #9
                  Thank you for your time and great help.



                  • #10
                    I have just reviewed the formulas from #6 and can see that these intercepts and slopes only include the fixed part and no random effects at all. If I use the suggested

                    mixed var1 i.var2##c.var3 || var4: var3
                    do I simply have to add sd(var3) and sd(Residual) to each of _b[var3], _b[var3] + _b[2.var2#c.var3], and _b[var3] + _b[3.var2#c.var3], and add sd(_cons) to each of _b[_cons], _b[_cons] + _b[2.var2], and _b[_cons] + _b[3.var2], to get the equations within each category of var2 with random effects? Or am I wrong?

                    Code:
                    . xtmixed var1 i.var2##c.var3 || var4: var3
                    
                    Performing EM optimization: 
                    
                    Performing gradient-based optimization: 
                    
                    Iteration 0:   log likelihood = -185.49682  
                    Iteration 1:   log likelihood = -185.26954  
                    Iteration 2:   log likelihood = -185.26789  
                    Iteration 3:   log likelihood = -185.26789  
                    
                    Computing standard errors:
                    
                    Mixed-effects ML regression                     Number of obs     =        109
                    Group variable: pignumber                       Number of groups  =         10
                    
                                                                    Obs per group:
                                                                                  min =          5
                                                                                  avg =       10.9
                                                                                  max =         19
                    
                                                                    Wald chi2(5)      =      99.12
                    Log likelihood = -185.26789                     Prob > chi2       =     0.0000
                    
                    -----------------------------------------------------------------------------------------
                                         var1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    ------------------------+----------------------------------------------------------------
                                          var3 |   .0498006   .0240906     2.07   0.039      .002584    .0970172
                                                 |
                                         var2 |
                                             2  |  -1.743623   .7677119    -2.27   0.023     -3.24831   -.2389351
                                             3  |  -2.657442    .817759    -3.25   0.001     -4.26022   -1.054663
                                                 |
                            i.var2#c.var3 |
                                             2  |  -.0014285   .0349997    -0.04   0.967    -.0700267    .0671696
                                             3  |  -.0116853   .0350165    -0.33   0.739    -.0803165    .0569458
                                                |
                                      _cons |   4.057095   .5313023     7.64   0.000     3.015762    5.098428
                    -----------------------------------------------------------------------------------------
                    
                    ------------------------------------------------------------------------------
                      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                    -----------------------------+------------------------------------------------
                    id: Independent       |
                                         sd(var3) |   .0164756   .0092438      .0054861    .0494785
                                       sd(_cons) |   4.41e-09   4.57e-08      6.59e-18     2.95261
                    -----------------------------+------------------------------------------------
                                    sd(Residual) |   1.286054   .1068056      1.092868     1.51339
                    ------------------------------------------------------------------------------
                    LR test vs. linear model: chi2(2) = 1.95                  Prob > chi2 = 0.3781
                    
                    Note: LR test is conservative and provided only for reference.



                    • #11
                      No. When you use random slopes, you are fitting a model in which there are, potentially, an infinite number of such equations. Each individual (or whatever the group of entities identified by var4 consists of) has its own equation. For each value of var2 (due to the var2#var3 interaction) there is a separate distribution of equations. The mean slopes and intercepts, conditional on var2, are given by the formulas in #6. Each individual regression line, conditional on var2, is drawn from a bivariate normal distribution of slopes and intercepts centered at those means, with standard deviations given by sd(_cons) and sd(var3), respectively. The actual individual random effects (for both intercept and slope) can be estimated using -predict-.

                      So, if you want the intercept and slope of each individual, you can use this code:
                      Code:
                      predict re_slope re_int, reffects
                      Then you can calculate the mean slope and intercept conditional on var2 as shown in #6, and add re_int to the mean intercept and re_slope to the mean slope. That will give you each individual's estimated intercept and slope. I don't know what you will use that for; I can't say I've ever seen anybody do that. But it can be done.
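In other words, conditional on a value of var2, each individual's line is the fixed-part line plus that individual's two random effects. A sketch in Python rather than Stata (the fixed-part numbers are copied from the output in #10; the two BLUP values are invented purely for illustration, since in Stata they would come from the -predict ..., reffects- call above):

```python
# Fixed-part mean intercept and slope for var2 == 1 (the reference level),
# copied from the fixed-effects table in #10
mean_intercept = 4.057095   # _b[_cons]
mean_slope = 0.0498006      # _b[var3]

# Hypothetical BLUPs for one individual -- invented values for illustration;
# in Stata these would be re_int and re_slope from -predict , reffects-
re_int = -0.002
re_slope = 0.011

# That individual's estimated regression line for var2 == 1
indiv_intercept = mean_intercept + re_int
indiv_slope = mean_slope + re_slope
print(f"var1 = {indiv_intercept:.4f} + {indiv_slope:.4f} * var3")
```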

                      But let's step back and look at your outputs. The estimated grand mean of your outcome variable var1 when var2 = 1 (the reference level) and var3 = 0 is close to 4. And the coefficients of 2.var2 and 3.var2 are appreciable relative to that: they move the intercept from around 4 to around 2.3 and 1.4. So that seems like a meaningful separation of the mean intercepts. Notice that sd(_cons) is a really tiny number, 4.41e-9. So even if some individual observation is a 10-sd outlier for its intercept, that will only change that individual intercept by around 4 in the 8th decimal place: not even close to a rounding error. So the random intercepts are not meaningfully different once we know var2; the random-intercept distribution is essentially a spike.

                      Now let's look at the slopes. When var2 = 1, the mean slope is about .0498. When var2 = 2 it's about .0484, and when var2 = 3 it's about .0381. Certainly the difference between the mean slopes for var2 = 1 and var2 = 2 is in rounding-error territory, and that's pretty close to true for the var2 = 3 mean slope as well. Unless the variable var3 takes on extremely large values, so that small differences in these slopes scale up to appreciable differences in var1, these interactions look like they don't amount to much at all. And what does the random variation in slopes add to this picture? Well, the standard deviation of the slopes within each var2-defined group is about 0.0165. Your whole sample size is only 109, so you probably have, on average, about 36 observations in each var2-defined group, and it is unlikely you have even one 3-sd deviant in each group. So take 2 sd, which is 0.033: in the var2 = 1 group, the range of slopes probably runs from about 0.0498 - 0.033 to about 0.0498 + 0.033. In relative terms, that random component is pretty large.
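That back-of-envelope calculation can be written out explicitly (a quick sketch in Python rather than Stata, with the two numbers copied from the output in #10):

```python
# Numbers copied from the output in #10
sd_slope = 0.0164756           # sd(var3): between-individual SD of the slope
mean_slope_var2_1 = 0.0498006  # _b[var3]: mean slope when var2 == 1

# A 2-sd band around the mean slope for the var2 == 1 group
two_sd = 2 * sd_slope
low = mean_slope_var2_1 - two_sd
high = mean_slope_var2_1 + two_sd
print(f"plausible slope range for var2 == 1: {low:.4f} to {high:.4f}")
# -> plausible slope range for var2 == 1: 0.0168 to 0.0828
```

The band runs from roughly a third of the mean slope to well above it, which is why the random component looks large in relative terms even though the var2 interaction terms do not.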

                      So it seems to me that you could streamline your model by eliminating the i.var2#c.var3 interaction: its effects on the outcome are probably not detectable at all. Now, the estimates of the random intercept and slope standard deviations may change when you take out i.var2#c.var3. So I would re-run the model as:

                      Code:
                      mixed var1 i.var2 var3 || var4: var3
                      and then re-evaluate. I would not invest much effort into detailed work on the model shown in #10 as you will be focusing on the larvae of the ants crawling on the bark of one tree and missing the forest if you do.

                      Added: One additional remark. You only have 10 groups. That is a rather small sample of the group space. You cannot consider the random effects parameters to be well estimated at all. That is probably why that last line about LR test vs linear model gives such an anemic result. Even though the estimated variation of the slopes is large enough that it may well be of practical importance, estimating it with N = 10 gives you such imprecise estimates that Stata thinks you would be just as well off ignoring the variation altogether. I think it's a huge stretch to be doing this model with just N = 10 groups. If you think the distinctions among the 10 groups identified by var4 are important to your research goals, then I would be more inclined to go to a fixed effects model:

                      Code:
                      regress var1 i.var2 var3 i.var4##c.var3
                      Such a model will give you less biased estimates of the actual var1 = a + b*var3 equations for each of the 10 var4-groups than the -mixed- model is giving you. What you lose is that your results will not be generalizable to different var4-defined groups of individuals. But generalizing from a sample of 10 is always risky business.
                      Last edited by Clyde Schechter; 14 Jul 2017, 19:51.



                      • #12
                        Can I use the code below even though I have repeated measures of var1, at least three times for each category of i.var2? Will changing the regress command to mixed take care of the repeated-measures issue?

                        regress var1 i.var2 var3 i.var4##c.var3



                        • #13
                          Yes, because the -mixed- model leads to the conclusion that there is essentially no variation among the random intercepts. So in this situation, it is fine to go to a one-level model.



                          • #14
                            Just to be sure, the correct code is

                            mixed var1 i.var2 var3 i.var4##c.var3

                            Thank you for your great help!



                            • #15
                              You could do that. Simpler, and equivalent, would be:

                              Code:
                              regress var1 i.var2 i.var4##c.var3
                              So two changes from what you wrote: -mixed- with only a single level specified is equivalent to -regress-. -regress- runs faster (though in a data set this size you won't perceive the difference), and is not subject to convergence issues or other numerical problems that -mixed- can encounter. Also, when you specify an interaction using ##, you do not need to also separately specify the constituent variables: Stata expands it for you.

