
  • Random slope and random intercept mixed linear model

    Can anyone please help with the following?

    I have a rather small but complex dataset consisting of these variables of interest:
    Var1 = a continuous variable ranging from 0 to approximately 7
    Var2 = a categorical variable with levels 1 to 3, where 1 serves as the reference for levels 2 and 3
    Var3 = a continuous variable ranging from 5 to approximately 45
    Var4 = the ID of the included patients, ranging from 1 to 10

    Can anyone please inform me
    • how to code xtmixed so that the model has both a random slope and a random intercept, with var2 nested within var4?
    • how to obtain the mixed linear regression equation, with a 95% confidence interval for the slope and the corresponding p-value, for each of these three equations (I know the random effects are not listed in the equations, but that is because I don't know how to calculate them)?
      • Var1 = intercept + var3 if var2 == 1
      • Var1 = intercept + var3 if var2 == 2
      • Var1 = intercept + var3 if var2 == 3
    • how to interpret the random effects parameters box?
    The output is as follows:

    xtmixed c.var1 c.var3 i.var2|| var4: || var2:

    Performing EM optimization:

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -186.78944
    Iteration 1: log likelihood = -186.29506
    Iteration 2: log likelihood = -186.29315
    Iteration 3: log likelihood = -186.29313

    Computing standard errors:

    Mixed-effects ML regression                     Number of obs     =        109

    -------------------------------------------------------------
                    |     No. of       Observations per Group
     Group Variable |     Groups    Minimum    Average    Maximum
    ----------------+--------------------------------------------
               Var4 |         10          5       10.9         19
               Var2 |         28          1        3.9          8
    -------------------------------------------------------------

                                                    Wald chi2(3)      =     100.09
    Log likelihood = -186.29313                     Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
            Var1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            Var3 |   .0451614    .014722     3.07   0.002     .0163068     .074016
                 |
            Var2 |
               2 |  -1.765215   .3077596    -5.74   0.000    -2.368413   -1.162018
               3 |  -2.999844   .3132985    -9.58   0.000    -3.613898    -2.38579
                 |
           _cons |   4.157045   .3764994    11.04   0.000      3.41912    4.894971
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    Var4: Identity               |
                       sd(_cons) |   .1781771    .248025      .0116404    2.727322
    -----------------------------+------------------------------------------------
    Var2: Identity               |
                       sd(_cons) |   1.67e-06   7.87e-06      1.62e-10    .0171642
    -----------------------------+------------------------------------------------
                    sd(Residual) |   1.325808   .0939577      1.153871    1.523364
    ------------------------------------------------------------------------------
    LR test vs. linear model: chi2(2) = 0.16                  Prob > chi2 = 0.9226

    Note: LR test is conservative and provided only for reference.




  • #2
    I find your post confusing, and you describe your desired model in terms that appear to contradict each other.

    For a starting point, the combination of three separate equations:
      • Var1 = intercept + var3 if var2 == 1
      • Var1 = intercept + var3 if var2 == 2
      • Var1 = intercept + var3 if var2 == 3
    has nothing to do with mixed models: it's just an interaction between var3 and var2 and can be captured simply as

    Code:
    regress var1 i.var2##c.var3
    However, since you have repeated measures, you need a model that properly accounts for that: a mixed model with a random intercept at the id level, since each id was measured on more than one occasion (or at least I think that's what you mean--is that right?). So that ups the ante a bit to

    Code:
    mixed var1 i.var2##c.var3 || var4:
    Note that since version 14, -xtmixed- has been renamed -mixed-.

    Your post title refers to wanting random slopes as well, but nothing in the text of your description explains why you want that, and I fear that you think those three equations I quoted above constitute random slopes. They do not. You don't need random slopes to get those three equations: you just need the interaction term I showed above. If you genuinely want the slope of var3 to be random across individuals in addition to depending on the level of var2, then the code is:

    Code:
    mixed var1 i.var2##c.var3 || var4: var3
    Be sure you understand the meaning of this model before you adopt it. It means that the slope of var1 on var3 has an expected value that depends on the value of var2. In addition, the individual value of that slope varies among the persons, coming from a normal distribution centered around the var2-dependent expected value, with a standard deviation to be estimated from the data. That takes things to a level of ramification beyond your original three equations.



    • #3
      Dear Clyde,

      Thank you for your great input, and sorry for my slightly confused terminology (I am not an expert in this field).

      It is correct that I have repeated measures. Within each of the three categories of var2, I have at least three repeated measures of var1. I have also noticed that xtmixed has been renamed mixed.

      Regarding the interaction, it is now clear that I need it in the model. When I look at the scattered data (var1 as a function of var3) and categorize the scatter points according to var2, the points in each category tend towards different slopes and definitely different intercepts. Is this a fair reason to have random intercepts and slopes? Var2 varies randomly among the patients in var4, and the slope of the continuous variable var3 seems to depend on var2.

      Given the above, can I use this code in Stata?


      Code:
      mixed var1 i.var2##c.var3 || var4: || var2: var3
      If yes, this is the result:




      HTML Code:
      . mixed var1 i.var2##c.var3 || var4: || var2: var3
      
      Performing EM optimization: 
      
      Performing gradient-based optimization: 
      
      Iteration 0:   log likelihood = -186.77849  
      Iteration 1:   log likelihood = -185.86858  
      Iteration 2:   log likelihood = -185.85497  
      Iteration 3:   log likelihood = -185.85373  
      Iteration 4:   log likelihood = -185.85373  
      
      Computing standard errors:
      
      Mixed-effects ML regression                     Number of obs     =        109
      
      -------------------------------------------------------------
                      |     No. of       Observations per Group
       Group Variable |     Groups    Minimum    Average    Maximum
      ----------------+--------------------------------------------
            pignumber |         10          5       10.9         19
           tissuetype |         28          1        3.9          8
      -------------------------------------------------------------
      
                                                      Wald chi2(5)      =      83.07
      Log likelihood = -185.85373                     Prob > chi2       =     0.0000
      
      -----------------------------------------------------------------------------------------
      var1|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ------------------------+----------------------------------------------------------------
      var2|
         2|  -1.794695   .7666552    -2.34   0.019    -3.297311    -.292078
         3|  -2.717368   .8179917    -3.32   0.001    -4.320602   -1.114133
          |
      var3|   .0481726   .0241009     2.00   0.046     .0009357    .0954095
          |
      i.var2#c.var3 |
          2|   .0010644   .0357765     0.03   0.976    -.0690562    .0711851
          3|  -.0122471   .0359003    -0.34   0.733    -.0826103    .0581162
          |
      _cons |   4.083168   .5320625     7.67   0.000     3.040344    5.125991
      -----------------------------------------------------------------------------------------
      
      ------------------------------------------------------------------------------
        Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
      -----------------------------+------------------------------------------------
      var4: Identity             |
                        var(_cons) |   6.50e-12   1.23e-10      5.45e-28    77606.65
      -----------------------------+------------------------------------------------
      var2: Independent      |
                          var(var3) |   .0002543     .00043      9.26e-06     .006989
                        var(_cons) |   1.41e-14          .             .           .
      -----------------------------+------------------------------------------------
                     var(Residual) |   1.656782   .2719505      1.201008    2.285519
      ------------------------------------------------------------------------------
      LR test vs. linear model: chi2(3) = 0.77                  Prob > chi2 = 0.8557



      • #4
        Regarding the interaction, it is now clear that I need it in the model. When I look at the scattered data (var1 as a function of var3) and categorize the scatter points according to var2, the points in each category tend towards different slopes and definitely different intercepts.
        No. This tells you that you need the interaction between var2 and var3. It doesn't really tell you whether random intercepts and slopes are helpful or not. Even if you ultimately decide they are, your code is not correct. See what I suggested in #2 for the correct code with random intercepts alone or with random slopes and intercepts.

        As for deciding whether to use random slopes and intercepts, if there is no scientific theory in your area to go on, you can run the models with and without them and then look at the LR test vs linear model that comes at the end of the output to inform your decision making.



        • #5
          If we assume that the presented results are correct - just hypothetically - how would I be able to get the equation for each of the categories of var2 and the corresponding p-value?



          • #6
            The intercepts of the equations for var2 = 1, 2, and 3 would be _b[_cons], _b[_cons] + _b[2.var2], and _b[_cons] + _b[3.var2].

            The slopes of the equations for var2 = 1, 2, and 3 would be _b[var3], _b[var3] + _b[2.var2#c.var3], and _b[var3] + _b[3.var2#c.var3].

            You can use the -lincom- command to calculate these.

            I don't know what you mean by "the corresponding p-value?" Equations are fortunate in not being burdened with p-values.
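To make the bookkeeping behind those combinations concrete, here is the same arithmetic sketched in Python rather than Stata (the coefficient values are simply copied from the fixed-effects table in #3, so the numbers are only illustrative; in Stata, -lincom- does this, with standard errors included):

```python
# Fixed-effect estimates copied from the output in #3 (var2 == 1 is the reference)
b = {
    "_cons": 4.083168,
    "2.var2": -1.794695,
    "3.var2": -2.717368,
    "var3": 0.0481726,
    "2.var2#c.var3": 0.0010644,
    "3.var2#c.var3": -0.0122471,
}

# Intercepts for var2 = 1, 2, 3: _b[_cons] plus the var2 shift
intercepts = {
    1: b["_cons"],
    2: b["_cons"] + b["2.var2"],
    3: b["_cons"] + b["3.var2"],
}

# Slopes for var2 = 1, 2, 3: _b[var3] plus the interaction term
slopes = {
    1: b["var3"],
    2: b["var3"] + b["2.var2#c.var3"],
    3: b["var3"] + b["3.var2#c.var3"],
}

for k in (1, 2, 3):
    print(f"var2 == {k}: var1 = {intercepts[k]:.4f} + {slopes[k]:.4f} * var3")
```

Note that this only reproduces the point estimates; -lincom- is still needed for the confidence intervals, since those require the full variance-covariance matrix of the estimates.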



            • #7
              To the best of my knowledge, the lincom command will give me the slopes within each category of var2. This is part of what I asked for at the beginning of this thread; however, how can I test whether these slopes, with random effects, are significantly different from each other?

              And how do I calculate the intercept for each of var2 = 1, 2, and 3?



              • #8
                how can I test whether these slopes with random effects are significantly different from each other?
                So, after running the random effects model you can run
                Code:
                test 1.var2#c.var3 = 2.var2#c.var3 = 3.var2#c.var3
                And how do I calculate the intercept of each var2=1, 2 and 3?
                Answered in #6.



                • #9
                  Thank you for your time and great help.



                  • #10
                    I have just reviewed the formulas from #6 and can see that these intercepts and slopes only include the fixed part and no random effects at all. If I use the suggested

                    mixed var1 i.var2##c.var3 || var4: var3
                    do I simply have to add sd(var3) and sd(Residual) to each of _b[var3], _b[var3] + _b[2.var2#c.var3], and _b[var3] + _b[3.var2#c.var3], and add sd(_cons) to each of _b[_cons], _b[_cons] + _b[2.var2], and _b[_cons] + _b[3.var2], to get the equations within each category of var2 with random effects? Or am I wrong?

                    Code:
                    . xtmixed var1 i.var2##c.var3 || var4: var3
                    
                    Performing EM optimization: 
                    
                    Performing gradient-based optimization: 
                    
                    Iteration 0:   log likelihood = -185.49682  
                    Iteration 1:   log likelihood = -185.26954  
                    Iteration 2:   log likelihood = -185.26789  
                    Iteration 3:   log likelihood = -185.26789  
                    
                    Computing standard errors:
                    
                    Mixed-effects ML regression                     Number of obs     =        109
                    Group variable: pignumber                       Number of groups  =         10
                    
                                                                    Obs per group:
                                                                                  min =          5
                                                                                  avg =       10.9
                                                                                  max =         19
                    
                                                                    Wald chi2(5)      =      99.12
                    Log likelihood = -185.26789                     Prob > chi2       =     0.0000
                    
                    -----------------------------------------------------------------------------------------
                                         var1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    ------------------------+----------------------------------------------------------------
                                          var3 |   .0498006   .0240906     2.07   0.039      .002584    .0970172
                                                 |
                                         var2 |
                                             2  |  -1.743623   .7677119    -2.27   0.023     -3.24831   -.2389351
                                             3  |  -2.657442    .817759    -3.25   0.001     -4.26022   -1.054663
                                                 |
                            i.var2#c.var3 |
                                             2  |  -.0014285   .0349997    -0.04   0.967    -.0700267    .0671696
                                             3  |  -.0116853   .0350165    -0.33   0.739    -.0803165    .0569458
                                                |
                                      _cons |   4.057095   .5313023     7.64   0.000     3.015762    5.098428
                    -----------------------------------------------------------------------------------------
                    
                    ------------------------------------------------------------------------------
                      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                    -----------------------------+------------------------------------------------
                    id: Independent       |
                                         sd(var3) |   .0164756   .0092438      .0054861    .0494785
                                       sd(_cons) |   4.41e-09   4.57e-08      6.59e-18     2.95261
                    -----------------------------+------------------------------------------------
                                    sd(Residual) |   1.286054   .1068056      1.092868     1.51339
                    ------------------------------------------------------------------------------
                    LR test vs. linear model: chi2(2) = 1.95                  Prob > chi2 = 0.3781
                    
                    Note: LR test is conservative and provided only for reference.



                    • #11
                      No. When you use random slopes, you are fitting a model in which there are, potentially, an infinite number of such equations. Each individual (or whatever the group of entities identified by var4 consists of) has its own equation. For each value of var2 (due to the var2#var3 interaction) there is a separate distribution of equations. The mean slopes and intercepts, conditional on var2, are given by the formulas in #6. Each individual regression line, conditional on var2, is drawn from a bivariate normal distribution of slopes and intercepts centered at those means, with standard deviations given by sd(_cons) and sd(var3), respectively. The actual individual random effects (for both intercept and slope) can be estimated using -predict-.

                      So, if you want the intercept and slope of each individual, you can use this code:
                      Code:
                      predict re_slope re_int, reffects
                      Then you can calculate the mean slope and intercept conditional on var2 as shown in #6, and add re_int to the mean intercept and re_slope to the mean slope. That will give you each individual's estimated intercept and slope. I don't know what you will use that for; I can't say I've ever seen anybody do that. But it can be done.
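In other words, conditional on a value of var2, each individual's line is the fixed-part line plus that individual's two random effects. A sketch in Python rather than Stata (the fixed-part numbers are copied from the output in #10; the two BLUP values are invented purely for illustration, since in Stata they would come from the -predict ..., reffects- call above):

```python
# Fixed-part mean intercept and slope for var2 == 1 (the reference level),
# copied from the fixed-effects table in #10
mean_intercept = 4.057095   # _b[_cons]
mean_slope = 0.0498006      # _b[var3]

# Hypothetical BLUPs for one individual -- invented values for illustration;
# in Stata these would be re_int and re_slope from -predict , reffects-
re_int = -0.002
re_slope = 0.011

# That individual's estimated regression line for var2 == 1
indiv_intercept = mean_intercept + re_int
indiv_slope = mean_slope + re_slope
print(f"var1 = {indiv_intercept:.4f} + {indiv_slope:.4f} * var3")
```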

                      But let's step back and look at your outputs. The estimated grand mean of your outcome variable var1 when var2 = 1 (the reference level) and var3 = 0 is close to 4. And the coefficients of 2.var2 and 3.var2 are appreciable relative to that: they move the intercept from around 4 to around 2.3 and 1.4. So that seems like a meaningful separation of the mean intercepts. Notice that sd(_cons) is a really tiny number, 4.41e-9. So even if some individual observation is a 10-sd outlier for its intercept, that will only change that individual intercept by around 4 in the 8th decimal place: not even close to a rounding error. So the random intercepts are not meaningfully different once we know var2; the random-intercept distribution is essentially a spike.

                      Now let's look at the slopes. When var2 = 1, the mean slope is about .0498. When var2 = 2 it's about .0484, and when var2 = 3 it's about .0381. Certainly the difference between the mean slopes for var2 = 1 and var2 = 2 is in rounding-error territory, and that's pretty close to true for the var2 = 3 mean slope as well. Unless the variable var3 takes on extremely large values, so that small differences in these slopes scale up to appreciable differences in var1, these interactions look like they don't amount to much at all. And what does the random variation in slopes add to this picture? Well, the standard deviation of the slopes within each var2-defined group is about 0.0165. Your whole sample size is only 109, so you probably have, on average, about 36 observations in each var2-defined group, and it is unlikely you have even one 3-sd deviant in each group. So take 2 sd, which is 0.033: in the var2 = 1 group, the range of slopes probably runs from about 0.0498 - 0.033 to about 0.0498 + 0.033. In relative terms, that random component is pretty large.
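That back-of-envelope calculation can be written out explicitly (a quick sketch in Python rather than Stata, with the two numbers copied from the output in #10):

```python
# Numbers copied from the output in #10
sd_slope = 0.0164756           # sd(var3): between-individual SD of the slope
mean_slope_var2_1 = 0.0498006  # _b[var3]: mean slope when var2 == 1

# A 2-sd band around the mean slope for the var2 == 1 group
two_sd = 2 * sd_slope
low = mean_slope_var2_1 - two_sd
high = mean_slope_var2_1 + two_sd
print(f"plausible slope range for var2 == 1: {low:.4f} to {high:.4f}")
# -> plausible slope range for var2 == 1: 0.0168 to 0.0828
```

The band runs from roughly a third of the mean slope to well above it, which is why the random component looks large in relative terms even though the var2 interaction terms do not.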

                      So it seems to me that you could streamline your model by eliminating the i.var2#c.var3 interaction: its effects on the outcome are probably not detectable at all. Now, the estimates of the random intercept and slope standard deviations may change when you take out i.var2#c.var3. So I would re-run the model as:

                      Code:
                      mixed var1 i.var2 var3 || var4: var3
                      and then re-evaluate. I would not invest much effort into detailed work on the model shown in #10 as you will be focusing on the larvae of the ants crawling on the bark of one tree and missing the forest if you do.

                      Added: One additional remark. You only have 10 groups. That is a rather small sample of the group space. You cannot consider the random effects parameters to be well estimated at all. That is probably why that last line about LR test vs linear model gives such an anemic result. Even though the estimated variation of the slopes is large enough that it may well be of practical importance, estimating it with N = 10 gives you such imprecise estimates that Stata thinks you would be just as well off ignoring the variation altogether. I think it's a huge stretch to be doing this model with just N = 10 groups. If you think the distinctions among the 10 groups identified by var4 are important to your research goals, then I would be more inclined to go to a fixed effects model:

                      Code:
                      regress var1 i.var2 var3 i.var4##c.var3
                      Such a model will give you less biased estimates of the actual var1 = a + b*var3 equations for each of the 10 var4-groups than the -mixed- model is giving you. What you lose is that your results will not be generalizable to different var4-defined groups of individuals. But generalizing from a sample of 10 is always risky business.
                      Last edited by Clyde Schechter; 14 Jul 2017, 19:51.



                      • #12
                        Can I use the code below even though I have repeated measures of var1, at least three times for each category of i.var2? Will changing the regress command to mixed take care of the repeated-measures issue?

                        regress var1 i.var2 var3 i.var4##c.var3



                        • #13
                          Yes, because the -mixed- model leads to the conclusion that there is essentially no variation among the random intercepts. So in this situation, it is fine to go to a one-level model.



                          • #14
                            Just to be sure, the correct code is

                            mixed var1 i.var2 var3 i.var4##c.var3

                            Thank you for your great help!



                            • #15
                              You could do that. Simpler, and equivalent, would be:

                              Code:
                              regress var1 i.var2 i.var4##c.var3
                              So two changes from what you wrote: -mixed- with only a single level specified is equivalent to -regress-. -regress- runs faster (though in a data set this size you won't perceive the difference), and is not subject to convergence issues or other numerical problems that -mixed- can encounter. Also, when you specify an interaction using ##, you do not need to also separately specify the constituent variables: Stata expands it for you.

