Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Multiple Imputation with mixed-effects model, marginal effects (mimrgns) & lincom

    I am using Stata 15.1.

    This post refers to mixed, margins, lincom, mi estimate and mimrgns (from SSC). I have achieved my aim using mixed, margins & lincom & have now decided to use multiple imputation with one of my variables, my post refers to attempts to do this using mi estimate: & mimrgns (SSC) & possibly lincom.


    I use a mixed-effect model with data in long-format. Data example using dataex follows at bottom of post.

    Specific pieces of code are pasted thoughout the post but the whole code is pasted at the bottom of the post.

    The aim is to model a laboratory test result (variable name: logRSLT) which decreases over time for two arms of a trial (XYstatus, stored as 0 or 1).
    logRSLT is collected on up to 3 times on each of two days (maximum total of 6 logRSLT per participant). Each RSLT has a corresponding time (salivatime_hr) as samples were taken at a variety of times, but around 08:00, 12:00 & 16:00.

    Therefore, each participant is represented by multiple observations, each observation represents a laboratory result taken at a specific time.

    Random effects are specific at the cluster (clusternum), participant (childid) and day (daytext) level.

    The mixed-effects model is:

    Code:
    mixed c.logRSLT i.XYstatus##c.salivatime_hr c.wakeuphr || clusternum: || childid: || daytext:, base
    After running this, I use the postestimation command margins to estimate RSLT at time 08:00,12:00 & 16:00 for each of trial arms X&Y and use coeflegend to identify appropriate tags for variables in the subsequent step.

    Code:
    margins, at(XYstatus=(0 1) salivatime_hr=(8 12 16)) vsquish post
    margins, coeflegend
     
     
    .         margins, at(XYstatus=(0 1) salivatime_hr=(8 12 16)) vsquish post
     
    Predictive margins                              Number of obs     =      3,962
     
    Expression   : Linear prediction, fixed portion, predict()
    1._at        : XYstatus        =           0
                   salivatime~r    =           8
    2._at        : XYstatus        =           0
                   salivatime~r    =          12
    3._at        : XYstatus        =           0
                   salivatime~r    =          16
    4._at        : XYstatus        =           1
                   salivatime~r    =           8
    5._at        : XYstatus        =           1
                   salivatime~r    =          12
    6._at        : XYstatus        =           1
                   salivatime~r    =          16
     
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             _at |
              1  |  -1.896906   .0276162   -68.69   0.000    -1.951033   -1.842779
              2  |  -1.972262   .0232784   -84.72   0.000    -2.017887   -1.926638
              3  |  -2.047619   .0278691   -73.47   0.000    -2.102242   -1.992997
              4  |    -1.9128   .0275298   -69.48   0.000    -1.966758   -1.858843
              5  |  -1.978063   .0231845   -85.32   0.000    -2.023504   -1.932623
              6  |  -2.043327   .0277203   -73.71   0.000    -2.097657   -1.988996
    ------------------------------------------------------------------------------
     
    .         margins, coeflegend
    > om.
     
    Predictive margins                              Number of obs     =      3,962
     
    Expression   : Linear prediction, fixed portion, predict()
     
    1._at        : XYstatus        =           0
                   salivatime~r    =           8
     
    2._at        : XYstatus        =           0
                   salivatime~r    =          12
     
    3._at        : XYstatus        =           0
                   salivatime~r    =          16
     
    4._at        : XYstatus        =           1
                   salivatime~r    =           8
     
    5._at        : XYstatus        =           1
                   salivatime~r    =          12
     
    6._at        : XYstatus        =           1
                   salivatime~r    =          16
     
    ------------------------------------------------------------------------------
                 |     Margin  Legend
    -------------+----------------------------------------------------------------
             _at |
              1  |  -1.896906  _b[1bn._at]
              2  |  -1.972262  _b[2._at]
              3  |  -2.047619  _b[3._at]
              4  |    -1.9128  _b[4._at]
              5  |  -1.978063  _b[5._at]
              6  |  -2.043327  _b[6._at]
    ------------------------------------------------------------------------------
    I then use lincom to calculate the ‘area under the curve’ using simple mathematics (trapezoid formula 0.5*base*(a+b)) for each of arms X & Y and give me the difference between these along with a 95%CI & p-value.

    Code:
    lincom ( 0.5*4*(_b[4._at]+_b[5._at]) + 0.5*4*(_b[5._at]+_b[6._at]) )    -    ( 0.5*4*(_b[1bn._at]+_b[2._at]) + 0.5*4*(_b[2._at]+_b[3._at]) )
     
    .         lincom ( 0.5*4*(_b[4._at]+_b[5._at]) + 0.5*4*(_b[5._at]+_b[6._at]) )    -    ( 0.5*4*(_b[1bn._at]+_b[2._at]) + 0.5*4*(_b[2._at]+_b[3._at]) )
     
     ( 1)  - 2*1bn._at - 4*2._at - 2*3._at + 2*4._at + 4*5._at + 2*6._at = 0
     
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |  -.0464077   .2628844    -0.18   0.860    -.5616517    .4688362
    ------------------------------------------------------------------------------
    This appears to work nicely.

    Now I decide I want to multiply impute one of my variables (wakeuphr).
    I first impute my data (mi set, mi register, mi impute)
    Then add mi estimate: to my mixed: command and change margins to mimrgns

    Code:
    mi estimate: mixed c.RSLT i.XYstatus##c.salivatime_hr c.wakeuphr || clusternum: || childid: || daytext:, base
    mimrgns, at(XYstatus=(0 1) salivatime_hr=(8 12 16)) vsquish post
    mimrgns, coeflegend
    Now I want to use lincom again and type (exactly as previously):

    Code:
    lincom ( 0.5*4*(_b[4._at]+_b[5._at]) + 0.5*4*(_b[5._at]+_b[6._at]) )    -    ( 0.5*4*(_b[1bn._at]+_b[2._at]) + 0.5*4*(_b[2._at]+_b[3._at]) )
     
    .         lincom ( 0.5*4*(_b[4._at]+_b[5._at]) + 0.5*4*(_b[5._at]+_b[6._at]) )    -    ( 0.5*4*(_b[1bn._at]+_b[2._at]) + 0.5*4*(_b[2._at]+_b[3._at]) )
     
     ( 1)  - 2*1bn._at - 4*2._at - 2*3._at + 2*4._at + 4*5._at + 2*6._at = 0
     
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |  -.0144249   .0386819    -0.37   0.709      -.09024    .0613901
    ------------------------------------------------------------------------------
    There is no error message, however on typing ereturn list I see that my e(df_min_mi) is 295802954464.1576 with a similarly large e(df_max_mi) and this concerns me. I am also aware that there is concern that this approach is inappropriate in view of the imputed data (e.g https://www.statalist.org/forums/for...estimate-stcox) and according to the help file for mimrgns (SSC)


    There are two questions that flow from the way I have structured my question:
    1. Could you advise on the validity of using lincom in this circumstance following mimrgns?
    2. Is there a way to incorporate the desired command into mi estimate? I have tried variations of:
    Code:
    mi estimate (( 0.5*4*(_b[4._at]+_b[5._at]) + 0.5*4*(_b[5._at]+_b[6._at]) )    -    ( 0.5*4*(_b[1bn._at]+_b[2._at]) + 0.5*4*(_b[2._at]+_b[3._at]) )), coeflegend: mixed c.RSLT i.XYstatus##c.salivatime_hr c.wakeuphr || clusternum: || childid: || daytext:, base
    But Stata, reasonable, does not understand my tags. Error message is:
    incorrectly specified expression (( 0.5*4*(_b[4._at]+_b[5._at]) + 0.5*4*(_b[5._at]+_b[6._at]) ) - ( 0.5*4*(_b[1bn._at]+_b[2._at]) + 0.5*4*(_b[2._at]+_b[3._at]) )):
    [4._at] not found
    an error occurred when mi estimate executed nlcom on m=1
    Finally, question 3 is - perhaps you know a much better way of achieving my intended goal. Please do share if so.


    Thank you for your time.

    Sunil Bhopal



    DATA EXAMPLE:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float childid2 byte XYstatus float wakeuphr int clusternum double RSLT float salivatime_hr
    1 1 6.5 901  .14225       8.5
    1 1 6.5 901  .16973      12.5
    1 1   6 901  .16748         9
    1 1   6 901  .21777      11.5
    2 1   6 901  .16858  8.333333
    2 1   6 901  .13563 11.583333
    2 1   6 901  .12135 15.666667
    2 1 6.5 901 .083026      8.85
    2 1 6.5 901  .33422     11.25
    2 1 6.5 901  .16747     15.75
    end
    label values XYstatus arms3
    label def arms3 1 "arm y", modify
    WHOLE CODE WITHOUT IMPUTATION:
    Code:
    mi estimate: mixed c.RSLT i.XYstatus##c.salivatime_hr c.wakeuphr || clusternum: || childid: || daytext:, base
                    mimrgns, at(XYstatus=(0 1) salivatime_hr=(8 12 16)) vsquish post
                    mimrgns, coeflegend // I had to do this to get the name tags used by stata to store the resulting mean cortisol volumes (usually starting with _b[…]). Use these tags to run below lincom.
     
    lincom ( 0.5*4*(_b[4._at]+_b[5._at]) + 0.5*4*(_b[5._at]+_b[6._at]) )    -    ( 0.5*4*(_b[1bn._at]+_b[2._at]) + 0.5*4*(_b[2._at]+_b[3._at]) )
    WHOLE CODE WITH IMPUTATION:
    Code:
    mi set mlong
    mi register imputed wakeuphr
    set seed 14214 //generated using random in excel on 26jan18 at 10am
    mi impute chained (regress) wakeuphr=day clusternum RSLT salivatime_hr childid2, add(20)
    mi estimate: mixed c.RSLT i.XYstatus##c.salivatime_hr c.wakeuphr || clusternum: || childid: || daytext:, base
    mimrgns, at(XYstatus=(0 1) salivatime_hr=(8 12 16)) vsquish post
    mimrgns, coeflegend // I had to do this to get the name tags used by stata to store the resulting mean cortisol volumes (usually starting with _b[…]). Use these tags to run below lincom.
    lincom ( 0.5*4*(_b[4._at]+_b[5._at]) + 0.5*4*(_b[5._at]+_b[6._at]) )    -    ( 0.5*4*(_b[1bn._at]+_b[2._at]) + 0.5*4*(_b[2._at]+_b[3._at]) )

  • #2
    Just a quick comment, to begin with: you definitely do not need to call

    Code:
    mimrgns , coeflegend
    That code is not supposed to mess things up (although it wipes out most of the r() results left behind by mimrgns) but you can just omit it. The coefficients are already accessible in _b[] when mimrgns (with the post option) has finished.

    Unfortunately, I have no idea whether lincom would be appropriate with mimrgns; it does not seem to be with mi estimate and you need the post option (which is warned against in the manual) to get it to work. I also have no clue how lincom works, internally.

    If you believe that the results from lincom could be combined via Rubin's rules, you might be better off, writing your own wrapper program that calls mixed, margins and lincom.

    Sorry, I cannot be of greater help help here. Perhaps others have comments?

    Best
    Daniel

    Comment


    • #3
      Sunil, the code below broke because there is no coefficient called _at. You will need to express to the quantities you want to estimate in terms of the coefficients. After one mi estimate run, you can type

      Code:
      mi estimate, coeflegend
      to verify the coefficient names. But I think some variation of this will work:

      Code:
      mi estimate (( 0.5*4* (8*_b[salivatime_hr] + 12 * _b[salivatime_hr]) - 0.5*4* (_b[1.XYstatus] + 8*(_b[salivatime_hr] + _b[1o.XYstatus#co.salivatime_hr]) + 12 * (_b[salivatime_hr] + _b[1o.XYstatus#co.salivatime_hr]) )), coeflegend: mixed c.RSLT i.XYstatus##c.salivatime_hr c.wakeuphr || clusternum: || childid: || daytext:, base
      I know this is incredibly tedious, but unfortunately everything is harder with multiple imputation. If the betas and p-values don't change much relative to unimputed data, then perhaps there is no need to use mi estimate to compute the area under the curve?

      Side, note: reading your -mixed- syntax, I did see that you specified a regression where observations are nested within clusters, which are nested within children, which are nested within days. Is that correct? First, it would normally seem like you would have kids nested within clusters, assuming "cluster" refers to something like a clinic. Second, while I am not familiar with crossed designs, do you really mean for clusters to be nested in days? It strikes me like days would be more like a crossed random effect. Excuse me if your syntax is correct, since I don't know your data well enough to comment.
      Last edited by Weiwen Ng; 29 Jan 2018, 10:42.
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

      Comment


      • #4
        daniel klein, thanks very much for your rapid response. Will await any other input - but perhaps as you say, writing a wrapper program will be the way to go if I for some reason I can't get Weiwen Ng's suggestion to work.

        Weiwen Ng, thank you very much for pointing out the way of accessing coefficient names. Your suggested code may well work & I will spend some time now trying to get it right - and report back. A quick one on the hierarchy of the random effects - you're right that I want to have days nested within children within clusters (these are geographical communities). From help mixed:

        Code:
         If the data are organized by a series of nested groups, for example, classes within schools, then the random-effects structure is
            specified by a series of equations, each separated by ||.  The order of nesting proceeds from left to right.  For our example, this
            would mean that an equation for schools would be specified first, followed by an equation for classes.  As an example, consider
        
                . mixed f_p || school: z1, cov(un) || class: z1 z2 z3, nocons cov(ex) options
        
            where variables school and class identify the schools and classes within schools, respectively.
        Perhaps I've misunderstood - definitely don't want to get this wrong! Thanks again.

        Comment


        • #5
          Originally posted by Sunil Bhopal View Post
          daniel klein, thanks very much for your rapid response. Will await any other input - but perhaps as you say, writing a wrapper program will be the way to go if I for some reason I can't get Weiwen Ng's suggestion to work.

          Weiwen Ng, thank you very much for pointing out the way of accessing coefficient names. Your suggested code may well work & I will spend some time now trying to get it right - and report back. A quick one on the hierarchy of the random effects - you're right that I want to have days nested within children within clusters (these are geographical communities). From help mixed:

          Code:
          If the data are organized by a series of nested groups, for example, classes within schools, then the random-effects structure is
          specified by a series of equations, each separated by ||. The order of nesting proceeds from left to right. For our example, this
          would mean that an equation for schools would be specified first, followed by an equation for classes. As an example, consider
          
          . mixed f_p || school: z1, cov(un) || class: z1 z2 z3, nocons cov(ex) options
          
          where variables school and class identify the schools and classes within schools, respectively.
          Perhaps I've misunderstood - definitely don't want to get this wrong! Thanks again.
          I forgot about the direction of nesting, you are correct that cluster goes before kids.
          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

          Comment


          • #6
            Weiwen Ng - nearly there! You used the prefix 1o. and co. in the interaction term. This is new to me & I can't see why it's needed here. Can you expand?

            Code:
             
             _b[1o.XYstatus#co.salivatime_hr]

            Comment


            • #7
              Originally posted by Sunil Bhopal View Post
              Weiwen Ng - nearly there! You used the prefix 1o. and co. in the interaction term. This is new to me & I can't see why it's needed here. Can you expand?

              Code:
              _b[1o.XYstatus#co.salivatime_hr]
              That looks like a typo on my part! The o. operator tells Stata to omit a category, which you do not want to do. More information available in the help:

              Code:
              help fvvarlist
              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

              When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

              Comment


              • #8
                With thanks to all for help. The following is how I made an area under the curve for two groups & compared them in a multiply imputated dataset using multilevel mixed-effects linear regression . Unwieldly but it works!

                Code:
                mi estimate, coeflegend: mixed c.RSLT i.XYstatus##c.salivatime_hr c.wakeuphr || clusternum: || childid: || daytext:, base //just to see coefficient names
                      
                 mi estimate (ratio: /// //note: making AUC assuming wakeup time of 6am for all children
                        ( (0.5*4*((_b[_cons]+6*_b[wakeuphr]+_b[1.XYstatus]+ (8*(_b[salivatime_hr]+_b[1.XYstatus#c.salivatime_hr])) + ///
                        (_b[_cons]+ 6*_b[wakeuphr]+_b[1.XYstatus]+(12*(_b[salivatime_hr]+_b[1.XYstatus#c.salivatime_hr])))))) + /// //first group 1 trapezoid - from time 8am to 12noon
                        ( (0.5*4*((_b[_cons]+6*_b[wakeuphr]+_b[1.XYstatus#c.salivatime_hr]+_b[1.XYstatus]+12*_b[salivatime_hr]) + /// 
                        (_b[_cons]+6*_b[wakeuphr]+_b[1.XYstatus#c.salivatime_hr]+16*_b[salivatime_hr]))))) - /// //second group 1 trapezoid - from time 12noon to 4pm
                        ( (0.5*4*((_b[_cons]+6*_b[wakeuphr]+8*_b[salivatime_hr]) + (_b[_cons]+6*_b[wakeuphr]+12*_b[salivatime_hr]))) + /// //first group 0 trapezoid - from time 8am to 12noon
                        (0.5*4* ( (_b[_cons]+6*_b[wakeuphr]+12*_b[salivatime_hr]) + (_b[_cons]+6*_b[wakeuphr]+16*_b[salivatime_hr]))))): mixed c.RSLT i.XYstatus##c.salivatime_hr c.wakeuphr || clusternum: || /// //second group 0 trapezoid from time 12noon to 4pm & model command
                childid: || daytext:, base
                A more generic version which can be adapted to your needs (deleted the fixed effect of wakeup as this is quite particular to my study)


                Code:
                mi estimate, coeflegend: mixed c.RSLT i.XYstatus##c.salivatime_hr || clusternum: || childid: || daytext:, base //coeflegend allows you to see coefficient names
                
                mi estimate (ratio: /// //remember! AUC formula is 0.5*base(side1+side2). We have 
                        ( (0.5*BASE*((_b[_cons]+_b[1.GROUP1]+ (TIME POINT1*(_b[INTERACTION TERM(SLOPE)]+_b[1.GROUP1#FIXED EFFECT OF SLOPE])) + ///
                        (_b[_cons]+ _b[GROUP1]+(12*(_b[SLOPE]+_b[1.GROUP1#FIXED EFFECT OF SLOPE])))))) + ///
                        ( (0.5*4*((_b[_cons]+_b[1.GROUP1#FIXED EFFECT OF SLOPE]+_b[1.GROUP1]+12*_b[SLOPE]) + ///
                        (_b[_cons]+_b[1.GROUP1#FIXED EFFECT OF SLOPE]+16*_b[SLOPE]))))) - ///
                        ( (0.5*4*((_b[_cons]+8*_b[SLOPE]) + (_b[_cons]+12*_b[SLOPE]))) + ///
                        (0.5*4* ( (_b[_cons]+12*_b[SLOPE]) + (_b[_cons]+16*_b[SLOPE]))))): mixed c.RSLT i.GROUP##c.salivatime_hr || clusternum: || childid: || daytext:, base
                Final word for anyone trying to do this in future - this code is unwieldly, the process timeconsuming & it's easy to make errors!

                Comment

                Working...
                X