  • Mixed for Group-randomized Trial Analysis

    Apologies for the long email. Sorry to say so, but I am trying to replicate some SAS code in Stata (trying to teach some students that Stata can do everything SAS can with respect to our desired models).

    Our data were generated from a group-randomized trial where students are nested within schools, and schools are nested within treatment condition such that a whole school is randomized to either treatment or control condition. There are two time points, pre and post treatment. The outcome variable is math score.

    One may view the data in two ways. The first is what we call a member cross-section. Here we do NOT link a particular student at baseline (time=0) to their assessment at follow-up (time=1). Instead we assume the schools are the same but the students within a school change, either due to sampling or perhaps graduation (e.g., senior class in 2011 and 2012).

    Apart from some formatting and other details (e.g., ML vs. REML), the well-understood SAS code is

    * member XS analysis;
    proc mixed;
    class cond school time;
    model math = cond time cond*time /s;
    random int time/subject=school(cond);
    run;


    Basically, this regresses the outcome variable, math, on the condition and time effects and their interaction. We have random effects for schools nested within condition and for time within those schools. We treat cond, school, and time as factor variables.

    In Stata (v13), the following yields identical results except for the degrees of freedom of the parameter estimates. For what it’s worth, Stata relies on the z distribution, but we rarely have enough groups/clusters in public health interventions to justify that. Accordingly, one must use a post-estimation command such as lincom or margins to specify the degrees of freedom for effect estimates manually, for example df(17). Regardless, the parameter estimates are correct if we type

    * member XS analysis
    mixed math cond##time || school: ||time:, reml
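
    A minimal sketch of the degrees-of-freedom adjustment described above. It assumes cond and time are both coded 0/1, so that the interaction coefficient is named 1.cond#1.time (an assumption; the names can be checked by replaying the model with the coeflegend option), and it reuses df(17) purely as the illustrative value mentioned in the text:

    ```stata
    * member XS analysis, fit by REML
    mixed math cond##time || school: || time:, reml
    * Treatment-by-time effect, referred to a t distribution with 17 df
    * instead of the default z distribution
    lincom 1.cond#1.time, df(17)
    ```

    The point estimate and standard error are unchanged; only the reference distribution for the p-value and confidence interval differs.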



    It’s the second approach to the data that is vexing me in Stata. Here we DO want to link a particular person from baseline to follow-up. We call this a member cohort analysis. Again, the relevant SAS code is


    * Member cohort approach #1;
    proc mixed;
    class cond school time;
    format time timef. cond condf.;
    model math = cond time cond*time /s;
    random int time/subject=school(cond);
    random int/subject=id(school*cond) ;
    run;

    We could also run the model this way, which will yield identical results

    * Member cohort approach #2;
    proc mixed;
    class cond school time;
    format time timef. cond condf.;
    model math = cond time cond*time /s;
    random int time/subject=school(cond);
    repeated time/subject=id(school*cond) type=cs;
    run;


    Notice the only change is the extra line at bottom which tells SAS that subjects, denoted by id, have repeated observations, and are nested within schools which are nested in condition.
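
    The agreement between approaches #1 and #2 can be sketched as follows. The symbols are introduced here for illustration and do not appear in the SAS code: a_i is the student random intercept of approach #1 with residual e_it, while r_it is the residual of approach #2 with compound-symmetry covariance parameter sigma_1. Conditional on the school-level effects, for student i at times t = 0, 1:

    ```latex
    \begin{aligned}
    \text{Approach \#1:}\quad & \operatorname{Var}(a_i + e_{it}) = \sigma_a^2 + \sigma_e^2, \qquad \operatorname{Cov}(a_i + e_{i0},\, a_i + e_{i1}) = \sigma_a^2 \\
    \text{Approach \#2:}\quad & \operatorname{Var}(r_{it}) = \sigma_1 + \sigma^2, \qquad \operatorname{Cov}(r_{i0},\, r_{i1}) = \sigma_1
    \end{aligned}
    ```

    The two structures coincide, with sigma_1 = sigma_a^2 and sigma^2 = sigma_e^2, whenever the estimated within-student covariance is non-negative.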

    I’ve searched far and wide and done a lot of trial and error, but I cannot figure out how to get these results from Stata.

    How can I get mixed (or xtmixed) to recognize repeated observations on particular subjects over time and recognize that such subjects are nested within schools and condition?


    Thanks in advance - Michael (UMN Epidemiology)

  • #2
    Well assuming you have a variable identifying students, call it student, wouldn't it just be:

    Code:
    mixed math cond##time || school: || id:
    Am I missing something here?



    • #3
      I meant:

      Code:
      mixed math cond##time || school: || student:



      • #4
        Thanks much, Clyde. Your proposed code is what I thought too. It is close but not quite right. For the exact same data, the SAS member cohort model yields

        Coef = 9.88 SE = 6.76 <- SAS

        My/your Stata model yields

        Coef = 9.25 SE = 4.80 <- Stata

        I don't think it's rounding or algorithm error. I can get the member cross-section results to match to 3-4 decimal places.

        Thoughts?



        • #5
          Your PROC MIXED model has a random slope for time:

          random int time/subject=school(cond);

          Would the following Stata code get you closer to your SAS coefficient?
          Code:
          mixed math cond##time || school: time || id: , reml
          Last edited by Joseph Coveney; 21 Apr 2014, 19:14.



          • #6
            Time is a class (i.e., factor) variable in his SAS code, so including time as Joseph did won't replicate the analysis (I think). I believe it will work if you create a couple of dummy variables, time1 and time2, and then do something like this (I don't have any toy data to test this on, so this is a bit of a guess):

            Code:
            mixed math cond##time || school: || school: time1 time2, nocons cov(id) || id:, reml
            I'm pretty sure that the second "school command" above is identical to "|| time: " in the original post.

            You can also use the residual option to mimic the repeated command. If the original poster can provide an example dataset, I can show how to do that.

            Best,
            Scott



            • #7
              We have a winner and his name is Scott Baldwin!

              As per directions, I created a dummy for time1 and time2 and fit Scott's model. The results from Stata are

              coeff = 9.88 SE = 6.76

              which is the same as SAS to the third decimal. Further, all of the variance components match to at least 3 decimals.

              Scott, could you explain to us what the model means? I'm a bit embarrassed to say that I've played with Stata (xt)mixed for years and never would have come up with your code. I tried the residual option but no go for me. Anyway, perhaps parse out the above code and the residual alternative for the group? I'm sure many would be grateful.

              FWIW, here is the data I've been playing with. Please note it's derived from a real trial but tweaked and modified for teaching purposes. Just look on the left side of the webpage, toward bottom of Data section. I labelled it Stata List data.

              https://sites.google.com/a/umn.edu/o...home/pubh-6363

              I'll leave the data up for a while in case folks want to experiment.

              Thank you, everyone, especially Scott!



              • #8
                Thanks for posting your data. As you have your SAS code written your model is something like:

                y_ijk = fixed effects + school_k + time_j + e_ijk

                You have a random deflection for school and another for time. (You aren't getting
                random growth curves, as that would require the school-by-time interaction in the random
                effects.) I think the way you wrote your first SAS code can make this confusing.
                Try rewriting it like:

                Code:
                proc mixed;
                class cond school time;
                model math = cond time cond*time /s;
                random school time;
                run;
                I believe you should get the same answer as your other code.

                In any case, since you treat time like a factor variable, we need to represent it like a
                factor variable in Stata.

                There are a couple of ways to do that and they are all different ways of setting up the design
                matrix for random effects (usually called the Z matrix).

                First, you have what I sent, which manually creates the levels of the factor variable and then
                forces the random effect variances to be equal using the "cov(id)" option:

                Code:
                tabulate time, gen(time)
                mixed math cond##time || school:  || school: time1 time2, nocons cov(id) || id: , reml
                This way of writing the model is a bit confusing (much like the first SAS code you sent). Stata has a way
                of accommodating factor variables in the random effects (see the manual entry on crossed random effects).

                So you could rewrite the model like:

                Code:
                mixed math cond##time || school: ||  _all:R.time || id: ,  reml
                The syntax for time automatically creates what we created manually above. I prefer to write this model like:

                Code:
                mixed math cond##time || _all: R.school || _all: R.time || id: , reml
                As that just makes it clearer to me.

                There isn't anything magic about the time variable that makes "|| school: time1 time2, nocons cov(id) "
                formulation work. You can do it for the random intercept for school. Note the following two models are
                equivalent:

                Code:
                tabulate school, gen(school)
                
                mixed math cond##time || school: , reml
                mixed math cond##time || school: school1-school20, nocons cov(id) reml
                The first just uses a more efficient way of setting up the model.

                Lastly, you were wondering how to use the residuals option. The residuals option is like the
                REPEATED option in SAS. Let's take the model I sent previously. You could do:

                Code:
                mixed math cond##time || school:  || school: time1 time2, nocons cov(id) || id: , nocons residuals(exchangeable) reml
                Note that you have to specify id:, nocons so that residuals "knows" where to look for repeated
                measures. It is kind of like specifying the subject option in SAS. The exchangeable structure
                is the compound symmetry structure.

                Anyway, I hope that is useful. Some of this is hard to communicate without writing the equations and matrices.

                Best,
                Scott



                • #9
                  I'm curious: is it common for time to be included as one of the factor-variable random effects in a cross-classified random effects model? I recall reading somewhere that econometricians sometimes include time as such a random effect in panel models. I think that it was supposed to have something to do with generalizing beyond the time interval of the data set, but I'm not really sure. Is it common in other fields? If so, what's the objective?



                  • #10
                    I knew I had read it somewhere. "In [ordinal time as a random slope], we assume that the effect due to week is . . . pig specific . . .; in [time-as-factor-variable random effect in a cross-classified random effects model], we assume that the effect due to week . . . is systematic to that week and common to all pigs." (Stata Multilevel Mixed Effects Reference Manual, Release 13, p. 318)



                    • #11
                      To follow up on Scott's reply, note that you can use mixed's R. notation to specify random effects of a factor variable time at the school level. That is,

                      Code:
                      school: time1 time2, noconstant  covariance(identity)
                      is precisely

                      Code:
                      school: R.time
                      so that the final command is
                      Code:
                      mixed math cond##time || school:  || school: R.time || id: , reml
                      You can even combine the two school equations into one by using the trick with an exchangeable covariance matrix:
                      Code:
                      mixed math cond##time || school: R.time, covariance(exchangeable) || id: , reml
                      In the specification above, the variance component for time will be the difference between the estimated variance and covariance parameters of the exchangeable covariance structure, var(R.time)-cov(R.time). This specification should be slightly more efficient than repeating the random-effects levels but requires additional manipulations to extract the variance component of time.
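
                      A minimal sketch of those additional manipulations, assuming the combined model above is the one being fit. Calling estat recovariance without options displays the estimated random-effects covariance matrix at each level; the subtraction itself is left to be done by hand from the displayed school-level matrix:

                      ```stata
                      * Combined specification: one school equation, exchangeable covariance
                      mixed math cond##time || school: R.time, covariance(exchangeable) || id: , reml
                      * Display the estimated random-effects covariance matrices by level
                      estat recovariance
                      ```

                      In the school-level matrix, the common off-diagonal element plays the role of the school intercept variance from the two-equation model, and a diagonal element minus the off-diagonal element gives the variance component for time.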

                      You can also read about other efficient ways of using xtmixed (renamed to mixed in Stata 13) in the following article:

                      http://www.stata-journal.com/sjpdf.h...iclenum=st0095
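
                      The equivalence between the dummy-variable and R. specifications can also be checked on simulated data. The sketch below is illustrative only: the design (20 schools, 15 students per school), the variance values, and the helper names u_school, u_st, and u_id are all assumptions made for the example and are not taken from the trial data.

                      ```stata
                      clear
                      set seed 12345
                      set obs 20                               // 20 hypothetical schools
                      generate school = _n
                      generate cond = mod(school, 2)           // alternate schools between arms
                      generate u_school = rnormal(0, 2)        // school random intercept
                      expand 2                                 // two time points per school
                      bysort school: generate time = _n - 1
                      generate u_st = rnormal(0, 1)            // school-by-time deflection
                      expand 15                                // 15 students per school-time cell
                      bysort school time: generate student = _n
                      egen id = group(school student)          // same student at both time points
                      bysort id (time): generate u_id = rnormal(0, 3) if _n == 1
                      bysort id (time): replace u_id = u_id[1] // student random intercept
                      generate math = 50 + 2*cond + time + 3*cond*time ///
                          + u_school + u_st + u_id + rnormal(0, 4)

                      * Dummy-variable specification
                      tabulate time, generate(time)
                      mixed math cond##time || school: || school: time1 time2, nocons cov(id) || id: , reml
                      scalar ll_dummies = e(ll)

                      * R. notation specification
                      mixed math cond##time || school: || school: R.time || id: , reml
                      scalar ll_rnotation = e(ll)

                      * The two restricted log likelihoods should agree
                      display ll_dummies
                      display ll_rnotation
                      ```

                      If both models converge, the fixed effects and variance components should agree as well, since the two commands describe the same random-effects design matrix.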



                      • #12
                        Joseph asked:

                        "I'm curious: is it common for time to be included as one of the factor-variable random effects in a cross-classified random effects model?"

                        I use that formulation the most in psychometric analyses where I need variance components for various factors -- and the factors are nearly always fully crossed in those studies. Other than that, I don't use the formulation much except for in very specific designs (like the one in the original post).

                        Best,
                        Scott




                        • #13
                          Thank you Scott and Yulia; you've both been a big help. And thank you Michael for making the data set available to run Scott's and Yulia's example code on. It's been edifying.
