
  • Bivariate multilevel model

    Hello everyone, I am trying to implement a multilevel bivariate model to analyze the determinants of reading and mathematics academic performance. My two dependent variables are the scores on the reading test and the mathematics test. I am entering the following command.

    xtmixed Nrdtlectf Nrdmathf sexedeelv agedeélève diflcev1ir avoirfaim rvauxdome11ique ses Nbelvdanslécole nbensgt localisation genremaitre typeécole coursdesoutien || id_ecole : if ecaxte >1, covariance(unstructured)
    mfx compute
    est store eq1
    outreg2 [eq1] using Tableau1, mfx ctitle(mfx) replace see word excel

    The variables that have a significant positive effect on reading have the opposite effect on mathematics. These results are quite puzzling. Could you help me understand whether I am entering the correct command?

    [Attached screenshot: Capture1.PNG]

  • #2
    In its default form, mixed fits only univariate models. Thus the second variable in your mixed command (Nrdmathf) shows up as a predictor of the first variable, the outcome (Nrdtlectf). You must use gsem instead. See this thread.
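A minimal sketch of the gsem route, on simulated data. The names school, read, math, and x are hypothetical stand-ins for id_ecole, Nrdtlectf, Nrdmathf, and one covariate, and the data-generating values are arbitrary. Each outcome gets its own equation with its own school-level random intercept (latent variables R and M), and covariance() lets those intercepts, and the student-level errors of the two scores, be correlated. This assumes Stata 13 or later, where gsem is available.

```stata
* Sketch only: simulated data with hypothetical variable names
clear
set seed 12345
set obs 50                              // 50 schools
gen school = _n
gen u_read = rnormal(0, 2)              // school effect on reading
gen u_math = rnormal(0, 2)              // school effect on math
expand 20                               // 20 students per school
gen x = rnormal()                       // one illustrative student covariate
gen read = 50 + 2*x + u_read + rnormal(0, 5)
gen math = 48 - 1*x + u_math + rnormal(0, 5)

* One equation per outcome, each with its own school random intercept
* (R and M); covariance() allows the school intercepts and the
* student-level errors of the two scores to be correlated
gsem (read <- x R[school]) (math <- x M[school]), ///
    covariance(R[school]*M[school] e.read*e.math)
```

Because each outcome has its own equation here, the two right-hand sides do not have to list the same predictors.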



    • #3
      You are using quite ancient syntax, so unless you have a very old copy of Stata, you should use -mixed- and -margins- instead of -xtmixed- and -mfx-.

      I disagree with Erik, though I broadly endorse his suggestion to use -gsem- for the type of modeling discussed here, because it is a great deal more flexible than what can be done in -mixed-. That said, some models can be fitted equivalently in either framework, and I recommend the book "Growth Modeling: Structural Equation and Multilevel Modeling Approaches" by Grimm, Ram and Estabrook as an excellent overview of growth modeling in both frameworks.

      If you use -mixed-, then you assume that each outcome has the same set of level-1 and level-2 predictors; SEM does not require this. This setup is usually used with time as the repeated observations within individuals, to model growth, but it also works for distinct outcomes measured at a single time, to model the inter-relationships between scores. To perform multivariate modeling within the -mixed- framework, you will need to reshape your data into a suitable long format, which is to have one observation per unit (e.g., student) per outcome (e.g., math and reading), and then create a new variable (e.g., type, numbered from 1 to K outcomes) to differentiate each of the outcomes.
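A minimal sketch of that reshape, with three made-up students. The names pid, score1, score2, and type are hypothetical; score1 stands for reading and score2 for math. Note that type takes only K = 2 values (one per outcome), not one value per student.

```stata
* Hypothetical wide data: one row per student, one column per outcome
clear
input pid score1 score2
1 52 47
2 61 58
3 45 50
end

* Long format: one row per student per outcome; type = 1 (reading), 2 (math)
reshape long score, i(pid) j(type)
label define type_lbl 1 "reading" 2 "math"
label values type type_lbl
list, sepby(pid) noobs
```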

      Once you have the data in a suitable format, here is one suggested syntax for -mixed-.

      Code:
      * assumes a 2-level hierarchy with person as the higher level and outcomes clustered within. You may consider interactions of i.type with your covariates.
      mixed outcome_score i.type <other covariates> || person_id : , nocons reml dfmethod(kr) cov(unstructured, t(type))
      I think this will extend to a 3-level version, but I don't know how stable that estimation would be.

      Code:
      * models schools at level 3, then persons at level 2, then outcomes at level 1
      mixed outcome_score i.type <other covariates> || school_id : || person_id : , nocons reml dfmethod(kr) cov(unstructured, t(type))



      • #4

        I followed your recommendations and defined a variable "type" ranging from 1 to 61916, which corresponds here to each student's results. However, when I enter the command, Stata says "invalid name," and I don't understand where the problem lies.




        [Attached screenshot: Capture 2.PNG]



        • #5
          Edit: I erred in my previous post. Change -cov(….)- to -resid(….)- keeping the same contents inside the parentheses.
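To illustrate the correction, here is the two-level syntax from #3 with the residual option in place of -cov()-, run on simulated data. The names pid, score1, score2, score, and type, and the simulated means, SDs, and correlation, are arbitrary assumptions. The t() suboption belongs with residuals() (the thread's -resid()-) because it indexes the repeated outcomes within pid; covariance() describes the random effects, which this model suppresses with nocons.

```stata
* Sketch only: simulated data, hypothetical names
clear
set seed 2024
set obs 300                                   // 300 students
matrix M  = (50, 48)                          // reading, math means
matrix C  = (1, .3 \ .3, 1)                   // correlation between the scores
matrix SD = (10, 8)                           // reading, math SDs
drawnorm score1 score2, means(M) corr(C) sds(SD)
gen long pid = _n

* One row per student per outcome; type = 1 or 2
reshape long score, i(pid) j(type)

* Corrected syntax: the residual option, not cov(), carries t(type)
mixed score i.type || pid:, nocons reml dfmethod(kr) residuals(unstructured, t(type))
```

The three-level command in #3 takes the same change.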

          Note that it's generally more useful to copy and paste Stata output directly, using code tags, rather than posting screenshots. This is explained in the FAQ.



          • #6
            I enter the following code, but it returns an error saying "too many variables specified."

            Code:
             mixed Nrdtlectf i.type diflcev1ir avoirfaim rvauxdome11ique langue1arlé || id_ecole : , nocons reml dfmethod(kr) resid(unstructured, t(type))



            • #7
              Does it run with fewer covariates? I don’t think I’ve encountered this error before and it’s difficult to troubleshoot without a minimal data example.



              • #8
                I think the problem is that you want the residual specification to be as follows:
                Code:
                residuals(independent, by(type))



                • #9
                  Originally posted by Erik Ruzek
                  I think the problem is that you want the residual specification to be as follows:
                  Code:
                  residuals(independent, by(type))
                  This is acceptable if you insist there should be no covariance between score types. Whether that makes sense in this context is not something I can say, but it generally seems dubious when we are talking about student performance.
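Whether the covariance between score types matters can be checked in the data rather than assumed. A minimal sketch, on simulated data with hypothetical names and an arbitrary true correlation of .3: fit the independent-residuals model from #8 and an unstructured model, then compare them with a likelihood-ratio test. Both are fit by ML (the default) and differ only by the one residual covariance, so the comparison is a nested one.

```stata
* Sketch only: simulated data, hypothetical names
clear
set seed 99
set obs 500
matrix C  = (1, .3 \ .3, 1)                   // true correlation between scores
matrix SD = (10, 8)
drawnorm score1 score2, corr(C) sds(SD)
gen long pid = _n
reshape long score, i(pid) j(type)

* (a) test-specific residual variances, no covariance between tests
mixed score i.type || pid:, nocons residuals(independent, by(type))
estimates store indep

* (b) test-specific variances plus a covariance between tests
mixed score i.type || pid:, nocons residuals(unstructured, t(type))
estimates store unstr

* (a) is nested in (b): they differ by one parameter, the residual covariance
lrtest unstr indep
```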



                  • #10
                    Originally posted by Romuald Landry
                    I enter the following code, but it returns an error saying "too many variables specified."

                    Code:
                     mixed Nrdtlectf i.type diflcev1ir avoirfaim rvauxdome11ique langue1arlé || id_ecole : , nocons reml dfmethod(kr) resid(unstructured, t(type))
                    I think the issue here is that you are adding a random intercept for school but do not have one for student. You must have student at level 2 for the model to be sensible.

                    Try a simpler model first. Exclude all covariates and consider only student as a hierarchical level. You need to fill in your own variable name for -student_id-; this is just a placeholder.

                    Code:
                    mixed Nrdtlectf i.type || student_id : , resid(un, t(type)) reml dfmethod(kr)
                    If this works, you can expand the model from here. First I would add back covariates. Then if that model looks sensible, you can add in clustering at the school level (at level 3).

                    Here is a simple sketch of how to do it with 2 levels.

                    Code:
                    clear *
                    cls
                    
                    mkf Data
                    cwf Data
                    set obs 50
                    mat M = (3, 5)
                    mat Corr = (1,.3\.3,1)
                    mat SD = (2,4)
                    drawnorm y0 y1 , mean(M) corr(Corr) sd(SD) double
                    
                    gen `c(obs_t)' pid = _n
                    sort pid
                    
                    reshape long y , i(pid) j(type)
                    mixed y i.type || pid : , resid(un, t(type)) reml dfmethod(kr)



                    • #11
                      Leonardo Guizzetti has it in #10. I still believe that the most flexible approach is to use gsem, as we both stated earlier.

                      Update: You can see how you would specify the model in the SEM framework (whether gsem or sem) by slightly altering Leonardo's simulation. Namely, I increased the number of observations to 500 (SEM doesn't have small-sample corrections in the same way as mixed).
                      Code:
                      clear *
                      cls
                      
                      mkf Data
                      cwf Data
                      set obs 500
                      mat M = (3, 5)
                      mat Corr = (1,.3\.3,1)
                      mat SD = (2,4)
                      drawnorm y0 y1 , mean(M) corr(Corr) sd(SD) double
                      
                      gen `c(obs_t)' pid = _n
                      sort pid
                      
                      *Using sem
                      sem (RI -> y1@1 y0@1) (y1 <- ) (y0 <- ) , latent(RI) // RI = random intercept
                      
                      *Using mixed
                      reshape long y , i(pid) j(type)
                      mixed y i.type || pid : , resid(exchangeable, t(type)) // reml dfmethod(kr)
                      With resid(exchangeable), mixed estimates a single residual variance, whereas sem gives you test-specific variances. Mixed also gives you a covariance between the residuals, which you can also get from sem, but note that here it is very small and imprecise (huge standard error). Accordingly, one probably wouldn't keep it in the model.
                      Last edited by Erik Ruzek; 30 Jan 2024, 13:29.



                      • #12
                        I entered the following codes, but I'm getting the same error saying "too many variables specified."

                        Code:
                          mixed Nrdtlectf i.type diflcev1ir avoirfaim rvauxdome11ique langue1arlé || id_ecole : , nocons reml dfmethod(kr) resid( id_eleve , t(type))
                        Code:
                         mixed Nrdtlectf i.type || id_eleve : , resid(un, t(type)) reml dfmethod(kr)
                        I am new to bivariate multilevel modeling and I have data for one year (2019) on the reading and math scores of primary school students. I would like to implement a bivariate two-level multilevel model, with the student and school levels. The variable "id_eleve" represents the student identifier within the school, while "Nrdtlectf" and "Nrdtmathf" represent the reading and math test scores, respectively. I hope I have provided all the necessary information to receive assistance.



                        • #13
                          The third post in this thread, by Leonardo Guizzetti, explains the steps needed to get your data into the shape required to run the mixed model. The most important part is the following:
                          To perform multivariate modeling within the -mixed- framework, you will need to reshape your data into a suitable long format, which is to have one observation per unit (e.g., student) per outcome (e.g., math and reading), and then create a new variable (e.g., type, numbered from 1 to K outcomes) to differentiate each of the outcomes.
                          We assume your data are currently set up so that each student has one row, with separate columns for the math and reading test scores. The code below, adapted from Leonardo's code in #10, first creates wide data and then reshapes it:
                          Code:
                          clear *
                          cls
                          
                          mkf Data
                          cwf Data
                          
                          ** Create dataset for illustration purposes
                          set obs 10
                          gen sid = _n                         // schools
                          gen u_school = rnormal()    // school random effect
                          expand 5                             // number of students per school
                          mat M = (3, 5)                     // math and reading score means
                          mat Corr = (1,.3\.3,1)          // math and reading score correlation
                          mat SD = (2,4)                    // math and reading score standard deviations
                          drawnorm score0 score1, mean(M) corr(Corr) sd(SD) double
                          label variable score0 "math score"
                          label variable score1 "reading score"
                          foreach v of varlist score0 score1 {
                              replace `v' = `v' + u_school        // add in school random effect
                          }
                          gen `c(obs_t)' pid = _n
                          sort sid pid
                          
                          ** Your math and reading score variables need to be named something
                          **  like score0 and score1 for the reshape to work! Rename if necessary
                          
                          ** Reshape wide data to long so you have a single score variable and
                          **  a 0/1 indicator for subject (each student should have two rows)
                          reshape long score, i(pid) j(type)
                          label define test_type 0 "math" 1 "reading"
                          label values type test_type 
                          
                          ** Run the mixed model
                          mixed score i.type || sid: || pid : , resid(un, t(type)) reml dfmethod(kr)



                          • #14
                            I entered the following command sequence, but I can't get any results and I don't know why.
                            Code:
                             set obs 62934
                            gen sid = _n                        
                            gen u_school = rnormal()
                            matrix define M = J(2, 1, .)
                            matrix M[1, 1] = 36
                            matrix M[2, 1] = 35
                            matrix define SD = J(2, 1, .)
                            matrix M[2, 1] = 35
                            matrix SD[1, 1] = 30
                            matrix SD[2, 1] = 28
                            drawnorm score0 score1, mean(M) corr(Corr) sd(SD)
                            label variable Nrdmathf "math score"
                            label variable Nrdtlectf "reading score"
                            foreach v of varlist score0 score1 {
                                replace `v' = `v' + u_school        // add in school random effect
                            }
                            gen `c(obs_t)' pid = _n
                            sort sid pid
                            eshape long score, i(pid) j(type)
                            label define test_type 0 "math score" 1 " reading score"
                            label values type test_type
                            mixed score i.type || sid: || pid : , resid(un, t(type)) reml dfmethod(kr)
                            mixed score i.type langue sexedeelv diflcev1ir avoirfaim || sid: || pid : , resid(un, t(type)) reml dfmethod(kr)
                            > ml dfmethod(kr) resid(unstructured, t(type))
                            [Attached screenshot: Capture 2.PNG]



                            • #15
                              Originally posted by Romuald Landry
                              . . . I am trying to implement a multilevel bivariate model to analyze the determinants of reading and mathematics academic performance. My two dependent variables are the scores on the reading test and the mathematics test. I am entering the following command.

                              xtmixed Nrdtlectf Nrdmathf sexedeelv agedeélève diflcev1ir avoirfaim rvauxdome11ique ses Nbelvdanslécole nbensgt localisation genremaitre typeécole coursdesoutien || id_ecole : if ecaxte >1, covariance(unstructured)
                              Originally posted by Romuald Landry
                              . . . I have data for one year (2019) on the reading and math scores of primary school students. . . . The variable "id_eleve" represents the student identifier within the school, while "Nrdtlectf" and "Nrdtmathf" represent the reading and math test scores, respectively.
                              Try this:
                              Code:
                              rename (Nrdtlectf Nrdmathf) sco#, addnumber(0)
                              reshape long sco, i(id_eleve) j(subj)
                              
                              mixed sco i.subj##i.(sexedeelv agedeélève diflcev1ir avoirfaim rvauxdome11ique ses) ///
                                  i.subj##c.(Nbelvdanslécole nbensgt) ///
                                  i.subj##i.(localisation genremaitre typeécole coursdesoutien) ///
                                  if ecaxte > 1 ///
                                  || id_ecole: || id_eleve: , noconstant residuals(unstructured, t(subj))
                              I'm guessing as to which of your predictors are categorical and which are continuous, and you can make the necessary changes to the factor variable notation.

                              Given the number of students that you've got, you'll undoubtedly have enough schools that you won't need any small-sample adjustment (i.e., reml dfmethod(kroger)) to your degrees of freedom.
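Following the advice in #3 to use -margins- in place of -mfx-, here is a minimal sketch of how, after an interaction model like the one above, margins reports the effect of a predictor separately for each subject, which is what the original question about opposite-signed effects in reading and math turns on. The data are simulated with one hypothetical continuous predictor x whose true slope is +2 for subj 0 and -1 for subj 1; pid, sco0, sco1, and subj mirror the naming above but are otherwise assumptions, and the school level is omitted for brevity.

```stata
* Sketch only: simulated data, hypothetical names, no school level
clear
set seed 7
set obs 500
gen long pid = _n
gen x = rnormal()                             // hypothetical student covariate
matrix C  = (1, .3 \ .3, 1)
matrix SD = (2, 4)
drawnorm e0 e1, corr(C) sds(SD)               // correlated student-level errors
gen sco0 = 3 + 2*x + e0                       // true slope of x: +2 for subj 0
gen sco1 = 5 - 1*x + e1                       // true slope of x: -1 for subj 1
drop e0 e1
reshape long sco, i(pid) j(subj)

* i.subj##c.x gives each subject its own intercept and slope for x
mixed sco i.subj##c.x || pid:, noconstant residuals(unstructured, t(subj))

* Subject-specific marginal effect of x (the -margins- replacement for -mfx-)
margins subj, dydx(x)
```

A predictor with effects of opposite sign in the two subjects is not by itself a sign of a wrong command; with the interaction terms the model lets each subject have its own slope.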

