Bivariate multilevel model

Romuald Landry

Join Date: Nov 2020

Posts: 43
#1


29 Jan 2024, 14:56

Hello everyone, I am trying to implement a multilevel bivariate model to analyze the determinants of reading and mathematics academic performance. My two dependent variables are the scores on the reading test and the mathematics test. I am entering the following command.

xtmixed Nrdtlectf Nrdmathf sexedeelv agedeélève diflcev1ir avoirfaim rvauxdome11ique ses Nbelvdanslécole nbensgt localisation genremaitre typeécole coursdesoutien || id_ecole : if ecaxte >1, covariance(unstructured)
mfx compute
est store eq1
outreg2 [eq1] using Tableau1, mfx ctitle(mfx) replace see word excel

The variables that have a significant positive effect on reading have the opposite effect on mathematics. These results are quite puzzling. I would be grateful for help in determining whether I am entering the correct command.
Tags: #bivariatemultilevelmodel
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#2

29 Jan 2024, 15:52

By default, -mixed- can fit only a univariate model. The second variable in your command is therefore treated as a predictor of the first variable (the outcome). You must use -gsem- instead. See this thread.
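To make the -gsem- suggestion concrete, here is a minimal sketch of a bivariate two-level model in wide format (one row per student, one column per score), run on simulated data. The numbers of schools and students, the seed, the stand-in covariate ses, and the latent variable names Lect and Math are illustrative assumptions; -gsem- requires latent variable names to begin with a capital letter. The option cov(e.Nrdtlectf*e.Nrdmathf) assumes that your version of -gsem- accepts correlated errors between two Gaussian, identity-link outcomes; if it does not, drop that option and keep only the school-level covariance.

Code:

```stata
clear *
set seed 12345

* Simulated wide data (hypothetical): 40 schools x 25 students
set obs 40
gen id_ecole = _n
gen u_lect = rnormal(0, 2)                  // school effect on reading
gen u_math = 0.5*u_lect + rnormal(0, 2)     // correlated school effect on math
expand 25
gen ses = rnormal()                         // stand-in student covariate
mat M = (0, 0)
mat Corr = (1, .3 \ .3, 1)
mat SD = (5, 6)
drawnorm e_lect e_math, mean(M) corr(Corr) sd(SD)
gen Nrdtlectf = 36 + 2*ses + u_lect + e_lect
gen Nrdmathf  = 35 + 1*ses + u_math + e_math

* One equation per outcome, each with its own school random intercept.
* The first cov() frees the school-level covariance of the two intercepts;
* the second frees the student-level covariance of the two residuals.
gsem (Nrdtlectf <- ses Lect[id_ecole])  ///
     (Nrdmathf  <- ses Math[id_ecole]), ///
     cov(Lect[id_ecole]*Math[id_ecole]) ///
     cov(e.Nrdtlectf*e.Nrdmathf)
```

Each outcome gets its own coefficient on ses, so a covariate can legitimately have different effects on reading and math without one score being used as a predictor of the other.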
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#3

29 Jan 2024, 16:53

You are using quite ancient syntax, so unless you have a very old copy of Stata, you should use -mixed- and -margins- instead of -xtmixed- and -mfx-.
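As a sketch of that translation, the commands from #1 map roughly as shown below. The data are simulated, and the variable names score and ses are hypothetical stand-ins; this illustrates only the modern syntax and does not address the bivariate issue raised in #2. The community-contributed -outreg2- step is left out. For a linear mixed model, the marginal effect of a continuous covariate reported by -margins, dydx()- equals its fixed-effect coefficient.

Code:

```stata
clear *
set seed 2024

* Simulated data (hypothetical): 20 schools x 30 students
set obs 20
gen id_ecole = _n
gen u = rnormal(0, 2)                        // school random intercept
expand 30
gen ses = rnormal()                          // stand-in continuous covariate
gen score = 50 + 3*ses + u + rnormal(0, 5)

mixed score ses || id_ecole:                 // replaces xtmixed
estimates store eq1                          // as in #1
margins, dydx(ses)                           // replaces mfx compute
matrix b = r(b)                              // marginal effect of ses
```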

I disagree with Erik, though overall I endorse his suggestion to use -gsem- for the type of modeling discussed here, because it is a great deal more flexible than what can be done with -mixed-. That said, some models can be fit equivalently in either framework, and I suggest the book "Growth Modeling: Structural Equation and Multilevel Modeling Approaches" by Grimm, Ram and Estabrook as an excellent overview of growth modeling in both frameworks.

If you use -mixed-, you will assume that each outcome has the same set of level-1 and level-2 predictors. Usually this is done with time as the repeated observation within individuals, to model growth, but it also works for distinct outcomes measured at a single time, to model the inter-relationships between scores. SEM does not require this restriction. To perform multivariate modeling within the -mixed- framework, you will need to reshape your data into a suitable long format, with one observation per unit (e.g., student) per outcome (e.g., math and reading), and then create a new variable (e.g., type, numbered from 1 to K outcomes) to distinguish the outcomes.
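A minimal sketch of that reshape, on simulated data laid out like the original poster's (one row per student, with Nrdtlectf and Nrdmathf as columns). It assumes, as stated later in #12, that id_eleve numbers students within each school, so a unique student identifier (the hypothetical pid) is built first; otherwise -reshape- stops because id_eleve does not uniquely identify observations. The 0/1 coding of type follows #13 rather than 1 to K; either works, as long as type has one value per outcome.

Code:

```stata
clear *
set seed 2019

* Simulated wide data (hypothetical): 10 schools x 4 students
set obs 10
gen id_ecole = _n
expand 4
bysort id_ecole: gen id_eleve = _n        // student number restarts in each school
gen Nrdtlectf = rnormal(36, 30)           // stand-in reading score
gen Nrdmathf  = rnormal(35, 28)           // stand-in math score

* id_eleve alone does not identify a student, so build a unique ID
egen pid = group(id_ecole id_eleve)

* reshape long needs a common stub: score0 = math, score1 = reading
rename (Nrdmathf Nrdtlectf) (score0 score1)
reshape long score, i(pid) j(type)
label define test_type 0 "math" 1 "reading"
label values type test_type

* type should take K = 2 values (one per outcome), not one per student
levelsof type
```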

Once you have the data in a suitable format, here is one suggested syntax for -mixed-.

Code:

* assumes a 2-level hierarchy with person as the higher level and outcomes clustered within.
* You may consider interactions of i.type with your covariates.
mixed outcome_score i.type <other covariates> || person_id : , nocons reml dfmethod(kr) cov(unstructured, t(type))

I think this will extend to a 3-level version, but I don't know how stable that estimation would be.

Code:

* models schools at level 3, then persons, then outcomes
mixed outcome_score i.type <other covariates> || school_id : || person_id : , nocons reml dfmethod(kr) cov(unstructured, t(type))
Romuald Landry

Join Date: Nov 2020

Posts: 43
#4

30 Jan 2024, 04:17

I followed your recommendations and defined a variable "type" ranging from 1 to 61916, which corresponds here to each student's results. However, when I enter the command, Stata says "invalid name," and I don't understand where the problem lies.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#5

30 Jan 2024, 07:05

Edit: I erred in my previous post. Change -cov(….)- to -resid(….)- keeping the same contents inside the parentheses.

Note that it’s generally more useful to copy and paste Stata output directly, using code tags, rather than posting screenshots. This is explained in the FAQ.
Romuald Landry

Join Date: Nov 2020

Posts: 43
#6

30 Jan 2024, 11:30

I enter the following code, but it returns an error saying "too many variables specified."

HTML Code:

mixed Nrdtlectf i.type diflcev1ir avoirfaim rvauxdome11ique langue1arlé || id_ecole : , nocons reml dfmethod(kr) resid(unstructured, t(type))
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#7

30 Jan 2024, 11:43

Does it run with fewer covariates? I don’t think I’ve encountered this error before and it’s difficult to troubleshoot without a minimal data example.
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#8

30 Jan 2024, 11:46

I think the problem is that you want the residual specification to be as follows:

Code:

residuals(independent, by(type))
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#9

30 Jan 2024, 11:51

Originally posted by Erik Ruzek

I think the problem is that you want the residual specification to be as follows:

Code:

residuals(independent, by(type))

This is acceptable if you insist that there should be no covariance between score types. Whether that makes sense in this context is not something I can say, but it generally seems dubious when we are talking about student performance.
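If you are unsure whether the covariance between score types is needed, a likelihood-ratio test comparing the two residual structures lets the data speak. Below is a sketch on simulated data; the seed, the sample size, and the correlation of .3 are assumptions, with the means and SDs borrowed from the simulation in #10. Both models are fit by maximum likelihood (the -mixed- default) and use -|| pid:, noconstant- so that the student level only groups the two scores without adding a random intercept. The models are nested because -independent, by(type)- is -unstructured, t(type)- with the covariance fixed at zero.

Code:

```stata
clear *
set seed 101

* Simulated wide data (hypothetical): 300 students, two correlated scores
set obs 300
gen pid = _n
mat M = (3, 5)
mat Corr = (1, .3 \ .3, 1)
mat SD = (2, 4)
drawnorm y0 y1, mean(M) corr(Corr) sd(SD)
reshape long y, i(pid) j(type)

* Score-specific residual variances, no covariance (the structure from #8)
mixed y i.type || pid:, noconstant residuals(independent, by(type))
estimates store indep

* Score-specific residual variances plus a residual covariance
mixed y i.type || pid:, noconstant residuals(unstructured, t(type))
estimates store unstr

* LR test of the single extra parameter (the residual covariance)
lrtest unstr indep
```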
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#10

30 Jan 2024, 11:56

Originally posted by Romuald Landry

I enter the following code, but it returns an error saying "too many variables specified."

HTML Code:

mixed Nrdtlectf i.type diflcev1ir avoirfaim rvauxdome11ique langue1arlé || id_ecole : , nocons reml dfmethod(kr) resid(unstructured, t(type))

I think the issue here is that you are adding a random intercept for school but do not have one for student. You must have student at level 2 for the model to be sensible.

Try adjusting your model to something simpler first. Exclude all covariates and only consider student as a hierarchical level. You need to fill in your own variable name for -student_id-, this is just a placeholder.

Code:

mixed Nrdtlectf i.type || student_id : , resid(un, t(type)) reml dfmethod(kr)

If this works, you can expand the model from here. First I would add back covariates. Then if that model looks sensible, you can add in clustering at the school level (at level 3).

Here is a simple sketch of how to do it with 2 levels.

Code:

clear *
cls

mkf Data
cwf Data

set obs 50
mat M = (3, 5)
mat Corr = (1,.3\.3,1)
mat SD = (2,4)
drawnorm y0 y1 , mean(M) corr(Corr) sd(SD) double
gen `c(obs_t)' pid = _n
sort pid

reshape long y , i(pid) j(type)
mixed y i.type || pid : , resid(un, t(type)) reml dfmethod(kr)
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#11

30 Jan 2024, 11:59

Leonardo Guizzetti has it in #10. I still believe that the most flexible approach is to use gsem, as we both stated earlier.

Update: You can see how you would specify the model in the SEM framework, whether with gsem or sem, by slightly altering Leonardo's simulation. Specifically, I increased the number of observations to 500, because SEM doesn't have small-sample corrections in the way that mixed does.

Code:

clear *
cls

mkf Data
cwf Data

set obs 500
mat M = (3, 5)
mat Corr = (1,.3\.3,1)
mat SD = (2,4)
drawnorm y0 y1 , mean(M) corr(Corr) sd(SD) double
gen `c(obs_t)' pid = _n
sort pid

*Using sem
sem (RI -> y1@1 y0@1) (y1 <- ) (y0 <- ) , latent(RI) // RI = random intercept

*Using mixed
reshape long y , i(pid) j(type)
mixed y i.type || pid : , resid(exchangeable, t(type)) // reml dfmethod(kr)

With an exchangeable residual structure, mixed estimates a single residual variance, whereas sem gives you test-specific variances. Mixed also gives you a covariance between residuals, which you can also get from sem, but note that here it is very small and imprecise (huge standard error). Accordingly, one probably wouldn't keep it in the model.

Last edited by Erik Ruzek; 30 Jan 2024, 12:29.
Romuald Landry

Join Date: Nov 2020

Posts: 43
#12

30 Jan 2024, 13:58

I entered the following commands, but I'm getting the same error: "too many variables specified."

HTML Code:

mixed Nrdtlectf i.type diflcev1ir avoirfaim rvauxdome11ique langue1arlé || id_ecole : , nocons reml dfmethod(kr) resid( id_eleve , t(type))

HTML Code:

mixed Nrdtlectf i.type || id_eleve : , resid(un, t(type)) reml dfmethod(kr)

I am new to bivariate multilevel modeling. I have data for one year (2019) on the reading and math scores of primary school students, and I would like to fit a bivariate two-level model with student and school levels. The variable "id_eleve" is the student identifier within the school, while "Nrdtlectf" and "Nrdtmathf" are the reading and math test scores, respectively. I hope I have provided all the information needed to receive assistance.

Erik Ruzek

Join Date: Oct 2017
Posts: 429

#13

30 Jan 2024, 16:11

The third post of this thread, written by Leonardo Guizzetti, elaborates on the steps needed to get your data into the shape required to run the mixed model. The most important part is the following:

To perform multivariate modeling within the -mixed- framework, you will need to reshape your data into a suitable long format, which is to have one observation per unit (e.g., student) per outcome (e.g., math and reading), and then create a new variable (e.g., type, numbered from 1 to K outcomes) to differentiate each of the outcomes.

We imagine your data are currently set up so that each student has one row, with separate columns for the math and reading test scores. The code below, which is adapted from Leonardo's code in #10, first creates the wide data and then reshapes them:

Code:

clear *
cls

mkf Data
cwf Data

** Create dataset for illustration purposes
set obs 10
gen sid = _n                         // schools
gen u_school = rnormal()    // school random effect
expand 5                             // number of students per school
mat M = (3, 5)                     // math and reading score means
mat Corr = (1,.3\.3,1)          // math and reading score correlation
mat SD = (2,4)                    // math and reading score standard deviations
drawnorm score0 score1, mean(M) corr(Corr) sd(SD) double
label variable score0 "math score"
label variable score1 "reading score"
foreach v of varlist score0 score1 {
    replace `v' = `v' + u_school        // add in school random effect
}
gen `c(obs_t)' pid = _n
sort sid pid

** Your math and reading score variables need to be named something
**  like score0 and score1 for the reshape to work! Rename if necessary

** Reshape wide data to long so you have a single score variable and
**  a 0/1 indicator for subject (each student should have two rows)
reshape long score, i(pid) j(type)
label define test_type 0 "math" 1 "reading"
label values type test_type 

** Run the mixed model
mixed score i.type || sid: || pid : , resid(un, t(type)) reml dfmethod(kr)


Romuald Landry

Join Date: Nov 2020
Posts: 43

#14

14 Feb 2024, 07:59

I've entered the following command sequence, but I can't get any results, and I don't know why.

HTML Code:

 set obs 62934
gen sid = _n                        
gen u_school = rnormal()
matrix define M = J(2, 1, .)
matrix M[1, 1] = 36
matrix M[2, 1] = 35
matrix define SD = J(2, 1, .)
matrix M[2, 1] = 35
matrix SD[1, 1] = 30
matrix SD[2, 1] = 28
drawnorm score0 score1, mean(M) corr(Corr) sd(SD)
label variable Nrdmathf "math score"
label variable Nrdtlectf "reading score"
foreach v of varlist score0 score1 {
    replace `v' = `v' + u_school        // add in school random effect
}
gen `c(obs_t)' pid = _n
sort sid pid
eshape long score, i(pid) j(type)
label define test_type 0 "math score" 1 " reading score"
label values type test_type
mixed score i.type || sid: || pid : , resid(un, t(type)) reml dfmethod(kr)
mixed score i.type langue sexedeelv diflcev1ir avoirfaim || sid: || pid : , resid(un, t(type)) reml dfmethod(kr)
> ml dfmethod(kr) resid(unstructured, t(type))



Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#15

14 Feb 2024, 22:29

Originally posted by Romuald Landry

. . . I am trying to implement a multilevel bivariate model to analyze the determinants of reading and mathematics academic performance. My two dependent variables are the scores on the reading test and the mathematics test. I am entering the following command.

xtmixed Nrdtlectf Nrdmathf sexedeelv agedeélève diflcev1ir avoirfaim rvauxdome11ique ses Nbelvdanslécole nbensgt localisation genremaitre typeécole coursdesoutien || id_ecole : if ecaxte >1, covariance(unstructured)

Originally posted by Romuald Landry

. . . I have data for one year (2019) on the reading and math scores of primary school students. . . . The variable "id_eleve" represents the student identifier within the school, while "Nrdtlectf" and "Nrdtmathf" represent the reading and math test scores, respectively.

Try this:

Code:

rename (Nrdtlectf Nrdmathf) sco#, addnumber(0)
reshape long sco, i(id_eleve) j(subj)

mixed sco i.subj##i.(sexedeelv agedeélève diflcev1ir avoirfaim rvauxdome11ique ses) ///
    i.subj##c.(Nbelvdanslécole nbensgt) ///
    i.subj##i.(localisation genremaitre typeécole coursdesoutien) ///
    if ecaxte > 1 ///
    || id_ecole: || id_eleve: , noconstant residuals(unstructured, t(subj))

I'm guessing as to which of your predictors are categorical and which are continuous, and you can make the necessary changes to the factor variable notation.

Given the number of students that you've got, you'll undoubtedly have enough schools that you won't need any small-sample adjustment (i.e., reml dfmethod(kroger)) to your degrees of freedom.
