How to specify the syntax for a four-level multilevel model using Stata

Tiago Pereira

Join Date: Jan 2016

Posts: 375
#1


25 Feb 2022, 08:39

Dear Statalisters,

I have been (slowly) learning multilevel modelling in Stata and right now we are in the middle of a productive discussion comparing Stata vs MLwiN.

I am trying to show the benefits of post-estimation commands in Stata, graphs, etc. However, I am not 100% sure about the syntax when it comes to higher-order models (e.g., 4 levels). The analysis I am trying to replicate is based on a previous clinical trial [1], whose data structure is as follows:

In a nutshell: repeated measurements are nested within sites, which are nested within teeth, which in turn are nested within subjects.

I am considering the following syntax:

Code:

command dependent_variable independent_variables || subject: || tooth: || sites: || time:

So, this would be a four-level model with four random-effects equations. The first equation is a random intercept at the subject level, the second is a random intercept at the tooth level, and the third is a random intercept at the site level. Is the fourth a random intercept at the time level? Any suggestions on how to correctly incorporate a time-dependent explanatory variable?

Any tips/suggestions are extremely welcome.

All the best,

Tiago

Reference

[1] Müller, H. P., Barrieshi‐Nusair, K. M., Könönen, E., & Yang, M. (2006). Effect of triclosan/copolymer‐containing toothpaste on the association between plaque and gingival bleeding: a randomized controlled clinical trial. Journal of Clinical Periodontology, 33(11), 811-818. https://onlinelibrary.wiley.com/doi/...X.2006.00993.x
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

25 Feb 2022, 11:29

No, there is no explicit representation of the bottom level of the model in Stata's multi-level modeling commands. If you try to include time: or something equivalent to it, you will probably end up with the model failing to converge because Stata will try to independently estimate a time-level intercept and a residual. But since you have only one observation per person-tooth-site-time, the time-level intercept is the residual. So the resulting model is unidentifiable: there is no way to "split" the bottom level variance between a time-level intercept and a residual. This usually results in non-convergence. Even if it happens to converge, the results are uninterpretable. The correct syntax here is:

Code:

command dependent_variable independent_variables || subject: || tooth: || sites:
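As a minimal sketch of what this could look like with actual commands, assuming hypothetical variable names (outcome, bleeding, treatment, subject, tooth, site) that are not taken from the original data:

```stata
* Hypothetical variable names; substitute those in your own dataset.
* Continuous outcome: random intercepts for subject, tooth within subject,
* and site within tooth. The repeated measurements form the residual level.
mixed outcome i.treatment || subject: || tooth: || site:

* Binary outcome (e.g., bleeding yes/no): same random-effects specification.
melogit bleeding i.treatment || subject: || tooth: || site:
```

Because the random-effects equations are read as nested from left to right, the tooth and site identifiers only need to be unique within the level above them.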

In Stata's multi-level commands for nested effects (no crossed effects or multiple memberships), the correct syntax always involves one fewer || than the number of levels.

Added: I have interpreted your description of the study as meaning that the "repeated measures" refers to one measurement at each time period (for each site within tooth within person). If you have multiple measures at each time period, then that is a different design, and the corresponding model would be a five-level model and would have the syntax you show in #1.
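Regarding the time-dependent explanatory variable asked about in #1: one possible approach, offered only as a sketch, is to enter time (and any covariate that changes over time) in the fixed part of the model, and optionally allow the time slope to vary at one of the levels. The variable names (outcome, plaque, treatment, time) and the time values 0, 1, 2 below are assumptions for illustration, not taken from the trial data.

```stata
* Hypothetical variable names and time values; adjust to your own data.
* Time and a time-varying covariate (plaque) enter as fixed effects;
* the random part keeps the three intercepts for subject, tooth, and site.
mixed outcome i.treatment##c.time plaque || subject: || tooth: || site:

* Intraclass correlations at each nested level.
estat icc

* Predicted means by treatment arm at selected (assumed) time points.
margins treatment, at(time=(0 1 2))
marginsplot

* Optional extension: let the time slope vary across sites.
mixed outcome i.treatment##c.time plaque || subject: || tooth: ///
    || site: time, covariance(unstructured)
```

Whether a random slope on time is warranted, and at which level, depends on the data and on whether the more complex model converges.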

Last edited by Clyde Schechter; 25 Feb 2022, 11:33.
Tiago Pereira

Join Date: Jan 2016

Posts: 375
#3

25 Feb 2022, 12:37

As always, many thanks, Clyde, for your wise suggestions and comments. What a great explanation, remarkably helpful for me and for sure many other Stata users.
Chul Lee

Join Date: Apr 2019

Posts: 45
#4

25 Feb 2022, 13:13

I am interested in the model Tiago is preparing and look forward to seeing how the design develops. Thank you for this question and a very practical answer.

I have a similar yet different design and would like to ask this forum for any advice, hints, or insights. I appreciate it in advance.
My research question is whether final '(a) course grades or (b) ABCD vs. FW' are related to students' retention. I have four separate data sets, which can easily be merged if needed for Stata or MLwiN (I am a Stata beginner and currently use HLM for multilevel analysis).

1. Faculty: this file has information on which courses instructors are teaching and some faculty characteristics.
2. Student: this file has which courses students are taking and a code for retention vs. attrition, as well as student characteristics.
3. Course: this file has information on course characteristics.
4. Grade: this file has atomic grades for each section of each student.

Problem:
The relationships of grade-course-(student) to faculty are best fit by a cross-classified membership model, but this model does not allow 'retention/attrition' at the student level as a dependent variable. I need to dig into the documentation more, but neither the multi-rater model nor the cross-classified repeated-measures model allows it. Also, I cannot treat my design as another nested model like Tiago's. Unlike in Tiago's data, my students cross different faculty and are regrouped by course. To explain the error terms of 'retention/attrition', I guess the model should control for the course level as well as the faculty level.

Question:
Can anyone advise me on what kind of technique I need for this study? I don't insist on using multilevel modeling and welcome any other parsimonious techniques you may have in mind. I am also open to starting with a simpler model or a simpler study (say, focusing on grade, course characteristics, and student retention, with no faculty data) if I can interpret the effect of different grades in different courses on student retention/attrition.