gsem and partially nested design

Hiro farabi

Join Date: Feb 2024

Posts: 22
#1

gsem and partially nested design

06 Sep 2024, 09:13

I have a partially nested three level model. The study randomised by sites and in each site I have coaches which just covered intervention groups. This is my model for outcome

mixed q_6m treat q_b || site: ||///

therapist:treat, nocons reml ///

residuals(independent, by(treat)) ///

dfmethod(sat)

I have the same for cost. I want to use equation model using gsem. Do you know how can I do this? I have tried

Gsem (cost6 <- cost_b treat M1[site] M2[therapist<site]* treat) (Y_6m <- treat Y_B M1[site] M2[therapist<site]* treat) . I don’t know if it is correct or not and I couldn’t add

, nocons reml ///

residuals(independent, by(treat)) ///

dfmethod(sat)

on it.
Tags: SEM GSEM mixed effect mon
Hiro farabi

Join Date: Feb 2024

Posts: 22
#2

06 Sep 2024, 10:20

Carlo Lazzaro Nick Cox Clyde Schechter could you pleas help me with this issue?
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 398
#3

06 Sep 2024, 13:29

It would be very helpful if you could share a portion of your data using the dataex command (ssc search dataex if you are on an older version of Stata). gsem uses full maximum likelihood. If you prefer to use restricted maximum likelihood with the Satterthwaite df correction, then you will have to use mixed. You can, however, trick mixed to do bivariate multilevel models. Note that I'm not certain you can do a mixed version of the full model you show for q_6m. In particular, it might not be possible to allow for different residuals for the treatment groups.

For a version of your model without that particular complexity, you would need to first make both your outcome and baseline variables have the same variable name stub but with different endings. Then you reshape your data to long:

Code:

gen out1 = q_6m gen out_b1 = q_b gen out2 = cost6 gen out_b2 = cost_b * Note I assume you have a subject id called sid reshape long out out_b, i(sid) j(type)

Then you can estimate the bivariate mixed model as follows:

Code:

mixed out i.treat##i.type out_b || site: || therapist: treat /// || sid: , resid(un, t(type)) reml dfmethod(sat) // note: you may need to add noconstant as an option

See also this thread.

Last edited by Erik Ruzek; 06 Sep 2024, 13:53. Reason: Edited mixed code
2 likes
Comment
Hiro farabi

Join Date: Feb 2024

Posts: 22
#4

06 Sep 2024, 14:29

Originally posted by Erik Ruzek View Post

It would be very helpful if you could share a portion of your data using the dataex command (ssc search dataex if you are on an older version of Stata). gsem uses full maximum likelihood. If you prefer to use restricted maximum likelihood with the Satterthwaite df correction, then you will have to use mixed. You can, however, trick mixed to do bivariate multilevel models. Note that I'm not certain you can do a mixed version of the full model you show for q_6m. In particular, it might not be possible to allow for different residuals for the treatment groups.

For a version of your model without that particular complexity, you would need to first make both your outcome and baseline variables have the same variable name stub but with different endings. Then you reshape your data to long:

Code:

gen out1 = q_6m gen out_b1 = q_b gen out2 = cost6 gen out_b2 = cost_b * Note I assume you have a subject id called sid reshape long out out_b, i(sid) j(type)

Then you can estimate the bivariate mixed model as follows:

Code:

mixed out i.treat##i.type out_b || site: || therapist: treat /// || sid: , resid(un, t(type)) reml dfmethod(sat) // note: you may need to add noconstant as an option

See also this thread.

Thank you so much for your response Erike. I want to know how can I add "nocons reml residuals(independent, by(treat)) dfmethod(sat)" to gsem (cost6 <- cost_b treat M1[site] M2[therapist<site]* treat) (Y_6m <- treat Y_B M1[site] M2[therapist<site]* treat).

BW
Hiro
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 398
#5

06 Sep 2024, 14:33

You cannot use restricted maximum likelihood (reml) with gsem. Likewise, you cannot use Satterthwaite standard errors with gsem. To the extent that you need those estimation options, you will have to use mixed.

Added:
You also haven't explained why you want to use gsem. Is it solely because you want to run a multivariate model? If so, the analysis you want to do can more or less be done with mixed, as shown in post #3.

Last edited by Erik Ruzek; 06 Sep 2024, 14:36.
Comment
Hiro farabi

Join Date: Feb 2024

Posts: 22
#6

06 Sep 2024, 15:05

Originally posted by Erik Ruzek View Post

You cannot use restricted maximum likelihood (reml) with gsem. Likewise, you cannot use Satterthwaite standard errors with gsem. To the extent that you need those estimation options, you will have to use mixed.

Added:
You also haven't explained why you want to use gsem. Is it solely because you want to run a multivariate model? If so, the analysis you want to do can more or less be done with mixed, as shown in post #3.

Thank you so much Erik! I am currently deciding on the type of estimation method to use, and gsem is one of the options that should be tested. What happens if I remove the reml option? Can I run the model without reml? Additionally, I would like to know if I have correctly specified the partially nested model. Also, is it possible to specify residuals by(treat)? I received an error when trying to do so.
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 398
#7

06 Sep 2024, 16:01

It might be possible to specify the gsem model in such a way that you could achieve something like resid(independent, by(treat)) but it would be hard for me to figure out without having the data. Joseph Coveney might have some insight into this issue. See this thread for a somewhat similar problem, for example. With regards to the partial nesting, I'm not sure I fully understand your data. Are the coaches encoded in the model in the therapist random effect?

Either way, you will help your cause by using -dataex- to share some representative slice of your data. This is a very complicated analysis.

Last edited by Erik Ruzek; 06 Sep 2024, 16:05. Reason: Added link to Statalist thread
Comment

Hiro farabi

Join Date: Feb 2024
Posts: 22

06 Sep 2024, 16:13

Originally posted by Erik Ruzek View Post

It might be possible to specify the gsem model in such a way that you could achieve something like resid(independent, by(treat)) but it would be hard for me to figure out without having the data. Joseph Coveney might have some insight into this issue. See this thread for a somewhat similar problem, for example. With regards to the partial nesting, I'm not sure I fully understand your data. Are the coaches encoded in the model in the therapist random effect?

Either way, you will help your cause by using -dataex- to share some representative slice of your data. This is a very complicated analysis.

the data is look like the below table:

pid	therapist	site	treat	cost_6m	cost_b	EQ5d_b	Qaly_6m
hj012	h01	M	1	12	20	0.32	0.89
hj013	h01	M	1	12	18	0.56	0.65
hj014	h01	M	1	22	20	0.23	0.56
hj015	h02	G	1	2	0	0.12	0.65
hj016	h02	G	1	14	5	0.65	0.78
hj017	h03	G	1	.	12	0.15	0.89
hj018	h03	G	1	16	20	0.32	0.89
hj019	0	M	0	6	5	0.56	0.65
hj020	0	M	0	5	1	0.23	0.56
hj021	0	G	0	7	5	0.12	0.65
hj022	0	G	0	1	1	0.65	0.78
hj023	0	G	0	3	4	0.15	0.89

Comment

Erik Ruzek

Join Date: Oct 2017

Posts: 398
#9

06 Sep 2024, 17:11

Thank you. How many sites do you have?

This thread is highly relevant. Please read it carefully: https://www.statalist.org/forums/for...gn-and-xtmixed

It does not look like you have coded your data as suggested in that thread and the linked article. Specifically, Scott Baldwin says that you need to assign a unique therapist ID for each individual in the control group such that each is a cluster of 1. Personally, I would focus on getting the data and model right for the univariate case. Folks might have ideas about the gsem multivariate case.
Comment
Hiro farabi

Join Date: Feb 2024

Posts: 22
#10

06 Sep 2024, 17:32

Originally posted by Erik Ruzek View Post

Thank you. How many sites do you have?

This thread is highly relevant. Please read it carefully: https://www.statalist.org/forums/for...gn-and-xtmixed

It does not look like you have coded your data as suggested in that thread and the linked article. Specifically, Scott Baldwin says that you need to assign a unique therapist ID for each individual in the control group such that each is a cluster of 1. Personally, I would focus on getting the data and model right for the univariate case. Folks might have ideas about the gsem multivariate case.

I have 14 site. Thanks for sharing that thread.
Comment
Hiro farabi

Join Date: Feb 2024

Posts: 22
#11

14 Oct 2024, 09:20

Originally posted by Erik Ruzek View Post

It would be very helpful if you could share a portion of your data using the dataex command (ssc search dataex if you are on an older version of Stata). gsem uses full maximum likelihood. If you prefer to use restricted maximum likelihood with the Satterthwaite df correction, then you will have to use mixed. You can, however, trick mixed to do bivariate multilevel models. Note that I'm not certain you can do a mixed version of the full model you show for q_6m. In particular, it might not be possible to allow for different residuals for the treatment groups.

For a version of your model without that particular complexity, you would need to first make both your outcome and baseline variables have the same variable name stub but with different endings. Then you reshape your data to long:

Code:

gen out1 = q_6m gen out_b1 = q_b gen out2 = cost6 gen out_b2 = cost_b * Note I assume you have a subject id called sid reshape long out out_b, i(sid) j(type)

Then you can estimate the bivariate mixed model as follows:

Code:

mixed out i.treat##i.type out_b || site: || therapist: treat /// || sid: , resid(un, t(type)) reml dfmethod(sat) // note: you may need to add noconstant as an option

See also this thread.

Hi Erik,
In this regression, by including type in the residuals, I can account for the joint association of the two equations. However, the partially nested structure of my model is not being considered. My data is designed with a partially nested structure, where patients in both groups are in different sites, and in the intervention group, patients within these sites are served by different coaches.

When I include treatment in the residuals, I encounter an error saying "convergence not achieved." Another concern is that since sid is repeated for both cost and QALY, I’m not sure if the time variable (t) is being applied correctly in the model.
could you please help me?

BW,
Hiro
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 398
#12

14 Oct 2024, 14:39

Hiro,

I think you will have to abandon the multivariate aspect of the analysis in order to properly accommodate the partially nested structure and you want there to be heteroskedastic residuals by treatment. If you can convince yourself that the residuals are homoscedastic for both outcomes (see the proposed test in post #7 in the thread linked earlier), then you could probably move forward with the multivariate analysis.

If this were my data, I would 1) get the two univariate models correctly specified and estimated, 2) test for heteroskedasticity of residuals by treatment/arm, and 3) if homoskedastic residuals are appropriate for both outcomes, then put them into the multivariate model.
Comment
Hiro farabi

Join Date: Feb 2024

Posts: 22
#13

15 Oct 2024, 04:07

Originally posted by Erik Ruzek View Post

Hiro,

I think you will have to abandon the multivariate aspect of the analysis in order to properly accommodate the partially nested structure and you want there to be heteroskedastic residuals by treatment. If you can convince yourself that the residuals are homoscedastic for both outcomes (see the proposed test in post #7 in the thread linked earlier), then you could probably move forward with the multivariate analysis.

If this were my data, I would 1) get the two univariate models correctly specified and estimated, 2) test for heteroskedasticity of residuals by treatment/arm, and 3) if homoskedastic residuals are appropriate for both outcomes, then put them into the multivariate model.

Thank you so much for your response. What happens if homoskedastic residuals are appropriate for only one of the outcomes?
Comment
Hiro farabi

Join Date: Feb 2024

Posts: 22
#14

15 Oct 2024, 04:20

Erik Ruzek
Another question: In the recommended thread, you mentioned, "The distinction between the fully nested model and partially nested model is separate from the distinction between homo- and heteroscedastic (in our paper, anyway). The former has to do with the random effects, and the latter has to do with the residuals. The -residual- option in -mixed- controls the residuals; therefore, just using that option doesn't affect whether the model is fully or partially nested."

Does this mean my estimation doesn't need to include the variable related to the partially nested design in the residuals?
Comment
Erik Ruzek

Join Date: Oct 2017

Posts: 398
#15

15 Oct 2024, 08:04

Hiro,

The residual specification in this model is not related to the partially nested design. It relaxes the testable assumption that a single residual parameter is appropriate for your model. You have partial nesting whether or not you estimate separate residuals by treatment status.
Comment

Announcement