Hi Statalist,
I’m trying to run a gsem model for a multilevel path analysis in Stata 15.1 on Windows 10. My data have 4,888 observations (survey respondents from a multistage sample), and my target model has 19 observed variables and 2 latent variables.
I use my primary latent variable (Cultcap) as an exogenous variable in each of three paths, and I use the second latent variable (M1) to model a second level (the sample of schools that the respondents attended in adolescence) as a random intercept in each path. The outcome of the first path is continuous (ctr_cumgpa), the outcome of the second path is ordinal (educ_byw5), and the outcome of the final path is continuous (income_h5).
Specifically, I’m trying to estimate the following model:
Code:
    gsem (Cultcap -> acad_pcact parcontrol parexp_educ educ_effort) ///
         (ctr_cumgpa <- Cultcap ///
             ib4.racesingle i.female immigpar polstopbef18 ib1.region ib1.pars_edhi ///
             par_married anyrelig ///
             M1[schools]@1, regress) ///
         (educ_byw5 <- Cultcap ctr_cumgpa ///
             ib4.racesingle i.female immigpar polstopbef18 ib1.region ib1.pars_edhi ///
             M1[schools]@1, ologit) ///
         (income_h5 <- Cultcap ctr_cumgpa ib1.educ_byw5 ///
             reg_migrant child_lt6hh ///
             ib4.racesingle i.female immigpar polstopbef18 ///
             pars_postgradboth ///
             M1[schools]@1, regress) ///
         if sample_id==1, latent(Cultcap M1)
This model does not converge by itself. Stata gets through fitting the fixed-effects model, then tries to refine the starting values but reports zeros for the log likelihood, and finally returns the error message “initial values not feasible” when it tries to fit the full model.
Following the recommendations in the Stata manual on “Convergence problems…” (semintro12.pdf), I’ve temporarily simplified the model, stored the estimates from the simpler models (i.e., “matrix b = e(b)” after convergence), and used them as starting values for more complex models (i.e., the “from(b)” option). The manual also recommends trying alternative integration methods, fewer integration points, and alternative starting-value-calculation methods, as needed. By trying different subsets of each path, supplemented with alternative settings in one case, I’ve gotten all three paths to converge in separate models. Each of the three models pairs the measurement model with a single path. Here are the final models for each path:
Code:
    * Path 1: ctr_cumgpa (continuous); b holds estimates from an earlier, simpler fit
    gsem (Cultcap -> acad_pcact parcontrol parexp_educ educ_effort) ///
         (ctr_cumgpa <- Cultcap ///
             ib4.racesingle i.female immigpar polstopbef18 ib1.region ib1.pars_edhi ///
             par_married anyrelig ///
             M1[schools]@1, regress) ///
         if sample_id==1, latent(Cultcap M1) ///
         from(b)
    matrix b = e(b)

    * Path 2: educ_byw5 (ordinal)
    gsem (Cultcap -> acad_pcact parcontrol parexp_educ educ_effort) ///
         (educ_byw5 <- Cultcap ctr_cumgpa ///
             ib4.racesingle i.female immigpar polstopbef18 ib1.region ib1.pars_edhi ///
             M1[schools]@1, ologit) ///
         if sample_id==1, latent(Cultcap M1) ///
         from(c)
    matrix c = e(b)

    * Path 3: income_h5 (continuous)
    gsem (Cultcap -> acad_pcact parcontrol parexp_educ educ_effort) ///
         (income_h5 <- Cultcap ctr_cumgpa ib1.educ_byw5 ///
             reg_migrant child_lt6hh ///
             ib4.racesingle i.female immigpar polstopbef18 ///
             pars_postgradboth ///
             M1[schools]@1, regress) ///
         if sample_id==1, latent(Cultcap M1) ///
         from(d)
    matrix d = e(b)
My problem arises when I try to combine all three sets of estimates into the full path analysis:
Code:
    gsem (Cultcap -> acad_pcact parcontrol parexp_educ educ_effort) ///
         (ctr_cumgpa <- Cultcap ///
             ib4.racesingle i.female immigpar polstopbef18 ib1.region ib1.pars_edhi ///
             par_married anyrelig ///
             M1[schools]@1, regress) ///
         (educ_byw5 <- Cultcap ctr_cumgpa ///
             ib4.racesingle i.female immigpar polstopbef18 ib1.region ib1.pars_edhi ///
             M1[schools]@1, ologit) ///
         (income_h5 <- Cultcap ctr_cumgpa ib1.educ_byw5 ///
             reg_migrant child_lt6hh ///
             ib4.racesingle i.female immigpar polstopbef18 ///
             pars_postgradboth ///
             M1[schools]@1, regress) ///
         if sample_id==1, latent(Cultcap M1) ///
         from(b c d)
At this point, Stata immediately returns the error “initial vector: duplicate entries for acad_pcact:Cultcap found”, along with return code 507: “name conflict”. It sounds as if Stata cannot estimate my primary latent variable in more than one path. Is that the right interpretation?
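For what it is worth, one way to see which parameter names the stored vectors have in common is to compare their full column names. The following is only a minimal sketch: the two matrices are toy stand-ins with made-up values and just three columns each, whereas in my case b and c would come from “matrix b = e(b)” and “matrix c = e(b)” after the single-path models. The macro names names_b, names_c, and shared are hypothetical.

```stata
* Toy stand-ins for the stored vectors (values are made up for illustration).
* In practice b and c would be saved with "matrix b = e(b)" after each model.
matrix b = (1, 0.5, 0.2)
matrix colnames b = acad_pcact:Cultcap parcontrol:Cultcap ctr_cumgpa:Cultcap
matrix c = (1, 0.5, 0.3)
matrix colnames c = acad_pcact:Cultcap parcontrol:Cultcap educ_byw5:Cultcap

* Collect the full column names (equation:name) of each vector
local names_b : colfullnames b
local names_c : colfullnames c

* Names that appear in both vectors
local shared : list names_b & names_c
display "`shared'"
```

Because every single-path model repeats the measurement model, I would expect the loadings on Cultcap to show up in each stored vector, which seems consistent with the “duplicate entries” message.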
In full disclosure, I’ve also run a version of the full, multilevel path analysis minus the primary latent variable, and that 3-path model converges without needing any intermediate stages of storing estimates for use as starting values. I’ve also tried a two-step version: (1) fit the measurement model on its own and use it to generate predicted values for the primary latent variable, then (2) fit a 3-path model that substitutes the predicted-values variable for the latent variable in each path. To run that 3-path model, I followed the Stata manual recommendations to get each path to converge separately before trying to combine all three sets of estimates. The result was that Stata returned a similar error: “initial vector: duplicate entries for /:var(M1[schools]) found.” In brief, I believe the problem is confined to combining my latent variable estimates.
Is gsem unable to combine matrices with identically named latent variables, whereas it can handle identically named observed variables?
Is the solution simply to rename the latent variables uniquely for each path to be combined (i.e., Cultcap1, Cultcap2, Cultcap3, M1, M2, M3)? Is it really necessary to inflate the number of variables in the overall model?
Or is there a better solution? Is there any mistake in my code causing these problems? Or is the complexity of the model just too much for the data or for gsem?
Thanks,
J