Hi Stata community,
I'm using SEM to investigate young people's wellbeing. The data are self-reported and measured multidimensionally via a pre-validated instrument. Wellbeing is operationalised using 20 items measuring four subconstructs (Interpersonal, Life Satisfaction, Negative and Eudaimonic). My research seeks to quantify each subconstruct's relative association with my outcome variable. At present I'm running EFA to build a model that best represents the four wellbeing subconstructs as distinct but related.
Potential multicollinearity between the four subconstructs led me to consider Exploratory Structural Equation Modelling (ESEM), which is now widely used in my field. However, Prof. Ender's materials here are the only resource I can find on how to conduct ESEM in Stata. As for the extent of multicollinearity in my data, my exploration (scrutinising VIFs, bivariate correlations between variables, and AVEs) found some evidence that it may be problematic (output below). However, when I build the model up into the full SEM (i.e., Model 1 with only one wellbeing subconstruct, Model 2 with two wellbeing subconstructs, etc.), follow-up comparisons of path coefficients and SEs across the models don't suggest that multicollinearity is causing the estimates to change too drastically (output is again below for the Stata community to scrutinise).
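For context, the EFA step mentioned above might look roughly like the sketch below. It is only an illustration under assumptions: the item list is copied from the sem command later in this post, while the extraction method (ml), the rotation (promax) and the score names (f_1 to f_4) are placeholders rather than the settings actually used.

```stata
* Sketch only: item list taken from the sem command in this post;
* extraction method, rotation and score names are assumptions.
local items wbs1 wbs2 wbs3 wbs9 wbs15 wbs6 wbs7 wbs11 wbs14 wbs21 ///
            wbs5 wbs13 wbs17 wbs19 wbs20 wbs4 wbs8 wbs10

factor `items', ml factors(4)   // maximum-likelihood EFA retaining four factors
rotate, promax                  // oblique rotation, so the factors may correlate
estat common                    // correlation matrix of the rotated common factors
predict f_1 f_2 f_3 f_4         // factor scores (regression scoring is the default)
```

An oblique rotation is used in the sketch because the subconstructs are expected to be related; an orthogonal rotation would force the factor correlations reported by estat common to zero.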
My questions are:
- How widely is ESEM used in Stata in the way proposed by Prof. Ender? I've managed to replicate his approach with my data, but I'm puzzled by the lack of other resources on ESEM in Stata. I'm also wondering whether I will struggle to combine ESEM with the structural part of my models because of the added complexity.
- When exploring multicollinearity in the context of SEM, should sum scores, factor scores or individual items be scrutinised when it comes to looking at VIFs, correlations and AVEs?
- Should the evidence of multicollinearity between my variables (output below) give me cause for concern when it comes to entering all four of these latent exogenous variables into a SEM together?
Example of my data
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long(wbs1 wbs4 wbs8 wbs10 wbs18 wbs5 wbs13 wbs17 wbs19 wbs6 wbs7 wbs14 wbs20 wbs21 wbs2 wbs3 wbs9 wbs15)
2 4 4 4 3 3 3 3 3 3 2 3 2 2 2 2 2 5
1 5 5 5 5 1 1 1 1 2 1 5 1 1 1 1 2 1
3 2 4 4 3 2 4 3 2 3 3 4 3 3 2 3 2 2
2 5 4 4 3 3 1 2 2 3 3 3 2 4 3 2 3 2
1 5 5 5 4 1 1 2 1 2 1 1 1 1 1 2 2 1
4 3 5 5 3 3 3 2 4 3 4 3 5 2 2 4 4 4
5 2 2 3 1 5 3 4 4 3 2 4 4 3 5 4 4 4
2 4 5 4 5 3 2 2 3 2 2 2 2 2 3 2 2 2
3 3 5 5 3 4 1 2 3 2 2 4 3 2 3 2 2 3
1 5 5 5 5 1 1 1 1 1 1 1 1 1 1 1 1 1
2 4 5 5 1 2 2 2 2 3 2 2 2 3 4 2 4 2
3 3 4 4 3 3 4 4 4 3 3 4 4 4 3 4 4 4
3 3 4 2 4 2 3 3 3 2 1 4 3 2 3 4 3 3
3 4 5 3 3 3 3 2 3 2 3 4 2 4 5 3 3 3
4 2 3 3 3 3 4 3 4 4 4 4 4 4 4 4 4 4
1 4 4 3 5 3 1 2 2 3 1 2 2 2 2 1 1 2
4 2 4 3 3 3 3 3 3 4 4 3 3 3 4 4 3 3
3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3
1 4 5 5 4 4 1 2 1 2 1 2 3 1 2 2 2 1
. . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . .
end
label def WB 1 "Never", modify
label def WB 2 "Not Often", modify
label def WB 3 "Sometimes", modify
label def WB 4 "Often", modify
label def WB 5 "Always", modify
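As a rough illustration of how sum-score predictors of the kind used in the diagnostics in this post could be built and checked, here is a minimal sketch. The item-to-subconstruct grouping is copied from the sem command later in this post; the new variable names (eud_sum and so on) are hypothetical and are not the wbs*_sum2 variables that appear in the output.

```stata
* Sum scores per subconstruct (hypothetical names; grouping taken from the
* sem command in this post). rowtotal() counts a missing item as 0; the
* missing option returns missing only when every listed item is missing.
egen eud_sum  = rowtotal(wbs1 wbs2 wbs3 wbs9 wbs15), missing
egen int_sum  = rowtotal(wbs6 wbs7 wbs11 wbs14 wbs21), missing
egen life_sum = rowtotal(wbs5 wbs13 wbs17 wbs19 wbs20), missing
egen neg_sum  = rowtotal(wbs4 wbs8 wbs10), missing

* Variance inflation factors for the four sum scores
regress outcomevariable eud_sum int_sum life_sum neg_sum
estat vif
```

Because rowtotal() treats a single missing item as zero, respondents with partially missing items get deflated totals; whether that matters depends on how the instrument's scoring rules handle missing responses.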
Evidence of multicollinearity
Code:
** (1) Check VIFs for all factors in dataset. VIFs > 10 (i.e., 1/VIF < .10) suggest a problem **
** Using sum scores of the wellbeing subconstructs **
. regress outcomevariable wbsint_sum2 wbseud_sum2 wbslife_sum2 wbsneg_sum2

      Source |       SS           df       MS      Number of obs   =       875
-------------+----------------------------------   F(4, 870)       =     12.97
       Model |  162.409346         4  40.6023365   Prob > F        =    0.0000
    Residual |  2722.69465       870  3.12953409   R-squared       =    0.0563
-------------+----------------------------------   Adj R-squared   =    0.0520
       Total |    2885.104       874  3.30103432   Root MSE        =     1.769

------------------------------------------------------------------------------
outcomevariable~h | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
 wbsint_sum2 |  -.0289679   .0233199    -1.24   0.214    -.0747377    .0168019
 wbseud_sum2 |   .1358681   .0242438     5.60   0.000     .0882849    .1834513
wbslife_sum2 |  -.0358798   .0248797    -1.44   0.150    -.0847111    .0129515
 wbsneg_sum2 |   -.002507   .0261081    -0.10   0.924    -.0537493    .0487352
       _cons |   3.981834   .6141696     6.48   0.000     2.776406    5.187261
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
 wbseud_sum2 |      3.20    0.312717
wbslife_sum2 |      2.94    0.340589
 wbsint_sum2 |      2.53    0.395344
 wbsneg_sum2 |      1.92    0.520799
-------------+----------------------
    Mean VIF |      2.65

. ** Using factor scores of wellbeing subconstructs
. regress outcomevariable Eudaimonic Interpersonal Lifesat Negative

      Source |       SS           df       MS      Number of obs   =       917
-------------+----------------------------------   F(4, 912)       =     12.90
       Model |  163.757132         4  40.9392829   Prob > F        =    0.0000
    Residual |  2893.38791       912  3.17257446   R-squared       =    0.0536
-------------+----------------------------------   Adj R-squared   =    0.0494
       Total |  3057.14504       916  3.33749458   Root MSE        =    1.7812

-------------------------------------------------------------------------------
outcomevariable | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
   Eudaimonic |   .9078153    .190283     4.77   0.000     .5343718    1.281259
Interpersonal |  -.4550978   .1775992    -2.56   0.011    -.8036483   -.1065472
      Lifesat |    -.29361    .266601    -1.10   0.271    -.8168328    .2296127
     Negative |  -.1445177   .0541875    -2.67   0.008    -.2508643   -.0381711
        _cons |   4.854106   .2617944    18.54   0.000     4.340317    5.367896
-------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
     Lifesat |     13.09    0.076418
  Eudaimonic |     10.95    0.091284
Interperso~l |      9.01    0.110954
    Negative |      1.03    0.967547
-------------+----------------------
    Mean VIF |      8.52

. * Using individual items
. regress outcomevariable wbs1 wbs2 wbs3 wbs9 wbs15 wbs6 wbs7 wbs11 wbs14 wbs21 wbs5 wbs13 wbs17 wbs19 wbs20

      Source |       SS           df       MS      Number of obs   =       875
-------------+----------------------------------   F(15, 859)      =      7.11
       Model |  318.516808        15  21.2344539   Prob > F        =    0.0000
    Residual |  2566.58719       859  2.98787799   R-squared       =    0.1104
-------------+----------------------------------   Adj R-squared   =    0.0949
       Total |    2885.104       874  3.30103432   Root MSE        =    1.7285

------------------------------------------------------------------------------
outcomevariable~h | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        wbs1 |   .0059105   .1001725     0.06   0.953    -.1907011    .2025221
        wbs2 |    .093906     .08118     1.16   0.248    -.0654283    .2532404
        wbs3 |   .3839984   .0871796     4.40   0.000     .2128884    .5551084
        wbs9 |   .3026809   .0830548     3.64   0.000     .1396668    .4656949
       wbs15 |  -.0773907   .0832107    -0.93   0.353    -.2407107    .0859294
        wbs6 |  -.2069086   .0983767    -2.10   0.036    -.3999954   -.0138218
        wbs7 |  -.0988398    .098042    -1.01   0.314    -.2912697      .09359
       wbs11 |  -.1612017   .0884901    -1.82   0.069    -.3348838    .0124805
       wbs14 |   .0466253   .0756693     0.62   0.538     -.101893    .1951437
       wbs21 |   .1308543   .0767839     1.70   0.089    -.0198518    .2815604
        wbs5 |  -.2193482    .079129    -2.77   0.006    -.3746569   -.0640395
       wbs13 |  -.0558144    .078205    -0.71   0.476    -.2093097    .0976808
       wbs17 |   .0219586   .0921006     0.24   0.812      -.15881    .2027272
       wbs19 |  -.0667771   .0954528    -0.70   0.484    -.2541252     .120571
       wbs20 |   .2736249   .0985175     2.78   0.006     .0802617     .466988
       _cons |   3.773488   .2565241    14.71   0.000         3.27    4.276975
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
        wbs1 |      3.16    0.316891
        wbs7 |      3.04    0.328607
        wbs6 |      2.87    0.348388
       wbs20 |      2.79    0.358511
        wbs3 |      2.55    0.392300
       wbs19 |      2.49    0.401982
       wbs15 |      2.49    0.402062
       wbs17 |      2.29    0.437160
       wbs13 |      2.21    0.452178
        wbs2 |      2.16    0.462730
        wbs9 |      2.03    0.492234
       wbs11 |      1.98    0.504404
       wbs21 |      1.88    0.532478
        wbs5 |      1.84    0.544028
       wbs14 |      1.82    0.549123
-------------+----------------------
    Mean VIF |      2.37

. collin wbsint_sum2 wbseud_sum2 wbslife_sum2 wbsneg_sum2
(obs=897)

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
wbsint_sum2      2.52    1.59    0.3964      0.6036
wbseud_sum2      3.15    1.77    0.3175      0.6825
wbslife_sum2     2.96    1.72    0.3377      0.6623
wbsneg_sum2      1.91    1.38    0.5247      0.4753
----------------------------------------------------
  Mean VIF      2.63

                           Cond
        Eigenval          Index
---------------------------------
    1     4.7957          1.0000
    2     0.1590          5.4926
    3     0.0199         15.5422
    4     0.0186         16.0768
    5     0.0070         26.2302
---------------------------------
 Condition Number        26.2302
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0867

. collin Eudaimonic Interpersonal Lifesat Negative
(obs=942)

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
Eudaimonic      10.89    3.30    0.0918      0.9082
Interpersonal    9.11    3.02    0.1097      0.8903
Lifesat         13.31    3.65    0.0751      0.9249
Negative         1.03    1.01    0.9707      0.0293
----------------------------------------------------
  Mean VIF      8.59

                           Cond
        Eigenval          Index
---------------------------------
    1     4.7737          1.0000
    2     0.1738          5.2402
    3     0.0384         11.1434
    4     0.0086         23.5783
    5     0.0055         29.5547
---------------------------------
 Condition Number        29.5547
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0111

. ** (2) How are the factors correlated? [estat common command, run after EFA]
. ** Eudaimonic*Interpersonal = .42, Eudaimonic*Lifesat = .45, Eudaimonic*Negative = -.35
. ** Interpersonal*Lifesat = .49, Interpersonal*Negative = -.26
. ** Lifesat*Negative = -.33

. ** Correlation between sum scores
. pwcorr wbsint_sum2 wbseud_sum2 wbslife_sum2 wbsneg_sum2, sig star(0.05)

             | wbsint~2 wbseud~2 wbslif~2 wbsneg~2
-------------+------------------------------------
 wbsint_sum2 |   1.0000
             |
             |
 wbseud_sum2 |   0.7248*  1.0000
             |   0.0000
             |
wbslife_sum2 |   0.7304*  0.7664*  1.0000
             |   0.0000   0.0000
             |
 wbsneg_sum2 |  -0.5741* -0.6641* -0.6183*  1.0000
             |   0.0000   0.0000   0.0000
             |

. ** Correlation between factor scores
. pwcorr Eudaimonic Interpersonal Lifesat, sig star(0.05)

             | Eudaim~c Interp~l  Lifesat
-------------+---------------------------
  Eudaimonic |   1.0000
             |
             |
Interperso~l |   0.9206*  1.0000
             |   0.0000
             |
     Lifesat |   0.9474*  0.9368*  1.0000
             |   0.0000   0.0000
             |

. ** Correlation between individual items
. pwcorr wbs1 wbs2 wbs3 wbs9 wbs15 wbs6 wbs7 wbs11 wbs14 wbs21 wbs5 wbs13 wbs17 wbs19 wbs20, sig star(0.05)

             |     wbs1     wbs2     wbs3     wbs9    wbs15     wbs6     wbs7
-------------+---------------------------------------------------------------
        wbs1 |   1.0000
             |
             |
        wbs2 |   0.6764*  1.0000
             |   0.0000
             |
        wbs3 |   0.7035*  0.6051*  1.0000
             |   0.0000   0.0000
             |
        wbs9 |   0.5713*  0.5334*  0.6235*  1.0000
             |   0.0000   0.0000   0.0000
             |
       wbs15 |   0.6973*  0.5813*  0.6441*  0.5725*  1.0000
             |   0.0000   0.0000   0.0000   0.0000
             |
        wbs6 |   0.5458*  0.5208*  0.5091*  0.4724*  0.4863*  1.0000
             |   0.0000   0.0000   0.0000   0.0000   0.0000
             |
        wbs7 |   0.6242*  0.5624*  0.5780*  0.4985*  0.5582*  0.7516*  1.0000
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs11 |   0.5396*  0.4928*  0.4923*  0.5210*  0.5197*  0.5714*  0.5925*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs14 |   0.4532*  0.4397*  0.4382*  0.4426*  0.4388*  0.5454*  0.4681*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs21 |   0.4118*  0.4084*  0.4125*  0.3987*  0.3871*  0.5836*  0.5578*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
        wbs5 |   0.5286*  0.4962*  0.4735*  0.4370*  0.4864*  0.4935*  0.5113*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs13 |   0.5548*  0.4799*  0.5111*  0.4934*  0.5427*  0.5100*  0.5396*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs17 |   0.5248*  0.4694*  0.5115*  0.4646*  0.4958*  0.4940*  0.5123*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs19 |   0.5476*  0.5085*  0.5384*  0.5186*  0.5574*  0.5188*  0.5378*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs20 |   0.6407*  0.5552*  0.5755*  0.5575*  0.6139*  0.6026*  0.5939*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
             |    wbs11    wbs14    wbs21     wbs5    wbs13    wbs17    wbs19
-------------+---------------------------------------------------------------
       wbs11 |   1.0000
             |
             |
       wbs14 |   0.4462*  1.0000
             |   0.0000
             |
       wbs21 |   0.4601*  0.5494*  1.0000
             |   0.0000   0.0000
             |
        wbs5 |   0.4231*  0.3201*  0.3563*  1.0000
             |   0.0000   0.0000   0.0000
             |
       wbs13 |   0.5202*  0.4238*  0.4253*  0.4630*  1.0000
             |   0.0000   0.0000   0.0000   0.0000
             |
       wbs17 |   0.4878*  0.3967*  0.4295*  0.5101*  0.6532*  1.0000
             |   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs19 |   0.4864*  0.3686*  0.4463*  0.5971*  0.5770*  0.6331*  1.0000
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |
       wbs20 |   0.5606*  0.5077*  0.5242*  0.5419*  0.6112*  0.6289*  0.6593*
             |   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
             |

. ** (3) Check the Average Variance Extracted (AVE) in the measurement models
. ** type 'condisc' after running SEM to assess the AVE **
. sem (Eudaimonic -> wbs1@1 wbs2 wbs3 wbs9 wbs15)(Interpersonal -> wbs6@1 wbs7 wbs11 wbs14 wbs21)(Lifesat -> wbs5@1 wbs13 wbs17 wbs19 wbs20)(Negative -> wbs4 wbs8 wbs10), latent(Eudaimonic Interpersonal Lifesat Negative) stand

Endogenous variables
  Measurement: wbs1 wbs2 wbs3 wbs9 wbs15 wbs6 wbs7 wbs11 wbs14 wbs21 wbs5 wbs13 wbs17 wbs19 wbs20 wbs4 wbs8 wbs10

Exogenous variables
  Latent: Eudaimonic Interpersonal Lifesat Negative

[Full output omitted]

-----------------------------+----------------------------------------------------------------
cov(Eudaimonic,Interpersonal)|   .8145113    .016024    50.83   0.000     .7831049    .8459178
      cov(Eudaimonic,Lifesat)|   .8613038    .013381    64.37   0.000     .8350775      .88753
     cov(Eudaimonic,Negative)|  -.7648549   .0235044   -32.54   0.000    -.8109226   -.7187872
   cov(Interpersonal,Lifesat)|   .8369258    .015206    55.04   0.000     .8071225     .866729
  cov(Interpersonal,Negative)|  -.6530577    .028078   -23.26   0.000    -.7080895   -.5980259
        cov(Lifesat,Negative)|  -.7470133   .0242314   -30.83   0.000     -.794506   -.6995207
----------------------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(129) = 562.28         Prob > chi2 = 0.0000

. condisc

 Convergent and Discriminant Validity Assessment
------------------------------------------------------------------------------------------
 Squared correlations (SC) among latent variables
------------------------------------------------------------------------------------------
               Eudaimonic  Interperso~l       Lifesat      Negative
  Eudaimonic        1.000
Interperso~l        0.663         1.000
     Lifesat        0.742         0.700         1.000
    Negative        0.585         0.426         0.558         1.000
------------------------------------------------------------------------------------------
 Average variance extracted (AVE) by latent variables
------------------------------------------------------------------------------------------
type mismatch
r(109);

end of do-file
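As a reading aid for the diagnostics above, the standard textbook definitions can be written out as follows. The symbols are notation introduced here, not taken from the output: R_j^2 is the R-squared from regressing predictor j on the other predictors, lambda_ik is the standardised loading of item i on factor k, p_k is the number of items on factor k, and phi_kl is the correlation between latent factors k and l. Only the worked numbers come from the output in this post.

```latex
% VIF and tolerance are two views of the same quantity
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{Tolerance}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j}
\]

% Worked example from the collin output: Lifesat factor score, R_j^2 = 0.9249
\[
\mathrm{VIF}_{\mathrm{Lifesat}} = \frac{1}{1 - 0.9249} \approx 13.3
\]

% Average variance extracted, and the Fornell-Larcker comparison
\[
\mathrm{AVE}_k = \frac{1}{p_k} \sum_{i=1}^{p_k} \lambda_{ik}^2,
\qquad
\mathrm{AVE}_k > \max_{l \neq k} \phi_{kl}^2
\]
```

On the Fornell-Larcker comparison, the largest squared latent correlation reported by condisc is 0.742 (Eudaimonic with Lifesat), so both of those factors would need an AVE above 0.742 to pass. The AVEs themselves are not available here because condisc stopped with r(109).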
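Model comparisons (path coefficients and SEs in full SEM models). NB: models add one wellbeing subconstruct at a time, with the full four-factor SEM last (right-hand side).
Code: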
. estimates table eud eudint interp lifesat neg eudintlife eudintlifeneg, se

---------------------------------------------------------------------------------------------------------
    Variable |          eud       eudint       interp      lifesat          neg   eudintlife   eudintli~g
-------------+-------------------------------------------------------------------------------------------
outcomevariable~h
  Eudaimonic |    .36133661    .75222035                                           .85003712    .86322774
             |    .06790982     .1384507                                           .18293653    .19670361
Interperso~l |                -.48209672    .11152458                              -.4168596   -.42507065
             |                  .1372821     .0661019                              .15463278    .15633722
     Lifesat |                                           .25266663                -.21519739   -.16072192
             |                                           .08652639                 .23852414    .25306004
    Negative |                                                        -.22298494                .07071995
             |                                                         .07563153                 .1527385
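A table like the one above could be produced along the following lines. This is a sketch under assumptions, not the exact commands that were run: the measurement grouping is copied from the sem command earlier in this post, the stored-estimate names eud and eudint match the table, and the local macro names (m_eud, m_int) are hypothetical.

```stata
* Measurement parts reused across models (grouping from the sem command above;
* macro names are hypothetical)
local m_eud (Eudaimonic -> wbs1 wbs2 wbs3 wbs9 wbs15)
local m_int (Interpersonal -> wbs6 wbs7 wbs11 wbs14 wbs21)

* Model with one wellbeing subconstruct predicting the outcome
sem `m_eud' (outcomevariable <- Eudaimonic)
estimates store eud

* Model with two wellbeing subconstructs
sem `m_eud' `m_int' (outcomevariable <- Eudaimonic Interpersonal)
estimates store eudint

* Wald test that the two structural paths are equal
* (test works on the unstandardized coefficients)
test _b[outcomevariable:Eudaimonic] = _b[outcomevariable:Interpersonal]

* ... the remaining models follow the same pattern ...

* Structural paths and standard errors side by side
estimates table eud eudint, keep(outcomevariable:) se
```

Storing each model and tabulating only the outcome equation keeps the comparison focused on how the path coefficients and SEs move as subconstructs are added, which is the behaviour multicollinearity would be expected to disturb.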
Many thanks in advance for your time and expertise.
Kind regards,
Tania