CFA Goodness of fit

Mohammed Kasbar

Join Date: Apr 2017

Posts: 56
#1

04 Jun 2018, 04:50

Dear Statalist respected users,

I am trying to estimate a latent variable from 5 observed variables via CFA. The syntax was as follows:

sem (AC -> totalassets, ) (AC -> lev, ) (AC -> FCF, ) (AC -> logbm, ) (AC -> industries, ), method(adf) latent(AC ) cov( e.FCF*e.lev e.logbm*e.FCF) nocapslatent

The factor loadings are not very good:
0.41
0.20
0.11
0.70
0.28

The goodness-of-fit tests are excellent:

chi2_ms(3) = 2.775
P>chi2 = 0.428

RMSEA 0.000
CFI and TLI are 1.000 and 1.005

SRMR = 0.014

Can I consider my model as "good" and continue?

I tried deleting the variables with small factor loadings, but this led to worse goodness-of-fit results.

Your recommendation, please.
Thanks a lot in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

04 Jun 2018, 09:03

What do you mean when you say the factor loadings are not very good? Not very good in what sense? By what criterion? They are what they are. Did you have some reason to expect different results--perhaps an earlier study?

I wonder if you are thinking of the common practice in exploratory factor analysis where you are often looking to reduce the dimensionality of a data set and you tend to retain variables with loadings that exceed 0.4 (or some other such stipulated threshold). That does not apply here, and the word "loading" has different meanings in these two contexts. In exploratory factor analysis (e.g. the kind of thing you get from the -factor- command) that loading is the correlation between the factor and the variable. In particular, it is not sensitive to the scale of the variables.

By contrast, in CFA, the "loading" is a regression coefficient, and its magnitude depends on the scale of the variables. So the application of some arbitrary threshold like 0.4 makes no sense, and could, in any case, be gamed by changing the units.
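To make the distinction concrete, here is a minimal sketch using the sem_hcfa1 example data that is also used later in this thread (not the original poster's data). It fits a one-factor CFA, replays the same fit with standardized coefficients, and then requests the fit statistics. The unstandardized loadings depend on the units of the indicators, while the standardized ones do not, so any rule of thumb about loading size would apply, if at all, only to the second table.

```stata
* Example data from the Stata SEM documentation (summary statistics data)
use http://www.stata-press.com/data/r15/sem_hcfa1, clear

* Unstandardized solution: the first loading is constrained to 1 and the
* other loadings are regression coefficients in the units of the indicators
sem (Phys -> phyab1 phyab2 phyab3 phyab4)

* Replay the same fitted model with standardized coefficients
sem, standardized

* Goodness-of-fit statistics (chi-squared, RMSEA, CFI, TLI, SRMR, ...)
estat gof, stats(all)
```

The fit statistics are the same whichever way the coefficients are displayed; only the reported loadings change.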
Mohammed Kasbar

Join Date: Apr 2017

Posts: 56
#3

05 Jun 2018, 04:37

Clyde Schechter
Thanks a lot for your reply.

Yes, as far as I understood from my reading, one measure of a latent variable's reliability is to retain only the variables whose coefficients (factor loadings) are greater than 0.40. But now I understand your point, which makes sense. Thanks a lot for your contribution.

I have another question if you allow me.

Is it possible to predict and extract latent variables and then reuse them as observed variables in path analysis?
For example, I have two latent variables: one works as an independent variable and the other is a mediator. I predicted them first using CFA, then I used them in the path analysis. Is this statistically correct, or do I need to estimate the latent variables and test the mediation effect in one run?
Thanks a lot in advance.

Kind regards,
Mohammed
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

05 Jun 2018, 10:05

It is possible to get estimated values of the latent variables and then use those in later analyses as if they were measured variables. See -help sem_predict- for details.

But it is discouraged. The problem is that it, in effect, treats the values of the latent variables as if they were measured without error, which is far from the truth. It is generally better to simply extend the -sem- model to include the latent variables in additional structural equations.
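The two routes can be sketched as follows. This is only an illustration under stated assumptions: the indicator names come from post #1, while outcome and mediator are hypothetical observed variables that do not appear in the thread (in the original poster's problem the mediator is itself latent, which would add a second measurement block).

```stata
* Hypothetical names: outcome and mediator are NOT from the thread.

* Two-step approach: possible, but discouraged because AC_hat is then
* treated as if it had been measured without error
sem (AC -> totalassets lev FCF logbm industries), latent(AC) nocapslatent
predict AC_hat, latent(AC)
regress outcome AC_hat

* One-step approach: measurement and structural parts fitted together
sem (AC -> totalassets lev FCF logbm industries) ///
    (mediator <- AC)                             ///
    (outcome <- mediator AC),                    ///
    latent(AC) nocapslatent

* Direct, indirect, and total effects from the one-step model
estat teffects
```

In the one-step model the uncertainty in AC is carried through to the structural coefficients, which is the point of the advice above.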
Mohammed Kasbar

Join Date: Apr 2017

Posts: 56
#5

06 Jun 2018, 02:59

Clyde Schechter
Thanks a lot for your reply. Much appreciated.
I tried to estimate the latent variables together with the structural equations in one run, but the model did not converge. I made sure that I have enough variance in each variable, but still the model doesn't converge.
Is it possible to have the following situation:
- The latent variables, individually, pass all the goodness-of-fit tests, but when the latent variables are estimated together with the structural equations the model doesn't converge?

Thanks a lot in advance.

Kind regards,
Mohammed
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

06 Jun 2018, 11:53

SEM can be tricky due to the identification issues. It is possible to use the estimates from the original measurement-only model as start values, which might aid convergence if the only problem was Stata getting confused about start values, e.g.

Code:

sem (AC -> totalassets lev FCF logbm industries), method(adf) latent(AC) cov( e.FCF*e.lev e.logbm*e.FCF) nocapslatent
matrix b = e(b)
sem (AC -> totalassets lev FCF logbm industries) ///
    (AC -> `other_DVs'), ///
    method(adf) latent(AC) cov( e.FCF*e.lev e.logbm*e.FCF) nocapslatent from(b)

The second line saves the estimates of all parameters in the matrix b. The from(b) option in the last command instructs Stata to start fitting the new model from those saved values. Where Stata can't find a starting value for a parameter (e.g. if it belongs to a variable you just added), I forget what it does as its default, but it will independently choose start values for those additional parameters.

As usual, it helps if you post your full code and any relevant output in code delimiters - they're easy to read and they can be cut and pasted directly into Stata. We might be able to roughly gauge if the model isn't identified. If you post the first or last, say, 20 repetitions of the iteration log, we can also tell roughly what's going on and maybe make a recommendation. Also, it will probably help you to read SEM intro 12; if you have an infinite iteration log that's repeatedly not concave and the log-likelihood isn't changing much, that's often a sign that the model is not identified.
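As a rough sketch of how to get a short iteration log to post, one option is to cap the number of iterations with the iterate() maximize option, so that a run that would otherwise iterate indefinitely stops after a fixed number of steps. The variable outcome below is hypothetical and stands in for whatever structural part is being added; the cap of 20 is arbitrary.

```stata
* Sketch only: outcome is a hypothetical dependent variable.
* iterate(20) stops the optimizer after at most 20 iterations, leaving
* a log short enough to paste into a post.
sem (AC -> totalassets lev FCF logbm industries) ///
    (outcome <- AC),                             ///
    method(adf) latent(AC) nocapslatent iterate(20)
```

If the log keeps reporting "not concave" with almost no change in the criterion, that is the pattern described above as a sign of a possibly unidentified model.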

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Mohammed Kasbar

Join Date: Apr 2017

Posts: 56
#7

07 Jun 2018, 02:54

Weiwen Ng
Thanks a lot for your reply.
I allowed some covariances and the model successfully converged. However, the GOF is still poor.
I will apply your recommendation by using the estimated parameters of the latent variables as starting values.
Thanks a lot for your cooperation. Much appreciated

Mohammed
Mohammed Kasbar

Join Date: Apr 2017

Posts: 56
#8

12 Jun 2018, 03:09

Weiwen Ng
I am trying to apply the code you recommended to make Stata use the previous estimates as starting values in another model, but it does not run. I always receive this error message:

initial vector: extra parameter c1 found
specify skip option if necessary

This is the command syntax I ran:

sem (AC -> ta, ) (AC -> lev, ) (AC -> FCF, ) (AC -> industries, ), method(adf) latent(AC ) cov( e.FCF*e.lev) nocapslatent ///
matrix c=e(c) ///
sem (CG -> BrdIn, ) (CG -> NCIn, ) (CG -> EDComp, ) (CG -> NEDCom, ) (CG -> FemaleNED, ) (CG -> Foreign, ) (CG -> wac_indfb, ) (AC -> ta, ) (AC -> lev, ) (AC -> FCF, ) (AC -> industries, ), covstruct(_lexogenous, diagonal) method(adf) latent(CG AC ) cov( e.NCIn*e.BrdIn e.EDComp*e.BrdIn e.NEDCom*e.EDComp e.Foreign*e.EDComp e.Foreign*e.FemaleNED e.FCF*e.lev) nocapslatent from(c)

Is it possible to estimate the 2 latent variables AC and CG individually, save the estimated parameters, and use them later when I estimate my system of equations to test for the mediation effect of interest?

Thank you
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#9

12 Jun 2018, 08:32

What you describe should be possible. Inspecting your syntax, I can't see which parameter would be the extra one. Stata was merely telling you that it can't find a match for the estimated parameter; this could happen if you had, for example, an extra indicator for the variable AC. You don't seem to. I'd merely specify the -skip- option. That option just skips any parameters in the start matrices that aren't found in the current command.

You can save multiple matrices of start values like this:

Code:

sem (AC -> ta, ) (AC -> lev, ) (AC -> FCF, ) (AC -> industries, ), method(adf) latent(AC ) cov( e.FCF*e.lev) nocapslatent
matrix c = e(b)
sem (CG -> BrdIn, ) (CG -> NCIn, ) (CG -> EDComp, ) (CG -> NEDCom, ) (CG -> FemaleNED, ) (CG -> Foreign, ) (CG -> wac_indfb, ), method(adf) latent(CG) nocapslatent cov(e.NCIn*e.BrdIn e.EDComp*e.BrdIn e.NEDCom*e.EDComp e.Foreign*e.EDComp e.Foreign*e.FemaleNED)
matrix d = e(b)
sem (CG -> BrdIn, ) (CG -> NCIn, ) (CG -> EDComp, ) (CG -> NEDCom, ) (CG -> FemaleNED, ) (CG -> Foreign, ) (CG -> wac_indfb, ) (AC -> ta, ) (AC -> lev, ) (AC -> FCF, ) (AC -> industries, ), covstruct(_lexogenous, diagonal) method(adf) latent(CG AC ) cov( e.NCIn*e.BrdIn e.EDComp*e.BrdIn e.NEDCom*e.EDComp e.Foreign*e.EDComp e.Foreign*e.FemaleNED e.FCF*e.lev) nocapslatent from(c d) skip

Do note the recommendation to present code and results in the code delimiters (see my signature for how). It's much easier to read!

Hassen Ali

Join Date: May 2018

Posts: 39
#10

12 Jun 2018, 08:57

Thank you all, I have learned a lot from your posts.
Cheers, Hassen
Mohammed Kasbar

Join Date: Apr 2017

Posts: 56
#11

13 Jun 2018, 05:22

Weiwen Ng
Thanks a lot for your contribution. Your recommendation makes sense but Stata is still showing the same error message!

I saved the parameters of the estimation of each latent variable then I used the option from and I received the same previous error message

initial vector: extra parameter c1 found
specify skip option if necessary

Then, I used the skip option but Stata showed me a different error message stating that skip option is not allowed.

Thanks a lot for your contribution again.
Much appreciated.

Weiwen Ng

Join Date: Jun 2015
Posts: 1241

#12

13 Jun 2018, 11:53

First, there was an error in my syntax: skip is a suboption of the from() option. So, this syntax is correct:

Code:

 
sem (CG -> BrdIn, ) (CG -> NCIn, ) (CG -> EDComp, ) (CG -> NEDCom, ) (CG -> FemaleNED, ) (CG -> Foreign, ) (CG -> wac_indfb, ) (AC -> ta, ) (AC -> lev, ) (AC -> FCF, ) (AC -> industries, ), covstruct(_lexogenous, diagonal) method(adf) latent(CG AC ) cov( e.NCIn*e.BrdIn e.EDComp*e.BrdIn e.NEDCom*e.EDComp e.Foreign*e.EDComp e.Foreign*e.FemaleNED e.FCF*e.lev) nocapslatent from(c d, skip)

Assuming you reported the exact syntax you used, I still can't figure out what the parameter c1 would refer to. If you like, you can list the matrices you saved, e.g.:

Code:

matrix list c
matrix list d

That will give you a potentially long list of coefficients that look cryptically named. For example (note the extra error covariance parameter, cov(e.appear1*e.appear2), in the second sem command):

Code:

use http://www.stata-press.com/data/r15/sem_hcfa1
sem (Phys -> phyab1 phyab2 phyab3 phyab4)
matrix phys = e(b)
sem (Appear -> appear1 appear2 appear3 appear4), cov(e.appear1*e.appear2)
mat appear = e(b)

sem (Phys -> phyab1 phyab2 phyab3 phyab4) (Appear -> appear1 appear2 appear3 appear4), from(phys appear)
initial vector: extra parameter /cov(e.appear1,e.appear2) found
specify skip option if necessary

sem (Phys -> phyab1 phyab2 phyab3 phyab4) (Appear -> appear1 appear2 appear3 appear4), from(phys appear, skip)

The last line is the correct line. The second-last line doesn't run and produces an error message, but the parameter name is informative, as you can see if you inspect the matrix involved:

Code:

mat list appear

appear[1,14]
          appear1:       appear1:       appear2:       appear2:       appear3:
                                                                              
           Appear          _cons         Appear          _cons         Appear
y1              1           7.41      1.0491581              7      1.2595212

          appear3:       appear4:       appear4:             /:             /:
                                                                              
            _cons         Appear          _cons  var(e.appe~1)  var(e.appe~2)
y1           7.17      1.0977995            7.4      2.7366053      3.7940695

                /:             /:             /:             /:
                                                 cov(e.appe~1,
    var(e.appe~3)  var(e.appe~4)    var(Appear)     e.appear2)
y1      1.8153746      2.1791344      2.7171796      1.5290572

So, your list of parameters is going to be longer. You should be able to re-specify the sem command to skip the parameter (apologies for my error there; I've never used the skip option!). If you need to inspect things to drop the unnecessary parameter, this is what you'll need to do. I suspect you may have run your program with an extra variable inserted somewhere, but I'm not certain (usually, a stray parameter would be prefixed by the equation name, or, if it's a covariance parameter, it's clearly marked as such). Hope this syntax works.
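One possible source of a parameter literally named c1, offered here as an assumption and not something confirmed in the thread: the command in #8 saves the starting values with matrix c=e(c) rather than e(b). The sketch below, on the same example data, contrasts the two. The matrix names good and bad are made up for the illustration, and the comments describe what I would expect to see rather than verified output.

```stata
use http://www.stata-press.com/data/r15/sem_hcfa1, clear
sem (Phys -> phyab1 phyab2 phyab3 phyab4)

* e(b) is the coefficient vector; its column names are the parameter names
matrix good = e(b)
matrix list good

* Assumption: sem stores no result called e(c), so this would not copy the
* estimates. Instead Stata would be expected to create a 1 x 1 matrix
* holding a missing value, with the default column name c1.
matrix bad = e(c)
matrix list bad

* Check which results sem actually stores
ereturn list
```

If matrix list shows a single column called c1, passing that matrix to from() would plausibly trigger "extra parameter c1 found", and saving e(b) instead would be the fix.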


Mohammed Kasbar

Join Date: Apr 2017

Posts: 56
#13

14 Jun 2018, 04:03

Weiwen Ng
Thanks a lot indeed for the time and effort you put forward.
I am at a conference at the moment, I will try your recommendation and hopefully, it will work this time. Thanks a lot again.
John Sloan

Join Date: Mar 2019

Posts: 2
#14

21 Mar 2019, 08:53

Hello Statalist members:

A colleague and I are using Version 15 to run zinbcv regression models using the following syntax:

zinbcv init_count stable_dem1 stable_aut1 redem1 redem1_pautdur1 redem1_pdemdur1 growth prop_demsregion1 riots1, inflate(stable_dem1 stable_aut1 redem1 redem1_pautdur1 redem1_pdemdur1 growth prop_demsregion1 riots1) vuong nolog

When we execute the command, we receive the following error:

initial vector: extra parameter lnalpha:_cons found
specify skip option if necessary

Would appreciate any suggestions as to what this error means and how to address it.

Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#15

21 Mar 2019, 12:24

John Sloan
Your post in #14 has no obvious connection to the topic of this thread. It is important to keep threads on topic because many people come to search the Forum looking for answers to questions that may have already been answered. If a person comes here searching for advice about goodness of fit for CFA, they will waste their time reading your post. If a person comes looking for advice about using -zinbcv-, they will not find it!

Also -zinbcv- is not part of official Stata. So in posting a question about it, it is helpful to explain what it is and where it comes from.

So please repost this as a new topic. Thank you.