CFA and predict

fran

Join Date: Apr 2014

Posts: 39
#1

13 Jun 2014, 09:57

Hi Statalisters,

I am interested in finding underlying factors in my data, so I split my sample into two subsamples. I conducted an exploratory factor analysis (EFA) on the first subsample and a confirmatory factor analysis (CFA) on the second to check that the EFA solution fits the data well. I am using Stata 12. I have a few questions I hope you can help with.

1. The EFA indicates that there are 3 factors. One factor has 3 scale scores loading onto it, but one of them also loads onto a different factor. On theoretical grounds I think that this scale should load onto the third factor, so I used the CFA to check whether that is a sensible solution. Is this OK?

2. The CFA with that scale on the third factor offers a good fit (Chi2(41) = 83.9, p < .001; RMSEA = .082, 95% CI .06 to .11; CFI = .94; TLI = .92). I would now like to obtain the factor scores. After a standard factor analysis I could simply follow the -factor- command with -predict-. Is there an equivalent command I can use after -sem-?

I suppose I could still use -factor- followed by -predict-, but then the factor scores would be based on a solution in which that one scale loads on both factors 1 and 3. Is there a way around this?

Thanks,
Fran
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#2

13 Jun 2014, 10:36

After doing the CFA in -sem-, you can get factor scores with -predict- and the -latent- option. See -help sem postestimation-.
fran

Join Date: Apr 2014

Posts: 39
#3

16 Jun 2014, 03:10

Thanks Clyde,

I ran the commands as you suggested, with the -latent- option:

sem(F1 -> bQPR_intra bQPR_inter bHH_tot bMHCS_tot bWEM_tot ) ///
(F2 -> bBPRS_tot bCANSAS_unmet bMANSA_tot) ///
(F3 -> bGAF bHONOS_tot bCANSASw_unmet), stand

predict F1 F2 F3, latent

I noticed that the SDs of the predicted factors are greater than 1.

I thought extracted factors always had a mean of 0 and an SD of 1. Is that not the case?

Thanks again,
Fran
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#4

19 Jun 2014, 21:55

No, it's not true that extracted factors have mean 0 and SD 1. You can impose those restrictions in your -sem- command if you want to, but by default Stata's -sem- identifies the CFA model by fixing the loading of the first variable on each factor at 1. Each factor is then measured on the scale of its first indicator, so its variance is whatever the data imply for that indicator's common part, not 1.
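To make the scaling point concrete, here is a minimal Python sketch for a single factor with three indicators. The loadings (1.0, 0.8, 1.2), the factor variance 2.5 and the unit unique variances are invented for illustration, and the scores assume the regression (Thomson) scoring rule F_hat = phi * lam' Sigma^{-1} (x - mu); check -help sem postestimation- for the exact method -predict, latent- uses. The sketch compares the -sem- default (first loading fixed at 1) with fixing the factor variance at 1 instead.

```python
import math

# Illustrative one-factor model (numbers invented, not from the thread):
#   x = lam * F + e,  Var(F) = phi,  Var(e) = diag(theta)
# Scores assume the regression rule F_hat = phi * lam' Sigma^{-1} (x - mu).


def implied_cov(lam, phi, theta):
    """Model-implied covariance Sigma = phi * lam * lam' + diag(theta)."""
    k = len(lam)
    return [[phi * lam[i] * lam[j] + (theta[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]


def regression_score_variance(lam, phi, theta):
    """Var(F_hat) for regression-method scores in a one-factor model.

    With s = sum(lam_i^2 / theta_i), the Sherman-Morrison formula gives
    lam' Sigma^{-1} lam = s / (1 + phi * s), hence
    Var(F_hat) = phi^2 * s / (1 + phi * s).
    """
    s = sum(l * l / t for l, t in zip(lam, theta))
    return phi ** 2 * s / (1.0 + phi * s)


theta = [1.0, 1.0, 1.0]

# Parameterization A (the -sem- default): first loading fixed at 1.
lam_a, phi_a = [1.0, 0.8, 1.2], 2.5

# Parameterization B: factor variance fixed at 1, loadings rescaled by sqrt(phi_a).
c = math.sqrt(phi_a)
lam_b, phi_b = [l * c for l in lam_a], 1.0

sigma_a = implied_cov(lam_a, phi_a, theta)
sigma_b = implied_cov(lam_b, phi_b, theta)

sd_a = math.sqrt(regression_score_variance(lam_a, phi_a, theta))
sd_b = math.sqrt(regression_score_variance(lam_b, phi_b, theta))

print("same implied covariance:", all(
    math.isclose(a, b) for ra, rb in zip(sigma_a, sigma_b) for a, b in zip(ra, rb)))
print(f"score SD, first loading fixed at 1:   {sd_a:.3f}")
print(f"score SD, factor variance fixed at 1: {sd_b:.3f}")
```

Both parameterizations imply the same covariance matrix, so they fit the data identically; the scores differ only by a factor of sqrt(2.5). Even with the factor variance fixed at 1, regression-method scores have an SD below 1 in this model, because they are shrunk toward the mean.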
KateG

Join Date: Aug 2014

Posts: 2
#5

07 Aug 2014, 11:13

Hello, I would really appreciate an answer to a follow-up question. I see a lot of publications (many examples can be found in AMJ) that run a CFA to assess the factor structure and then conduct the main analysis with OLS based on factor scores. What would be the motivation for doing that? Would the procedure recommended above give the factor scores in this case, or should something else be applied? Thanks, Kate
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#6

07 Aug 2014, 11:51

I think this is actually a very complicated question. I'll take a stab at it, but I predict that others will offer you conflicting views, and that all of us will be partly "right."

In some situations, the data being factor analyzed are heterogeneous and high-dimensional and the factor analysis is being done simply to produce a smaller number of variables for use in analysis, each of them having some content coherence. One can debate whether (when?) principal components analysis is the better approach to this problem. Suffice it to say that factor analysis is often used for that purpose. Once the smaller set of variables has been created, they are then used in regression analyses as predictors. Of course, conceptually it is wrong to use the predicted factor scores as independent variables in OLS regressions because doing so ignores the fact that the factor scores have errors associated with them.* Full structural equations modeling is more appropriate.

But OLS regression analysis is widely recognized, whereas SEM is unfamiliar in many disciplines. In somatic medicine (as opposed to psychiatry), for example, it is unknown to many readers and reviewers, and getting a paper based on SEM published in this area is extremely difficult, even though factor analysis is widely accepted. I conjecture that most of those readers and reviewers don't really understand factor analysis and don't realize that it is a special case of SEM. An alternative conjecture: many of those readers and reviewers don't realize that factor analysis is different from principal components analysis.

In short, I think what you have observed reflects confusion, in some disciplines, between factor analysis and principal components analysis, and a general low awareness of SEM.

By the way, what is AJM? Are you referring to the American Journal of Medicine, the American Journal of Management, the American Journal of Mathematics, or something else altogether?

*Of course, many variables that have no origins in factor analysis are also affected by error and are nevertheless used as predictors in OLS regression. We customarily turn a blind eye to such things.
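The footnote's point is the classic errors-in-variables problem. A minimal Python simulation (all numbers invented: true predictor and measurement error both standard normal, true slope 2) shows that regressing on an error-prone version of the predictor shrinks the OLS slope by the predictor's reliability, here Var(xi) / (Var(xi) + Var(e)) = 0.5. It illustrates error-prone predictors in general, not any particular factor-scoring method.

```python
import random

# Illustrative simulation (not from the thread):
#   xi ~ N(0, 1) true predictor, x_obs = xi + e with e ~ N(0, 1),
#   y = 2 * xi + u with u ~ N(0, 1).
# Expected OLS slope of y on x_obs: 2 * 1 / (1 + 1) = 1.


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(12345)
n, beta = 100_000, 2.0
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_obs = [v + rng.gauss(0.0, 1.0) for v in xi]  # predictor measured with error
y = [beta * v + rng.gauss(0.0, 1.0) for v in xi]

b_true = ols_slope(xi, y)      # should be close to 2.0
b_noisy = ols_slope(x_obs, y)  # should be close to 2.0 * 0.5 = 1.0
print(f"slope on true predictor:  {b_true:.3f}")
print(f"slope on noisy predictor: {b_noisy:.3f}")
```

The slope on the noisy predictor comes out at roughly half the true slope, and the OLS output itself gives no sign that anything is wrong.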

Last edited by Clyde Schechter; 07 Aug 2014, 11:55. Reason: Clarify a sentence.
KateG

Join Date: Aug 2014

Posts: 2
#7

07 Aug 2014, 13:41

Dear Clyde,

Thanks a lot for your response. What you say about differences between disciplines is very interesting. AMJ is actually the Academy of Management Journal. I was referring to the management field, where SEM is very common.

The procedure is usually as follows: take measures developed in prior studies (so there is no real need to explore the factor structure with PCA; the aim is rather to confirm an n-factor CFA model), conduct a CFA, and then run OLS on the factor scores with all kinds of add-ons (interactions, many controls, etc.). Could this be due to the relative newness of, e.g., moderation analysis in SEM (given the usually rather small sample sizes)?

I was puzzled to come across so many papers using the same scheme (CFA with previously established measures -> OLS) when none of them really explains where the factor scores come from (i.e., how they were constructed). As far as I know, among the available SEM software only Mplus and Stata offer some kind of predicted scores after a CFA. But many of the papers I came across used LISREL or other software that does not have this function. So I was wondering: what formula might they use to calculate the factor scores?
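For reference, two textbook scoring rules for a CFA with loading matrix Λ, factor covariance matrix Φ, diagonal matrix of unique variances Θ, indicator means μ and observed indicator vector x are the regression (Thomson) method and Bartlett's method. Which rule, if any, the papers in question used is not stated, so this is only a sketch of the usual candidates:

```latex
\begin{aligned}
\Sigma &= \Lambda \Phi \Lambda^{\top} + \Theta \\
\hat{\mathbf{f}}_{\text{reg}} &= \Phi \Lambda^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \\
\hat{\mathbf{f}}_{\text{Bartlett}} &= \left( \Lambda^{\top} \Theta^{-1} \Lambda \right)^{-1} \Lambda^{\top} \Theta^{-1} (\mathbf{x} - \boldsymbol{\mu})
\end{aligned}
```

Regression scores are the best linear predictor of the factors but are shrunk toward zero; Bartlett scores are conditionally unbiased (their expectation given the true factors equals those factors) but have larger variance. Either can be computed by hand from the parameter estimates any SEM program reports, which may be how papers using software without a scoring command obtained them. Many papers also simply average or sum the items (unit-weighted scores).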

Regards,

Kate