What's the difference between doing PCA and "factor analysis using PCF"?

Graham Wright

#1

12 Nov 2021, 12:13

Hi

Stata has a standalone command "pca" for doing principal components analysis. But it also has an option "pcf" for the factor command. I am one of the many people who are confused about the difference between these two commands, since a) some people claim that they do different things and b) they clearly produce (somewhat) different results.

This question has been asked multiple times here over the years and the answers here

https://www.statalist.org/forums/for...using-pcf-vs-p

and here

https://www.stata.com/statalist/arch.../msg00321.html

seem to imply that

1) pca does "real" PCA, while factor, pcf does "factor analysis using principal component analysis for factor extraction", and these are actually different things.
2) In SPSS the only way to do "PCA" at all is to do "factor analysis using principal component analysis for factor extraction", and this was also true of Stata until the pca command was developed.

But this still leaves me with some (related) questions

1) How exactly are PCA and "factor analysis using principal component analysis for factor extraction" different and why do they give such different answers for the loadings? I haven't seen any stats textbooks that make this distinction, only Stata.

2) Given that other programs (like SPSS) don't seem to make a distinction between these two approaches, how much does it really matter?

3) Are the different loadings I get for these two commands actually providing the same exact information in different form? If so, how are they related? If not, on what basis should I decide which to use?

Anyone have any insight on these questions?

If you are curious, the differing results can be seen with the example data for "factor":

. webuse bg2
(Physician-cost data)

. factor bg2cost1-bg2cost6, pcf
(obs=568)

Factor analysis/correlation                      Number of obs    =        568
    Method: principal-component factors          Retained factors =          2
    Rotation: (unrotated)                        Number of params =         11

    --------------------------------------------------------------------------
          Factor |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
         Factor1 |      1.70622      0.30334             0.2844       0.2844
         Factor2 |      1.40288      0.49422             0.2338       0.5182
         Factor3 |      0.90865      0.18567             0.1514       0.6696
         Factor4 |      0.72298      0.05606             0.1205       0.7901
         Factor5 |      0.66692      0.07456             0.1112       0.9013
         Factor6 |      0.59236            .             0.0987       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  269.07 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness
    -------------+--------------------+--------------
        bg2cost1 |   0.3581    0.6279 |       0.4775
        bg2cost2 |  -0.4850    0.5244 |       0.4898
        bg2cost3 |  -0.5326    0.5725 |       0.3886
        bg2cost4 |  -0.4919    0.3254 |       0.6521
        bg2cost5 |   0.6238    0.3962 |       0.4539
        bg2cost6 |   0.6543    0.3780 |       0.4290
    -------------------------------------------------

. pca bg2cost1-bg2cost6

Principal components/correlation                 Number of obs    =        568
                                                 Number of comp.  =          6
                                                 Trace            =          6
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.70622      .303339             0.2844       0.2844
           Comp2 |      1.40288      .494225             0.2338       0.5182
           Comp3 |      .908652      .185673             0.1514       0.6696
           Comp4 |      .722979     .0560588             0.1205       0.7901
           Comp5 |       .66692      .074563             0.1112       0.9013
           Comp6 |      .592357            .             0.0987       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    ----------------------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6 | Unexplained
    -------------+------------------------------------------------------------+-------------
        bg2cost1 |   0.2741    0.5302   -0.2712   -0.7468   -0.0104   -0.1111 |           0
        bg2cost2 |  -0.3713    0.4428   -0.4974    0.2800    0.2996    0.5005 |           0
        bg2cost3 |  -0.4077    0.4834    0.0656    0.2466   -0.5649   -0.4646 |           0
        bg2cost4 |  -0.3766    0.2748    0.7266   -0.2213    0.4504    0.0538 |           0
        bg2cost5 |   0.4776    0.3345    0.3829    0.1950   -0.3942    0.5657 |           0
        bg2cost6 |   0.5009    0.3192    0.0144    0.4647    0.4824   -0.4453 |           0
    ----------------------------------------------------------------------------------------

.

You can see that pca and factor, pcf give identical eigenvalues but very different loadings for the first two factors/components. Of course, the substantive "story" of the factors/components (in terms of positive/negative loadings) seems similar in both analyses.

Clyde Schechter


12 Nov 2021, 12:53

They do the same thing. The difference you see in the loadings is an artifact of a different choice of normalization. The eigenvectors you see after -pca- are normalized to unit length, whereas the loadings you get from -factor, pcf- are normalized to the eigenvalue. That is, each column's sum of squares equals its eigenvalue, which amounts to multiplying each unit-length eigenvector by the square root of its eigenvalue. If, after -pca-, you run -estat loadings, cnorm(eigen)-, which normalizes to the eigenvalue, you get the same loadings that come out of -factor, pcf-:

Code:

. estat loadings, cnorm(eigen)

Principal component loadings (unrotated)
    component normalization: sum of squares(column) = eigenvalue

    --------------------------------------------------------------------------
                 |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6
    -------------+------------------------------------------------------------
        bg2cost1 |    .3581     .6279    -.2586     -.635  -.008511    -.0855
        bg2cost2 |    -.485     .5244    -.4741     .2381     .2447     .3852
        bg2cost3 |   -.5326     .5725    .06257     .2097    -.4613    -.3576
        bg2cost4 |   -.4919     .3254     .6926    -.1882     .3679    .04142
        bg2cost5 |    .6238     .3962      .365     .1658    -.3219     .4354
        bg2cost6 |    .6543      .378    .01368     .3951      .394    -.3428
    --------------------------------------------------------------------------
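The relationship Clyde describes can be checked by hand. The sketch below is plain arithmetic in Python with NumPy, not Stata code. It multiplies the unit-length Comp1 and Comp2 eigenvectors from the -pca- output by the square roots of their eigenvalues and compares the result with the Factor1 and Factor2 loadings from -factor, pcf-. The printed values are rounded to four decimals, so agreement is only expected to about three decimal places.

```python
import numpy as np

# Eigenvalues of the first two components (identical in -pca- and -factor, pcf-).
eigenvalues = np.array([1.70622, 1.40288])

# Unit-length eigenvectors for Comp1 and Comp2, copied from the -pca- output.
eigenvectors = np.array([
    [ 0.2741, 0.5302],
    [-0.3713, 0.4428],
    [-0.4077, 0.4834],
    [-0.3766, 0.2748],
    [ 0.4776, 0.3345],
    [ 0.5009, 0.3192],
])

# Factor1 and Factor2 loadings, copied from the -factor, pcf- output.
pcf_loadings = np.array([
    [ 0.3581, 0.6279],
    [-0.4850, 0.5244],
    [-0.5326, 0.5725],
    [-0.4919, 0.3254],
    [ 0.6238, 0.3962],
    [ 0.6543, 0.3780],
])

# Scale each eigenvector (column) by the square root of its eigenvalue.
# Broadcasting multiplies column j by sqrt(eigenvalues[j]).
rescaled = eigenvectors * np.sqrt(eigenvalues)
print(np.round(rescaled, 4))

# Largest discrepancy from the -factor, pcf- loadings (nonzero only because of rounding).
max_diff = np.abs(rescaled - pcf_loadings).max()
print(f"max difference: {max_diff:.5f}")

# Column sums of squares of the rescaled loadings recover the eigenvalues,
# matching the cnorm(eigen) description "sum of squares(column) = eigenvalue".
print(np.round((rescaled ** 2).sum(axis=0), 3))
```

Running the same rescaling on Comp3 to Comp6 should likewise reproduce the remaining columns of the -estat loadings, cnorm(eigen)- table.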


William Lisowski

#3

12 Nov 2021, 18:17

How exactly are PCA and "factor analysis using principal component analysis for factor extraction" different and why do they give such different answers for the loadings? I haven't seen any stats textbooks that make this distinction, only Stata.

Regardless of the textbooks you have available, there are brief discussions comparing and contrasting Principal Components Analysis with Factor Analysis in the Wikipedia article on Principal Component Analysis at

https://en.wikipedia.org/wiki/Princi...actor_analysis

and in the Wikipedia Article on Factor Analysis at

https://en.wikipedia.org/wiki/Factor..._analysis_(PCA)

While the mechanics are the same, the philosophy behind PCA and Factor Analysis differs (and the roots of statistics are in philosophy, the study of knowledge and how we know what we know, rather than in mathematics), and that leads to different ways of thinking about their results.
Graham Wright

#4

14 Nov 2021, 17:54

William, I am aware of the conceptual and mathematical distinctions between "factor analysis" (in general) and "PCA." My question was whether there is a difference between "PCA" and "factor analysis using PCA for factor extraction." Previous comments on Statalist implied that there is some sort of difference between these two procedures. That didn't make sense to me mathematically, because I always thought PCA could be treated as just a special case of EFA in which the communalities are set to 1. Clyde's answer seems to confirm that understanding, at least for these two commands. They do exactly the same thing and produce exactly the same results, and they just display the results differently because they use different normalizations. This all makes sense to me now. It might still be good for Stata's documentation to state somewhere what the relationship between these two commands is, because I couldn't find any such reference in the documentation for either. In any case, thanks for the help in understanding this.

Last edited by Graham Wright; 14 Nov 2021, 17:58.
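Graham's reading of -factor, pcf- as the case where the communalities are set to 1 fits the Uniqueness column in the first post. Each value appears to be 1 minus the variable's communality on the two retained factors, that is, 1 minus the sum of its squared Factor1 and Factor2 loadings. The following minimal Python/NumPy check assumes that formula and uses the rounded bg2 output. The variable names are illustrative, not Stata's.

```python
import numpy as np

# Factor1/Factor2 loadings from -factor bg2cost1-bg2cost6, pcf-.
loadings = np.array([
    [ 0.3581, 0.6279],
    [-0.4850, 0.5244],
    [-0.5326, 0.5725],
    [-0.4919, 0.3254],
    [ 0.6238, 0.3962],
    [ 0.6543, 0.3780],
])

# Uniqueness column reported next to those loadings.
reported_uniqueness = np.array([0.4775, 0.4898, 0.3886, 0.6521, 0.4539, 0.4290])

# Communality: the part of each standardized variable's variance (which is 1)
# reproduced by the retained factors, i.e. the row sum of squared loadings.
communality = (loadings ** 2).sum(axis=1)

# With each variable's total variance fixed at 1, uniqueness is the remainder.
uniqueness = 1.0 - communality

names = [f"bg2cost{i}" for i in range(1, 7)]
for name, h2, u, r in zip(names, communality, uniqueness, reported_uniqueness):
    print(f"{name}: communality={h2:.4f} uniqueness={u:.4f} reported={r:.4f}")
```

With all six components retained, the communalities would reach 1. This is consistent with the Unexplained column of 0 in the -pca- output.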
Stavros Vlachos

#5

20 Mar 2024, 03:04

Hello all. Clyde, thanks for your post. It is very useful and clears up a lot of the confusion around the two commands. Further to this, I was wondering how to code a varimax (or promax) rotation of the loadings normalised to the eigenvalues in Stata. Any help will be appreciated!
Clyde Schechter

#6

20 Mar 2024, 11:28

-help rotate-
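-help rotate- is the authoritative reference for the Stata syntax. One plausible route, not verified here, is to run -factor bg2cost1-bg2cost6, pcf- and then -rotate, varimax- or -rotate, promax-, so that the rotation acts on the eigenvalue-normalized loadings.

As background on what varimax does, the sketch below applies a raw varimax rotation (no Kaiser normalization) to the Factor1/Factor2 loadings from the first post. It uses a common SVD-based iteration written in Python/NumPy, not Stata's implementation, so its numbers need not match Stata's output exactly. The helper names varimax and varimax_criterion are hypothetical. Promax, an oblique rotation that starts from a varimax solution, is not shown.

```python
import numpy as np


def varimax_criterion(loadings):
    """Raw varimax criterion: summed column variances of the squared loadings."""
    squared = loadings ** 2
    return float(((squared ** 2).mean(axis=0) - squared.mean(axis=0) ** 2).sum())


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation (gamma = 1, no Kaiser row normalization).

    Returns (rotated loadings, orthogonal rotation matrix).
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient (up to a constant factor) of the varimax criterion
        # with respect to the rotation, evaluated at the current rotation.
        gradient = loadings.T @ (rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p)
        # The orthogonal matrix closest to the gradient becomes the new rotation.
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        # Stop once the criterion no longer grows by more than a relative tol.
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return loadings @ rotation, rotation


# Eigenvalue-normalized loadings from -factor bg2cost1-bg2cost6, pcf-.
pcf_loadings = np.array([
    [ 0.3581, 0.6279],
    [-0.4850, 0.5244],
    [-0.5326, 0.5725],
    [-0.4919, 0.3254],
    [ 0.6238, 0.3962],
    [ 0.6543, 0.3780],
])

rotated, rotation = varimax(pcf_loadings)
print(np.round(rotated, 4))
print("criterion before:", round(varimax_criterion(pcf_loadings), 4))
print("criterion after: ", round(varimax_criterion(rotated), 4))
```

Because the rotation is orthogonal, each variable's communality, and hence its uniqueness, is unchanged. Only the split of variance between the two factors moves, which is what makes the rotated loadings easier to interpret.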