
  • What's the difference between doing PCA and "factor analysis using PCF"?

    Hi

    Stata has a standalone command "pca" for doing principal component analysis. But it also has an option "pcf" for the "factor" command. I am one of the many people who are confused about the difference between these two commands, since a) some people claim that they do different things and b) they clearly produce (somewhat) different results.

    This question has been asked multiple times here over the years and the answers here

    https://www.statalist.org/forums/for...using-pcf-vs-p

    and here

    https://www.stata.com/statalist/arch.../msg00321.html

    seem to imply that

    1) PCA is doing "real" PCA while "factor, pcf" is doing "factor analysis using principal component analysis for factor extraction", and these are actually different things
    2) In SPSS the only way to do "PCA" at all is to do "factor analysis using principal component analysis for factor extraction", and this used to be true of Stata as well until the development of the pca command.

    But this still leaves me with some (related) questions

    1) How exactly are PCA and "factor analysis using principal component analysis for factor extraction" different and why do they give such different answers for the loadings? I haven't seen any stats textbooks that make this distinction, only Stata.

    2) Given that other programs (like SPSS) don't seem to make a distinction between these two approaches, how much does it really matter?

    3) Are the different loadings I get for these two commands actually providing the same exact information in different form? If so, how are they related? If not, on what basis should I decide which to use?

    Anyone have any insight on these questions?


    If you are curious, the differing results can be seen with the example data for "factor":


    . webuse bg2
    (Physician-cost data)

    . factor bg2cost1-bg2cost6, pcf
    (obs=568)

    Factor analysis/correlation                 Number of obs    =      568
        Method: principal-component factors     Retained factors =        2
        Rotation: (unrotated)                   Number of params =       11

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.70622      0.30334            0.2844       0.2844
        Factor2  |      1.40288      0.49422            0.2338       0.5182
        Factor3  |      0.90865      0.18567            0.1514       0.6696
        Factor4  |      0.72298      0.05606            0.1205       0.7901
        Factor5  |      0.66692      0.07456            0.1112       0.9013
        Factor6  |      0.59236            .            0.0987       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated: chi2(15) = 269.07 Prob>chi2 = 0.0000

    Factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness
    -------------+--------------------+--------------
        bg2cost1 |   0.3581    0.6279 |      0.4775
        bg2cost2 |  -0.4850    0.5244 |      0.4898
        bg2cost3 |  -0.5326    0.5725 |      0.3886
        bg2cost4 |  -0.4919    0.3254 |      0.6521
        bg2cost5 |   0.6238    0.3962 |      0.4539
        bg2cost6 |   0.6543    0.3780 |      0.4290
    -------------------------------------------------

    . pca bg2cost1-bg2cost6

    Principal components/correlation            Number of obs    =      568
                                                Number of comp.  =        6
                                                Trace            =        6
        Rotation: (unrotated = principal)       Rho              =   1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.70622      .303339            0.2844       0.2844
           Comp2 |      1.40288      .494225            0.2338       0.5182
           Comp3 |      .908652      .185673            0.1514       0.6696
           Comp4 |      .722979     .0560588            0.1205       0.7901
           Comp5 |       .66692      .074563            0.1112       0.9013
           Comp6 |      .592357            .            0.0987       1.0000
    --------------------------------------------------------------------------

    Principal components (eigenvectors)

    ----------------------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6 | Unexplained
    -------------+------------------------------------------------------------+-------------
        bg2cost1 |   0.2741    0.5302   -0.2712   -0.7468   -0.0104   -0.1111 |           0
        bg2cost2 |  -0.3713    0.4428   -0.4974    0.2800    0.2996    0.5005 |           0
        bg2cost3 |  -0.4077    0.4834    0.0656    0.2466   -0.5649   -0.4646 |           0
        bg2cost4 |  -0.3766    0.2748    0.7266   -0.2213    0.4504    0.0538 |           0
        bg2cost5 |   0.4776    0.3345    0.3829    0.1950   -0.3942    0.5657 |           0
        bg2cost6 |   0.5009    0.3192    0.0144    0.4647    0.4824   -0.4453 |           0
    ----------------------------------------------------------------------------------------

    .
    You can see that pca and factor, pcf give identical eigenvalues but very different loadings for the first two factors/components. Of course, the substantive "story" of the factors/components (in terms of positive/negative loadings) seems similar in both analyses.





  • #2
    They do the same things. The difference you see in the loadings is artifactual, due to a difference in the choice of normalization. The eigenvectors you are seeing after -pca- are normalized to unit length, whereas the loadings you get from -factor, pcf- are normalized so that each column's sum of squares equals the corresponding eigenvalue. If, after -pca-, you run -estat loadings, cnorm(eigen)-, which normalizes to the eigenvalue, you get the same loadings that come out of -factor, pcf-:
    Code:
    . estat loadings, cnorm(eigen)
    
    Principal component loadings (unrotated)
        component normalization: sum of squares(column) = eigenvalue
    
        --------------------------------------------------------------------------
                     |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6
        -------------+------------------------------------------------------------
            bg2cost1 |    .3581     .6279    -.2586     -.635  -.008511    -.0855
            bg2cost2 |    -.485     .5244    -.4741     .2381     .2447     .3852
            bg2cost3 |   -.5326     .5725    .06257     .2097    -.4613    -.3576
            bg2cost4 |   -.4919     .3254     .6926    -.1882     .3679    .04142
            bg2cost5 |    .6238     .3962      .365     .1658    -.3219     .4354
            bg2cost6 |    .6543      .378    .01368     .3951      .394    -.3428
        --------------------------------------------------------------------------
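
    The relationship between the two normalizations can be written out explicitly. This is only a sketch, and the symbols are assumptions not used in the posts above: v_{ij} is the unit-length eigenvector entry for variable i on component j (as shown by -pca-), \lambda_j is the j-th eigenvalue, and \ell_{ij} is the corresponding loading (as shown by -factor, pcf- or -estat loadings, cnorm(eigen)-).

    \[
    \ell_{ij} = v_{ij}\sqrt{\lambda_j},
    \qquad
    \sum_i v_{ij}^2 = 1
    \;\Longrightarrow\;
    \sum_i \ell_{ij}^2 = \lambda_j .
    \]

    Checking against the output posted above for bg2cost1: 0.2741 \times \sqrt{1.70622} \approx 0.358 and 0.5302 \times \sqrt{1.40288} \approx 0.628, which agree with the reported loadings 0.3581 and 0.6279 up to rounding of the displayed eigenvectors.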



    • #3
      How exactly are PCA and "factor analysis using principal component analysis for factor extraction" different and why do they give such different answers for the loadings? I haven't seen any stats textbooks that make this distinction, only Stata.
      Regardless of the textbooks you have available, there are brief discussions comparing and contrasting Principal Component Analysis and Factor Analysis in the Wikipedia article on Principal Component Analysis at

      https://en.wikipedia.org/wiki/Princi...actor_analysis

      and in the Wikipedia article on Factor Analysis at

      https://en.wikipedia.org/wiki/Factor..._analysis_(PCA)

      While the mechanics are the same, the philosophy behind PCA and Factor Analysis differs (and the roots of statistics are in philosophy, the study of knowledge and how we know what we know, rather than in mathematics), and that leads to different ways of thinking about their results.



      • #4
        William, I am aware of the conceptual and mathematical distinctions between "factor analysis" (in general) and "PCA." My question was whether there was a difference between "PCA" and "factor analysis using PCA for factor extraction." Previous comments on Statalist implied that there is some sort of difference between these two procedures, but that didn't make sense to me mathematically, since I always thought PCA could be treated as just a special case of EFA where you set the communalities to 1. Clyde's answer seems to imply that my understanding is correct, at least as far as these two commands are concerned: they both do exactly the same thing and produce exactly the same results; they just display the results differently because they use different normalizations. This all now makes sense to me, although it might be good for Stata's documentation to make clear somewhere what the relationship between these two commands is. I couldn't find any such reference in the documentation for either. In any case, thanks for the help in understanding this.
        Last edited by Graham Wright; 14 Nov 2021, 17:58.



        • #5
          Hello all. Clyde, thanks for your post. It is very useful and clears up a lot of the confusion around the two commands. Further to this comment, I was wondering how you can code the varimax (or promax) rotation in Stata using the eigenvalue-normalised loadings. Any help will be appreciated!



          • #6
            -help rotate-
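
            A minimal sketch of what that might look like, assuming the goal is to rotate the eigenvalue-normalized loadings from the first two retained factors in the bg2 example above. Since -factor, pcf- already reports loadings in that normalization, one route is to run -rotate- directly after it. The choice of varimax versus promax here is illustrative only; -help rotate- lists the full set of criteria and options.
            Code:
            * Refit the example from post #1 (loadings normalized to the eigenvalue)
            webuse bg2, clear
            factor bg2cost1-bg2cost6, pcf

            * Orthogonal varimax rotation of the retained factor loadings
            rotate, varimax

            * Oblique promax rotation, as an alternative
            rotate, promax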

