  • PCA Predicted Component Scores and Standardization

    Stata Experts:

    I ran a PCA in Stata 15 for data reduction purposes. Based on Kaiser's Rule of One, the analysis gave a two-component solution. After I predicted the PC-1 and PC-2 scores, the summarize command showed that PC-1 had a mean of 0 and an SD of 1. PC-2 similarly had a mean of 0, but an SD of 1.64. I was under the impression that predicted component scores in Stata were standardized. Given that PC-2 has an SD of 1.64, is it necessary to z-score it? I can work with the 1.64 SD in terms of interpreting results. Also, is there a way to get raw scores for PCA components?
    Many thanks,
    Pat

  • #2
    Perhaps for clarification: is it incorrect to z-score a principal component score if its SD is not equal to 1?


    • #3
      Time zones and sleep delayed my reply to this, but my second thoughts are the same as my first thoughts. I don't see why you report such results.

      PC scores are scaled to have mean zero (modulo precision issues), but their variances necessarily equal the corresponding eigenvalues, so their SDs are the square roots of the eigenvalues.

      So, contrary to the idea that SD should be 1 for all PCs, the PCs do, and should, vary in SD, and in non-increasing manner (that is, in principle, two or more PCs might have identical eigenvalues, but a later PC will never have a higher eigenvalue than an earlier one).

      Wanting to standardize them afterwards is puzzling, but may match some purposes outside PCA.
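
      If standardized scores are wanted for some later purpose, one possible sketch is to divide each score by the square root of its eigenvalue or, equivalently, to apply the std() function of egen. The names Ev, zPC1, zPC2 and altPC1 are illustrative only, and the auto data stand in for whatever variables are really being analysed.

      ```stata
      sysuse auto, clear
      quietly pca displacement length weight trunk headroom
      matrix Ev = e(Ev)                  // eigenvalues saved by pca as a row vector
      predict double PC1 PC2, score

      * Option 1: divide each score by the square root of its eigenvalue
      generate double zPC1 = PC1 / sqrt(Ev[1,1])
      generate double zPC2 = PC2 / sqrt(Ev[1,2])

      * Option 2: let egen centre and scale by the sample SD
      egen double altPC1 = std(PC1)

      * All three SDs should be 1, up to rounding
      summarize zPC1 zPC2 altPC1
      ```

      Either route changes only the units of the scores, not the ordering of the observations or the correlations with other variables.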

      Here is my demo, and I would be interested to see yours, based on a reproducible example. Perhaps you are using something other than plain PCA.

      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . pca displacement length weight trunk headroom
      
      Principal components/correlation                 Number of obs    =         74
                                                       Number of comp.  =          5
                                                       Trace            =          5
          Rotation: (unrotated = principal)            Rho              =     1.0000
      
          --------------------------------------------------------------------------
             Component |   Eigenvalue   Difference         Proportion   Cumulative
          -------------+------------------------------------------------------------
                 Comp1 |      3.76201        3.026             0.7524       0.7524
                 Comp2 |      .736006      .427915             0.1472       0.8996
                 Comp3 |      .308091      .155465             0.0616       0.9612
                 Comp4 |      .152627      .111357             0.0305       0.9917
                 Comp5 |     .0412693            .             0.0083       1.0000
          --------------------------------------------------------------------------
      
      Principal components (eigenvectors) 
      
          ------------------------------------------------------------------------------
              Variable |    Comp1     Comp2     Comp3     Comp4     Comp5 | Unexplained 
          -------------+--------------------------------------------------+-------------
          displacement |   0.4610   -0.3390    0.3484    0.7065   -0.2279 |           0 
                length |   0.4863   -0.2372   -0.1050   -0.5745   -0.6051 |           0 
                weight |   0.4842   -0.3329    0.0737   -0.2669    0.7603 |           0 
                 trunk |   0.4334    0.3665   -0.7676    0.2914    0.0612 |           0 
              headroom |   0.3587    0.7640    0.5224   -0.1209    0.0130 |           0 
          ------------------------------------------------------------------------------
      
      . predict double PC1-PC5
      (score assumed)
      
      Scoring coefficients 
          sum of squares(column-loading) = 1
      
          ----------------------------------------------------------------
              Variable |    Comp1     Comp2     Comp3     Comp4     Comp5 
          -------------+--------------------------------------------------
          displacement |   0.4610   -0.3390    0.3484    0.7065   -0.2279 
                length |   0.4863   -0.2372   -0.1050   -0.5745   -0.6051 
                weight |   0.4842   -0.3329    0.0737   -0.2669    0.7603 
                 trunk |   0.4334    0.3665   -0.7676    0.2914    0.0612 
              headroom |   0.3587    0.7640    0.5224   -0.1209    0.0130 
          ----------------------------------------------------------------
      
      . su PC?
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
               PC1 |         74    1.95e-16    1.939589  -3.267336   4.186176
               PC2 |         74   -1.74e-16     .857908   -2.35203   2.123084
               PC3 |         74   -2.58e-16    .5550596  -1.343443   1.092869
               PC4 |         74   -1.51e-16    .3906745  -.6236665   1.297837
               PC5 |         74   -4.62e-16    .2031484  -.6033618   .7704091
      
      . moments PC?
      
      -----------------------------------------------------------------------
                      n = 74 |       mean          SD    skewness    kurtosis
      -----------------------+-----------------------------------------------
      Scores for component 1 |      0.000       1.940       0.233       1.864
      Scores for component 2 |     -0.000       0.858      -0.097       2.982
      Scores for component 3 |     -0.000       0.555      -0.187       2.392
      Scores for component 4 |     -0.000       0.391       0.789       3.422
      Scores for component 5 |     -0.000       0.203       0.800       5.779
      -----------------------------------------------------------------------
      
      .
      Squaring the SDs reproduces the eigenvalues: for example, 1.939589^2 is about 3.76201, the first eigenvalue.

      moments here is from SSC and its use is incidental to the question, but it rounds results in a way I find congenial.
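
      The same check can be scripted rather than done by eye. This is a minimal sketch, assuming the eigenvalues are picked up from e(Ev) after pca; the matrix name Ev is an arbitrary choice.

      ```stata
      sysuse auto, clear
      quietly pca displacement length weight trunk headroom
      matrix Ev = e(Ev)                  // 1 x 5 row vector of eigenvalues
      quietly predict double PC1-PC5, score

      * Show the variance of each score next to its eigenvalue
      forvalues j = 1/5 {
          quietly summarize PC`j'
          display "PC`j': variance = " %9.6f r(Var) "   eigenvalue = " %9.6f Ev[1,`j']
      }
      ```

      The two columns displayed should agree to within rounding error.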


      • #4
        Nick:
        Thank you so much for your response and for clarifying this. I clearly misunderstood the interpretation of the PCA components.
        I am using my PC-1 and PC-2 scores in a later regression analysis and was going to z-score them for that purpose. Before that, however, I wanted to report the mean and SD for the principal components solution in the descriptive results section. The variables represent the frequency of time spent doing eight pre-selected activities; all are on the same scale of 1 (not at all) to 4 (every day).

        Thank you for moments... great command.

        Here is the code I am using:

        Code:
        . pca $jlist, means

        Principal components/correlation                 Number of obs    =     10,838
                                                         Number of comp.  =          8
                                                         Trace            =          8
            Rotation: (unrotated = principal)            Rho              =     1.0000

            --------------------------------------------------------------------------
               Component |   Eigenvalue   Difference         Proportion   Cumulative
            -------------+------------------------------------------------------------
                   Comp1 |      2.69185       1.6796             0.3365       0.3365
                   Comp2 |      1.01224      .141702             0.1265       0.4630
                   Comp3 |      .870541     .0819474             0.1088       0.5718
                   Comp4 |      .788593     .0501567             0.0986       0.6704
                   Comp5 |      .738436     .0332266             0.0923       0.7627
                   Comp6 |       .70521     .0259729             0.0882       0.8509
                   Comp7 |      .679237      .165343             0.0849       0.9358
                   Comp8 |      .513894            .             0.0642       1.0000
            --------------------------------------------------------------------------

        Principal components (eigenvectors)

            ------------------------------------------------------------------------------------
                Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6     Comp7
            -------------+----------------------------------------------------------------------
                P1NATURE |   0.3568    0.3501   -0.1144   -0.2816   -0.1888    0.6349   -0.4672
                P1TELLST |   0.3914   -0.4068   -0.2957   -0.2878   -0.2086   -0.1759    0.0506
                P1READBK |   0.3814   -0.5555   -0.1940   -0.0761   -0.0335   -0.0313   -0.0234
                 P1GAMES |   0.3665    0.2752   -0.1226    0.2673    0.3337   -0.5542   -0.5337
                 P1BUILD |   0.3323    0.3868   -0.4721    0.1234    0.2714    0.1205    0.6414
                P1NUMBRS |   0.3320   -0.3251    0.4065    0.5174    0.3488    0.4190    0.0147
                P1SINGSO |   0.3153    0.1519    0.6013   -0.5979    0.2758   -0.1799    0.2151
                P1HLPART |   0.3459    0.2220    0.3117    0.3467   -0.7327   -0.1881    0.1896
            ------------------------------------------------------------------------------------

            --------------------------------------
                Variable |    Comp8 | Unexplained
            -------------+----------+-------------
                P1NATURE |   0.0246 |           0
                P1TELLST |  -0.6588 |           0
                P1READBK |   0.7071 |           0
                 P1GAMES |  -0.0127 |           0
                 P1BUILD |   0.0477 |           0
                P1NUMBRS |  -0.2317 |           0
                P1SINGSO |   0.0620 |           0
                P1HLPART |   0.0739 |           0
            --------------------------------------


        . predict PC_1 PC_2, score

        Scoring coefficients
            sum of squares(column-loading) = 1

            ----------------------------------
                Variable |    Comp1     Comp2
            -------------+--------------------
                P1NATURE |   0.3568    0.3501
                P1TELLST |   0.3914   -0.4068
                P1READBK |   0.3814   -0.5555
                 P1GAMES |   0.3665    0.2752
                 P1BUILD |   0.3323    0.3868
                P1NUMBRS |   0.3320   -0.3251
                P1SINGSO |   0.3153    0.1519
                P1HLPART |   0.3459    0.2220
            ----------------------------------

        . rename PC_1 EMBED

        . rename PC_2 DIRECT

        . sum EMBED DIRECT,detail

                           Scores for component 1
        -------------------------------------------------------------
              Percentiles      Smallest
         1%    -4.100942      -6.908451
         5%    -2.808859      -6.495844
        10%    -2.166925      -6.391741       Obs              10,838
        25%    -1.079302      -6.391741       Sum of Wgt.      10,838

        50%     .1020723                      Mean          -6.33e-10
                                Largest       Std. Dev.      1.640685
        75%     1.171331       3.532775
        90%     1.998173       3.532775       Variance       2.691846
        95%     2.660899       3.532775       Skewness      -.3245814
        99%     3.532775       3.532775       Kurtosis       2.951721

                           Scores for component 2
        -------------------------------------------------------------
              Percentiles      Smallest
         1%    -2.310845      -3.739807
         5%    -1.645477      -3.569346
        10%     -1.26984      -3.569346       Obs              10,838
        25%    -.6837445      -3.224525       Sum of Wgt.      10,838

        50%     -.007464                      Mean          -4.51e-10
                                Largest       Std. Dev.      1.006103
        75%     .6579043       3.975946
        90%     1.262447       3.975946       Variance       1.012243
        95%     1.659585       4.053739       Skewness       .0900174
        99%     2.433635       4.458495       Kurtosis       3.151503
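
        On the planned regression: z-scoring a component score is a linear rescaling, so it should change the size of the coefficient but not the fit or the t statistics. The sketch below illustrates this with the auto data, because the variables in $jlist are not available here. The outcome price and the names zPC1 and zPC2 are purely illustrative assumptions.

        ```stata
        sysuse auto, clear
        quietly pca displacement length weight trunk headroom
        quietly predict double PC1 PC2, score

        * z-score the predicted scores
        egen double zPC1 = std(PC1)
        egen double zPC2 = std(PC2)

        * Raw scores: coefficients are per unit of the component score
        regress price PC1 PC2

        * z-scores: coefficients are per SD of the component score
        regress price zPC1 zPC2
        ```

        Each coefficient in the second model should equal the corresponding coefficient in the first multiplied by the SD of that score, while R-squared and the t statistics stay the same.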
