  • pcacoefsave available from SSC (a utility for pca users)

    Thanks as usual to Kit Baum, a new package pcacoefsave may be downloaded from SSC using

    Code:
    ssc inst pcacoefsave
    The aim is simple: to save certain key results from PCA, as obtained with the pca command, to a new dataset, which makes various kinds of tabulation and graphing much easier.

    Stata 9 is required.

    As noted in the help, the command does not extend to factor analysis. Users interested in factor analysis are likely to be using SEM anyway, and what they need or want may well be quite different and in any case beyond my experience.

    Here is a simple example. We first throw various variables related to size in some sense from the auto data into a pca:

    Code:
     
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . pca headroom trunk weight length displacement
    
    Principal components/correlation                 Number of obs    =         74
                                                     Number of comp.  =          5
                                                     Trace            =          5
        Rotation: (unrotated = principal)            Rho              =     1.0000
    
        --------------------------------------------------------------------------
           Component |   Eigenvalue   Difference         Proportion   Cumulative
        -------------+------------------------------------------------------------
               Comp1 |      3.76201        3.026             0.7524       0.7524
               Comp2 |      .736006      .427915             0.1472       0.8996
               Comp3 |      .308091      .155465             0.0616       0.9612
               Comp4 |      .152627      .111357             0.0305       0.9917
               Comp5 |     .0412693            .             0.0083       1.0000
        --------------------------------------------------------------------------
    
    Principal components (eigenvectors) 
    
        ------------------------------------------------------------------------------
            Variable |    Comp1     Comp2     Comp3     Comp4     Comp5 | Unexplained 
        -------------+--------------------------------------------------+-------------
            headroom |   0.3587    0.7640    0.5224   -0.1209    0.0130 |           0 
               trunk |   0.4334    0.3665   -0.7676    0.2914    0.0612 |           0 
              weight |   0.4842   -0.3329    0.0737   -0.2669    0.7603 |           0 
              length |   0.4863   -0.2372   -0.1050   -0.5745   -0.6051 |           0 
        displacement |   0.4610   -0.3390    0.3484    0.7065   -0.2279 |           0 
        ------------------------------------------------------------------------------
    
    . pcacoefsave using pca_results
    file pca_results.dta saved
    
    . use pca_results
    
    . describe 
    
    Contains data from pca_results.dta
      obs:            25                          
     vars:             8                          1 Jun 2015 11:21
     size:           575                          
    ---------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ---------------------------------------------------------------------------------------
    varname         byte    %12.0g     names      variable
    varlabel        byte    %22.0g     labels     variable
    PC              byte    %8.0g                 
    corr            float   %9.0g                 correlation
    loading         float   %9.0g                 coefficient
    eigenvalue      float   %9.0g                 
    mean            float   %9.0g                 
    SD              float   %9.0g                 standard deviation
    ---------------------------------------------------------------------------------------
    Sorted by:
    One thing I like to do, which doesn't seem common in texts or papers, is to look at the correlations between the original variables and the components. I don't need the correlations between the PCs because these are 1 or 0 by definition. After this new command, it's a straightforward tabulation:

    Code:
     
    . l PC varname corr 
    
         +-------------------------------+
         | PC        varname        corr |
         |-------------------------------|
      1. |  1       headroom    .6957921 |
      2. |  2       headroom    .6554101 |
      3. |  3       headroom    .2899519 |
      4. |  4       headroom   -.0472426 |
      5. |  5       headroom    .0026352 |
         |-------------------------------|
      6. |  1          trunk    .8405304 |
      7. |  2          trunk    .3144061 |
      8. |  3          trunk   -.4260833 |
      9. |  4          trunk    .1138242 |
     10. |  5          trunk    .0124329 |
         |-------------------------------|
     11. |  1         weight     .939158 |
     12. |  2         weight   -.2856239 |
     13. |  3         weight    .0409204 |
     14. |  4         weight   -.1042662 |
     15. |  5         weight    .1544515 |
         |-------------------------------|
     16. |  1         length    .9432383 |
     17. |  2         length   -.2035082 |
     18. |  3         length   -.0582883 |
     19. |  4         length   -.2244516 |
     20. |  5         length   -.1229222 |
         |-------------------------------|
     21. |  1   displacement    .8942441 |
     22. |  2   displacement   -.2908539 |
     23. |  3   displacement     .193391 |
     24. |  4   displacement    .2760232 |
     25. |  5   displacement   -.0462885 |
         +-------------------------------+
    
    . tabdisp varname PC, cell(corr) format(%4.3f)
    
    -----------------------------------------------------
                 |                   PC                  
        variable |      1       2       3       4       5
    -------------+---------------------------------------
        headroom |  0.696   0.655   0.290  -0.047   0.003
           trunk |  0.841   0.314  -0.426   0.114   0.012
          weight |  0.939  -0.286   0.041  -0.104   0.154
          length |  0.943  -0.204  -0.058  -0.224  -0.123
    displacement |  0.894  -0.291   0.193   0.276  -0.046
    -----------------------------------------------------
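    The correlations tabulated above can be cross-checked against the saved loadings. For a PCA of the correlation matrix, the correlation between a variable and a component equals the loading times the square root of that component's eigenvalue (for example, 0.3587 * sqrt(3.76201) is about 0.696 for headroom and PC1). A minimal sketch, assuming that eigenvalue in pca_results.dta holds the eigenvalue of the PC named in each observation; check and diff are illustrative names, not variables created by pcacoefsave:

    Code:
     
    * assumes pca_results.dta from the example above is in the working directory
    use pca_results, clear
    * loading times root eigenvalue should reproduce corr
    * (assumption: eigenvalue is stored once per observation for its PC)
    generate double check = loading * sqrt(eigenvalue)
    generate double diff = corr - check
    * differences should be negligible, apart from float rounding
    summarize diff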
    Another thing I often do is plot loadings in a particular way, as already documented in eofplot (SSC). With the saved results, an easy way of getting that plot is to declare the data as panel data:

    Code:
     
    . xtset PC varname
           panel variable:  PC (strongly balanced)
            time variable:  varname, 1 to 5
                    delta:  1 unit
    
    . xtline loading, overlay xla(, valuelabel) recast(connected) legend(pos(3) col(1)) yla(, ang(h))

  • #2
    Based on the above results, is it correct to say that PC1 captures most of the total sample variance? If so, could it be used in future analyses as a substitute for these illustrative variables?

    Thanks,
    Maria


    • #3
      I assume you're referring to the example above. Here PC1 captures 75% of the total variance based on standardised variables. Objectively, that's "most", indeed.
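
      The 75% figure is the first eigenvalue divided by the trace (3.76201 / 5 is about 0.752). A minimal sketch of recovering it from the results that pca leaves behind, assuming the e(Ev) row vector of eigenvalues and the e(trace) scalar among pca's stored results; Ev is just an illustrative matrix name:

      Code:
       
      sysuse auto, clear
      quietly pca headroom trunk weight length displacement
      * copy the eigenvalues out of e() so that elements can be indexed
      matrix Ev = e(Ev)
      * proportion of total variance captured by PC1
      display %5.3f Ev[1,1] / e(trace)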

      More crucially, whether PC1 is an adequate substitute for the original variables is a substantive decision for researchers. I can readily imagine different researchers jumping either way, some saying "Not enough", some saying "That's fine".
