  • Calculate loading/weight of a variable on a principal component


    I want to calculate a composite index (similar to the Human Development Index). For this, I want to reduce the 26 variables I have (such as GDP, NDP, per capita income and access to water) by grouping them on the basis of PCA results. I have run the PCA but cannot work out how to calculate the loading (weight) that a variable has on a principal component. Many sources say that the eigenvectors are the loadings. Is that true? If so, should we take the absolute values of the eigenvector entries, given that I want the index values to lie between 0 and 1? I want to use these loadings to construct a component index. Here are the results of the pca.
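
    On the terminology in the question above, a short sketch of the standard relationship between eigenvectors and loadings may help. It assumes PCA on the correlation matrix $R$ of $p$ standardized variables $z_1, \dots, z_p$ (the default for Stata's pca); the symbols $v_j$, $\lambda_j$ and $\ell_{kj}$ are notation introduced here, not taken from the thread.

    ```latex
    \begin{align*}
    R\,v_j &= \lambda_j v_j, \qquad v_j^{\top} v_j = 1
      && \text{(unit-length eigenvector $v_j$, eigenvalue $\lambda_j$)} \\
    \mathrm{PC}_j &= \sum_{k=1}^{p} v_{kj}\, z_k
      && \text{(score on component $j$)} \\
    \ell_{kj} &= \sqrt{\lambda_j}\; v_{kj} = \operatorname{corr}(z_k, \mathrm{PC}_j)
      && \text{(loading of variable $k$ on component $j$)}
    \end{align*}
    ```

    So the eigenvector entries are the weights used to build the score, and what many texts call loadings are those entries rescaled by $\sqrt{\lambda_j}$. Because $-v_j$ is an equally valid eigenvector, the sign of a whole component is arbitrary, but the relative signs within it are not, so taking absolute values would discard the direction in which each variable enters.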



  • #2
    If you want a single index out of a PCA, then the first PC is what you need, and there is no need to construct it yourself, as predict will do that for you after pca.

    I have to warn that nothing guarantees that the result will not be a meaningless mishmash. I am already alarmed by your description: gdp, ndp and per capita income mush together two and a half versions of very similar things.

    https://stats.stackexchange.com/ques...one-or-another may help on your technical questions.
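
    A minimal sketch of the suggestion in #2, for readers who want to see it run. The auto dataset and its four variables merely stand in for the 26 indicators, and the min-max rescaling to [0, 1] is an assumption about what the original poster wants, not something proposed in this thread; the variable names pc1 and index01 are made up for illustration.

    ```stata
    * Sketch only: the auto dataset stands in for the real indicators
    sysuse auto, clear
    pca trunk weight length headroom

    * Scores on the first principal component
    predict pc1, score

    * Assumed step: min-max rescaling so that the index lies in [0, 1]
    summarize pc1, meanonly
    generate index01 = (pc1 - r(min)) / (r(max) - r(min))
    summarize index01
    ```

    Rescaling in this way only changes the units of the first component; it does nothing to make the index more meaningful, and the direction of the index still depends on the arbitrary sign of the component.
    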

    • #3
      Okay. Exactly my point. Variables like GDP, NDP and per capita income are quite similar to each other and basically represent the same wider group, say "Economic variables". So, can the results from PCA actually help me group the other variables (26 in my case) as well, as factor analysis does? In other words, would it be right if, say, from PC1 I pick the variables with high eigenvector values, group them together, and label them, and from PC2 I pick another set of variables with high eigenvector values and label them (similar to how groups are extracted using factor analysis)? Or, since PC1 is a linear combination of all 26 variables in the dataset, can I not choose just a few?
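
      One way to look at which variables dominate each component is sketched below. Again the auto dataset stands in for the real data, and the 0.3 cut-off passed to blanks() is an arbitrary choice made here for illustration, not a recommended threshold.

      ```stata
      * Sketch only: four variables from the auto dataset stand in for the 26 indicators
      sysuse auto, clear

      * Keep two components and blank out small coefficients to make patterns easier to see
      pca trunk weight length headroom, components(2) blanks(0.3)

      * Loadings rescaled so that each column's sum of squares equals its eigenvalue
      estat loadings, cnorm(eigen)

      * Plot the variables in the plane of the first two components
      loadingplot
      ```

      Blanking small coefficients only changes the display. Each component remains a linear combination of all the variables, so any grouping read off such a table is a judgment call.
      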

      • #4
        Sorry, but I can't easily improve on my statement in #2. There is some logic to choosing smaller groups of variables that are closely related. It's your choice depending on your substantive goals.

        • #5
          Nick Cox, can you explain this to me, please?

          • #6
            If #3 was not clear to you then sorry, I can't think how to restate it well for your circumstances, especially as you don't explain what those circumstances are.

            Just about every good multivariate text cautions that PCA gives no guarantee that it will help if you have a bundle of variables all rather loosely related. PCA can be defended as an exploratory data analysis method, but when people want, as it were, one good summary in place of several candidate variables, I think they are often better advised to think hard and just choose one of those variables directly, or to use other methods to select predictors.

            • #7
              Does the CPA do the same with panel data? Also, after the pca command we have the predict factor command, but what I don't understand is whether the result is still factor1. What does it depend on?
              Is it possible to have factor 2?

              • #8
                CPA is I guess PCA, perhaps abbreviated in the French manner. The pca command will accept panel data but not treat it in any special way; you would need to think about that. I think that is, under different terminology, a common application in climatology and oceanography.

                Please read the help and try the examples to see that the number of arguments to predict after pca gives you the same number of variables containing component (not factor) scores.

                Code:
                    Setup
                        . sysuse auto
                        . pca trunk weight length headroom

                    Statistics
                        . estat residuals, fitted
                        . estat loadings, cnorm(eigen)

                    Individual scores for the components are obtained via predict
                        . predict f1
                        . drop f1
                        . predict f1 f2
                        . drop f1 f2
                        . predict f1-f4