Principal component analysis

Nick Cox

Join Date: Mar 2014

Posts: 35436
#16

09 Dec 2019, 02:10

Santosh Pathak: Although you're new to the thread, several of the previous replies appear to apply to your question.

I agree with Clyde Schechter and add a few further comments.

In general, if you are using PCA, then PC1 is the best single summary of the input variables, and you can't improve on it by mushing it together with other PCs. The idea that you can is a surprisingly common fallacy.

Something like

Code:

predict PC1

calculates the PC1 scores after pca; there is no (zero!) need for any other calculation.
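A minimal sketch of that workflow, assuming the auto data shipped with Stata and an arbitrary choice of variables purely for illustration (neither comes from the thread):

```stata
* Illustration only: auto.dta ships with Stata; the variables are an arbitrary choice
sysuse auto, clear
pca price mpg headroom trunk weight length

* With a single new variable, -predict- after -pca- returns the PC1 scores
predict PC1
summarize PC1
```

Nothing beyond `predict` is needed to get the scores.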

There is a broader question of whether PCA is a good idea here. I don't work in this field, but I've seen several questions on Statalist that evidently come from students trying to follow earlier papers that used PCA to get some of the predictors.

That the first PC for 30 indicators captures 11% of the total variation is evidently disappointing to you, but I don't find it at all surprising for the kind of data I guess you have.
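A figure like that 11% is the first eigenvalue divided by the sum of all eigenvalues, which is what the Proportion column of the pca output reports. A sketch of that arithmetic, assuming auto.dta and six arbitrary variables rather than the poster's 30 indicators, and assuming the default correlation-based PCA:

```stata
* Illustration only: six arbitrary variables from auto.dta, not the 30 indicators
sysuse auto, clear
pca price mpg headroom trunk weight length

* e(Ev) is a row vector of eigenvalues; for correlation-based PCA they sum to
* the number of variables, so each eigenvalue over the sum is a share of variance
matrix Ev = e(Ev)
matrix evsum = Ev * J(colsof(Ev), 1, 1)
display "PC1 share of total variance: " %5.3f el(Ev, 1, 1) / el(evsum, 1, 1)
```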
Santosh Pathak

Join Date: Jul 2019

Posts: 11
#17

17 Dec 2019, 16:00

Thank you so much Nick Cox and Clyde Schechter. It is clearer now.

Clyde Schechter: I am using code with the outline:

local vars x1 x2 x3 x4
foreach x of local vars {
    egen n_`x' = std(`x')
}

pca n_x1 - n_x4

egen index = (component 1 score of x1)* n_x1 + ......+ (component 1 score of x4)* n_x4

Regards,
Santosh
Nick Cox

Join Date: Mar 2014

Posts: 35436
#18

18 Dec 2019, 02:26

It's my turn not to understand. The implication of #2, #4, #16, and other posts in this thread by Clyde Schechter and myself is that PC1 is the best single summary of a bunch of variables, using the correlation structure or covariance structure as your guide. You can get PC1 as scores directly using predict after pca.

Even allowing that your code outline isn't all literal, note that egen is illegal for what I think you want and that generate would be unnecessary, given that, as said, predict will do it for you.

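Your word equation also uses the term score incorrectly: the multipliers are the loadings (coefficients) of each variable on component 1, and the PC1 score is what the weighted sum produces.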
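A sketch of why generate is redundant: rebuilding PC1 by hand from the standardized variables and the component-1 loadings stored in e(L) should reproduce what predict gives, up to float storage precision. It assumes auto.dta, an arbitrary variable list, and the default correlation-based pca; none of these come from the thread.

```stata
* Illustration only: auto.dta and an arbitrary variable list; pca uses the correlation matrix by default
sysuse auto, clear
local vars price mpg headroom trunk weight length
pca `vars'
predict PC1                      // the direct route

* Rebuild PC1 by hand: sum over variables of (loading on component 1) x (standardized variable)
matrix L = e(L)                  // rows = variables, columns = components
generate double PC1_hand = 0
local i = 0
foreach v of local vars {
    local ++i
    egen double z_`v' = std(`v')  // (x - mean) / sd
    replace PC1_hand = PC1_hand + el(L, `i', 1) * z_`v'
}
compare PC1 PC1_hand             // should agree up to float storage precision
```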

There are entire books on PCA, and a chapter on PCA is also customary in survey texts on multivariate statistics. If you're going to use PCA at all, and I am not clear that you really need it, it's worth finding a congenial text to get all this straight.
Santosh Pathak

Join Date: Jul 2019

Posts: 11
#19

18 Dec 2019, 15:20

Thank you Nick Cox for your kind suggestions.

Hadi Kahalzadeh

Join Date: Nov 2019
Posts: 23

#20

05 Oct 2020, 16:18

Clyde Schechter and Nick Cox, thank you for your explanations. I have the same question that Stephanie posed here. I am trying to calculate a wealth score from 25 variables. When I use

Code:

predict wealthscore

it generates a score for pc1. When I use

Code:

predict pc1 pc2, score

it generates two separate columns.

I was wondering how I can generate one score that covers both pc1 and pc2. For instance, can I have a wealthscore that combines pc1 and pc2?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(wealthscore pc1 pc2)
  1.9262853   1.9262853    .961527
  3.3533766   3.3533766  .51521915
  1.3404053   1.3404053   .3001724
  .53514564   .53514564   .4931327
  -1.274623   -1.274623  .10186155
  -.3982685   -.3982685   .9355345
 -1.8886043  -1.8886043   .5828356
  -3.481718   -3.481718  -1.284943
 -.26558822  -.26558822    .821162
 -1.4841796  -1.4841796 -.16323125
   .7865319    .7865319   .9410244
 -.58590585  -.58590585  .18407133
end

Last edited by Hadi Kahalzadeh; 05 Oct 2020, 16:50.
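The two predict calls in #20 produce the same first column: predict with one new variable after pca returns the PC1 scores, just as the first variable from predict ..., score does. A sketch of that check, assuming auto.dta and six arbitrary variables in place of the 25 wealth indicators:

```stata
* Illustration only: auto.dta stands in for the 25 wealth indicators
sysuse auto, clear
pca price mpg headroom trunk weight length
predict wealthscore           // one new variable: PC1 scores
predict pc1 pc2, score        // scores for the first two components
assert wealthscore == pc1     // the single-variable call reproduces PC1
list wealthscore pc1 pc2 in 1/5
```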


Nick Cox

Join Date: Mar 2014

Posts: 35436
#21

05 Oct 2020, 17:01

#20 asks the same question that was asked and answered in several earlier posts in this thread.