PCA: normalization and calculating the index

joan marc

Join Date: Nov 2015

Posts: 79
#1


09 Nov 2015, 05:38

Hi,

Two questions related to building an index out of PCA:

1) I thought that variables should be "normalized" before using pca, which is why I rescaled them to [0,1]. But according to "Postestimation tools for PCA and PCAmat", as far as I understand it, the standardization (mean 0 and variance 1) is done at the final stage, just before computing the index (see the end of page 15). When should it be done?

2) I would like to know whether I can systematize the computation of the index by applying the factor weights (0.4011, 0.4210... in the example mentioned) to each observation, something like:

gen index = pc1*vector of variables

Thank you in advance

joan
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

09 Nov 2015, 07:01

There is no need to standardize variables before performing PCA. By default, -pca- analyzes the correlation matrix, which amounts to standardizing the variables internally. When Stata calculates the components using -predict- following -pca-, the results are centered at 0 (or very close to zero, with minor rounding error), but they are not standardized to variance 1. The variance of each component will instead be equal to the corresponding eigenvalue (again, with minimal rounding error). The components are typically most useful when left that way. But if you need standardized versions, you can standardize them yourself. -egen, std()- is probably the simplest way of doing that.
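The points above can be checked on any dataset. Here is a minimal sketch, assuming the auto dataset shipped with Stata and an arbitrary choice of four variables; the names pc1, pc2 and pc1_std are illustrative only.

```stata
* Sketch only: dataset and variable list are illustrative assumptions
sysuse auto, clear
pca price mpg weight length        // default: PCA of the correlation matrix

matrix list e(Ev)                  // eigenvalues stored by -pca-

predict pc1 pc2, score             // component scores
tabstat pc1 pc2, statistics(mean variance)
* means should be close to 0; variances close to the first two eigenvalues

egen pc1_std = std(pc1)            // rescaled copy with mean 0 and SD 1
summarize pc1_std
```

Comparing the -tabstat- variances with the first two entries of e(Ev) shows the scaling described in the paragraph above.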

I don't understand what you want to do in question 2. Please provide a hand-worked example.
joan marc

Join Date: Nov 2015

Posts: 79
#3

26 Nov 2015, 03:57

Dear Clyde,
With regard to the normalization, it is clear to me now. Thank you.
I was told to use factor instead of pca and to rotate the loadings. I attach a screenshot of my output. My question now is: how can I save the displayed numbers for further analysis?
Thank you

Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#4

26 Nov 2015, 04:07

If you just want to use the factor loadings to create scores, read the help file for predict:

Code:

help factor postestimation##predict

Last edited by Oded Mcdossi; 26 Nov 2015, 04:10.
joan marc

Join Date: Nov 2015

Posts: 79
#5

26 Nov 2015, 04:19

Hi Oded, thanks for the fast response. The help file that your code opens lists lots of different commands. Could you explain a bit? Thank you
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#6

26 Nov 2015, 04:33

If you have found your "best" solution for the factor analysis, with or without rotation, and now just want to save the scores for further analysis, use the predict command. I suggest you read the manual entries for factor and rotate.
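As a minimal sketch of that workflow, assuming the bg2 example dataset from the Stata manuals and the illustrative variable names f1 and f2:

```stata
* Sketch only: dataset and new variable names are illustrative assumptions
webuse bg2, clear
factor bg2cost1-bg2cost6     // factor analysis of the six cost items
rotate                       // default rotation (varimax)

* after -rotate-, -predict- bases the scores on the rotated solution
* unless the norotated option is specified
predict f1 f2
summarize f1 f2
```

Each new variable holds one score per observation, built from all six items rather than a hand-picked subset.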
joan marc

Join Date: Nov 2015

Posts: 79
#7

26 Nov 2015, 04:49

Hi Oded.
But I don't want to save the whole score vector, as predict f1 would do. I want to take weights from one factor or another, at my discretion, and create my own vector.

This is what I meant in my post before when I said

I attach the screenshot of my output. My question is now: how can I save the displayed numbers for further analysis?

Sorry for the misunderstanding.
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#8

26 Nov 2015, 05:09

OK, I think I understand what you want.
From a statistical point of view, I think this is the wrong way to create scores. To my knowledge, you should calculate the score using the weights of all variables and not just those that identify the meaning of the factor. In any case, Stata saves the rotated factor loadings in e(r_L), so you can access the results and use them for your needs.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#9

26 Nov 2015, 08:27

Perhaps the user-written -nomolog- by Zlotnik and Abraira would fit your needs in terms of creating a score (in that case, after a logistic regression).

Here's the link to the article from Stata Journal: http://www.stata-journal.com/article...article=st0391

Hopefully that helps.

Best regards,

Marcos

Oded Mcdossi

Join Date: Jun 2014
Posts: 577

#10

27 Nov 2015, 05:48

Here is an example of one way to import the factor loadings into your data:

Code:

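clear *
webuse bg2
factor bg2cost1-bg2cost6
rotate
* keep the rotated loadings (one row per item, one column per factor)
mat factor_s = e(r_L)
* reshape the data to one row per clinic and item
reshape long bg2cost, i(clinid) j(item)
tempfile factors
save `factors', replace
clear
* turn the loadings matrix into variables factor_s1, factor_s2, ...
svmat factor_s
g item = _n
* attach each item's loadings to every clinic's row for that item
merge 1:m item using `factors', nogen
sort clinid item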
. l in 1/6, sepby(clinid)

     +---------------------------------------------------------------+
     | factor_s1   factor_s2   factor_s3   item   clinid     bg2cost |
     |---------------------------------------------------------------|
  1. |  .4210681    .1279617   -.0637191      1        1   -1.915584 |
  2. | -.0540779    .4716006   -.0690916      2        1    .9380358 |
  3. | -.0519108    .5287738    .0291713      3        1   -.2946705 |
  4. | -.1207441    .3530449    .1136311      4        1    .3302429 |
  5. |  .5115369   -.0972753    .0379866      5        1   -1.427679 |
  6. |  .5170078   -.1185958   -.0334809      6        1   -1.012556 |
     +---------------------------------------------------------------+

Now you can manipulate the loadings by focusing only on those above a specific threshold.

Code:

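* set loadings below .4 to missing
* (this also blanks negative loadings; use abs(`i') < .4 to threshold on magnitude)
foreach i of var factor_s* {
    replace `i' = . if `i' < .4
}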
. l in 1/6, sepby(clinid)

     +------------------------------------------------------------+
     | factor~1   factor~2   factor~3   item   clinid     bg2cost |
     |------------------------------------------------------------|
  1. | .4210681          .          .      1        1   -1.915584 |
  2. |        .   .4716006          .      2        1    .9380358 |
  3. |        .   .5287738          .      3        1   -.2946705 |
  4. |        .          .          .      4        1    .3302429 |
  5. | .5115369          .          .      5        1   -1.427679 |
  6. | .5170078          .          .      6        1   -1.012556 |
     +------------------------------------------------------------+

This is a good starting point for later calculation of scores based on factor loadings.
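One possible way to carry this through to a score is sketched below, working in wide form rather than on the reshaped data. The variable name myscore1 and the .4 cutoff are illustrative assumptions. The result is a plain loading-weighted sum of the items as stored, not the regression scoring that predict performs, and the caution in #8 about dropping low-loading items still applies.

```stata
* Sketch only: myscore1 and the .4 cutoff are illustrative assumptions
webuse bg2, clear
factor bg2cost1-bg2cost6
rotate
matrix L = e(r_L)                  // rows = items, columns = factors

gen double myscore1 = 0
forvalues j = 1/6 {
    * use item j only if its loading on factor 1 reaches the cutoff
    if L[`j', 1] >= .4 {
        replace myscore1 = myscore1 + L[`j', 1] * bg2cost`j'
    }
}
summarize myscore1
```

Changing the column index of L picks the weights from a different factor.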


Nick Cox

Join Date: Mar 2014

Posts: 35433
#11

27 Nov 2015, 07:06

I wrote a program pcacoefsave (SSC) to save PCA results.
http://www.statalist.org/forums/foru...-for-pca-users

I kept clear of factor analysis; many of the tribal habits of factor analysts I don't understand or know about. But someone enthusiastic and knowledgeable might want to clone and extend that program for factor analysis.
joan marc

Join Date: Nov 2015

Posts: 79
#12

30 Nov 2015, 02:08

Hi @Oded Mcdossi
Some follow-up questions:
1. Do I understand correctly that the lines in the table from

l in 1/6, sepby(clinid)

are the six variables (the numbers on the left)? I would then understand the role of the three factors, but not the variables item, clinid and bg2cost. Maybe I just don't understand why you reshape.

2. How can I then export the final list to Excel? I know putexcel, but I guess this is only for tables.

Sorry, I edited this post because I tried to insert the whole table mentioned above.

Last edited by joan marc; 30 Nov 2015, 02:36.
Nick Cox

Join Date: Mar 2014

Posts: 35433
#13

30 Nov 2015, 02:21

Code:

help export excel
joan marc

Join Date: Nov 2015

Posts: 79
#14

30 Nov 2015, 03:21

Hi @Nick Cox

I'm sorry, but I have read help export and other help files extensively and still can't manage to export the tables shown by the factor or rotate commands (not the data, not the stored results of the commands, not a specific regression, but the displayed tables themselves). I am surprised, because it should be straightforward, right? Or maybe I am just very new to this.

In some places I read that copying the output as a table and pasting it should work, but the cells collapse and are not separated. Big mess.
Nick Cox

Join Date: Mar 2014

Posts: 35433
#15

30 Nov 2015, 03:31

As I understand it, Oded has shown precisely the first step you need to put the results you want in new variables. He did indicate that you may need to do other calculations.

See the list as the last element in #10.

Hence my advice just to use export excel.
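As a minimal sketch of that route, assuming the bg2 example from #10 and a hypothetical output file name loadings.xlsx:

```stata
* Sketch only: the file name is a hypothetical placeholder
webuse bg2, clear
factor bg2cost1-bg2cost6
rotate
matrix factor_s = e(r_L)     // rotated loadings

clear                        // drops the data; the matrix stays in memory
svmat factor_s               // loadings become variables factor_s1, ...
gen item = _n

* write the variables in memory to a worksheet, names in the first row
export excel using "loadings.xlsx", firstrow(variables) replace
```

Once the loadings are variables, they are ordinary data, so export excel handles them like any other dataset.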
