  • PCA: normalization and calculating the index

    Hi,

    Two questions related to building an index out of PCA:

    1) I thought that variables should be "normalized" before running PCA, which is why I rescaled them to [0,1]. But when I reach the final stage (as far as I understand it), according to "Postestimation tools for PCA and PCAmat", the standardization (mean 0 and variance 1) is done just before computing the index (see the end of page 15). When should it be done?

    2) I would like to know whether I can systematize the computation of the index for each observation using the factor weights (0.4011, 0.4210... in the mentioned example), something like:

    gen index = pc1*vector of variables

    Thank you in advance

    joan

  • #2
    There is no need to standardize variables before performing PCA. When Stata calculates the components using -predict- following -pca-, the results are centered at 0 (or very close to zero, with minor rounding error), but they are not standardized to variance 1. The variance of each component will instead be equal to the corresponding eigenvalue (again, with minimal rounding errors). The components are typically most useful when left that way. But if you have a need for standardized versions, you can standardize them yourself. -egen, std()- is probably the simplest way of doing that.
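A minimal sketch of the point above, assuming the auto dataset that ships with Stata and an arbitrary choice of four variables (neither comes from this thread): after -pca-, the first component from -predict- has mean essentially 0 and variance equal to the first eigenvalue stored in e(Ev), and -egen, std()- gives a rescaled copy with variance 1. The names pc1 and pc1_std are just illustrative.

```stata
* Illustration only: auto data and the variable list are assumptions
sysuse auto, clear
pca price mpg weight length
matrix Ev = e(Ev)                  // row vector of eigenvalues
predict double pc1, score          // first principal component score
summarize pc1                      // mean ~ 0, variance ~ first eigenvalue
display "variance = " r(Var) "   first eigenvalue = " Ev[1,1]
egen double pc1_std = std(pc1)     // standardized copy: mean 0, variance 1
summarize pc1_std
```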

    I don't understand what you want to do in question 2. Please provide a hand-worked example.
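One possible reading of question 2, sketched as a hand-worked example under assumptions: again the auto data and four arbitrary variables stand in for the real ones, and index and z_* are hypothetical names. With -pca- on the correlation matrix, the first-component score is the weighted sum of the standardized variables, with weights taken from the first column of e(L). Building it by hand shows where you could plug in a different weight vector.

```stata
* Hypothetical hand-worked index; auto data assumed for illustration
sysuse auto, clear
local vars price mpg weight length
pca `vars'
matrix L = e(L)                    // eigenvectors: rows = variables, columns = components
predict double pc1, score          // Stata's own first-component score, for comparison

* Index = sum over variables of (weight from column 1 of e(L)) x (standardized variable)
generate double index = 0
local j = 1
foreach v of local vars {
    egen double z_`v' = std(`v')                // mean 0, variance 1
    replace index = index + L[`j', 1] * z_`v'   // add this variable's weighted contribution
    local ++j
}
summarize index pc1                // the two should agree up to rounding
```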



    • #3
      Dear Clyde,
      Regarding the normalization, it is clear to me now. Thank you.
      I was told to use factor instead of pca and to rotate the loadings. I attach a screenshot of my output. My question now is: how can I save the displayed numbers for further analysis?
      Thank you



      • #4
        If you just want to use the factor loadings to create scores, read the help file for predict:
        Code:
        help factor postestimation##predict
        Last edited by Oded Mcdossi; 26 Nov 2015, 04:10.



        • #5
          Hi Oded, thanks for the fast response. The help entry you point to lists lots of different commands; could you explain a bit more? Thank you



          • #6
            If you have found your "best" factor solution, with or without rotation, and now just want to save the scores for further analysis, use the predict command. I suggest you read the manual entries for factor and rotate.
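A short sketch of that advice, assuming the bg2 example data from the Stata manuals (the same data used later in this thread; -webuse- needs internet access) and hypothetical score names f1-f3. After -rotate-, -predict- computes regression-method scores from the rotated solution, using all six items.

```stata
* Sketch: factor scores after rotation (bg2 data assumed)
webuse bg2, clear
factor bg2cost1-bg2cost6
rotate                        // varimax by default
predict f1 f2 f3              // regression-method scores based on the rotated loadings
summarize f1 f2 f3
```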



            • #7
              Hi Oded.
              But I don't want to save the whole vector, as predict f1 would do. I want to take weights from one vector or another, at my discretion, and create my own vector.

              This is what I meant in my earlier post when I said
              I attach the screenshot of my output. My question is now: how can I save the displayed numbers for further analysis?
              Sorry if I was unclear.



              • #8
                OK, I think I understand what you want.
                From a statistical point of view, I think this is the wrong way to create scores. To my knowledge, you should calculate the score using the weights of all variables, not just those that identify the meaning of the factor. In any case, Stata saves the rotated factor loadings in e(r_L), so you can access the results and use them for your needs.
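A minimal sketch of reading e(r_L), assuming the bg2 example data used later in this thread (-webuse- needs internet access); RL and w1 are hypothetical matrix names. The rotated loadings can be copied into an ordinary matrix and individual elements or whole columns pulled out from there.

```stata
* Sketch: accessing the rotated loadings left behind by -rotate- (bg2 data assumed)
webuse bg2, clear
factor bg2cost1-bg2cost6
rotate
matrix RL = e(r_L)            // rows = items bg2cost1-bg2cost6, columns = rotated factors
matrix list RL
display RL[1, 1]              // loading of bg2cost1 on the first rotated factor
matrix w1 = RL[1..6, 1]       // first column (6 items) as a 6 x 1 weight vector
matrix list w1
```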



                • #9
                  Perhaps the user-written -nomolog- by Zlotnik and Abraira would fit your needs in terms of creating a score (albeit after a logistic regression).

                  Here's the link to the article from Stata Journal: http://www.stata-journal.com/article...article=st0391

                  Hopefully that helps.
                  Best regards,

                  Marcos



                  • #10
                    Here is an example of a way to import the factor loadings into your data:
                    Code:
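                    clear all
                    webuse bg2
                    factor bg2cost1-bg2cost6
                    rotate
                    * store the rotated loadings (rows = items, columns = factors)
                    matrix factor_s = e(r_L)
                    * one row per clinid-item pair, so each item can be matched to its loadings
                    reshape long bg2cost, i(clinid) j(item)
                    tempfile factors
                    save `factors', replace
                    * turn the loading matrix into variables factor_s1-factor_s3, one row per item
                    clear
                    svmat factor_s
                    generate item = _n
                    * attach each item's loadings to every clinid row for that item
                    merge 1:m item using `factors', nogen
                    sort clinid item
                    list in 1/6, sepby(clinid)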

                         +---------------------------------------------------------------+
                         | factor_s1   factor_s2   factor_s3   item   clinid     bg2cost |
                         |---------------------------------------------------------------|
                      1. |  .4210681    .1279617   -.0637191      1        1   -1.915584 |
                      2. | -.0540779    .4716006   -.0690916      2        1    .9380358 |
                      3. | -.0519108    .5287738    .0291713      3        1   -.2946705 |
                      4. | -.1207441    .3530449    .1136311      4        1    .3302429 |
                      5. |  .5115369   -.0972753    .0379866      5        1   -1.427679 |
                      6. |  .5170078   -.1185958   -.0334809      6        1   -1.012556 |
                         +---------------------------------------------------------------+
                    Now you can manipulate the loadings, for example by keeping only those at or above a chosen threshold (.4 here).

                    Code:
                    * keep only loadings of .4 or more (large negative loadings are dropped too)
                    foreach i of varlist factor_s* {
                        replace `i' = . if `i' < .4
                    }
                    list in 1/6, sepby(clinid)

                         +------------------------------------------------------------+
                         | factor~1   factor~2   factor~3   item   clinid     bg2cost |
                         |------------------------------------------------------------|
                      1. | .4210681          .          .      1        1   -1.915584 |
                      2. |        .   .4716006          .      2        1    .9380358 |
                      3. |        .   .5287738          .      3        1   -.2946705 |
                      4. |        .          .          .      4        1    .3302429 |
                      5. | .5115369          .          .      5        1   -1.427679 |
                      6. | .5170078          .          .      6        1   -1.012556 |
                         +------------------------------------------------------------+
                    This is a good starting point for later calculation of scores based on factor loadings.
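As one hypothetical next step, here is a sketch of a crude thresholded score computed directly in wide form: for factor 1, sum loading x item over the items whose rotated loading is at least .4. It assumes the bg2 data as above (-webuse- needs internet access) and the illustrative names RL and crude1. Note this is not the regression-method score that -predict- produces, and the caveat in #8 about using only some of the variables applies.

```stata
* Hypothetical crude score for factor 1 using only loadings >= .4 (bg2 data assumed)
webuse bg2, clear
factor bg2cost1-bg2cost6
rotate
matrix RL = e(r_L)
generate double crude1 = 0
forvalues j = 1/6 {
    * include item j only if its loading on factor 1 passes the threshold
    if RL[`j', 1] >= .4 {
        replace crude1 = crude1 + RL[`j', 1] * bg2cost`j'
    }
}
summarize crude1
```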



                    • #11
                      I wrote a program pcacoefsave (SSC) to save PCA results.
                      http://www.statalist.org/forums/foru...-for-pca-users

                      I kept clear of factor analysis; many of the tribal habits of factor analysts I don't understand or know about. But someone enthusiastic and knowledgeable might want to clone and extend that program for factor analysis.



                      • #12
                        Hi @Oded Mcdossi
                        Some follow-up questions:
                        1. Am I right that the rows in the table produced by

                        l in 1/6, sepby(clinid)

                        are the six variables (the numbers on the left)? I understand the role of the three factors, but not the variables item, clinid and bg2cost. Maybe I just don't understand why you reshape.

                        2. How can I then export the final list to Excel? I know putexcel, but I guess that is only for tables.

                        Sorry, I edited this post because I tried to insert the whole table mentioned above.
                        Last edited by joan marc; 30 Nov 2015, 02:36.
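On question 2: putexcel can also write a matrix, so the rotated loadings could go to Excel without being turned into variables first. A sketch, assuming Stata 14 or later (the putexcel syntax in Stata 13 was different), the bg2 data (-webuse- needs internet access), and a hypothetical file name rotated_loadings.xlsx.

```stata
* Sketch, assuming Stata 14+: write the rotated loading matrix straight to Excel
webuse bg2, clear
factor bg2cost1-bg2cost6
rotate
matrix RL = e(r_L)
putexcel set "rotated_loadings.xlsx", replace
putexcel A1 = matrix(RL), names    // row names (items) and column names (factors) included
```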



                        • #13
                          Code:
                           
                          help export excel



                          • #14
                            Hi @Nick Cox

                            I'm sorry, but I have read help export and related entries extensively and still can't manage to export the results. I don't mean the data, the stored results of a command, or a particular regression, but the tables displayed by the factor or rotate commands. I am surprised, because it should be straightforward, right? Or maybe I am just very new to this.

                            In some places I read that if you copy the output (as a table) and paste it, it should work, but the cells collapse and are not separated. A big mess.



                            • #15
                              As I understand it, Oded has shown precisely the first step you need to put the results you want in new variables. He did indicate that you may need to do other calculations.

                              See the list as the last element in #10.

                              Hence my advice just to use export excel.
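A sketch of that route, assuming the bg2 data (-webuse- needs internet access) and hypothetical names RL, variable and loadings_data.xlsx: put the rotated loadings into the dataset in memory, one row per item, and then -export excel- writes them to a spreadsheet with one column per factor.

```stata
* Sketch: rotated loadings -> variables in memory -> Excel (bg2 data assumed)
webuse bg2, clear
factor bg2cost1-bg2cost6
rotate
matrix RL = e(r_L)
local rnames : rownames RL             // item names, to label the rows
clear
svmat double RL, names(col)            // one variable per factor, named after RL's columns
generate str32 variable = ""
forvalues j = 1/`=rowsof(RL)' {
    replace variable = "`: word `j' of `rnames''" in `j'
}
order variable
export excel using "loadings_data.xlsx", firstrow(variables) replace
```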

