
  • #16
    Santosh Pathak: Although you're new to the thread, several of the previous replies appear to apply to your question.

    I agree with Clyde Schechter and add a few further comments.

    In general, if you are using PCA then PC1 is the best single summary of the input variables and you can't improve on it by mushing it together with other PCs. This is a surprisingly common fallacy.

    Something like

    Code:
    predict PC1
    calculates the PC1 scores after pca; there is no (zero!) need for any other calculation.
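A minimal sketch of that one-step approach. It assumes Stata's shipped auto data, with price, mpg, weight and length standing in for real indicators; those choices are for illustration only. It also checks that the sample variance of the PC1 scores equals the first eigenvalue. That is the sense in which PC1 is the best single summary: no other unit-length weighting of the standardized variables has a larger variance.

```stata
* Illustration only: auto data stand in for the user's indicators
sysuse auto, clear

* pca uses the correlation matrix by default, so variables are standardized internally
pca price mpg weight length

* score is the default statistic, so this creates the PC1 scores
predict PC1

* keep the eigenvalues for the comparison below
matrix Ev = e(Ev)

* the sample variance of the PC1 scores matches the first eigenvalue
summarize PC1
display "variance of PC1 = " r(Var) "   first eigenvalue = " Ev[1,1]
```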

    There is a broader question of whether PCA is a good idea here. I don't work in this field, but I've seen several questions on Statalist that evidently are from students trying to follow earlier papers that used PCA to get some of the predictors.

    That the first PC for 30 indicators captures 11% of the total variation is evidently disappointing to you, but I don't find it at all surprising for the kind of data I guess you have.
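Where a figure like 11% comes from: with a correlation-based pca, the eigenvalues sum to the number of variables. PC1's share is therefore its eigenvalue divided by that number. For 30 indicators, 11% corresponds to a first eigenvalue of about 3.3. A sketch on the same stand-in auto data (again an assumption, for illustration only). pca itself reports these shares in its Proportion column.

```stata
* Illustration only: auto data stand in for the real indicators
sysuse auto, clear
quietly pca price mpg weight length

matrix Ev = e(Ev)
local p = colsof(Ev)

* with correlations the eigenvalues sum to p, the number of variables
matrix total = Ev * J(`p', 1, 1)
display "sum of eigenvalues = " total[1,1] "   (p = `p')"

* proportion of total variation captured by PC1
display "PC1 share = " Ev[1,1] / total[1,1]
```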


    • #17
      Thank you so much Nick Cox and Clyde Schechter. It is clearer now.

      Clyde Schechter: I am using code with the outline:

      local x x1 x2 x3 x4
      foreach x of local x {
      egen n_`x' = std(`x')
      }

      pca n_x1 - n_x4

      egen index = (component 1 score of x1)* n_x1 + ......+ (component 1 score of x4)* n_x4
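Such an index can be built by hand and compared with what predict gives. A sketch, assuming Stata's shipped auto data with price, mpg, weight and length standing in for x1 to x4 (an illustrative choice only). It uses generate rather than egen. The weights are the component 1 loadings stored in e(L), one per variable. The hand-built index and the predicted PC1 scores should agree up to rounding.

```stata
* Illustration only: auto variables stand in for x1-x4
sysuse auto, clear

* standardize each variable with egen's std() function
foreach v in price mpg weight length {
    egen n_`v' = std(`v')
}

pca n_price n_mpg n_weight n_length

* the simple route: PC1 scores straight from predict
predict pc1, score

* the long route: column 1 of e(L) holds the loadings (weights), one per variable
matrix L = e(L)
generate index = L[1,1]*n_price + L[2,1]*n_mpg + L[3,1]*n_weight + L[4,1]*n_length

* compare the two
summarize pc1 index
```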


      Regards,
      Santosh



      • #18
        It's my turn not to understand. The implication of #2, #4, #16 and other posts in this thread by Clyde Schechter and myself is that PC1 is the best single summary of a set of variables, using either the correlation structure or the covariance structure as your guide. You can get the PC1 scores directly using predict after pca.

        Even allowing that your code outline isn't meant literally, note that egen is illegal for what I think you want, and that generate would be unnecessary given that, as already said, predict will do it for you.

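        Your word equation also uses the term score incorrectly: the numbers multiplying the standardized variables are loadings (elements of the eigenvector for component 1), whereas the scores are the resulting values, one per observation.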
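The distinction between loadings and scores is easy to see in a small sketch. It assumes Stata's shipped auto data as a stand-in (illustrative only). The loadings in e(L) form a matrix with one row per variable and one column per component. Under pca's default normalization, each column has unit length. The scores from predict are a new variable with one value per observation.

```stata
* Illustration only: auto data stand in for the real variables
sysuse auto, clear
quietly pca price mpg weight length

* loadings: one row per variable, one column per component
matrix L = e(L)
matrix list L

* each column of eigenvectors has unit length
matrix v = L[1..., 1]
matrix c1 = v' * v
display "sum of squared PC1 loadings = " c1[1,1]

* scores: one value per observation
predict score1, score
list make score1 in 1/5
```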

        There are entire books on PCA and a chapter on PCA is also customary in survey texts on multivariate statistics. If you're going to use PCA at all, and I am not clear that you really need it, it's worth finding a congenial text to get all this straight.



        • #19
          Thank you Nick Cox for your kind suggestions.



          • #20
            Clyde Schechter and Nick Cox, thank you for your explanations. I have the same question that Stephanie posed here. I am trying to calculate a wealth score from 25 variables. When I use

            Code:
            predict wealthscore
            it generates the PC1 scores. When I use

            Code:
            predict pc1 pc2, score
            it generates two separate columns.

            I was wondering how I can generate one score that covers both pc1 and pc2. For instance, can I have a wealthscore that combines pc1 and pc2?

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(wealthscore pc1 pc2)
              1.9262853   1.9262853    .961527
              3.3533766   3.3533766  .51521915
              1.3404053   1.3404053   .3001724
              .53514564   .53514564   .4931327
              -1.274623   -1.274623  .10186155
              -.3982685   -.3982685   .9355345
             -1.8886043  -1.8886043   .5828356
              -3.481718   -3.481718  -1.284943
             -.26558822  -.26558822    .821162
             -1.4841796  -1.4841796 -.16323125
               .7865319    .7865319   .9410244
             -.58590585  -.58590585  .18407133
            end
            Last edited by Hadi Kahalzadeh; 05 Oct 2020, 16:50.



            • #21
              #20 is the same question as asked and answered in several posts already in this thread.
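Why a mix of PC1 and PC2 cannot beat PC1 as a single summary can be checked numerically. A sketch on stand-in data (Stata's shipped auto data, an assumption for illustration only). The component scores are uncorrelated. So for weights a and b with a² + b² = 1, the mix a·pc1 + b·pc2 has variance a²λ1 + b²λ2, which cannot exceed λ1. The equal weights below are one arbitrary, hypothetical choice.

```stata
* Illustration only: auto data stand in for the 25 wealth indicators
sysuse auto, clear
quietly pca price mpg weight length
matrix Ev = e(Ev)

predict pc1 pc2, score

* the component scores are uncorrelated
correlate pc1 pc2

* a hypothetical unit-length mix of the two, weights 1/sqrt(2) each
generate mix = (pc1 + pc2) / sqrt(2)

* its variance is the average of the first two eigenvalues, less than the first
summarize mix
display "variance of mix = " r(Var) "   lambda1 = " Ev[1,1] "   lambda2 = " Ev[1,2]
```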
