  • PCA for dataset with existing clusters

    Hi
    I am running a principal component analysis (PCA) on proxies for school quality. The data are grouped by region (34 regions).
    It seems that the Stata command pca varlist, cluster(...) is not valid.
    Could anyone help, please? Is there an alternative way to run the PCA and build a composite index that takes account of the existing clusters in the data?

  • #2
    Not sure what you're after, but maybe this helps: https://stats.oarc.ucla.edu/stata/fa...ents-analysis/


    • #3
      George Ford gives excellent advice.

      I will add only that the -vce(cluster ...)- option is used to provide cluster-robust adjustment of standard errors in regression analyses. -pca- is not a regression command and it does not calculate any standard errors, so there would be nothing for -vce(cluster ...)- to apply to.
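      To make the point about standard errors concrete, here is a minimal Python/NumPy sketch (not Stata code, and not Stata's exact implementation) of a CR0 cluster-robust variance for OLS. The cluster identifier enters only through the "meat" of the sandwich, which is built from regression residuals summed within each cluster. A PCA produces eigenvalues and loadings but no coefficients or residuals, so there is nothing for such an adjustment to act on. As I understand it, Stata's vce(cluster ...) also multiplies by a small-sample factor of G/(G-1) * (N-1)/(N-k), which is omitted here. The function name ols_cluster_se and the simulated data (34 regions, as in the question) are illustrative assumptions.

```python
import numpy as np


def ols_cluster_se(X, y, groups):
    """OLS coefficients and CR0 cluster-robust standard errors (no small-sample factor)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        in_g = groups == g
        score_g = X[in_g].T @ resid[in_g]  # sum of x_i * e_i within cluster g
        meat += np.outer(score_g, score_g)
    vcov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))


# Hypothetical data: 34 regions with a shared shock inside each region
rng = np.random.default_rng(1)
n_regions, per_region = 34, 25
groups = np.repeat(np.arange(n_regions), per_region)
x = rng.normal(size=groups.size)
u = rng.normal(size=n_regions)[groups] + rng.normal(size=groups.size)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones_like(x), x])

beta, se_cluster = ols_cluster_se(X, y, groups)
```

      With every observation placed in its own cluster, the same function reduces to the heteroskedasticity-robust (HC0) variance, which is a quick way to check it.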


      • #4
        What may apply is that if you use principal component scores in some later model fit, then taking account of clusters may be a good idea at that later stage. I have never found using PC scores to be superior to using original data, but circumstances, tastes and experiences do vary.
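        A minimal sketch of that two-stage idea, in Python/NumPy rather than Stata: a correlation-based PCA (the default for Stata's pca) followed by component scores computed from standardized variables, roughly what predict with the score option gives after pca (assumption: standardization uses the sample standard deviation). The names pca_scores, proxies and index are hypothetical, and the data are simulated. Nothing here uses a cluster identifier; the first-component score is just a new variable, and clustering only comes in when it enters a later model, e.g. something like regress outcome index, vce(cluster region) with hypothetical variable names.

```python
import numpy as np


def pca_scores(data):
    """Correlation-based PCA: eigenvalues (largest first), eigenvectors, and scores."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardize each variable
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, z @ eigvecs


# Hypothetical school-quality proxies driven by one common factor
rng = np.random.default_rng(2)
n = 200
quality = rng.normal(size=n)
proxies = np.column_stack(
    [quality + rng.normal(scale=s, size=n) for s in (0.5, 0.7, 1.0, 1.5)]
)

eigvals, eigvecs, scores = pca_scores(proxies)
index = scores[:, 0]  # first-component score used as a composite index
share = eigvals[0] / eigvals.sum()  # proportion of total variance it captures
```

        The signs of eigenvectors are arbitrary, so the index may need to be multiplied by -1 so that higher values mean higher quality.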


        • #5
          Going back to the inner logic of PCA: the correlations and variances of the VARIABLES are the input from which the components are derived, and one then chooses the components with the highest variance (eigenvalue). In the method as I know it, and as Stata implements it, the location of the OBSERVATIONS (for example, which region each one belongs to) does not matter, which is clearly interesting. You might want to run a separate PCA for each region. If so, you are assuming that the regions are independent, and you would obtain a set of components for each region, which might differ enough to accomplish what you have in mind. Alternatively, you might run the PCA on all observations, obtain the scores, regress them on dummy variables for each region (possibly using the cluster variance-covariance option), and then proceed with the predicted values. This would give you a score that accounts for region.
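          The three points in this reply can be sketched in Python/NumPy (an illustration of the logic, not Stata code; all function names and the simulated data are hypothetical): (1) the pooled PCA is a function of the variable matrix alone, so reordering observations leaves the eigenvalues unchanged and no region identifier is ever used; (2) running the PCA separately within each region gives region-specific eigenvalues; (3) regressing the pooled first-component score on a full set of region dummies, and nothing else, gives predicted values equal to each region's mean score.

```python
import numpy as np


def corr_eigenvalues(data):
    """Eigenvalues (largest first) of the correlation matrix of the columns of data."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def first_pc_score(data):
    """Score on the first principal component, computed from standardized variables."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    return z @ eigvecs[:, np.argmax(eigvals)]


def region_dummy_fit(score, region):
    """OLS of score on one dummy per region (no constant); returns the fitted values."""
    labels, inverse = np.unique(region, return_inverse=True)
    dummies = np.zeros((len(score), len(labels)))
    dummies[np.arange(len(score)), inverse] = 1.0
    coef, *_ = np.linalg.lstsq(dummies, score, rcond=None)
    return dummies @ coef


# Hypothetical data: 34 regions, 30 schools each, 4 quality proxies
rng = np.random.default_rng(0)
n_regions, per_region = 34, 30
region = np.repeat(np.arange(n_regions), per_region)
region_effect = rng.normal(size=(n_regions, 1))
data = region_effect[region] + rng.normal(size=(region.size, 4))

# (1) Pooled PCA uses only the data matrix; reordering rows changes nothing
pooled = corr_eigenvalues(data)
shuffled = corr_eigenvalues(data[rng.permutation(region.size)])

# (2) One PCA per region: eigenvalues can differ from region to region
by_region = {r: corr_eigenvalues(data[region == r]) for r in range(n_regions)}

# (3) Pooled score regressed on region dummies: fitted values are region means
score = first_pc_score(data)
fitted = region_dummy_fit(score, region)
```

          If other covariates are added to the dummy regression, the predicted values are no longer simple region means.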


          • #6
            Thank you, Rodrigo, for your suggestion. I am not sure I fully understand it. In any case, I ran the PCA on all observations and then ran my regressions using the cluster(...) option.
