  • PCA for Dependent Variable?

    Hello all,

    Although familiar with the concepts associated with principal component analysis, I am using it for the first time myself.

    I have 17 variables on which I hope to run PCA for the purpose of data reduction. Essentially, these measures represent three different dimensions of "social capital," as identified in the literature (i.e., 1. solidarity, trust, and tolerance; 2. a strong associational life; and 3. political and civic engagement). My data are cross-sectional time-series, and the observations are at the country level. I have 16 countries and five periods, for a total of 80 observations. My Stata version is 14.2.

    After running the PCA, there are 5 components with eigenvalues over 1, which cumulatively account for .74 of the variance. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy gives an overall value of .6375 (so, PCA is justified). I then used predict to create component scores (i.e., pc1, pc2, pc3, pc4, pc5).

    My question is about the next step: creating a single variable by combining these five component variables to serve as the dependent variable for my analysis. Basically, social capital is my dependent variable. Rather than run 5 regressions, with each of the five component variables serving as its own separate DV, I would like to combine them into a single composite (i.e., index) variable to use as my primary dependent variable.

    My first question is whether there are any issues with doing this that I should be aware of. My second question is whether there is a preferred approach to creating such a composite/index variable. I am also open to any other related tips or advice anyone might have on conducting such an analysis.

    Thank you in advance for your time and help!

    Best,
    Catie
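
    For readers following along, the steps described in this post might look roughly like the sketch below. The variable names v1-v17 are hypothetical stand-ins for the 17 measures, which the post does not name; the commands themselves are standard Stata.

    ```stata
    * Sketch only: v1-v17 are assumed names for the 17 social-capital measures
    pca v1-v17, mineigen(1)              // retain components with eigenvalue > 1
    screeplot, yline(1)                  // plot eigenvalues against the cutoff of 1
    estat kmo                            // Kaiser-Meyer-Olkin sampling adequacy
    predict pc1 pc2 pc3 pc4 pc5, score   // scores for the five retained components
    ```

    Because the components are constructed to be mutually uncorrelated, the five score variables each carry a different part of the variation, which is the point taken up in the replies.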




  • #2
    Combining them into a single index would seem to go against the whole idea of a PCA, namely that you are extracting separable components. If you had extracted a single component accounting for most of the variation, then there would be reason to create a single index.



    • #3
      Hi!

      Thanks for your answer, I appreciate it.

      The single component with the highest eigenvalue only accounts for about .28 of the variance. My understanding is that your target should be components that (cumulatively) account for about .7 or more. But, I see your point. It may be that PCA is not the correct approach. I have also been doing more research, and it seems that it is controversial to use PCA for your outcome variable, which is what I was originally intending.

      Now, I am just curious more broadly about the utility and appropriateness of PCA for anything other than data reduction. For example, erring on the side of the argument that PCA should only be used for predictor variables, is there ever a good reason to create a single index variable out of multiple principal component variables? For example, what if you wanted to use this index variable in an interaction?

      I would love to hear any thoughts you (or anyone else with more experience on this than myself) might have on the utility of PCA, and whether there is ever a justification to combine components into a single index variable (and, if so, the preferred approach for doing so)?

      Thanks again!



      • #4
        PC1 is the best single summary you can get from a PCA in terms of its criteria. I don't know where the idea you sketch came from -- I have often seen it mentioned in forums -- but there is no logic to it. If it were a really good idea, you can be sure that it would be a wired-in option to predict after pca.

        Alternatively, given a hypothesis that there are three key latent variables, you might be much better off seeking the best summary of each. I suspect that many people would mutter structural equation models at this point but I am not a good person to expand on that.

        I like the ideas in

        Cumming, J.A. and Wooff, D.A. 2007. Dimension reduction via principal variables. Computational Statistics & Data Analysis 52: 550-565.

        which I have been meaning to implement in Stata for several years.
        Last edited by Nick Cox; 21 Jun 2018, 00:19.
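
        As a rough illustration of the two suggestions in this reply (use PC1 alone, or summarise each hypothesised dimension separately), one possible sketch follows. All variable names are assumptions: v1-v17 stand for the full set of measures, and trust1-trust5, assoc1-assoc6 and civic1-civic6 stand for a hypothetical split of those measures across the three dimensions.

        ```stata
        * Option A: a single summary, the first principal component of all items
        pca v1-v17, components(1)
        estat loadings                 // which items drive PC1, and with what sign
        predict pc1, score

        * Option B: one summary per hypothesised dimension (hypothetical item groups)
        pca trust1-trust5, components(1)
        predict pc_trust, score
        pca assoc1-assoc6, components(1)
        predict pc_assoc, score
        pca civic1-civic6, components(1)
        predict pc_civic, score
        ```

        Option B yields three separate outcome or predictor variables rather than one index, in line with the suggestion to seek the best summary of each latent variable.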



        • #5
          Thank you for your response.

          I am currently leaning toward factor analysis. It seems to be the preferred approach for this sort of thing, based on my reading so far. And FA shows strong support for the interrelationship between the variables I have employed to operationalize the three primary components of social capital identified in the literature (i.e., 1. tolerance and equality, 2. political and social engagement, and 3. political trust). I am still thinking through whether to use the factor scores for each as the indexes representing these three sub-components of social capital, or alternatively to create linear combinations of these three sets of variables to serve as the composites (probably by averaging them together). I am leaning toward the former, since determining the optimal weights for combining these measures seems to be the primary advantage of using a technique like FA. But the latter seems more intuitive for interpreting the meaning of the coefficients. I also still wonder whether the latter lends itself more nicely to creating a single, "master" index of social capital out of these three sub-components. (I may also opt not to create a single index of social capital and just keep them as three different predictors in my analysis.)

          I did see some mention of SEM as a potential alternative in my reading, however. I will look into this further and also check out the article you recommended.

          Thanks again for sharing your time and expertise!
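
          The two options weighed in this post (factor scores versus simple averages) could be compared with something like the sketch below. The item names are hypothetical: v1-v17 for all measures and tol1-tol5 for the items assumed to belong to the tolerance and equality dimension. The oblique promax rotation is also an assumption, chosen here only because sub-dimensions of one concept are often allowed to correlate.

          ```stata
          * Option 1: factor scores, with weights estimated from the data
          factor v1-v17, factors(3)      // principal-factor method by default
          rotate, promax                 // oblique rotation (an assumption, see above)
          predict f1 f2 f3               // regression-scored factors

          * Option 2: unit-weighted composite for one dimension (hypothetical items)
          alpha tol1-tol5, std item generate(tolerance_idx)

          * How closely do the two versions of a dimension agree?
          correlate f1 f2 f3 tolerance_idx
          ```

          If the factor score and the averaged composite for a dimension are very highly correlated, the choice between them is mostly about interpretability rather than substance.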



          • #6
            Dear all,

            Based on the discussion above, I have a question about the interpretation of PCA indices as an outcome variable. I did something similar to what Catie described: I formed three dimensions for measuring women's empowerment (personal, relational and environmental), for which a survey of 28 questions was constructed. I then created PCA indices out of these questions for the three dimensions, resulting in 4 components for the social, and 2 components for the relational and environmental dimension. After creating the indices, I ran a kernel PSM analysis; however, I do not understand how to read the output correctly:

            - How to read the significance? (DF --> look up in table?)
            - How to interpret the outcomes? E.g., what does the ATT of -.664 mean? A negative impact on pcarelational 1?

            Code:
            psmatch2 round age_1 i.schoolatt howmanyHHM_1 i.partner i.childmortality schoolagedchildren_1, ///
                outcome(pcamaterial1 pcamaterial2 pcaenvironmental1 pcapersonal1 pcapersonal2 pcapersonal3 pcapersonal4) ///
                kernel kerneltype(epan) bwidth(.05)
            [Attached image: output pca.png]

            Would be great if someone could share some thoughts.

            Best,

            Linda
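
            On reading output of this kind, a hedged sketch for a single outcome is given below. It reuses the treatment variable, covariates and options from the command in this post and assumes the user-written psmatch2 package is installed. The ATT is the difference in the mean outcome between treated units and their matched controls, so a negative value means the treated group scores lower on that component. Component scores have no natural units and the sign of a component is arbitrary, so the direction only becomes meaningful once the loadings are inspected. The help file for psmatch2 notes that its reported standard errors do not account for the propensity score being estimated, which is why bootstrapping is shown as one common workaround; the number of replications and the seed are arbitrary choices.

            ```stata
            * psmatch2 is user-written; install once with: ssc install psmatch2
            psmatch2 round age_1 i.schoolatt howmanyHHM_1 i.partner i.childmortality schoolagedchildren_1, ///
                outcome(pcapersonal1) kernel kerneltype(epan) bwidth(.05)

            * The T-stat in the output is the ATT divided by its standard error
            display "ATT = " r(att) "   t = " r(att)/r(seatt)

            * Bootstrapped standard error for the ATT (reps and seed are arbitrary)
            bootstrap att = r(att), reps(200) seed(12345): ///
                psmatch2 round age_1 i.schoolatt howmanyHHM_1 i.partner i.childmortality schoolagedchildren_1, ///
                outcome(pcapersonal1) kernel kerneltype(epan) bwidth(.05)
            ```

            As a rule of thumb, an absolute t value above roughly 2 corresponds to significance at about the 5% level in reasonably large samples, so no lookup by degrees of freedom is usually needed.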

