  • Principal Component Analysis

    Hi everyone,

    I have a question about PCA and, more generally, factor analysis. I want to create an index and validate it with PCA. My question: can PCA, and factor analysis in general, be used only with (quasi-)metric variables?

    I have several 0/1 dummy variables, and I want to create a count index that counts the 1s. But as a first step, I need to validate the variables with PCA.

    Many thanks in advance!

  • #2
    How do you validate variables with PCA?

    I have very mixed feelings about PCA. I read positive accounts about it and the sales pitch sounds great. Then I try it and it works well if and only if I have a bundle of very highly correlated variables on the same footing, and I want to extract a single dimension, or perhaps a very small number of dimensions. But even then I would be better off with, say, picking one variable as representative of the others or averaging them directly.

    I think you need to listen for the silence, for the experts, books and literatures that never use PCA at all.

    In most applications I read about, people seem optimistic that PCA has white magic to extract latent dimensions from an arbitrary bundle of variables. The optimism is especially strong if the variables have quite different units or measurement scales. But PCA is not like a washing machine that removes dirt from your clothes. Using the same strained analogy, the dirt just gets redistributed.

    That's largely oblique to your question, as I don't know what you mean by validation. But the only rule seems to be to use what works. Stata's pca command won't object to input of (e.g.) indicator variables, but no PCA routine can find anything but a structure in terms of correlations and linear relations.

    • #3
      I came across these sources recently, maybe they are helpful for you:

      Kubinger, K. D. 2003. On artificial results due to using factor analysis for dichotomous variables. Psychology Science 45: 106–110.

      Gadermann, A. M., M. Guhn, and B. D. Zumbo. 2012. Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research & Evaluation 17(3).

      Best wishes

      (Stata 16.1 MP)

      • #4
        With binary and ordinal variables, Stata's pca is not appropriate because it uses Pearson correlations. You want to use tetrachoric correlations for 0/1 variables and polychoric correlations for 0/1/2/... variables. Stas Kolenikov's polychoric command has what you need (search polychoric). Also see this helpful UCLA tutorial on factor analysis on binary and ordinal data.
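
        To see why the choice of correlation matters for 0/1 items, here is a minimal sketch in plain Python (not Stata). It compares the Pearson correlation of two dummies (the phi coefficient) with Pearson's cos-pi approximation to the tetrachoric correlation. The function names and the 2x2 counts are made up for illustration, and the cos-pi formula is only a rough approximation, not necessarily the estimator that tetrachoric or polychoric use.

```python
import math


def phi_coefficient(a, b, c, d):
    """Pearson correlation of two 0/1 variables from a 2x2 table.

    a = both 1, b = first 1 / second 0, c = first 0 / second 1, d = both 0.
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def tetrachoric_cos_pi(a, b, c, d):
    """Cos-pi approximation to the tetrachoric correlation (illustrative only)."""
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        raise ValueError("table is degenerate: correlation undefined")
    if bc == 0:
        return 1.0   # no discordant pairs
    if ad == 0:
        return -1.0  # no concordant pairs
    odds_ratio = ad / bc
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))


# Hypothetical counts for two dummy items (assumption, not real data).
a, b, c, d = 40, 10, 10, 40
phi = phi_coefficient(a, b, c, d)
tet = tetrachoric_cos_pi(a, b, c, d)
print(f"phi (Pearson on 0/1 data): {phi:.3f}")
print(f"tetrachoric (cos-pi approx): {tet:.3f}")
```

        With these made-up counts the Pearson correlation is noticeably smaller than the approximate tetrachoric correlation, so a PCA run on Pearson correlations of dummies starts from weaker associations than the assumed underlying continuous variables would show.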

        • #5
          With dichotomous variables, you need to get the tetrachoric correlation matrix and apply factor analysis to that matrix. I think there's an example in the help file.

          https://www.stata.com/manuals/rtetrachoric.pdf
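
          As a rough illustration of the second step (analysing a correlation matrix rather than raw data), the sketch below extracts the leading principal component from a correlation matrix by power iteration, in plain Python. The matrix values are hypothetical stand-ins for a tetrachoric matrix, and the function name is invented here. In Stata itself, commands that accept a matrix, such as factormat or pcamat, would do this job.

```python
import math


def leading_component(corr, iterations=500, tol=1e-12):
    """Leading eigenvalue and loadings of a correlation matrix (power iteration)."""
    n = len(corr)
    vec = [1.0 / math.sqrt(n)] * n  # start from a unit vector with equal weights
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the matrix by the current vector and renormalise.
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        new_vec = [x / norm for x in product]
        # Rayleigh quotient gives the current eigenvalue estimate.
        new_eigenvalue = sum(
            new_vec[i] * sum(corr[i][j] * new_vec[j] for j in range(n))
            for i in range(n)
        )
        converged = abs(new_eigenvalue - eigenvalue) < tol
        vec, eigenvalue = new_vec, new_eigenvalue
        if converged:
            break
    # Component loadings: eigenvector scaled by the square root of the eigenvalue.
    loadings = [math.sqrt(eigenvalue) * x for x in vec]
    return eigenvalue, loadings


# Hypothetical tetrachoric correlations among three 0/1 items (assumption).
rho = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
eigenvalue, loadings = leading_component(rho)
print("share of variance on first component:", round(eigenvalue / len(rho), 3))
print("loadings:", [round(x, 3) for x in loadings])
```

          If the first component carries most of the variance and all items load on it with the same sign, that is at least consistent with summing the items into a single count index.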

          • #6
            Correspondence analysis doesn't seem to be much in favour in Stata circles, but it looks like an alternative here, provided you trust multivariate analysis to do a good job.
