Principal Component Analysis

Philipp Hoffmann

Join Date: Jun 2019

Posts: 12
#1

14 Nov 2024, 01:51

Hi everyone,

I have a question about PCA and factor analysis more generally. I want to create an index and validate it with PCA. My question: can PCA, and factor analysis in general, be used only with (quasi-)metric variables?

I have several 0/1 dummy variables, and I want to build a count index that counts the 1s. As a first step, though, I need to validate the variables with PCA.

Many thanks in advance!
Nick Cox

Join Date: Mar 2014

Posts: 35446
#2

14 Nov 2024, 04:13

How do you validate variables with PCA?

I have very mixed feelings about PCA. I read positive accounts about it and the sales pitch sounds great. Then I try it and it works well if and only if I have a bundle of very highly correlated variables on the same footing, and I want to extract a single dimension, or perhaps a very small number of dimensions. But even then I would be better off, say, picking one variable as similar to the others, or averaging directly.
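As a rough illustration of the "averaging directly" point, here is a minimal Python sketch (standard library only). It assumes an idealised bundle of five variables that all correlate 0.8 with each other, and it uses plain power iteration as a stand-in for a full PCA routine. The helper names `equicorrelation` and `leading_component` are made up for this example.

```python
import math

def equicorrelation(k, r):
    """k x k correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]

def leading_component(corr, iters=500):
    """Power iteration: leading eigenvalue and unit eigenvector of a symmetric matrix."""
    k = len(corr)
    v = [1.0 + 0.1 * i for i in range(k)]  # deliberately unequal starting weights
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    eigval = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return eigval, v

eigval, weights = leading_component(equicorrelation(5, 0.8))
print(round(eigval, 3), [round(w, 3) for w in weights])
```

In this idealised case the first component puts the same weight, 1/sqrt(5), on every standardized variable, so its score is just a rescaled average of them. Real data will not be exactly equicorrelated, so the weights will differ somewhat.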

I think you need to listen for the silence, for the experts, books and literatures that never use PCA at all.

In most applications I read about people seem optimistic that PCA has white magic to extract latent dimensions from an arbitrary bundle of variables. The optimism is especially strong if the variables have quite different units or measurement scales. But PCA is not like a washing machine that removes dirt from your clothes. Using the same strained analogy, the dirt just gets redistributed.

That's largely oblique to your question, as I don't know what you mean by validation. But the only rule seems to be to use what works. Stata's pca command won't object to input of (e.g.) indicator variables, but no PCA routine can find anything but a structure in terms of correlations and linear relations.
Felix Bittmann

Join Date: Aug 2018

Posts: 663
#3

14 Nov 2024, 07:26

I came across these sources recently, maybe they are helpful for you:

Kubinger, K. D. 2003. On artificial results due to using factor analysis for dichotomous variables. Psychology Science 45: 106–110.

Gadermann, A. M., M. Guhn, and B. D. Zumbo. 2012. Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research & Evaluation 17(3).

Best wishes

(Stata 16.1 MP)
Erik Ruzek

Join Date: Oct 2017

Posts: 418
#4

14 Nov 2024, 17:23

With binary and ordinal variables, Stata's pca is not appropriate because it uses Pearson correlations. You want to use tetrachoric and polychoric correlations for 0/1 and 0/1/2.. variables, respectively. Stas Kolenikov's polychoric command has what you need (search polychoric). Also see this helpful UCLA tutorial on factor analysis on binary and ordinal data.
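To make the concern about Pearson correlations on 0/1 variables concrete, here is a small simulation sketch in Python (standard library only). It assumes two latent normal variables with a true correlation of 0.6 and cuts them into 0/1 indicators at different thresholds. The function names, thresholds, sample size and seed are arbitrary choices for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation; on 0/1 data this is the phi coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def phi_after_cut(rho, cut_x, cut_y, n=50000, seed=1):
    """Simulate latent normals with correlation rho, dichotomize, return Pearson r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = z1
        y = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # corr(x, y) = rho
        xs.append(1 if x > cut_x else 0)
        ys.append(1 if y > cut_y else 0)
    return pearson(xs, ys)

phi_median = phi_after_cut(0.6, 0.0, 0.0)   # both variables split at the median
phi_skewed = phi_after_cut(0.6, 1.0, -1.0)  # one rare indicator, one common indicator
print(round(phi_median, 2), round(phi_skewed, 2))
```

Both Pearson values fall below the latent 0.6, and the one from the unequal cut points is much smaller again, even though the underlying relationship is identical. That dependence on how often each indicator equals 1 is what tetrachoric correlations are meant to remove.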
George Ford

Join Date: Aug 2014

Posts: 3121
#5

14 Nov 2024, 17:24

With dichotomous variables, you need to get the tetrachoric correlation matrix and apply factor analysis to that matrix. I think there's an example in the help file.

https://www.stata.com/manuals/rtetrachoric.pdf
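
The two-step idea (build a tetrachoric matrix, then analyse that matrix) can be sketched in Python with the standard library only. This is not what Stata does internally: the sketch swaps in the classical cosine-pi approximation for the tetrachoric correlation, which is only reliable when the 0/1 splits are not too lopsided, and it extracts just the first principal component by power iteration rather than fitting a factor model. The simulated data (four items, one latent factor, latent loading 0.8) and all function names are assumptions for illustration.

```python
import math
import random

def simulate_items(k=4, loading=0.8, n=20000, seed=2):
    """k binary items driven by one latent factor, each cut at zero."""
    rng = random.Random(seed)
    items = [[] for _ in range(k)]
    for _ in range(n):
        f = rng.gauss(0, 1)
        for col in items:
            latent = loading * f + math.sqrt(1 - loading ** 2) * rng.gauss(0, 1)
            col.append(1 if latent > 0 else 0)
    return items

def tetrachoric_approx(x, y):
    """Cosine-pi approximation from the 2x2 table (0.5 added to each cell)."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1) + 0.5
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0) + 0.5
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1) + 0.5
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0) + 0.5
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))

def tetrachoric_matrix(items):
    k = len(items)
    return [[1.0 if i == j else tetrachoric_approx(items[i], items[j])
             for j in range(k)] for i in range(k)]

def leading_component(corr, iters=500):
    """Power iteration: leading eigenvalue and unit eigenvector."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return eigval, v

R = tetrachoric_matrix(simulate_items())
eigval, vec = leading_component(R)
loadings = [math.sqrt(eigval) * x for x in vec]  # component loadings, not factor loadings
print([round(l, 2) for l in loadings])
```

With this setup the off-diagonal entries of R should land near the latent correlation of 0.64 (0.8 squared). The component loadings come out somewhat above 0.8 because a principal component also absorbs unique variance, which is one reason factor analysis of the matrix can give different numbers from PCA.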
Nick Cox

Join Date: Mar 2014

Posts: 35446
#6

15 Nov 2024, 07:39

It doesn't seem to be much in favour in Stata circles but correspondence analysis seems to be an alternative here -- given that you trust multivariate analysis to do a good job.