After varimax rotation, Stata says factors explain more than 100% of the variance.

Graham Wright

Join Date: Nov 2021

Posts: 4
#1

05 Nov 2021, 13:25

I'm using "factor". When I ask for a rotated solution (using the default, orthogonal varimax), it gives me, aside from the loadings etc., a value called "proportion" for each factor (with the first factor having the highest value). Stata's documentation never explicitly says what this value is, but I assume that "proportion" represents the proportion of the total variance in the observed variables explained by each latent factor. However, when I view a rotated solution, the "proportion" values often add up to more than 1, implying that the factors together explain more than 100% of the variance in the observed variables, which seems nonsensical. This can be seen in Stata's example dataset for the "factor" command:

. webuse bg2
(Physician-cost data)

. factor bg2cost1 bg2cost2 bg2cost3 bg2cost4 bg2cost5 bg2cost6
(obs=568)

Factor analysis/correlation                  Number of obs    =        568
Method: principal factors                    Retained factors =          3
Rotation: (unrotated)                        Number of params =         15

--------------------------------------------------------------------------
     Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      0.85389      0.31282            1.0310       1.0310
    Factor2  |      0.54107      0.51786            0.6533       1.6844
    Factor3  |      0.02321      0.17288            0.0280       1.7124
    Factor4  |     -0.14967      0.03951           -0.1807       1.5317
    Factor5  |     -0.18918      0.06197           -0.2284       1.3033
    Factor6  |     -0.25115            .           -0.3033       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(15) = 269.07 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

-----------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3 |   Uniqueness
-------------+------------------------------+--------------
    bg2cost1 |   0.2470    0.3670   -0.0446 |      0.8023
    bg2cost2 |  -0.3374    0.3321   -0.0772 |      0.7699
    bg2cost3 |  -0.3764    0.3756    0.0204 |      0.7169
    bg2cost4 |  -0.3221    0.1942    0.1034 |      0.8479
    bg2cost5 |   0.4550    0.2479    0.0641 |      0.7274
    bg2cost6 |   0.4760    0.2364   -0.0068 |      0.7175
-----------------------------------------------------------

.
.
.
. rotate

Factor analysis/correlation                  Number of obs    =        568
Method: principal factors                    Retained factors =          3
Rotation: orthogonal varimax (Kaiser off)    Number of params =         15

--------------------------------------------------------------------------
     Factor  |     Variance   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      0.72646      0.05991            0.8772       0.8772
    Factor2  |      0.66655      0.64139            0.8048       1.6820
    Factor3  |      0.02516            .            0.0304       1.7124
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(15) = 269.07 Prob>chi2 = 0.0000

You can see that the "proportion" values for the 3 retained factors add up to 1.7124.

So what's going on here? Obviously I'm confused about what "proportion" means, but Stata's documentation doesn't seem to provide any guidance on how I should interpret this value. Also, how can I correctly characterize the explanatory power of rotated factors? I would like to be able to say that my first factor explains X% of the variance while the second factor explains Y%, but is that even possible with rotated solutions?

Someone else already asked this exact same question but never got a response:
https://www.statalist.org/forums/for...eater-than-100
Fei Wang

Join Date: Oct 2021

Posts: 726
#2

06 Nov 2021, 09:44

Graham, the issue is that the analysis produces negative eigenvalues. A Stata FAQ explains this: https://www.stata.com/support/faqs/s...e-eigenvalues/.
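
The arithmetic can be checked against the output in #1. This is a minimal sketch, assuming that "Proportion" is each eigenvalue (or rotated variance) divided by the sum of all six eigenvalues, negative ones included. The numbers are typed in from the tables in #1, so the results agree with Stata's only up to rounding in the last digit. Run it from a do-file, since "//" comments are not accepted at the interactive prompt.

```stata
* Sum of all six eigenvalues from the unrotated table in #1
scalar evsum = 0.85389 + 0.54107 + 0.02321 - 0.14967 - 0.18918 - 0.25115
display %7.5f scalar(evsum)

* Assumed definition: Proportion = eigenvalue / sum of all eigenvalues
display %6.4f 0.85389/scalar(evsum)    // unrotated Factor1
display %6.4f 0.72646/scalar(evsum)    // rotated Factor1
display %6.4f 0.66655/scalar(evsum)    // rotated Factor2
display %6.4f 0.02516/scalar(evsum)    // rotated Factor3
```

The three negative eigenvalues pull the sum down to about 0.828, while the three retained factors sum to about 1.418. Dividing the second by the first gives the cumulative value of about 1.71 reported for the retained factors.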
Nick Cox

Join Date: Mar 2014

Posts: 35431
#3

07 Nov 2021, 08:01

Cross-posted at https://stats.stackexchange.com/ques...f-the-variance

Please note our policy about cross-posting, which is that you should tell us about it. https://www.statalist.org/forums/help#crossposting

Discussion on Cross Validated led to identification of previous detailed discussions.
Graham Wright

Join Date: Nov 2021

Posts: 4
#4

07 Nov 2021, 10:41

OK, sorry for not mentioning that I also asked for help elsewhere. I understand how, mathematically speaking, negative eigenvalues can lead to a cumulative proportion greater than 1. I guess my new question is how I should deal with this in terms of reporting. I obviously don't want to say "factor 1 explained 56% of the variance while factor 2 explained 52% of the variance"; that just sounds wrong. Negative eigenvalues or not, it's clearly absurd in some sense to claim that my two factors have explained more than 100% of the variance in the observed items, right? So is there a standard way of dealing with this? The FAQ says I could just rerun the analysis using PCF, but that imposes additional assumptions that I'm not sure I want to make, and gives slightly different results. Or I could manually calculate the proportion of the "total" variance (which could be more than 1).
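
For what it's worth, that last option might look like the sketch below. It assumes the analysis is of a correlation matrix, so each of the six items has variance 1 and the total variance is 6. The rotated variances are typed in from the output in #1. This is only an illustration of the arithmetic, not something taken from Stata's documentation.

```stata
* Total variance of six standardized items (correlation-matrix analysis)
scalar totvar = 6

* Share of total variance for each rotated factor
display %6.4f 0.72646/scalar(totvar)
display %6.4f 0.66655/scalar(totvar)
display %6.4f 0.02516/scalar(totvar)

* Share of total variance for the three retained factors together
display %6.4f (0.72646 + 0.66655 + 0.02516)/scalar(totvar)
```

On this scale the shares come out at roughly 12%, 11%, and under 1%, or about 24% for the three factors together, which is in line with the high uniqueness values in the loadings table.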
Nick Cox

Join Date: Mar 2014

Posts: 35431
#5

07 Nov 2021, 11:15

It seems to me that you'd have a simpler time with principal component analysis. I don't have a suggestion for how to conduct a factor analysis in your terms.
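
A minimal sketch of that comparison on the same example dataset, for anyone who wants to see the difference. In PCA of a correlation matrix the eigenvalues are non-negative and sum to the number of variables, so the proportions sum to 1. The "pcf" run is the alternative mentioned in #4. I have not pasted output here, so treat this as something to try rather than a result.

```stata
webuse bg2, clear

* Principal-component factors, then the default varimax rotation
factor bg2cost1 bg2cost2 bg2cost3 bg2cost4 bg2cost5 bg2cost6, pcf
rotate

* Principal component analysis on the same six items
pca bg2cost1 bg2cost2 bg2cost3 bg2cost4 bg2cost5 bg2cost6
```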
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

07 Nov 2021, 12:57

I could manually calculate the proportion of the "total" variance (which could be more than 1).

I suppose you could do that. Before doing so, ask yourself why something so obvious is not done by Stata already, nor apparently by any other factor analysis program.

In the discussion of doing a factor analysis with SAS found at

https://stats.idre.ucla.edu/sas/output/factor-analysis/

we see the following discussion:
An eigenvalue is the variance of the factor. Because this is an unrotated solution, the first factor will account for the most variance, the second will account for the second highest amount of variance, and so on. Some of the eigenvalues are negative because the matrix is not of full rank. This means that there are probably only four dimensions (corresponding to the four factors whose eigenvalues are greater than zero). Although it is strange to have a negative variance, this happens because the factor analysis is only analyzing the common variance, which is less than the total variance. If we were doing a principal components analysis, we would have had 1’s on the diagonal, which means that all of the variance is being analyzed (which is another way of saying that we are assuming that we have no measurement error), and we would not have negative eigenvalues. In general, it is not uncommon to have negative eigenvalues.
This suggests that there is more going on here than meets the eye.
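
One way to look under the hood is sketched below. After "factor", the full set of eigenvalues should be in e(Ev) and their sum in e(evsum). I am giving those names as I recall them from the stored-results section of the manual entry for factor, so check the "ereturn list" output if they differ in your version.

```stata
webuse bg2, clear
quietly factor bg2cost1 bg2cost2 bg2cost3 bg2cost4 bg2cost5 bg2cost6

* Show everything factor leaves behind, then the eigenvalues and their sum
ereturn list
matrix list e(Ev)
display %7.5f e(evsum)
```

If the sum is about 0.828 rather than 6, that is consistent with the passage quoted above: the principal-factor method is decomposing only the common variance, not the total variance of the six items.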