
  • After varimax rotation, Stata says factors explain more than 100% of the variance.

    I'm using "factor". When I ask for a rotated solution (using the default, orthogonal varimax), it reports, alongside the loadings, a value called "Proportion" for each factor, with the first factor having the highest value. Stata's documentation never explicitly says what this value is, but I assume it represents the proportion of the total variance in the observed variables explained by each latent factor. However, in a rotated solution the "Proportion" values often add up to more than 1, implying that the factors together explain more than 100% of the variance in the observed variables, which seems nonsensical. This can be seen with Stata's example dataset for the "factor" command:

    . webuse bg2
    (Physician-cost data)

    . factor bg2cost1 bg2cost2 bg2cost3 bg2cost4 bg2cost5 bg2cost6
    (obs=568)

    Factor analysis/correlation                      Number of obs    =        568
        Method: principal factors                    Retained factors =          3
        Rotation: (unrotated)                        Number of params =         15

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      0.85389      0.31282            1.0310       1.0310
        Factor2  |      0.54107      0.51786            0.6533       1.6844
        Factor3  |      0.02321      0.17288            0.0280       1.7124
        Factor4  |     -0.14967      0.03951           -0.1807       1.5317
        Factor5  |     -0.18918      0.06197           -0.2284       1.3033
        Factor6  |     -0.25115            .           -0.3033       1.0000
    --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(15) =  269.07 Prob>chi2 = 0.0000

    Factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness
    -------------+------------------------------+--------------
        bg2cost1 |   0.2470    0.3670   -0.0446 |      0.8023
        bg2cost2 |  -0.3374    0.3321   -0.0772 |      0.7699
        bg2cost3 |  -0.3764    0.3756    0.0204 |      0.7169
        bg2cost4 |  -0.3221    0.1942    0.1034 |      0.8479
        bg2cost5 |   0.4550    0.2479    0.0641 |      0.7274
        bg2cost6 |   0.4760    0.2364   -0.0068 |      0.7175
    -----------------------------------------------------------

    .
    .
    .
    . rotate

    Factor analysis/correlation                      Number of obs    =        568
        Method: principal factors                    Retained factors =          3
        Rotation: orthogonal varimax (Kaiser off)    Number of params =         15

    --------------------------------------------------------------------------
         Factor  |     Variance   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      0.72646      0.05991            0.8772       0.8772
        Factor2  |      0.66655      0.64139            0.8048       1.6820
        Factor3  |      0.02516            .            0.0304       1.7124
    --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(15) =  269.07 Prob>chi2 = 0.0000


    You can see that the "Proportion" values for the 3 retained factors add up to 1.7124.


    So what's going on here? Obviously I'm confused about what "proportion" means, but Stata's documentation doesn't seem to provide any guidance on how I should interpret this value. Also, how can I correctly characterize the explanatory power of rotated factors? I would like to be able to say that my first factor explains X% of the variance while the second factor explains Y%, but is that even possible with rotated solutions?

    Someone else already asked this exact same question but never got a response:
    https://www.statalist.org/forums/for...eater-than-100




  • #2
    Graham, the analysis produces negative eigenvalues. The Stata FAQ explains this: https://www.stata.com/support/faqs/s...e-eigenvalues/.
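
The printed numbers are consistent with "Proportion" being each eigenvalue (or rotated variance) divided by the sum of all six eigenvalues, negative ones included. A minimal Python sketch of that arithmetic follows. It assumes this is how the column is formed, which the posted output itself does not state, and it uses values copied from the printed output, so results match only up to rounding.

```python
# Eigenvalues copied from the unrotated output in post #1 (rounded to 5 decimals).
eigenvalues = [0.85389, 0.54107, 0.02321, -0.14967, -0.18918, -0.25115]

# Assumption: the denominator is the sum of ALL eigenvalues, negative ones included.
total = sum(eigenvalues)

proportions = [ev / total for ev in eigenvalues]

# Running sum of the proportions, mirroring the "Cumulative" column.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Variances of the three rotated factors, copied from the `rotate` output.
rotated_variances = [0.72646, 0.66655, 0.02516]
rotated_proportions = [v / total for v in rotated_variances]

print(f"sum of eigenvalues: {total:.5f}")
for i, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"Factor{i}: proportion {p:7.4f}  cumulative {c:7.4f}")
print("rotated proportions:", [round(p, 4) for p in rotated_proportions])
print(f"rotated total: {sum(rotated_proportions):.4f}")
```

Under this reading, the three negative eigenvalues shrink the denominator to about 0.828, while the three retained factors sum to about 1.418, which is why their combined "Proportion" comes out near 1.71.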



    • #3
      Cross-posted at https://stats.stackexchange.com/ques...f-the-variance

      Please note our policy about cross-posting, which is that you should tell us about it. https://www.statalist.org/forums/help#crossposting

      Discussion on Cross Validated led to identification of previous detailed discussions.



      • #4
        OK, sorry for not mentioning that I also asked for help elsewhere. I understand how, mathematically speaking, negative eigenvalues can lead to a cumulative proportion greater than 1. I guess my new question is how I should deal with this when reporting results. I obviously don't want to say "factor 1 explained 56% of the variance while factor 2 explained 52% of the variance"; that just sounds wrong. Negative eigenvalues or not, it's clearly absurd in some sense to claim that my two factors have explained more than 100% of the variance in the observed items, right? So is there a standard way of dealing with this? The FAQ says I could just rerun the analysis using PCF, but that imposes additional assumptions that I'm not sure I want to make, and it gives slightly different results. Or I could manually calculate the proportion of the "total" variance (which could be more than 1).
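
One way to make the manual calculation mentioned in this post concrete is to divide each rotated factor's variance by the number of items, since an analysis of a correlation matrix has one unit of variance per variable. The Python sketch below does that with the numbers from post #1. The denominator of 6 and the reading of the result as a "share of total variance" are illustrative assumptions, not a convention the thread or Stata's output confirms.

```python
# Rotated factor variances and uniquenesses copied from the output in post #1.
rotated_variances = {"Factor1": 0.72646, "Factor2": 0.66655, "Factor3": 0.02516}
uniqueness = [0.8023, 0.7699, 0.7169, 0.8479, 0.7274, 0.7175]

# Assumption: total variance equals the number of standardized items (6 here).
n_items = len(uniqueness)

share_of_total = {name: v / n_items for name, v in rotated_variances.items()}
total_share = sum(share_of_total.values())

# Cross-check: the average communality (1 - uniqueness) should give the same total.
mean_communality = 1 - sum(uniqueness) / n_items

for name, share in share_of_total.items():
    print(f"{name}: {share:.4f} of total variance")
print(f"all retained factors: {total_share:.4f}")
print(f"mean communality:     {mean_communality:.4f}")
```

With these numbers the shares are roughly 12%, 11%, and under 1%, and their sum agrees with the average communality implied by the uniqueness column, so under this denominator the retained factors stay well below 100%.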



        • #5
          It seems to me that you'd have a simpler time with principal component analysis. I don't have a suggestion for how to conduct a factor analysis in your terms.



          • #6
            I could manually calculate the proportion of the "total" variance (which could be more than 1).
            I suppose you could do that. Before doing so, ask yourself why something so obvious is not done by Stata already, nor apparently by any other factor analysis program.

            The annotated SAS factor analysis output at

            https://stats.idre.ucla.edu/sas/output/factor-analysis/

            includes the following explanation:
            An eigenvalue is the variance of the factor. Because this is an unrotated solution, the first factor will account for the most variance, the second will account for the second highest amount of variance, and so on. Some of the eigenvalues are negative because the matrix is not of full rank. This means that there are probably only four dimensions (corresponding to the four factors whose eigenvalues are greater than zero). Although it is strange to have a negative variance, this happens because the factor analysis is only analyzing the common variance, which is less than the total variance. If we were doing a principal components analysis, we would have had 1’s on the diagonal, which means that all of the variance is being analyzed (which is another way of saying that we are assuming that we have no measurement error), and we would not have negative eigenvalues. In general, it is not uncommon to have negative eigenvalues.
            This suggests that there is more going on here than meets the eye.
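
The contrast in the quoted passage can be illustrated on a toy example. The Python sketch below uses a made-up 3 x 3 correlation matrix with every correlation equal to 0.3 (not the bg2 data). It compares 1's on the diagonal, as in principal components, with squared multiple correlations on the diagonal, which is assumed here as the communality estimate for principal factoring. The helper names are hypothetical, and the eigenvalues rely on the closed form for a matrix with a constant diagonal and constant off-diagonal.

```python
def compound_symmetric(diag, off, p=3):
    """Build a p x p matrix with `diag` on the diagonal and `off` elsewhere."""
    return [[diag if i == j else off for j in range(p)] for i in range(p)]


def eigenvalues(diag, off, p=3):
    """Closed-form eigenvalues of such a matrix: diag+(p-1)*off once, diag-off (p-1) times."""
    return [diag + (p - 1) * off] + [diag - off] * (p - 1)


def is_eigenpair(matrix, vector, value, tol=1e-12):
    """Check that matrix @ vector equals value * vector, entry by entry."""
    product = [sum(a * b for a, b in zip(row, vector)) for row in matrix]
    return all(abs(x - value * v) < tol for x, v in zip(product, vector))


r = 0.3                      # made-up common correlation
smc = 2 * r * r / (1 + r)    # squared multiple correlation for 3 equicorrelated variables

pca_eigs = eigenvalues(1.0, r)   # full correlation matrix: 1's on the diagonal
fa_eigs = eigenvalues(smc, r)    # reduced matrix: communality estimates on the diagonal

# Sanity check of the closed form against explicit eigenvectors.
reduced = compound_symmetric(smc, r)
assert is_eigenpair(reduced, [1, 1, 1], fa_eigs[0])
assert is_eigenpair(reduced, [1, -1, 0], fa_eigs[1])

fa_proportions = [ev / sum(fa_eigs) for ev in fa_eigs]

print("PCA eigenvalues:   ", [round(ev, 4) for ev in pca_eigs])
print("factor eigenvalues:", [round(ev, 4) for ev in fa_eigs])
print("factor proportions:", [round(p, 4) for p in fa_proportions])
```

In this toy case the full matrix has only positive eigenvalues that sum to the number of variables, while the reduced matrix has two negative eigenvalues and a first "proportion" above 1, the same pattern as in the output in post #1.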
