Low Eigenvalue with PCA, High Eigenvalue with FA

Katerina Novakova

Join Date: Feb 2024

Posts: 12
#1

23 Feb 2024, 06:25

Edit: I switched the order in the post heading. The eigenvalue is low with FA and high with PCA. Sorry for that.

Hi,
I have a question regarding the creation of latent variables.

In my case, I am trying to create a latent variable called Type of Data.
This variable would be created from four questions (Likert scale 1-7) and is based on literature (however, the scale is my creation).

Unfortunately, when running my FA, it seems that the Eigenvalue is very low (0.48), the Alpha is also low (0.38), and (logically) the correlation between these questions is also very low. Interestingly, the KMO seems to be sufficient (0.58). I do not see a reason why that would be the case, given my dataset and the theoretical reasoning.
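The FA eigenvalue, alpha, and KMO reported here all summarize the same inter-item correlation matrix, so it can help to look at that matrix directly. A minimal sketch, assuming the items are named ToD1-ToD4 as in the code posted later in the thread; the value 0.13 for the average inter-item correlation is purely illustrative and is not taken from any output:

```stata
* Inspect the raw inter-item correlations behind the FA and alpha results
correlate ToD1-ToD4

* Standardized alpha, for comparison with the unstandardized value
alpha ToD1-ToD4, std item

* Standardized alpha for k items with average inter-item correlation rbar:
*   alpha = k*rbar / (1 + (k - 1)*rbar)
* rbar = 0.13 is an assumed, illustrative value
display 4*0.13 / (1 + 3*0.13)
```

With only four items, an average correlation of that size cannot yield a high alpha, whatever method is used to extract a factor.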

After this disappointing result, I tried running PCA, which (surprisingly) came with a high Eigenvalue (1.42).

I am now quite unsure of what to do.

I am aware that FA should be the way to go; however, how do you handle such a case? I tried dropping some of the questions, but that does not seem to be affecting it much (if anything, the Eigenvalue is lower). I tried rotation (with no satisfactory result).

The variable is one of my moderators, so I could technically drop it altogether; however, I am not too keen on doing that. Alternatively, I could simply use one of the questions as a representation of the whole concept, but that seems highly unreliable.

Overall, it seems obvious that I created the scale for this concept wrongly. I do not have the time and space to re-collect my data, so what would your advice be? Drop it all? Use only one of the questions as a representation? Something different?

Thank you for any comments and help.

Katerina

Last edited by Katerina Novakova; 23 Feb 2024, 07:22.
Nick Cox

Join Date: Mar 2014

Posts: 35212
#2

23 Feb 2024, 06:46

I think we need to see the exact code you used for both methods and the results shown.

Katerina Novakova

#3

23 Feb 2024, 07:10

Hi, thank you for your fast response.

I do not have all the iterations of the code since I have been playing around with it (and adjusting it) for the past two hours.
But I can post the underlying "basic" code for this FA and PCA (without rotations, without graphs).

Code:

factor ToD1-ToD4, factors(1)
alpha ToD1-ToD4, item 
estat kmo
pca ToD1-ToD4

The results in this case are as follows.

Results:

Code:

 //Type of Data//
. factor ToD1-ToD4, factors(1)
(obs=312)

Factor analysis/correlation                      Number of obs    =        312
    Method: principal factors                    Retained factors =          1
    Rotation: (unrotated)                        Number of params =          4

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      0.48045      0.43933            2.1371       2.1371
        Factor2  |      0.04112      0.15589            0.1829       2.3199
        Factor3  |     -0.11478      0.06720           -0.5105       1.8094
        Factor4  |     -0.18197            .           -0.8094       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(6)  =   37.84 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    ---------------------------------------
        Variable |  Factor1 |   Uniqueness 
    -------------+----------+--------------
            ToD1 |   0.3886 |      0.8490  
            ToD2 |   0.3394 |      0.8848  
            ToD3 |   0.4108 |      0.8312  
            ToD4 |   0.2132 |      0.9545  
    ---------------------------------------

. alpha ToD1-ToD4, item 

Test scale = mean(unstandardized items)

                                                            Average
                             Item-test     Item-rest       interitem
Item         |  Obs  Sign   correlation   correlation     covariance      alpha
-------------+-----------------------------------------------------------------
ToD1         |  312    +       0.5706        0.2230        .2595158      0.2998
ToD2         |  312    +       0.6152        0.2223        .2361833      0.2961
ToD3         |  312    +       0.6395        0.2479        .2038365      0.2653
ToD4         |  312    +       0.5402        0.1334        .3456729      0.3930
-------------+-----------------------------------------------------------------
Test scale   |                                             .2613021      0.3808
-------------------------------------------------------------------------------

. estat kmo

Kaiser-Meyer-Olkin measure of sampling adequacy

    -----------------------
        Variable |     kmo 
    -------------+---------
            ToD1 |  0.5600 
            ToD2 |  0.6078 
            ToD3 |  0.5678 
            ToD4 |  0.5800 
    -------------+---------
         Overall |  0.5754 
    -----------------------

. pca ToD1-ToD4

Principal components/correlation                 Number of obs    =        312
                                                 Number of comp.  =          4
                                                 Trace            =          4
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.41653      .417089             0.3541       0.3541
           Comp2 |      .999438      .161033             0.2499       0.6040
           Comp3 |      .838405     .0927738             0.2096       0.8136
           Comp4 |      .745631            .             0.1864       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    --------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 | Unexplained 
    -------------+----------------------------------------+-------------
            ToD1 |   0.5465   -0.4582    0.0873    0.6955 |           0 
            ToD2 |   0.5057    0.3017   -0.8023   -0.0979 |           0 
            ToD3 |   0.5758   -0.2899    0.3376   -0.6858 |           0 
            ToD4 |   0.3377    0.7842    0.4845    0.1905 |           0 
    --------------------------------------------------------------------

I am now wondering if the fact that I am using panel data affects these results. I am working with two periods (thus, n=156); however, this variable does not change between these two periods. I am not sure if that is relevant.
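One way to check whether the two-period layout matters is to rerun the same commands on a single row per respondent. The sketch below assumes a panel identifier called id, which is a hypothetical name not given in the thread. If the ToD items really are identical in both periods, each respondent simply appears twice, which leaves the correlation matrix (and so the eigenvalues, loadings, alpha, and KMO) unchanged; only quantities that depend on the number of observations, such as the LR test, should differ.

```stata
* Sketch only: id is a hypothetical panel identifier, not named in the thread
egen byte onerow = tag(id)          // 1 for exactly one observation per id

* Same analyses on one row per respondent (156 rather than 312 observations)
factor ToD1-ToD4 if onerow, factors(1)
alpha ToD1-ToD4 if onerow, item
pca ToD1-ToD4 if onerow
```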

Thank you


Nick Cox

#4

23 Feb 2024, 07:36

To me the key point is that the term "eigenvalue" doesn't have the same meaning for both commands. PCA is strict: given what you supplied (by implication, a correlation matrix), the PCs will correspond to eigenvalues (a) that are positive or just possibly zero and (b) that sum to the number of variables, 4. Judgments differ, but I wouldn't call 1.42 high at all for 4 variables, and in any case its magnitude (about 35% of the total) implies that picking just one PC to characterise the data ignores the majority of the variation.
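The two properties of the PCA eigenvalues can be checked by hand from the output posted in #3. A small sketch using only the numbers shown there:

```stata
* PCA eigenvalues from the posted output: they should sum to the trace, 4
* (up to rounding in the displayed digits)
display 1.41653 + .999438 + .838405 + .745631

* Share of total variance carried by the first component
* (compare the Proportion column, 0.3541)
display 1.41653 / 4
```

For reference, four completely uncorrelated items would give four eigenvalues of exactly 1, so a first eigenvalue of 1.42 is not far above that baseline.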

FA is working with quite different rules and criteria and "eigenvalues" can be negative with those rules, but I don't think the overall message is much different.
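The FA numbers in #3 can be reproduced in the same way. As far as I understand the default principal-factor method, it works on a correlation matrix whose diagonal 1s have been replaced by communality estimates (squared multiple correlations), which is why the "eigenvalues" can be negative and need not sum to 4. A sketch using only the posted output:

```stata
* Sum of the four principal-factor "eigenvalues" from the posted output
display 0.48045 + 0.04112 - 0.11478 - 0.18197

* Proportion for Factor1 = eigenvalue / that sum, which is why it exceeds 1
* (compare the Proportion column, 2.1371)
display 0.48045 / (0.48045 + 0.04112 - 0.11478 - 0.18197)

* The Factor1 eigenvalue is the sum of the squared loadings
display 0.3886^2 + 0.3394^2 + 0.4108^2 + 0.2132^2

* Uniqueness of ToD1 is 1 minus its squared loading (compare 0.8490)
display 1 - 0.3886^2
```

So 0.48 (FA) and 1.42 (PCA) are measured against different totals and are not directly comparable.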

Your judgement that you expect the correlations to be low is likely to be on target. You know that they are asking for answers to different questions. It follows that any desire to reduce four items to one PC or factor contradicts both that judgement and -- as shown -- the evidence provided by the data.

I don't follow all your comments but my impression is that neither PCA nor FA offers any advantage over using your Likert items directly.

Last edited by Nick Cox; 23 Feb 2024, 07:45.
Katerina Novakova

#5

23 Feb 2024, 08:08

Thank you for your answer!

I do agree with you that the FA (and seemingly also the PCA) do not provide evidence that ToD1-ToD4 can be made into a latent variable.

The problem is that these questions were all created to measure one underlying concept (Type of Data - as the potential latent variable).
The whole model works with panel data and consists of an IV, DV, and 4 moderators (Type of Data being one of the moderators). All the other questions from my survey seem to measure what is intended (with great eigenvalues, alphas, KMOs, etc. = I am able to make those latent variables).

Seemingly, the direct Likert items measure something unexpected (due to me creating an unfit scale) and are not as useful on their own. At least that is my interpretation as I am unsure about how I would continue further analysis (e.g., regression; in my case xtreg) with 5 latent variables (IV, DV, 3 moderators) plus 4 direct items.
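For what it is worth, using the items directly in xtreg is mechanically straightforward. A minimal sketch, assuming hypothetical variable names id, period, dv, and iv (none of which appear in the thread), treating each Likert item as continuous, and using random effects purely for illustration:

```stata
* Sketch only: id, period, dv and iv are hypothetical names
xtset id period

* One Likert item as a continuous moderator of iv
xtreg dv c.iv##c.ToD1, re

* All four items as separate moderators in one model
xtreg dv iv ToD1 ToD2 ToD3 ToD4 ///
    c.iv#c.ToD1 c.iv#c.ToD2 c.iv#c.ToD3 c.iv#c.ToD4, re
```

Because the ToD items do not vary over time, a fixed-effects specification would drop their main effects, although the interactions with a time-varying iv would remain estimable. Whether four separate moderators make substantive sense is a modelling question that the code cannot answer.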

This brings me to the ultimate question of whether it is then advisable to drop this moderator and slightly rework my model.

Another option would be to pick one of the questions (e.g., ToD1) and use that as the measure of the Type of Data. But I think this would not be too reliable as it does not capture the whole concept of the Type of Data.

I am open to any opinions. (I, myself, am now more inclined to drop the moderator).
Nick Cox

#6

23 Feb 2024, 08:16

Your initial model appears dubious given your data and your results. That is an interesting discovery, not an irritating difficulty.
Katerina Novakova

#7

23 Feb 2024, 12:41

Interesting point.
I will think about it, thank you Nick.