
  • Low Eigenvalue with PCA, High Eigenvalue with FA

    //EDIT: I switched the order in the post heading. The eigenvalue is low with FA and high with PCA; sorry for that :D//


    Hi,
    I have a question regarding the creation of latent variables.

    In my case, I am trying to create a latent variable called Type of Data.
    This variable would be created from four questions (7-point Likert scale, 1-7) and is based on the literature, although the scale itself is my own creation.

    Unfortunately, when I run my FA, the eigenvalue is very low (0.48), Cronbach's alpha is also low (0.38), and (logically) the correlations between these questions are also very low. Interestingly, the KMO seems to be sufficient (0.58). I do not see why the results would be so weak, given my dataset and the theoretical reasoning.

    After this disappointing result, I tried running PCA, which (surprisingly) gave a high first eigenvalue (1.42).

    I am now quite unsure of what to do.

    I am aware that FA should be the way to go, but how do you handle such a case? I tried dropping some of the questions, but that does not seem to affect the result much (if anything, the eigenvalue gets lower). I also tried rotation, with no satisfactory result.

    The variable is one of my moderators, so I could technically drop it altogether; however, I am not too keen on doing that. Alternatively, I could simply use one of the questions as a representation of the whole concept, but that seems highly unreliable.

    Overall, it seems obvious that I designed the scale for this concept poorly. I do not have the time or capacity to recollect my data, so what would you advise? Drop it altogether? Use only one of the questions as a representation? Something different?

    Thank you for any comments and help.

    Katerina





    Last edited by Katerina Novakova; 23 Feb 2024, 07:22.

  • #2
    I think we need to see the exact code you used for both methods and the results shown.



    • #3
      Hi, thank you for your fast response.

      I do not have all the iterations of the code since I have been playing around with it (and adjusting it) for the past two hours.
      But I can post the underlying "basic" code for this FA and PCA (without rotations, without graphs).

      Code:
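      factor ToD1-ToD4, factors(1)   // principal-factor FA (Stata's default method), one factor retained
      alpha ToD1-ToD4, item          // Cronbach's alpha, with item-level statistics
      estat kmo                      // KMO; refers to the last estimation command, i.e. factor
      pca ToD1-ToD4                  // principal components of the correlation matrix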
      The results in this case are as follows.

      Results:
      Code:
       //Type of Data//
      . factor ToD1-ToD4, factors(1)
      (obs=312)
      
      Factor analysis/correlation                      Number of obs    =        312
          Method: principal factors                    Retained factors =          1
          Rotation: (unrotated)                        Number of params =          4
      
          --------------------------------------------------------------------------
               Factor  |   Eigenvalue   Difference        Proportion   Cumulative
          -------------+------------------------------------------------------------
              Factor1  |      0.48045      0.43933            2.1371       2.1371
              Factor2  |      0.04112      0.15589            0.1829       2.3199
              Factor3  |     -0.11478      0.06720           -0.5105       1.8094
              Factor4  |     -0.18197            .           -0.8094       1.0000
          --------------------------------------------------------------------------
          LR test: independent vs. saturated:  chi2(6)  =   37.84 Prob>chi2 = 0.0000
      
      Factor loadings (pattern matrix) and unique variances
      
          ---------------------------------------
              Variable |  Factor1 |   Uniqueness 
          -------------+----------+--------------
                  ToD1 |   0.3886 |      0.8490  
                  ToD2 |   0.3394 |      0.8848  
                  ToD3 |   0.4108 |      0.8312  
                  ToD4 |   0.2132 |      0.9545  
          ---------------------------------------
      
      . alpha ToD1-ToD4, item 
      
      Test scale = mean(unstandardized items)
      
                                                                  Average
                                   Item-test     Item-rest       interitem
      Item         |  Obs  Sign   correlation   correlation     covariance      alpha
      -------------+-----------------------------------------------------------------
      ToD1         |  312    +       0.5706        0.2230        .2595158      0.2998
      ToD2         |  312    +       0.6152        0.2223        .2361833      0.2961
      ToD3         |  312    +       0.6395        0.2479        .2038365      0.2653
      ToD4         |  312    +       0.5402        0.1334        .3456729      0.3930
      -------------+-----------------------------------------------------------------
      Test scale   |                                             .2613021      0.3808
      -------------------------------------------------------------------------------
      
      . estat kmo
      
      Kaiser-Meyer-Olkin measure of sampling adequacy
      
          -----------------------
              Variable |     kmo 
          -------------+---------
                  ToD1 |  0.5600 
                  ToD2 |  0.6078 
                  ToD3 |  0.5678 
                  ToD4 |  0.5800 
          -------------+---------
               Overall |  0.5754 
          -----------------------
      
      . pca ToD1-ToD4
      
      Principal components/correlation                 Number of obs    =        312
                                                       Number of comp.  =          4
                                                       Trace            =          4
          Rotation: (unrotated = principal)            Rho              =     1.0000
      
          --------------------------------------------------------------------------
             Component |   Eigenvalue   Difference         Proportion   Cumulative
          -------------+------------------------------------------------------------
                 Comp1 |      1.41653      .417089             0.3541       0.3541
                 Comp2 |      .999438      .161033             0.2499       0.6040
                 Comp3 |      .838405     .0927738             0.2096       0.8136
                 Comp4 |      .745631            .             0.1864       1.0000
          --------------------------------------------------------------------------
      
      Principal components (eigenvectors) 
      
          --------------------------------------------------------------------
              Variable |    Comp1     Comp2     Comp3     Comp4 | Unexplained 
          -------------+----------------------------------------+-------------
                  ToD1 |   0.5465   -0.4582    0.0873    0.6955 |           0 
                  ToD2 |   0.5057    0.3017   -0.8023   -0.0979 |           0 
                  ToD3 |   0.5758   -0.2899    0.3376   -0.6858 |           0 
                  ToD4 |   0.3377    0.7842    0.4845    0.1905 |           0 
          --------------------------------------------------------------------

      I am now wondering whether the fact that I am using panel data affects these results. I am working with two periods (so n = 156 respondents, giving 312 observations), and this variable does not change between the two periods. I am not sure whether that is relevant.
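      A quick check on the panel question, offered as a sketch under one assumption taken from the post: every respondent gave the same ToD1-ToD4 answers in both periods, so the 312 rows are the 156 respondents' rows each repeated twice. Duplicating every row leaves the means unchanged and doubles every sum of squares and cross-products, so the correlation r between any two items is unchanged:

      ```latex
      r' = \frac{\sum_{j=1}^{2n}(x_j-\bar{x})(y_j-\bar{y})}
                {\sqrt{\sum_{j=1}^{2n}(x_j-\bar{x})^2\,\sum_{j=1}^{2n}(y_j-\bar{y})^2}}
         = \frac{2\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
                {\sqrt{2\sum_{i=1}^{n}(x_i-\bar{x})^2\cdot 2\sum_{i=1}^{n}(y_i-\bar{y})^2}}
         = r, \qquad n = 156
      ```

      On that assumption the eigenvalues, loadings, KMO values and alpha would be the same with one period only (alpha's variances and covariances all scale by the same factor, which cancels). What the duplication does change is anything that counts observations, such as the LR test in the factor output, which treats 312 rows as independent when only 156 respondents answered.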

      Thank you




      • #4
        To me the key point is that the term "eigenvalue" doesn't mean the same thing in the two commands. PCA is strict in that, given what you supplied (by implication a correlation matrix), the PCs will correspond to eigenvalues (a) that are positive or just possibly zero and (b) that sum to the number of variables, 4. Judgments differ, but I wouldn't call 1.42 high at all for 4 variables, and in any case its magnitude implies that picking just one PC to characterise the data ignores the majority of the variation.
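        The arithmetic behind this, using the pca output in #3: for a correlation matrix R of four variables, the eigenvalues must add up to the trace of R, which is 4, and the Proportion column is each eigenvalue divided by that total. If the four items were completely uncorrelated, R would be the identity matrix and every eigenvalue would be exactly 1, so 1.42 is only modestly above that baseline.

        ```latex
        \begin{aligned}
        \sum_{k=1}^{4}\lambda_k &= \operatorname{tr}(R) = 4
          \qquad (1.41653 + 0.999438 + 0.838405 + 0.745631 \approx 4.0000)\\
        \frac{\lambda_1}{\operatorname{tr}(R)} &= \frac{1.41653}{4} \approx 0.3541
        \end{aligned}
        ```

        So the first component carries about 35% of the total variance and leaves about 65% unexplained.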

        FA works with quite different rules and criteria, under which "eigenvalues" can be negative, but I don't think the overall message is much different.
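        A sketch of where the odd numbers in the factor output come from, assuming Stata's documented default for factor (the principal-factor method, pf, which puts squared multiple correlations on the diagonal of R in place of 1s): the eigenvalues then add up to the trace of that reduced matrix, which is small here, and some of them can be negative. The Proportion column divides each eigenvalue by that small total, which is why it exceeds 1:

        ```latex
        \begin{aligned}
        \sum_{k=1}^{4}\lambda_k &= 0.48045 + 0.04112 - 0.11478 - 0.18197 = 0.22482\\
        \frac{\lambda_1}{\sum_k \lambda_k} &= \frac{0.48045}{0.22482} \approx 2.137
        \end{aligned}
        ```

        Since that trace is the sum of the four squared multiple correlations, each item is, on average, predicted by the other three with an R-squared of only about 0.22482 / 4 ≈ 0.056.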

        Your expectation that the correlations would be low is likely to be on target: you know that the items ask different questions. It follows that any desire to reduce four items to one PC or factor contradicts both that judgement and, as shown, the evidence provided by the data.
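        The loadings in the factor output in #3 tell the same story. Writing \ell_i for item i's loading, h_i^2 for its communality and \psi_i for its uniqueness (notation chosen here for illustration), a one-factor solution on standardized items has h_i^2 = \ell_i^2 and \psi_i = 1 - h_i^2, and the squared loadings add up to the Factor1 eigenvalue. A check with the reported values:

        ```latex
        \begin{aligned}
        h_i^2 &= \ell_i^2, \qquad \psi_i = 1 - h_i^2\\
        \sum_{i=1}^{4}\ell_i^2 &= 0.3886^2 + 0.3394^2 + 0.4108^2 + 0.2132^2\\
          &\approx 0.1510 + 0.1152 + 0.1688 + 0.0455 = 0.4805 \approx \lambda_1 = 0.48045\\
        \psi_{\text{ToD4}} &= 1 - 0.2132^2 \approx 0.9545
        \end{aligned}
        ```

        So the common factor accounts for only about 4.5% (ToD4) to 17% (ToD3) of each item's variance.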

        I don't follow all your comments, but my impression is that neither PCA nor FA offers any advantage over using your Likert items directly.
        Last edited by Nick Cox; 23 Feb 2024, 07:45.



        • #5
          Thank you for your answer!

          I do agree with you that the FA (and seemingly also the PCA) does not provide evidence that ToD1-ToD4 can be combined into a latent variable.

          The problem is that these questions were all created to measure one underlying concept (Type of Data - as the potential latent variable).
          The whole model works with panel data and consists of an IV, a DV, and 4 moderators (Type of Data being one of them). All the other questions in my survey seem to measure what is intended (with good eigenvalues, alphas, KMOs, etc., so I am able to build those latent variables).

          Seemingly, the individual Likert items measure something unexpected (because I created an unfit scale) and are not very useful on their own. At least that is my interpretation, as I am unsure how I would continue the analysis (e.g., regression; in my case xtreg) with 5 latent variables (IV, DV, 3 moderators) plus 4 separate items.

          This brings me to the ultimate question: is it advisable to drop this moderator and slightly rework my model?

          Another option would be to pick one of the questions (e.g., ToD1) and use it as the measure of Type of Data. But I think this would not be very reliable, as a single item does not capture the whole concept of Type of Data.

          I am open to any opinions. (I myself am now more inclined to drop the moderator.)







          • #6
            Your initial model appears dubious given your data and your results. That is an interesting discovery, not an irritating difficulty.



            • #7
              Interesting point.
              I will think about it, thank you Nick.

