
  • Principal Components Analysis for Composite Dependent Variable?

    Hi all,

    I was hoping to get some input on the appropriateness of principal components analysis in an analysis of experimental data.

    I recently came across an analysis that uses PCA to build a "composite" variable from dependent variables that measure related sets of evaluations, such as trust or specific types of beliefs. The data are from an experiment testing different messages' effects on these attitudes. The goal is to determine which messages are most effective at moving respondents' opinions in the desired direction across a range of related evaluations.

    In my understanding, PCA is used as a data-reduction tool when you have a number of measures that are likely to be highly correlated. PCA enables you to build a more parsimonious model by identifying patterns in these correlated variables and then constructing new "composite" measures that retain the information from the variables accounting for the most variation. This also addresses multicollinearity concerns, because the smaller set of components retained from the PCA are uncorrelated/orthogonal.
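    For intuition, here is a minimal sketch of that data-reduction idea on simulated data (Python/NumPy; the variables and numbers are invented for illustration, not taken from any real study):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate three highly correlated "evaluation" measures
    # driven by a single latent attitude
    latent = rng.normal(size=500)
    X = np.column_stack([latent + 0.3 * rng.normal(size=500) for _ in range(3)])

    # PCA via eigen-decomposition of the covariance matrix
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]                # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Xc @ eigvecs                            # component scores

    # The components are uncorrelated, and the first one carries
    # most of the shared variation
    print(np.round(np.corrcoef(scores, rowvar=False), 2))
    print(np.round(eigvals / eigvals.sum(), 3))      # variance share per PC
    ```

    Here the first component would play the role of the "composite" measure; the variance shares show how much of the original information it retains.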

    So, I completely understand the value of using PCA to build independent variables, particularly when over-specification or multicollinearity is an issue. However, I am skeptical of using it to build dependent variables, especially in an experiment where testing how different conditions affect these various evaluations differently (or not at all) is the point of the whole analysis. Aren't you just throwing useful data away by using PCA to form dependent variables? Isn't uncovering and understanding the underlying variation in your dependent variables across conditions the whole point? And if you were to build a composite variable, would an additive or averaged variable be better? When I do see composite dependent variables in analyses of experimental results, they are more often sums or averages; I can't recall ever seeing PCA used in this way.

    Am I missing something? Is there value in using PCA to build dependent variables in this case? What would you say is the preferable way to build a composite dependent variable measuring related attitudes in an experiment?

    Any thoughts or feedback anyone might have would be very appreciated.

    Thank you!!!

  • #2
    I suspect at least some of this comes down to confusion arising from the different senses in which the literature uses the words "dependent" and "independent".

    1. The principal components produced by PCA are uncorrelated (which does not necessarily mean independent of each other, but people often loosely assume that).

    2. This construction of PCs could not possibly work usefully unless there were at least moderate correlations (a pattern of dependence) between the original variables, which allows some PCs to capture common patterns of variability.

    3. Whether you think of PCs as possible dependent variables (also better known as response, outcome, criterion, target or regressand variables) or independent variables (also better known as predictors, explanatory variables, covariates, describing variables, or regressors) for some causal, explanatory or behavioural model is essentially a separate issue.

    I'd stress further that PCA is often presented as essentially a transformation procedure: there is no question of building a model or of a generating process or of a probabilistic mechanism. In contrast many people regard PCA as a primitive, degenerate or limiting case of more interesting models, e.g. various kinds of structural equation models.

    So, I doubt I've answered your question, because I don't think I understand it. But I'd advise:

    a. Be careful about what you mean by dependent and independent.

    b. Be aware that even PCA is a controversial technique. I leave factor analysis to believers and evangelists.


    • #3
      Thank you.

      Yes, all good points.

      By DV I meant outcome variables. Sorry for the confusion.

      I, too, am a skeptic of PCA. And, I have a particularly hard time understanding why you would want to use it to build a composite of your outcome variables. Why would you want to transform your outcome variables? Why would you want to use a "data reduction" tool for your outcome variables? Aren't you just throwing away valuable information? Again, I can somewhat see the argument for predictor variables where multicollinearity or over-specification are concerns, but what does one gain by using PCA to transform your outcome variables? It really does not make sense to me, but I have found myself in a debate about the value of using that approach to create a composite outcome variable from a small number of correlated outcome variables, and I just can't see the value of using PCA in this way. If anything, it seems problematic for various reasons. So, I just wanted to make sure I wasn't missing something important or overlooking something...


      • #4
        I am quite intrigued by PCA.

        The simplest argument for it is that you often have bundles of variables that are highly correlated. In morphometric problems, for example, you may have a bundle of variables to do with size and another bundle to do with shape, and you suspect that a reduction in the number of "variables" is possible. But in practice I find that the results of PCA often enable you to dispense with it, because you see a way of respecting the correlation structure by just choosing particular variables.

        Another common argument is that PCA allows you to identify a low-dimensional space in which to plot data. Being able to interpret the PCs is then helpful but not essential. Reductions of vegetation composition often hint at controls related to e.g. moisture and temperature, at least loosely. Here the key is that the high dimensionality of the original data makes direct plotting infeasible.

        In general, there is no obligation to ignore or discard the original variables. PCA can be demystified by insisting that it is a multivariate transformation; as with transformations elsewhere, a transformation doesn't have to be indubitably the right scale to work on to be helpful in seeing structure.


        • #5
          I'll add one comment on whether PCA discards data. A full PCA on k variables provides k principal components. The transformation from the original variables to the k components is invertible: no information is created or destroyed. However, if, as is common practice, the desire is to reduce the number of dimensions being worked with, you choose to use only some smaller number of the components for analysis (often just the first, in fact), then, indeed, you are discarding information. You can even get a sense of how much information is discarded by looking at the eigenvalues of the components.
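          That invertibility claim is easy to verify numerically; a quick sketch (Python/NumPy, with made-up data purely for illustration):

          ```python
          import numpy as np

          rng = np.random.default_rng(1)
          X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # 4 correlated variables
          Xc = X - X.mean(axis=0)

          eigvals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
          eigvals, V = eigvals[::-1], V[:, ::-1]                   # descending order
          scores = Xc @ V

          # With all k components kept, the transformation is invertible:
          # the centered data are recovered exactly (V is orthogonal)
          assert np.allclose(scores @ V.T, Xc)

          # Keeping only the first component discards the variance carried
          # by the remaining eigenvalues
          discarded = eigvals[1:].sum() / eigvals.sum()
          print(f"variance discarded by keeping PC1 only: {discarded:.1%}")
          ```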

          Also, in your first post, you suggested that the interventions in your experiment might affect each of the original response variables differently. If that is the case, then the covariance matrix for these response variables will differ between intervention and non-intervention states, which means that a PCA performed for either state does not appropriately represent the other state. So in that situation I would be very wary of using principal components in lieu of your original response variables.
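          That last point can also be illustrated with simulated data (Python/NumPy; the groups and effect sizes are invented): if the intervention affects the responses differently, the leading loadings differ by group.

          ```python
          import numpy as np

          rng = np.random.default_rng(2)
          n = 1000
          latent = rng.normal(size=n)

          # Control: both responses track the latent attitude equally
          control = np.column_stack([latent + 0.5 * rng.normal(size=n),
                                     latent + 0.5 * rng.normal(size=n)])

          # Treated: the intervention moves response 1 strongly, response 2 barely
          treated = np.column_stack([latent + 1.5 + 0.5 * rng.normal(size=n),
                                     0.2 * latent + 0.5 * rng.normal(size=n)])

          def first_pc(X):
              """Leading eigenvector of the sample covariance matrix."""
              _, vecs = np.linalg.eigh(np.cov(X - X.mean(axis=0), rowvar=False))
              v = vecs[:, -1]
              return v if v[0] >= 0 else -v        # fix the sign for comparison

          # The loadings differ between groups, so a composite built from one
          # group's PCA does not represent the other group's structure
          print(first_pc(control), first_pc(treated))
          ```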


          • #6
            Thank you for following up, Nick. Very helpful.

            Yes, Clyde, you are exactly right. I do believe the interventions might affect the original response variables differently, which is exactly why I was wary of using PCA for this purpose. I appreciate your feedback on this!
