  • Likert analysis with latent variables method choice: SEM/CFA vs IRT vs ologit vs OLS

    Hi all,

    I have a question about which method to choose to infer causal relationships between latent constructs.
    I know there are several existing topics covering (parts of) this already, but to be honest, I feel completely overwhelmed by the number of different suggestions.
    Additionally, I thought a topic that handles several of these methods together, rather than just one versus another, might be useful for future users.
    Here I go:

    I have survey data where several 7-point Likert scale items measure a smaller number of latent variables (more specifically, 34 items on 11 latent variables).
    I am interested in the causal relationship between some of these latent variables.
    In addition to these Likert scales, I have some standard (control) descriptive statistics like gender and age.

    I would say the easiest method would be to simply sum or average these items to create the latent variables and regress those.
    As a basic first estimation, I used
    Code:
    alpha xyz1-xyz3, std item gen(xyz)   // item analysis; gen() saves the scale score (average of the standardized items)
    to both check the Cronbach's alpha of each scale and to generate the latent variables.
    I then simply regressed these scale scores using OLS.
    However, this feels wrong and I am quite sure this violates several assumptions.
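    For concreteness, a minimal sketch of the full "average-and-regress" workflow I mean (scale and item names are placeholders, with age and gender as the controls mentioned above):
    Code:
    // average-and-regress sketch; item and scale names are placeholders
    alpha out1-out3, std item gen(outcome)       // averaged outcome scale
    alpha pred1-pred3, std item gen(predictor)   // averaged predictor scale
    regress outcome predictor c.age i.gender, vce(robust)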

    This brought me into a forest of different methods I could employ, none of which I am familiar with.
    SEM seems to be the most exhaustive/flexible, as far as I can tell.
    I have tried some different things with SEM, but I am not at all sure I am doing it correctly (I have found this site by David Kenny and Stata's own overview quite helpful, but it is still a lot to comprehend).
    It raises questions for me such as when to let variables covary, when (or whether) to drop factors, whether or not I am overfitting, etc.
    In addition, it gives me problems with adding control variables.
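    To illustrate, a stripped-down version of the kind of specification I have been attempting (latent and item names are placeholders; the controls enter the structural equation as observed exogenous variables, with gender as a 0/1 indicator for simplicity):
    Code:
    // minimal SEM sketch: two latent constructs with three placeholder items each,
    // and age plus a 0/1 female indicator as observed controls in the structural part
    sem (Lx -> x1 x2 x3) (Ly -> y1 y2 y3) ///
        (Ly <- Lx age female)
    estat gof, stats(all)   // overall fit: RMSEA, CFI/TLI, SRMR, etc.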

    ologit sounds like a simple extension/improvement of what I am doing now.
    Would that alone be exhaustive/robust enough?
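    The naive version I have in mind would treat a single Likert item (or a scale score collapsed to its ordered categories) as the ordinal outcome (placeholder names again):
    Code:
    // naive ordered-logit sketch; variable names are placeholders
    ologit outcome_item c.predictor c.age i.gender, or vce(robust)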

    I have read a lot of people pointing towards IRT (/RSM), and it looks promising with some insightful graphs.
    However, for some latent variables, not all items have the same observed range, because sometimes one response option is never chosen, which gives me problems when putting those items through IRT.
    Additionally, I have no clue if/how it is possible to do some causal inference with IRT, or if it simply helps give a (graphical) overview of my latent variables.
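    Roughly what I had pictured, for the items of one scale (placeholder names; the predicted trait score is what I would, perhaps, carry into a later regression):
    Code:
    // graded response model sketch for one scale's items
    irt grm xyz1-xyz3
    irtgraph icc xyz1            // category characteristic curves for one item
    predict theta_xyz, latent    // empirical Bayes estimate of the latent trait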

    Finally, there is the added worry about sample size. My sample size is around 110, which is not that large, and this probably greatly limits my options if I understand things correctly.

    In summary, these are my concrete questions:
    1) What model would be best in this situation? Why?
    2) Is that model worth it? Or is a simpler model also fine as results are often similar?
    3) Can/should I combine several models? E.g., (no clue if this sounds stupid) use SEM for CFA to get some validity checks, but do the analysis itself with ologit.
    4) Can I run several models as some form of robustness test on a simpler one?
    5) (bonus) Are there any (other) validity checks I should incorporate?

    Any help is greatly appreciated, thanks in advance!

  • #2
    Johannes,

    This is a huge and active area of methodological and substantive research. There are multiple primer-type articles out there, but you would do yourself a favor by reading a good intro book and taking a course. More than one of each is preferable. Measurement modeling, which falls under SEM (and IRT), is its own topic entirely. Causal models can be estimated in SEMs, but this is also a massive topic. A good start for the merging of SEM and causality is this article by Bollen & Pearl.

    I think in your situation with such a small sample size, you probably need to aim smaller with your ambitions, unless you want to use full Bayesian modeling (another massive topic in its own right!). I would probably suggest that you use alpha reliability to help justify the averaging or combining of sets of variables. That said, please read carefully about what alpha does and does not tell you and its limitations. Similarly, pursue further exploratory factor analysis where you jointly examine whether you indeed have 11 latent variables. Once you have done that and are satisfied that your scales have some reliability and empirical basis, you could proceed with regression-based approaches for your ultimate model of interest using the scores for the scales you created using averaging (e.g., with the alpha command).
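    As a rough illustration of that exploratory step (item names are placeholders, and with 34 items on roughly 110 observations the results should be read cautiously):
    Code:
    // exploratory factor analysis sketch: do roughly 11 factors emerge?
    factor item1-item34, factors(11)
    rotate, promax      // oblique rotation; factors are allowed to correlate
    estat kmo           // Kaiser-Meyer-Olkin sampling adequacy
    screeplot           // eigenvalues to help judge the number of factors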



    • #3
      Hi Erik,

      Thanks a lot for your reply with such an extensive set of references! I will look into them now!

      With regard to what you mention about exploratory factor analysis: I have based my latent variables and measurement items on existing literature, which, if I am not mistaken, means CFA would be the correct approach, right?
      Additionally, based on your reply I take it that the (causal) regression and the reliability/validity analysis are two distinct parts, and that I could thus use one method for the reliability/validity checks and another for the causal analysis?
      Finally, what if my scales turn out not to be that reliable, even though they are based on existing literature? Would I then still use them but mention this as a limitation in my (master's) thesis, or should I then not use that variable at all?
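      To make that "two distinct parts" idea concrete, I was imagining something like the following sketch (placeholder names; I realize that using factor scores in a second-stage regression has its own caveats):
      Code:
      // step 1: CFA as a measurement/validity check (placeholder items)
      sem (Lx -> x1 x2 x3) (Ly -> y1 y2 y3)
      estat gof, stats(all)
      // step 2: export factor scores and run the substantive model separately
      predict Lx_hat Ly_hat, latent(Lx Ly)
      regress Ly_hat Lx_hat c.age i.gender, vce(robust)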

      Perhaps (some of) these questions are answered in the references you gave me, in which case I'm sorry for already asking them here.
