Hi all,
I have a question about which method to choose for inferring causal relationships between latent constructs.
I know there are already several existing topics covering parts of this, but to be honest, I feel completely overwhelmed by the number of different suggestions.
Additionally, I thought a topic that treats several of these methods together, instead of just one versus another, might be useful for future users.
Here I go:
I have survey data in which 34 seven-point Likert-scale items measure a smaller number of latent variables (11 in total).
I am interested in the causal relationships among some of these latent variables.
In addition to these Likert items, I have some standard control variables such as gender and age.
I would say the easiest method would be to simply sum or average the items to create scale scores for the latent variables and regress those.
What I did as a basic first estimation is to use

Code:
* std standardizes the items, item reports item-rest statistics,
* and gen() stores the resulting scale score as a new variable
alpha xyz1-xyz3, std item gen(xyz)

to both check the Cronbach's alpha of each latent variable and generate the scale scores. I then simply regressed these scores using OLS.
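Concretely, the regression looked something like this (out, xyz, and abc are placeholder names for the generated scale scores):

Code:
* scale score regressed on other scale scores plus controls
regress out xyz abc age i.gender, vce(robust)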
However, this feels wrong and I am fairly sure it violates several assumptions (among other things, it treats ordinal items as interval-scaled and ignores measurement error).
This sent me into a forest of different methods I could employ, none of which I am familiar with.
SEM seems to be the most exhaustive/flexible option, as far as I can tell.
I have tried a few different things with SEM, but I am not at all sure I am doing it correctly (I have found David Kenny's site and Stata's own SEM overview quite helpful, but it is still a lot to take in).
It raises questions for me such as when to let variables covary, when (if ever) to drop factors, whether I am overfitting, etc.
In addition, it gives me problems with adding control variables.
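For reference, the shape of what I have been trying looks roughly like this (construct and item names are placeholders, and I am not sure the specification is right). If I understand the manual correctly, sem does not accept factor-variable notation like i.gender, which may be part of my control-variable problem, so I use a plain 0/1 dummy (here hypothetically called female):

Code:
* hypothetical two-construct model: does ABC affect XYZ?
sem (XYZ -> xyz1 xyz2 xyz3)    /// measurement part for XYZ
    (ABC -> abc1 abc2 abc3)    /// measurement part for ABC
    (XYZ <- ABC age female),   /// structural part with controls
    standardized
estat gof, stats(all)          // fit indices (chi2, RMSEA, CFI, ...)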
ologit sounds like a simple extension/improvement of what I am doing now.
Would that alone be exhaustive/robust enough?
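What I have in mind is something like the sketch below (placeholder names again). The first line only really makes sense if the dependent variable is a single ordinal item, since an averaged scale score has many thinly filled categories; the gsem alternative is something I have seen mentioned that keeps every item ordinal inside the latent-variable model and, unlike sem, accepts factor variables:

Code:
* ologit with a single Likert item as the ordinal outcome
ologit out_item xyz abc age i.gender

* gsem with ordered-logit measurement equations for the items
gsem (XYZ -> xyz1 xyz2 xyz3, ologit) ///
     (ABC -> abc1 abc2 abc3, ologit) ///
     (XYZ <- ABC age i.gender)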
I have read a lot of people pointing towards IRT (or the rating scale model, RSM), and it looks promising, with some insightful graphs.
However, for some latent variables not all items have the same observed range, because one response option is sometimes never chosen, and this gives me problems putting them through IRT (RSM in particular seems to require that every item have the same number of categories).
Additionally, I have no clue whether or how causal inference is possible with IRT, or whether it simply gives a (graphical) overview of my latent variables.
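For what it is worth, this is roughly what I tried (placeholder item names). The graded response model at least seems to allow items with differing numbers of observed categories, unlike RSM:

Code:
* graded response model for one construct's items
irt grm xyz1-xyz3
irtgraph icc                 // (category) characteristic curves
predict theta_xyz, latent    // person-level score on the latent trait

The theta_xyz scores could then presumably feed into a downstream regression, though I suspect that reintroduces the measurement-error problem.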
Finally, there is the added worry about sample size: my sample is around 110 observations, which is not large, and if I understand things correctly this greatly limits my options (34 indicators on 11 latent variables seems like a lot to ask of 110 cases).
In summary, these are my concrete questions:
1) What model would be best in this situation? Why?
2) Is that model worth it? Or is a simpler model also fine, given that results are often similar?
3) Can/should I combine several models? E.g. (no clue if this sounds stupid), use SEM for CFA to get some validity checks, but do the analysis itself with ologit (see the sketch after this list).
4) Can I run several models as a form of robustness check on a simpler one?
5) (bonus) Are there any other validity checks I should incorporate?
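To make question 3 concrete, the combination I am imagining would be roughly this (placeholder names as before):

Code:
* step 1: CFA across the constructs as a validity check
sem (XYZ -> xyz1 xyz2 xyz3) (ABC -> abc1 abc2 abc3)
estat gof, stats(all)

* step 2: substantive analysis on item/scale scores
ologit out_item xyz abc age i.gender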
Any help is greatly appreciated, thanks in advance!