Hey everyone,
I'm fairly new to Stata and am currently working on my bachelor's thesis. I'm trying to build a "migration hostility index" from several variables in the European Social Survey. Unfortunately, the variables do not share the same scale: some are 5-point Likert items, others are 10-point semantic differentials. Since the index will be the dependent variable in an OLS regression, I need a way to combine the different variables into one index.

The easiest way would, of course, be to standardize every variable. However, I suspect this would not be statistically correct, because 5-point Likert items and 10-point semantic differentials cannot be treated alike. So instead I ran a factor analysis with Stata's "factor" command on all 10 variables. Only one factor had an eigenvalue above 1 (4.25), so I used "predict" to create a new variable holding each observation's score on that factor.

So far this seems to work, but I'm not sure whether it is a statistically sound way of dealing with the different scales. I'd appreciate any expert comments on this procedure.
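For comparison, here is a minimal sketch of the "standardize every variable" route, written in Python with made-up toy numbers (not ESS data; the item names are hypothetical). It also illustrates how the two routes are related: Stata's "factor" analyzes the correlation matrix of the items, and a Pearson correlation does not change when an item is linearly rescaled, e.g. a 1-5 item stretched onto a 1-10 range.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize to mean 0 and SD 1 (population SD, for simplicity)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def pearson(x, y):
    """Pearson correlation computed as the mean product of z-scores."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Toy data (made up, NOT from the ESS): six respondents, two hypothetical items.
likert5 = [1, 2, 2, 4, 5, 3]      # hypothetical 5-point Likert item
semdiff10 = [2, 3, 5, 8, 9, 6]    # hypothetical 10-point semantic differential

# Linearly rescale the 5-point item onto a 1-10 range.
rescaled = [1 + (v - 1) * 9 / 4 for v in likert5]

# The correlation with the other item is the same before and after rescaling
# (up to floating-point rounding), so an analysis of the correlation matrix
# does not depend on the items' original units.
r_original = pearson(likert5, semdiff10)
r_rescaled = pearson(rescaled, semdiff10)

# Equally weighted index: the average of the standardized items per respondent.
index = [mean(pair) for pair in zip(zscores(likert5), zscores(semdiff10))]

print(round(r_original, 4), round(r_rescaled, 4))
print([round(v, 3) for v in index])
```

This index weights both items equally; the factor score created by "predict" instead weights the items according to the estimated factor model, so the two need not coincide exactly.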
Thanks,
Tilman