
  • Combining Likert-scale variables into one

    Hi everyone! I have a survey about quality of work. Respondents were asked to rate how strongly they agree with statements such as:
    D: Your boss can instruct workers.
    F: The supervisors tell you openly if they are not satisfied with your work.
    H: The supervisors communicate their aims clearly.
    C: I think my boss treats me fairly in all respects.

    So, for each question, people have to choose one of:
    1: Totally agree
    2: Partially agree
    3: Undecided
    4: Partially disagree
    5: Totally disagree

    I want to create a variable called neglect that captures whether there is poor communication in the firm, and so on.
    How can I combine these items for my analysis?

  • #2
    Just add them.

    Code:
    egen neglect = rowtotal(D F H C), missing
    That will treat missing as 0, which you may or may not wish to do; if all of the questions are missing, it will return missing. If you wish to return a 0 even when all questions are missing, delete the missing option.

    Either way, you'll wind up with a quasi-continuous variable with a range of 4 to 20.

    You may wish to investigate, though, whether the items do in fact hang together well. Stata's alpha command reports Cronbach's alpha, a measure of the internal consistency of the scale, i.e. how well the items hang together.

    http://www.stata.com/manuals13/mvalpha.pdf
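
    For example, a minimal sketch, assuming the responses are stored in variables named D, F, H, and C:

    Code:
    alpha D F H C, item
    The item option also reports how alpha changes when each item is dropped, which helps you spot an item that does not hang together with the rest.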

    Sometimes, not enough attention gets paid to the psychometric properties of scales. Even for a quick and dirty analysis, it might help to say you verified that Cronbach's alpha was above about 0.7, and perhaps also to discuss why these questions were asked: who came up with them, are they known to relate to poor communication, do they connect to more theoretical concepts? Perhaps I am being pedantic here, but a lot of my own work (in health outcomes) involves measurement of subjective concepts like self-perceived health or patient experience of care, and we want to know that we are measuring them well.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      OK, let's say I sum them up (unweighted?). How can I build an independent variable out of that, given that I have so many binary and Likert-scale variables in it?



      • #4
        In #1, I understood that you have only Likert-scale variables. However, in #3 it seems you said you have "so many binary" variables as well.

        You could get more precise advice if you presented further details of the data, as recommended in the FAQ.

        That said, and trusting only what was stated in #1, we sometimes see Likert scales being transformed into scales ranging from 0 to 20 or from 0 to 100, producing a variable based on the mean value of the questions selected.
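
        As a rough sketch of that kind of transformation, assuming a 4-to-20 sum score (the variable names here are hypothetical):

        Code:
        * rescale a hypothetical 4-20 sum score to 0-100
        generate neglect100 = (neglect - 4) / (20 - 4) * 100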

        All in all, with too many questions to be assembled in a single analysis, one should also reflect on parceling.
        Last edited by Marcos Almeida; 22 Jan 2017, 06:35.
        Best regards,

        Marcos



        • #5
          Why combine them at all? At some point these were judged different questions. Why mush them together and pretend that they really measure some ill-defined dimension? (Sure, I know that some fields spend much of the time doing precisely this...)



          • #6
            So Nick, you think that if they are correlated, I should just choose one item that contains the variation of the rest?



            • #7
              Whether the responses to the different items correlate is, once you have the data, not just a matter of opinion: it's an empirical question. And you can see the results in your data and decide. As Weiwen Ng noted earlier, Cronbach's alpha gives a reasonable summary of the extent to which these items have shared variance. Factor analysis is another approach. If either of these procedures suggests there is appreciable shared variance here, then averaging the responses will concentrate the signal and average down the noise. In that situation it would not make sense to pick one and disregard the others: no one variable will capture the shared variation as well as the average.

              If, on the other hand, there is no appreciable shared variance, then it makes more sense to use all the items as separate predictors. Here, too, I would not just pick a single variable: if they are all measuring different things, that would just throw away all of the information in the unpicked variables.
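
              A sketch of the factor-analysis check, again assuming the items are stored as D, F, H, and C:

              Code:
              factor D F H C
              screeplot
              One dominant eigenvalue in the scree plot points to appreciable shared variance; several comparable eigenvalues would suggest the items measure different things.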



              • #8
                OK, let me summarize for my own understanding: if I think, for example, about harm as one working condition, I first pick all the questions that I think could have something in common in terms of content. Then I can check with factor analysis (among them?) and Cronbach's alpha whether that is really the case. (I don't know the rules of thumb for which values are acceptable; greater than 0.7?)

                And if that is the case, what shall I do then?
                I could sum them up, and then recode the totals if there are gaps, e.g. 7 to 4 and 9 to 5 if no total of 8 occurs (an example with 2 Likert questions), because of the ordinal nature. But then how shall I handle them? Say the Likert scale is 1 "strongly agree", 2 "agree", 3 "undecided", 4 "disagree", 5 "strongly disagree". For my purposes it is only of interest whether there is harm in the workplace, so only categories 1 and 2 matter. But how should I work with them? Just cutting them into a binary is problematic, isn't it?



                • #9
                  In your original post you said you want to create a variable called neglect, which you perceive to be a construct that is at least imperfectly measured by the responses to the questions in your #1 post. Assuming that these responses do show common variance, averaging those responses is probably the simplest way to get an estimate of the construct. Although weighted averages are sometimes better than simple averages (particularly if the scales of the variables differ, which is probably not an issue here), identification of appropriate weights would require factor analysis, and in contexts like these the results are typically not much different from a simple average. If you really want that refined an analysis, you probably will want to use structural equation modeling when you do analyses based on neglect. It doesn't sound, though, as if you are looking for that level of statistical refinement.

                  I could sum them up, and then recode the totals if there are gaps, e.g. 7 to 4 and 9 to 5 if no total of 8 occurs (an example with 2 Likert questions)
                  First of all, it is unlikely there will be a gap like that. Even if there is, it sounds like you are fixated on making a dichotomy out of this. That's a bad idea, plain and simple, and I have no advice for you on how to do it, only not to do it. If you want to provide descriptive statistics on neglect, you score each respondent with his or her average response to the items in the neglect scale, and then you describe your sample with means and standard deviations, or medians and interquartile ranges. You use the neglect score directly in any correlations, regressions, t-tests, etc. that you do for analyses that seek to find associations between perceived neglect and other employee attributes or circumstances.
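
                  A sketch of that scoring and description, assuming the items are stored as D, F, H, and C:

                  Code:
                  egen neglect = rowmean(D F H C)
                  tabstat neglect, statistics(mean sd p25 p50 p75)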



                  • #10
                    The thread http://www.statalist.org/forums/foru...than-one-items raised very similar issues.



                    • #11
                      Originally posted by David March (#8)
                      One very rough rule of thumb is that Cronbach's alpha should be >= 0.7. See the citations in the source below.

                      https://www.ijme.net/archive/2/cronbachs-alpha.pdf

                      Dichotomizing the questions would throw away information, so don't do it. You're probably better off just using the scale (i.e., the total of the 4 questions, as you asked about originally). You'd then use that variable, treated as continuous, in a regression analysis. That preserves the most information from the questions. You will have a scale with a total score ranging from 4 to 20.
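
                      As a minimal sketch, assuming the 4-to-20 total and a hypothetical outcome variable y:

                      Code:
                      egen neglect = rowtotal(D F H C), missing
                      regress y neglect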

                      A bunch of posters here are trying to get you to realize, though, that measuring subjective things, like employees' perception of neglect, is very tricky. You can get a whole PhD in measurement. A lot of people would prefer that you had some good theoretical basis for considering those items to represent your construct of neglect. With only 4 items, you can get a measure of internal consistency, but sufficient internal consistency does not guarantee that your items really do represent a single construct well enough. As Nick was alluding to, there is no guarantee that those items don't really represent 2, 3, or 4 separate constructs. You mentioned factor analysis yourself, but I don't think you have enough items for a factor analysis to reveal multiple dimensions anyway.



                      • #12
                        Originally posted by Clyde Schechter (#9)
                        So do I understand you correctly? I average the questions that could belong together by content and that have been tested, with Cronbach's alpha for example, as a scale that theory and the literature suggest has an influence? But then I treat this variable as a continuous one. I am going to run ordered logit/probit estimations, maybe bivariate probit estimations or even structural models. This variable is not the main interest of my work; another, rather similar one is. And if I have only one Likert-scale item with 1: totally agree, 2: agree, 3: undecided, etc., and I am only interested in agreement, could I create one binary for strong agreement and one binary for agreement, with the rest as the reference group because it is not of interest (there is no explicit agreement)?
                        Last edited by David March; 23 Jan 2017, 10:04.



                        • #13
                          I am averaging the sum of questions
                          Not exactly. Suppose you have the questions' responses in four variables (one for each question); call them qf, qh, qc, qd. Then you don't want to average the sum (I don't even know what that would mean). You want the average of those four variables, for example,
                          Code:
                          egen neglect = rowmean(qf qh qc qd)
                          You can use this variable as a predictor in your logit/probit estimations; evidently it cannot serve as the dependent variable in those models.



                          • #14
                            Yes, I understand that, but then it would get a continuous interpretation even though it is only ordinal... is that OK? Yes, I will use it only as an independent variable.
                            It's not my PhD work; it is my master's thesis.
                            And if I have only one Likert-scale item with 1: totally agree, 2: agree, 3: undecided, etc., and I am only interested in agreement, could I create one binary for strong agreement, one binary for agreement, and use the rest as the reference group, because it is not of interest (there is no explicit agreement)?



                            • #15
                              but then it would get a continuous interpretation even though it is only ordinal... is that OK?
                              This is somewhat controversial. My opinion is yes. It is not uncommon to treat ordinal variables as if they were continuous, and with Likert-scale items this is particularly common. There is a widespread belief (though I'm not aware of any strong empirical foundation for it) that the points on the scale are nearly equidistant in terms of the psychological construct of agreement, which would justify treating the scale as continuous. If you wanted to verify that this is appropriate for your data, you could first run your model using the continuous version, then run it with both the continuous version and a completely discrete version, and do a likelihood ratio test comparing the two models. If the likelihood ratio test doesn't reject, then you would be on pretty safe ground in saying that treating neglect exclusively as a continuous linear variable is OK, at least for these purposes with this data. (Note: this assumes you have a reasonably large sample size so that the LR test is adequately powered.)
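
                              A sketch of that comparison, assuming a hypothetical ordinal outcome y and the integer sum version of the score (factor-variable notation requires integer values):

                              Code:
                              egen neglect_sum = rowtotal(D F H C)
                              ologit y c.neglect_sum
                              estimates store linear
                              ologit y c.neglect_sum i.neglect_sum
                              estimates store flexible
                              lrtest flexible linear
                              In the second model Stata automatically omits one collinear term; if the likelihood ratio test does not reject, the purely linear treatment is defensible.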

                              And if I have only one Likert-scale item with 1: totally agree, 2: agree, 3: undecided, etc., and I am only interested in agreement, could I create one binary for strong agreement, one binary for agreement, and use the rest as the reference group, because it is not of interest
                              Yes, if you want to report the prevalence of agreement with the stem of an item, it is reasonable to dichotomize 1 and 2 vs 3-5.
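
                              A sketch, assuming the single item is stored in a variable called item (a hypothetical name):

                              Code:
                              generate byte agree = inlist(item, 1, 2) if !missing(item)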

