  • Combining multiple Likert scales and converting to cardinal measure

    Hi everyone,

    I am using the British Household Panel Survey (BHPS) datasets and I have a question regarding how to combine several Likert scale variables into one variable and then how to convert it to a cardinal measure.

    I am doing this because it is done in a paper I want to follow. The authors tested (through categorical confirmatory factor analysis) that the measures correspond to the underlying variable. I have six separate Likert scales, each with four response categories. An example is shown below.

    [Attached image: GHQ.png, showing an example of the 4-point GHQ items]

    I want to combine the six scales and create a single numerical variable from them, with values ranging from -2 to +2 for most cases (the paper mentions that the resulting distribution is approximately normal).

    I would appreciate any ideas on how to proceed with this. Thanks.


  • #2
    Not familiar with that survey, but chances are that each variable is already coded as a number, maybe 0 to 3. Surely there is a survey codebook? If not, try:

    Code:
    tab concentration, nolabel
    That will show the values of the variable concentration (assuming that's the name) without the value labels. From there, you can use recode, for example:

    Code:
    recode concentration (0 = -2) (1 = -1) (2 = 1) (3 = 2)
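
    If all six items share that 0 to 3 coding, you can apply the same recode to all of them at once; a sketch, where item2 through item6 stand in for the other five variable names:

    Code:
    * item2-item6 are placeholder names for the remaining GHQ items
    recode concentration item2 item3 item4 item5 item6 (0 = -2) (1 = -1) (2 = 1) (3 = 2)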
    You can then use egen to quickly create a total:

    Code:
    egen ghq = rowtotal(concentration blah blah blah), missing
    Replace blah blah blah with your variable names. One thing to note: with the missing option, ghq is set to missing only if all of the variables are missing; if just some are missing, they are treated as zero in the total. You may not want that if a lot of people skip one or more of the GHQ questions. I like to do the following:

    Code:
    misstable summarize concentration blah blah blah
    egen ghq_missing = rowmiss(concentration blah blah blah)
    replace ghq = . if ghq_missing > 0
    The first line summarizes the missing data. The second counts how many GHQ questions each person is missing. The last sets the total score to missing if any single question is missing. In many circumstances, simply omitting anyone with any missing response from subsequent analysis is fine when the proportion affected is small (around 5%). For people missing only a minority of the questions, calculating their score from the questions they did answer can be acceptable for some analyses. If you have more missing information than that, consider multiple imputation, at the very least as a sensitivity analysis.
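
    If you go the route of scoring people on the items they answered, a sketch might look like this (item2 through item6 are placeholders for your actual variable names, and the one-item threshold is arbitrary):

    Code:
    * Mean of the items each person actually answered
    egen ghq_mean = rowmean(concentration item2 item3 item4 item5 item6)
    * Count of missing items per person
    egen ghq_nmiss = rowmiss(concentration item2 item3 item4 item5 item6)
    * Keep a score only for people missing at most one of the six items
    replace ghq_mean = . if ghq_nmiss > 1
    * Prorated total on the summed metric, if you prefer a sum
    gen ghq_sum = ghq_mean * 6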

    Also, strictly speaking, the sum score is not cardinal. It may be acceptable to treat it as such in a regression, but that could be a big debate in itself.
    Be aware that it can be very hard to answer questions like this without sample data. You can provide an extract with the dataex command; type help dataex at the command line.
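
    For example, something like the following (again with placeholder variable names) would produce a copy-and-pasteable listing of the first 20 observations:

    Code:
    dataex concentration item2 item3 item4 item5 item6 in 1/20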

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thanks Weiwen. That answers the question perfectly, and I am fairly sure I can proceed the way it is done in the paper. As for treating the sum score as cardinal, I think it is fine since that is how the paper I am following treats it, but I might ask my supervisor to confirm.



      • #4
        Where does the +/-2 come in with items that have a 4-category ordered response format? On an interval scale, the -2, -1, 1, 2 coding treats the difference between the two middle categories as twice as large as the difference between the lowest category and the next lowest (the gaps are 1, 2, and 1). I don't think that's what you want. I also generally think treating missing as 0 on any item is not appropriate: it would mean very different things depending on how the items were coded, and the coding is often arbitrary. My usual approach to generating summated-rating scales is:

        Code:
        alpha item1 item2 ... itemk, generate(newvar)

        k = number of items.

        This returns the mean of the items. By default, an observation missing on any item is defined as missing on the scale. Assuming the items are effect indicators reflecting a common underlying latent construct, and there is no scoring protocol that explicitly says an observation missing on any indicator must be scored as missing, I usually use something like:

        Code:
        alpha item1 item2 ... itemk, generate(newvar) min(n)

        n = the minimum number of nonmissing items required for a case to be scored (alpha's min() option). The choice of n is obviously arbitrary; I usually use something like n = .75*k, i.e., for any case I require about 75% of the items to be valid. If there are lots of cases with many missing items, then you need to engage in a more complete sensitivity analysis.

        This returns the mean of the valid items.

        Depending on the use and preferences of persons with whom I'm working, I sometimes re-express that on an additive metric:

        Code:
        gen newvar2 = newvar*k

        None of this gets you to a metric ranging from -2 to +2 with items coded on a 4-point ordered response scale; an equally spaced -2 to +2 coding would fit a 5-point ordered response scale.
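
        Applied to the six GHQ items, a sketch of this approach might look like the following (item2 through item6 are placeholder names, and min(5) is an arbitrary choice of roughly 75% of the items being valid):

        Code:
        * Scale score = mean of the items; score only cases with at least 5 of the 6 items nonmissing
        alpha concentration item2 item3 item4 item5 item6, generate(ghq_alpha) min(5)
        * Re-express on the additive (summed) metric
        gen ghq_alpha_total = ghq_alpha * 6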




        • #5
          Thanks Brad. I contacted the author of the paper in which this conversion is done and he hasn't got back to me. I will try an alternative approach.
