  • How do I transform categorical variables to continuous scores?

    Hello Statalist,

    I am working on the Health Information National Trends Survey (HINTS) which has 7 variables measuring the functions of patient-centered communication (output below).

    I want to create an overall patient-centered communication score, which will be my outcome variable, but I am having a hard time doing this. So far I know I should average the scores of the individual variables and then transform the average to a 0–100 scale, but I'm not sure how. I am using Stata 14.

    Thank you in advance.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(chanceaskquestions feelingsaddressed involveddecisions understoodnextsteps explainedclearly spentenoughtime helpuncertainty)
     1  1  1  1  1  1  1
    -1 -1 -1 -1 -1 -1 -1
     1  1  1  1  1  1  2
     1  1  1  1  1  1  1
     3  3  3  2  2  2  3
     1  1  1  1  1  1  1
     2  3  2  1  1  2  2
     2  2  2  2  2  2  2
     4  4  4  3  3  4  4
     2  2  3  2  2  2  4
     1  2  1  1  1  1  1
     2  2  2  3  2  3  2
    -1 -1 -1 -1 -1 -1 -1
     2  2  1  1  1  2  2
     3  4  3  2  2  2  3
    -1 -1 -1 -1 -1 -1 -1
     2  4  3  2  4  4  3
     4  4  1  3  1  4  4
     1  3  2  2  2  2  2
     3  3  3  3  2  3  3
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     2  2  2  1  2  2  2
     1  2  2  1  2  2  2
     3  4  4  3  3  3  4
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
    -1 -1 -1 -1 -1 -1 -1
    -1 -1 -1 -1 -1 -1 -1
     2  1  2  2  2  1  2
     1  1  2  1  1  1  2
     1  1  1  1  1  1  1
     1  2  2  1  1  2  2
     1  1  1  1  1  1  1
     1  2  1  1  1  2  2
     1  2  1  1  1  3  2
     1  2  1  1  1  1  1
    -2 -2 -2 -2 -2 -2 -2
    -1 -1 -1 -1 -1 -1 -1
    -1 -1 -1 -1 -1 -1 -1
     1  1  1  1  1  1  1
     1  2  2  1  1  2  3
     1  1  1  1  1  1  1
     2  2  2  2  2  2  2
     1  4  1  1  1  1  1
     3  2  3  3  3  3  3
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     3  4  3  3  2  2  4
     1  2  1  1  1  2  1
     3  3  3  3  3  3  3
     1  1  1  1  1  1  1
     2  2  2  2  2  2  2
     1  1  1  1  1  1  2
     2  2  2  2  2  2  3
    -9 -9 -9 -9 -9 -9 -9
     1  2  2  1  2  2  2
    -1 -1 -1 -1 -1 -1 -1
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
    -2 -2 -2 -2 -2 -2 -2
     2  2  1  1  1  2  4
     1  3  1  1  1  2  2
     1  1  2  2  1  1  2
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     2  2  2  1  1  1  1
     1  1  1  1  1  1  1
     1  2  1  1  1  1  2
     1  1  1  1  1  1  1
     1  1  1  1  1  1  2
     1  1  1  1  1  1  1
     1  3  3  1  1  2  3
     2  2  1  1  1  2  2
     1  1  1  1  1  1  1
     2  2  2  1  2  2  2
     1  1  1  1  1  1  1
     2  3  3  3  3  4  3
     1  1  1  1  1  1  1
     1  1  1  1  1  2  1
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
    -1 -1 -1 -1 -1 -1 -1
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     2  1  1  1  1  1  2
     2  2  2  2  2  2  4
     1  1  1  1  1  1  1
     1  1  1  1  1  1  1
     2  2  2  2  2  2  2
     1  1  1  1  1  1  1
     1  1  1  1  1  1  4
     2  1  1  1  1  1  1
     2  2  2  3  3  3  2
     1  2  2  1  1  2  2
    end
    label values chanceaskquestions chanceaskquestions
    label def chanceaskquestions -9 "Missing data (Not Ascertained)", modify
    label def chanceaskquestions -2 "Question answered in error (Commission Error)", modify
    label def chanceaskquestions -1 "Inapplicable, coded 0 in FreqGoProvider", modify
    label def chanceaskquestions 1 "Always", modify
    label def chanceaskquestions 2 "Usually", modify
    label def chanceaskquestions 3 "Sometimes", modify
    label def chanceaskquestions 4 "Never", modify
    label values feelingsaddressed feelingsaddressed
    label def feelingsaddressed -9 "Missing data (Not Ascertained)", modify
    label def feelingsaddressed -2 "Question answered in error (Commission Error)", modify
    label def feelingsaddressed -1 "Inapplicable, coded 0 in FreqGoProvider", modify
    label def feelingsaddressed 1 "Always", modify
    label def feelingsaddressed 2 "Usually", modify
    label def feelingsaddressed 3 "Sometimes", modify
    label def feelingsaddressed 4 "Never", modify
    label values involveddecisions involveddecisions
    label def involveddecisions -9 "Missing data (Not Ascertained)", modify
    label def involveddecisions -2 "Question answered in error (Commission Error)", modify
    label def involveddecisions -1 "Inapplicable, coded 0 in FreqGoProvider", modify
    label def involveddecisions 1 "Always", modify
    label def involveddecisions 2 "Usually", modify
    label def involveddecisions 3 "Sometimes", modify
    label def involveddecisions 4 "Never", modify
    label values understoodnextsteps understoodnextsteps
    label def understoodnextsteps -9 "Missing data (Not Ascertained)", modify
    label def understoodnextsteps -2 "Question answered in error (Commission Error)", modify
    label def understoodnextsteps -1 "Inapplicable, coded 0 in FreqGoProvider", modify
    label def understoodnextsteps 1 "Always", modify
    label def understoodnextsteps 2 "Usually", modify
    label def understoodnextsteps 3 "Sometimes", modify
    label values explainedclearly explainedclearly
    label def explainedclearly -9 "Missing data (Not Ascertained)", modify
    label def explainedclearly -2 "Question answered in error (Commission Error)", modify
    label def explainedclearly -1 "Inapplicable, coded 0 in FreqGoProvider", modify
    label def explainedclearly 1 "Always", modify
    label def explainedclearly 2 "Usually", modify
    label def explainedclearly 3 "Sometimes", modify
    label def explainedclearly 4 "Never", modify
    label values spentenoughtime spentenoughtime
    label def spentenoughtime -9 "Missing data (Not Ascertained)", modify
    label def spentenoughtime -2 "Question answered in error (Commission Error)", modify
    label def spentenoughtime -1 "Inapplicable, coded 0 in FreqGoProvider", modify
    label def spentenoughtime 1 "Always", modify
    label def spentenoughtime 2 "Usually", modify
    label def spentenoughtime 3 "Sometimes", modify
    label def spentenoughtime 4 "Never", modify
    label values helpuncertainty helpuncertainty
    label def helpuncertainty -9 "Missing data (Not Ascertained)", modify
    label def helpuncertainty -2 "Question answered in error (Commission Error)", modify
    label def helpuncertainty -1 "Inapplicable, coded 0 in FreqGoProvider", modify
    label def helpuncertainty 1 "Always", modify
    label def helpuncertainty 2 "Usually", modify
    label def helpuncertainty 3 "Sometimes", modify
    label def helpuncertainty 4 "Never", modify

  • #2
    Disclaimer: I am not an expert on the topic, so other users may have more knowledge to share. In any case, what you are describing is a data reduction problem, and I recommend having a closer look at factor analysis
    Code:
    help factor
    or principal component analysis.
    Code:
    help pca
    In both cases I strongly recommend having a look at the complete PDF manual entry, where you will find comprehensive explanations and examples.
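
    As a rough starting point, the two suggestions could be tried on the seven items from post #1 as sketched below. This is only a sketch under assumptions: the negative survey codes are first recoded to missing, a single factor or component is requested, and f1 and pc1 are hypothetical names for the predicted scores. The items are four-category ordinal responses, so treating them as continuous here is itself an approximation.

    ```stata
    * Sketch only: assumes the seven HINTS items from post #1 are in memory.
    * Recode the negative survey codes to extended missing values first,
    * so they are not treated as valid responses.
    mvdecode chanceaskquestions-helpuncertainty, mv(-9=.m \ -2=.e \ -1=.n)

    * Internal consistency of the seven items (Cronbach's alpha)
    alpha chanceaskquestions-helpuncertainty, item

    * One-factor model; f1 is a hypothetical name for the factor score
    factor chanceaskquestions-helpuncertainty, factors(1)
    predict f1

    * Principal components alternative; pc1 is a hypothetical name
    pca chanceaskquestions-helpuncertainty, components(1)
    predict pc1, score
    ```

    If one factor or component accounts for most of the shared variance, that is at least consistent with summarizing the items in a single score.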


    • #3
      Originally posted by Kobi Ajayi
      . . .I know I should average scores of the individual variables and then transform them to a 0–100 scale, but I'm not sure how. I am using STATA v. 14.
      Code:
      mvdecode chanceaskquestions-helpuncertainty, mv(-9=.m \ -2=.e \ -1=.n)
      
      egen double sco = rowmean(chanceaskquestions-helpuncertainty)
      
      replace sco = (sco - 1) / (4 - 1) * 100
      
      assert inrange(sco, 0, 100) if !mi(sco)
      I don't recall whether gsem was available at the time of Release 14.2, but Felix's suggestion for factor analysis and by implication SEM is worth pursuing in lieu of the sumscore approach that you've been instructed to follow.
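
      Two things may be worth checking with the code above. First, the items are coded 1 = Always through 4 = Never, so sco runs from 0 (always) to 100 (never), meaning a higher score indicates worse communication. Second, rowmean() averages whatever items are non-missing, even if only one was answered. A minimal sketch of one way to handle both, assuming you want higher = better and want at least four answered items (the cutoff of 4 and the names nans, avg, and pcc are hypothetical choices, not HINTS rules):

      ```stata
      * Assumes the mvdecode step above has already been run.
      * Number of non-missing items per respondent
      egen nans = rownonmiss(chanceaskquestions-helpuncertainty)

      * Row mean on the original 1-4 scale
      egen double avg = rowmean(chanceaskquestions-helpuncertainty)

      * Reverse and rescale: 1 (Always) -> 100, 4 (Never) -> 0
      generate double pcc = (4 - avg) / (4 - 1) * 100 if nans >= 4

      assert inrange(pcc, 0, 100) if !mi(pcc)
      ```

      Respondents with fewer than four answered items end up with a missing pcc.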


      • #4
        Thank you, Felix and Joseph. I will run with your advice and code and then take it from there.


        • #5
          For clarity, I would code
          Code:
          replace sco = (sco - 1) / (4 - 1) * 100
          as
          Code:
          replace sco = ((sco-1)/(4-1))*100
          Both will produce the same result in Stata. That is because in Stata's order of operations division precedes multiplication (-help operator-). But there are other programming languages in which multiplication and division are considered equivalent in the order hierarchy, and in some languages they might be performed left-to-right (which would be equivalent) or right-to-left (which would give a different result). Also, in some languages multiplication takes precedence over division, and that is commonly the way ordinary algebraic notation is interpreted as well (which would give the wrong result here). The code I am suggesting leaves nothing to chance or the imagination: it will never be misinterpreted by a human reader, nor by any language interpreter/compiler.

          If you make a point of programming in ways that do not depend on remembering how your particular language orders multiplication and division, you won't have to make a point of remembering how each language you work in handles this, and you may avoid a kind of mistake that is particularly difficult to spot but could have bad consequences if it goes unnoticed.
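
          A quick way to see the difference between the groupings is to evaluate them on a single number. The sketch below uses a row mean of 3 as an arbitrary illustrative value:

          ```stata
          * Stata's reading of the unparenthesized expression (division first)
          display (3 - 1) / (4 - 1) * 100

          * The fully parenthesized version: same value, about 66.67
          display ((3 - 1) / (4 - 1)) * 100

          * The comparison displays 1 (true), since both are evaluated identically
          display (3 - 1) / (4 - 1) * 100 == ((3 - 1) / (4 - 1)) * 100

          * The grouping a multiplication-first reading would give:
          * a very different number, about 0.0067
          display (3 - 1) / ((4 - 1) * 100)
          ```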
          Last edited by Clyde Schechter; 16 Oct 2020, 23:50.


          • #6
            Good point. Or maybe in two steps. (In order to help avoid getting lost among the nested parentheses.)
            Code:
            replace sco = (sco - 1) / (4 - 1)
            replace sco = 100 * sco
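
            One caveat with any replace-based version: replace overwrites sco in place, so accidentally running a line a second time rescales the variable again. A minimal sketch that avoids this by writing each step to a new variable, assuming sco still holds the row mean on the original 1 to 4 scale (sco01 and sco100 are hypothetical names):

            ```stata
            * sco is assumed to be the row mean on the original 1-4 scale
            generate double sco01 = (sco - 1) / (4 - 1)
            generate double sco100 = 100 * sco01

            * Rerunning these lines stops with an "already defined" error
            * instead of silently rescaling a second time
            assert inrange(sco100, 0, 100) if !mi(sco100)
            ```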
