
  • Scale Construction and Standardization

    Hello,

    I have a couple of questions regarding scale-construction.

    Let's say that I have 4 variables: X1, X2, X3, and X4.

    X1 is a Likert-Scale: it takes values from 1 to 5.
    X2 is a Likert-Scale: it takes values from 1 to 4.
    X3 is a binary variable: it takes 0 or 1.
    X4 is a variable constructed by dividing one variable by another: it takes values between 0 and 1, that is, proportions such as 0.20 or 0.45.

    My questions are as follows:
    1. If I want to create a scale out of X1, X2, and X3, should I standardize all of them? (I guess standardizing X3, which is a binary variable, makes no sense.)
    2. If I standardize these variables and add them together, should I standardize the final variable again if I want to use it in a regression?
    3. Should I standardize X4, which is a proportion, to make all the variables comparable?
    Thanks a lot!

  • #2
    Let me bump this one time.



    • #3
      If you want to create a scale out of 4 variables, then you typically want to give them equal weight, or you want to estimate weights that are optimal in some sense. If the units of the variables differ and you simply add them up, then the variable with the smallest unit (and hence the largest numerical spread) implicitly gets the highest weight, which is not what you want. Standardization helps with that. If that is the approach for you, then you definitely want to apply it to all variables, including binary variables.
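A quick way to see the weighting problem is to compare a raw sum with a sum of z-scores. This is a sketch only: it uses Stata's shipped auto dataset, with price (dollars) and mpg (miles per gallon) as stand-ins for two items measured in very different units; the new variable names are hypothetical.

```stata
* Stand-in data: price and mpg play the role of two scale items
sysuse auto, clear

* Raw sum: price varies by thousands, mpg by a handful of units,
* so price dominates the sum
generate raw_sum = price + mpg

* Standardized sum: rescale each item to mean 0 and SD 1 before adding
egen z_price = std(price)
egen z_mpg   = std(mpg)
generate z_sum = z_price + z_mpg

* Correlation of each sum with each item
correlate raw_sum z_sum price mpg
```

raw_sum should track price almost perfectly, whereas z_sum correlates equally strongly with both items, because both enter the sum with a standard deviation of 1.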

      You can standardize in different ways:
      • You can standardize such that the mean is 0 and the standard deviation is 1
      • You can standardize such that the minimum is 0 and the maximum is 1 (this tends to work well when all variables have a fixed range, but not so well otherwise)
      • You can standardize using percentile scores, that is, the proportion of people who have less than you. See: https://www.stata.com/support/faqs/s...ing-positions/
      Standardizing the sum can sometimes help interpretation: the sum of standardized variables is itself not standardized. However, if you use that scale as an explanatory/right-hand-side/independent/x-variable then that is not necessary: https://journals.sagepub.com/doi/pdf...867X1201200211
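To illustrate that a sum of z-scores is not itself standardized, here is a minimal sketch, again assuming Stata's auto data as stand-in items (price, weight, length) and hypothetical variable names. The sum keeps a mean of about 0, but its standard deviation depends on the number of items and on how strongly they are correlated.

```stata
* Three positively correlated stand-in items from the auto data
sysuse auto, clear
egen z_price  = std(price)
egen z_weight = std(weight)
egen z_length = std(length)

* The sum has mean (about) 0, but its SD is not 1
generate scale = z_price + z_weight + z_length
summarize scale

* Re-standardize only if a mean-0, SD-1 scale eases interpretation
egen z_scale = std(scale)
summarize z_scale
```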
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        Thank you very much Maarten! These are all extremely helpful! I wish you a great weekend.



        • #5
          Maarten Buis Hi there! Regarding your suggestion "You can standardize such that the minimum is 0 and the maximum is 1 (this tends to work well when all variables have a fixed range, but not so well otherwise)": I was wondering if you could expand on that. I've been using
          Code:
          egen newvar = std(oldvar), mean(0) sd(1)
          and I haven't been able to figure out how to do that. Please let me know if you have any code suggestions, or anywhere I could look for help on this. I looked at the links you shared in your comment, but I think they were more related to your other suggestions.



          • #6
            The code you showed will set the mean at 0 and the standard deviation to 1. That is different from setting the minimum at 0 and the maximum at 1. To do the latter you can use the code below.
            Code:
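            * store the minimum and maximum of oldvar in r(min) and r(max)
            sum oldvar, meanonly
            * rescale so that the minimum becomes 0 and the maximum becomes 1
            gen newvar = (oldvar-r(min))/(r(max)-r(min))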
            This way of standardizing is, of course, highly influenced by outliers, so it is not suitable for variables with an open range, for example age or income. It is meant more for variables with a fixed range, like Likert items. However, if all your variables are Likert items, you don't need to standardize. As a consequence, this is one of the least useful ways of standardizing variables. When I standardize variables, I most often use percentile scores.

            Code:
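            * rank oldvar (tied values get their average rank; missing stays missing)
            egen p_oldvar = rank(oldvar)
            * number of non-missing observations, returned in r(N)
            qui count if !missing(oldvar)
            * convert ranks to percentile scores strictly between 0 and 1
            replace p_oldvar = (p_oldvar - 0.5)/r(N)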
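The percentile-score recipe extends naturally to several items that are then added into a scale. A minimal sketch, assuming Stata's auto data as stand-in items and hypothetical variable names (p_price, p_scale, and so on):

```stata
sysuse auto, clear
foreach v of varlist price weight length {
    * rank(): tied values get their average rank, missing stays missing
    egen p_`v' = rank(`v')
    * number of non-missing observations, returned in r(N)
    quietly count if !missing(`v')
    * percentile score strictly between 0 and 1
    quietly replace p_`v' = (p_`v' - 0.5)/r(N)
}
* Add the percentile scores to form the scale
generate p_scale = p_price + p_weight + p_length
summarize p_price p_weight p_length p_scale
```

Because average ranks sum to N(N+1)/2, each percentile score has a mean of exactly 0.5, which makes the items directly comparable before they are added.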
            Maarten L. Buis



            • #7
              Sam Volpe This related thread seems to need a cross-reference:

              https://www.statalist.org/forums/for...nd-min-value-1

              Perhaps you would be well advised to back up and explain your context, including why you seek some kind of standardization. Like anything else in this territory, sometimes standardization is sorely and surely needed, and other times it is a distraction or makes matters more complicated than is helpful.



              • #8
                Maarten Buis Thank you very much for your concise and clear code instructions, and the explanation surrounding when to use the different approaches. Very helpful!



                • #9
                  Dr Buis
                  I had three questions on a Likert scale from 1 to 7, and I summed their z-scores to create an Anxiety Scale. The correlation between the 3 questions was high and Cronbach's alpha was .80, which justified my approach. However, I now want to add another variable (a binary outcome, Distress (Y/N)) to the scale to create a two-column DV, e.g., high anxiety/yes distress, low anxiety/yes distress, high anxiety/no distress, low anxiety/no distress. How can I do this? I envision using ordinal logistic regression. The maximum and minimum of the Anxiety Scale were +4 and -4. Do I divide it up in order to create high vs. low Anxiety Scale categories? I have 5 predictors, and I want to see the effect size of each predictor on the outcome. Thank you for your time and kind consideration.
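The two steps described in this question (checking Cronbach's alpha, then summing z-scores) might look like this in Stata. This is only a sketch: the auto variables price, weight, and length are stand-ins for the three Likert items, anxiety is a hypothetical name, and it does not address the follow-up question about combining the scale with the distress indicator.

```stata
* Stand-in data: three auto variables play the role of the three items
sysuse auto, clear

* Cronbach's alpha computed on standardized items
alpha price weight length, std

* Sum of z-scores, as described in the question
egen z_price  = std(price)
egen z_weight = std(weight)
egen z_length = std(length)
generate anxiety = z_price + z_weight + z_length
summarize anxiety
```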

