
  • Any way to standardize ordinal/categorical variables

    Hi, I have several ordinal/categorical variables measured on different scales: some are 1-7, some 1-3, some 1-10, etc. How can I standardize the variables so that they are all on the same scale, say 0-1, in Stata 12?

  • #2
    If you want to take them one by one, you can apply the formula (value - minimum)/(maximum - minimum):
    -for the 1-7 variable: (initial value - 1)/6
    -for the 1-3 variable: (initial value - 1)/2
    ...and so on

    Example (1-7 variable)
    Initial  Rescaled
    1        0.00
    2        0.17
    3        0.33
    4        0.50
    5        0.67
    6        0.83
    7        1.00
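The table above can be reproduced with a few lines of Stata. This is a minimal sketch: the variable names initial and rescaled are illustrative, and the data are simply the seven possible answers of a 1-7 item.

```stata
clear
* One observation for each possible answer of a 1-7 item
set obs 7
generate byte initial = _n
* Subtract the minimum (1) and divide by the range (7 - 1 = 6)
generate double rescaled = (initial - 1)/6
format rescaled %4.2f
list initial rescaled, noobs clean
```

For a real 1-7 item, the same generate line applies with initial replaced by that variable's name.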



    • #3
      You could use sheaf coefficients for that, see: http://www.maartenbuis.nl/software/sheafcoef.html
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        Hi Stefan, thanks. Could you elaborate on the command, please?



        • #5
          He just rescaled all variables so that they range from 0 to 1. If you enter your ordinal variables linearly, then one unit of each of these variables represents going from its minimum to its maximum, which you can sometimes consider comparable.

          However, I would be hesitant to do so for two reasons:
          1. You would need to enter your ordinal variables linearly, which is often problematic for ordinal variables.
          2. Whether the minimum of a 3-category variable is really comparable to the minimum of a 7-category variable (and similarly for the maxima) is often doubtful for substantive reasons. In a 3-category variable the minimum is a generic negative response to the question, while the minimum of a 7-category variable is a severe negative response. So the unit of the recoded 3-category variable is "generic negative to generic positive", while the unit of the recoded 7-category variable is "severe negative to severe positive". I would not call that comparable.
          Maarten L. Buis
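Point 1 can be illustrated with simulated data. The sketch below (the names x and y and the step-shaped effect are assumptions made for illustration) enters the same 7-category predictor once linearly and once as a set of indicators using factor-variable notation (i.x, available since Stata 11).

```stata
clear
set seed 12345
set obs 500
* Hypothetical ordinal predictor with categories 1-7
generate byte x = 1 + floor(7*runiform())
* Hypothetical outcome whose mean jumps between categories 4 and 5,
* i.e. an effect that is not linear in the category codes
generate double y = (x >= 5) + rnormal()

* Linear entry: a single slope, which assumes equally spaced categories
regress y x
* Categorical entry: one coefficient per category (category 1 is the base)
regress y i.x
```

Comparing the two sets of estimates shows whether a single slope, which treats the distance between any two adjacent categories as equal, is a reasonable summary of the effect.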



          • #6
            Suppose your variable var1 runs from 1 to 7; then you can use:
            . generate var1_rescaled = (var1 - 1)/6

            But keep in mind what Maarten wrote. Depending on what these numbers mean and what you want to do with them, doing what I did in my example may or may not be a good idea.



            • #7
              Thanks. @Maarten Buis, I know there are problems with simple arithmetic standardization. I have tried to go through your paper. Sorry if this is naive, but it seems to be about post-estimation standardization, whereas I need pre-estimation standardization. Could you please tell me how I can use sheaf coefficients to rescale the variables to 0-1?



              • #8
                Why would you need pre-estimation standardization?
                Maarten L. Buis



                • #9
                  Ignore me if I’m off track, but this discussion seems to be predicated on the assumption that the original scaling (i.e., 1-7, 1-3, 1-10, etc.) has some underlying meaning. My guess is that these scalings are arbitrary, so any rescaling seems to me to be neither good nor bad, just convenient.
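One way to see the "just convenient" point: a linear rescaling such as (x - 1)/6 changes the unit of a regression slope but not the fit. A minimal sketch with simulated data; the names x, y and x01 and the data-generating process are assumptions for illustration.

```stata
clear
set seed 2024
set obs 200
* Hypothetical 1-7 item and a hypothetical outcome
generate byte x = 1 + floor(7*runiform())
generate double y = 0.3*x + rnormal()
* The same item rescaled to run from 0 to 1
generate double x01 = (x - 1)/6

quietly regress y x
scalar r2_raw = e(r2)
scalar b_raw = _b[x]
quietly regress y x01
scalar r2_01 = e(r2)
scalar b_01 = _b[x01]

* Same R-squared; the slope for x01 is 6 times the slope for x
display "R-squared: " r2_raw " vs " r2_01
display "slope ratio: " b_01/b_raw
```

Because x = 6*x01 + 1, the two models are reparameterizations of each other; this says nothing about whether the category codes themselves are meaningful, which is the separate issue raised in #5.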



                  • #10
                    Originally posted by Stefan
                    generate var1_rescaled= (var1-1)/6
                    I think this could be stated more generally as (var1 - min(var1))/(max(var1) - min(var1)). For three variables var1, var2 and var3 it would be:
                    Code:
                    foreach var in var1 var2 var3 {
                        * store the observed minimum and maximum in r(min) and r(max)
                        summarize `var' , meanonly
                        * (value - minimum)/(maximum - minimum), so the new variable runs from 0 to 1
                        generate `var'_std=(`var'-`r(min)')/(`r(max)'-`r(min)')
                    }
                    Regards
                    Bela
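One practical difference between this data-based formula and the scale-based formula in #2 and #6 shows up when an extreme category is not observed in the sample. The sketch below uses made-up data (all values are hypothetical) in which nobody chose category 1 of the 1-10 item var3: the loop maps the observed minimum of 2 to 0, whereas the questionnaire's own limits map it to 1/9.

```stata
clear
* Made-up data: var1 is a 1-7 item, var2 a 1-3 item, var3 a 1-10 item
* (in this sample nobody chose category 1 of var3)
input byte var1 byte var2 byte var3
1 1  2
4 2  5
7 3 10
5 2  7
end

* The loop from the post, storing the results as double
foreach var in var1 var2 var3 {
    summarize `var', meanonly
    generate double `var'_std = (`var'-`r(min)')/(`r(max)'-`r(min)')
}

* Rescaling with the questionnaire's limits (1 and 10) instead
generate double var3_scale = (var3 - 1)/(10 - 1)
list var3 var3_std var3_scale
```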



                    • #11
                      Another (simple) possibility here would be to replace each ordinal value with its approximate fractional rank, giving at least some crude capacity to compare scores across variables. One common approximation is so-called "ridit" scoring, which is conveniently available in the -egenmore- package from SSC (ssc install egenmore),
                      e.g., egen newvar1 = ridit(var1)

                      Regards, Mike
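Under the usual definition, the ridit of a category is the proportion of responses below that category plus half the proportion in it, so ridits average 0.5 in the sample they are computed from. The hand-rolled sketch below only shows what is being computed; the data and the names x, n_le, n_at and ridit are illustrative, and in practice the egenmore function mentioned above is the convenient route.

```stata
clear
input byte x
1
1
2
3
3
3
end

* N: number of nonmissing responses
quietly count if !missing(x)
local N = r(N)

sort x
* n_le: number of observations with a value <= the current value
generate n_le = _n
by x: replace n_le = n_le[_N]
* n_at: number of observations tied at the current value
by x: generate n_at = _N
* ridit: proportion below plus half the proportion tied
generate double ridit = (n_le - n_at/2)/`N' if !missing(x)
list x ridit, sepby(x)
```

Here category 1 gets 1/6, category 2 gets 2.5/6 and category 3 gets 0.75, so a 3-category and a 7-category item both end up on a 0-1 "fractional rank" scale driven by the observed distribution rather than by the category codes.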



                      • #12
                        @Maarten I want to build an index after standardizing the variables. Building the index is another story: I am not adroit enough to interpret MCA (multiple correspondence analysis), so I am trying factor analysis for this. Any help on interpreting MCA would be appreciated.

                        But as said, for some variables it is necessary to assume an underlying value behind the ranking of the categories.



                        • #13
                          I am still not convinced that you need to standardize: none of the things you want to do requires standardization. Could you tell us more about why you think you need to standardize your ordinal variables?

                          My intuition is that you don't need to standardize. This could be a very good thing, as pre-estimation standardization of ordinal variables is very tricky. The concept of standardization fits much better with continuous variables, so any "standardization" of ordinal variables is going to be somewhat ad hoc, in a way that may work in some special situations, but certainly not in all or even the majority of situations. So if you can avoid the entire issue, so much the better.
                          Maarten L. Buis



                          • #14
                            I'd side with Maarten on this. If you're trying factor analysis, then there won't be any need to standardize your ordered-categorical indicator variables. Stata's gsem command, which can fit factor models, handles indicator variables with different numbers of categories without any need to standardize (see the do-file, the associated log file and the graph below). It seems to me that attempting to standardize beforehand will unnecessarily complicate the interpretation of the factor loadings.

                            [Attached graph: Ujjwal.png]
                            [Attached files: do-file and log file]
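For readers with Stata 13 or later (gsem is not part of Stata 12), here is a minimal sketch of a one-factor model with ordinal indicators in the spirit of #14. It is not the do-file attached to that post: the data are simulated, and the names eta, s1-s3, q1-q3 and Distress, the loadings and the cut points are all assumptions.

```stata
* Requires Stata 13 or later: gsem does not exist in Stata 12
clear
set seed 101
set obs 1000
* Hypothetical latent distress and three items with 3, 5 and 7 categories
generate double eta = rnormal()
generate double s1 = eta + rnormal()
generate double s2 = 0.8*eta + rnormal()
generate double s3 = 1.2*eta + rnormal()
* irecode() returns 0, 1, 2, ... according to the interval s falls in
generate byte q1 = 1 + irecode(s1, -0.5, 0.5)
generate byte q2 = 1 + irecode(s2, -1.5, -0.5, 0.5, 1.5)
generate byte q3 = 1 + irecode(s3, -2, -1.2, -0.4, 0.4, 1.2, 2)

* One-factor model with ordered-logit measurement equations; each item keeps
* its own number of categories, so nothing is rescaled beforehand
gsem (Distress -> q1 q2 q3, ologit), latent(Distress)

* Predicted factor scores, usable as an index in later models
predict distress_hat, latent(Distress)
```

The ordered-logit cut points absorb the differing numbers of categories, which is why no common 0-1 scale is needed before estimation.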



                            • #15
                              Thanks for all the responses. Let me explain exactly what I am trying to do. From a large panel survey, I have picked responses to a number of questions that express people's state of financial distress. The responses are on different scales, for example:

                              1. How do you think you manage your finances nowadays? - a) b) c) d) e)
                              2. Has your situation changed since this time last year? - a) b) c)
                              3. Do you save from your current income? - a) b)
                              4. Are you having problems with housing payments? - a) b) c)
                              There are seven such questions. I am trying to build an index of 'financial distress', which will be my main independent variable. My dependent variable will be subjective well-being, which is also measured on a seven-point scale: a) not happy at all, b), c), d), e), f), g) completely happy.
                              I would like to standardize all eight variables, then build an index from the seven variables that express financial distress, and then use panel data models to regress subjective well-being on that index.

                              I have a 12-year unbalanced panel with more than 114,000 observations in total, with more than 8,000 (varying) persons interviewed each year.
                              @Joseph, thanks for your valuable help. I am using Stata 12, where the gsem command is not available. How can I do this in Stata 12?
