  • Normalising a variable to be between 0 and 1

    Hi all,

    I have four variables, each with values ranging from -2.5 to 2.5.

    I would like to alter the values of the variables so they are between 0 and 1.

    How do I go about doing that?!

  • #2
    If that is exactly correct, this is simple algebra,

    Code:
    replace x = (x + 2.5)/5
    generalised with a foreach loop.

    My instinct is always to leave original data exactly as they come and to create a new variable.

    Code:
    foreach v in frog toad newt dragon {
          * map [-2.5, 2.5] onto [0, 1]; the original variable is left untouched
          gen `v'2 = (`v' + 2.5)/5
          label var `v'2 "`v' scaled to [0,1]"
    }
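
    If it helps, here is a minimal sketch of the same algebra with the limits written out, plus a check that nothing falls outside [0,1]. The local macros lo and hi and the 01 suffix are names made up for illustration, and the limits are assumed to be exactly -2.5 and 2.5, as stated in the question.

    ```stata
    * hypothetical names: lo and hi hold the limits assumed to apply in principle
    local lo = -2.5
    local hi = 2.5
    foreach v in frog toad newt dragon {
        * general linear map from [lo, hi] onto [0, 1]
        gen `v'01 = (`v' - `lo') / (`hi' - `lo')
        * stop with an error if any nonmissing value lands outside [0, 1]
        assert inrange(`v'01, 0, 1) if !missing(`v')
    }
    ```

    If the assert fails, some value lies outside the stated limits, and the premise of the question needs a second look.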


    • #3
      Or more generally,

      Code:
      foreach v of varlist ... {
          * summarize leaves the observed minimum and maximum in r(min) and r(max)
          qui summ `v', meanonly
          gen `v'2 = (`v' - r(min)) / (r(max) - r(min))
      }
      Note that, per Nick's comment, the above code rescales to [0,1] using the observed minimum and maximum of each variable, not limits defined in advance. If the observed extremes differ from the limits that hold in principle, the two scalings will differ. Nevertheless, I hope the more general formula and link are useful.

      See https://en.wikipedia.org/wiki/Feature_scaling
      Last edited by Brendan Cox; 27 Oct 2015, 08:06.


      • #4
        Brendan's method (*) may not be quite the same. For example, the definition of varying from -2.5 to 2.5 may mean that in principle these are the limits. If the extremes in the data are not the same, then the scaling will be different.

        (*) We're not related. Or at least, I don't know that we are.
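
        To make the difference concrete, here is a small sketch on made-up data whose observed extremes (-2 and 2) fall short of the limits in principle (-2.5 and 2.5). The variable names x, x_fixed and x_obs are purely illustrative.

        ```stata
        clear
        * toy data: observed range is [-2, 2], narrower than [-2.5, 2.5]
        input x
        -2
        -1
        0
        1
        2
        end

        * scaling by the limits that hold in principle
        gen x_fixed = (x + 2.5)/5

        * scaling by the observed minimum and maximum
        summarize x, meanonly
        gen x_obs = (x - r(min)) / (r(max) - r(min))

        list
        ```

        With these five values the fixed-limit version runs from 0.1 to 0.9 in steps of 0.2, while the observed-limit version runs from 0 to 1 in steps of 0.25, so the two only agree when the data actually reach both limits.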


        • #5
          Originally posted by Nick Cox
          If that is exactly correct, this is simple algebra,

          Code:
          replace x = (x + 2.5)/5
          generalised with a foreach loop.

          My instinct is always to leave original data exactly as they come and to create a new variable.

          Code:
          foreach v in frog toad newt dragon {
          gen `v'2 = (`v' + 2.5)/5
          label var `v'2 "`v' scaled to [0,1]"
          }

          Thanks. How does one then interpret such values?


          • #6
            Originally posted by Nick Cox
            Brendan's method (*) may not be quite the same. For example, the definition of varying from -2.5 to 2.5 may mean that in principle these are the limits. If the extremes in the data are not the same, then the scaling will be different.

            (*) We're not related. Or at least, I don't know that we are.
            Ah, yes, quite so. I will edit the post accordingly, but hopefully the more generalised formula & link prove useful references.

            * Yes, an unfortunate coincidence for you. For me, perhaps it lends an extra (and undeserved) 'je ne sais quoi' to my posts.


            • #7
              Coxes everywhere (we're good at survival):

              We all have little stories. Long ago, I heard a lecture with an explanation of a statistical problem. (Yes, I could see that was a key problem.) And here is the standard device we use to solve it. (That's smart! I bet the other guys were kicking themselves when they saw that.) And this, of course, is the standard Cox model. (No, not me. But it still sounds great: the standard Cox model.)

              That is, naturally, Sir David Cox (1924- ), much and rightly honoured, and again we are not related.


              • #8
                Chris:

                How does one then interpret such values?
                Not sure what you want here. The variables are now scaled to [0, 1], which is what you asked for.


                • #9
                  Originally posted by Nick Cox
                  Chris:

                  Not sure what you want here. The variables are now scaled to [0, 1], which is what you asked for.

                  True, but is there any reference on how to interpret them when I use these variables in a regression, for example? What does an increase of 0.1 in a variable that is now scaled to [0, 1] mean? Does it mean that a 10% increase in [normalised variable] will result in an x% increase in the dependent variable (if the dependent variable is in percent form)?


                  • #10
                    Forgetting about other predictors, and even the intercept, focus on y = ... + bx + ... where x varies between 0 and 1. The difference between the predicted value for x = 0 and that for x = 1 is precisely b. That's a change over the entire possible range. For any fraction of the range, multiply down: a change of 0.1 in x corresponds to a change of 0.1b in the predicted value.

                    That doesn't sound like your interpretation, although I don't understand it (what is x in your notation? the coefficient? the predictor?).
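
                    As a rough illustration of the point about b, here is a sketch on simulated data. Everything in it (the seed, the sample size, the true slope of 2, the names x, y, x01 and b_raw) is invented for the example and is not taken from the original poster's data.

                    ```stata
                    clear
                    set seed 2015
                    set obs 200

                    * simulated data, purely illustrative
                    gen double x = -2.5 + 5*runiform()
                    gen double y = 1 + 2*x + rnormal()

                    * the same predictor rescaled from [-2.5, 2.5] to [0, 1]
                    gen double x01 = (x + 2.5)/5

                    regress y x
                    scalar b_raw = _b[x]

                    regress y x01
                    * the slope on x01 is the raw slope times the width of the range (5)
                    display _b[x01] "  " 5*b_raw
                    * predicted change in y for a change of 0.1 in x01
                    display 0.1*_b[x01]
                    ```

                    The fit is identical in both regressions; only the units of the slope change. The coefficient on x01 is the predicted change in y across the whole range, and a change of 0.1 in x01 is one tenth of that range, not a 10% increase in the variable.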
