  • sort command does not work perfectly

    I have a column in my dataset containing age groups such as 50-51, 52-53, ..., 100-101. When I sort this column, the 100-101 value shows up on the first line, followed by 50-51, 52-53, etc. Obviously, it should show up after 98-99. I would appreciate your thoughts.
    Best,
    Nader

  • #2
    Your column is a string variable. To get the categories to sort as you want, you'll need to encode it (or prefix the strings under 100 with a zero). At the command line in Stata, type
    Code:
    help encode
    and
    Code:
    help label
    for more information about how to create the set of value labels and encode the string variable using the set.
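
    For the zero-prefix route, a minimal sketch, assuming the string variable is named agegroup (a hypothetical name) and every value has the form lower-upper:
    Code:
    * agegroup is a hypothetical name for the string variable
    * extract the number before the hyphen as a numeric lower bound
    generate lower = real(substr(agegroup, 1, strpos(agegroup, "-") - 1))
    * pad groups below 100 with a leading zero, e.g. "50-51" becomes "050-51"
    replace agegroup = "0" + agegroup if lower < 100
    sort agegroup
    Sorting on the numeric helper variable lower alone would also give the intended order without altering the strings.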


    • #3
      In Stata, what you see is not necessarily what you get (nor what you have). Items like 50-51 are not numbers, although your brain perceives them as such. To Stata, "50-51" is a string, and the variable that contains these values is a string variable. So when you sort it, it sorts in dictionary order, just as all strings do. In dictionary order, 100-101 comes before 50-51 because the comparison goes character by character and "1" precedes "5".
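
      A quick check at the command line illustrates the point; string comparison with < follows the same ordering that sort applies to string variables:
      Code:
      display "100-101" < "50-51"
      The result is 1 (true).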

      If you want the data to sort in a way that matches the meaning of these values, you need to create a value-labeled numeric variable whose actual values are numbers like 1, 2, 3, ... but whose labels are attached in the appropriate order: "50-51", "52-53", ..., with "100-101" far down the list. The -encode- command is often a convenient way to create such variables. Do read -help encode-. But -encode- will, by default, number the categories in alphabetical order, so to get this to work you first need to write (or have Stata build) a label where 50-51 is associated with 1, and so on. You can then specify that label as an option to -encode-.
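
      One way to have Stata build that label is sketched below. It assumes the string variable is named agegroup and that every value has the form lower-upper; the names agegroup, lower, agelbl, and agecat are all hypothetical.
      Code:
      * agegroup is a hypothetical name for the string variable
      * numeric lower bound of each group (the part before the hyphen)
      generate lower = real(substr(agegroup, 1, strpos(agegroup, "-") - 1))
      * levelsof returns the distinct lower bounds in ascending numeric order
      levelsof lower, local(bounds)
      local i = 0
      foreach b of local bounds {
          local ++i
          * fetch the text of the group that has this lower bound
          levelsof agegroup if lower == `b', local(txt) clean
          * map the next integer to that text in the value label
          label define agelbl `i' "`txt'", add
      }
      * encode using the prebuilt label instead of alphabetical order
      encode agegroup, generate(agecat) label(agelbl)
      sort agecat
      The new variable agecat displays the original age-group text but sorts by the underlying integers, so 100-101 lands after 98-99.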

      Added: crossed with #2 which says the same things more succinctly.
