
  • Sorting a dataset by more than one variable?

    I'd like to understand how a dataset can be sorted by more than one variable (e.g. gsort -v1 +v2 ...). Aren't the values in each row tied to each other? A row always keeps the same combination of values, whatever its position after sorting. Sorting by more than one variable seems like turning a multi-wheel key where each wheel can move independently, which the rows of a dataset cannot do. Thank you for your insight.

  • #2
    It's easier than you fear.

    Code:
    sort var1 var2
    means first sort by var1 and then (given that) sort by var2

    which means ... if there are ties on var1, then sort by var2 within each group of observations identical on var1.



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(var1 var2)
    3 2
    3 1
    2 2
    2 1
    1 2
    1 1
    end
    
    . sort var1 var2
    
    . list, sepby(var1)
    
         +-------------+
         | var1   var2 |
         |-------------|
      1. |    1      1 |
      2. |    1      2 |
         |-------------|
      3. |    2      1 |
      4. |    2      2 |
         |-------------|
      5. |    3      1 |
      6. |    3      2 |
         +-------------+
    
    . sort var2 var1
    
    . list, sepby(var2)
    
         +-------------+
         | var1   var2 |
         |-------------|
      1. |    1      1 |
      2. |    2      1 |
      3. |    3      1 |
         |-------------|
      4. |    1      2 |
      5. |    2      2 |
      6. |    3      2 |
         +-------------+
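A small sketch addressing the concern in the question, that the values in a row are tied together: they are, and sort keeps them together. It reorders whole observations and never separates var1 from var2. The variable id below is a hypothetical addition (not in the original example) that records each observation's starting position, so you can see where each complete row ended up.

```stata
* Sketch: sort moves whole observations (rows), never single values
* id is a hypothetical helper variable recording the original row position
clear
input float(var1 var2)
3 2
3 1
2 2
2 1
1 2
1 1
end
generate id = _n
sort var1 var2
* id now runs 6 5 4 3 2 1: each row moved as a unit, pairs unchanged
list id var1 var2, sepby(var1)
```

Here var2 only decides the order among observations that are already tied on var1; it never moves independently of var1.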
    With gsort, a minus sign means: sort by that variable in reversed (descending) order (thinking of it as sorting on the negated variable is good enough as a basis). More variables just mean the same recipe repeated: if there are ties on variables 1 to k, then use variable k + 1 to sort within those groups.
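A sketch of the gsort rule on the same six observations, assuming the data from the example above. It sorts descending on var1 and ascending on var2 within ties, then checks the "negated variable" idea: a plain sort on -var1 and var2 gives the same order. The names neg1 and order_gsort are hypothetical helpers introduced only for this illustration.

```stata
* Sketch: gsort -var1 +var2 = descending on var1, ascending on var2 within ties
clear
input float(var1 var2)
3 2
3 1
2 2
2 1
1 2
1 1
end
gsort -var1 +var2
list, sepby(var1)
* Remember the order gsort produced (hypothetical helper variable)
generate long order_gsort = _n
* Negating var1 and using plain sort reproduces the same order
generate neg1 = -var1
sort neg1 var2
```

After the last sort, order_gsort should again run 1 to 6, confirming that the two approaches agree for this numeric example.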



    • #3
      Originally posted by Nick Cox
      Thank you so much. It finally makes sense now!
