
  • New usort command: byable sorting with [if] and [in] and locale support

    Thanks to Kit Baum, SSC now offers an alternative command to sort and gsort.

    TITLE
    'USORT': module to perform locale-based ascending and descending sort that
    supports conditional statements, observation ranges, and user-defined
    handling of substrings and missing values

    DESCRIPTION/AUTHOR(S)

    This program is a byable sort command that allows (a) custom
    first and last substrings, including system (.) and all remaining
    missing values; (b) gsort-like syntax for ascending and
    descending order; and (c) conditional [if] and range [in]
    sorting. The program is built around Stata's sort command: it
    sets the dataset's sort flag ("Sorted by") when all observations
    are selected and applies Mata's _collate() otherwise. Sorting large
    datasets can be demanding on memory or disk space.

    KW: sort
    KW: conditional
    KW: descending order

    EXAMPLES
    Code:
        Setup:
            . sysuse auto
    
        Sort observations in ascending order by price:
            . usort price
    
        Sort observations in ascending order by make in Czech, grouped by foreign, with VW models placed at
        the top:
            . bysort foreign: usort make, first(VW, pos) loc(cs_CS)
    
        Sort observations in descending order by mpg and price:
            . usort -mpg -price
    
        Sort observations in descending order by mpg and price for domestic cars only:
            . usort -mpg -price if ! foreign

  • #2
    If you find a bug in usort, please post it here.



    • #3
      Two major bugs were fixed in version 1.1.1: there was a problem with natural sorting, and by/bysort produced a descending order. Furthermore, usort now supports wildcards. You can type
      Code:
      . usort -a*
      which will sort all variables beginning with "a" in descending order.



      • #4
        Hello Ilya Bolotov. Version 1.1.1 does not seem to be available yet. I have tried updating in two different ways and still have version 1.1.0. See below.

        Code:
        . which usort
        c:\ado\plus\u\usort.ado
        *! version 1.1.0  07oct2024  I I Bolotov
        
        . adoupdate usort, update
        note: ado update updates community-contributed files; type update to check for updates to official Stata.
        
        Checking status of specified packages:
        
          [284] usort at http://fmwww.bc.edu/repec/bocode/u:
                installed package is up to date
        
        (no packages require updating)
        
        . which usort
        c:\ado\plus\u\usort.ado
        *! version 1.1.0  07oct2024  I I Bolotov
        
        . ssc install usort, replace
        checking usort consistency and verifying not already installed...
        all files already exist and are up to date.
        
        . which usort
        c:\ado\plus\u\usort.ado
        *! version 1.1.0  07oct2024  I I Bolotov
        Cheers,
        Bruce
        --
        Bruce Weaver
        Version: Stata/MP 18.5 (Windows)



        • #5
          Originally posted by Ilya Bolotov
          If you find a bug in usort, please post it here
          Perhaps not a bug, but usort does not seem to recognise variable names containing Nordic characters, e.g. “ö”, “æ” or “å”.
          Building on the help-file:
          Code:
          sysuse auto
          rename price æprice
          usort æprice
          gives:
          invalid syntax
          r(100);



          • #6
            Originally posted by Bruce Weaver
            Hello Ilya Bolotov. Version 1.1.1 does not seem to be available yet. I have tried updating two different ways, and still have version 1.1.0.
            It is already available, but it has a sorting problem: the permutation vector does not work properly outside Mata.
            I have fixed this problem in 1.1.2.



            • #7
              Originally posted by Frode Andre

              Perhaps not a bug, but usort does not seem to recognise Nordic characters, e.g. “ö”, “æ” or “å”.
              Thanks. I've never used variables with non-English names; I'll look into it in a future release.



              • #8
                Originally posted by Frode Andre

                Perhaps not a bug, but usort does not seem to recognise Nordic characters, e.g. “ö”, “æ” or “å”.
                Version 1.1.2 does not seem to produce any syntax errors under Stata 18.
                I presume the problem has been solved.



                • #9
                  Originally posted by Ilya Bolotov

                  Version 1.1.2 does not seem to produce any syntax errors under Stata 18.
                  I presume the problem has been solved.
                  Yes, I can confirm that it works now!



                  • #10
                    This command seems highly dangerous.

                    Watch:

                    Code:
                    . which usort
                    c:\ado\plus\u\usort.ado
                    *! version 1.1.2  07oct2024  I I Bolotov
                    
                    . sysuse auto , clear
                    (1978 automobile data)
                    
                    . sort mpg
                    
                    . describe , short
                    
                    Contains data from C:\Program Files\Stata18\ado\base/a/auto.dta
                     Observations:            74                  1978 automobile data
                        Variables:            12                  13 Apr 2022 17:45
                    Sorted by: mpg

                    Notice that the data is sorted on mpg.

                    Code:
                    . usort -weight
                    
                    . describe , short
                    
                    Contains data from C:\Program Files\Stata18\ado\base/a/auto.dta
                     Observations:            74                  1978 automobile data
                        Variables:            12                  13 Apr 2022 17:45
                    Sorted by: mpg
                         Note: Dataset has changed since last saved.
                    Notice that the data is still reported to be sorted on mpg; it isn't! The note at the end hints at the problem. Why is this a problem? Because

                    Code:
                    . by mpg : summarize price
                    
                    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    -> mpg = 12
                    
                        Variable |        Obs        Mean    Std. dev.       Min        Max
                    -------------+---------------------------------------------------------
                           price |          2     12545.5    1482.803      11497      13594
                    
                    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    -> mpg = 14
                    
                        Variable |        Obs        Mean    Std. dev.       Min        Max
                    -------------+---------------------------------------------------------
                           price |          1       11385           .      11385      11385
                    
                    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    -> mpg = 21
                    
                        Variable |        Obs        Mean    Std. dev.       Min        Max
                    -------------+---------------------------------------------------------
                           price |          1       15906           .      15906      15906
                    
                    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    -> mpg = 14
                    
                        Variable |        Obs        Mean    Std. dev.       Min        Max
                    -------------+---------------------------------------------------------
                           price |          1        6303           .       6303       6303
                    instead of throwing an error, the by prefix now malfunctions. Is this intended behavior?
                    Last edited by daniel klein; 11 Nov 2024, 16:37. Reason: highlight problematic behavior in red
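Because describe only reports the stored sort flag and does not inspect the data, a quick way to see whether "Sorted by: mpg" can be trusted is to test the actual row order directly. The sketch below reuses the commands from the post (it assumes usort is installed) and checks the order with official Stata commands only: assert compares each observation's mpg with the one in the row above, and capture keeps the do-file running so the result can be reported instead of stopping with an error.

```stata
* Reproduce the setup from the post (assumes usort is installed from SSC)
sysuse auto, clear
sort mpg
usort -weight

* describe shows the stored sort flag; it does not check the data
describe, short

* Check the real row order: every mpg must be >= the mpg in the row above.
* The condition _n > 1 skips the first row, where mpg[_n-1] is missing.
capture assert mpg >= mpg[_n-1] if _n > 1
if _rc {
    display as error "data are NOT in ascending order of mpg"
}
else {
    display as text "data are in ascending order of mpg"
}
```

After usort -weight the rows follow weight, so the check should fail whatever the flag says; any by mpg: results are then unreliable until the data are actually re-sorted.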



                    • #11
                      Originally posted by daniel klein
                      This command seems highly dangerous.

                      Instead of throwing an error, the by prefix now malfunctions. Is this intended behavior?
                      The command sets a sort flag for the sort order it produces, as by mpg : summarize price demonstrates.
                      It deliberately departs from Stata's standard sort order (the help file states this explicitly), so this is indeed the intended behavior.
                      Please replace by with bysort to obtain standard results.

                      usort is a niche command to produce non-standard full or partial sorting (it takes both [if] and [in]).

                      PS: Version 1.1.2 had a bug that set an incorrect sort flag; it was fixed in 1.1.3.
                      Last edited by Ilya Bolotov; 11 Nov 2024, 17:21.
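For users still on 1.1.2 who would rather not rely on the sort flag at all, one conservative pattern (an assumption on our part, not something usort's author recommends) is to sort on a key that a stale flag cannot already claim. The sketch adds a hypothetical helper variable, obs_id, that records the current row order; since the data are not flagged as sorted by mpg obs_id, sort has to reorder them, and by mpg: is then valid because mpg is the leading sort variable. Only official Stata commands are used after the usort call.

```stata
sysuse auto, clear
sort mpg
usort -weight    // in 1.1.2 this may leave a stale "Sorted by: mpg" flag

* obs_id is a hypothetical helper variable holding the current row order,
* so the requested key (mpg obs_id) differs from any flag Stata already holds
generate long obs_id = _n
sort mpg obs_id

* by mpg: is allowed because the data are now sorted by mpg obs_id
by mpg: summarize price

* the helper is no longer needed once the by-group work is done
drop obs_id
```

Ties in mpg keep the order they had after usort, because obs_id breaks them in that order.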



                      • #12
                        Originally posted by Ilya Bolotov
                        PS: Version 1.1.2 had a bug that set an incorrect sort flag; it was fixed in 1.1.3.
                        Yes, that's what my example shows. You always talk in the past tense. I downloaded the command from SSC just now, and it says

                        Code:
                        . which usort
                        c:\ado\plus\u\usort.ado
                        *! version 1.1.2  07oct2024  I I Bolotov
                        That's fine; I guess the update will be available in a couple of days. Until then, users should be careful.

