  • How does bysort recognize when a dataset has been previously (or is currently) sorted or not?

    How does bysort actually work? How does it know whether a dataset is *already* sorted by a particular variable? I searched for the source code and couldn't find it. Even running a command with bysort under set trace on doesn't show much that's meaningful for this specific question. I think whatever strategy bysort uses could be particularly useful for us as programmers. Perhaps there's a macro stored somewhere that records the current sort order?

  • #2
    Well, I don't know any more than you do about the inner workings of -bysort-. But, whether this is what -bysort- does or not, you can always determine whether the data is sorted, and, if so, how, by looking at `:sortedby'.

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    .
    . display `"`:sortedby'"'
    foreign
    
    .
    . sort mpg
    
    .
    . display `"`:sortedby'"'
    mpg
    
    . gen double shuffle = runiform()
    
    . sort shuffle
    
    . drop shuffle
    
    . display `"`:sortedby'"'
    
    
    .
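
    For use inside a program, the same extended macro function can be read into a local macro and tested. This is only a minimal sketch: the local name sortvars and the choice of foreign as the key are mine, for illustration, and it says nothing about what -bysort- does internally.

    Code:
    sysuse auto, clear
    * capture the current sort order; the macro is empty if the data are not sorted
    local sortvars : sortedby
    if "`sortvars'" == "" {
        display "data are not sorted"
    }
    else {
        display "data are sorted by: `sortvars'"
    }
    * sort only when foreign is not already the leading sort key
    if "`: word 1 of `sortvars''" != "foreign" {
        sort foreign
    }

    Run this from a do-file or inside a program so that the local macro stays in scope across the lines.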



    • #3
      In addition to Clyde’s excellent answer: conceptually, bysort is nothing more than shorthand for sort followed by a by statement. The internal details are hidden, as are those of sort, likely because the code is proprietary and runs faster compiled than it would as Stata code. As I recall from the changelogs, bysort and sort-then-by were not always precisely the same: I remember Stata changing the sort algorithm used by bysort a major version or two ago. Perhaps they are the same now; I can’t say for sure. Either way, the conceptual equivalence is a valid way to think about it, and functionally they are equivalent.
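
      To make the equivalence concrete, here is a small sketch using the auto data. The variable names mean_a and mean_b are made up for the comparison, and the example only checks that the results agree; it shows nothing about how the sorting is done internally.

      Code:
      sysuse auto, clear
      * one step: sort by foreign and compute group means
      bysort foreign: egen mean_a = mean(mpg)
      * disturb the sort order so the explicit sort below has work to do
      sort mpg
      * two steps: explicit sort, then by
      sort foreign
      by foreign: egen mean_b = mean(mpg)
      * the two approaches should give identical group means
      assert mean_a == mean_b

      If you leave out the sort foreign line, by should stop with a "not sorted" error. So by does depend on the recorded sort order.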
