This is more of a gripe than anything else, but I'm not sure why the default behaviour of the -by- prefix is to complain when the data are not sorted, which forces you to use -bysort- (or to sort the data separately beforehand). In my experience, most of the time the user does not actually want their data's sort order reshuffled; they just want to perform some task for each group. Usually the user either (i) does not care about the data's sort order, in which case this behaviour does not hurt them, or (ii) does care about the existing sort order, in which case this behaviour is an inconvenience: they have to take extra steps to undo the re-sorting that the -by- task required. In other words, this behaviour probably makes almost no one better off, leaves some people indifferent, and hurts others.
Contrast this with, for instance, the behaviour of the -by- option of the -egen- command, which does not complain about sort order. It might need to re-sort your data internally, but it does not complain if your data are not already sorted in the order it needs, and using it leaves your data's sort order unaffected. I find that much better.
Is there a strong internal programming reason why the -by- prefix works the way it does?