  • egen vs. gen: origin story

    How did Stata come to have two different commands for generating new variables -- gen and egen?
    The division between them isn't entirely logical, and the fact that there are two commands seems more like an accident of history than like something that was deliberately designed.

  • #2
    Note that unlike generate, egen is an ado, not a built-in command.
    Code:
    . which generate
    built-in command:  generate
    
    . which egen
    /Applications/Stata/ado/base/e/egen.ado
    *! version 3.4.1  05jun2013
    That hints at its inception as a community-contributed command.

    The PDF documentation for egen, linked from the top of the output of help egen, includes an Acknowledgments section that is omitted from the help file.

    Acknowledgments

    The mtr() function of egen was written by Timothy J. Schmidt formerly with the Federal Reserve Bank of Kansas City.

    The cut() function was written by David Clayton (retired) of the Cambridge Institute for Medical Research and Michael Hills (retired) of the London School of Hygiene and Tropical Medicine.

    Many of the other egen functions were written by Nicholas J. Cox of the Department of Geography at Durham University, UK, and coeditor of the Stata Journal and author of Speaking Stata Graphics.



    • #3
      In essence, all Stata commands that produce new variables are wrappers for generate. clonevar is another such wrapper, and many commands produce variables optionally. Mata is a different story.

      egen is an arbitrary ragbag of bits and pieces, or alternatively a Swiss Army knife that can be mighty useful. I've seen posts asking what is the R equivalent of egen and the answer is that there isn't one, and it would be absurd to create one in R. But people ask that, presumably, because they know Stata and find egen useful.

      Over 35 years and more Stata has developed its share of redundancy and untidiness -- consider basic tabulation commands, for another example -- but there you go. Abolishing commands, or even letting them go undocumented, is not to be undertaken lightly. I have seen suggestions that something like mean() should just be a regular function and that egen should fade away, but then the combination of egen and by: (or by()) retains appeal as well as utility.

      To add to the comments of William Lisowski on the history: egen was an official command before it received community contributions that were folded back into the official version. The peak of that was about 20 years ago, and StataCorp is not really extending egen any more. Nevertheless many, many extra egen functions are out there in the wild to be installed at will. It's notable that egen is the nearest thing users have, outside Mata, to a way of writing their own functions (rather than new commands), and that users who are programmers will find ready-made templates for new egen functions in the code of existing ones. Writing egen functions is no longer fashionable, most positively because such user-programmers will now usually prefer to reach for Mata to add functions, which also applies many times over to StataCorp developers.
      Last edited by Nick Cox; 28 Feb 2020, 02:49.



      • #4
        #3 is simpler than the full truth. For example, tabulate has a generate() option and it's C code (possibly Mata code), so I guess it produces new variables directly without using generate. And there may well be other commands similar in that respect.
