  • Possible bug in -egen- (resetting number of obs)?

    Philippe Van Kerm and I have noticed an issue with -egen- resetting the number of observations after a specific type of error (not providing an argument). Any ideas about what is happening? Is this a bug? Setup: Windows, Stata/MP 18.5 (updated to 2024-12-18). Examples below (with some counter-examples as part of the detective work).

    Code:
    . cscript
    -------------------------------------------------------------------------BEGIN
    
    . set obs 1000
    Number of observations (_N) was 0, now 1,000.
    
    . di _N   // 1000 obs
    1000
    
    . * specific mistake with egen (non-existent egen function, same as misspelt, and no argument)
    . cap noisily egen dd = something_that_leads_to_egen_error()
    unknown egen function something_that_leads_to_egen_error()
    
    . di _N    // observations are now down to zero!
    0
    
    .
    . cscript
    -------------------------------------------------------------------------BEGIN
    
    . set obs 1000
    Number of observations (_N) was 0, now 1,000.
    
    . di _N   // 1000 obs
    1000
    
    . cap noisily egen dd = mean() // no argument
    invalid syntax
    
    . di _N     // observations are now down to zero!
    0
    
    .
    . cscript
    -------------------------------------------------------------------------BEGIN
    
    . set obs 1000
    Number of observations (_N) was 0, now 1,000.
    
    . di _N   // 1000 obs
    1000
    
    . ge x = _N
    
    . su x
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
               x |      1,000        1000           0       1000       1000
    
    . cap noisily egen dd = maen(x) // misspelt egen function but with an argument
    unknown egen function maen()
    
    . di _N     // observations are NOT now down to zero!
    1000
    
    .
    . cscript
    -------------------------------------------------------------------------BEGIN
    
    . set obs 1000
    Number of observations (_N) was 0, now 1,000.
    
    . di _N   // 1000 obs
    1000
    
    . cap noisily egen dd = maen() // misspelt egen function & no argument
    unknown egen function maen()
    
    . di _N     // observations are now down to zero!
    0

  • #2
    I can reproduce the above behavior on my setup. Playing around with this a bit, I find that this only seems to happen when, despite the number of observations having been set to 1000, there is no actual data in active memory. This may explain why the example with -mean()- misspelled does not change _N: there is a variable in memory.

    Contrast the above with:
    Code:
    . cscript
    -------------------------------------------------------------------------BEGIN
    
    .
    . set obs 1000
    Number of observations (_N) was 0, now 1,000.
    
    . gen x = 1
    
    . display _N
    1000
    
    .
    . cap noisily egen dd = something_that_leads_to_egen_error()
    unknown egen function something_that_leads_to_egen_error()
    
    . display _N
    1000

    That said, what is shown in #1 is clearly not the expected or documented behavior of -egen- and represents a bug.
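
    Going by the contrast above, one possible stopgap is to make sure at least one variable is in memory before calling -egen-. This is only a minimal sketch: the variable name placeholder is hypothetical, and it relies on nothing more than the behavior shown in the log above.

    ```stata
    clear
    set obs 1000

    * hypothetical placeholder so that the dataset is never without a variable
    generate byte placeholder = 0

    * same faulty call as in #1; with a variable in memory, _N was not reset in the log above
    capture noisily egen dd = something_that_leads_to_egen_error()

    display _N
    ```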




    • #3
      The problem is with the sortpreserve option that egen uses. I've seen this before but have worked around the issue instead of reporting it. Here is a minimal example:
      Code:
      . program mysortpreserve , sortpreserve
        1.     version 18 // <- could be anything here; just need a non-empty program
        2. end
      
      . 
      . clear
      
      . set obs 1000
      Number of observations (_N) was 0, now 1,000.
      
      . display c(N)
      1000
      
      . mysortpreserve
      
      . display c(N)
      0
      
      . 
      end of do-file



      • #4
        daniel klein: Intriguing! Thanks for the diagnosis.
        Clyde Schechter: Exactly so, thanks. (In one of my experiments I had opened the auto dataset and then ran things along the same lines, and got what you did. Sorry, I forgot to report that.)



        • #5
          Actually, the "problem" is even deeper than sortpreserve. Stata drops all observations when the last variable is dropped from the dataset. That is, I suppose, what happens with sortpreserve when the temporary variable that holds the sort order is dropped and there is no other variable left in memory.

          Watch:
          Code:
          . clear
          
          . set obs 1000
          Number of observations (_N) was 0, now 1,000.
          
          . generate byte foo = 42
          
          . generate byte bar = 73
          
          . drop foo
          
          . display c(N)
          1000
          
          . drop bar
          
          . display c(N)
          0
          
          .
          end of do-file

          This behavior is more or less documented in the help for drop, in the discussion of the effects of drop _all. I agree that the behavior is unexpected here, but I'm not sure it's a bug.
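
          If that supposition is right, the sortpreserve case can be mimicked by hand with a temporary variable. The sketch below only assumes that sortpreserve creates and later drops a temporary variable in roughly this way; that is a guess at the mechanism, not the documented implementation, and the macro name order is made up for illustration.

          ```stata
          clear
          set obs 1000

          * assumption: stand-in for the temporary sort-order variable used by sortpreserve
          tempvar order
          generate long `order' = _n

          display c(N)

          * drop the only variable in memory, as in the foo/bar example above
          drop `order'

          display c(N)
          ```

          If the explanation holds, the second display should report 0 observations, just as it does after drop bar.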
