In Stata 15.1, some parts of egen apparently do not take the variable type into account, which can result in data loss and errors. For example:

Code:
clear
set obs `=2^24 + 10'
gen long x = _n
egen group = group(x)
format %21.0gc x group
l in `=_N - 10' / `=_N'

The output variable "group" does not have the requisite number of levels because floats cannot represent every integer past 2^24 exactly. The source is lines 35-37 of _ggroup.ado, which basically do

Code:
bys x: gen `type' group = (_n == 1)
replace group = sum(group)

When the user does not specify a type, `type' defaults to c(type), which in turn defaults to "float". However, the output of group() will always be integers, so the following should be more appropriate when the user does not pass a type:

Code:
bys x: gen byte group = (_n == 1)
replace group = sum(group)

Here replace upgrades the variable type to "int" or "long" as needed, and it also saves memory whenever the running sum fits in a smaller type. Another example:

Code:
clear
set obs 10
gen double x = 1e50
egen y = sum(x)
l
collapse (sum) y = x
l

In this case, collapse correctly sums x but egen gives y as all missing, presumably because y is created as a float, which cannot hold a value that large.
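Both examples seem to come down to the limits of the default float type. The following is only a quick sketch to check those limits directly; it is not taken from the egen source and uses nothing beyond the built-in float() function and the c(maxfloat) and c(maxdouble) system values.

Code:
* Precision: a float has a 24-bit significand, so 2^24 + 1 cannot be stored exactly
display %12.0f float(2^24)
display %12.0f float(2^24 + 1)
assert float(2^24 + 1) != 2^24 + 1

* Range: 1e50 is larger than the biggest float but well within a double
display c(maxfloat)
display c(maxdouble)
assert 1e50 > c(maxfloat) & 1e50 < c(maxdouble)

If both asserts pass silently, that is consistent with the explanation above: group() loses levels past 2^24 because of float precision, and the egen sum goes missing because the result is outside the float range.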