  • Datasignature instability after collapse

    Any tips on how to handle a datasignature that is always changing after a fixed set of commands?

    For example, the following collapse produces a different signature every time I run it:

    use file.dta, clear
    collapse (mean) var1, by(year)
    datasignature

    I'm having a hard time reproducing the problem with other data: if I use the auto dataset or a simulated dataset, there is no problem.

    If I use the affected dataset but first drop if _n < 337, no problem.

    With drop if _n < 338, the problem appears! There is nothing remarkable about the contents of observation 337 as far as I can see.

    I'm running Stata 17 on a Mac running Big Sur 11.6.2.

  • #2
    I cannot explain what causes the issue but you can make it stable by sorting first.

    Code:
    use file.dta, clear
    sort year var1, stable
    collapse (mean) var1, by(year)
    datasignature

    Best wishes

    (Stata 16.1 MP)
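
    One way to confirm that the fix holds is to run the same sequence twice and compare the stored signatures. This is only a sketch: the auto dataset stands in for file.dta, and rep78 and price stand in for year and var1, so it shows the checking pattern rather than reproducing the original problem.

    ```stata
    * Sketch: run sort + collapse twice and compare the two signatures.
    * Assumption: auto, rep78 and price are stand-ins for file.dta, year and var1.
    forvalues i = 1/2 {
        sysuse auto, clear
        sort rep78 price, stable
        collapse (mean) price, by(rep78)
        datasignature
        * datasignature leaves the signature string in r(datasignature)
        local sig`i' "`r(datasignature)'"
    }
    * Stops with an error if the two runs disagree
    assert "`sig1'" == "`sig2'"
    ```

    To run the same check on the real data, swap in the actual file and variable names.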

    • #3
      Thank you! Oddly, sorting (even down to the unit of observation) only stabilizes the signature when I include the stable option.

      • #4
        As I do not risk my computer by downloading files from strangers, I do not know what is going on in your dataset. One possible reason that sorting stabilizes the result, even though the result looks like it should be independent of sort order, is that var1 extends over a very wide range of values. I don't know how -collapse- goes about calculating the mean, but, at the end of the day, it must process the observations in some order.

        Computer floating-point addition is not an associative operation. Depending on the order in which the numbers are added, rounding and truncation errors along the way can differ, and can accumulate to a large error. A large error is typically only a problem when N is large and the range of var1 extends over many orders of magnitude, but a data signature will change if a result differs even in its last bit. In that situation, the most precise results can be obtained by ensuring that the smaller numbers are added together first, and the larger ones added in later. I believe this is what you will get with -sort var1- (though, as I said, I don't know -collapse-'s internals well enough to be sure of this).
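
        The order dependence is easy to see with a few doubles. The sketch below is only an illustration of the arithmetic and says nothing about how -collapse- actually sums; the scalar names big_first and small_first are made up for the example.

        ```stata
        * Doubles carry roughly 16 significant digits, so 1e16 + 1 rounds back to 1e16.
        * Large value first: each 1 is rounded away as it is added.
        scalar big_first = (1e16 + 1) + 1
        * Small values first: 1 + 1 = 2 is large enough to survive the addition.
        scalar small_first = 1e16 + (1 + 1)

        display %20.0f big_first
        display %20.0f small_first
        display small_first - big_first
        ```

        The same three numbers give two different sums, 2 apart, depending only on the order of the additions.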
