  • Repeated time values in panel for cross-sectional index via egen group

    I have a dataset with consumer-, year-, and product-specific spending. To sum over the different stores in which a given consumer purchased the same product in a given year, I collapse the data as follows:

    Code:
    collapse (sum) spending, by(consumer product year)
    I then define an index/label for each consumer-product combination which is supposed to serve as the cross-sectional index in a panel structure:

    Code:
    egen consumer_product = group(consumer product)
    xtset consumer_product year
    When I run this, I get the following error message: repeated time values within panel; r(451)

    I do not quite understand how that can possibly be given that I have collapsed the data beforehand. I interpret this error message to mean that for a given consumer-product combination there are multiple spending observations in a given year.

    Following my first collapse of the data I have roughly 270,000,000 observations. When I subsequently collapse the data via
    Code:
    collapse (sum) spending, by(consumer_product year)
    I end up with about 130,000,000 observations. But I don't see along which dimension I am summing here to end up with the lower number of observations.
    Last edited by Jon Beck; 20 Apr 2024, 18:09.

  • #2
    Following my first collapse of the data I have roughly 270,000,000 observations.
    This is the root of your problem. You have 270,000,000 distinct combinations of consumer, product, and year. That's a very large number, 9 digits long. You then ran -egen consumer_product = group(consumer product)- and probably expected each consumer#product combination to get its own distinct id. But there are too many of them: the ids run well past 16,777,216, the largest value up to which a float, the default storage type for -egen-, can hold every integer exactly. Beyond that point, some ids get their low-order digits chopped off and are no longer distinct, so different consumer#product combinations end up with the same stored id. That is why -xtset- finds repeated time values and why your second -collapse- merges observations. You need to use a different storage type for consumer_product, one that has enough bits to hold the large numbers that -egen, group()- will create here.

    Code:
    egen `c(obs_t)' consumer_product = group(consumer product)
    will do this for you. `c(obs_t)' is a built-in Stata c-class value that holds the name of a storage type large enough to count the observations in the current data set, so any id created by -egen, group()- is guaranteed to fit in it.
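
    If you want to see the precision limit for yourself and confirm the fix, here is a minimal sketch to run from a do-file. It assumes the collapsed data from the first post are in memory with the variable names used there; the -isid- check is an extra safeguard added here, not part of the original workflow.

    ```stata
    * A float stores consecutive integers exactly only up to 2^24 = 16,777,216.
    * Beyond that, neighboring integers round to the same stored value:
    display float(16777217) == float(16777216)   // displays 1 (true)

    * Storage type Stata considers sufficient to hold _N for the data in memory
    display "`c(obs_t)'"

    * Rebuild the group id with that type (drop the float version first, if present)
    capture drop consumer_product
    egen `c(obs_t)' consumer_product = group(consumer product)

    * isid exits with an error unless consumer_product and year
    * jointly identify each observation uniquely
    isid consumer_product year
    xtset consumer_product year
    ```

    If -isid- passes silently, -xtset- should no longer complain about repeated time values.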


    • #3
      Thanks so much! Works like a charm.
