Repeated time values in panel for cross-sectional index via egen group

Jon Beck

Join Date: Feb 2023

Posts: 8
#1

20 Apr 2024, 16:54

I have a dataset that includes information on consumer-, year-, and product-specific spending. Basically summing over different stores in which a particular consumer has purchased the same product in a given year, I collapse the data as follows:

Code:

collapse (sum) spending, by(consumer product year)

I then define an index/label for each consumer-product combination which is supposed to serve as the cross-sectional index in a panel structure:

Code:

egen consumer_product = group(consumer product)
xtset consumer_product year

When I run this, I get the following error message: repeated time values within panel; r(451)

I do not quite understand how that can possibly be given that I have collapsed the data beforehand. I interpret this error message to mean that for a given consumer-product combination there are multiple spending observations in a given year.

Following my first collapse of the data I have roughly 270,000,000 observations. When I subsequently collapse the data via

Code:

collapse (sum) spending, by(consumer_product year)

I end up with about 130,000,000 observations. But I don't see along which dimension I am summing here to end up with the lower number of observations.

Last edited by Jon Beck; 20 Apr 2024, 17:09.
Tags: egen, group, panel, xtset
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

20 Apr 2024, 17:50

"Following my first collapse of the data I have roughly 270,000,000 observations."

This is the root of your problem. You have 270,000,000 distinct combinations of consumer, product, and year. That's a very large number, 9 digits long. You then -egen consumer_product = group(consumer product)- and you probably expected that each consumer#product combination would get a distinct id assigned. But the problem is that there are too many of them: the ids run so high that they cannot all be stored exactly in a float, which is the default storage type for -egen-. A float represents integers exactly only up to 16,777,216 (2^24), so larger ids get their low-order digits rounded off and lose distinctiveness. Different consumer#product combinations then end up sharing the same consumer_product value, which is why -xtset- finds repeated years within a panel and why your second -collapse- merges observations. You need to use a different storage type for consumer_product, one that has enough bits to hold the large numbers that -egen, group()- will create here.

Code:

egen `c(obs_t)' consumer_product = group(consumer product)

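will do this for you. `c(obs_t)' is a c-class value that returns the smallest storage type able to hold the current number of observations; since the number of groups can never exceed the number of observations, that type is always large enough for the ids.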
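
To see the float limit concretely and to confirm that the fix worked, here is a minimal sketch. It assumes the data are still as they were after the first -collapse- and that any earlier float version of consumer_product has been dropped; the two integers in the -display- lines are just illustrative values on either side of 2^24.

Code:

* a float holds integers exactly only up to 2^24 = 16,777,216
* both lines below display 16777216, because 16777217 cannot be stored in a float
display %12.0f float(16777216)
display %12.0f float(16777217)

* rebuild the id with an adequate storage type, then verify it
egen `c(obs_t)' consumer_product = group(consumer product)
* check the storage type that was actually used
describe consumer_product
* -isid- stops with an error if consumer_product and year do not uniquely identify observations
isid consumer_product year
xtset consumer_product year

If -isid- runs without complaint, consumer_product and year jointly identify the observations, and -xtset- should no longer report repeated time values.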
Jon Beck

Join Date: Feb 2023

Posts: 8
#3

20 Apr 2024, 18:59

Thanks so much! Works like a charm.