Counting Repeated Observations for Each ID

Jenna Kerry

Join Date: Jan 2023
Posts: 44


12 Oct 2024, 14:47

Dear All,

I have a panel dataset for individual persons that looks like the following:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(ID Year Income)
1 2001 10000
1 2003 20000
1 2005 15000
2 2001  1500
2 2003   100
2 2005   200
2 2007   300
2 2009   400
3 2001  5000
3 2003   600
4 2001   700
4 2003   120
4 2005  1500
4 2007   100
4 2009   200
4 2011   300
5 2001   400
5 2003 20000
5 2005 15000
end

I would like to determine how many times each unique ID is repeated (i.e., how many years of data I have for each ID). The output I am looking for should include an additional column that shows the count of years for each ID, like so:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(ID Year Income Repeat)
1 2001 10000 3
1 2003 20000 3
1 2005 15000 3
2 2001  1500 5
2 2003   100 5
2 2005   200 5
2 2007   300 5
2 2009   400 5
3 2001  5000 2
3 2003   600 2
4 2001   700 6
4 2003   120 6
4 2005  1500 6
4 2007   100 6
4 2009   200 6
4 2011   300 6
5 2001   400 3
5 2003 20000 3
5 2005 15000 3
end

The example above shows the result I am after: the number of years of data available for each unique ID. If you know of a method, or a more efficient approach, to achieve this, please let me know.

Thanks in advance for your help!


Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

12 Oct 2024, 14:50

Code:

by ID (Year Income), sort: gen repeat = _N
Jenna Kerry

Join Date: Jan 2023

Posts: 44
#3

12 Oct 2024, 14:54

Thanks a lot! It worked!
However, what if I have many other variables besides Income for each observation? Will that change the code?
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#4

12 Oct 2024, 15:46

Well, it depends. You could do this instead:

Code:

sort ID, stable
by ID: gen repeat = _N

The variables other than ID don't really matter in terms of getting the result. The reason I listed them in the code in #2 was to leave the data sorted in a reasonable way. The code here will leave the data sorted by ID and exactly as they already are within IDs, which is also reasonable. You could do just plain old -bysort ID: gen repeat = _N-, but this could leave your data sorted in some bizarre, and irreproducible, order within IDs. Since this looks like panel data, the sort order might ultimately matter, so I'm trying to keep things in order.
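To make the comparison concrete, here is a minimal sketch that rebuilds the example data from #1 and checks that the code in #2 and the code in #4 give the same count. The variable name repeat2 and the -isid- check are additions for illustration only and were not part of the original exchange. The -isid- line assumes each ID appears at most once per Year; if that assumption fails, _N counts observations rather than distinct years.

```stata
* Rebuild the example data posted in #1
clear
input float(ID Year Income)
1 2001 10000
1 2003 20000
1 2005 15000
2 2001  1500
2 2003   100
2 2005   200
2 2007   300
2 2009   400
3 2001  5000
3 2003   600
4 2001   700
4 2003   120
4 2005  1500
4 2007   100
4 2009   200
4 2011   300
5 2001   400
5 2003 20000
5 2005 15000
end

* Assumption check: each ID-Year pair occurs only once,
* so the number of observations per ID equals the number of years
isid ID Year

* Approach from #2: sorts by ID, then by Year and Income within ID
by ID (Year Income), sort: gen repeat = _N

* Approach from #4: stable sort keeps the existing order within each ID
sort ID, stable
by ID: gen repeat2 = _N

* repeat2 is a hypothetical name used here only for the comparison
assert repeat == repeat2

* Spot-check against the desired output shown in #1
assert repeat == 3 if ID == 1
assert repeat == 6 if ID == 4
```

If every -assert- passes silently, the two approaches agree on these data, and extra variables in the dataset would not change either command, since only ID determines _N within each group.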
Jenna Kerry

Join Date: Jan 2023

Posts: 44
#5

14 Oct 2024, 11:10

Thank you very much!