Summing over a unique identifier over a panel data of county and year

Tariq Abdullah

Join Date: Apr 2021
Posts: 366

Summing over a unique identifier over a panel data of county and year

23 Sep 2023, 04:42

Hi,

I want to sum the unique identifier in my data (id) over the panel variables ( county and year ).

However, there are some repeated observations of my unique identifier ( id) which I have put it in bold form.

After getting rid of this repeated observation , I can go ahead with summing over the panel variable. I've shown my thought out steps of doing so.

Do you think this is the right approach to address the issue or my intended goal ?

My approach of doing is going to be ( I have not done it yet)

Code:

bysort county year id : gen wanted = _n == 1 
by county year : replace wanted = sum(wanted) 
by county year : replace  wanted = wanted[_N]

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double id float(county sd3_date year)
20010000001  6001 14943 2000
20010000001 17031 14943 2000
20010000003 46135 14945 2000
20010000004 48303 14943 2000
20010000005 34021 14945 2000
20010000005  6059 14945 2000
20010000005 34023 14945 2000
20010000005 34021 14945 2000
20010000005 34021 14945 2000
20010000009 55139 14943 2000
20010000012 41051 14943 2000
20010000018  6085 14945 2000
20010000018  6085 14945 2000
20010000019 42045 14943 2000
20010000019 34025 14943 2000
20010000020  6085 14943 2000
20010000020  6085 14943 2000
20010000020  6081 14943 2000
20010000020  6085 14943 2000
20010000020  6085 14943 2000
20010000020  6001 14943 2000
20010000025  6041 14945 2000
20010000025  6041 14945 2000
20010000025  6085 14945 2000
20010000025  6085 14945 2000
20010000025  6085 14945 2000
end
format %td sd3_date

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

23 Sep 2023, 09:38

I'm not sure I understand what you want to calculate. Here's what the code you show will do. It will calculate, for each country and year, the total number of distinct id's that occur in that country in that year. If that's what you're looking for, your approach is fine. Another, slightly shorter, way to do it is:
[/code]
by country year (id), sort: gen wanted = sum(id != id[_n-1])
by country year (id): replace wanted = wanted[_N]
[/code]

But if that isn't what you' want to do, then please post back with a clearer explanation, or, perhaps better, an example with hand-calculated results to show what is wanted.
2 likes
Comment
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#3

24 Sep 2023, 15:31

Originally posted by Clyde Schechter View Post

I'm not sure I understand what you want to calculate. Here's what the code you show will do. It will calculate, for each country and year, the total number of distinct id's that occur in that country in that year. If that's what you're looking for, your approach is fine. Another, slightly shorter, way to do it is:
[/code]
by country year (id), sort: gen wanted = sum(id != id[_n-1])
by country year (id): replace wanted = wanted[_N]
[/code]

But if that isn't what you' want to do, then please post back with a clearer explanation, or, perhaps better, an example with hand-calculated results to show what is wanted.

I forgot to thank you for giving me the perfect code to deal with the issue. I just became busy with dealing with my quirky dataset. Thanks Mr. Schechter for taking your time again to address the issue! Grateful for your time.
Comment

Announcement

Summing over a unique identifier over a panel data of county and year

Comment

Comment