Identifying first observation in panel data

Abdan Syakura

Join Date: Nov 2018
Posts: 58

Identifying first observation in panel data

04 Aug 2020, 12:46

Hi All,

Could you please help how I can identify the first observation for a variable in a panel data? I use the following code:

Code:

by country (quarter), sort: gen byte first = TOT != .
by country (quarter), sort: gen byte firstobs = sum(first) == 1
list country quarter if firstobs==1

It somehow works, but for countries that have no observation at all, it will not be included in the list. How can I include these countries, probably with label N/A. Is there a better way to find the first observation, if I have several variables? Also, is there are a way to identify the last observation? Thank you.

Best,

Abdan

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long country float quarter str6 date str18 Country int CountryCode double TOT byte(first firstobs)
1 116 "1989Q1" "Albania" 914 . 0 0
1 117 "1989Q2" "Albania" 914 . 0 0
1 118 "1989Q3" "Albania" 914 . 0 0
1 119 "1989Q4" "Albania" 914 . 0 0
1 120 "1990Q1" "Albania" 914 . 0 0
end
format %tq quarter
label values country country
label def country 1 "Albania", modify

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

04 Aug 2020, 13:03

but for countries that have no observation at all, it will not be included in the list. How can I include these countries

I think you're asking the impossible. If a country has no observations in your data set, how would Stata possibly know that it even exists. Stata has no built-in encyclopedia that would enable it to conjure up a comprehensive list of countries, nor animals, nor businesses, nor anything else. If it isn't in your data set, or created by your code, to Stata it doesn't exist at all.

The only thing I can suggest is to find a data set listing all the countries of the world in it (there are probably many such on the internet somewhere), and then -merge- that data set with your own data. That will create an observation for every country that didn't already have one. But be careful in doing this: it would be best to use a data set that uses the same country code scheme that yours uses and -merge- on the code. Merging on country names would be hazardous because, for example, "South Korea" would not match up with "S. Korea" or "Republic of Korea," and different data sets tend to be idiosyncratic in how the pick, and abbreviate, country names.
Comment
Abdan Syakura

Join Date: Nov 2018

Posts: 58
#3

04 Aug 2020, 13:51

Thank you Clyde. I meant, in my dataset, I already have all the countries name. However for some countries, their values are ".". By using my code earlier, they will not be included in the "list" command. I hope these countries will still be included in the "list" command, but with "N/A" assigned. Actually I can do it manually using vlookup in excel, so it is not very urgent. Thanks for your time.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#4

04 Aug 2020, 14:25

Code:

bys country (quarter): gen tag=_n==1 list if tag, sep(0)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

04 Aug 2020, 14:26

Your example data contains only one country, and in that country TOT is always missing. I have expanded your example with made-up data for "Other Country" which does have some non-missing values of TOT.

Code:

isid CountryCode quarter, sort by CountryCode (quarter): egen first_qtr = min(cond(!missing(TOT)), quarter, .) gen wanted = 1 if first_qtr == quarter by CountryCode (quarter): replace wanted = 0 if missing(first_qtr) & _n == _N label define wanted 0 "N/A" 1 "1" label values wanted wanted list if !missing(wanted)

Unless you are just playing around for fun, you should not do any data analysis manually, nor should you do any in Excel. For serious work, work that you would want other people to find credible you need a complete audit trail of everything you have done. Everything you do needs to be documented in a do-file (or equivalent code from other programs). Neither manual data manipulation nor Excel provide that. Excel is adequate for sharing data sets with people who don't have access to a real statistics package, but it is inappropriate to use it for actual data analysis. As for manual data manipulation, it should only be used for exploration of data or for fun, never for real work.

Added: crossed with #4, where a solution is proposed that, I think, does not answer the question posed. The solution in #4 simply tags the first observation for every country--it does not single out the observation with the earliest non-missing value of TOT when such exists.

Last edited by Clyde Schechter; 04 Aug 2020, 14:28.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#6

05 Aug 2020, 10:53

Added: crossed with #4, where a solution is proposed that, I think, does not answer the question posed. The solution in #4 simply tags the first observation for every country--it does not single out the observation with the earliest non-missing value of TOT when such exists.

Clyde Schechter, correct, I missed the conditional requirement.
Comment

Announcement

Identifying first observation in panel data

Comment

Comment

Comment

Comment

Comment