Keep observations for consecutive years

Nicolo Serpella

Join Date: Oct 2017

Posts: 20
#1

14 Oct 2017, 03:21

Dear Statalist,

I'm new here.
I have a very large panel. The main problem is that there is no common start and end date: for example, one panel unit in the dataset may be observed from 1978 to 2000 and another from 1985 to 2010. My goal is to keep only those units that have every year reported between their own start and end dates, whatever those two dates are. I can write the code when the start and end dates are the same for everyone, but in this case I cannot figure out the solution.

Any thoughts?

Thank you!
Nicolò
Tim Umbach

Join Date: Jun 2017

Posts: 47
#2

14 Oct 2017, 04:40

Hi Nicolo,

If the rest of your data set is complete (i.e. there are no missing data between the start and end date for each variable), the solution is quite simple:

Code:

drop if missing(var1) | missing(var2) | missing(var3)

This drops an observation unless all three variables have non-missing values. If you want to keep some observations with missing values, however, it becomes a bit more complicated and is probably easier done by hand. In your example, it would look something like this:

Code:

drop if year<1985 | year>2000


I hope I could help,

Tim

Last edited by Tim Umbach; 14 Oct 2017, 04:43.
Nicolo Serpella

Join Date: Oct 2017

Posts: 20
#3

14 Oct 2017, 05:45

Hi Tim,

thank you very much for the advice.
However, what I meant is different; I did not express myself clearly, sorry.
The dataset consists of workers, the date they entered the job market (the first date reported), and then wage information until 2012. Obviously workers enter at different points in their lives, and some may have retired, so every worker has a different date of entry into and a different date of "exit" from the job market.
Sometimes, though, the dates are not reported correctly: I don't have missing values, but simply a jump in the year. Something like this, for example:

Year  Worker_ID
1990  1
1991  1
1993  1
1995  1
1996  1

No value is missing, but some years are simply not reported. The dates differ from worker to worker. So I need to keep only the workers that have every year reported between, say, 1990 and 1996, taking into account that the first and last years differ for almost every worker.
Maybe this is exactly what you suggested; I'm a new Stata user!

Thank you very much!
Nicolò
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

14 Oct 2017, 11:07

Welcome to Statalist, Nicolò.

This turns out to be easy if you see the shortcut - if a worker has no missing years between the earliest and the last, the worker will have as many observations as the difference between the two years, plus 1. (This does assume that no worker has two or more observations for the same year.) Then you need to know that the automatic variable _N is the number of observations in the dataset, or in the by group when using the by prefix. Below I show code on your example, and a second example where there is no missing year.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id int year
1 1990
1 1991
1 1993
1 1995
1 1996
2 1991
2 1992
2 1993
2 1994
2 1995
2 1996
2 1997
end
by id (year), sort: drop if year[_N]-year[1]+1 != _N
list, clean

Code:

. list, clean

       id   year
  1.    2   1991
  2.    2   1992
  3.    2   1993
  4.    2   1994
  5.    2   1995
  6.    2   1996
  7.    2   1997
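If you would rather inspect the incomplete workers before dropping anything, here is a minimal sketch of a non-destructive variant on the same example data. It first checks the assumption stated above (no worker has two observations for the same year) with isid, then stores the same comparison in an indicator instead of dropping. The variable names complete and tagged are my own and purely illustrative.

```stata
* Rebuild the two-worker example used above
clear
input byte id int year
1 1990
1 1991
1 1993
1 1995
1 1996
2 1991
2 1992
2 1993
2 1994
2 1995
2 1996
2 1997
end

* Exit with an error if any id-year combination occurs more than once
isid id year

* complete = 1 if the worker's years run from first to last without a gap
* (complete is a hypothetical variable name)
bysort id (year): generate byte complete = (year[_N] - year[1] + 1 == _N)

* Show one line per worker so the flags can be checked by eye
egen byte tagged = tag(id)
list id complete if tagged, clean noobs
```

Once the flags look right, keep if complete should leave the same observations as the drop command above.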

Last edited by William Lisowski; 14 Oct 2017, 11:10.
Nicolo Serpella

Join Date: Oct 2017

Posts: 20
#5

17 Oct 2017, 15:45

Hi William,

thank you very much, this is clear and it worked perfectly.

Regards,
Nicolò
