Conditional Merging Datasets

carl_pch

Join Date: Dec 2014

Posts: 2
#1

Conditional Merging Datasets

03 Dec 2014, 21:31

Dear all,

I am trying to merge two dataset based on a condition (described below) and was wondering whether this would be possible.

First, my dataset "year-state" has two variables, [YEAR] and [STATE]. Therefore a sample of the data might look like:

YEAR STATE (in numeric code)
1943 01
1950 01
1975 01
1944 09
1950 09
1952 22

Second, my dataset "stepdown" has two variables, [STATE], [STEPDOWNYEAR], which includes every instance of leadership change of a country. A sample of data might look like:

STATE STEPDOWNYEAR
01 1942
01 1950
01 1979
01 1983
01 1987
09 1942
09 1957

(so a country can have several instances of leadership change)

For every YEAR-STATE combination in my first dataset, I would like to add a closest STEPDOWNYEAR that happens after the YEAR variable. So, For [STATE 01 in 1943], although possible STEPDOWNYEAR are 1942, 1950, 1979, 1983, 1987, I would only like to keep "1950," which satisfies the requirement.

Is there any suggestions on how to do it?
Tags: None

Svend Juul

Join Date: Apr 2014
Posts: 515

04 Dec 2014, 03:30

This is a job for joinby. You might be tempted to use merge m:m, but NEVER do that.

Code:

. clear
. input  state stepdownyear

         state  stepdow~r
  1.  01 1942
  2.  01 1950
  3.  01 1979
  4.  01 1983
  5.  01 1987
  6.  09 1942
  7.  09 1957
  8. end

. save stepdown.dta , replace
. clear
. input year state

          year      state
  1.  1943 01
  2.  1950 01
  3.  1975 01
  4.  1944 09
  5.  1950 09
  6.  1952 22
  7. end

. joinby state using stepdown.dta
. list, clean
       year   state   stepdo~r 
  1.   1943       1       1987 
  2.   1943       1       1979 
  3.   1943       1       1942 
  4.   1943       1       1983 
  5.   1943       1       1950 
  6.   1950       1       1979 
  7.   1950       1       1987 
  8.   1950       1       1950 
  9.   1950       1       1942 
 10.   1950       1       1983 
 11.   1975       1       1950 
 12.   1975       1       1979 
 13.   1975       1       1942 
 14.   1975       1       1983 
 15.   1975       1       1987 
 16.   1944       9       1942 
 17.   1944       9       1957 
 18.   1950       9       1957 
 19.   1950       9       1942 

. drop if stepdown<year
(6 observations deleted)

. bysort state year (stepdown): keep if _n==1
(8 observations deleted)

. list, clean
       year   state   stepdo~r 
  1.   1943       1       1950 
  2.   1950       1       1950 
  3.   1975       1       1979 
  4.   1944       9       1957 
  5.   1950       9       1957

Comment

carl_pch

Join Date: Dec 2014

Posts: 2
#3

04 Dec 2014, 05:30

Thanks a lot! I was indeed trying m:m before posting it here and never succeeded
This is also a clever use of _n. I was thinking about using if condition but _n is simpler and more intuitive.

Last edited by carl_pch; 04 Dec 2014, 05:32.
Comment

Announcement

Conditional Merging Datasets

Comment

Comment