Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Conditional Merging Datasets

    Dear all,

    I am trying to merge two dataset based on a condition (described below) and was wondering whether this would be possible.

    First, my dataset "year-state" has two variables, [YEAR] and [STATE]. Therefore a sample of the data might look like:

    YEAR STATE (in numeric code)
    1943 01
    1950 01
    1975 01
    1944 09
    1950 09
    1952 22

    Second, my dataset "stepdown" has two variables, [STATE], [STEPDOWNYEAR], which includes every instance of leadership change of a country. A sample of data might look like:

    STATE STEPDOWNYEAR
    01 1942
    01 1950
    01 1979
    01 1983
    01 1987
    09 1942
    09 1957

    (so a country can have several instances of leadership change)

    For every YEAR-STATE combination in my first dataset, I would like to add a closest STEPDOWNYEAR that happens after the YEAR variable. So, For [STATE 01 in 1943], although possible STEPDOWNYEAR are 1942, 1950, 1979, 1983, 1987, I would only like to keep "1950," which satisfies the requirement.

    Is there any suggestions on how to do it?

  • #2
    This is a job for joinby. You might be tempted to use merge m:m, but NEVER do that.

    Code:
    . clear
    . input  state stepdownyear
    
             state  stepdow~r
      1.  01 1942
      2.  01 1950
      3.  01 1979
      4.  01 1983
      5.  01 1987
      6.  09 1942
      7.  09 1957
      8. end
    
    . save stepdown.dta , replace
    . clear
    . input year state
    
              year      state
      1.  1943 01
      2.  1950 01
      3.  1975 01
      4.  1944 09
      5.  1950 09
      6.  1952 22
      7. end
    
    . joinby state using stepdown.dta
    . list, clean
           year   state   stepdo~r 
      1.   1943       1       1987 
      2.   1943       1       1979 
      3.   1943       1       1942 
      4.   1943       1       1983 
      5.   1943       1       1950 
      6.   1950       1       1979 
      7.   1950       1       1987 
      8.   1950       1       1950 
      9.   1950       1       1942 
     10.   1950       1       1983 
     11.   1975       1       1950 
     12.   1975       1       1979 
     13.   1975       1       1942 
     14.   1975       1       1983 
     15.   1975       1       1987 
     16.   1944       9       1942 
     17.   1944       9       1957 
     18.   1950       9       1957 
     19.   1950       9       1942 
    
    . drop if stepdown<year
    (6 observations deleted)
    
    . bysort state year (stepdown): keep if _n==1
    (8 observations deleted)
    
    . list, clean
           year   state   stepdo~r 
      1.   1943       1       1950 
      2.   1950       1       1950 
      3.   1975       1       1979 
      4.   1944       9       1957 
      5.   1950       9       1957

    Comment


    • #3
      Thanks a lot! I was indeed trying m:m before posting it here and never succeeded
      This is also a clever use of _n. I was thinking about using if condition but _n is simpler and more intuitive.
      Last edited by carl_pch; 04 Dec 2014, 05:32.

      Comment

      Working...
      X