Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • rangejoin reversed?

    Hi all,

    Working with two different datasets, and wondering if it is possible to perform rangejoin when the date range is in the using_dataset? Minimal example as follows,

    Code:
    ssc install dataex
    
    clear
    input byte id int(StartYear EndYear EventType) 
    1 1969    1971    1  
    1 1973    1980    1
    2 1950    1970    1
    3 1955    1970    1
    4 1982    1992    1
    end
    save "events.dta", replace
    
    clear
    input byte id int Year 
    1 1970  
    1 1971  
    1 1972
    1 1973 
    3 1970  
    3 1971
    end
    
    ssc install rangejoin
    ssc install rangestat
    rangejoin Year StartYear EndYear using "events.dta", by(id)
    This unfortunately results in the error: was expecting a numeric variable, a number, or a system missing value for the interval low: StartYear.

    Desired output would actually be as follows, in which EventType is a dummy variable indicating whether an event occurred within a given year:

    Code:
         
    
    . list, sepby(id)
    
         +----------------------+
         | id   Year  EventType |
         |----------------------|
      1. |  1   1970          1 |
      2. |  1   1971          1 |
      3. |  1   1972          . |
      4. |  1   1973          1 |
         |----------------------|
      5. |  3   1970          1 |
      6. |  3   1971          . |
         +----------------------+
    I also experimented with joinby but encountered additional errors. Any thoughts or guidance would be greatly appreciated!

  • #2
    Code:
    clear
    input byte id int(StartYear EndYear EventType)
    1 1969    1971    1 
    1 1973    1980    1
    2 1950    1970    1
    3 1955    1970    1
    4 1982    1992    1
    end
    tempfile events
    save `events', replace
    
    clear
    input byte id int Year
    1 1970 
    1 1971 
    1 1972
    1 1973
    3 1970 
    3 1971
    end
    
    
    joinby id using `events', unmatched(master)
    by id Year, sort: egen wanted = max(inrange(Year, StartYear, EndYear) & EventType == 1)
    by id Year, sort: keep if _n == 1
    drop StartYear EndYear EventType _merge
    
    list, sepby(id)
    does what you ask. It creates a 1/0 variable rather than a 1/. variable--but that will work better for almost anything you want to do with this in Stata anyway. 1/. variables are a recipe for trouble in Stata.

    Comment


    • #3
      Hi Clyde,

      As always, thank you for your assistance!

      I can now also see why my attempts at using joinby were unsuccessful -- I had made a mistake and used the following,

      Code:
      gen wanted = inrange(Year, StartYear, EndYear)
      Although admittedly confused on the intuition of how the
      Code:
      max(inrange(Year, StartYear, EndYear) & EventType == 1)
      works in practice.

      Nevertheless, thank you!

      Comment


      • #4
        Although admittedly confused on the intuition of how the
        Code:
        max(inrange(Year, StartYear, EndYear) & EventType == 1)
        works in practice.


        So, the -joinby- command pairs up each observation in the id-Year data set with every observation in the events data set that matches on the ID. In general, there will be several matches for each of the original ID-Year observations. Some of those observations will have Year in the range between StartYear and EndYear, and some will not. Also, there may be some of the ID-Year observations from the original data that don't have any match in the Event date. Finally, even when there is a match in the Event data, I assume that some of them have an EventType other than 1.

        The expression -inrange(Year, StartYear, EndYear) & EventType == 1- is a logical expression whose meaning is self-explanatory. The key to understanding the code is that in Stata (and many other programming languages) logical expressions have a numerical value. In Stata these expressions are numerically evaluated as 1 if true and 0 if false. Now, if we look at all of the Event data observations that matched with a given observation from the original ID-Year data set, we want to know if any of them have Year falling in the StartYear-EndYear range and also have EventType == 1. Well, for any such observation the logical expression evaluates to 1, and for all the others it evaluates to 0. If we now look at the largest such value, it will be 1 if there is any matched observation that satisfies our target condition. If there is no such observation it will be 0.

        Comment


        • #5
          Hi Clyde,

          As always, thank for your time! Very helpful!

          Comment

          Working...
          X