Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Define a dummy with overlapping periods

    Dear All, I have this dataset
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(id start end)
    1101 2004 2009
    1101 2018 2023
    1102 2008 2013
    1102 2011 2016
    1102 2011 2016
    1102 2013 2018
    1102 2018 2023
    end
    For each `id', I'd like to define a dummy variable, say `d', which is 1 if the year is between (including) `start' and `end'. For instance, for id=1101 (no overlapping years), d=1 if the year is 2004,2005,...,2009, 2018,2019,...,2013. In contrast, for id=1102, there are some overlapping years. It can be seen that, d=1 for years 2008,2009,...,2023. Any suggestions. Thanks.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    River Huang, I guess you forgot to include a year variable in your example. You could create the full list of id-year combinations and merge back.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(id start end) float year
    1101 2004 2009 2013
    1101 2018 2023 2019
    1102 2008 2013 2006
    1102 2011 2016 2015
    1102 2011 2016 2024
    1102 2013 2018 2013
    1102 2018 2023 2008
    end
    
    preserve
    gen duration= end- start+1
    expand duration, g(new)
    bys id end (new): replace start= start[_n-1]+1 if _n>1
    keep id start
    rename start year
    contract id year
    drop _freq
    tempfile years
    save `years'
    restore 
    merge m:1 id year using `years', keep(master match) 
    gen wanted=_merge==3
    drop _merge
    Res.:

    Code:
    . l, sep(10)
    
         +-------------------------------------+
         |   id   start    end   year   wanted |
         |-------------------------------------|
      1. | 1101    2004   2009   2013        0 |
      2. | 1101    2018   2023   2019        1 |
      3. | 1102    2008   2013   2006        0 |
      4. | 1102    2018   2023   2008        1 |
      5. | 1102    2013   2018   2013        1 |
      6. | 1102    2011   2016   2015        1 |
      7. | 1102    2011   2016   2024        0 |
         +-------------------------------------+

    Comment


    • #3
      Dear Andrew, Thanks. I guess I didn't make myself clear. Suppose a similar (simpler) dataset is
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(id start end)
      1101 2005 2007
      1101 2010 2013
      1102 2008 2012
      1102 2011 2014
      end
      What I wanted is something like
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int id float(year d)
      1101 2005 1
      1101 2006 1
      1101 2007 1
      1101 2008 0
      1101 2009 0
      1101 2010 1
      1101 2011 1
      1101 2012 1
      1101 2013 1
      1102 2008 1
      1102 2009 1
      1102 2010 1
      1102 2011 1
      1102 2012 1
      1102 2013 1
      1102 2014 1
      end
      Do you have any further suggestions? Thanks again.
      Last edited by River Huang; 20 Mar 2020, 19:06.
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)

      Comment


      • #4
        I find this solution to my question.
        Code:
        expand end-start+1
        bys id start: gen year = start+_n-1
        duplicates drop id year, force
        gen d = 1
        Ho-Chuan (River) Huang
        Stata 17.0, MP(4)

        Comment

        Working...
        X