Define a dummy with overlapping periods

River Huang

Join Date: Mar 2016

Posts: 1908
#1

Define a dummy with overlapping periods

19 Mar 2020, 21:31

Dear All, I have this dataset

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int(id start end) 1101 2004 2009 1101 2018 2023 1102 2008 2013 1102 2011 2016 1102 2011 2016 1102 2013 2018 1102 2018 2023 end

For each `id', I'd like to define a dummy variable, say `d', which is 1 if the year is between (including) `start' and `end'. For instance, for id=1101 (no overlapping years), d=1 if the year is 2004,2005,...,2009, 2018,2019,...,2013. In contrast, for id=1102, there are some overlapping years. It can be seen that, d=1 for years 2008,2009,...,2023. Any suggestions. Thanks.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10195

20 Mar 2020, 03:46

River Huang, I guess you forgot to include a year variable in your example. You could create the full list of id-year combinations and merge back.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id start end) float year
1101 2004 2009 2013
1101 2018 2023 2019
1102 2008 2013 2006
1102 2011 2016 2015
1102 2011 2016 2024
1102 2013 2018 2013
1102 2018 2023 2008
end

preserve
gen duration= end- start+1
expand duration, g(new)
bys id end (new): replace start= start[_n-1]+1 if _n>1
keep id start
rename start year
contract id year
drop _freq
tempfile years
save `years'
restore 
merge m:1 id year using `years', keep(master match) 
gen wanted=_merge==3
drop _merge

Res.:

Code:

. l, sep(10)

     +-------------------------------------+
     |   id   start    end   year   wanted |
     |-------------------------------------|
  1. | 1101    2004   2009   2013        0 |
  2. | 1101    2018   2023   2019        1 |
  3. | 1102    2008   2013   2006        0 |
  4. | 1102    2018   2023   2008        1 |
  5. | 1102    2013   2018   2013        1 |
  6. | 1102    2011   2016   2015        1 |
  7. | 1102    2011   2016   2024        0 |
     +-------------------------------------+

Comment

River Huang

Join Date: Mar 2016
Posts: 1908

20 Mar 2020, 18:30

Dear Andrew, Thanks. I guess I didn't make myself clear. Suppose a similar (simpler) dataset is

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id start end)
1101 2005 2007
1101 2010 2013
1102 2008 2012
1102 2011 2014
end

What I wanted is something like

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float(year d)
1101 2005 1
1101 2006 1
1101 2007 1
1101 2008 0
1101 2009 0
1101 2010 1
1101 2011 1
1101 2012 1
1101 2013 1
1102 2008 1
1102 2009 1
1102 2010 1
1102 2011 1
1102 2012 1
1102 2013 1
1102 2014 1
end

Do you have any further suggestions? Thanks again.

Last edited by River Huang; 20 Mar 2020, 19:06.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Comment

River Huang

Join Date: Mar 2016

Posts: 1908
#4

20 Mar 2020, 19:07

I find this solution to my question.

Code:

expand end-start+1 bys id start: gen year = start+_n-1 duplicates drop id year, force gen d = 1

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment

Announcement