Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting number of days per year from a time interval

    Dear list

    I've got two variables that records the start time and stop time of an event. I would like to extract the number of days per year in a separate variable for each year. I can think of some ways to do this, but all involves a somewhat long-winded code. Do you have any ideas on how to accomplish this without lengthy and error prone coding?

    Typical data:

    Code:
    input id start stop  
    1   18536   19418  
    2   18536   19418  
    3   18809   19418  
    4   18567   19267  
    5   18567   19267  
    6   18506   19053  
    7   18232   18840  
    8   17806   18506
    
    format start stop %td
    end
    Any suggestions would be very welcome.

    Thomas

  • #2
    The more difficult part of this sounds like what is below. The rest would require a reshape wide, but as that structure may make life more difficult in many ways, I stop here.

    Code:
     
    . clear 
    
    . input id start stop  
    
                id      start       stop
      1. 1   18536   19418  
      2. 2   18536   19418  
      3. 3   18809   19418  
      4. 4   18567   19267  
      5. 5   18567   19267  
      6. 6   18506   19053  
      7. 7   18232   18840  
      8. 8   17806   18506
      9. end
    
    . format start stop %td
    
    . gen y1 = year(start) 
    
    . gen y2 = year(stop) 
    
    . gen y = y2 - y1 + 1 
    
    . expand y 
    (18 observations created)
    
    . bysort id : gen days = mdy(12, 31, y1) - start + 1 if _n == 1
    (18 missing values generated)
    
    . by id: replace days = stop - mdy(1, 1, y2) if _n == _N
    (8 real changes made)
    
    . by id : replace y1 = y1 + _n - 1
    (18 real changes made)
    
    . replace days = mdy(12, 31, y1) - mdy(1, 1, y1) + 1 if missing(days)
    (10 real changes made)
    
    . l, sepby(id) 
    
         +-----------------------------------------------------+
         | id       start        stop     y1     y2   y   days |
         |-----------------------------------------------------|
      1. |  1   01oct2010   01mar2013   2010   2013   4     92 |
      2. |  1   01oct2010   01mar2013   2011   2013   4    365 |
      3. |  1   01oct2010   01mar2013   2012   2013   4    366 |
      4. |  1   01oct2010   01mar2013   2013   2013   4     59 |
         |-----------------------------------------------------|
      5. |  2   01oct2010   01mar2013   2010   2013   4     92 |
      6. |  2   01oct2010   01mar2013   2011   2013   4    365 |
      7. |  2   01oct2010   01mar2013   2012   2013   4    366 |
      8. |  2   01oct2010   01mar2013   2013   2013   4     59 |
         |-----------------------------------------------------|
      9. |  3   01jul2011   01mar2013   2011   2013   3    184 |
     10. |  3   01jul2011   01mar2013   2012   2013   3    366 |
     11. |  3   01jul2011   01mar2013   2013   2013   3     59 |
         |-----------------------------------------------------|
     12. |  4   01nov2010   01oct2012   2010   2012   3     61 |
     13. |  4   01nov2010   01oct2012   2011   2012   3    365 |
     14. |  4   01nov2010   01oct2012   2012   2012   3    274 |
         |-----------------------------------------------------|
     15. |  5   01nov2010   01oct2012   2010   2012   3     61 |
     16. |  5   01nov2010   01oct2012   2011   2012   3    365 |
     17. |  5   01nov2010   01oct2012   2012   2012   3    274 |
         |-----------------------------------------------------|
     18. |  6   01sep2010   01mar2012   2010   2012   3    122 |
     19. |  6   01sep2010   01mar2012   2011   2012   3    365 |
     20. |  6   01sep2010   01mar2012   2012   2012   3     60 |
         |-----------------------------------------------------|
     21. |  7   01dec2009   01aug2011   2009   2011   3     31 |
     22. |  7   01dec2009   01aug2011   2010   2011   3    365 |
     23. |  7   01dec2009   01aug2011   2011   2011   3    212 |
         |-----------------------------------------------------|
     24. |  8   01oct2008   01sep2010   2008   2010   3     92 |
     25. |  8   01oct2008   01sep2010   2009   2010   3    365 |
     26. |  8   01oct2008   01sep2010   2010   2010   3    243 |
         +-----------------------------------------------------+

    Comment


    • #3
      Thanks Nick! This solves it for me. Had to make one small addition to your code:

      Code:
      gen y1 = year(start)
      gen y2 = year(stop)
      gen y = y2 - y1 + 1
      expand y
      bysort id : gen days = mdy(12, 31, y1) - start + 1 if _n == 1
      by id: replace days = stop - mdy(1, 1, y2) if _n == _N
      by id: replace days=stop-start + 1 if y1==y2
      by id : replace y1 = y1 + _n - 1
      replace days = mdy(12, 31, y1) - mdy(1, 1, y1) + 1 if missing(days)

      Comment


      • #4
        Good catch.

        Here is a self-contained script with more careful code.

        Code:
        . input id start stop  
        
                    id      start       stop
          1. 1   18536   19418  
          2. 2   18536   19418  
          3. 3   18809   19418  
          4. 4   18567   19267  
          5. 5   18567   19267  
          6. 6   18506   19053  
          7. 7   18232   18840  
          8. 8   17806   18506
          9. 9   17806   17807 
         10. end
        
        . gen work = year(stop) - year(start) + 1
        
        . expand work 
        (18 observations created)
        
        . drop work 
        
        . bysort id : gen year = year(start[1]) +  _n - 1
        
        . bysort id : gen days = min(stop, mdy(12, 31, year)) - start + 1 if _n == 1
        (18 missing values generated)
        
        . by id: replace days = stop - mdy(1, 1, year) if _n == _N & missing(days) 
        (8 real changes made)
        
        . replace days = mdy(12, 31, year) - mdy(1, 1, year) + 1 if missing(days)
        (10 real changes made)
        
        . list, sepby(id)  
        
             +----------------------------------+
             | id   start    stop   year   days |
             |----------------------------------|
          1. |  1   18536   19418   2010     92 |
          2. |  1   18536   19418   2011    365 |
          3. |  1   18536   19418   2012    366 |
          4. |  1   18536   19418   2013     59 |
             |----------------------------------|
          5. |  2   18536   19418   2010     92 |
          6. |  2   18536   19418   2011    365 |
          7. |  2   18536   19418   2012    366 |
          8. |  2   18536   19418   2013     59 |
             |----------------------------------|
          9. |  3   18809   19418   2011    184 |
         10. |  3   18809   19418   2012    366 |
         11. |  3   18809   19418   2013     59 |
             |----------------------------------|
         12. |  4   18567   19267   2010     61 |
         13. |  4   18567   19267   2011    365 |
         14. |  4   18567   19267   2012    274 |
             |----------------------------------|
         15. |  5   18567   19267   2010     61 |
         16. |  5   18567   19267   2011    365 |
         17. |  5   18567   19267   2012    274 |
             |----------------------------------|
         18. |  6   18506   19053   2010    122 |
         19. |  6   18506   19053   2011    365 |
         20. |  6   18506   19053   2012     60 |
             |----------------------------------|
         21. |  7   18232   18840   2009     31 |
         22. |  7   18232   18840   2010    365 |
         23. |  7   18232   18840   2011    212 |
             |----------------------------------|
         24. |  8   17806   18506   2008     92 |
         25. |  8   17806   18506   2009    365 |
         26. |  8   17806   18506   2010    243 |
             |----------------------------------|
         27. |  9   17806   17807   2008      2 |
             +----------------------------------+
        Last edited by Nick Cox; 29 May 2015, 03:31.

        Comment


        • #5
          As it turns out, I have never had to do this in Stata. But I have some other work in C++ that needs to do this and my code there is infinitely more convoluted than the above. Thanks, Nick and Thomas, for the neat, clean solution--which I will port to my other application.

          Comment


          • #6
            The real white magic here is by:. To paraphrase Hofstadter's Law, its scope for simplifying group calculations is amazing, even when you know that it is amazing.

            The other thing that can bite here is coping with leap years. It's axiomatic that you should make Stata do all the work here. The hidden trick for coping with leap years is that Stata will count 365 or 366 correctly for the number of days in a year using

            Code:
             
            mdy(12, 31, year) - mdy(1, 1, year) + 1
            because mdy() certainly knows the calendar.

            Comment

            Working...
            X