Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identify if one range of dates falls within another range of dates

    For each individual in my dataset I have two date variables (date1 and date2). I then have a series of date ranges that I'm interested in (i.e. 5/15-7/14; 7/15-9/14 etc.). I would like to have Stata produce a a yes/no variable for each range of dates that tells me if the range of dates falls within the range of date1 and date2 (Does a range of dates fall within a range of another dates). I have data over two years, but the year is not important for this particular case.

    I was able to have Stata tell me if one date falls within a range of dates using the inrange command, but did not have further success. If anyone has an idea how to do this, I would greatly appreciate the help!

  • #2
    So let's say your range of dates consists of range_start and range_end.

    Code:
    assert date1 <= date2
    assert range_start <= range_end
    gen byte within = inrange(range_start, date1, date2) & inrange(range_end, date1, date2)
    The new variable within will be 1 when the interval range_start to range_end is contained in the interval from date1 to date2, and zero otherwise.

    Notes:

    1. This only works if date1, date2, range_start, and range_end are Stata internal format numeric variables. If they are strings that humans read as dates or if they are dates encoded by some other numeric scheme, they must first be converted using appropriate Stata datetime functions.

    2. This code counts things as within if, for example, range_start is the same date as date1. If you want strictly within, excluding the endpoints, then in the -gen byte within- command replace date1 by date1+1 and date2 by date2-1.

    3. The first two assert statements just verify that date1 always precedes (or is the same as) date2 and range_start precedes (or is the same as ) range_end. The code won't work correctly if those aren't true, so these -asserts- will warn you and stop you from getting wrong answers if your data do not conform to this.

    Comment


    • #3
      Thank you so much! This worked perfectly (preg2 and bob are are two date variables)

      gen byte within = inrange(mdy(5,15,2014), preg2, bdob) & inrange(mdy(7,14,2014), preg2, bdob)

      Is there additionally a way to ask stata to ignore the year (I would like to know if 5/15-7/14 is within the range of my two dates regardless of what year the two dates are in), or is it easier to just replace my within variable=1, if for example 5/15/2015-7/4/2015 is also within the range of dates?

      Comment


      • #4
        So this is a little complicated, because, generically, the interval you are testing could span many years and you would have to test this for each possible year. Doable, but kind of a mess.

        But I'm going to speculate that preg2 is the date of onset of a human pregnancy and bdob is the date the pregnancy ends with the birth of the baby. In that case, the duration of the interval preg2 to bdob is going to be typically 9 months or less, and never more than 10. Now, you could have a pregnancy that begins in late 2014 but covers 5/15/2015-7/14/2015, or one that begins in early May 2015 and extends into calendar year 2016. But even in these "worst case" scenarios, the midpoint of the pregnancy will always be in the calendar year for which the 5/15-7/14 inclusion is possible. So this should work:

        Code:
        gen midpoint = year(int((preg2+bdob)/2))
        gen byte within = inrange(mdy, 5, 15, midpoint), preg2, bdob) & inrange(mdy(7, 14, midpoint), preg2, bdob)

        Comment


        • #5
          Preg2 is the midpoint of pregnant and bdob is the date of birth of the infant. This code, however, seems to do the trick! I really appreciate the help!

          Comment


          • #6
            Hi again,
            I would greatly appreciate your assistance again with a little quirk..

            Upon closer examination, it seems the above code is not correctly identifying certain cases that it should indentify as in range.

            i.e. part of the 5/15-7/14 date range falls within the range 04jun2014(preg2) and 30sep2014 (bdob), but it is not being correctly identified as being within the range. I tried adding variations of

            inrange(month(6), preg2, bdob)

            to the command, but still was not able to get stata to correctly identify it. Any ideas?

            Comment


            • #7
              Oh, you misstated the problem!

              part of the 5/15-7/14 date range falls within the range 04jun2014(preg2) and 30sep2014 (bdob), but it is not being correctly identified as being within the range.
              In that example, the 5/15-7/14 date range does not fall within the range 04jun2014 and 30sep2014. So it seems what you want is not when 5/15-7/14 lies within those dates but rather whether it overlaps those dates. The code for that is different.

              Code:
              gen midpoint =year( int((preg2 + bdob)/2))
              gen byte overlap = inrange(preg2, mdy(5, 15, midpoint), mdy(7, 14, midpoint)) | inrange(bdob, mdy(5, 15, midpoint), mdy(7, 14, midpoint))

              Comment


              • #8
                Thank you! I apologize that I was not clear about what I was trying to do! Once the overlap (Y/N) of the dates has been established, is it also possible to extract the number of days of overlap between the two date ranges?

                Comment


                • #9
                  Yes.
                  Code:
                  gen days_overlap = min(mdy(7, 14, midpoint), bdob) - max(mdy(5, 14, midpoint), preg2) + 1

                  Comment


                  • #10
                    Hi again,

                    Just when I think that the code is giving me what I hope, it turns out it is still not working exactly right. For each infant I have two dates as described previously (preg2 bdob). Then I have defined seasons of the year as Summer (5/15-7/14), Rainy (7/15-9/14), Autumn (9/15-11/14), Early winter (11/15-1/14) etc (total of 6 seasons)

                    I'm trying to create a series of 6 season exposure variables in which each variable is coded as 1=yes; infant had some exposure to this season between preg2 and bdob, or 0=infant had no exposure to this season during the two dates

                    So for example if preg2= 20 Jul and bdob=15 Nov, then the infant had some exposure time to the rainy season, to the Autumn season, and to the early winter season (1 day).

                    This is the code I am using to generate a variable that is exposure to the summer season:

                    gen midpoint =year( int((preg2+bdob)/2))
                    gen byte preg2_sum = inrange(preg2, mdy(5, 15, midpoint), mdy(7, 14, midpoint)) | inrange(bdob, mdy(5, 15, midpoint), mdy(7, 14, midpoint))


                    This code is not correctly identifying exposure correctly across all cases and nothing I try seems to work out quite right.

                    Once I get this to work properly, I would then like to generate additional variables that tell me the number of days the infant was exposed to each season between the dates preg2 and bdob.

                    Comment


                    • #11
                      The problem is with the midpoint calculation to pick out the year. That trick works for the summer season. But it won't work properly for some of the others because it can give the wrong year to compare with. And it certainly won't work for the Early Winter season where you definitely need to use a different year for the beginning and endpoints.

                      I think rather than trying to figure out an appropriate "midpoint" trick for each season, it is probably simpler now to do it by looping over years. So, for the sake of discussion, let's say your data range from 2010 through 2016. Then the code would look like this:

                      Code:
                      gen byte overlaps_summer  = 0
                      gen days_summer = .
                      
                      gen byte overlaps_rainy = 0
                      gen days_rainy = .
                      
                      gen byte overlaps_autumn = 0
                      gen days_autumn = .
                      
                      gen byte overlaps_early_winter = 0
                      gen days_early_winter = .
                      
                      // AND SIMILARLY FOR WHATEVER THE OTHER TWO SEASONS ARE
                      
                      forvalues y = 2010/2016 {
                          replace overlaps_summer = 1 if inrange(preg2, mdy(5, 15, `y'), mdy(7, 14, `y')) | inrange(bdob, mdy(5, 15, `y'), mdy(7, 14, `y'))
                          replace days_summer = min(bdob, mdy(7, 14, `y') - max(preg2, mdy(5, 15, `y') if overlaps_summer
                      
                          replace overlaps_rainy = 1 if inrange(preg2, mdy(7, 15, `y'), mdy(9, 14, `y')) | inrange(bdob, mdy(7, 15, `y'), mdy(9, 14, `y'))
                          replace days_rainy = min(bdob, mdy(9,  14, `y') - max(preg2, mdy(7, 15, `y') if overlaps_rainy
                      
                          replace overlaps_autumn = 1 if inrange(preg2, mdy(9, 15, `y'), mdy(11, 14, `y')) | inrange(bdob, mdy(9 15, `y', mdy(11, 14, `y'))     
                          replace days_autumn = min(bdob, mdy(11, 14, `y') - max(preg2, mdy(9, 15, `y') if overlaps_autumn 
                      
                          replace overlaps_early_winter = 1 if inrange(preg2, mdy(11, 15, `y'), mdy(1, 14, `=`y'+1') | inrange(bdob, mdy(11, 15, `y'), mdy(1, 14, `=`y'+1')
                          replace days_early_winter = min(bdob, mdy(1, 14, `=`y'+1') - max(preg2, mdy(11, 15, `y') if overlaps_early_winter
                       
                          // AND SIMILARLY FOR THE REMAINING TWO SEASONS
                      }
                      Notes:

                      For the remaining two seasons, you don't need the `=`y'+1', you can just use `y' because the beginning and end of the season are in the same calendar year.

                      This is not tested. I am confident of the logic, but it may be riddled with typos.

                      It is possible to streamline this code by creating some local macros with the season names and some others with their start and end dates and then looping over the seasons, but while it would shorten the code, I think it would make it pretty opaque.

                      In retrospect, it would have been a more efficient use of time for both of us had you set out the problem fully and correctly from the start.




                      Comment


                      • #12
                        Thank you, and I apologize for not including more detail initially. I was thinking this would be a generic data fix, but it seems I was wrong!

                        I ran the below code (wk20_2= former preg2 and bdob37wk=former bdob--- apologies needed to make a few adjustments to these dates). This is the code you kindly shared with me with the added seasons and corrections of typos and missing ( ).

                        gen byte overlaps_summer = 0
                        gen days_summer = .

                        gen byte overlaps_rainy = 0
                        gen days_rainy = .

                        gen byte overlaps_autumn = 0
                        gen days_autumn = .

                        gen byte overlaps_early_winter = 0
                        gen days_early_winter = .

                        gen byte overlaps_late_winter = 0
                        gen days_late_winter = .

                        gen byte overlaps_spring = 0
                        gen days_spring = .



                        forvalues y = 2014/2016 {
                        replace overlaps_summer = 1 if inrange(wk20_2, mdy(5, 15, `y'), mdy(7, 14, `y')) | inrange(bdob37wk, mdy(5, 15, `y'), mdy(7, 14, `y'))
                        replace days_summer = (min(bdob37wk, mdy(7, 14, `y'))) - (max(wk20_2, mdy(5, 15, `y'))) if overlaps_summer

                        replace overlaps_rainy = 1 if inrange(wk20_2, mdy(7, 15, `y'), mdy(9, 14, `y')) | inrange(bdob37wk, mdy(7, 15, `y'), mdy(9, 14, `y'))
                        replace days_rainy = (min(bdob37wk, mdy(9, 14, `y'))) - (max(wk20_2, mdy(7, 15, `y'))) if overlaps_rainy

                        replace overlaps_autumn = 1 if inrange(wk20_2, mdy(9, 15, `y'), mdy(11, 14, `y')) | inrange(bdob37wk, mdy(9, 15, `y'), mdy(11, 14, `y'))
                        replace days_autumn = (min(bdob37wk, mdy(11, 14, `y'))) - (max(wk20_2, mdy(9, 15, `y'))) if overlaps_autumn

                        replace overlaps_early_winter = 1 if inrange(wk20_2, mdy(11, 15, `y'), mdy(1, 14, `=`y'+1')) | inrange(bdob37wk, mdy(11, 15, `y'), mdy(1, 14, `=`y'+1'))
                        replace days_early_winter = (min(bdob37wk, mdy(1, 14, `=`y'+1'))) - (max(wk20_2, mdy(11, 15, `y'))) if overlaps_early_winter

                        replace overlaps_late_winter = 1 if inrange(wk20_2, mdy(1, 15, `y'), mdy(3, 14, `y')) | inrange(bdob37wk, mdy(1, 15, `y'), mdy(3, 14, `y'))
                        replace days_late_winter = (min(bdob37wk, mdy(3, 14, `y'))) - (max(wk20_2, mdy(1, 15, `y'))) if overlaps_autumn

                        replace overlaps_spring = 1 if inrange(wk20_2, mdy(3, 15, `y'), mdy(5, 14, `y')) | inrange(bdob37wk, mdy(3, 15, `y'), mdy(5, 14, `y'))
                        replace days_spring = (min(bdob37wk, mdy(5, 14, `y'))) - (max(wk20_2, mdy(3, 15, `y'))) if overlaps_autumn
                        }


                        ** There are still a string of values for each season that are not being identified correctly (in these cases here the range of dates: 5/15-7/14 are within wk20_2 and bdob37wk, but are being coded as "0").

                        i.e.
                        wk20_2 bdob37wk overlaps_summer
                        05apr2014 01aug2014 0
                        05apr2014 01aug2014 0
                        05apr2014 01aug2014 0



                        ** For most cases, the identification of overlaps is correct, but then the number of days is incorrect (total number of days is negative)

                        i.e.
                        wk20_2 bdob37wk overlaps_summer days_summer
                        15may2014 10sep2014 1 -673
                        18may2014 13sep2014 1 -670
                        20may2014 15sep2014 1 -668


                        I'm sorry I have taken so much of your time! Thank you for your help and patience!

                        Comment


                        • #13
                          OK. My errors, sorry. I did have the logic wrong for identifying overlaps. As for the number of days of overlap, it was a matter of the # of days being overwritten repeatedly in each year. So here is the corrected code for summer. You can make the analogous corrections for the other seasons:

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input float(wk20_2 bdob37wk)
                          19818 19936
                          19818 19936
                          19818 19936
                          19858 19976
                          19861 19979
                          19863 19981
                          end
                          format %td wk20_2
                          format %td bdob37wk
                          
                          gen byte overlaps_summer = 0
                          gen days_summer = .
                          
                          forvalues y = 2014/2016 {
                              replace overlaps_summer = 1 if !((bdob37wk < mdy(5, 15, `y')) | (wk20_2 > mdy(7, 14, `y')))
                              replace days_summer = (min(bdob37wk, mdy(7, 14, `y'))) - (max(wk20_2, mdy(5, 15, `y')))  ///
                                  if overlaps_summer & missing(days_summer)
                          
                          }
                          Remember to use the `=`y'+1' when the calendar year changes over the course of the "season."

                          In the future, when posting example data, please use the -datatex- command, as I have done above. The example data you showed was difficult to import into Stata and required considerable "massaging." You can install the -dataex- command by running -ssc install dataex-. The instructions for using it are in -help dataex-. By using -dataex-, you make it possible for others to create a completely faithful replica of your example with a simple copy-paste operation.

                          Comment


                          • #14
                            This seems to have done the trick! I have been at this for days with some very clunky code so this is a complete lifesaver! Can't say thanks enough! I have installed the dataex package and will use it for future posts. Thank you again!

                            Comment


                            • #15
                              Hi, unfortunately, the code didn't work in my case. Any idea why this happened?

                              Code:
                              code :
                               gen byte within = inrange(range_start, date1, date2) & inrange(range_end, date1, date2)
                              My case:

                              range_start = 01/01/2012
                              range_end=31/12/2012
                              date1 (open firm) =19apr2012
                              date2 (close firm) = 1oct2019
                              within (2012) = 0


                              Why didn't Stata put 1 instead of 0?
                              Thanks.

                              Last edited by Gustavo Locatelli; 03 Feb 2022, 01:58.

                              Comment

                              Working...
                              X