Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating the duration of episodes in two different time periods using date variables

    Hello everyone
    I'm trying to compare the duration of episodes (actually, duration of job contracts) in one period of time (2005-2010) to another (2010-2015). The reason is that in 2010 and in 2012 there were two policies implemented that might have affected the duration of the contracts, so I want to check if it was the case.

    I have two variables that let me know when each contract starts and finishes. This is an example of the date variable for when the contract started:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long F_ALTA
    19820301
    19881103
    20120213
    20120501
    19760501
    19790808
    19761102
    19990901
    19950101
    19971001
    19860414
    19900110
    19920528
    19970301
    19880314
    19891001
    19891112
    19870501
    19801020
    19810421
    20140501
    20140902
    20050601
    19941004
    20091101
    20140902
    19811210
    19821201
    19810331
    19900502
    19901001
    19931213
    19930911
    20130813
    20140801
    19910408
    19990419
    20020515
    19880707
    19960801
    20111202
    19920525
    19910712
    19990705
    20020902
    19920504
    19960223
    19961023
    19970519
    20040714
    20040913
    20060904
    20071203
    20030201
    20030908
    20091118
    20100104
    20110117
    20120220
    20120625
    19920111
    19961001
    20020406
    20030712
    20040404
    20040601
    20081205
    20100206
    20101223
    20110924
    20111214
    20120126
    20120201
    20120405
    19901018
    19901022
    19880926
    19891128
    19960101
    19860701
    19720327
    19760901
    19800501
    19861001
    19891010
    19910501
    19730201
    19851104
    19860415
    19870107
    19880913
    19900801
    19791015
    19801203
    19840201
    19910501
    19960101
    19871123
    19810114
    19740129
    end


    As you can see, it is in the format of yyyymmdd. The variable for the date the contract finishes is in the same format. I would like to be able to generate another variable with the duration of contracts in days for the period 2005-2010 and for the 2010-2015 period, but I can't seem to come up with a good answer. Do you guys know of any way this could be done?

    One idea I was thinking about is to first filter contracts if they were active during the 05-10 period. Then substract the date the contract ended from the date the contract started, but the problem is that many contracts started before 05 or continue beyond 2010, and I want to only get the duration of contracts for *within* that time window, not get days of duration that took place before 05. And same situation with the 10-15 time window, I want to get the duration of episodes only within it. So I would need to find a way to make all contract start dates prior to 05 (and to 10 for the second calculation) equal to 1-1-05, and contract end dates beyond 2009 equal to 31-12-09; I think that would mean creating extra variables of course, since I want to keep the original dates, as I need to repeat the process for the 2010-2015 time period.
    I don't know how to proceed or how to make the calculations in days.

    Thank yo so much for your help.
    Last edited by Eduard López; 28 May 2023, 20:54.

  • #2
    Your date variable is not fit for purpose, as can be seen by imagining that you subtract 19991231 from 20000101 or even 20000201 from 20000131.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long F_ALTA
    19820301
    19881103
    20120213
    20120501
    19760501
    19790808
    19761102
    19990901
    19950101
    19971001
    end
    
    gen F_ALTA_date = daily(strofreal(F_ALTA, "%8.0f"), "YMD") 
    format F_ALTA_date %td 
    
    list in 1/10 
    
         +----------------------+
         |   F_ALTA   F_ALTA_~e |
         |----------------------|
      1. | 19820301   01mar1982 |
      2. | 19881103   03nov1988 |
      3. | 20120213   13feb2012 |
      4. | 20120501   01may2012 |
      5. | 19760501   01may1976 |
         |----------------------|
      6. | 19790808   08aug1979 |
      7. | 19761102   02nov1976 |
      8. | 19990901   01sep1999 |
      9. | 19950101   01jan1995 |
     10. | 19971001   01oct1997 |
         +----------------------+
    If you convert the other variable then subtraction will work as desired. It seems that you should want to specify something like

    Code:
    ... if inrange(F_ALTA_date, mdy(1,1,2005), mdy(12,31,2010) & inrange(whatever, mdy(1,1,2005), mdy(12,31,2010)
    and similarly for the other period. You are unclear about whether the first period really ended on mdy(12,31,2009) but the principle is the same.

    There is no good understanding of date variables in Stata that doesn't pass through study of help datetime.


    Comment


    • #3
      Thank you so much Nick. Just a quick question:

      I want to calculate duration of job contracts within the two time periods. But I don't want to just calculate duration of contracts which both start and end within the time period. What I want is that:

      1) If the job contract starts and ends within the 2005-2010 (eg. someone gets hired during 2005 and fired in 2007), then that's the duration of the contract, calculated as you explained.
      2) If a job contract ends within the 2005-2010 period but it had started before (eg. someone who had been hired during 2001 and who is fired in 2007), I only want the variable to have the value of the number of days that that person worked within the time window of 2005-2010. I don't want the variable to count days "before" the time period.
      3) Similarly, if a job contract starts within the 2005-2010 time period but finishes later (eg. someone hired in 2007 who is fired in 2015) I only want the variable to include the days that that person worked within the time period, in this case, from 2007 to 2010 (or from 2007 to December 31 2009, rather).
      4) If the contract both started and ended before and after the time period (eg. someone hired in 1998 who worked until much later, for instance 2017) I want the variable to include all days within the time period, that is, the variable would have a maximum value of 365*5.

      When using the inrange specification, I get many missing values, since many job contracts either started before or after the period, and thus are not included in the calculation. Do you know of any way to do what I explained?
      Thank you so much!

      Comment


      • #4
        This could do with a fuller data example, but some small tricks may help.

        Let's suppose you have Stata daily dates entry and exit as calculated in #2. Then one kind of date has flavour

        Code:
        gen duration1 = min(exit, mdy(12,31,2009)) - max(entry, mdy(1,1,2005))
        and another has flavour

        Code:
        gen duration2 = min(exit, mdy(12,31,2014)) - max(entry, mdy(1,1,2010)) 
        as min() here yields the earlier of two dates and max() the later of two dates.

        That doesn't cover all the details, but the rest should yield to similar devicees.

        Comment


        • #5
          Thank you so much Nick. I ran the code and it worked perfectly. I got variables where some observations were negative, not sure why. What I did to try to get rid of them was simply to run the following code:

          Code:
          local x duration1
          
          foreach x in `x'{
          
              replace `x' = . if `x' < 0
          
          }
          This made negative values in all those observations turn to missing values.

          Comment


          • #6
            Your loop boils down to

            Code:
            replace duration1 = . if duration1 < 0
            Presumably you should

            Code:
            list entry exit if duration1 == . 

            Comment


            • #7
              You are right Nick, that syntax is even better.

              I understand that the rationale to use the list command is to observe the episodes that had their duration replaced with a missing value.

              Using the code also gave me many maximum values for duration1 and duration2 because those subjects (contracts) had a missing value for entry or exit. I used the following code, I'm hoping correctly, in order to restrict duration variables to those contracts that had an entry or exit date (and thus not skew the analyses with many contracts that were assigned maximum duration):

              Code:
              replace duration1= . if entry_date == .
              
              replace duration2= . if entry_date == .
              
              replace duration1= . if exit_date == .
              
              replace duration2= . if exit_date == .
              My question now would be: now that I have two variables, duration1 and duration2 for both periods, is there a way to perform a linear regression or analyze the difference between them? Ideally I would like to observe if there has been a change in the duration of episodes before and after the event that took place in 2010. Other analyses I performed in the past using other (non longitudinal) databases could have a dummy variable for the year that worked as the independent variable, with a single "duration" variable as the dependent one.

              Thanks a lot for your help and time.

              Comment

              Working...
              X