Calculating the duration of episodes in two different time periods using date variables

Eduard López

Join Date: Dec 2014

Posts: 48
#1

Calculating the duration of episodes in two different time periods using date variables

28 May 2023, 20:12

Hello everyone
I'm trying to compare the duration of episodes (actually, duration of job contracts) in one period of time (2005-2010) to another (2010-2015). The reason is that in 2010 and in 2012 there were two policies implemented that might have affected the duration of the contracts, so I want to check if it was the case.

I have two variables that let me know when each contract starts and finishes. This is an example of the date variable for when the contract started:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input long F_ALTA 19820301 19881103 20120213 20120501 19760501 19790808 19761102 19990901 19950101 19971001 19860414 19900110 19920528 19970301 19880314 19891001 19891112 19870501 19801020 19810421 20140501 20140902 20050601 19941004 20091101 20140902 19811210 19821201 19810331 19900502 19901001 19931213 19930911 20130813 20140801 19910408 19990419 20020515 19880707 19960801 20111202 19920525 19910712 19990705 20020902 19920504 19960223 19961023 19970519 20040714 20040913 20060904 20071203 20030201 20030908 20091118 20100104 20110117 20120220 20120625 19920111 19961001 20020406 20030712 20040404 20040601 20081205 20100206 20101223 20110924 20111214 20120126 20120201 20120405 19901018 19901022 19880926 19891128 19960101 19860701 19720327 19760901 19800501 19861001 19891010 19910501 19730201 19851104 19860415 19870107 19880913 19900801 19791015 19801203 19840201 19910501 19960101 19871123 19810114 19740129 end

As you can see, it is in the format of yyyymmdd. The variable for the date the contract finishes is in the same format. I would like to be able to generate another variable with the duration of contracts in days for the period 2005-2010 and for the 2010-2015 period, but I can't seem to come up with a good answer. Do you guys know of any way this could be done?

One idea I was thinking about is to first filter contracts if they were active during the 05-10 period. Then substract the date the contract ended from the date the contract started, but the problem is that many contracts started before 05 or continue beyond 2010, and I want to only get the duration of contracts for *within* that time window, not get days of duration that took place before 05. And same situation with the 10-15 time window, I want to get the duration of episodes only within it. So I would need to find a way to make all contract start dates prior to 05 (and to 10 for the second calculation) equal to 1-1-05, and contract end dates beyond 2009 equal to 31-12-09; I think that would mean creating extra variables of course, since I want to keep the original dates, as I need to repeat the process for the 2010-2015 time period.
I don't know how to proceed or how to make the calculations in days.

Thank yo so much for your help.

Last edited by Eduard López; 28 May 2023, 20:54.
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35444

29 May 2023, 00:57

Your date variable is not fit for purpose, as can be seen by imagining that you subtract 19991231 from 20000101 or even 20000201 from 20000131.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long F_ALTA
19820301
19881103
20120213
20120501
19760501
19790808
19761102
19990901
19950101
19971001
end

gen F_ALTA_date = daily(strofreal(F_ALTA, "%8.0f"), "YMD") 
format F_ALTA_date %td 

list in 1/10 

     +----------------------+
     |   F_ALTA   F_ALTA_~e |
     |----------------------|
  1. | 19820301   01mar1982 |
  2. | 19881103   03nov1988 |
  3. | 20120213   13feb2012 |
  4. | 20120501   01may2012 |
  5. | 19760501   01may1976 |
     |----------------------|
  6. | 19790808   08aug1979 |
  7. | 19761102   02nov1976 |
  8. | 19990901   01sep1999 |
  9. | 19950101   01jan1995 |
 10. | 19971001   01oct1997 |
     +----------------------+

If you convert the other variable then subtraction will work as desired. It seems that you should want to specify something like

Code:

... if inrange(F_ALTA_date, mdy(1,1,2005), mdy(12,31,2010) & inrange(whatever, mdy(1,1,2005), mdy(12,31,2010)

and similarly for the other period. You are unclear about whether the first period really ended on mdy(12,31,2009) but the principle is the same.

There is no good understanding of date variables in Stata that doesn't pass through study of help datetime.

Comment

Eduard López

Join Date: Dec 2014

Posts: 48
#3

29 May 2023, 15:37

Thank you so much Nick. Just a quick question:

I want to calculate duration of job contracts within the two time periods. But I don't want to just calculate duration of contracts which both start and end within the time period. What I want is that:

1) If the job contract starts and ends within the 2005-2010 (eg. someone gets hired during 2005 and fired in 2007), then that's the duration of the contract, calculated as you explained.
2) If a job contract ends within the 2005-2010 period but it had started before (eg. someone who had been hired during 2001 and who is fired in 2007), I only want the variable to have the value of the number of days that that person worked within the time window of 2005-2010. I don't want the variable to count days "before" the time period.
3) Similarly, if a job contract starts within the 2005-2010 time period but finishes later (eg. someone hired in 2007 who is fired in 2015) I only want the variable to include the days that that person worked within the time period, in this case, from 2007 to 2010 (or from 2007 to December 31 2009, rather).
4) If the contract both started and ended before and after the time period (eg. someone hired in 1998 who worked until much later, for instance 2017) I want the variable to include all days within the time period, that is, the variable would have a maximum value of 365*5.

When using the inrange specification, I get many missing values, since many job contracts either started before or after the period, and thus are not included in the calculation. Do you know of any way to do what I explained?
Thank you so much!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35444
#4

29 May 2023, 17:42

This could do with a fuller data example, but some small tricks may help.

Let's suppose you have Stata daily dates entry and exit as calculated in #2. Then one kind of date has flavour

Code:

gen duration1 = min(exit, mdy(12,31,2009)) - max(entry, mdy(1,1,2005))

and another has flavour

Code:

gen duration2 = min(exit, mdy(12,31,2014)) - max(entry, mdy(1,1,2010))

as min() here yields the earlier of two dates and max() the later of two dates.

That doesn't cover all the details, but the rest should yield to similar devicees.
Comment
Eduard López

Join Date: Dec 2014

Posts: 48
#5

30 May 2023, 17:20

Thank you so much Nick. I ran the code and it worked perfectly. I got variables where some observations were negative, not sure why. What I did to try to get rid of them was simply to run the following code:

Code:

local x duration1 foreach x in `x'{ replace `x' = . if `x' < 0 }

This made negative values in all those observations turn to missing values.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35444
#6

31 May 2023, 01:40

Your loop boils down to

Code:

replace duration1 = . if duration1 < 0

Presumably you should

Code:

list entry exit if duration1 == .
Comment
Eduard López

Join Date: Dec 2014

Posts: 48
#7

02 Jun 2023, 17:28

You are right Nick, that syntax is even better.

I understand that the rationale to use the list command is to observe the episodes that had their duration replaced with a missing value.

Using the code also gave me many maximum values for duration1 and duration2 because those subjects (contracts) had a missing value for entry or exit. I used the following code, I'm hoping correctly, in order to restrict duration variables to those contracts that had an entry or exit date (and thus not skew the analyses with many contracts that were assigned maximum duration):

Code:

replace duration1= . if entry_date == . replace duration2= . if entry_date == . replace duration1= . if exit_date == . replace duration2= . if exit_date == .

My question now would be: now that I have two variables, duration1 and duration2 for both periods, is there a way to perform a linear regression or analyze the difference between them? Ideally I would like to observe if there has been a change in the duration of episodes before and after the event that took place in 2010. Other analyses I performed in the past using other (non longitudinal) databases could have a dummy variable for the year that worked as the independent variable, with a single "duration" variable as the dependent one.

Thanks a lot for your help and time.
Comment

Announcement