Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Build a panel from employment spell start and end dates

    Hello, I have a dataset that looks like the toy example below: each observation represents an employment spell of a given individual (id), where by employment spell I mean the length of time (defined by start_date and end_date) the individual has worked for a given firm. Periods of time that are unaccounted for (as is the case for id 1 between ending his work at firm B and starting at C) are assumed to be unemployment spells. I am also assuming that the first employment spell for an individual is the start of his work life.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str1 firm float(start_date end_date)
    1 "A" 18995 19205
    1 "B" 19206 21870
    1 "C" 22071     .
    2 "D" 14432 22578
    3 "A" 20953 22280
    3 "D" 22281     .
    end
    format %td start_date
    format %td end_date
    I need to be working with a monthly panel, so I want to reshape my dataset so that it looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str3 firm float(time employed)
    1 "A" 624 1
    1 "A" 625 1
    1 "A" 626 1
    . "."   . .
    1 "A" 630 1
    1 "B" 631 1
    . "."   . .
    1 "B" 718 1
    1 ""  719 0
    1 ""  720 0
    . "."   . .
    1 ""  724 0
    1 "C" 725 1
    2 "D" 474 1
    . "."   . .
    2 "D" 741 1
    end
    format %tm time
    Essentially I want a panel where time is my monthly date, and each observation tracks the status of each individual (employed being 1 if the individual is working in a given firm, and 0 if that month the individual is assumed to be unemployed). How can I do so in an economical way? I can't seem to be able to figure out a straightforward solution. I am also not sure if I should aim for a balanced panel (i.e. tracking all individuals from a given starting date, let's say January 1990, regardless of when their work life actually started) or not.

    Any suggestion on how to proceed?

  • #2
    So, id 1's end_date for firm B is 17nov2019. So how do we characterize him/her for the month of November 2019? Is he/she employed or unemployed for that month? What's the rule for deciding this?

    As for the question of a balanced panel, let me throw that question back at you: why do you think that would be better? What would you want to do where having a balanced panel that is choc-a-block with missing data would be advantageous? There are hardly any panel data commands that require or even vaguely prefer balanced data. Unless you know you will be using one that does, it doesn't really make sense to balance the panel. And if you are going to use one, the fact that much of the data will be missing is a major problem--so much so that you may well need to go back to square one and rethink the entire analysis.

    Comment


    • #3
      In a situation like that I would consider id 1 for Nov 2019 as employed; not sure how I would treat it if the end_date was, say, 05nov2019... I need to study my dataset a bit more in depth. But in the meantime let's say my cut-off is the middle of the month: past the 15th of the month the individual is employed, otherwise he's unemployed.

      That's a good point to raise. The reason why I'm building this panel is that I need it to compute a monthly count of the number of employees a given firm has -- because I don't have a direct report, I can only infer the totals by looking at the employment spells of the individuals. So yes, I guess a balanced panel wouldn't make sense in my case.

      I also am aware there might be a faster way to get to my end point (i.e. without having to create the panel), but again, I am not good at coming up with economical solutions when I'm using Stata.

      Comment


      • #4
        Try this:
        Code:
        set seed 1234
        set sortseed 314159
        
        //    VERIFY DATES ARE IN ORDER AND SPELLS DO NOT OVERLAP
        by id (start_date), sort: assert end_date >= start_date ///
            & start_date[_n+1] >= end_date + 1
            
        //    ASSUME MISSING END DATE MEANS EMPLOYMENT STILL ONGOING AS OF STUDY DATE
        local study_date = td(29feb2024)
        replace end_date = `study_date' if missing(end_date)
        
        foreach x in start end {
            gen `x'_month = mofd(`x'_date)
            format `x'_month %tm
        }
        
        expand end_month - start_month + 1
        by id start_date, sort: gen mdate = start_month[1] + _n -1
        format mdate %tm
        
        gen overlap_this_month = min(end_date, lastdayofmonth(dofm(mdate))) ///
            - max(start_date, firstdayofmonth(dofm(mdate))) + 1
        by id mdate (overlap_this_month), sort: keep if _n == _N
        
        xtset id mdate
        tsfill
        label define employment_status    0    "Unemployed"    1    "Employed"
        gen employment_status:employment_status = 0 if missing(firm)
        by id start_date (mdate), sort: replace employment_status = !missing(firm)
        by id start_date (mdate): replace employment_status = 0 if _n == 1 ///
            & 2*overlap_this_month <= daysinmonth(start_date) & !missing(firm)
        by id start_date (mdate): replace employment_status = 0 if _n == _N ///
            & 2*overlap_this_month <= daysinmonth(end_date) & !missing(firm)
        by id (mdate), sort: replace employment_status = 1 if _n == 1
        Note: I have slightly generalized your response to my question about how to count a partial month. In this code, if an employee's month is partly spent in two or more firms, or one firm and unemployment, that month will be allocated to whichever part of the month is the largest number of days. If there is a tie, it is broken randomly.

        Comment


        • #5
          Thank you for providing this solution, I will definitely try it! By looking at it, the only issue I can see is that I don't have Stata 17 and therefore I don't have access to the lastdayofmonth and firstdayofmonth functions. That's on me of course, I should have specified I'm working with Stata 15! Is there maybe a way to install those functions (or anything that would work in the same way) in older Stata versions?

          Comment


          • #6
            OK. I don't think -daysinmonth()- was available in Stata 15 either. These things are all easily replaced:

            Code:
            daysinmonth(date_var) = dofm(mofd(date_var)+1)-dofm(mofd(date_var))
            
            lastdayofmonth(date_var) = dofm(mofd(date_var)+1) - 1
            
            firstsdayofmonth(date_var) = dofm(mofd(date_var))
            Now, in your code, mofd(date_var) has already been calculated, so that shortens the code changes a bit.

            Code:
            set seed 1234
            set sortseed 314159
            
            //    VERIFY DATES ARE IN ORDER AND SPELLS DO NOT OVERLAP
            by id (start_date), sort: assert end_date >= start_date ///
                & start_date[_n+1] >= end_date + 1
                
            //    ASSUME MISSING END DATE MEANS EMPLOYMENT STILL ONGOING AS OF STUDY DATE
            local study_date = td(29feb2024)
            replace end_date = `study_date' if missing(end_date)
            
            foreach x in start end {
                gen `x'_month = mofd(`x'_date)
                format `x'_month %tm
            }
            
            expand end_month - start_month + 1
            by id start_date, sort: gen mdate = start_month[1] + _n -1
            format mdate %tm
            
            gen overlap_this_month = min(end_date, (dofm(mdate+1)-1)) ///
                - max(start_date, (dofm(mdate))) + 1
            by id mdate (overlap_this_month), sort: keep if _n == _N
            
            xtset id mdate
            tsfill
            label define employment_status    0    "Unemployed"    1    "Employed"
            gen employment_status:employment_status = 0 if missing(firm)
            by id start_date (mdate), sort: replace employment_status = !missing(firm)
            by id start_date (mdate): replace employment_status = 0 if _n == 1 ///
                & 2*overlap_this_month <= dofm(start_month+1)-dofm(start_month) & !missing(firm)
            by id start_date (mdate): replace employment_status = 0 if _n == _N ///
                & 2*overlap_this_month <= dofm(end_month+1)-dofm(end_month) & !missing(firm)
            by id (mdate), sort: replace employment_status = 1 if _n == 1

            Comment


            • #7
              By quickly running this code on my toy dataset it seems to be working like a charm, thanks a lot Clyde Schechter!

              Comment

              Working...
              X