
  • Monthly labor force status stored as a string variable

    Hello,

    I am working with a data set that stores monthly labour force status during the survey reference period as a 24-digit string variable, one digit per month.

    For simplicity, let’s say 1 is for employed; 2 is for unemployed; 0 is not covered in the survey.

    For example, the data look like this:
    pid dv
    1001 111111111111222221111000
    1002 222111111111111221111111
    1003 111111112111111111111000
    1004 111122221111111122221111
    ...

    I’d like to clean this so that I can run a hazard analysis. That means deriving the initial state, the durations of employment and unemployment spells, the date each spell started, etc.

    What would be the best approach or reference to start with?





  • #2
    Here's one approach:

    Code:
    forvalues i = 1/24 {
       gen byte molaborstat`i' = real(substr(dv,`i',1))
       label var molaborstat`i' "Labor force status in month `i'"
    }
    label define lflbl 0 "NA" 1 "employed" 2 "unemployed"
    label values molaborstat* lflbl
    // If you are going to do a discrete time analysis,
    // the long format will be useful
    reshape long molaborstat, i(pid) j(month)
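Once the data are in long format, the quantities asked about in #1 (initial state, spell durations, spell start month) can be derived with by-group operations. What follows is only a minimal sketch and not part of the original reply: the names spell, spell_start, spell_end, spell_length, and initial_state are made up for illustration, and it treats the "not covered" code 0 as a state of its own, which may not be what a hazard analysis needs.

```stata
* Sketch only: rebuild the example data and the long layout from above
clear
input int pid str24 dv
1001 "111111111111222221111000"
1002 "222111111111111221111111"
1003 "111111112111111111111000"
1004 "111122221111111122221111"
end
forvalues i = 1/24 {
    gen byte molaborstat`i' = real(substr(dv,`i',1))
}
drop dv
reshape long molaborstat, i(pid) j(month)

* A new spell begins in a person's first month or whenever the status changes;
* the running sum of that indicator numbers the spells within each person
bysort pid (month): gen int spell = sum(_n == 1 | molaborstat != molaborstat[_n-1])

* First month, last month, and length (in months) of each spell
bysort pid spell (month): gen byte spell_start = month[1]
bysort pid spell (month): gen byte spell_end = month[_N]
gen byte spell_length = spell_end - spell_start + 1

* Initial state = status in the person's first month
bysort pid (month): gen byte initial_state = molaborstat[1]

* Keep one observation per spell
bysort pid spell (month): keep if _n == 1
keep pid spell molaborstat initial_state spell_start spell_end spell_length
list if pid == 1001, clean noobs
```

The result has one row per person and spell, which is a common starting layout for continuous-time survival commands; the long layout above remains the one to use for a discrete-time analysis.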



    • #3
      Welcome to Statalist.

      The following code may start you in a useful direction.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int pid str24 dv
      1001 "111111111111222221111000"
      1002 "222111111111111221111111"
      1003 "111111112111111111111000"
      1004 "111122221111111122221111"
      end
      // replace each digit with itself and a following space
      generate dv2 = dv
      replace dv2 = subinstr(dv2,"0","0 ",.)
      replace dv2 = subinstr(dv2,"1","1 ",.)
      replace dv2 = subinstr(dv2,"2","2 ",.)
      // split it into status1...status24
      split dv2, generate(status) destring
      drop dv dv2
      // reshape it into one observation per month
      reshape long status, i(pid) j(month)
      list if pid==1001, clean noobs
      Code:
      . // replace each digit with itself and a following space
      . generate dv2 = dv
      
      . replace dv2 = subinstr(dv2,"0","0 ",.)
      variable dv2 was str24 now str27
      (2 real changes made)
      
      . replace dv2 = subinstr(dv2,"1","1 ",.)
      variable dv2 was str27 now str47
      (4 real changes made)
      
      . replace dv2 = subinstr(dv2,"2","2 ",.)
      variable dv2 was str47 now str48
      (4 real changes made)
      
      . // split it into status1...status24
      . split dv2, generate(status) destring
      variables born as string:
      status1   status4   status7   status10  status13  status16  status19  status22
      status2   status5   status8   status11  status14  status17  status20  status23
      status3   status6   status9   status12  status15  status18  status21  status24
      status1: all characters numeric; replaced as byte
      status2: all characters numeric; replaced as byte
      ...
      status24: all characters numeric; replaced as byte
      
      
      . drop dv dv2
      
      . // reshape it into one observation per month
      . reshape long status, i(pid) j(month)
      (note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24)
      
      Data                               wide   ->   long
      -----------------------------------------------------------------------------
      Number of obs.                        4   ->      96
      Number of variables                  25   ->       3
      j variable (24 values)                    ->   month
      xij variables:
                 status1 status2 ... status24   ->   status
      -----------------------------------------------------------------------------
      
      . list if pid==1001, clean noobs
      
           pid   month   status  
          1001       1        1  
          1001       2        1  
          1001       3        1  
          1001       4        1  
          1001       5        1  
          1001       6        1  
          1001       7        1  
          1001       8        1  
          1001       9        1  
          1001      10        1  
          1001      11        1  
          1001      12        1  
          1001      13        2  
          1001      14        2  
          1001      15        2  
          1001      16        2  
          1001      17        2  
          1001      18        1  
          1001      19        1  
          1001      20        1  
          1001      21        1  
          1001      22        0  
          1001      23        0  
          1001      24        0
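The question in #1 also asks for the date each spell started. The month index can be turned into a Stata monthly date once the first month of the reference period is known. The thread does not say when that is, so the sketch below assumes January 2018 purely for illustration; mdate and spell_begins are likewise hypothetical names.

```stata
* Sketch only: attach a calendar month to each survey month
clear
input int pid str24 dv
1001 "111111111111222221111000"
1004 "111122221111111122221111"
end
forvalues i = 1/24 {
    generate byte status`i' = real(substr(dv, `i', 1))
}
drop dv
reshape long status, i(pid) j(month)

* ASSUMPTION: month 1 is January 2018 (hypothetical; substitute the true
* first month of the survey reference period)
local first_month = ym(2018, 1)
generate int mdate = `first_month' + month - 1
format mdate %tm

* Flag the first month of each spell and list its calendar date
bysort pid (month): generate byte spell_begins = (_n == 1) | (status != status[_n-1])
list pid month mdate status if spell_begins, clean noobs
```

Storing the date as a %tm variable keeps month arithmetic simple: subtracting two such dates gives a difference in months.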
      I will note that the construction of dv2 could have been reduced from four very obvious commands to a single command using Stata's Unicode regular expression function ustrregexra(), at the cost of complete incomprehensibility to anyone not familiar with regular expressions. With a larger number of possible characters, however, the regular expression would mean less repetitive coding.
      Code:
      generate dv2 = ustrregexra(dv,"(.)","$1 ")
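For anyone wary of the regular expression, one way to build confidence is to run both constructions side by side and let assert confirm they agree. This is a minimal sketch on the example data only; dv3 is a hypothetical name for the second copy, and the foreach loop is simply a compact way of writing the three subinstr() calls.

```stata
* Sketch only: check that the two ways of inserting spaces give the same result
clear
input int pid str24 dv
1001 "111111111111222221111000"
1002 "222111111111111221111111"
1003 "111111112111111111111000"
1004 "111122221111111122221111"
end

* Version 1: one subinstr() call per possible digit
generate dv2 = dv
foreach d in 0 1 2 {
    replace dv2 = subinstr(dv2, "`d'", "`d' ", .)
}

* Version 2: the regular expression one-liner; (.) captures each character
* and "$1 " writes it back followed by a space
generate dv3 = ustrregexra(dv, "(.)", "$1 ")

* Stops with an error if any observation differs
assert dv2 == dv3
```

One difference worth keeping in mind: the regular expression puts a space after every character, not just after 0, 1, and 2, so the two versions would diverge if dv ever contained anything else.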
      Last edited by William Lisowski; 23 Feb 2020, 08:54.
