Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Flip order of one variable within nested data

    Dear Stata Community,

    I have run into a problem with my dataset as it seems that some of my variables are referring to jobs in chronological order whereas others do the reverse. What I am trying to do with the example data below is have the first instance of hrp refer to the last instance of tenure. My data are referring to jobs held by various people over the years (so the combination of caseid, year, and jobnumber serves as a unique identifier). The changes should occur within people and years. To illustrate what I mean: For the first instance (caseid=14, year=1979), the data is fine as is. There is one job which is matched with the correct values of tenure and hrp. For the second instance (caseid=14, year=1984), i need the value of hrp (650) to refer to the most recent job, jobnumber=7. In the third case the same needs to happen such that the hrp levels get matched in reverse order to the most recent instances, such that 850 would appear in the last line, the two 650s in the two lines before, and the 400 would be matched with jobnumer=6.

    If anybody could offer any help, that would be very much appreciated.

    P.S.: The example here only contains information on one person, so as to keep the output manageable, but my data obviously contains more. So the changes are supposed to occur by caseid and year.





    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(caseid year) byte jobnumber int tenure long hrp
    14 1979 1  9 237
    14 1979 2  .   .
    14 1979 3  .   .
    14 1979 4  .   .
    14 1979 5  .   .
    14 1979 6  .   .
    14 1979 7  .   .
    14 1979 8  .   .
    14 1979 9  .   .
    14 1985 1  . 650
    14 1985 2  .   .
    14 1985 3  .   .
    14 1985 4  .   .
    14 1985 5 20   .
    14 1985 6 73   .
    14 1985 7 41   .
    14 1985 8  .   .
    14 1985 9  .   .
    14 1986 1  . 855
    14 1986 2  . 650
    14 1986 3  . 650
    14 1986 4  . 400
    14 1986 5  .   .
    14 1986 6 87   .
    14 1986 7 55   .
    14 1986 8 15   .
    14 1986 9 14   .
    end
    label values tenure vlX0155800


  • #2
    Starting with the data in post #1, I think this code does what you want.
    Code:
    tempfile main
    save `main', replace
    
    bysort caseid year (jobnumber): egen recent = max(cond(tenure!=.,jobnumber,.))
    bysort caseid year (jobnumber): generate reverse = tenure[1]==. & recent!=.
    bysort caseid year (jobnumber): replace jobnumber = recent+1-jobnumber if reverse
    drop tenure recent reverse
    drop if hrp==.
    rename hrp new_hrp
    
    merge 1:1 caseid year jobnumber using `main'
    sort caseid year jobnumber
    drop _merge 
    list, sepby(caseid year) noobs
    Code:
    . list, sepby(caseid year) noobs
    
      +---------------------------------------------------+
      | caseid   year   jobnum~r   new_hrp   tenure   hrp |
      |---------------------------------------------------|
      |     14   1979          1       237        9   237 |
      |     14   1979          2         .        .     . |
      |     14   1979          3         .        .     . |
      |     14   1979          4         .        .     . |
      |     14   1979          5         .        .     . |
      |     14   1979          6         .        .     . |
      |     14   1979          7         .        .     . |
      |     14   1979          8         .        .     . |
      |     14   1979          9         .        .     . |
      |---------------------------------------------------|
      |     14   1985          1         .        .   650 |
      |     14   1985          2         .        .     . |
      |     14   1985          3         .        .     . |
      |     14   1985          4         .        .     . |
      |     14   1985          5         .       20     . |
      |     14   1985          6         .       73     . |
      |     14   1985          7       650       41     . |
      |     14   1985          8         .        .     . |
      |     14   1985          9         .        .     . |
      |---------------------------------------------------|
      |     14   1986          1         .        .   855 |
      |     14   1986          2         .        .   650 |
      |     14   1986          3         .        .   650 |
      |     14   1986          4         .        .   400 |
      |     14   1986          5         .        .     . |
      |     14   1986          6       400       87     . |
      |     14   1986          7       650       55     . |
      |     14   1986          8       650       15     . |
      |     14   1986          9       855       14     . |
      +---------------------------------------------------+

    Comment


    • #3
      Here's a more direct approach that produces the same result as the code in post #2.
      Code:
      bysort caseid year (jobnumber): egen recent = max(cond(tenure!=.,jobnumber,.))
      bysort caseid year (jobnumber): generate reverse = tenure[1]==. & recent!=.
      generate new_hrp = hrp if !reverse
      bysort caseid year (jobnumber): replace new_hrp = hrp[recent+1-jobnumber] ///
          if reverse & !missing(hrp[recent+1-jobnumber])
      drop recent reverse
      list, sepby(caseid year) noobs
      Code:
      . list, sepby(caseid year) noobs
      
        +---------------------------------------------------+
        | caseid   year   jobnum~r   tenure   hrp   new_hrp |
        |---------------------------------------------------|
        |     14   1979          1        9   237       237 |
        |     14   1979          2        .     .         . |
        |     14   1979          3        .     .         . |
        |     14   1979          4        .     .         . |
        |     14   1979          5        .     .         . |
        |     14   1979          6        .     .         . |
        |     14   1979          7        .     .         . |
        |     14   1979          8        .     .         . |
        |     14   1979          9        .     .         . |
        |---------------------------------------------------|
        |     14   1985          1        .   650         . |
        |     14   1985          2        .     .         . |
        |     14   1985          3        .     .         . |
        |     14   1985          4        .     .         . |
        |     14   1985          5       20     .         . |
        |     14   1985          6       73     .         . |
        |     14   1985          7       41     .       650 |
        |     14   1985          8        .     .         . |
        |     14   1985          9        .     .         . |
        |---------------------------------------------------|
        |     14   1986          1        .   855         . |
        |     14   1986          2        .   650         . |
        |     14   1986          3        .   650         . |
        |     14   1986          4        .   400         . |
        |     14   1986          5        .     .         . |
        |     14   1986          6       87     .       400 |
        |     14   1986          7       55     .       650 |
        |     14   1986          8       15     .       650 |
        |     14   1986          9       14     .       855 |
        +---------------------------------------------------+

      Comment


      • #4
        Thank you so much William, that helps a lot!

        It just made me think of a follow-up question as well: How do amend that code to ensure that missing values for tenure don't mix up my order. I have other variables, which are in the same order as tenure, so I would like to make sure that the code works even if one value of that precise variable is missing.

        Not sure if necessary again, but just in case I've added the data below and amended it so that there is now also hrspweek which is in the same order as tenure and I've deleted the last tenure observation, so that the old code would return the hrp values a line too high.

        I assume you could add an or("|") statement to the first line of code, but I'm not sure how one would go about amending the second.


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(caseid year) byte jobnumber int(tenure hrspweek) long hrp
        14 1979 1  9 10 237
        14 1979 2  .  .   .
        14 1979 3  .  .   .
        14 1979 4  .  .   .
        14 1979 5  .  .   .
        14 1979 6  .  .   .
        14 1979 7  .  .   .
        14 1979 8  .  .   .
        14 1979 9  .  .   .
        14 1985 1  .  . 650
        14 1985 2  .  .   .
        14 1985 3  .  .   .
        14 1985 4  .  .   .
        14 1985 5 20 12   .
        14 1985 6 73 10   .
        14 1985 7 41 18   .
        14 1985 8  .  .   .
        14 1985 9  .  .   .
        14 1986 1  .  . 855
        14 1986 2  .  . 650
        14 1986 3  .  . 650
        14 1986 4  .  . 400
        14 1986 5  .  .   .
        14 1986 6 87 10   .
        14 1986 7 55 20   .
        14 1986 8 15 30   .
        14 1986 9  . 36   .
        end
        label values tenure vlX0155800
        label values hrspweek vlX0177100

        Comment


        • #5
          Let me start by thanking you for the presentations of sample data. They exercise the code nicely.

          It's probably easiest to generate one more variable to keep track of whether an observation is "good" in that at least one of a list of variables (tenure and hrspweek in this example) is non-missing. That way, only one line changes if you add more variables to the list.
          Code:
          egen good = rownonmiss(tenure hrspweek)
          bysort caseid year (jobnumber): egen recent = max(cond(good,jobnumber,.))
          bysort caseid year (jobnumber): generate reverse = good[1]==0 & recent!=.
          generate new_hrp = hrp if !reverse
          bysort caseid year (jobnumber): replace new_hrp = hrp[recent+1-jobnumber] ///
              if reverse & !missing(hrp[recent+1-jobnumber])
          drop good recent reverse
          list, sepby(caseid year) noobs
          Code:
          . list, sepby(caseid year) noobs
          
            +--------------------------------------------------------------+
            | caseid   year   jobnum~r   tenure   hrspweek   hrp   new_hrp |
            |--------------------------------------------------------------|
            |     14   1979          1        9         10   237       237 |
            |     14   1979          2        .          .     .         . |
            |     14   1979          3        .          .     .         . |
            |     14   1979          4        .          .     .         . |
            |     14   1979          5        .          .     .         . |
            |     14   1979          6        .          .     .         . |
            |     14   1979          7        .          .     .         . |
            |     14   1979          8        .          .     .         . |
            |     14   1979          9        .          .     .         . |
            |--------------------------------------------------------------|
            |     14   1985          1        .          .   650         . |
            |     14   1985          2        .          .     .         . |
            |     14   1985          3        .          .     .         . |
            |     14   1985          4        .          .     .         . |
            |     14   1985          5       20         12     .         . |
            |     14   1985          6       73         10     .         . |
            |     14   1985          7       41         18     .       650 |
            |     14   1985          8        .          .     .         . |
            |     14   1985          9        .          .     .         . |
            |--------------------------------------------------------------|
            |     14   1986          1        .          .   855         . |
            |     14   1986          2        .          .   650         . |
            |     14   1986          3        .          .   650         . |
            |     14   1986          4        .          .   400         . |
            |     14   1986          5        .          .     .         . |
            |     14   1986          6       87         10     .       400 |
            |     14   1986          7       55         20     .       650 |
            |     14   1986          8       15         30     .       650 |
            |     14   1986          9        .         36     .       855 |
            +--------------------------------------------------------------+

          Comment


          • #6
            Thank you so much William, that helps a ton. I appreciate it.

            Comment

            Working...
            X