Flip order of one variable within nested data

John Kirk

Join Date: Apr 2017

Posts: 57
#1

Flip order of one variable within nested data

21 May 2018, 10:11

Dear Stata Community,

I have run into a problem with my dataset as it seems that some of my variables are referring to jobs in chronological order whereas others do the reverse. What I am trying to do with the example data below is have the first instance of hrp refer to the last instance of tenure. My data are referring to jobs held by various people over the years (so the combination of caseid, year, and jobnumber serves as a unique identifier). The changes should occur within people and years. To illustrate what I mean: For the first instance (caseid=14, year=1979), the data is fine as is. There is one job which is matched with the correct values of tenure and hrp. For the second instance (caseid=14, year=1984), i need the value of hrp (650) to refer to the most recent job, jobnumber=7. In the third case the same needs to happen such that the hrp levels get matched in reverse order to the most recent instances, such that 850 would appear in the last line, the two 650s in the two lines before, and the 400 would be matched with jobnumer=6.

If anybody could offer any help, that would be very much appreciated.

P.S.: The example here only contains information on one person, so as to keep the output manageable, but my data obviously contains more. So the changes are supposed to occur by caseid and year.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int(caseid year) byte jobnumber int tenure long hrp 14 1979 1 9 237 14 1979 2 . . 14 1979 3 . . 14 1979 4 . . 14 1979 5 . . 14 1979 6 . . 14 1979 7 . . 14 1979 8 . . 14 1979 9 . . 14 1985 1 . 650 14 1985 2 . . 14 1985 3 . . 14 1985 4 . . 14 1985 5 20 . 14 1985 6 73 . 14 1985 7 41 . 14 1985 8 . . 14 1985 9 . . 14 1986 1 . 855 14 1986 2 . 650 14 1986 3 . 650 14 1986 4 . 400 14 1986 5 . . 14 1986 6 87 . 14 1986 7 55 . 14 1986 8 15 . 14 1986 9 14 . end label values tenure vlX0155800
Tags: None

William Lisowski

Join Date: Dec 2014
Posts: 10150

21 May 2018, 10:31

Starting with the data in post #1, I think this code does what you want.

Code:

tempfile main
save `main', replace

bysort caseid year (jobnumber): egen recent = max(cond(tenure!=.,jobnumber,.))
bysort caseid year (jobnumber): generate reverse = tenure[1]==. & recent!=.
bysort caseid year (jobnumber): replace jobnumber = recent+1-jobnumber if reverse
drop tenure recent reverse
drop if hrp==.
rename hrp new_hrp

merge 1:1 caseid year jobnumber using `main'
sort caseid year jobnumber
drop _merge 
list, sepby(caseid year) noobs

Code:

. list, sepby(caseid year) noobs

  +---------------------------------------------------+
  | caseid   year   jobnum~r   new_hrp   tenure   hrp |
  |---------------------------------------------------|
  |     14   1979          1       237        9   237 |
  |     14   1979          2         .        .     . |
  |     14   1979          3         .        .     . |
  |     14   1979          4         .        .     . |
  |     14   1979          5         .        .     . |
  |     14   1979          6         .        .     . |
  |     14   1979          7         .        .     . |
  |     14   1979          8         .        .     . |
  |     14   1979          9         .        .     . |
  |---------------------------------------------------|
  |     14   1985          1         .        .   650 |
  |     14   1985          2         .        .     . |
  |     14   1985          3         .        .     . |
  |     14   1985          4         .        .     . |
  |     14   1985          5         .       20     . |
  |     14   1985          6         .       73     . |
  |     14   1985          7       650       41     . |
  |     14   1985          8         .        .     . |
  |     14   1985          9         .        .     . |
  |---------------------------------------------------|
  |     14   1986          1         .        .   855 |
  |     14   1986          2         .        .   650 |
  |     14   1986          3         .        .   650 |
  |     14   1986          4         .        .   400 |
  |     14   1986          5         .        .     . |
  |     14   1986          6       400       87     . |
  |     14   1986          7       650       55     . |
  |     14   1986          8       650       15     . |
  |     14   1986          9       855       14     . |
  +---------------------------------------------------+

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

21 May 2018, 10:40

Here's a more direct approach that produces the same result as the code in post #2.

Code:

bysort caseid year (jobnumber): egen recent = max(cond(tenure!=.,jobnumber,.))
bysort caseid year (jobnumber): generate reverse = tenure[1]==. & recent!=.
generate new_hrp = hrp if !reverse
bysort caseid year (jobnumber): replace new_hrp = hrp[recent+1-jobnumber] ///
    if reverse & !missing(hrp[recent+1-jobnumber])
drop recent reverse
list, sepby(caseid year) noobs

Code:

. list, sepby(caseid year) noobs

  +---------------------------------------------------+
  | caseid   year   jobnum~r   tenure   hrp   new_hrp |
  |---------------------------------------------------|
  |     14   1979          1        9   237       237 |
  |     14   1979          2        .     .         . |
  |     14   1979          3        .     .         . |
  |     14   1979          4        .     .         . |
  |     14   1979          5        .     .         . |
  |     14   1979          6        .     .         . |
  |     14   1979          7        .     .         . |
  |     14   1979          8        .     .         . |
  |     14   1979          9        .     .         . |
  |---------------------------------------------------|
  |     14   1985          1        .   650         . |
  |     14   1985          2        .     .         . |
  |     14   1985          3        .     .         . |
  |     14   1985          4        .     .         . |
  |     14   1985          5       20     .         . |
  |     14   1985          6       73     .         . |
  |     14   1985          7       41     .       650 |
  |     14   1985          8        .     .         . |
  |     14   1985          9        .     .         . |
  |---------------------------------------------------|
  |     14   1986          1        .   855         . |
  |     14   1986          2        .   650         . |
  |     14   1986          3        .   650         . |
  |     14   1986          4        .   400         . |
  |     14   1986          5        .     .         . |
  |     14   1986          6       87     .       400 |
  |     14   1986          7       55     .       650 |
  |     14   1986          8       15     .       650 |
  |     14   1986          9       14     .       855 |
  +---------------------------------------------------+

Comment

John Kirk

Join Date: Apr 2017

Posts: 57
#4

21 May 2018, 10:57

Thank you so much William, that helps a lot!

It just made me think of a follow-up question as well: How do amend that code to ensure that missing values for tenure don't mix up my order. I have other variables, which are in the same order as tenure, so I would like to make sure that the code works even if one value of that precise variable is missing.

Not sure if necessary again, but just in case I've added the data below and amended it so that there is now also hrspweek which is in the same order as tenure and I've deleted the last tenure observation, so that the old code would return the hrp values a line too high.

I assume you could add an or("|") statement to the first line of code, but I'm not sure how one would go about amending the second.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int(caseid year) byte jobnumber int(tenure hrspweek) long hrp 14 1979 1 9 10 237 14 1979 2 . . . 14 1979 3 . . . 14 1979 4 . . . 14 1979 5 . . . 14 1979 6 . . . 14 1979 7 . . . 14 1979 8 . . . 14 1979 9 . . . 14 1985 1 . . 650 14 1985 2 . . . 14 1985 3 . . . 14 1985 4 . . . 14 1985 5 20 12 . 14 1985 6 73 10 . 14 1985 7 41 18 . 14 1985 8 . . . 14 1985 9 . . . 14 1986 1 . . 855 14 1986 2 . . 650 14 1986 3 . . 650 14 1986 4 . . 400 14 1986 5 . . . 14 1986 6 87 10 . 14 1986 7 55 20 . 14 1986 8 15 30 . 14 1986 9 . 36 . end label values tenure vlX0155800 label values hrspweek vlX0177100
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

21 May 2018, 11:16

Let me start by thanking you for the presentations of sample data. They exercise the code nicely.

It's probably easiest to generate one more variable to keep track of whether an observation is "good" in that at least one of a list of variables (tenure and hrspweek in this example) is non-missing. That way, only one line changes if you add more variables to the list.

Code:

egen good = rownonmiss(tenure hrspweek)
bysort caseid year (jobnumber): egen recent = max(cond(good,jobnumber,.))
bysort caseid year (jobnumber): generate reverse = good[1]==0 & recent!=.
generate new_hrp = hrp if !reverse
bysort caseid year (jobnumber): replace new_hrp = hrp[recent+1-jobnumber] ///
    if reverse & !missing(hrp[recent+1-jobnumber])
drop good recent reverse
list, sepby(caseid year) noobs

Code:

. list, sepby(caseid year) noobs

  +--------------------------------------------------------------+
  | caseid   year   jobnum~r   tenure   hrspweek   hrp   new_hrp |
  |--------------------------------------------------------------|
  |     14   1979          1        9         10   237       237 |
  |     14   1979          2        .          .     .         . |
  |     14   1979          3        .          .     .         . |
  |     14   1979          4        .          .     .         . |
  |     14   1979          5        .          .     .         . |
  |     14   1979          6        .          .     .         . |
  |     14   1979          7        .          .     .         . |
  |     14   1979          8        .          .     .         . |
  |     14   1979          9        .          .     .         . |
  |--------------------------------------------------------------|
  |     14   1985          1        .          .   650         . |
  |     14   1985          2        .          .     .         . |
  |     14   1985          3        .          .     .         . |
  |     14   1985          4        .          .     .         . |
  |     14   1985          5       20         12     .         . |
  |     14   1985          6       73         10     .         . |
  |     14   1985          7       41         18     .       650 |
  |     14   1985          8        .          .     .         . |
  |     14   1985          9        .          .     .         . |
  |--------------------------------------------------------------|
  |     14   1986          1        .          .   855         . |
  |     14   1986          2        .          .   650         . |
  |     14   1986          3        .          .   650         . |
  |     14   1986          4        .          .   400         . |
  |     14   1986          5        .          .     .         . |
  |     14   1986          6       87         10     .       400 |
  |     14   1986          7       55         20     .       650 |
  |     14   1986          8       15         30     .       650 |
  |     14   1986          9        .         36     .       855 |
  +--------------------------------------------------------------+

Comment

John Kirk

Join Date: Apr 2017

Posts: 57
#6

22 May 2018, 03:41

Thank you so much William, that helps a ton. I appreciate it.
Comment

Announcement