I have a dataset with information on age-30 earnings for siblings. I want to generate a family identifier that is stable over time. The dataset already has a family ID variable (family_id_in_year) that identifies siblings within a given year, but its value changes every year. I can only observe people in this dataset between ages 6 and 17.
That is, I want to create a "family" variable that is a string of sibling IDs ordered from oldest to youngest (oldest sibling, next-oldest sibling, and so on), joined by hyphens.
The dataset looks like this:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str13 name str9 id int year str8 family_id_in_year byte age long age30_income
"Mike Johnson" "929371031" 1993 "10392842" 15 30000
"Mike Johnson" "929371031" 1994 "13928401" 16 30000
"Sally Johnson" "918302912" 1994 "13928401" 6 50000
"Mike Johnson" "929371031" 1995 "38103927" 17 30000
"Sally Johnson" "918302912" 1995 "38103927" 7 50000
"Jane Johnson" "917374820" 1995 "38103927" 6 40000
"Sally Johnson" "918302912" 1996 "23145565" 8 50000
"Jane Johnson" "917374820" 1996 "23145565" 7 40000
"Tyler Bates" "910393920" 1996 "10392911" 10 23000
end
I want the dataset to end up looking like this:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str13 name str9 id byte sibling_position str9(sibling1_id sibling2_id sibling3_id) str29 family long age30_income
"Mike Johnson" "929371031" 1 "929371031" "918302912" "917374820" "929371031-918302912-917374820" 30000
"Sally Johnson" "918302912" 2 "929371031" "918302912" "917374820" "929371031-918302912-917374820" 50000
"Jane Johnson" "917374820" 3 "929371031" "918302912" "917374820" "929371031-918302912-917374820" 40000
"Tyler Bates" "910393920" 1 "910393920" "" "" "910393920" 23000
end
Does anyone know how I could do this? If so, it would be much appreciated!
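For concreteness, here is a rough sketch of one way this might be approached, in case it helps frame the question. It is not a tested solution. It assumes the first example dataset above is in memory, that two people are siblings whenever they share a family_id_in_year in the same year (directly or through a chain of shared years), and that year - age is a good enough proxy for birth order, with ties broken by id. The variable names fam, m1, m2, birthyr, and nsib are made up for illustration.

```stata
* Sketch only: assumes the first -dataex- example is in memory.
* fam, m1, m2, birthyr, nsib are illustrative names, not part of the data.

* 1. Start with every person labelled by their own id.
gen str9 fam = id

* 2. Repeatedly pass the smallest label through each family-year and then
*    through each person, until no label changes. People linked by any
*    chain of shared family-years end up with the same label.
local changed = 1
while `changed' {
    bysort year family_id_in_year (fam): gen str9 m1 = fam[1]
    bysort id (m1): gen str9 m2 = m1[1]
    quietly count if m2 != fam
    local changed = r(N)
    quietly replace fam = m2
    drop m1 m2
}

* 3. Reduce to one row per person; year - age approximates birth year
*    (assumption: good enough to order siblings).
gen int birthyr = year - age
bysort id (year): keep if _n == 1
keep name id fam birthyr age30_income

* 4. Order siblings from oldest to youngest and build the hyphenated string
*    as a running concatenation within each family.
bysort fam (birthyr id): gen byte sibling_position = _n
by fam: gen family = id if _n == 1
by fam: replace family = family[_n-1] + "-" + id if _n > 1
by fam: replace family = family[_N]

* 5. One sibling#_id column per position, up to the largest family size.
by fam: gen byte nsib = _N
quietly summarize nsib, meanonly
local maxsib = r(max)
forvalues k = 1/`maxsib' {
    by fam: gen str9 sibling`k'_id = id[`k']
}

drop fam birthyr nsib
order name id sibling_position sibling*_id family age30_income
```

The loop in step 2 is there because siblings far apart in age may never appear in the same year (the observation window is ages 6 to 17), so they can only be linked through a sibling in between.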