Hi
I'm attempting to clean and organise a large set of data that has been presented in an unusual way, i've included an example. It's slightly difficult to explain, thanks for your help!
I have an identifying code for a person "identity", and multiple rows of data for each person which are shown by the "count" variable. For each person there is an "event" code. I would like to fill in the missing data in the event column, using the event code which corresponds to the same person.
Each "event" is unique to one person (you'll never have the same event code for a different person), but a person can have more than one event (more than one event code corresponding to a person).
If they have more than one event all the remaining data for the corresponding identity should be duplicated.
The position of the event code should also be noted as an additional variable. To help make it clearer i've also included an example of the outcome i'm looking for!
So it should go on to look like the data below.
Thank you.
I'm attempting to clean and organise a large set of data that has been presented in an unusual way, i've included an example. It's slightly difficult to explain, thanks for your help!
I have an identifying code for a person "identity", and multiple rows of data for each person which are shown by the "count" variable. For each person there is an "event" code. I would like to fill in the missing data in the event column, using the event code which corresponds to the same person.
Each "event" is unique to one person (you'll never have the same event code for a different person), but a person can have more than one event (more than one event code corresponding to a person).
If they have more than one event all the remaining data for the corresponding identity should be duplicated.
The position of the event code should also be noted as an additional variable. To help make it clearer i've also included an example of the outcome i'm looking for!
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(event count) str8 identity float(var4 var5) . 1 "g19937g" 37 67 . 2 "g19937g" 54 3832 256789 3 "g19937g" 23 653 . 4 "g19937g" 34 3 . 5 "g19937g" 35 45 . 6 "g19937g" 14 6 347652 1 "d18793" 75 67 . 2 "d18793" 56 34 . 3 "d18793" 23 1 . 4 "d18793" 55 568 . 6 "d18793" 87 134 456736 7 "d18793" 3 . . 8 "d18793" 58 76 . 9 "d18793" 2337 567 . 1 "e34863" 23 223 . 2 "e34863" 14 787 . 3 "e34863" 76 342 156782 4 "e34863" 745 753 . 5 "e34863" 135 26 end
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(newevent event count) str8 identity float(var4 var5) 0 256789 1 "g19937g" 37 67 0 256789 2 "g19937g" 54 3832 1 256789 3 "g19937g" 23 653 0 256789 4 "g19937g" 34 3 0 256789 5 "g19937g" 35 45 0 256789 6 "g19937g" 14 6 1 347652 1 "d18793" 75 67 0 347652 2 "d18793" 56 34 0 347652 3 "d18793" 23 1 0 347652 4 "d18793" 55 568 0 347652 6 "d18793" 87 134 0 347652 7 "d18793" 3 . 0 347652 8 "d18793" 58 76 0 347652 9 "d18793" 2337 567 0 456736 1 "d18793" 75 67 0 456736 2 "d18793" 56 34 0 456736 3 "d18793" 23 1 0 456736 4 "d18793" 55 568 0 456736 6 "d18793" 87 134 1 456736 7 "d18793" 3 . 0 456736 8 "d18793" 58 76 0 456736 9 "d18793" 2337 567 0 156782 1 "e34863" 23 223 0 156782 2 "e34863" 14 787 0 156782 3 "e34863" 76 342 1 156782 4 "e34863" 745 753 0 156782 5 "e34863" 135 26 end
Comment