Hi,
I am attempting to merge automotive retail sales data with automotive service data so that I can see how buyers act post purchase. The only unique identifier between the service data and the sales data is the VIN. The issue is that many of the buyers/owners have come in for service on more than one occasion so there are multiple records with the same VIN.
I can (and already have) performed some analysis by removing duplicate VINs and drawing some basic insights into those who have come in for service (at least once) and those who have not come in at all.
However, I really would like to re-organize the data so that I can get more clear insights into those who have come in for service (ie. avg number of service visits (conditional on numerous factors). the amount of time between their visits (conditional on numerous factors), mileage between services, etc. etc. etc.). Further, this would allow me to run logistic regressions estimating odds of coming in for a next service based on difference variables.
Currently the data is organized vertically, so that it is all within a few columns. I would like to change this structure so that second service visits (and third, fourth, fifth, etc. etc.) and their corresponding information are moved to new columns and in a row that corresponds to the VIN.
Below is a basic example of what I would like to accomplish:
I want to turn this:
Into This:
Thank you in advance of any help you can provide!
Best,
Eric
I am attempting to merge automotive retail sales data with automotive service data so that I can see how buyers act post purchase. The only unique identifier between the service data and the sales data is the VIN. The issue is that many of the buyers/owners have come in for service on more than one occasion so there are multiple records with the same VIN.
I can (and already have) performed some analysis by removing duplicate VINs and drawing some basic insights into those who have come in for service (at least once) and those who have not come in at all.
However, I really would like to re-organize the data so that I can get more clear insights into those who have come in for service (ie. avg number of service visits (conditional on numerous factors). the amount of time between their visits (conditional on numerous factors), mileage between services, etc. etc. etc.). Further, this would allow me to run logistic regressions estimating odds of coming in for a next service based on difference variables.
Currently the data is organized vertically, so that it is all within a few columns. I would like to change this structure so that second service visits (and third, fourth, fifth, etc. etc.) and their corresponding information are moved to new columns and in a row that corresponds to the VIN.
Below is a basic example of what I would like to accomplish:
I want to turn this:
veh_id | mileage | service_date |
ABC | 8000 | 14Jan14 |
DEF | 7500 | 15Jan14 |
GHI | 6900 | 19Jan14 |
JKL | 8200 | 22Jan14 |
MNO | 7900 | 30Jan14 |
ABC | 14000 | 23Jun14 |
DEF | 13000 | 28Aug14 |
GHI | 14150 | 23Jul14 |
JKL | 16200 | 27May14 |
MNO | 13625 | 6Nov14 |
ABC | 20000 | 30Nov14 |
DEF | 18500 | 10Apr15 |
GHI | 21400 | 24Jan15 |
JKL | 24200 | 29Sep14 |
MNO | 19350 | 13Aug15 |
Into This:
veh_id | mileage1 | service_date1 | mileage2 | service_date2 | Mileage3 | service_date3 |
ABC | 8000 | 14Jan14 | 14000 | 23Jun14 | 20000 | 30Nov14 |
DEF | 7500 | 15Jan14 | 13000 | 28Aug14 | 18500 | 10Apr15 |
GHI | 6900 | 19Jan14 | 14150 | 23Jul14 | 21400 | 24Jan15 |
JKL | 8200 | 22Jan14 | 16200 | 27May14 | 24200 | 29Sep14 |
MNO | 7900 | 30Jan14 | 13625 | 6Nov14 | 19350 | 13Aug15 |
Thank you in advance of any help you can provide!
Best,
Eric
Comment