Vehicle tracking with panel dataset: Finding the difference between years.

Adam Gallaher

Join Date: Feb 2021

Posts: 41
#1

Vehicle tracking with panel dataset: Finding the difference between years.

21 Feb 2022, 09:58

Hello, I believe there is a similar post out there related to first-difference of variable in panel data. I have a similar inquire with some slight differences, I am hoping to get some help with this.

I am looking to track which vehicles are entering and leaving my dataset from year to year within the same zip code (ZCTA). Basically, this is the question I am trying to answer. Within this zip code at year 2015 there are 10 specific vehicles (let's say they are a 2012 Tesla Model S), the next year (2016) that same zip code now has 15 vehicles (2012 Tesla Model S). From 2015 to 2016 there was a gain of 5 vehicles. The idea is this will go up and down for specific vehicles over different zip codes throughout time. As you can see from the dataset, in 2015 there is 2 2012 Nissan Leaf's and then again 2 in 2016 and then -1 in 2017 and so on. I am looking to generate a variable that captures this in a way that is generic for all zip codes in the dataset. Below is an example of my dataset. Thank you in advance for your help and suggestions.

dataex datayear zcta CT_Vehicle_ID total in 1/30

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int(datayear zcta) str53 CT_Vehicle_ID double total 2015 6029 "nissan,leaf,2012" 2 2015 6029 "nissan,leaf,2015" 1 2015 6029 "tesla,model s,2014" 1 2016 6029 "nissan,leaf,2015" 2 2016 6029 "tesla,model s,2015" 1 2016 6029 "nissan,leaf,2012" 2 2016 6029 "tesla,model s,2014" 1 2017 6029 "nissan,leaf,2013" 1 2017 6029 "nissan,leaf,2012" 1 2017 6029 "nissan,leaf,2015" 3 2017 6029 "tesla,model s,2015" 1 2018 6029 "chevrolet,bolt ev,2017" 4 2018 6029 "kia,soul ev,2017" 1 2018 6029 "tesla,model s,2016" 1 2018 6029 "nissan,leaf,2015" 3 2018 6029 "nissan,leaf,2013" 1 2018 6029 "nissan,leaf,2017" 2 2018 6029 "tesla,model s,2015" 1 2019 6029 "nissan,leaf,2017" 2 2019 6029 "tesla,model s,2018" 1 2019 6029 "tesla,model s,2017" 1 2019 6029 "tesla,model s,2015" 1 2019 6029 "chevrolet,bolt ev,2017" 5 2019 6029 "tesla,model x,2018" 1 2019 6029 "tesla,model s,2016" 1 2019 6029 "tesla,model 3,2018" 4 2019 6029 "nissan,leaf,2015" 3 2020 6029 "tesla,model x,2018" 1 2020 6029 "tesla,model s,2018" 1 2020 6029 "tesla,model 3,2019" 8 end

------------------ copy up to and including the previous line ------------------

Listed 30 out of 40 observations

Thank you,

Adam
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#2

21 Feb 2022, 10:10

Code:

by zcta CT_Vehicle_ID (datayear), sort: gen wanted = total - total[_n-1]

In your example data, there are no gaps in years. It is not clear what you want to do if the full data set has some gaps. The above code will simply show the change from the most recent preceding year for which data is present, even if that were several years earlier. If, however, you want to show a missing value when there is no data for the immediately preceding year, you could modify that code by adding -if datayear[_n-1] == datayear-1- to the command.
1 like
Comment
Adam Gallaher

Join Date: Feb 2021

Posts: 41
#3

21 Feb 2022, 12:26

Hello Clyde,

Thank you for this, it works just how I wanted. I added the -if datayear[_n-1] == datayear-1- to the command as well, because as you correctly suspected this would correct for vehicles that enter the dataset later. The full data runs from 2013 to 2020 and as you can see a lot of the vehicles in my example are newer ones that were not present for the entire study time.

Thank you again.

Adam
Comment

Adam Gallaher

Join Date: Feb 2021
Posts: 41

01 Mar 2022, 12:00

Back again with another question. Similar to before only in reverse. I am looking to determine the cumulative number of vehicles per zip code per month per year in this dataset. I attempted to alter the solution code above with some success. The issue is that it is looking at the difference or summation between each row in the panel and not adding them one after the next. Included is an example of the dataset and the field with the altered above solution (by zip_code year (month), sort: gen wanted = electric_vehicles + electric_vehicles[_n+1])

Thank you,

Adam

dataex zip_code month year electric_vehicles wanted in 1/50

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int zip_code byte month int year byte electric_vehicles float wanted
6001 12 2015 1 .
6001  4 2016 2 3
6001  6 2016 1 2
6001  7 2016 1 2
6001  8 2016 1 2
6001  9 2016 1 4
6001 11 2016 3 .
6001  1 2017 1 2
6001  3 2017 1 3
6001  4 2017 2 6
6001  5 2017 4 5
6001  6 2017 1 2
6001  7 2017 1 2
6001 10 2017 1 6
6001 12 2017 5 .
6001  1 2018 1 2
6001  3 2018 1 8
6001  4 2018 7 9
6001  5 2018 2 4
6001  6 2018 2 8
6001  7 2018 6 8
6001  8 2018 2 3
6001  9 2018 1 2
6001 10 2018 1 2
6001 11 2018 1 3
6001 12 2018 2 .
6001  2 2019 1 4
6001  3 2019 3 5
6001  4 2019 2 4
6001  5 2019 2 3
6001  6 2019 1 3
6001  7 2019 2 4
6001  9 2019 2 4
6001 10 2019 2 .
6001  2 2020 1 2
6001  9 2020 1 3
6001 12 2020 2 .
6002  8 2015 2 .
6002  1 2016 1 4
6002  3 2016 3 4
6002  4 2016 1 2
6002 10 2016 1 2
6002 11 2016 1 .
6002  3 2017 2 3
6002  5 2017 1 4
6002  8 2017 3 4
6002 11 2017 1 2
6002 12 2017 1 .
6002  3 2018 1 3
6002  7 2018 2 3
end

------------------ copy up to and including the previous line ------------------

Listed 50 out of 4343 observations

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29964
#5

01 Mar 2022, 12:12

Similar to before only in reverse. I am looking to determine the cumulative number of vehicles per zip code per month per year in this dataset. I attempted to alter the solution code above with some success. The issue is that it is looking at the difference or summation between each row in the panel and not adding them one after the next. Included is an example of the dataset and the field with the altered above solution (by zip_code year (month), sort: gen wanted = electric_vehicles + electric_vehicles[_n+1])

Sorry, I don't understand what you want here. So I also don't understand how what you want is different from what your code produces.
Comment
Adam Gallaher

Join Date: Feb 2021

Posts: 41
#6

01 Mar 2022, 14:36

Perhaps the code is not the best way to describe what I am looking for. In the example dataset above, we have zip code, month, year, electric vehicles (or the number of rebates for incentives which can be thought of as electric vehicle sales), and the (wanted) variable. You can see that for 2015 in zip code 6001 there is 1 electric vehicle. We then move to 2016, same zip code in month 4 we have 2 electric vehicles (so technically three overall), in month 6 we have 1 electric vehicle so that is 3 overall in 2016 and 4 overall in the dataset, this will continue. What I am looking to have is for each zip code and each year the cumulative number of electric vehicle rebates (sales) for the entire dataset. So that by the end of the time period (2020) I have the cumulative number of EVs for that zip code for each month and year. I realize I can just collapse but I need the data to stay in the format it is in for modeling.

I hope this helps clarify what I am looking for.

Thank you,

Adam
Comment
Adam Gallaher

Join Date: Feb 2021

Posts: 41
#7

01 Mar 2022, 15:22

Hello,

After searching google and statalist more I found a similar problem with the solution I was looking for. Thank you for your time and help!

The solution code is below:

bysort zip_code year (month): gen cumulative_evs = sum(electric_vehicles)

Adam
Comment

Announcement