Panel Data formatting

Will Hickman

Join Date: Mar 2022

Posts: 2
#1

Panel Data formatting

30 Mar 2022, 08:32

Hi!
I am trying to work with some panel data and am having some issues getting it into the format I would like. The data currently looks like:

state_id,state,transportation_type,date21927,date2 1928,date21929
1,Alabama,driving,100,102.9,103.51
1,Alabama,walking,100,99.93,100.28
1,Alabama,transit,100,105.66,105.66
2,Alaska,driving,100,104.69,104.64
2,Alaska,walking,100,100.7,99.12
2,Alaska,transit,100,87.49,84.6

Note that: dateXXXXX is an index of Apple Maps usage for a particular type of transportation (driving, walking, transit) on a particular day.

and I want it to look like: (I can do the transformation by hand, but the dataset is large and doing it by hand isn't feasible)
state_id,state,day,driving,walking,transit
1,Alabama,21927,100,100,100
1,Alabama,21928,102.9,99.93,105.66
1,Alabama,21929,103.51,100.28,105.66
2,Alaska,21927,100,100,100
2,Alaska,21928,104.69,100.7,87.49
2,Alaska,21929,104.64,99.12,84.6

I have used reshape long date, i(state_id transportation_type) j(day), but this gives me 3 observations per state (each observation has the transportation_type and the transportation index for a particular day, so with 3 types of transportation, there are 3 observations per state per day).

reshape long date, i(state_id) j(day) does not work because state_id is not unique to each observation

If anyone could help me out with this, I would appreciate it!

-William Hickman
Tags: None
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

30 Mar 2022, 08:54

It isn't a unique ID because you have different indices. driving,walking,transit also ID your state observations in your wide dataset.

EDIT: I can help further, but please use dataex to show me your real data. Welcome to Statalist, Will.

Last edited by Jared Greathouse; 30 Mar 2022, 08:57.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

30 Mar 2022, 09:36

In this case, you need two reshape commands - one to get from wide-by-date to long and another to get from long to wide-by-transporation_type.

My presentation of your example data using dataex may not match the actual characteristics of your example data. If your transportation_type is a numeric variable with value labels, the code will require adaptation. Hence the importance of using dataex to present example data. So please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte state_id str7(state transportation_type) byte date21927 float(date21928 date21929)
1 "Alabama" "driving" 100  102.9 103.51
1 "Alabama" "walking" 100  99.93 100.28
1 "Alabama" "transit" 100 105.66 105.66
2 "Alaska"  "driving" 100 104.69 104.64
2 "Alaska"  "walking" 100  100.7  99.12
2 "Alaska"  "transit" 100  87.49   84.6
end

reshape long date, i(state_id transportation_type) j(day)
reshape wide date, i(state_id day) j(transportation_type) string
rename (date*) (*)
order state_id state day
list, sepby(state_id)

Code:

. list, sepby(state_id)

     +----------------------------------------------------------+
     | state_id     state     day   driving   transit   walking |
     |----------------------------------------------------------|
  1. |        1   Alabama   21927       100       100       100 |
  2. |        1   Alabama   21928     102.9    105.66     99.93 |
  3. |        1   Alabama   21929    103.51    105.66    100.28 |
     |----------------------------------------------------------|
  4. |        2    Alaska   21927       100       100       100 |
  5. |        2    Alaska   21928    104.69     87.49     100.7 |
  6. |        2    Alaska   21929    104.64      84.6     99.12 |
     +----------------------------------------------------------+

Comment

Will Hickman

Join Date: Mar 2022

Posts: 2
#4

30 Mar 2022, 09:47

This code worked (with some minor modifications that likely wouldn't have been needed if I had used dataex)! Thank you for your help. In the future, I'll be sure to use dataex and pay attention to the details. Thanks again!
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#5

30 Mar 2022, 21:03

Are we essentially tricking Stata into thinking that a wide dataset is long?

As it stands, we've reshaped to wide, but.... it looks like a long dataset, since time and panel variables ID are both columns, instead of one of either being a column and the other a row. William Lisowski
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

31 Mar 2022, 08:53

Are we essentially tricking Stata into thinking that a wide dataset is long?

There is no absolute standard for a dataset to be "wide" or "long" - the reshape commands would more pedantically be reshape wider and reshape longer to better reflect that the resulting dataset is described relative to the input dataset.

In post #2 observations in the initial dataset are identified by state_id and transportation_type, and the index values are contained in separate variables whose names identify the day they were observed.

After the reshape longer the observations are identified by state_id, transportation_type, and day - and we have a panel dataset where the panels are identified by state_id and transportation_type, and the time is identified by day, and the index values are contained in a single variable.

After the reshape wider the observations are identifed by state_id and day - and we now have a panel dataset where the panels are identified by state_id, and the time is identified by day, and the index values are contained in separate variables whose names identify the transportation_type they represent.
Comment

Announcement