Reshape takes really long

Marc Pelow

Join Date: Jul 2021

Posts: 85
#1


31 Mar 2022, 09:20

Hi,

I am using -reshape- for the first time and have stumbled on a problem that I have not been able to solve yet.

My data has the following form:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float Date double(ID130399 ID130517 ID130568 ID130581 ID130P86)
13879 5.25 69.81 25.24 12.06  9.88
13880 5.25 69.81 25.24 12.06  9.88
13881 5.25 70.13 24.64 12.06 10.63
13884  5.5 68.75 24.05    12 11.75
13885 5.13 69.75 23.81    12 11.44
13886 5.25 69.88 23.81    12 11.06
13887 5.13 68.69 25.24 11.94 11.13
13888 5.13 67.44 24.32    12 10.63
13891 5.13 67.19 24.58    12    10
13892 5.13 67.06 25.03 12.31  10.5
end
format %tdDD/NN/CCYY Date

Each variable represents a company with its share price at a given point in time.

I aim to get a structure that looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str8 ID str9 Date double Price
"ID937433" "9/23/2008" 10.5
"ID937433" "9/24/2008" 11.5
"ID937433" "9/25/2008" 12.5
"ID937433" "9/26/2008"    9
"ID314883" "9/23/2008"    8
"ID314883" "9/24/2008"    8
"ID314883" "9/25/2008"    5
"ID314883" "9/26/2008"   10
end

I used the following commands:

Code:

reshape long ID, i(Date) j(Company) string // Var names may contain letters
rename ID Price

As I am using -reshape- for the first time, I would like to ask whether this code is correct for what I am trying to achieve. The result looks good to me on a small subsample, but as my final dataset is very large, I would like someone to confirm what I have done.

My second question relates to the performance of -reshape-. My dataset, before reshaping, has around 1,300 variables with 5,500 observations each.
I had a look at https://www.statalist.org/forums/for...taking-forever
as well as https://back.nber.org/stata/efficient/reshape.html
The latter link states that -reshape- takes about 20 seconds for one million observations, so I am afraid that the size of my data doesn't really explain the time it takes (30 minutes and still running) to run -reshape-.
The second link refers to the variable names and suggests renaming them when they contain underscores or a leading "0". Neither is the case in my dataset.
Do you have any idea what could cause -reshape- to take more than 30 minutes?

Thank you very much
Kind regards
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

31 Mar 2022, 09:40

Your command appears to be correct.

When -reshape-ing wide to long, I think that the execution time depends more on the number of variables than the number of observations. 1,300 variables is a lot. I am not surprised it is taking more than 30 minutes.

That said, if you would like something that does it faster, there are user-written commands that speed it up considerably. You can use -tolong-, available from SSC, or -greshape-, available at github.com/mcaceresb/stata-gtools.
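
As a rough sketch of the -greshape- route, the commands from #1 might be adapted as follows. This assumes that the gtools package (which provides -greshape-) can be installed from SSC on your system, and that -greshape- accepts the same i(), j(), and string options as -reshape-, which is how its documentation describes it; please check help greshape before relying on this.

```stata
* Install gtools once (assumption: SSC is reachable from your Stata installation)
ssc install gtools

* Same wide-to-long reshape as in #1, with -greshape- swapped in for -reshape-
* (assumes the i(), j(), and string options behave as they do in -reshape-)
greshape long ID, i(Date) j(Company) string
rename ID Price
```

It would be prudent to run this on the same small subsample first and compare the result with the -reshape- output before applying it to the full dataset.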