  • Reshape takes really long

    Hi,

    I am using -reshape- for the first time and have stumbled on a problem that I have not been able to solve yet.

    My data has the following form:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float Date double(ID130399 ID130517 ID130568 ID130581 ID130P86)
    13879 5.25 69.81 25.24 12.06  9.88
    13880 5.25 69.81 25.24 12.06  9.88
    13881 5.25 70.13 24.64 12.06 10.63
    13884  5.5 68.75 24.05    12 11.75
    13885 5.13 69.75 23.81    12 11.44
    13886 5.25 69.88 23.81    12 11.06
    13887 5.13 68.69 25.24 11.94 11.13
    13888 5.13 67.44 24.32    12 10.63
    13891 5.13 67.19 24.58    12    10
    13892 5.13 67.06 25.03 12.31  10.5
    end
    format %tdDD/NN/CCYY Date

    Each variable represents a company with its share price at a given point in time.

    I aim to get a structure that looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 ID str9 Date double Price
    "ID937433" "9/23/2008" 10.5
    "ID937433" "9/24/2008" 11.5
    "ID937433" "9/25/2008" 12.5
    "ID937433" "9/26/2008"    9
    "ID314883" "9/23/2008"    8
    "ID314883" "9/24/2008"    8
    "ID314883" "9/25/2008"    5
    "ID314883" "9/26/2008"   10
    end
    I used the following command:

    Code:
    reshape long ID, i(Date) j(Company) string //Var names may contain letters
    rename ID Price
    As I am using -reshape- for the first time, I would like to ask whether this code is correct for what I am trying to achieve. The result looks good to me based on a small subsample, but as my final dataset is very large I would like someone to confirm what I have done.

    My second question relates to the performance of -reshape-. My dataset in its original wide form has around 1,300 variables with 5,500 observations each.
    I had a look at https://www.statalist.org/forums/for...taking-forever
    as well as https://back.nber.org/stata/efficient/reshape.html
    The latter link states that reshape takes about 20 seconds for one million observations. So I am afraid that the size of my data doesn't really explain the time it takes (30 minutes and still running) to run reshape.
    The second link refers to the variable names and suggests renaming them when they contain underscores or a leading "0". Neither is the case in my dataset.
    Do you have any idea what could cause -reshape- to take more than 30 minutes?

    Thank you very much
    Kind regards

  • #2
    Your command appears to be correct.
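
    If you want to convince yourself, here is a minimal sketch of a check you could run on the -dataex- sample from #1. It assumes the wide sample is in memory; the locals nwide and nfirms are just names I made up for illustration. Note that with the -string- option Company will hold "130399" and so on, without the "ID" prefix, so the last two lines are an optional step to match the layout you showed.
    Code:
    * wide sample from #1 in memory: one row per Date, one column per company
    local nwide = _N              // number of dates
    local nfirms = c(k) - 1       // every variable except Date is a company

    reshape long ID, i(Date) j(Company) string
    rename ID Price

    * exactly one row per Date-Company pair, and no rows lost or added
    isid Date Company
    assert _N == `nwide' * `nfirms'

    * spot-check one cell against the wide listing (ID130P86 on 13881)
    assert Price == 10.63 if Company == "130P86" & Date == 13881

    * optional: restore the "ID" prefix so the identifier matches your target
    replace Company = "ID" + Company
    rename Company ID
    If -isid- and both -assert- commands run without complaint, the long data have the structure you are after.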

    When -reshape-ing wide to long, I think that the execution time depends more on the number of variables than the number of observations. 1,300 variables is a lot. I am not surprised it is taking more than 30 minutes.
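
    If you want to see how the run time grows with the number of variables before committing to the full job, something like the following could be used. It is only a sketch: it assumes the full wide dataset is in memory, and the cut-off of 100 company columns and the macro names firms and keepvars are arbitrary choices of mine.
    Code:
    preserve
    * collect all company columns, then keep only the first 100 of them
    ds Date, not
    local firms `r(varlist)'
    local keepvars
    forvalues k = 1/100 {
        local keepvars `keepvars' `: word `k' of `firms''
    }
    keep Date `keepvars'

    * time -reshape- on this subset
    timer clear 1
    timer on 1
    reshape long ID, i(Date) j(Company) string
    timer off 1
    timer list 1
    restore
    Repeating this with 200 and 400 columns should give a rough idea of how long all 1,300 would take.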

    That said, if you would like something faster, there are user-written commands that speed it up considerably. You can use -tolong-, available from SSC, or -greshape-, available at github.com/mcaceresb/stata-gtools
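
    For -greshape-, a sketch of what the call might look like is below. As far as I recall, -greshape- accepts the same i() and j() options as -reshape- (its native names are by() and keys()), and -gtools- can also be installed from SSC, but please check -help greshape- after installing rather than taking my word for it.
    Code:
    * one-time installation of the gtools package (needs internet access)
    ssc install gtools

    * same reshape as in #1, with -greshape- in place of -reshape-
    greshape long ID, i(Date) j(Company) string
    rename ID Price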
