  • Reshape is taking forever

    Dear Stata Users,
    I have the following question: I need to reshape a dataset with the reshape long command. The variables have names like caj01_04252010, where 01 is the number of the variable (I have 12 in total) and the digits after "_" are a date in MMDDYYYY form (in this example, 25 April 2010). Each subject in the survey has a unique id. The data were collected 3 times per month for 15 years (on the same days every month). The problem is that reshape long is taking hours: it has been running for 10 hours, it has not finished, and I don't know how much longer it will take.
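A minimal sketch of what the command could look like for this naming scheme, assuming the stubs are caj01_ through caj12_ and the identifier is called id (both are assumptions). It uses a tiny made-up dataset with two stubs and two dates. The string option keeps a suffix such as 04252010 as text, so its leading zero is not lost, and the last step turns the MMDDYYYY text into a Stata daily date.

```stata
clear
input id caj01_04252010 caj01_05052010 caj02_04252010 caj02_05052010
1 11 12 21 22
2 31 32 41 42
end

* build the stub list caj01_ caj02_ (use forvalues k = 1/12 for all 12 variables)
local stubs
forvalues k = 1/2 {
    local stubs `stubs' caj`: display %02.0f `k''_
}

* string keeps the date suffix as text, leading zero included
reshape long `stubs', i(id) j(datestr) string

* convert the MMDDYYYY text into a Stata daily date
generate date = mdy(real(substr(datestr, 1, 2)), real(substr(datestr, 3, 2)), real(substr(datestr, 5, 4)))
format date %td
list, sepby(id)
```

After the reshape each row is one id on one survey date, with one column per stub (caj01_, caj02_, ...).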

    Ten hours ago I thought it might be a good idea to split the job between two computers, so I divided the dataset into four subsets (caj01-caj03 in one subset, caj04-caj07 in another, and so on) and had each computer reshape two of them, planning to merge the subsets afterwards. However, it did not work. Is there any advice on how to make Stata work faster?
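The split-and-merge idea can also be run within one Stata session: reshape each stub on its own, save each long piece, and merge the pieces on id and date. This is a sketch on made-up data with two stubs; the temporary file names are illustrative only. Each piece uses the same j variable name and the string option so that the merge keys line up.

```stata
clear
input id caj01_04252010 caj01_05052010 caj02_04252010 caj02_05052010
1 11 12 21 22
2 31 32 41 42
end
tempfile wide
save `wide'

* reshape each stub separately and save the long piece
foreach stub in caj01 caj02 {
    use `wide', clear
    keep id `stub'_*
    reshape long `stub'_, i(id) j(datestr) string
    tempfile long_`stub'
    save `long_`stub''
}

* put the pieces back together on id and date
use `long_caj01', clear
merge 1:1 id datestr using `long_caj02', nogenerate
sort id datestr
list
```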

    Thanks,
    Diego
    Last edited by Diego Salazar; 17 Sep 2015, 23:04.

  • #2
    Maybe this will help:

    http://www.nber.org/stata/efficient/reshape.html
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com



    • #3
      Jorge gives good advice. In addition, your variable names may be causing problems with the reshape. If you have not done so, try your reshape after keep in 1/10 to confirm that it gives the expected results on 10 observations. That should finish in a reasonable time (and if it does not, then there really is a problem!).
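A sketch of that trial run, using a made-up dataset of 50 observations with one stub and two dates. preserve and restore bring the full wide data back after the test, so nothing is lost if the trial reshape goes wrong.

```stata
clear
set seed 12345
set obs 50
generate id = _n
generate caj01_04252010 = runiform()
generate caj01_05052010 = runiform()

* try the reshape on the first 10 observations only
preserve
keep in 1/10
reshape long caj01_, i(id) j(datestr) string
list, sepby(id)
restore

* the full wide dataset is back in memory here
describe, short
```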

      The example below worked as expected when the variables were named x1 and x7, but renaming them with leading zeroes (as your dates have) apparently confused reshape. Adding the string option to reshape solved the problem.

      Code:
      . input id x01 x07
      
                  id        x01        x07
        1. 1 11 17
        2. 2 21 27
        3. end
      
      . reshape long x, i(id) j(var)
      (note: j = 1 7)
      (note: x1 not found)
      (note: x7 not found)
      
      Data                               wide   ->   long
      -----------------------------------------------------------------------------
      Number of obs.                        2   ->       4
      Number of variables                   3   ->       5
      j variable (2 values)                     ->   var
      xij variables:
                                        x1 x7   ->   x
      -----------------------------------------------------------------------------
      
      . list
      
           +--------------------------+
           | id   var   x01   x07   x |
           |--------------------------|
        1. |  1     1    11    17   . |
        2. |  1     7    11    17   . |
        3. |  2     1    21    27   . |
        4. |  2     7    21    27   . |
           +--------------------------+
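The same example with the string option added, as a sketch: j then takes the text values "01" and "07", so reshape looks for x01 and x07 rather than x1 and x7. The destring step is optional and turns var back into the numbers 1 and 7.

```stata
clear
input id x01 x07
1 11 17
2 21 27
end

* string keeps the suffix "01"/"07" instead of converting it to 1/7
reshape long x, i(id) j(var) string

* optional: make var numeric again
destring var, replace
list
```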



      • #4
        I guess what William said is fine. Just rename your variables and try again; it should not take more than 30 seconds. You could rename caj01_04252010 as caj01.



        • #5
          I had a similar issue a few days ago, and one remedy was to rename the variables, removing the underscore, before reshaping.
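A sketch of removing the underscore on made-up data. Going by the x01/x07 example in #3, the date suffix would likely still start with a zero after the rename, so the string option is kept here as well.

```stata
clear
input id caj01_04252010 caj01_05052010
1 11 12
2 31 32
end

* drop the underscore: caj01_04252010 becomes caj0104252010
foreach v of varlist caj* {
    rename `v' `=subinstr("`v'", "_", "", .)'
}

* the suffix 04252010 still has a leading zero, so keep string
reshape long caj01, i(id) j(datestr) string
list
```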
