  • Reshape long to wide

    Hello,

    Here is the data that I want to reshape to wide format. My full sample has around 7.5 million observations, with about 229 observations per ID.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long
    ID int consyear byte consmonth str15 type float consumption
    500001 2007 11 "G11A"  54.00725
    500001 2007 10 "G11A"    39.278
    500001 2007  5 "E11"       2810
    500001 2007  5 "G11A"  38.29605
    500001 2007  4 "G11A"  28.47655
    500001 2007  7 "E11"       1499
    500001 2007  3 "E11"        756
    500001 2007  3 "G11A"   72.6643
    500001 2007  1 "G11A"   96.2311
    500001 2007  1 "E11"       1745
    500001 2007  7 "G11A"   4.90975
    500001 2007  8 "E11"       5125
    500001 2007  4 "E11"        460
    500001 2007  2 "G11A"  69.71845
    500001 2007 10 "E11"       2837
    500001 2007 11 "E11"       2048
    500001 2007  2 "E11"       3785
    500001 2007  8 "G11A"  10.80145
    500001 2008  9 "E11"       2027
    500001 2008  3 "G11A"  57.03082
    500001 2008  7 "E11"       3197
    500001 2008  4 "G11A"  57.03082
    500001 2008  6 "E11"       1489
    500001 2008 10 "E11"       2719
    500001 2008  1 "E11"       4684
    500001 2008  3 "E11"       1851
    500001 2008  5 "E11"       1578
    
    end

    I am trying to get the data into the following layout, but I am stuck:
    ID      consyear  consmonth  E11   G11A

    500001  2007      1          1745  96.2311
    500001  2007      2          3785  69.71845
    500001  2007      3          756   72.6643
    500001  2007      4          460   28.47655
    500001  2007      5          2810  38.29605


    Thanks in advance




  • #2
    Code:
    reshape wide consumption, i(ID consyear consmonth) j(type) string
    rename consumption* *
    Note: This is going to be rather slow on a data set of the size you describe. You may want to install the gtools package, by Mauricio Caceres, available from SSC. Then you can use -greshape-, which accepts the exact same syntax.
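
    As a minimal sketch of that suggestion, assuming gtools is not yet installed and that the variable names match the -dataex- example above, the same reshape with -greshape- might look like this:

    ```stata
    * One-time install from SSC (skip if gtools is already installed)
    ssc install gtools

    * Same syntax as -reshape-: one row per ID-year-month, one column per type
    greshape wide consumption, i(ID consyear consmonth) j(type) string

    * Drop the "consumption" prefix so the new variables are named E11, G11A, ...
    rename consumption* *
    ```

    The speed gain on roughly 7.5 million observations is not something measured here, so it is worth timing both commands on a subset first.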

    Thank you for using -dataex-. I will point out that it got a little mangled: somehow a newline got inserted after -input long-.

    Added: The code assumes, without verification, that all of the values of the variable type are allowable variable names. That is necessary in order to use them as the names of the variables in your widened data layout. If, however, -reshape- finds a value that violates that condition, it will halt with an error message. In that case, use the -strtoname()- function to change the values of type, and then try again. See -help strtoname()- for details.
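
    A sketch of that fallback, assuming the variable names from the -dataex- example and that no two distinct values of type collapse to the same name after conversion (worth checking with -tab type- afterwards):

    ```stata
    * Convert each value of type into a legal Stata name:
    * illegal characters become underscores, and a leading digit gets a "_" prefix
    replace type = strtoname(type)

    * Then reshape as before
    reshape wide consumption, i(ID consyear consmonth) j(type) string
    rename consumption* *
    ```

    Note that if any converted value starts with an underscore, the final -rename- will produce variable names that start with an underscore as well.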
    Last edited by Clyde Schechter; 18 Feb 2022, 16:08.



    • #3
      Thank you, Clyde, for your prompt and informative reply.
      I was able to reshape the data using the command you suggested.
