Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping data

    Hi there,
    I am looking for a code to reshape my data from this:
    id Option
    A 1
    A 2
    B 1
    C 2
    C 3
    To this:
    id Option_1 Option_2 Option_3
    A 1 1 0
    B 1 0 0
    C 0 1 1
    I would greatly appreciate your help.

    Thanks!

  • #2
    This works:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 id byte option
    "A" 1
    "A" 2
    "B" 1
    "C" 2
    "C" 3
    end
    
    tab option, gen(option_)
    
    drop option
    
    collapse (sum) option_?, by(id)
    
    list
    
         +-------------------------------------+
         | id   option_1   option_2   option_3 |
         |-------------------------------------|
      1. |  A          1          1          0 |
      2. |  B          1          0          0 |
      3. |  C          0          1          1 |
         +-------------------------------------+

    as does this

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 id byte option
    "A" 1
    "A" 2
    "B" 1
    "C" 2
    "C" 3
    end
    
    clonevar which = option
    reshape wide option, i(id) j(which)
    
    forval j = 1/3 {
        replace option`j' = !missing(option`j')
    }
    Last edited by Nick Cox; 07 Feb 2022, 09:32.

    Comment


    • #3
      A less efficient code follows:
      Code:
      . g Answer=_n
      
      . order Answer, first
      
      . encode id, g(new_id)
      
      . reshape wide Answer, j(Option) i(new_id)
      (j = 1 2 3)
      
      Data                               Long   ->   Wide
      -----------------------------------------------------------------------------
      Number of observations                5   ->   3          
      Number of variables                   4   ->   5          
      j variable (3 values)            Option   ->   (dropped)
      xij variables:
                                       Answer   ->   Answer1 Answer2 Answer3
      -----------------------------------------------------------------------------
      
      
      . foreach var of varlist Answer* {
        2. replace `var'=1 if `var'<.
        3. replace `var'=0 if `var'==.
        4.  }
      (1 real change made)
      (1 real change made)
      (2 real changes made)
      (1 real change made)
      (1 real change made)
      (2 real changes made)
      
      
      . list
      
           +--------------------------------------+
           | new_id   Answer1   Answer2   Answer3 |
           |--------------------------------------|
        1. |      A         1         1         0 |
        2. |      B         1         0         0 |
        3. |      C         0         1         1 |
           +--------------------------------------+
      
      .
      Kind regards,
      Carlo
      (StataNow 18.5)

      Comment


      • #4
        Nick Cox Thank you for that. However, the table I input was just an example. My dataset includes around 8000 observations with over 500 id's. id is also a string variable. Apologies, I should have specified this in my question. Do you have any other suggestions on how I can adjust the code accordingly?

        Carlo Lazzaro I tried the code you suggested. However it does not seem to work.

        Code:
        g Group=_n
        order Group, first
        
        encode id, g(new_id)
        
        reshape wide Group, j(Option) i(new_id)
        
        
        foreach var of varlist Entry* {
          replace `var'=1 if `var'<.
          replace `var'=0 if `var'==.
           }
        The reshape leads me to have data which looks like this (example only):
        new_id Option1 Option2 Option3
        A 1 2 3
        B 4 5 6
        C 7 8 9
        Thanks

        Comment


        • #5
          Janet Lewis #4

          With regard to Nick Cox's solution in #2, I see nothing there which would make the code, or its performance, depend on the number of observations in the data set unless there is some id whose list of options extends longer than the maximum number of allowed variables in a Stata data set (minus a handful). Even in Stata BE, you get 2,048 variables; and SE allows 32,767, and MP 120,000. (And if you do run up against that limit, then your problem is unsolvable anyway because you need that many variables to hold the final results!) Nor would it make any difference whether id is string or numeric. Did you try it? It runs correctly on my setup. If it didn't work, you should post back showing the exact code you ran, and the exact output you got from Stata in the Results window (any messages), and show how the end result differed from what you wanted.

          With regard to Carlo Lazzaro's solution in #3, your code is not parallel to his. When you got to the -foreach var of varlist Entry* {- command, you referred to non-existent variables Entry*. The actual variable names at that point are Group*. So the -foreach- command breaks and complains that Entry* is not found, and that loop, which finishes the job, never gets executed. If you properly parallel Carlo's code, you will see that it works.

          Comment


          • #6
            Thanks Clyde!
            Kind regards,
            Carlo
            (StataNow 18.5)

            Comment


            • #7
              Clyde Schechter I think explained the key points. At this point I just want to underline that the occurrence of 3 in my second block of code in #3 is a response to the original example and doesn't purport to be general.

              As Clyde said, number of observations is not an issue, but you may need to rewrite the code to cope with your real dataset, and if that gets difficult, a more realistic data example would allow advice there.

              Comment


              • #8
                Dear all,

                I usually have to work with this kind of data:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(time a b c d)
                1 3 2 3 4
                2 2 1 2 3
                3 1 3 5 3
                4 3 4 1 2
                5 5 3 2 4
                6 5 2 4 1
                end


                However, I couldn't find a convenient way to reshape this data into following [panel] structure. So, I could use some help

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte time str1 id float var1
                1 "a" 3
                2 "a" 2
                3 "a" 1
                4 "a" 3
                5 "a" 5
                6 "a" 5
                1 "b" 2
                2 "b" 1
                3 "b" 3
                4 "b" 4
                5 "b" 3
                6 "b" 2
                1 "c" 3
                2 "c" 2
                3 "c" 5
                4 "c" 1
                5 "c" 2
                6 "c" 4
                1 "d" 4
                2 "d" 3
                3 "d" 3
                4 "d" 2
                5 "d" 4
                6 "d" 1
                end

                I would appreciate any help!
                Thanks!

                Comment


                • #9
                  Code:
                  rename (a b c d) var1=
                  reshape long var1, i(time) j(id) string
                  list, noobs clean

                  Comment


                  • #10
                    Thank you very much Dr. Schechter! Works like a charm!

                    Comment

                    Working...
                    X