Reshaping data

Janet Lewis

Join Date: Mar 2020

Posts: 45
#1

Reshaping data

07 Feb 2022, 09:17

Hi there,
I am looking for a code to reshape my data from this:
id Option

A 1

A 2

B 1

C 2

C 3

To this:

id Option_1 Option_2 Option_3

A 1 1 0

B 1 0 0

C 0 1 1

I would greatly appreciate your help.

Thanks!
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35213

07 Feb 2022, 09:27

This works:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 id byte option
"A" 1
"A" 2
"B" 1
"C" 2
"C" 3
end

tab option, gen(option_)

drop option

collapse (sum) option_?, by(id)

list

     +-------------------------------------+
     | id   option_1   option_2   option_3 |
     |-------------------------------------|
  1. |  A          1          1          0 |
  2. |  B          1          0          0 |
  3. |  C          0          1          1 |
     +-------------------------------------+

as does this

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 id byte option
"A" 1
"A" 2
"B" 1
"C" 2
"C" 3
end

clonevar which = option
reshape wide option, i(id) j(which)

forval j = 1/3 {
    replace option`j' = !missing(option`j')
}

Last edited by Nick Cox; 07 Feb 2022, 09:32.

Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17608

07 Feb 2022, 09:47

A less efficient code follows:

Code:

. g Answer=_n

. order Answer, first

. encode id, g(new_id)

. reshape wide Answer, j(Option) i(new_id)
(j = 1 2 3)

Data                               Long   ->   Wide
-----------------------------------------------------------------------------
Number of observations                5   ->   3          
Number of variables                   4   ->   5          
j variable (3 values)            Option   ->   (dropped)
xij variables:
                                 Answer   ->   Answer1 Answer2 Answer3
-----------------------------------------------------------------------------


. foreach var of varlist Answer* {
  2. replace `var'=1 if `var'<.
  3. replace `var'=0 if `var'==.
  4.  }
(1 real change made)
(1 real change made)
(2 real changes made)
(1 real change made)
(1 real change made)
(2 real changes made)


. list

     +--------------------------------------+
     | new_id   Answer1   Answer2   Answer3 |
     |--------------------------------------|
  1. |      A         1         1         0 |
  2. |      B         1         0         0 |
  3. |      C         0         1         1 |
     +--------------------------------------+

.

Kind regards,
Carlo
(StataNow 18.5)

Comment

Janet Lewis

Join Date: Mar 2020

Posts: 45
#4

07 Feb 2022, 22:29

Nick Cox Thank you for that. However, the table I input was just an example. My dataset includes around 8000 observations with over 500 id's. id is also a string variable. Apologies, I should have specified this in my question. Do you have any other suggestions on how I can adjust the code accordingly?

Carlo Lazzaro I tried the code you suggested. However it does not seem to work.

Code:

g Group=_n order Group, first encode id, g(new_id) reshape wide Group, j(Option) i(new_id) foreach var of varlist Entry* { replace `var'=1 if `var'<. replace `var'=0 if `var'==. }

The reshape leads me to have data which looks like this (example only):
new_id Option1 Option2 Option3

A 1 2 3

B 4 5 6

C 7 8 9

Thanks
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#5

07 Feb 2022, 22:46

Janet Lewis #4

With regard to Nick Cox's solution in #2, I see nothing there which would make the code, or its performance, depend on the number of observations in the data set unless there is some id whose list of options extends longer than the maximum number of allowed variables in a Stata data set (minus a handful). Even in Stata BE, you get 2,048 variables; and SE allows 32,767, and MP 120,000. (And if you do run up against that limit, then your problem is unsolvable anyway because you need that many variables to hold the final results!) Nor would it make any difference whether id is string or numeric. Did you try it? It runs correctly on my setup. If it didn't work, you should post back showing the exact code you ran, and the exact output you got from Stata in the Results window (any messages), and show how the end result differed from what you wanted.

With regard to Carlo Lazzaro's solution in #3, your code is not parallel to his. When you got to the -foreach var of varlist Entry* {- command, you referred to non-existent variables Entry*. The actual variable names at that point are Group*. So the -foreach- command breaks and complains that Entry* is not found, and that loop, which finishes the job, never gets executed. If you properly parallel Carlo's code, you will see that it works.
1 like
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17608
#6

08 Feb 2022, 01:53

Thanks Clyde!

Kind regards,
Carlo
(StataNow 18.5)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35213
#7

08 Feb 2022, 03:31

Clyde Schechter I think explained the key points. At this point I just want to underline that the occurrence of 3 in my second block of code in #3 is a response to the original example and doesn't purport to be general.

As Clyde said, number of observations is not an issue, but you may need to rewrite the code to cope with your real dataset, and if that gets difficult, a more realistic data example would allow advice there.
1 like
Comment

Samil Pismaf

Join Date: Mar 2022
Posts: 2

11 Mar 2022, 08:58

Dear all,

I usually have to work with this kind of data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(time a b c d)
1 3 2 3 4
2 2 1 2 3
3 1 3 5 3
4 3 4 1 2
5 5 3 2 4
6 5 2 4 1
end

However, I couldn't find a convenient way to reshape this data into following [panel] structure. So, I could use some help

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte time str1 id float var1
1 "a" 3
2 "a" 2
3 "a" 1
4 "a" 3
5 "a" 5
6 "a" 5
1 "b" 2
2 "b" 1
3 "b" 3
4 "b" 4
5 "b" 3
6 "b" 2
1 "c" 3
2 "c" 2
3 "c" 5
4 "c" 1
5 "c" 2
6 "c" 4
1 "d" 4
2 "d" 3
3 "d" 3
4 "d" 2
5 "d" 4
6 "d" 1
end

I would appreciate any help!
Thanks!

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#9

11 Mar 2022, 12:06

Code:

rename (a b c d) var1= reshape long var1, i(time) j(id) string list, noobs clean
1 like
Comment
Samil Pismaf

Join Date: Mar 2022

Posts: 2
#10

14 Mar 2022, 03:39

Thank you very much Dr. Schechter! Works like a charm!
Comment

id	Option_1	Option_2	Option_3
A	1	1	0
B	1	0	0
C	0	1	1

new_id	Option1	Option2	Option3
A	1	2	3
B	4	5	6
C	7	8	9

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment