reshape with dummy variables for values?

Nereida GHeller

Join Date: Jul 2024
Posts: 2

reshape with dummy variables for values?

04 Jul 2024, 06:58

Hi there, thanks all in advance for your help!

I have a simple dataset that looks like this:
* Example generated by -dataex-. For more info, type help dataex
clear
input str14 id str15 County
"1003027012" "Kern"
"1003068206" "Los Angeles"
"1003223124" "San Diego"
"1003223124" "Los Angeles"
"1003223124" "Orange"
"1003316290" "Sacramento"
"1003351610" "San Diego"
"1003351610" "Riverside"
"1003351610" "Orange"
"1003352220" "Los Angeles"
"1003412743" "Los Angeles"
"1003413600" "Fresno"
"1003413600" "Madera"
"1003413600" "Kings"
"1003413600" "Tulare"
end
[/CODE]

I've been asked to reorganize the data such that each row is a unique id, and each column is a dummy variable for whether that county is associated with that id (see below). Reshaping wide is close to what I want, but that command simply creates new variables with the first county that appears for each id variable, rather than creating dummies. I could just create the county variables one by one, but I'd also love to learn about a better solution if it's out there.

This is what I want the data to look like eventually, in case I haven't been clear:

ID	Kern	Los Angeles	Orange	San Diego	Sacramento	Riverside	Fresno	Kings	Madera	Tulare
1003027012	1
1003068206		1
1003223124		1
1003223124			1
1003223124				1
1003316290					1
1003351610			1
1003351610						1
1003351610				1
1003352220		1
1003412743		1
1003413600							1
1003413600
1003413600									1
1003413600										1

Thanks so much! Any suggestion welcome and appreciated, and feel free to redirect me to a pre-existing thread if I've overlooked it.

Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10190

04 Jul 2024, 07:21

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str14 id str15 County
"1003027012" "Kern"       
"1003068206" "Los Angeles"
"1003223124" "San Diego"  
"1003223124" "Los Angeles"
"1003223124" "Orange"     
"1003316290" "Sacramento" 
"1003351610" "San Diego"  
"1003351610" "Riverside"  
"1003351610" "Orange"     
"1003352220" "Los Angeles"
"1003412743" "Los Angeles"
"1003413600" "Fresno"     
"1003413600" "Madera"     
"1003413600" "Kings"      
"1003413600" "Tulare"     
end

fillin id County
replace _fillin=!_fillin
replace County= strtoname(County)
reshape wide _fillin, i(id) j(County) string
rename _fillin* *

Res.:

Code:

. l, sep(0)

     +-----------------------------------------------------------------------------------------------------------+
     |         id   Fresno   Kern   Kings   Los_An~s   Madera   Orange   Rivers~e   Sacram~o   San_Di~o   Tulare |
     |-----------------------------------------------------------------------------------------------------------|
  1. | 1003027012        0      1       0          0        0        0          0          0          0        0 |
  2. | 1003068206        0      0       0          1        0        0          0          0          0        0 |
  3. | 1003223124        0      0       0          1        0        1          0          0          1        0 |
  4. | 1003316290        0      0       0          0        0        0          0          1          0        0 |
  5. | 1003351610        0      0       0          0        0        1          1          0          1        0 |
  6. | 1003352220        0      0       0          1        0        0          0          0          0        0 |
  7. | 1003412743        0      0       0          1        0        0          0          0          0        0 |
  8. | 1003413600        1      0       1          0        1        0          0          0          0        1 |
     +-----------------------------------------------------------------------------------------------------------+

Comment

Nereida GHeller

Join Date: Jul 2024

Posts: 2
#3

04 Jul 2024, 07:53

This is so elegant and works beautifully! Thank you so much, Andrew. Much appreciated.
Comment

Announcement

reshape with dummy variables for values?

Comment

Comment