Hi there, thanks all in advance for your help!
I have a simple dataset that looks like this:
* Example generated by -dataex-. For more info, type help dataex
clear
input str14 id str15 County
"1003027012" "Kern"
"1003068206" "Los Angeles"
"1003223124" "San Diego"
"1003223124" "Los Angeles"
"1003223124" "Orange"
"1003316290" "Sacramento"
"1003351610" "San Diego"
"1003351610" "Riverside"
"1003351610" "Orange"
"1003352220" "Los Angeles"
"1003412743" "Los Angeles"
"1003413600" "Fresno"
"1003413600" "Madera"
"1003413600" "Kings"
"1003413600" "Tulare"
end
[/CODE]
I've been asked to reorganize the data such that each row is a unique id, and each column is a dummy variable for whether that county is associated with that id (see below). Reshaping wide is close to what I want, but that command simply creates new variables with the first county that appears for each id variable, rather than creating dummies. I could just create the county variables one by one, but I'd also love to learn about a better solution if it's out there.
This is what I want the data to look like eventually, in case I haven't been clear:
Thanks so much! Any suggestion welcome and appreciated, and feel free to redirect me to a pre-existing thread if I've overlooked it.
I have a simple dataset that looks like this:
* Example generated by -dataex-. For more info, type help dataex
clear
input str14 id str15 County
"1003027012" "Kern"
"1003068206" "Los Angeles"
"1003223124" "San Diego"
"1003223124" "Los Angeles"
"1003223124" "Orange"
"1003316290" "Sacramento"
"1003351610" "San Diego"
"1003351610" "Riverside"
"1003351610" "Orange"
"1003352220" "Los Angeles"
"1003412743" "Los Angeles"
"1003413600" "Fresno"
"1003413600" "Madera"
"1003413600" "Kings"
"1003413600" "Tulare"
end
[/CODE]
I've been asked to reorganize the data such that each row is a unique id, and each column is a dummy variable for whether that county is associated with that id (see below). Reshaping wide is close to what I want, but that command simply creates new variables with the first county that appears for each id variable, rather than creating dummies. I could just create the county variables one by one, but I'd also love to learn about a better solution if it's out there.
This is what I want the data to look like eventually, in case I haven't been clear:
ID | Kern | Los Angeles | Orange | San Diego | Sacramento | Riverside | Fresno | Kings | Madera | Tulare |
1003027012 | 1 | |||||||||
1003068206 | 1 | |||||||||
1003223124 | 1 | |||||||||
1003223124 | 1 | |||||||||
1003223124 | 1 | |||||||||
1003316290 | 1 | |||||||||
1003351610 | 1 | |||||||||
1003351610 | 1 | |||||||||
1003351610 | 1 | |||||||||
1003352220 | 1 | |||||||||
1003412743 | 1 | |||||||||
1003413600 | 1 | |||||||||
1003413600 | ||||||||||
1003413600 | 1 | |||||||||
1003413600 | 1 |
Thanks so much! Any suggestion welcome and appreciated, and feel free to redirect me to a pre-existing thread if I've overlooked it.
Comment