Help reshaping data from wide to long

Sara Saltzer

Join Date: Mar 2022
Posts: 6

21 Mar 2022, 13:12

I have the following dataset that I would like to reshape from wide to long. I want the final dataset to contain one row for each respondent's favorability rating of each candidate, i.e., for respondent 1, one row for fav_biden_2019Nov, one for fav_sanders_2019Nov, and so on. I tried -reshape long fav, i(id) j(candidate)- and received error r(498): "variable candidate contains all missing values." Is there something I am missing?

I would also like the reshaped (long) data to contain a dummy indicating whether the candidate in each row was the respondent's choice in magicdempres_2019Nov, and I would appreciate any advice on how to go about that!

Thank you in advance!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(fav_biden_2019Nov fav_sanders_2019Nov fav_warren_2019Nov fav_harris_2019Nov) float id int magicdempres_2019Nov
2 2 1 2  1   4
1 3 1 2  2   2
2 2 1 1  3   1
1 1 1 2  4 999
2 3 1 1  5   1
1 1 1 1  6   1
3 2 1 2  7   1
2 2 1 2  8  16
4 4 5 5  9 999
1 1 1 2 10   2
4 2 2 4 11 999
4 4 4 4 12 999
4 4 4 4 13 999
4 4 4 4 14 999
4 3 4 4 15 999
5 2 1 2 16   1
2 2 1 1 17   1
3 3 2 3 18   4
4 4 4 4 19 999
2 2 1 3 20   4
end
label values fav_biden_2019Nov Q8_f_2019Nov
label def Q8_f_2019Nov 1 "Very favorable", modify
label def Q8_f_2019Nov 2 "Somewhat favorable", modify
label def Q8_f_2019Nov 3 "Somewhat unfavorable", modify
label def Q8_f_2019Nov 4 "Very unfavorable", modify
label def Q8_f_2019Nov 5 "Don't know", modify
label values fav_sanders_2019Nov Q8_g_2019Nov
label def Q8_g_2019Nov 1 "Very favorable", modify
label def Q8_g_2019Nov 2 "Somewhat favorable", modify
label def Q8_g_2019Nov 3 "Somewhat unfavorable", modify
label def Q8_g_2019Nov 4 "Very unfavorable", modify
label values fav_warren_2019Nov Q8_h_2019Nov
label def Q8_h_2019Nov 1 "Very favorable", modify
label def Q8_h_2019Nov 2 "Somewhat favorable", modify
label def Q8_h_2019Nov 4 "Very unfavorable", modify
label def Q8_h_2019Nov 5 "Don't know", modify
label values fav_harris_2019Nov Q8_i_2019Nov
label def Q8_i_2019Nov 1 "Very favorable", modify
label def Q8_i_2019Nov 2 "Somewhat favorable", modify
label def Q8_i_2019Nov 3 "Somewhat unfavorable", modify
label def Q8_i_2019Nov 4 "Very unfavorable", modify
label def Q8_i_2019Nov 5 "Don't know", modify
label values magicdempres_2019Nov Q16_2019Nov
label def Q16_2019Nov 1 "Elizabeth Warren", modify
label def Q16_2019Nov 2 "Joe Biden", modify
label def Q16_2019Nov 4 "Pete Buttigieg", modify
label def Q16_2019Nov 16 "Joe Sestak", modify
label def Q16_2019Nov 999 "not asked", modify

Last edited by Sara Saltzer; 21 Mar 2022, 13:17. Reason: Edited to clarify wording.


Jared Greathouse

Join Date: Sep 2021
Posts: 2170

21 Mar 2022, 13:43

Fair?

Code:

* Data: load the -dataex- example from #1 first


rename (*) (Biden_2019 Sanders_2019 Warren_2019 Harris_2019 id Magic_2019)

greshape long Biden_ Sanders_ Warren_ Harris_ Magic_, i(id) j(year)

I also don't understand what your last question means; can you clarify it, please?

Note that I use -greshape- (ssc inst gtools, replace), but the official -reshape- works too, with no changes to the syntax.


Ali Atia

Join Date: May 2020

Posts: 737
#3

21 Mar 2022, 13:44

Admittedly, the error "variable candidate contains all missing values" is a little unclear. It means that -reshape- tried to build a numeric variable candidate from whatever follows the stub fav in your variable names (_biden_2019Nov, _sanders_2019Nov, and so on). Because those suffixes are not numbers, every value of candidate would be missing. Therefore, you just need to specify the string option in your reshape command (see line 1 of the code below).

To generate your desired dummy, just check whether the candidate's last name, taken from the value label of magicdempres_2019Nov, is equal to the text between the first two underscores of candidate (see lines 2-3 of the code below).

Code:

reshape long fav, i(id) j(candidate) string
decode magicdempres_2019Nov, gen(magicdempres_2019_str)
gen wanted = lower(word(magicdempres_2019_str,wordcount(magicdempres_2019_str))) == ustrregexra(candidate,"_(.*?)_.*","$1")
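A step-by-step version of the same idea may be easier to check. This is a minimal sketch, not the original poster's full data: it keeps only respondents 3, 4 and 10 from #1 and only the Biden and Warren columns. The helper variables cand_name, choice_str, choice_last and is_choice are hypothetical names, and the suffix is trimmed with subinstr() rather than a regular expression.

```stata
clear
input float id byte(fav_biden_2019Nov fav_warren_2019Nov) int magicdempres_2019Nov
 3 2 1   1
 4 1 1 999
10 1 1   2
end
label define Q16_2019Nov 1 "Elizabeth Warren" 2 "Joe Biden" 999 "not asked"
label values magicdempres_2019Nov Q16_2019Nov

* One row per respondent and candidate; j() values are string suffixes
reshape long fav, i(id) j(candidate) string

* Candidate name from the suffix: "_biden_2019Nov" -> "_biden" -> "biden"
gen cand_name = subinstr(subinstr(candidate, "_2019Nov", "", .), "_", "", .)
* Respondent's choice as text, e.g. "Joe Biden" or "not asked"
decode magicdempres_2019Nov, gen(choice_str)
* Last word of the choice, lower-cased, e.g. "biden"
gen choice_last = lower(word(choice_str, wordcount(choice_str)))
* 1 if this row's candidate is the respondent's choice, 0 otherwise
gen byte is_choice = cand_name == choice_last
list id cand_name choice_str is_choice, sepby(id)
```

Matching on last names works for this candidate list, but it would break if two candidates shared a surname; a hand-written mapping from the Q16_2019Nov codes to candidate names would be safer in that case.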
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

21 Mar 2022, 13:46

Code:

reshape long fav, i(id) j(candidate) string
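To see what the string option produces, here is a minimal sketch using respondents 1 and 2 from #1 (Biden and Warren columns only). After the reshape, candidate holds the full suffix, such as "_biden_2019Nov"; the final replace is an optional extra step, not part of the answers above, that trims it to the bare name.

```stata
clear
input float id byte(fav_biden_2019Nov fav_warren_2019Nov)
1 2 1
2 1 1
end

* string: the j() values are the non-numeric suffixes that follow the stub fav
reshape long fav, i(id) j(candidate) string
list, sepby(id)

* Optional: drop "_2019Nov" and the leading "_" so only the name remains
replace candidate = subinstr(subinstr(candidate, "_2019Nov", "", .), "_", "", .)
list, sepby(id)
```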

As an aside: the use of numerical codes (5 = "Don't know", 999 = "not asked") is a recipe for trouble in Stata. I strongly urge you to replace those with Stata system missing or extended missing values. Read -help missing- if you are not familiar with these. If you keep these coded numerically, at some point you are likely to try to calculate an average or a total and forget to include -if variable != 5- or -if variable != 999- in the command, and then those 5's and 999's will get included in the calculation as if they were true numeric values.
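As a minimal sketch of that recoding, using respondents 9 and 16 from #1 (Biden and Warren columns only): -mvdecode- turns the 5 and 999 codes into extended missing values, and a value label can still display "not asked". Mapping 5 to .a and 999 to .b is an arbitrary choice made here for illustration.

```stata
clear
input byte(fav_biden_2019Nov fav_warren_2019Nov) float id int magicdempres_2019Nov
4 5  9 999
5 1 16   1
end

* 5 ("Don't know") becomes extended missing .a in every favorability item
mvdecode fav_*, mv(5=.a)
* 999 ("not asked") becomes extended missing .b
mvdecode magicdempres_2019Nov, mv(999=.b)

* Value labels can be attached to extended missing values, so the meaning survives
label define Q16_2019Nov 1 "Elizabeth Warren" .b "not asked"
label values magicdempres_2019Nov Q16_2019Nov

* summarize now skips .a automatically: only respondent 9's rating of 4 is used
summarize fav_biden_2019Nov
```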

Added: Crossed with #2 and #3.

Further addition: Both of the questions here were previously asked by O.P. and answered at https://www.statalist.org/forums/for...an-observation

Last edited by Clyde Schechter; 21 Mar 2022, 13:56.
