  • Help reshaping data from wide to long

    I have the following dataset that I would like to reshape from wide to long. I want the final dataset to contain a row for each respondent's ranking of each individual candidate in terms of favorability--i.e. for respondent 1, a row with fav_biden_2019Nov, fav_sanders_2019Nov, etc. I tried -reshape long fav, i(id) j(candidate)- and received the following 498 error: "variable candidate contains all missing values." Is there something I am missing?

    I would also like the wide data to contain a dummy indicating whether that particular candidate was the respondent's choice for magicdempres_2019Nov, and I would appreciate any advice on how to go about that!

    Thank you in advance!

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(fav_biden_2019Nov fav_sanders_2019Nov fav_warren_2019Nov fav_harris_2019Nov) float id int magicdempres_2019Nov
    2 2 1 2  1   4
    1 3 1 2  2   2
    2 2 1 1  3   1
    1 1 1 2  4 999
    2 3 1 1  5   1
    1 1 1 1  6   1
    3 2 1 2  7   1
    2 2 1 2  8  16
    4 4 5 5  9 999
    1 1 1 2 10   2
    4 2 2 4 11 999
    4 4 4 4 12 999
    4 4 4 4 13 999
    4 4 4 4 14 999
    4 3 4 4 15 999
    5 2 1 2 16   1
    2 2 1 1 17   1
    3 3 2 3 18   4
    4 4 4 4 19 999
    2 2 1 3 20   4
    end
    label values fav_biden_2019Nov Q8_f_2019Nov
    label def Q8_f_2019Nov 1 "Very favorable", modify
    label def Q8_f_2019Nov 2 "Somewhat favorable", modify
    label def Q8_f_2019Nov 3 "Somewhat unfavorable", modify
    label def Q8_f_2019Nov 4 "Very unfavorable", modify
    label def Q8_f_2019Nov 5 "Don't know", modify
    label values fav_sanders_2019Nov Q8_g_2019Nov
    label def Q8_g_2019Nov 1 "Very favorable", modify
    label def Q8_g_2019Nov 2 "Somewhat favorable", modify
    label def Q8_g_2019Nov 3 "Somewhat unfavorable", modify
    label def Q8_g_2019Nov 4 "Very unfavorable", modify
    label values fav_warren_2019Nov Q8_h_2019Nov
    label def Q8_h_2019Nov 1 "Very favorable", modify
    label def Q8_h_2019Nov 2 "Somewhat favorable", modify
    label def Q8_h_2019Nov 4 "Very unfavorable", modify
    label def Q8_h_2019Nov 5 "Don't know", modify
    label values fav_harris_2019Nov Q8_i_2019Nov
    label def Q8_i_2019Nov 1 "Very favorable", modify
    label def Q8_i_2019Nov 2 "Somewhat favorable", modify
    label def Q8_i_2019Nov 3 "Somewhat unfavorable", modify
    label def Q8_i_2019Nov 4 "Very unfavorable", modify
    label def Q8_i_2019Nov 5 "Don't know", modify
    label values magicdempres_2019Nov Q16_2019Nov
    label def Q16_2019Nov 1 "Elizabeth Warren", modify
    label def Q16_2019Nov 2 "Joe Biden", modify
    label def Q16_2019Nov 4 "Pete Buttigieg", modify
    label def Q16_2019Nov 16 "Joe Sestak", modify
    label def Q16_2019Nov 999 "not asked", modify
    Last edited by Sara Saltzer; 21 Mar 2022, 14:17. Reason: Edited to clarify wording.

  • #2
    Fair?
    Code:
    * Data example as in #1 (same -dataex- input and value labels), followed by:
    rename (*) (Biden_2019 Sanders_2019 Warren_2019 Harris_2019 id Magic_2019)
    
    greshape long Biden_ Sanders_ Warren_ Harris_ Magic_, i(id) j(year)
    I also don't know what the last question means, can you clarify that please?

    NOTE that I use greshape (ssc inst gtools, replace), but regular reshape works too without any changes to syntax.
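
    For readers without gtools, the same step with the built-in command might look like the sketch below. It assumes the data example from #1 has been loaded and the rename above has already been run; the -assert- lines are only illustrative checks.
    Code:
    * built-in equivalent of the greshape call above
    reshape long Biden_ Sanders_ Warren_ Harris_ Magic_, i(id) j(year)
    * j(year) is read from the numeric suffix of each variable name
    assert year == 2019
    * with a single wave, each respondent still has exactly one row
    isid id
    Because the example has only one wave, this layout keeps one row per respondent rather than one row per respondent and candidate.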


    • #3
      Admittedly, the error "variable candidate contains all missing values" is a little unclear. It means that -reshape- tried to build a numeric variable from the suffixes of your variable names and ended up with an empty variable, because those suffixes (e.g. _biden_2019Nov) are non-numeric. You therefore just need to specify the string option in your reshape command (see line 1 of the code below).

      To generate your desired dummy, check whether the last name of the candidate in the magicdempres variable equals the first word after the underscore in the candidate variable (see lines 2-3 of the code below).

      Code:
      reshape long fav, i(id) j(candidate) string
      decode magicdempres_2019Nov, gen(magicdempres_2019_str)
      gen wanted = lower(word(magicdempres_2019_str,wordcount(magicdempres_2019_str))) == ustrregexra(candidate,"_(.*?)_.*","$1")
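
      One way to check the result is sketched below. It assumes the three commands above have just been run on the data example from #1; candidate_name and n_wanted are hypothetical helper names introduced only for this illustration.
      Code:
      * candidate holds suffixes such as "_biden_2019Nov"; keep only the name part
      gen candidate_name = ustrregexra(candidate, "_(.*?)_.*", "$1")
      * the dummy should be 1 at most once per respondent
      bysort id: egen n_wanted = total(wanted)
      assert inlist(n_wanted, 0, 1)
      * inspect the first two respondents
      list id candidate_name fav magicdempres_2019_str wanted if id <= 2, sepby(id)
      Note that respondents whose choice is not among the four fav_ candidates (e.g. Pete Buttigieg), or who were "not asked", get wanted == 0 on every row rather than a missing value.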


      • #4
        Code:
        reshape long fav, i(id) j(candidate) string
        As an aside: the use of numerical codes (5 = "Don't know", 999 = "not asked") is a recipe for trouble in Stata. I strongly urge you to replace those with Stata system missing or extended missing values. Read -help missing- if you are not familiar with these. If you keep these coded numerically, at some point you are likely to calculate an average or a total and forget to include -if variable != 5- or -if variable != 999- in the command, and then those 5's and 999's will be included in the calculation as if they were true numeric values.
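
        A minimal sketch of such a recoding follows. It assumes the wide data example from #1 (before any reshape), and the choice of .d for "Don't know" and .a for "not asked" is arbitrary.
        Code:
        * recode 5 ("Don't know") to extended missing .d in the favorability items
        mvdecode fav_*, mv(5=.d)
        * recode 999 ("not asked") to extended missing .a in the vote-choice item
        mvdecode magicdempres_2019Nov, mv(999=.a)
        * keep the codes readable by labeling the new missing values
        foreach lbl in Q8_f_2019Nov Q8_g_2019Nov Q8_h_2019Nov Q8_i_2019Nov {
            label define `lbl' .d "Don't know", modify
        }
        label define Q16_2019Nov .a "not asked", modify
        * summary statistics now leave these observations out automatically
        summarize fav_*
        Extended missing values sort above every number, so a condition such as -if fav_biden_2019Nov > 3- would still pick them up; -if fav_biden_2019Nov > 3 & !missing(fav_biden_2019Nov)- is the safer form.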

        Added: Crossed with #2 and #3.

        Further addition: Both of the questions here were previously asked by O.P. and answered at https://www.statalist.org/forums/for...an-observation
        Last edited by Clyde Schechter; 21 Mar 2022, 14:56.
