Help with reshape long

Amal Khanolkar

Join Date: Feb 2015

Posts: 142
#1

Help with reshape long

07 Nov 2019, 10:30

Hi All

I need some help & advice with using the reshape command:

I have a longitudinal dataset in the wide format. I've done considerable cleaning and housekeeping to prep my variables for analysis. I now need to reshape my dataset into long format for my analysis:

Code:

reshape long diab anaemia bmi, i(id) j(wave)

However, I didn't realise that reshape orders the variables numerically after running the command above, which has altered the ordering of the newly created j (wave) variable.

For example, the diab variable (which indicates if a subject has diabetes or not 0/1), was initially 5 variables: diab82 diab89 diab99 diab09 & diab15, where the suffix numbers indicate wave year (1982, 1989, 1999 and so on). After running reshape, Stata orders the 'wave' or j variable in ascending order of 9, 15, 82, 89, 99. Is there anyway to take care of this with the reshape command or do I need to rename all my variables?

example dataset (in wide format):

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int id float(diab99 diab99f diab09 diab15 diab15f anaemia99 anaemia09 anaemia15 bmi82 bmi89 bmi99 bmi09 bmi15) 1 0 0 0 0 0 0 0 1 26.28196 27.85201 30.710836 27.91919 25.007694 2 0 0 0 0 0 0 0 0 28.37953 28.451317 28.066423 31.02016 29.96841 3 . . 0 0 0 . 0 . . . . . . 4 . . . 0 0 . . . . . . . . 5 0 0 0 0 0 0 0 0 25.03128 27.769075 29.470186 34.019962 34.648438 end

Thanks!
/Amal
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4408
#2

07 Nov 2019, 10:43

you neither say what you want the result to be, nor what the problem with the current result is; please clarify
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35211
#3

07 Nov 2019, 10:46

The option string will preserve the original suffixes (including leading zeros in 09) if that is what I want.

I wouldn't do that myself. The suffixes are numeric and just need to be mapped to better numeric values.

I can't see any real gain in using 82 to 15 over 1982 to 2015 for any purpose, e,g. tabular or graphical. Manifestly, there is an order there but you can't expect Stata to know what it is.

That is easy enough to fix after the reshape:

Code:

gen year = cond(wave < 20, wave + 2000, wave + 1900)

Otherwise I agree with @Rich Goldstein: this is not very clear.

Last edited by Nick Cox; 07 Nov 2019, 11:10.
Comment
Amal Khanolkar

Join Date: Feb 2015

Posts: 142
#4

07 Nov 2019, 11:07

Hi both

As an example I would like the dataset to look like this after reshaping (from wide to long):

Id wave diab anaemia bmi 1 82 0 0 23.4 1 89 0 0 24.5 1 99 0 1 36 1 09 0 1 37 2 82 1 1 29 2 89 1 1 31 2 99 1 0 32 3 82 1 1 42 3 89 1 1 43 4 82 0 0 23 4 89 0 0 22 4 99 0 0 22 The problem is that in the wide format diabetes is five variables (for the five waves diab82 diab89 diab99 diab09 & diab15). After using reshape (wide to long), the above 5 diabetes variables get ordered in the wrong format in the longitudinal dataset, as Stata orders the suffixes numerically). Nick - I tried reshape with the string option, but the problem persists. Thanks! /Amal
Comment
Amal Khanolkar

Join Date: Feb 2015

Posts: 142
#5

07 Nov 2019, 11:13

Apologies for the previous reply - but for some reason - my last message was formatted totally wrong!
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

07 Nov 2019, 11:56

There's a problem allowing the suffixes to be converted to numeric - reshape then looks for diab9 rather than diab09, for example. This I would argue is something of a failure of the reshape command.

So we have to leave the suffixes as string. And in doing so, we first need to do something about diab99f and dab15f. I've assumed you want them in the 1999 and 2015 waves.

Code:

rename (diab*f) (f_diab*)
reshape long diab f_diab anaemia bmi, i(id) j(wave) string
generate year = real(wave)
replace  year = cond(year < 20, year + 2000, year + 1900)
sort id year
list, sepby(id) noobs

Code:

. list, sepby(id) noobs

  +-------------------------------------------------------+
  | id   wave   diab   f_diab   anaemia        bmi   year |
  |-------------------------------------------------------|
  |  1     82      .        .         .   26.28196   1982 |
  |  1     89      .        .         .   27.85201   1989 |
  |  1     99      0        0         0   30.71084   1999 |
  |  1     09      0        .         0   27.91919   2009 |
  |  1     15      0        0         1   25.00769   2015 |
  |-------------------------------------------------------|
  |  2     82      .        .         .   28.37953   1982 |
  |  2     89      .        .         .   28.45132   1989 |
  |  2     99      0        0         0   28.06642   1999 |
  |  2     09      0        .         0   31.02016   2009 |
  |  2     15      0        0         0   29.96841   2015 |
  |-------------------------------------------------------|
  |  3     82      .        .         .          .   1982 |
  |  3     89      .        .         .          .   1989 |
  |  3     99      .        .         .          .   1999 |
  |  3     09      0        .         0          .   2009 |
  |  3     15      0        0         .          .   2015 |
  |-------------------------------------------------------|
  |  4     82      .        .         .          .   1982 |
  |  4     89      .        .         .          .   1989 |
  |  4     99      .        .         .          .   1999 |
  |  4     09      .        .         .          .   2009 |
  |  4     15      0        0         .          .   2015 |
  |-------------------------------------------------------|
  |  5     82      .        .         .   25.03128   1982 |
  |  5     89      .        .         .   27.76908   1989 |
  |  5     99      0        0         0   29.47019   1999 |
  |  5     09      0        .         0   34.01996   2009 |
  |  5     15      0        0         0   34.64844   2015 |
  +-------------------------------------------------------+

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35211

07 Nov 2019, 12:10

Looking at this again in more detail, and trying your code, it becomes clearer that the reshape code in #1 isn't good enough to match the complexities of your data.

Conversely, the order of observations being not what you expect is, if I understand correctly, something that itself only requires a new sort order.

As you don't show your different reshape code in #4 I can't comment on it.

First up are the *f variables, which I don't understand. It may be the wrong way to go, but I renamed them first.

Then the awkward suffixes 82 to 15, although clear enough to a researcher, need more careful treatment.

I offer this as possibly closer to what you want.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float(diab99 diab99f diab09 diab15 diab15f anaemia99 anaemia09 anaemia15 bmi82 bmi89 bmi99 bmi09 bmi15)
1 0 0 0 0 0 0 0 1 26.28196  27.85201 30.710836  27.91919 25.007694
2 0 0 0 0 0 0 0 0 28.37953 28.451317 28.066423  31.02016  29.96841
3 . . 0 0 0 . 0 .        .         .         .         .         .
4 . . . 0 0 . . .        .         .         .         .         .
5 0 0 0 0 0 0 0 0 25.03128 27.769075 29.470186 34.019962 34.648438
end

rename (diab99f diab15f) (fdiab99 fdiab15)

reshape long diab fdiab anaemia bmi, i(id) j(wave) string

gen year = cond(wave < "20", 2000 + real(wave), 1900 + real(wave))

sort id year

list, sepby(id)

     +------------------------------------------------------+
     | id   wave   diab   fdiab   anaemia        bmi   year |
     |------------------------------------------------------|
  1. |  1     82      .       .         .   26.28196   1982 |
  2. |  1     89      .       .         .   27.85201   1989 |
  3. |  1     99      0       0         0   30.71084   1999 |
  4. |  1     09      0       .         0   27.91919   2009 |
  5. |  1     15      0       0         1   25.00769   2015 |
     |------------------------------------------------------|
  6. |  2     82      .       .         .   28.37953   1982 |
  7. |  2     89      .       .         .   28.45132   1989 |
  8. |  2     99      0       0         0   28.06642   1999 |
  9. |  2     09      0       .         0   31.02016   2009 |
 10. |  2     15      0       0         0   29.96841   2015 |
     |------------------------------------------------------|
 11. |  3     82      .       .         .          .   1982 |
 12. |  3     89      .       .         .          .   1989 |
 13. |  3     99      .       .         .          .   1999 |
 14. |  3     09      0       .         0          .   2009 |
 15. |  3     15      0       0         .          .   2015 |
     |------------------------------------------------------|
 16. |  4     82      .       .         .          .   1982 |
 17. |  4     89      .       .         .          .   1989 |
 18. |  4     99      .       .         .          .   1999 |
 19. |  4     09      .       .         .          .   2009 |
 20. |  4     15      0       0         .          .   2015 |
     |------------------------------------------------------|
 21. |  5     82      .       .         .   25.03128   1982 |
 22. |  5     89      .       .         .   27.76908   1989 |
 23. |  5     99      0       0         0   29.47019   1999 |
 24. |  5     09      0       .         0   34.01996   2009 |
 25. |  5     15      0       0         0   34.64844   2015 |
     +------------------------------------------------------+

EDIT: I was distracted by an urgent task, but good news in that William Lisowski and I have essentially identical diagnoses.

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

07 Nov 2019, 12:37

I'm gratified that Nick Cox did not present an obvious way to improve on my handling of the 09 variables, which I was afraid was overlooking something obvious.

But with greater focus on the rename command and the output of help rename group and help reshape, I've found a not-entirely-obvious way, and at the same time, taken care of the *f variables more cleanly. I'm posting it because it does show how to avoid the problem of leading zeroes in the j() value.

Code:

rename (*(##)*) (*19#*)
reshape long diab diab@f anaemia bmi, i(id) j(year)
replace year = year+100 if year<=1920
sort id year
list, sepby(id) noobs

Code:

. list, sepby(id) noobs

  +-----------------------------------------------+
  | id   year   diab   diabf   anaemia        bmi |
  |-----------------------------------------------|
  |  1   1982      .       .         .   26.28196 |
  |  1   1989      .       .         .   27.85201 |
  |  1   1999      0       0         0   30.71084 |
  |  1   2009      0       .         0   27.91919 |
  |  1   2015      0       0         1   25.00769 |
  |-----------------------------------------------|
  |  2   1982      .       .         .   28.37953 |
  |  2   1989      .       .         .   28.45132 |
  |  2   1999      0       0         0   28.06642 |
  |  2   2009      0       .         0   31.02016 |
  |  2   2015      0       0         0   29.96841 |
  |-----------------------------------------------|
  |  3   1982      .       .         .          . |
  |  3   1989      .       .         .          . |
  |  3   1999      .       .         .          . |
  |  3   2009      0       .         0          . |
  |  3   2015      0       0         .          . |
  |-----------------------------------------------|
  |  4   1982      .       .         .          . |
  |  4   1989      .       .         .          . |
  |  4   1999      .       .         .          . |
  |  4   2009      .       .         .          . |
  |  4   2015      0       0         .          . |
  |-----------------------------------------------|
  |  5   1982      .       .         .   25.03128 |
  |  5   1989      .       .         .   27.76908 |
  |  5   1999      0       0         0   29.47019 |
  |  5   2009      0       .         0   34.01996 |
  |  5   2015      0       0         0   34.64844 |
  +-----------------------------------------------+

Last edited by William Lisowski; 07 Nov 2019, 12:39.

Announcement