Reshaping wide to long data

Anete Kaldal

Join Date: Oct 2020

Posts: 7
#1

15 Jan 2022, 09:31

Hi!

I am working with a fairly large dataset that contains both "fixed" variables (such as ID, gender, date of birth, date of hospitalization, initial treatment, previous diseases) and longitudinal variables (such as LDL cholesterol, blood pressure, employment status) assessed at set time points (at baseline, after two weeks, after 3 months, etc.). An example from the dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str6 Gender str19 Treatment double(d00ldl d14ldl m03ldl m06ldl) float(d00arbeid d14arbeid m03arbeid m06arbeid)
"Mann" "PCI-medikamentstent" 7 3.1 2.8 2.2 1 0 0 0
"Mann" "PCI-medikamentstent" 2.2 2 2.2 2.2 1 0 0 1
"Mann" "PCI-metallstent" 3.2 3.3 3.3 2.9 1 0 1 1
"Kvinne" "PCI-metallstent" 3.3 2.4 2.3 2.2 1 1 1 1
"Mann" "PCI-medikamentstent" 2.4 . . . 1 . . .
"Mann" "PCI-medikamentstent" 4.7 2.3 2.1 2.2 0 0 0 0
"Mann" "POBA" 3 2.1 1.8 1.7 0 0 0 0
"Mann" "PCI-medikamentstent" 4.1 . . . 0 . . .
"Mann" "PCI-medikamentstent" 2.6 2.9 2.4 2.6 0 0 0 0
"Mann" "PCI-medikamentstent" 2.3 . . . 1 . . .
end
label values d00arbeid neija
label values d14arbeid neija
label values m03arbeid neija
label values m06arbeid neija
label def neija 0 "Nei", modify
label def neija 1 "Ja", modify

I have read other posts on how to reshape data from wide to long, but the advice differs from situation to situation, so I decided to ask to be sure I apply the right commands. I first renamed all the time-dependent variables to make them compatible with long format using the rename command, as in the examples below:

rename d00* *00
rename d14* *01
rename m06* *06

The dataset is quite large, and I wonder if there is a better way to do it than the standard command (reshape long varname, i(ID) j(time)).

Thanks for your attention!

Øyvind Snilsberg

Join Date: Oct 2021
Posts: 591
#2

15 Jan 2022, 10:05

I am unaware of better options but you could consider partitioning the data, i.e.,

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 Gender str19 Treatment double(d00ldl d14ldl m03ldl m06ldl) float(d00arbeid d14arbeid m03arbeid m06arbeid)
"Mann"   "PCI-medikamentstent"   7 3.1 2.8 2.2 1 0 0 0
"Mann"   "PCI-medikamentstent" 2.2   2 2.2 2.2 1 0 0 1
"Mann"   "PCI-metallstent"     3.2 3.3 3.3 2.9 1 0 1 1
"Kvinne" "PCI-metallstent"     3.3 2.4 2.3 2.2 1 1 1 1
"Mann"   "PCI-medikamentstent" 2.4   .   .   . 1 . . .
"Mann"   "PCI-medikamentstent" 4.7 2.3 2.1 2.2 0 0 0 0
"Mann"   "POBA"                  3 2.1 1.8 1.7 0 0 0 0
"Mann"   "PCI-medikamentstent" 4.1   .   .   . 0 . . .
"Mann"   "PCI-medikamentstent" 2.6 2.9 2.4 2.6 0 0 0 0
"Mann"   "PCI-medikamentstent" 2.3   .   .   . 1 . . .
end

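* Create a unique identifier for -reshape-
gen id = _n

* Replace the time prefixes with numeric suffixes (1 = baseline, ..., 4 = 6 months)
rename d00* *1
rename d14* *2
rename m03* *3
rename m06* *4

tempfile data
save `data'

* Reshape the first partition
use if id<=5 using `data', clear
reshape long ldl arbeid, i(id) j(tid)
tempfile data1
save `data1'

* Reshape the second partition
use if id>5 using `data', clear
reshape long ldl arbeid, i(id) j(tid)
tempfile data2
save `data2'

* Stack the reshaped partitions
clear
append using `data1'
append using `data2'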
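
With more than two partitions, the same idea could be written as a loop. The following is only a rough sketch: it assumes the wide data are in memory with id already generated and the stubs renamed as above, and the number of chunks (the local k) and the helper variable chunk are made up for illustration. Whether partitioning actually saves time will depend on the data.

Code:

* Hypothetical number of partitions; tune to the size of the data
local k 4

* Assign each wide row (one per id) to one of k roughly equal chunks
gen long chunk = ceil(_n * `k' / _N)

tempfile wide
save `wide'

* Reshape each chunk separately and store it in its own tempfile
forvalues c = 1/`k' {
    use if chunk == `c' using `wide', clear
    reshape long ldl arbeid, i(id) j(tid)
    tempfile part`c'
    save `part`c''
}

* Stack the reshaped chunks and drop the helper variable
use `part1', clear
forvalues c = 2/`k' {
    append using `part`c''
}
drop chunk
sort id tid

Because chunk is constant within id, -reshape- simply carries it along as a fixed variable.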


Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#3

15 Jan 2022, 12:13

I'm not sure what you mean by a "better" alternative. With very large data sets, -reshape- can be very slow, and people often look for something that will do the job faster. There are a few user-written programs that can do that.

There is the -greshape- command, which is part of Mauricio Caceres' -gtools- package, available at github.com/mcaceresb/stata-gtools.

And there is -tolong-, by Rafal Raciborski, available from SSC.

-greshape- will do both wide to long and long to wide reshaping. -tolong-, as its name suggests, does only wide to long. Both are much faster than Stata's native -reshape- command in large data sets. And both have syntax that is largely similar to that of -reshape-, so learning to use them is simple enough.
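
As a rough sketch, assuming the renamed example data from the previous reply (an id variable plus the stubs ldl1-ldl4 and arbeid1-arbeid4), the calls might look like the following. It also assumes that the SSC copy of -gtools- includes -greshape-; if it does not, install from the GitHub repository instead. Check each command's help file for the exact options before relying on this.

Code:

* One-time installation from SSC
ssc install gtools
ssc install tolong

* Option 1: -greshape-, here with -reshape- style i() and j() options
preserve
greshape long ldl arbeid, i(id) j(tid)
restore

* Option 2: -tolong-, wide to long only
tolong ldl arbeid, i(id) j(tid)

The -preserve-/-restore- pair is there only so that both alternatives can be tried on the same wide data in one run; in practice you would pick one command.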
Anete Kaldal

Join Date: Oct 2020

Posts: 7
#4

16 Jan 2022, 13:32

Thanks a lot for your input, I will try your suggestions!