I'm not the first to notice that the reshape command can be unnecessarily slow. In a large program, I'm finding that reshape takes a substantial fraction of the runtime, and that I can reshape the data much more quickly using other commands. Granted, I'm solving a specific problem, and reshape is designed to be general. Still, it's surprising to me that I can outperform a core command with very simple alternatives.
I would suggest that Stata update the command to make it more efficient, or that some user develop an alternative command, reshape_fast, that accomplishes the same things faster.
Here is a simple example of what I'm talking about.
/* Simulate some long data. */
clear
set obs 999
gen id = 1
gen i = _n
gen v = rnormal()
save long, replace
/* I can make it wide, using reshape. */
timer clear 1
timer on 1
reshape wide v, i(id) j(i)
timer off 1
timer list 1
/* That took 6 seconds. Why? */
/* Look how much faster I can do the same thing with explicit subscripting (v[`i'] picks out the value in observation `i'). */
use long, clear
timer clear 2
timer on 2
forvalues i = 1/999 {
    gen v`i' = v[`i'] in 1
}
drop v i
keep in 1
timer off 2
timer list 2
/* That took 0.03 sec! */
/* Likewise I can go from wide to long using reshape. */
save wide, replace
timer clear 3
timer on 3
reshape long v, i(id) j(i)
timer off 3
timer list 3
/* That took 4 seconds. Why? */
/* Look how much faster I can do the same thing with expand. */
use wide, clear
timer clear 4
timer on 4
expand 999
gen v = .
gen i = .
forvalues i = 1/999 {
    replace v = v`i' in `i'
    replace i = `i' in `i'
}
/* Drop the wide variables so the result matches what reshape long returns. */
drop v1-v999
timer off 4
timer list 4
/* That took 0.01 sec! */