Improve speed of double foreach loop

Clemens Lehner



05 Mar 2020, 10:20

Hello, I have rainfall data from different weather stations, and for each station and each month I want to calculate the probability that it rains on a given day in that month at that station. I wrote a nested loop that works fine; however, it is quite slow, and the sample is large. Are there any best practices for increasing the speed of the calculation?

Code:

clear //Input data is the station ID, the month and the amount of rainfall in mm 
input station_id month rainfall
1 3 1
1 3 1
1 3 0
1 5 2
1 5 0
2 6 5
2 6 0
2 6 0
end

generate raindummy = 0 
replace raindummy = 1 if rainfall > 0

gen prob_rain = 0

levelsof month, local(month) 
levelsof station_id, local(station)

foreach m of local month {
    foreach s of local station {
        gen sum_rain`m'_`s' = sum(raindummy) if month == `m' & station_id == `s'
        count if month == `m' & station_id == `s'
        replace prob_rain = sum_rain`m'_`s' / r(N) if month == `m' & station_id == `s'
    }
}


Nick Cox


05 Mar 2020, 11:21

A way to speed something up is to cut it out. I offer here a reduction of 12 lines of code to 1.

Code:

clear
//Input data is the station ID, the month and the amount of rainfall in mm  
input station_id month rainfall
1 3 1
1 3 1
1 3 0
1 5 2
1 5 0
2 6 5
2 6 0
2 6 0
end  

egen pr_rain = mean(rainfall > 0), by(month station_id)
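A minimal check, not part of the original reply: `rainfall > 0` evaluates to 1 on a rainy day and 0 otherwise, so its mean within each `month` and `station_id` group is the proportion of rainy days. The sketch below reruns the one-liner on the example data and lists one row per group using `egen`'s `tag()` function; the variable name `first` is an illustrative choice, not something from the thread.

```stata
clear
input station_id month rainfall
1 3 1
1 3 1
1 3 0
1 5 2
1 5 0
2 6 5
2 6 0
2 6 0
end

* rainfall > 0 is 1 on a rainy day and 0 otherwise,
* so its mean within a group is the share of rainy days
egen pr_rain = mean(rainfall > 0), by(month station_id)

* tag() flags one observation per station-month, so each group is listed once
egen first = tag(station_id month)
list station_id month pr_rain if first, noobs
```

With these data the shares are 2/3 for station 1 in month 3, 1/2 for station 1 in month 5, and 1/3 for station 2 in month 6.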

It's true that egen carries some interpretive overhead. So here is a longer version that should be faster and also takes care of missing values: days with missing rainfall are left out of both the numerator and the denominator.

Code:

bysort month station_id : gen wanted = sum(rainfall > 0 & rainfall < .) / sum(rainfall < .)  
by month station_id: replace wanted = wanted[_N]
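To see what the missing-value handling changes, here is a small sketch with one missing rainfall reading (the three rows are made up for illustration). Because Stata treats missing as larger than any number, `rainfall > 0` is 1 when `rainfall` is missing, so the `egen` one-liner counts that day as rainy, while the `bysort` version drops it. The variable `pr_cond` is an added variant, not from the reply: it keeps the `egen` form but turns missing rainfall into a missing indicator with `cond()`, which `egen`'s `mean()` then ignores.

```stata
clear
* hypothetical data: one station-month with a missing rainfall reading
input station_id month rainfall
1 3 1
1 3 .
1 3 0
end

* egen one-liner: missing rainfall satisfies rainfall > 0, so it counts as rain
egen pr_egen = mean(rainfall > 0), by(month station_id)

* bysort version: only non-missing days enter numerator and denominator
bysort month station_id : gen wanted = sum(rainfall > 0 & rainfall < .) / sum(rainfall < .)
by month station_id : replace wanted = wanted[_N]

* variant: missing rainfall gives a missing indicator, which mean() ignores
egen pr_cond = mean(cond(missing(rainfall), ., rainfall > 0)), by(month station_id)

list, noobs
```

With these three rows, `pr_egen` is 2/3, while `wanted` and `pr_cond` are both 1/2.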
