Hello, I have rainfall data from different weather stations, and for each station and each month I want to calculate the probability that it rains on a given day in that month at that station. I wrote a nested loop that works, but it is quite slow and the sample is large. Are there any best practices for speeding up this calculation?
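Put as a formula, the quantity being estimated for station s and month m is the share of observed days on which rainfall was positive. Here r_d (the rainfall on day d) is notation introduced only for illustration:

```latex
p_{s,m} = \frac{\#\{\, d \text{ at station } s \text{ in month } m : r_d > 0 \,\}}
               {\#\{\, d \text{ at station } s \text{ in month } m \,\}},
\qquad \text{e.g. } p_{1,3} = \frac{2}{3} \text{ for the example data below.}
```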
Code:
clear
// Input data: station ID, month and amount of rainfall in mm
input station_id month rainfall
1 3 1
1 3 1
1 3 0
1 5 2
1 5 0
2 6 5
2 6 0
2 6 0
end

// Rain indicator: 1 if any rain fell on that day
generate raindummy = 0
replace raindummy = 1 if rainfall > 0
gen prob_rain = 0

levelsof month, local(month)
levelsof station_id, local(station)
foreach m of local month {
    foreach s of local station {
        // Total number of rainy days in this station-month
        egen sum_rain`m'_`s' = total(raindummy) if month == `m' & station_id == `s'
        // Number of observed days in this station-month
        count if month == `m' & station_id == `s'
        replace prob_rain = sum_rain`m'_`s' / r(N) if month == `m' & station_id == `s'
    }
}
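A minimal sketch of the usual Stata alternative, assuming the variable names from the example above: let the `by` prefix do the grouping, so the data are sorted once and each command makes a single pass, instead of running `count`, `egen` and `replace` over the whole data set for every station-month pair. The handling of missing rainfall (treated as unknown rather than as rain) is an added assumption, not something stated in the question. This should be much faster on a large sample, but it has not been benchmarked here.

```stata
clear
input station_id month rainfall
1 3 1
1 3 1
1 3 0
1 5 2
1 5 0
2 6 5
2 6 0
2 6 0
end

* Rain indicator. Assumption: missing rainfall stays missing, because in Stata
* a missing value compares as larger than any number (so "rainfall > 0" is true).
generate byte raindummy = rainfall > 0 if !missing(rainfall)

* Share of rainy days within each station-month cell, stored on every observation.
* egen's mean() ignores missing values of raindummy.
bysort station_id month: egen prob_rain = mean(raindummy)
list, sepby(station_id month)

* Alternatively, one row per station-month, with the number of observed days.
preserve
collapse (mean) prob_rain = raindummy (count) n_days = raindummy, by(station_id month)
list
restore
```

The `egen` version keeps the original observations, which is convenient if the probability is needed alongside the daily data; `collapse` gives a compact station-month table and is discarded here by `restore`.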