  • Improve speed of double foreach loop

    Hello, I have rainfall data from different weather stations, and for each station and each month I want to calculate the probability that it rains on a given day in that month at that station. I wrote a nested loop that works; however, it is quite slow, and the sample is large. Are there any best practices for increasing the speed of the calculation?

    Code:
    clear //Input data is the station ID, the month and the amount of rainfall in mm 
    input station_id month rainfall
    1 3 1
    1 3 1
    1 3 0
    1 5 2
    1 5 0
    2 6 5
    2 6 0
    2 6 0
    end
    
    generate raindummy = 0 
    replace raindummy = 1 if rainfall > 0
    
    gen prob_rain = 0
    
    levelsof month, local(month) 
    levelsof station_id, local(station)
    
    foreach m of local month {
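     foreach s of local station {
     // total() rather than the running sum sum(), so every observation in the group gets the group's count of rainy days
     egen sum_rain`m'_`s' = total(raindummy) if month == `m' & station_id == `s'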
     count if month == `m' & station_id == `s'
     replace prob_rain = sum_rain`m'_`s' / r(N) if month == `m' & station_id == `s'
    }
    }
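    To make the target explicit, here is the quantity being estimated. The notation is introduced here for illustration and is not from the original post: D_{s,m} is the set of days with non-missing rainfall at station s in month m, and r_d is the rainfall on day d. The estimate is simply the share of those days with positive rainfall:

    ```latex
    \hat{p}_{s,m} = \frac{1}{|D_{s,m}|} \sum_{d \in D_{s,m}} \mathbf{1}[r_d > 0]
    \qquad\text{e.g.}\qquad
    \hat{p}_{1,3} = \frac{1 + 1 + 0}{3} = \frac{2}{3}
    ```

    For the example data this gives 2/3 for station 1 in month 3, 1/2 for station 1 in month 5, and 1/3 for station 2 in month 6. The value should be identical for every observation in a station-month group, which is a quick check on any implementation.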

  • #2
    A way to speed something up is to cut it out. I offer here a reduction of 12 lines of code to 1.


    Code:
    clear
    //Input data is the station ID, the month and the amount of rainfall in mm  
    input station_id month rainfall
    1 3 1
    1 3 1
    1 3 0
    1 5 2
    1 5 0
    2 6 5
    2 6 0
    2 6 0
    end  
    
    egen pr_rain = mean(rainfall > 0), by(month station_id)
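    It's true that egen imparts some interpretive overhead. So here is a longer version that should be faster and that also allows for missing values. (In Stata, rainfall > 0 evaluates to 1 when rainfall is missing, because missing is treated as larger than any number, so the egen line above would count a missing day as a rainy one.)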

    Code:
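    * numerator: days with positive, non-missing rainfall; denominator: days with non-missing rainfall
    bysort month station_id : gen wanted = sum(rainfall > 0 & rainfall < .) / sum(rainfall < .)
    * sum() is a running sum, so only the last observation in each group holds the full ratio; copy it to the whole group
    by month station_id : replace wanted = wanted[_N]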
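    To see what the missing-value handling changes, here is a sketch that runs both versions on the example data with one rainfall value (station 1, month 3) set to missing. That missing value is an assumption made purely for illustration; it is not in the original data.

    Code:
    ```stata
    * Example data from the thread, with one rainfall value assumed missing
    clear
    input station_id month rainfall
    1 3 1
    1 3 .
    1 3 0
    1 5 2
    1 5 0
    2 6 5
    2 6 0
    2 6 0
    end

    * egen version: (rainfall > 0) is 1 when rainfall is missing,
    * so the missing day is counted as rainy
    egen pr_rain = mean(rainfall > 0), by(month station_id)

    * by version: only non-missing days enter numerator and denominator
    bysort month station_id : gen wanted = sum(rainfall > 0 & rainfall < .) / sum(rainfall < .)
    by month station_id : replace wanted = wanted[_N]

    list station_id month rainfall pr_rain wanted, sepby(station_id month)
    ```

    For station 1, month 3 the egen line gives 2/3 (the missing day counts as rain), while the by version gives 1/2 (one rainy day out of two observed days). The other two groups agree.
    
    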

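    If you want to check the speed claims on data closer to a large sample, one option is Stata's timer command. The sketch below times both one-pass versions on synthetic data; the sizes (1,000,000 observations, 500 stations, 12 months, roughly 30% rainy days) are arbitrary assumptions, not the original poster's data.

    Code:
    ```stata
    * Synthetic data: sizes and rain frequency are assumptions for illustration
    clear
    set seed 2024
    set obs 1000000
    gen station_id = runiformint(1, 500)
    gen month = runiformint(1, 12)
    gen rainfall = cond(runiform() < 0.3, runiformint(1, 20), 0)

    timer clear

    * Timer 1: the bysort / running-sum version
    timer on 1
    bysort month station_id : gen wanted = sum(rainfall > 0 & rainfall < .) / sum(rainfall < .)
    by month station_id : replace wanted = wanted[_N]
    timer off 1

    * Timer 2: the egen one-liner (the data are already sorted at this point,
    * which may flatter it slightly)
    timer on 2
    egen pr_rain = mean(rainfall > 0), by(month station_id)
    timer off 2

    * Elapsed seconds per timer; figures depend on machine and Stata version
    timer list
    ```

    No timings are quoted here because they vary from machine to machine. With no missing rainfall in this synthetic data, the two variables should agree.
    
    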