  • Adding to an associative array in a loop

    In my real application, I have a function with a vector of arguments. I want to evaluate it at many arguments (on a grid) and store the results for later use.

    This example of a function with a one-dimensional argument (not my actual use-case) illustrates my problem.

    Code:
    rseed(76543219)
    pg = jumble(1::1e7)
    
    real rowvector function myfun(real rowvector p){
        return(1*sqrt(p):*(1:-p))
    }
    
    myvals = asarray_create("real",1,1e7)
    for (i=1;i<=1e7;i++){
        asarray(myvals,i,myfun(pg[i]))
    }
    My instinct is to run the loop above to populate "myvals". However, the loop slows down dramatically over time (making this infeasible for my application). Try running this code to see what I mean:

    Code:
    myvals = asarray_create("real",1,1e7)
    n_iter = 10
    
    timer_clear(1)
    timer_on(1)
    
    timer_clear(2)
    timer_on(2)
    for (iter=1;iter<=n_iter;iter++){
        for (i=1+(iter-1)*1e6;i<=iter*1e6;i++){
            asarray(myvals,i,myfun(pg[i]))
        }
        timer_off(1)
        timer_value(1)
        printf(strofreal(iter))
        timer_clear(1)
        timer_on(1)
    }
    timer_off(2)
    timer_value(2)
    On my computer the final iteration of the loop (iter=10) takes 170% as long as the first (iter=1).

    Each iteration should take the same amount of time. Compare with a loop that evaluates the function without assignment to an asarray:

    Code:
    n_iter = 10
    
    timer_clear(1)
    timer_on(1)
    
    timer_clear(2)
    timer_on(2)
    for (iter=1;iter<=n_iter;iter++){
        for (i=1+(iter-1)*1e6;i<=iter*1e6;i++){
            x = myfun(pg[i])
        }
        timer_off(1)
        timer_value(1)
        printf(strofreal(iter))
        timer_clear(1)
        timer_on(1)
    }
    timer_off(2)
    timer_value(2)
    I see roughly the same time for every iteration.

    So, my question is: What am I doing wrong? Is the size I'm setting in the third argument of asarray_create() not sufficient to warn Mata of how many keys I will have? Because I think I have set the minsize correctly, I've tried disabling downsizing and upsizing (as mentioned in the docs), but to no avail.

  • #2
    Since you initialize the hash table with 1e7 slots and then fill it with 1e7 elements, I believe you are dealing with collisions. The hashing function will not necessarily map each of your 1e7 keys to its own slot, even though the table has 1e7 slots, so lookups and insertions get slower as the table fills. Try a bigger initial table size. I played around with different values and found that 1e8 yielded only about a 10% time difference between the first and last group.
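
    A minimal sketch of that change, reusing pg and myfun() from #1. The only difference is the third argument (minsize) of asarray_create(). The value 1e8 is simply the one tried above, not a tuned recommendation, and a table that large will presumably use more memory.

    Code:
    // same population loop as in #1, but with about ten slots per key
    // (1e8 is the value tried in this reply; adjust to taste)
    myvals = asarray_create("real", 1, 1e8)
    for (i=1; i<=1e7; i++) {
        asarray(myvals, i, myfun(pg[i]))
    }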


    • #3
      Okay, thanks Jeff; I'll try that. I didn't realize hashes could/would run into collisions in that case.
