  • Speeding up a binomial-based likelihood calculation

    Greetings,

    I'm trying to improve the time efficiency of a single line of Mata code that accounts for about half of total run time in an ML estimation program I've written. I'd like some advice on tweaking the Mata code that calculates the LL. The LL function comes from a binomial model and the part of the LL at issue is:

    \[V = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left[\, A_{ij}\ln(P_{ij}) + (K - A_{ij})\ln(1 - P_{ij}) \,\right]\]

    where N is the number of subjects. The Aij and Pij are already calculated and are obtained from N x N symmetric matrices. K is an integer. The Mata code I'm currently using here is just one line:
    Code:
    V = sum(uppertriangle((A :* ln(P)) + (K :- A) :* ln(1 :- P), 0))
    I know there is some inefficiency here, since I do the multiplication for the whole N x N A and P matrices and then just sum the upper (or lower) part I need, but I'm stuck on a good way to avoid that or otherwise improve this. As I indicated above, this one line accounts for about half of total run time for a process that can be quite slow for large N (say N = 500), so improving it would be very helpful.

    So, this is sort of a speed challenge. Here's some code to create a data example if you want to mess with this:
    Code:
    mata:
    // given
    K = 20
    N = 500
    // Fake data for A and P
    A = makesymmetric(floor(K * lowertriangle(runiform(N,N))))
    P = makesymmetric(lowertriangle(runiform(N,N)))
    // LL
    V = sum(uppertriangle((A :* ln(P)) + (K :- A) :* ln(1 :- P), 0))
    end
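    Just as a sanity check (a small sketch with a tiny N, not part of the timing exercise), the one-liner can be compared against explicit loops over i < j written straight from the LL formula above:
    Code:
    mata:
    K = 20
    N = 10                                  // tiny N just for checking
    A = makesymmetric(floor(K * lowertriangle(runiform(N,N))))
    P = makesymmetric(lowertriangle(runiform(N,N)))
    // one-liner over the strict upper triangle
    V1 = sum(uppertriangle((A :* ln(P)) + (K :- A) :* ln(1 :- P), 0))
    // explicit double sum over i < j, as in the LL formula
    V2 = 0
    for (i = 1; i <= N-1; i++) {
        for (j = i+1; j <= N; j++) {
            V2 = V2 + A[i,j]*ln(P[i,j]) + (K - A[i,j])*ln(1 - P[i,j])
        }
    }
    V1, V2                                  // the two should agree
    end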
    Regards, Mike


  • #2
    Because matrices A and P are symmetric, we only need their strictly lower (or upper) triangles for the calculations. I use the undocumented vech_lower() function that was recently added to Mata, so Mike should make sure that his Stata is up to date.

    Code:
    clear all
    set seed 12345
    mata:
        k = 20
        n = 5000
        X = runiform(n,n)
        Y = runiform(n,n)
        A = floor(k*vech_lower(X))
        P = vech_lower(Y)
        LL = sum( (A:*ln(P)) + (k:-A):*ln(1:-P) )
    end
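    (As a quick check that this reproduces the quantity in #1, the vech_lower() vectors can be plugged into Mike's symmetric example; a sketch, assuming vech_lower() returns the strictly lower triangle as a column vector:)
    Code:
    mata:
        K = 20
        N = 500
        A = makesymmetric(floor(K * lowertriangle(runiform(N,N))))
        P = makesymmetric(lowertriangle(runiform(N,N)))
        AV = vech_lower(A)               // strictly lower triangle, vectorized
        PV = vech_lower(P)
        V1 = sum((AV :* ln(PV)) + (K :- AV) :* ln(1 :- PV))
        V2 = sum(uppertriangle((A :* ln(P)) + (K :- A) :* ln(1 :- P), 0))
        V1, V2                           // should match
    end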



    • #3
      Thanks, I'll look forward to trying this out in my program and reporting back. I would like the program to be able to run under less recent versions of Stata (say v. 12), where only vech() is available, which includes the main diagonal of the matrix in the vector. I need to exclude the main diagonal. Is my best bet there to do something like P = vech(Y - diag(Y)) ?

      Regards, Mike



      • #4
        This should be a bit faster.

        Code:
        _diag(X,0)
        P = vech(X)
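
        (If, as in #3, the diagonal entries also need to be left out of the vector entirely rather than just zeroed, one possible sketch for older Stata is to select them out with a 0/1 mask; the mask construction below is only a suggestion and is untested on older versions.)
        Code:
        n = rows(X)
        _diag(X, 0)                                // zero the diagonal in place
        mask = vech(lowertriangle(J(n, n, 1), 0))  // 1 below the diagonal, 0 on it
        P = select(vech(X), mask)                  // keep only the strictly lower entries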



        • #5
          Good news and confusing news: I've implemented the vech_lower() approach, and it saves about 50% of the time at relatively smaller N (say up to 400 or so), but above that it only saves about 20% or so, which confuses me. Any insight, Rafal or others?
          In my application, larger sample sizes are where the time saving really gets interesting, so I'm mildly disappointed <grin>. I created some code from my example to demonstrate this issue (up-to-date Stata 14 MP2 on a Windows machine). Here are the timings, followed by the test code:
          Code:
          N     time1    time2    % reduction in time
          100    1.4     0.725    48
          200    2.54    1.23     52
          400    8.22    4.03     51
          800    34.7    21.6     38
          1600   145     103      29
          3200   562     432      23

          time1 is the time in seconds under the original approach, with multiplication of whole matrices, while time2 is the time under the code using the new vech_lower() approach; the test code is below:

          Code:
          clear all
          set seed 12345
          mata:
             timer_clear(1)
             timer_clear(2)
             reps  = 1000   // to take more time
              k = 20
              for (n = 100; n <=3200; n = n * 2.0) {   // various size ns
                A  = floor(k * runiform(n,n))
                P = runiform(n,n)
                for (i = 1; i <= reps; i++) {
                      timer_on(2)  // vech_lower() approach (time2 in the table)
                      PV = vech_lower(P)
                      AV = vech_lower(A)
                      LL = sum( (AV:*ln(PV)) + (k:-AV):*ln(1:-PV) )
                      timer_off(2)
                      //
                      timer_on(1)  // whole-matrix approach (time1 in the table)
                      LL = sum(lowertriangle((A :* ln(P)) + (k :- A) :* ln(1 :- P), 0))
                      timer_off(1)
                 }  
                 // crude display of results
                 n
                 timer()
                 ""
                 timer_clear(1); timer_clear(2)
             }    
          end


          Regards, Mike



          • #6
            The vech()-type functions are not parallelized, thus the speedup will be lower for large n.



            • #7
              I'm writing to close this thread, with what I hope will be some useful information for others:

              1) Because vech_lower() was only recently added to Mata, my program using vech_lower() would not run on an earlier version of Stata. This problem appears to be easily solved: The code for vech_lower() is available as vech_lower.mata. I simply inserted that code into my program, called it myvech_lower(), and that should take care of that issue.

              2) About the speed issue I mentioned above: At medium size N, the advantages of vech_lower() were substantial, saving about 50% of execution time over an approach in which the whole square matrix was multiplied. However, for larger sizes of the square symmetric matrix, the cost of extracting the lower half into a vector with vech_lower() starts to outweigh the benefit of avoiding redundant operations on the upper half of the matrix. Obviously, the break-even point of cost vs. benefit will depend on the particular computation being done with the square matrix. In my case, the vech_lower() approach was generally about 50% faster up to something like a 400 X 400 square matrix (my A matrix above), was only about 30% better for a 1600 X 1600, and was about 10% worse with a 3200 X 3200 matrix. This is clearly a case where one's mileage will vary, but it's worth doing a few tests if your usage of vech_lower() occurs in a speed-crucial context.
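
              (One rough way to see this trade-off at a given n is to time the extraction step and the elementwise arithmetic separately; just a sketch, with numbers that will of course vary by machine and by n, and that could be wrapped in a reps loop for stabler timings:)
              Code:
              mata:
                  k = 20
                  n = 3200
                  A = floor(k * runiform(n,n))
                  P = runiform(n,n)
                  timer_clear()
                  timer_on(1)           // cost of extracting the strict lower triangles
                  AV = vech_lower(A)
                  PV = vech_lower(P)
                  timer_off(1)
                  timer_on(2)           // cost of the elementwise LL arithmetic on the vectors
                  LL = sum((AV :* ln(PV)) + (k :- AV) :* ln(1 :- PV))
                  timer_off(2)
                  timer()               // compare timer 1 (extraction) with timer 2 (arithmetic)
              end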

              Regards, Mike



              • #8
                I need to make a correction to my comments on vech_lower(), thanks to a message from Rafal Raciborski.
                One can't simply substitute in the new code for vech_lower() and use it under older versions: vech_lower() needs to call a newer version of vech(), which uses different code in the executable.

