  • Speeding up a binomial-based likelihood calculation

    Greetings,

    I'm trying to improve the time efficiency of a single line of Mata code that accounts for about half of total run time in an ML estimation program I've written. I'd like some advice on tweaking the Mata code that calculates the LL. The LL function comes from a binomial model and the part of the LL at issue is:

    \[V = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left[\, A_{ij}\ln(P_{ij}) + (K - A_{ij})\ln(1 - P_{ij}) \,\right]\]

    where N is the number of subjects. The Aij and Pij are already calculated and are obtained from N x N symmetric matrices. K is an integer. The Mata code I'm currently using here is just one line:
    Code:
    V = sum(uppertriangle((A :* ln(P)) + (K :- A) :* ln(1 :- P), 0))
    I know there is some inefficiency here, since I do the multiplication for the whole N x N A and P matrices and then just sum the upper (or lower) part I need, but I'm stuck on a good way to avoid that or otherwise improve this. As I indicated above, this one line accounts for about half of total run time for a process that can be quite slow for large N (say N = 500), so improving it would be very helpful.

    So, this is sort of a speed challenge. Here's some code to create a data example if you want to mess with this:
    Code:
    mata:
    // given
    K = 20
    N = 500
    // Fake data for A and P
    A = makesymmetric(floor(K * lowertriangle(runiform(N,N))))
    P = makesymmetric(lowertriangle(runiform(N,N)))
    // LL
    V = sum(uppertriangle((A :* ln(P)) + (K :- A) :* ln(1 :- P), 0))
    end
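    Just as a sanity check (a small sketch with a tiny N, not part of the timing exercise), the one-liner can be compared against explicit loops over i < j written straight from the LL formula above:
    Code:
    mata:
    K = 20
    N = 10                                  // tiny N just for checking
    A = makesymmetric(floor(K * lowertriangle(runiform(N,N))))
    P = makesymmetric(lowertriangle(runiform(N,N)))
    // one-liner over the strict upper triangle
    V1 = sum(uppertriangle((A :* ln(P)) + (K :- A) :* ln(1 :- P), 0))
    // explicit double sum over i < j, as in the LL formula
    V2 = 0
    for (i = 1; i <= N-1; i++) {
        for (j = i+1; j <= N; j++) {
            V2 = V2 + A[i,j]*ln(P[i,j]) + (K - A[i,j])*ln(1 - P[i,j])
        }
    }
    V1, V2                                  // the two should agree
    end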
    Regards, Mike


  • #2
    Because matrices A and P are symmetric, we only need their strictly lower (or upper) triangles for the calculations. I use the undocumented vech_lower() function that was recently added to Mata, so Mike should make sure that his Stata is up to date.

    Code:
    clear all
    set seed 12345
    mata:
        k = 20
        n = 5000
        X = runiform(n,n)
        Y = runiform(n,n)
        A = floor(k*vech_lower(X))
        P = vech_lower(Y)
        LL = sum( (A:*ln(P)) + (k:-A):*ln(1:-P) )
    end
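    (As a quick check that this reproduces the quantity in #1, the vech_lower() vectors can be plugged into Mike's symmetric example; a sketch, assuming vech_lower() returns the strictly lower triangle as a column vector:)
    Code:
    mata:
        K = 20
        N = 500
        A = makesymmetric(floor(K * lowertriangle(runiform(N,N))))
        P = makesymmetric(lowertriangle(runiform(N,N)))
        AV = vech_lower(A)               // strictly lower triangle, vectorized
        PV = vech_lower(P)
        V1 = sum((AV :* ln(PV)) + (K :- AV) :* ln(1 :- PV))
        V2 = sum(uppertriangle((A :* ln(P)) + (K :- A) :* ln(1 :- P), 0))
        V1, V2                           // should match
    end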



    • #3
      Thanks, I'll look forward to trying this out in my program and reporting back. I would like the program to be able to run under less recent versions of Stata (say v. 12), where only vech() is available, which includes the main diagonal of the matrix in the vector. I need to exclude the main diagonal. Is my best bet there to do something like P = vech(Y - diag(Y)) ?

      Regards, Mike



      • #4
        This should be a bit faster.

        Code:
        _diag(X,0)
        P = vech(X)
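
        (If, as in #3, the diagonal entries also need to be left out of the vector entirely rather than just zeroed, one possible sketch for older Stata is to select them out with a 0/1 mask; the mask construction below is only a suggestion and is untested on older versions.)
        Code:
        n = rows(X)
        _diag(X, 0)                                // zero the diagonal in place
        mask = vech(lowertriangle(J(n, n, 1), 0))  // 1 below the diagonal, 0 on it
        P = select(vech(X), mask)                  // keep only the strictly lower entries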



        • #5
          Good news and confusing news: I've implemented the vech_lower() approach, and it saves about 50% of the time at relatively smaller N (say up to 400 or so), but above that it only saves about 20% or so, which confuses me. Any insight, Rafal or others?
          In my application, larger sample sizes are where the time saving really gets interesting, so I'm mildly disappointed <grin>. I created some code from my example to demonstrate this issue (up-to-date Stata 14 MP2 on a Windows machine). Here are the timings, followed by the test code:
          Code:
          N     time1    time2    % reduction in time
          100    1.4     0.725    48
          200    2.54    1.23     52
          400    8.22    4.03     51
          800    34.7    21.6     38
          1600   145     103      29
          3200   562     432      23

          time1 is the time in seconds under the original approach, with multiplication of whole matrices, while time2 is the time under the code using the new vech_lower() approach; the test code is below:

          Code:
          clear all
          set seed 12345
          mata:
             timer_clear(1)
             timer_clear(2)
             reps  = 1000   // to take more time
              k = 20
              for (n = 100; n <=3200; n = n * 2.0) {   // various size ns
                A  = floor(k * runiform(n,n))
                P = runiform(n,n)
                for (i = 1; i <= reps; i++) {
                      timer_on(2)  // vech_lower() approach (time2 in the table)
                      PV = vech_lower(P)
                      AV = vech_lower(A)
                      LL = sum( (AV:*ln(PV)) + (k:-AV):*ln(1:-PV) )
                      timer_off(2)
                      //
                      timer_on(1)  // whole-matrix approach (time1 in the table)
                      LL = sum(lowertriangle((A :* ln(P)) + (k :- A) :* ln(1 :- P), 0))
                      timer_off(1)
                 }  
                 // crude display of results
                 n
                 timer()
                 ""
                 timer_clear(1); timer_clear(2)
             }    
          end


          Regards, Mike



          • #6
            The vech()-type functions are not parallelized, thus the speedup will be lower for large n.



            • #7
              I'm writing to close this thread, with what I hope will be some useful information for others:

              1) Because vech_lower() was only recently added to Mata, my program using vech_lower() would not run on an earlier version of Stata. This problem appears to be easily solved: The code for vech_lower() is available as vech_lower.mata. I simply inserted that code into my program, called it myvech_lower(), and that should take care of that issue.

              2) About the speed issue I mentioned above: At medium size N, the advantages of vech_lower() were substantial, saving about 50% of execution time over an approach in which the whole square matrix was multiplied. However, for larger sizes of the square symmetric matrix, the cost of extracting the lower half into a vector with vech_lower() starts to outweigh the benefit of avoiding redundant operations on the upper half of the matrix. Obviously, the break-even point of cost vs. benefit will depend on the particular computation being done with the square matrix. In my case, the vech_lower() approach was generally about 50% faster up to something like a 400 X 400 square matrix (my A matrix above), was only about 30% better for a 1600 X 1600, and was about 10% worse with a 3200 X 3200 matrix. This is clearly a case where one's mileage will vary, but it's worth doing a few tests if your usage of vech_lower() occurs in a speed-crucial context.
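
              (One rough way to see this trade-off at a given n is to time the extraction step and the elementwise arithmetic separately; just a sketch, with numbers that will of course vary by machine and by n, and that could be wrapped in a reps loop for stabler timings:)
              Code:
              mata:
                  k = 20
                  n = 3200
                  A = floor(k * runiform(n,n))
                  P = runiform(n,n)
                  timer_clear()
                  timer_on(1)           // cost of extracting the strict lower triangles
                  AV = vech_lower(A)
                  PV = vech_lower(P)
                  timer_off(1)
                  timer_on(2)           // cost of the elementwise LL arithmetic on the vectors
                  LL = sum((AV :* ln(PV)) + (k :- AV) :* ln(1 :- PV))
                  timer_off(2)
                  timer()               // compare timer 1 (extraction) with timer 2 (arithmetic)
              end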

              Regards, Mike



              • #8
                I need to make a correction to my comments on vech_lower(), thanks to a message from Rafal Raciborski.
                One can't simply substitute in the new code for vech_lower() and use it under older versions: vech_lower() needs to call a newer version of vech(), which uses different code in the executable.

