  • 5 minimum values of a vector or matrix

    Hi,

    I am using Mata in Stata 12. I have a matrix, say 10x10, and for each row I want a value of '1' for the five smallest values and '0' otherwise. A similar past post (http://www.stata.com/statalist/archi.../msg00418.html) has code that assigns '1' to the smallest value of a row, but I couldn't figure out how to do this for the five smallest values. Any help will be greatly appreciated. Thanks!

  • #2
    Doing it for the columns is not too difficult with the order() function.

    Here is an example, pulling a 10 by 10 matrix from the auto data:

    Code:
    sysuse auto
    drop make
    mata:
    X = st_data(1::10, 1..10)
    X
    r = rows(X)
    c = cols(X)
    sel = J(r,c,0)
    for (i=1; i<=c; i++) {
            o = order(X,i)
            sel[o[1..5],i] = J(5,1,1)
    }
    sel
    end
    I imagine you could just work with a transposed X matrix, then
    just transpose sel for the matrix you are asking for.
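A minimal sketch of that transposition idea, assuming a random 10 x 10 matrix stands in for the data (the names Xt and the use of runiform() are illustrative, not part of the example above). It tags the five smallest values in each row of X by working on the columns of X' and transposing the result back:

```mata
mata:
X   = runiform(10, 10)          // stand-in for the 10 x 10 data matrix
Xt  = X'                        // rows of X become columns of Xt
sel = J(rows(Xt), cols(Xt), 0)
for (i=1; i<=cols(Xt); i++) {
        o = order(Xt, i)        // row numbers of Xt sorted by column i
        sel[o[1..5], i] = J(5, 1, 1)
}
sel = sel'                      // back to the orientation of X
sel
rowsum(sel)                     // each row should sum to 5
end
```

The rowsum() call is only a sanity check: every row of sel should contain exactly five ones.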



    • #3
      I'll just add another example in case efficiency is a concern. Method 1 below makes a single pass through each column (O(n) time, I think). Method 2 is, I hope, what Jeff described, using the order() function (O(n log n) time, presumably because of the sort).

      In one benchmark of 5 repetitions on a 10^6 x 10 matrix, the results were:
      Method 1: 11.16sec
      Method 2: 42.53sec
      If ties at the largest of the selected values would push the number of tagged observations over 5, only 5 are tagged (e.g., if your input vector were (1,2,3,4,4,4,4,5), then (1,1,1,1,1,0,0,0) would be returned rather than (1,1,1,1,1,1,1,0)).

      Method 1: manual loop 5-minimum
      Code:
      clear all
      set processors 1
      mata
      real matrix tag5lowest_mat(real matrix X) {
          
          // Initializations
          real scalar i, c, r
          real matrix out
          
          // Set up objects
          r = rows(X)
          c = cols(X)
          out = J(r,c,.)
          
          // Loop and get each vector of minimums individually
          for (i=1;i<=c;i++) out[.,i] = _tag5lowest_vec(X[.,i])
          
          // Return result
          return(out)
          
      }
      real vector _tag5lowest_vec(real vector input) {
          
          // Initializations
          real scalar N, i
          real vector out
          real matrix res
          
          // Set up objects
          N = rows(input)
          res = J(5,2,.) // results vector. col1 = value, col2 = position in input
          
          // Short input: with 5 or fewer elements, every element is among the 5 smallest
          if (N<=5) return(J(N,1,1))
          
          // Loop once through input vector
          // -    update the results matrix if input[i] is smaller than 
          //        previous largest result
          for (i=1;i<=N;i++) {
              if (input[i] < res[5,1]) {
                  res[5,1] = input[i]
                  res[5,2] = i
                  _sort(res,1)
              }
          }
          
          // Return a binary vector the same length as input 
          // that tags the 5 smallest obs
          out = J(N,1,0)
          out[res[.,2]] = J(5,1,1)
          return(out)
      }
      end
      Method 2: use mata's order() function (ref Jeff Pitblado)
      Code:
      mata
      real matrix tag5lowest_ordermethod(real matrix X) {
          
          // Initializations
          real scalar r, c, i
          real matrix sel, o
          
          // Set up objects
          r = rows(X)
          c = cols(X)
          sel = J(r,c,0)
          
          // Loop through each column and use order() function
          for (i=1; i<=c; i++) {
                  o = order(X,i)
                  sel[o[1..5],i] = J(5,1,1)
          }
          return(sel)
      }
      end
      Benchmark the two methods
      Code:
      // Benchmarking parameters
      local reps = 5                             // number of repetitions
      mata A = ceil(10^7*runiform(10^6,10))    // size of matrix to get min of
      
      * Method 1: manual loop 5-minimum
      *     - O(n) time
      forval i = 1/`reps' {
          timer on 1
          qui mata tag5lowest_mat(A)
          timer off 1
      }
      
      * Method 2: use mata's order() function
      *     - O(nlogn) time due to the sort?
      forval i = 1/`reps' {
          timer on 2
          qui mata tag5lowest_ordermethod(A)
          timer off 2
      }
      
      timer list
      // 1: 11.16sec 
      // 2: 42.53sec
      
      exit
      ********************** End Code **************************************



      • #4
        Another possibility might be to just use the Mata minindex() function on each row in turn. For example:
        Code:
        mata:
        Z=runiform(10,10)
        Mins=J(10,10,0)
        for (i=1;i<=rows(Z);i++) {
            minindex(Z[i,],5,min=.,junk=.)
            Mins[i,min]=J(1,5,1)
        }
        end
        The matrix Mins should have ones where the 5 smallest row values are. Just my two cents!
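One caveat, as I understand the documented tie handling of minindex(): when a row contains repeated values, the returned index vector can have more than five elements, and the assignment of J(1,5,1) would then fail with a conformability error. A hedged sketch of one way around this is to keep only the first five indices. The names idx and w and the small test matrix Z are made up for illustration:

```mata
mata:
// First row has ties among the small values; second row has none
Z    = (1, 2, 3, 4, 4, 4, 4, 5, 9, 9) \ (10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
Mins = J(rows(Z), cols(Z), 0)
idx  = .                        // filled in by minindex()
w    = .                        // tie information, not used here
for (i=1; i<=rows(Z); i++) {
        minindex(Z[i,], 5, idx, w)
        // keep only the first five indices so that exactly five are tagged
        Mins[i, idx[1..5]] = J(1, 5, 1)
}
Mins
rowsum(Mins)                    // each row should sum to 5
end
```

Which of the tied values ends up tagged depends on the order in which minindex() lists them, so this only guarantees five ones per row, not a particular choice among ties.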

        Matt Baker



