
  • Why does st_addvar(,,nofill=1) fill values with zero?

    Hi,
    I just tried to generate generic mock data as fast as possible with my up-to-date Stata 13.1 MP for the Mac. I thought the fastest (though irreproducible) way would be to declare the variables and let Stata use whatever values it finds at their allocation in RAM. Instead, when I used -st_addvar()- with the nofill option, I got variables filled with zeros (instead of . as with nofill=0). Why is this option useful, then? Or is this OS-specific, and the Mac (OS 10.10) is careful enough to erase RAM before it gives it to Stata? Or does Stata erase it before Mata (re)uses it? Or did I just get unlucky enough to be allocated a blank part of RAM?

    Yes, setting up a view after -st_addvar- and using -runiform- for all variables at once is not prohibitive, but still slower.

    Otherwise, the fastest way that I know of to generate 19 random variables for 100,000,000 observations in Stata is:
    Code:
    clear all
    set obs 100000000
    mata:
    idx = st_addvar("double",("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14","x15","x16","x17","x18","x19"),1)
    V = J(0,0,.)
    st_view(V,.,idx)
    V[.,.] = runiform(100000000,19)
    end

  • #2
    Hi László,

    See the following code for some benchmarks. I'm far from an expert, so take any comments of mine with a heavy grain of salt.

    First, see test3() below for st_addvar() with the nofill option. The observations of the generated variables I get, when just using what was previously allocated in RAM, are all nonzero. I'm using a PC, with Stata 13.1. If RAM is already allocated to the relevant Stata instance, the variables are created instantaneously.

    See below for 3 test programs: test1() is your version using runiform() and st_view(); test2() is another version that uses runiform() but saves directly to Stata using st_store(); and test3() is a version that just allocates RAM using the nofill=1 option of st_addvar().

    Results
    Benchmarked over 3 repetitions, for 10^8 observations
    • test1() total time: 74.40s
    • test2() total time: 63.48s
    • test3() total time: 0.00s
    Define Programs
    Code:
    clear all
    
    *******************
    * Define Programs *
    *******************
    * test1(): method using st_view() and runiform(N,19)
    * test2(): method using st_store() directly and runiform(N,19)
    * test3(): method using st_addvar() with nofill=1 only, taking somewhat "random" content already in RAM
    
    
    mata
        void test1() {
            N = 10^8
            
            // Skip initializations and store to view using st_view()
            idx = st_addvar("double",("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14","x15","x16","x17","x18","x19"),1)
            V = J(0,0,.)
            st_view(V,.,idx)
            V[.,.] = runiform(N,19)
        }
        void test2() {
            // Initializations
            real scalar N
            real vector idx
            real matrix X
            N = 10^8
            
            // Store data directly with st_store()
            st_store(.,st_addvar("double",("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14","x15","x16","x17","x18","x19"),1),runiform(N,19))
        }
        void test3() {
            // Use not so "random" content already in RAM
            (void) st_addvar("double",("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","x12","x13","x14","x15","x16","x17","x18","x19"),1)
        }
    end
    Benchmark Programs
    Code:
    **********************
    * Benchmark Programs *
    **********************
    
    * Set parameters
    local N = 10^8
    local reps = 3
    
    * Ensure that RAM allocation time isn't a factor in comparisons
    set min_memory 20g
    set obs `N'
    mata test3()
    
    * Time test1()
    forval i = 1/`reps' {
        clear
        set obs `N'
        timer on 1`i'
        mata test1()
        timer off 1`i'
    }
    
    * Time test2()
    forval i = 1/`reps' {
        clear
        set obs `N'
        timer on 2`i'
        mata test2()
        timer off 2`i'
    }
    
    * Time test3()
    forval i = 1/`reps' {
        clear
        set obs `N'
        timer on 3`i'
        mata test3()
        timer off 3`i'
    }
    
    timer list
    local reps = 3
    
    * Display total times over all reps *
    qui foreach prog in 1 2 3 {
        local total = 0
        forval i = 1/`reps' {
            local total = `total'+r(t`prog'`i')
        }
        noi display as text "test`prog'() total time: " %5.2f `total' "s"
    }
    *************************************
    exit


    • #3
      Excellent, thanks! I definitely learnt that st_store is faster than st_view; I wonder why I had a different impression.

      But also, it is fascinating to learn that 'nofill' actually does what I expected (and is superfast for benchmarking some code on big data, when reproducibility is not a key concern), with the caveat that it depends on the OS giving it "random" data, and apparently my system gave Stata/Mata some cleared memory space. This is actually good for security: I am not sure I want applications to have access to the remnants of the stack of another application. My understanding, though, was that it is the responsibility of the application to erase sensitive data if necessary, and to let the OS be as fast as possible, without erasing memory. Fascinating.
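
      One way to check what nofill=1 leaves behind on a particular machine is to read the new variables back immediately and count the zeros and missing values. The following is only a minimal sketch: the 1000-observation dataset and the variable names z1 and z2 are assumptions chosen for illustration, not taken from the posts above, and the counts it prints will depend on the OS and on what the Stata instance had in memory before.

      ```mata
      clear all
      set obs 1000
      mata:
      // Add two doubles without initialization (nofill=1); z1 and z2 are
      // hypothetical names used only for this check.
      idx = st_addvar("double", ("z1", "z2"), 1)

      // Copy the freshly added variables into a Mata matrix for inspection.
      X = st_data(., idx)

      // Report how many cells are exactly zero and how many are missing.
      printf("zeros:   %g of %g cells\n", sum(X :== 0), rows(X) * cols(X))
      printf("missing: %g\n", missing(X))
      end
      ```

      If every cell comes back as zero, the memory was most likely handed over already cleared; a mix of values would be consistent with reused memory, as reported in #2.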


      • #4
        A couple side notes that have come up for me before that maybe someone could comment on:

        A fine point on memory usage
        I'd be very interested if someone could comment on this. The issue has to do with how assignment happens from the right hand side to the left hand side of the equation. In the earlier code, we first allocate memory to the variables in an amount equal to 15.2GB, or 19 variables * 10^8 observations * 8 bytes per element / (10^9 bytes in a gigabyte). However, if you run test1() or test2() while watching task manager, memory usage will peak around 30.4GB!

        The issue is in the line V[.,.] = runiform(N,19). Before the line is executed, V exists and is taking up 15.2GB of space. But before runiform(N,19) can be written to V, it must itself be constructed. So for a moment both the left-hand side and the right-hand side exist independently: a 15.2GB chunk of RAM for V and another 15.2GB chunk of RAM for runiform(N,19). This seems incredibly wasteful. Why can't the right-hand side be written directly into the memory identified by the left-hand side (thereby requiring only 15.2GB peak memory usage rather than 30.4GB)?
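
        A possible workaround for the doubled peak, sketched here under assumptions rather than benchmarked, is to fill the variables one column at a time so that the temporary on the right-hand side is N x 1 instead of N x 19. The helper name fill_by_column() is hypothetical, and the sketch uses 1000 observations so that it runs quickly.

        ```mata
        clear all
        set obs 1000
        mata:
        // Hypothetical helper (not from the thread): add k double variables
        // and fill them column by column to keep the temporary small.
        void fill_by_column(real scalar k)
        {
            real scalar    j
            real rowvector idx

            // nofill=1 skips initialization; every value is overwritten below.
            idx = st_addvar("double", "x" :+ strofreal(1..k), 1)

            // Each pass builds only an N x 1 temporary and stores it directly.
            for (j = 1; j <= k; j++) {
                st_store(., idx[j], runiform(st_nobs(), 1))
            }
        }

        fill_by_column(19)
        end
        ```

        With N = 10^8, each temporary column would be 10^8 * 8 bytes = 0.8GB, so the peak should sit near 15.2GB + 0.8GB rather than 30.4GB, assuming Mata releases each temporary before the next one is built. The random draws may also land in a different arrangement than with a single runiform(N,19) call, so results for a given seed should not be expected to match.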

        A question on nofill
        nofill has been very useful to me for writing more efficient code in cases where all elements of the object being created are about to be replaced. nofill is available in st_addvar() and st_addobs(). Why is it not also available for Mata's J() function? (As an aside, would this somewhat parallel C's malloc vs calloc functions?)


        • #5
          Originally posted by Andrew Maurer
          Before the line is executed, V exists and is taking up 15.2GB of space.
          You raise two great points. That said, I think this specific sentence is wrong. V exists, but is very small before this line is executed. It is still a shame Mata is not smarter about avoiding certain copies. (Though they definitely try hard to economize on copies, hence views in the first place…)
