
  • #16
    Yes, the name fastxtile also confused me. I thought I was comparing the same program.

    1. The timings for the new astile and fastxtile (Michael Stepner) are given below.

    Code:
    clear
    set seed 1234
    set obs 1000
    gen year=_n+1000
    expand 1000
    gen size=uniform()*100
    timer clear
    
    * timer 1: fastxtile
    timer on 1
    fastxtile ft10=size, nq(10)
    timer off 1
    
    * timer 2: astile
    timer on 2
    astile as10=size, nq(10)
    timer off 2
    
    timer list
    *    1:      8.73 /        1 =       8.7270
    *    2:      6.48 /        1 =       6.4800
    
    * both commands must assign identical deciles
    assert ft10 ==as10
    2. Yes, I am using two different machines. The first post was based on my newest machine, which I have in my office. All other tests, including today's, were run on my home computer, which is half as fast as my office machine.
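
    A single run per command can be noisy, so one option is to repeat each command a few times and let timer report the average. The following is only a sketch under assumptions: both fastxtile and astile are already installed, and the number of repetitions (5) and the use of runiform() are illustrative choices, not part of the tests above.

    ```stata
    * Sketch: average several runs instead of timing a single run
    clear
    set seed 1234
    set obs 1000000
    gen size = runiform()*100

    timer clear
    local reps 5                      // illustrative choice
    forvalues r = 1/`reps' {
        capture drop ft10 as10        // remove results of the previous repetition
        timer on 1
        fastxtile ft10 = size, nq(10)
        timer off 1
        timer on 2
        astile as10 = size, nq(10)
        timer off 2
    }

    * both commands should assign identical deciles
    assert ft10 == as10

    * each line shows: total seconds / number of runs = average seconds
    timer list
    ```

    Because timer accumulates across on/off pairs, the last column of timer list is the mean time per run for each command.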
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



    • #17
      Attaullah Shah, I can see from your work on astile that it can be quite useful to have a by-able version of xtile. So I've made some tweaks to fastxtile to make it by-able as well. The updated by-able version is currently available from the development branch of the fastxtile Github repository. It will eventually be available in the SSC as well, but for now it can be installed directly by running:

      Code:
      net install fastxtile, from("https://github.com/michaelstepner/fastxtile/raw/develop") replace
      In the Github repository, you will also find a file called "test_fastxtile.do" that runs a battery of tests comparing the speed of fastxtile to xtile and ensuring that fastxtile accurately matches the xtile results. I adapted that code to compare fastxtile with astile, and have posted the speed comparison results here (3 files: a table comparing runtimes, a log, and the code).
      • You can see that the speeds of astile and fastxtile are similar: usually within a factor of two.
      • astile is faster on small datasets that take a fraction of a second to process on my computer.
      • fastxtile is faster as datasets get larger, including all the datasets that take more than a second to process on my computer. (Of course the precise run times will differ on different systems.)
      • You can also see in the log that astile crashed when asked to calculate 2 quantile bins for a dataset with 10,000,000 observations, 1 variable, and 2 by-groups.
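
      A check of this kind, comparing by-group results against the built-in xtile, might look like the sketch below. It is not taken from test_fastxtile.do. It assumes the by-able development version of fastxtile is installed, and the variable names (grp, x, q_fast, q_ref) and the dataset size are made up for illustration. Since xtile itself is not by-able, the reference values are built one group at a time.

      ```stata
      * Sketch: compare by-able fastxtile with xtile run group by group
      clear
      set seed 2017
      set obs 100000
      gen byte grp = mod(_n, 2)        // two by-groups (hypothetical)
      gen x = runiform()

      * assumed: by-able fastxtile from the develop branch
      bysort grp: fastxtile q_fast = x, nq(10)

      * reference: built-in xtile, one group at a time
      gen q_ref = .
      levelsof grp, local(groups)
      foreach g of local groups {
          tempvar q
          xtile `q' = x if grp == `g', nq(10)
          replace q_ref = `q' if grp == `g'
          drop `q'
      }

      * the two sets of deciles are expected to agree
      assert q_fast == q_ref
      ```

      The group-by-group loop is the slow part on large data, which is the motivation for a by-able command in the first place.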

      I've also run a speed test comparing fastxtile and xtile, and posted the results here. The test is incomplete: I ran it overnight and interrupted it when I woke up in the morning. It takes so long because xtile is very slow at some of these large computations: slower than fastxtile by a factor of 100 (2 orders of magnitude) with 10,000,000 observations, 100 quantiles and 20 by-groups. I'll complete this speed test on a server when I publish the updated version of fastxtile to the SSC, so I can publish a full companion table of speed comparisons.

      At the moment, it appears that fastxtile and astile are both leaps and bounds faster than the built-in xtile, while reproducing its results. But fastxtile is a bit faster than astile in the cases where it matters most: when the data is big and the command takes a non-trivial amount of time to run.

      I develop fastxtile in the open on Github: https://github.com/michaelstepner/fa...e/tree/develop. Attaullah Shah, if you think there are improvements to be made to fastxtile, you are welcome to contribute them on Github. If you do, I'd be happy to credit you accordingly.

      All the best,
      Michael



      • #18
        @ Michael Stepner

        I appreciate that you have conducted comprehensive tests and spent a lot of time comparing astile, xtile and fastxtile. I hope these tests will give users a better idea of each program. I conducted some tests myself, which I want to report here. Comparing astile and fastxtile on machines with different CPU speeds and amounts of RAM, I find that the results vary significantly. Generally, on older and relatively slow machines, astile has some speed advantage over fastxtile, while the converse is true on computers with faster CPUs and more RAM. Following are a few tests that I conducted on my home PC, which has 6 GB of RAM, an Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz, a 64-bit operating system, Windows 10, and Stata 13.

        Program versions
        Code:
        . which fastxtile
        c:\ado\plus\f\fastxtile.ado
        *! version 2.0.0beta1  26mar2017  Michael Stepner, [email protected]
        
        . which astile
        c:\ado\plus\a\astile.ado
        *! 3.0.0  :  1Apr2017 , Added speed efficiency
        *! Author : Attaullah Shah: [email protected]


        First, small data sets

        Code:
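        local i = 2
        * header row: column A holds obs, B the astile time, C the fastxtile time
        putexcel    A1=("obs")    B1=("astile")    C1=("fastxtile") using "compare", modify
        
        * a step of 50000 matches the observation counts reported in the table below
        forv obs = 1000(50000)5000000{
            clear
            qui set obs `obs'
            gen x=uniform()
            
            timer clear
            timer on 1
            astile ast=x, nq(10)
            timer off 1
            
            timer on 2
            fastxtile fa=x, nq(10)
            timer off 2
            * timer list returns the elapsed seconds in r(t1) and r(t2)
            timer list
            qui putexcel ///
            A`i' = (`obs') ///
            B`i'=(`r(t1)') ///
            C`i'=(`r(t2)') ///
            using compare, modify
            dis "rep `=`i'-1' of total 100"
            local i = `i'+1
            }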
        ================================= 
        obs astile(sec) fastxtile(sec) % Difference
        1000 0.009 0.019 -111.1%
        51000 0.197 0.366 -85.8%
        101000 0.424 0.593 -39.9%
        151000 0.611 0.909 -48.8%
        201000 0.84 1.224 -45.7%
        251000 1.096 1.573 -43.5%
        301000 1.375 2.013 -46.4%
        351000 1.583 2.232 -41.0%
        401000 1.834 2.603 -41.9%
        451000 2.073 2.94 -41.8%
        501000 2.317 3.307 -42.7%
        551000 2.547 3.694 -45.0%
        601000 2.814 3.967 -41.0%
        651000 3.086 4.493 -45.6%
        701000 3.365 4.753 -41.2%
        751000 3.601 5.127 -42.4%
        801000 3.898 5.506 -41.3%
        851000 4.401 6.63 -50.6%
        901000 5.614 7.029 -25.2%
        951000 4.886 7.785 -59.3%
        1001000 5.381 7.055 -31.1%
        1051000 5.307 7.376 -39.0%
        1101000 5.78 7.893 -36.6%
        1151000 7.108 8.386 -18.0%
        1201000 7.472 8.61 -15.2%
        1251000 6.471 9.102 -40.7%
        1301000 6.778 9.352 -38.0%
        1351000 7.113 10.835 -52.3%
        1401000 7.425 10.116 -36.2%
        1451000 7.668 10.452 -36.3%
        1501000 8.003 11.125 -39.0%
        1551000 8.283 11.506 -38.9%
        1601000 8.576 11.62 -35.5%
        1651000 9.04 12.134 -34.2%
        1701000 9.088 12.485 -37.4%
        1751000 9.44 14.425 -52.8%
        1801000 9.903 14.173 -43.1%
        1851000 9.873 13.462 -36.4%
        1901000 10.349 15.318 -48.0%
        1951000 12.455 14.983 -20.3%
        2001000 12.192 16.195 -32.8%
        2051000 12.443 15.766 -26.7%
        2101000 12.468 16.396 -31.5%
        2151000 14.14 20.883 -47.7%
        2201000 11.967 18.061 -50.9%
        2251000 13.668 21.058 -54.1%
        2301000 25.593 23.02 10.1%
        2351000 14.497 21.566 -48.8%
        2401000 14.869 22.317 -50.1%
        2451000 13.77 19.268 -39.9%
        2501000 15.278 21.409 -40.1%
        2551000 16.062 20.384 -26.9%
        2601000 14.255 19.404 -36.1%
        2651000 15.836 20.874 -31.8%
        2701000 14.73 20.894 -41.8%
        2751000 16.125 22.335 -38.5%
        2801000 19.254 21.583 -12.1%
        2851000 17.793 21.105 -18.6%
        2901000 22.513 24.753 -9.9%
        2951000 17.353 25.532 -47.1%
        3001000 17.824 22.883 -28.4%
        3051000 17.744 23.492 -32.4%
        3101000 17.465 26.09 -49.4%
        3151000 19.665 28.635 -45.6%
        3201000 18.339 28.063 -53.0%
        3251000 19.039 25.342 -33.1%
        3301000 19.2 26.64 -38.8%
        3351000 21.3 27.45 -28.9%
        3401000 21.313 26.678 -25.2%
        3451000 21.023 28.659 -36.3%
        3501000 19.926 27.495 -38.0%
        3551000 21.887 27.61 -26.1%
        3601000 23.783 28.798 -21.1%
        3651000 23.025 30.15 -30.9%
        3701000 22.819 29.914 -31.1%
        3751000 23.195 34.669 -49.5%
        3801000 23.624 35.934 -52.1%
        3851000 22.787 32.117 -40.9%
        3901000 22.159 31.199 -40.8%
        3951000 25.108 35.141 -40.0%
        4001000 24.803 33.583 -35.4%
        4051000 22.414 31.993 -42.7%
        4101000 23.749 36.759 -54.8%
        4151000 26.943 35.342 -31.2%
        4201000 31.937 32.53 -1.9%
        4251000 24.598 32.854 -33.6%
        4301000 25.346 36.673 -44.7%
        4351000 29.867 33.915 -13.6%
        4401000 30.745 35.998 -17.1%
        4451000 24.996 35.225 -40.9%
        4501000 26.158 35.3 -34.9%
        4551000 26.028 35.408 -36.0%
        4601000 26.806 35.585 -32.8%
        4651000 27.795 35.768 -28.7%
        4701000 29.9 38.513 -28.8%
        4751000 26.957 37.089 -37.6%
        4801000 29.373 44.476 -51.4%
        4851000 29.344 37.748 -28.6%
        4901000 27.828 37.804 -35.8%
        4951000 28.59 39.973 -39.8%
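
        The "% Difference" column is not defined in the post. The reported values are consistent with 100 * (astile - fastxtile) / astile, so a negative number means astile was faster. This formula is inferred from the numbers, not stated by the author. A quick check on the first two rows:

        ```stata
        * Sketch: reproduce "% Difference" for the first two rows of the table
        * first row: 1,000 observations
        display %6.1f 100 * (0.009 - 0.019) / 0.009     // -111.1
        * second row: 51,000 observations
        display %6.1f 100 * (0.197 - 0.366) / 0.197     // -85.8
        ```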
        Relatively Large Data sets
        Code:
        local i = 2
        * header row: column A holds obs, B the astile time, C the fastxtile time
        putexcel    A1=("obs")    B1=("astile")    C1=("fastxtile") using "compare2", modify
        
        forv obs = 1000000(1000000)50000000{
            clear
            qui set obs `obs'
            gen x=uniform()
            
            timer clear
            timer on 1
            astile ast=x, nq(10)
            timer off 1
            timer on 2
            fastxtile fa=x, nq(10)
            timer off 2
            * timer list returns the elapsed seconds in r(t1) and r(t2)
            timer list
            qui putexcel ///
            A`i' = (`obs') ///
            B`i'=(`r(t1)') ///
            C`i'=(`r(t2)') ///
            using compare2, modify
            dis "rep `=`i'-1' of total 50"
            local i = `i'+1
            }
        
        ================================== 
        obs astile(sec) fastxtile(sec) % Difference
        1000000 4.941 6.912 -39.89%
        2000000 10.609 14.381 -35.55%
        3000000 17.219 23.25 -35.03%
        4000000 22.166 30.524 -37.71%
        5000000 29.433 37.71 -28.12%
        6000000 35.67 45.974 -28.89%
        7000000 41.339 55.805 -34.99%
        8000000 48.383 63.907 -32.09%
        9000000 54.351 72.217 -32.87%
        10000000 61.13 82.498 -34.96%
        11000000 67.215 91.032 -35.43%
        12000000 76.225 100.191 -31.44%
        13000000 86.609 112.405 -29.78%
        14000000 91.812 121.985 -32.86%
        15000000 100.843 127.168 -26.10%
        16000000 108.818 129.391 -18.91%
        17000000 134.025 147.642 -10.16%
        18000000 132.431 168.233 -27.03%
        19000000 146.897 172.879 -17.69%
        20000000 143.319 175.564 -22.50%
        It would be interesting to conduct other tests on both slower and faster computers, with different data types, nquantile options, etc. Thanks for your efforts and interest.
        Last edited by Attaullah Shah; 08 Apr 2017, 15:05.



        • #19
          Hello Attaullah, I was wondering if it is also possible to double-sort as I need to perform a bivariate dependent-sort analysis. Thank you in advance!



          • #20
            astile supports double sorting; you just have to add two variables after the bys prefix.
            Code:
            bys var1 var2 : astile newvar = existing_var, nq(10)



            • #21
              Dear Attaullah,

              I appreciate your response and am happy to know that double-sorting does work with astile. However, I do have some quick additional questions:

              - I have to sort in a specific order because I am doing a dependent sort, meaning that the first variable would be the control variable. Then, within the first sort, I want to sort again by my independent variable. In the end, I want to find the relation between the independent variable and the dependent variable, given the control variable.

              So, var1 would be my control variable and var2 would be my independent variable. However, I do not understand then what the existing_var would be?

              Also, does the nq(5), for example, apply to quintile sorting for the first variable and then within that sort, another quintile sorting?

              Thank you kindly for your help!

              Best regards, Kate
              Last edited by Kate Lussy; 28 Apr 2019, 14:43.



              • #22
                I did not understand the question clearly. If you mean that you would first create quantile groups on one variable, and then create quantile groups again within each of those groups, then you have to use astile twice. In the following example, we first create two groups based on the median of invest and call the result nq_invest. Then, in each year, we create three groups based on the mvalue variable within each of the two nq_invest groups. This makes it a dependent sort, i.e. within each year, the sort on mvalue is conditional on the nq_invest group.

                Code:
                webuse grunfeld, clear
                astile nq_invest = invest, nq(2)
                bys year nq_invest : astile nq_mvalue=mvalue , nq(3)
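
                For the control/independent/dependent setup asked about above, the same two-step pattern could be sketched as follows. Everything here is illustrative: the data are simulated and the names control, indep, dep, q_control and q_indep are hypothetical. The point it shows is that nq() applies only to the variable on the right of the equals sign, while the variables after bys are used as group identifiers as they are, so the control variable needs its own astile call first.

                ```stata
                * Sketch: 5x5 dependent sort on simulated data (assumes astile is installed)
                clear
                set seed 42
                set obs 5000
                gen control = rnormal()
                gen indep   = rnormal()
                gen dep     = 0.5*indep + 0.3*control + rnormal()

                * first sort: quintiles of the control variable
                astile q_control = control, nq(5)

                * second sort: quintiles of the independent variable
                * within each control quintile
                bys q_control: astile q_indep = indep, nq(5)

                * mean of the dependent variable in each of the 25 cells
                tabulate q_control q_indep, summarize(dep) means
                ```

                Reading across a row of the resulting table shows how the mean of dep changes with indep while the control quintile is held fixed.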

