The slow speed of xtile has long irritated me. In an attempt to find a speedy alternative, I posted the astile program on this forum (http://www.statalist.org/forums/foru...tile-vs-astile). However, that version had issues and was abandoned. Even so, my motivation to find a speedy alternative never died. Thanks to Kit Baum, a new version of astile is now available on SSC. Here is the description of the program and some speed tests. To install the package, type
Code:
ssc install astile
help astile
Title
astile - Creates variable containing quantile categories
Syntax
astile newvar = exp [if] [in] [, nquantiles(#) by(varlist)]
Description
astile creates a new variable that categorizes exp by its quantiles. For example, we might be interested in making 10 size-based portfolios. This involves placing the smallest 10% of firms in portfolio 1, the next 10% in portfolio 2, and so on. astile creates the new variable named in newvar from the existing variable or expression given in = exp. Values of newvar range from 1 up to n, where n is the number of quantile groups specified in the nq option. For example, if we want to make 10 portfolios, values of newvar will range from 1 to 10.
astile is faster than Stata's official xtile. Its speed advantage matters more in larger data sets or when the quantile categories are created multiple times, e.g., when we want to create portfolios in each year or each month. Unlike Stata's official xtile, astile is byable.
Options
astile has the following two options, both of which are optional.
1. nquantiles
The nq(#) option specifies the number of quantiles. The default value of nq is 2, which splits the data at the median.
2. by
astile is byable. Hence, it can be run within groups as specified by the option by(varlist).
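To make the by option concrete, here is a rough sketch of what by(year) saves you from writing with the official xtile: a loop over years that fills in the quantile groups one year at a time. The variable names size5_loop, q_tmp and size5 are only illustrative, and the final count is a check rather than a guaranteed result, since I have not verified how the two commands treat ties in small groups. It assumes astile has already been installed from SSC.

```stata
* Sketch: by-group quantiles with official xtile and a loop, versus astile with by()
webuse grunfeld, clear

gen size5_loop = .
levelsof year, local(years)
foreach y of local years {
    * official xtile has no by option, so restrict to one year at a time
    xtile q_tmp = mvalue if year == `y', nq(5)
    replace size5_loop = q_tmp if year == `y'
    drop q_tmp
}

* the same grouping in a single call (requires: ssc install astile)
astile size5 = mvalue, nq(5) by(year)

* number of observations where the two approaches disagree
count if size5 != size5_loop
```

The loop calls xtile once per year, which is the main reason a byable command pays off when there are many groups.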
Example 1: Create 10 groups of firms based on their market value
Code:
. webuse grunfeld
. astile size10=mvalue, nq(10)
Example 2: Create 5 groups of firms based on their market value in each year
Code:
. webuse grunfeld
. astile size5=mvalue, nq(5) by(year)
Limitations
This version of astile does not support the weights, altdef and cutpoints options that are available in the official xtile command. In the next version, I plan to include some of these options.
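Until those options are added, the official xtile remains the fallback when they are needed. The following is a minimal sketch on the grunfeld data; the variable names size10_alt and size10_w are only illustrative, and weighting by kstock is an arbitrary choice made to show the syntax, not a recommendation.

```stata
* Sketch: cases that this version of astile does not cover, using official xtile
webuse grunfeld, clear

* altdef: use the alternative formula for calculating percentiles
xtile size10_alt = mvalue, nq(10) altdef

* analytic weights (kstock is used here purely for illustration)
xtile size10_w = mvalue [aweight=kstock], nq(10)
```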
SPEED COMPARISON
The following tests were performed using Stata 14.2. The results may vary from computer to computer depending on CPU speed.
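Because single timings are noisy, one way to get a steadier number on your own machine is to repeat the call several times and let timer report the average. This is only a sketch: the seed, the 100,000 observations and the five repetitions are arbitrary choices of mine, not the setup used for the figures in this post, and it assumes astile is installed.

```stata
* Sketch: average the astile timing over several runs
clear
set seed 12345               // arbitrary seed, for reproducibility only
set obs 100000               // smaller than the one million rows used in the post
gen size = runiform()*100

timer clear
forvalues run = 1/5 {
    capture drop as10        // remove the result of the previous run, if any
    timer on 1
    astile as10 = size, nq(10)
    timer off 1
}

* reports total seconds / number of runs = average seconds per run
timer list 1
```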
Without by Option
* To generate an example data set of one million observations
Code:
clear
set obs 1000
gen year=_n+1000
expand 1000
bys year: gen id=_n
gen size=uniform()*100
timer clear
timer on 1
egen xt10=xtile(size), nq(10) // from egenmore package
timer off 1
timer on 2
astile as10=size, nq(10)
timer off 2
timer on 3
xtile of10=size, nq(10) // Stata official
timer off 3
assert as10==of10
timer list
1: 4.71 / 1 = 4.7130
2: 3.61 / 1 = 3.6140
3: 9.01 / 1 = 9.0050
* Both xtile from the egenmore package and astile run roughly two to two and a half times faster than the official xtile (4.71 and 3.61 seconds versus 9.01 seconds), with astile slightly ahead of xtile from egenmore.
With By Option
Since the official xtile does not have a by option, I compare astile with xtile from the egenmore package.
Code:
timer clear
timer on 1
bys year: egen yxt10=xtile(size), nq(10)
timer off 1
timer on 2
bys year: astile yas10=size, nq(10)
timer off 2
timer list
1: 1037.37 / 1 = 1037.3700
2: 198.31 / 1 = 198.3080
assert yxt10==yas10