I'm excited to announce a pair of packages that radically speed up reghdfe--at least for hard problems where speed is an issue. The spark for this project is the observation that the Julia counterpart of Sergio Correia's path-breaking reghdfe--the FixedEffectModels.jl package by Matthieu Gomez--is much faster. The new command reghdfejl is meant to mimic reghdfe while calling Julia to do the hard work.
Below is an edited log comparing three ways to run the same estimation on a data set with 100,000,000 observations and two sets of absorbed fixed effects. It is run in Stata with 1 processor, on a high-end Windows laptop. First reghdfe is used, then reghdfejl, then reghdfejl with the "gpu" option to request computation on an NVIDIA GPU. (Apple Silicon GPUs are also supported.) The run times are 312, 36, and 27 seconds, respectively.
To do the same on your computer you need to:
- Install the "julia" module for Stata, which includes the "jl" command for accessing Julia. Type "ssc install julia".
- Type "help jl" and read and follow the instructions in the Installation section to install the Julia programming language (which is free). On Windows and macOS machines, there is a complication related to ensuring that Stata can find Julia once it is installed.
- In Stata, type "ssc install reghdfejl" to add reghdfejl.
Because Julia uses just-in-time compilation, first calls to "jl" and Julia-based programs such as "reghdfejl" are slow--very slow if required Julia packages need to be downloaded and installed. Be patient.
Code:
    . jl: "Hello world!"
    Hello world!

    . jl: sqrt(2)
    1.4142135623730951
The "jl" command is meant as low-level infrastructure. Its core is written in C/C++ and separately compiled for four platforms (Windows, Linux, Intel Mac, ARM Mac). It includes subcommands for high-speed, C-based copying of data between Stata and Julia. But see the readme for a simple example of its use in estimation.
reghdfejl starts by copying the data needed for estimation into a Julia DataFrame. This duplication of data takes a bit of time and potentially a lot of RAM. In extreme cases, it will more than double the storage demand because even variables stored in Stata in small types, such as byte, will be stored in double precision--8 bytes per value--in Julia. If the memory demand is too great, performance will plummet. reghdfejl therefore is most useful when you have plenty of RAM, when the number of non-absorbed regressors is low (fewer variables need copying), and when the number of absorbed terms is high (for then the computational efficiency of Julia shines).
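As a rough illustration of the memory point, here is the arithmetic for the benchmark data set in this post (100,000,000 observations; y stored as double; x1, x2, id1, and id2 stored as float, which is Stata's default type unless "set type" has been changed). The assumption that all five variables are copied to Julia as 8-byte doubles follows the description above and is not a measurement. Working memory used during estimation is not counted.

```stata
* Back-of-envelope RAM estimate, not a measurement.
* Assumption: y, x1, x2, id1, id2 are all copied to Julia as 8-byte doubles.
scalar nobs = 100000000

* Stata side: four float variables (4 bytes each) plus one double (8 bytes)
scalar stata_gb = nobs * (4*4 + 8) / 1e9

* Julia side: five variables at 8 bytes each
scalar julia_gb = nobs * (5*8) / 1e9

* Combined footprint while both copies are in memory
scalar total_gb = stata_gb + julia_gb

display "Stata data (GB): " stata_gb
display "Julia copy (GB): " julia_gb
display "Combined   (GB): " total_gb
```

Under these assumptions the Julia copy (about 4.0 GB) is larger than the Stata data set it came from (about 2.4 GB), so the combined demand is more than double the original.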
reghdfejl lacks some reghdfe features that are typically secondary for users:
- It does not correct the estimates of the degrees of freedom consumed by absorbed fixed effects for collinearity and redundancy among fixed-effect dummies. reghdfe displays these corrections in a table after the main results. reghdfejl does not.
- It does not offer the Group FE features.
- It does not allow control over whether the constant term is reported. The constant is always absorbed.
- It does not offer options such as technique() that give finer control over the algorithm. But these are largely obviated by reghdfejl's speed.
This is a brand new project with a lot of moving parts. Think of this as the beta release. Please post about any installation problems.
Benchmark log:
Code:
    . scalar N = 100000000
    . scalar K = 100
    . set obs `=N'
    . gen id1 = runiformint(1, N/K)
    . gen id2 = runiformint(1, K)
    . drawnorm x1 x2
    . gen double y = 3 * x1 + 2 * x2 + sin(id1) + cos(id2) + runiform()
    . set rmsg on
    .
    . set processors 1
    . reghdfe y x1 x2, a(id1 id2) cluster(id1 id2)
    (MWFE estimator converged in 4 iterations)
    Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.

    HDFE Linear regression                            Number of obs   = 100000000
    Absorbing 2 HDFE groups                           F(   2,     99) =  9.35e+09
    Statistics robust to heteroskedasticity           Prob > F        =    0.0000
                                                      R-squared       =    0.9941
                                                      Adj R-squared   =    0.9941
    Number of clusters (id1)     =  1,000,000         Within R-sq.    =    0.9936
    Number of clusters (id2)     =        100         Root MSE        =    0.2887

                                  (Std. err. adjusted for 100 clusters in id1 id2)
    ------------------------------------------------------------------------------
                 |               Robust
               y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |   2.999985   .0000282  1.1e+05   0.000     2.999929    3.000041
              x2 |   2.000006   .0000271  7.4e+04   0.000     1.999952     2.00006
           _cons |   .4946679   4.33e-09  1.1e+08   0.000     .4946679     .494668
    ------------------------------------------------------------------------------

    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
             id1 |   1000000     1000000           0    *|
             id2 |       100         100           0    *|
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    r; t=312.21 16:51:54

    . reghdfejl y x1 x2, a(id1 id2) cluster(id1 id2)
    (MWFE estimator converged in 6 iterations)

    HDFE linear regression with Julia                 Number of obs   = 100000000
    Absorbing 2 HDFE groups                           F(   2,     99) =  9.35e+09
    Statistics cluster-robust                         Prob > F        =    0.0000
                                                      R-squared       =    0.9941
                                                      Adj R-squared   =    0.9941
    Number of clusters (id1)     =    1000000         Within R-sq.    =    0.9936
    Number of clusters (id2)     =        100         Root MSE        =    0.2887

                                  (Std. err. adjusted for 100 clusters in id1 id2)
    ------------------------------------------------------------------------------
                 |               Robust
               y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |   2.999985   .0000282  1.1e+05   0.000     2.999929    3.000041
              x2 |   2.000006   .0000271  7.4e+04   0.000     1.999952     2.00006
    ------------------------------------------------------------------------------
    r; t=35.94 16:56:13

    . reghdfejl y x1 x2, a(id1 id2) cluster(id1 id2) gpu
    (MWFE estimator converged in 8 iterations)

    HDFE linear regression with Julia                 Number of obs   = 100000000
    Absorbing 2 HDFE groups                           F(   2,     99) =  9.35e+09
    Statistics cluster-robust                         Prob > F        =    0.0000
                                                      R-squared       =    0.9941
                                                      Adj R-squared   =    0.9941
    Number of clusters (id1)     =    1000000         Within R-sq.    =    0.9936
    Number of clusters (id2)     =        100         Root MSE        =    0.2887

                                  (Std. err. adjusted for 100 clusters in id1 id2)
    ------------------------------------------------------------------------------
                 |               Robust
               y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |   2.999985   .0000282  1.1e+05   0.000     2.999929    3.000041
              x2 |   2.000006   .0000271  7.4e+04   0.000     1.999952     2.00006
    ------------------------------------------------------------------------------
    r; t=27.28 16:58:22
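For anyone who wants to try the comparison without 100,000,000 observations, here is a scaled-down sketch of the same setup with 1,000,000 observations. It assumes reghdfe, the julia module, and reghdfejl are already installed. The seed value and the use of Stata's timer commands are my own illustrative choices, not part of the benchmark above. At this scale the speed gap may be much smaller, or even reversed, since the gains are expected mainly on hard problems.

```stata
* Scaled-down sketch of the benchmark: 1 million observations, not 100 million.
clear
set seed 12345
scalar N = 1000000
scalar K = 100
set obs `=N'
gen id1 = runiformint(1, N/K)
gen id2 = runiformint(1, K)
drawnorm x1 x2
gen double y = 3 * x1 + 2 * x2 + sin(id1) + cos(id2) + runiform()

* Warm-up call so that Julia's just-in-time compilation is not timed
quietly reghdfejl y x1 x2, a(id1 id2) cluster(id1 id2)

* Time each estimator with Stata's built-in timers
timer clear
timer on 1
reghdfe y x1 x2, a(id1 id2) cluster(id1 id2)
timer off 1
timer on 2
reghdfejl y x1 x2, a(id1 id2) cluster(id1 id2)
timer off 2
timer list
```

Timer 1 reports reghdfe and timer 2 reports reghdfejl, both in seconds. Adding the "gpu" option to the reghdfejl call gives the third comparison on machines with a supported GPU.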