ddml: number of repetitions

Henry Strawforrd

Join Date: Sep 2021

Posts: 228
#1


25 Sep 2024, 00:36

Does anybody here have experience using the excellent ddml package from Ahrens et al.?

I am trying to figure out how large the number of repetitions should be. The help file confuses me. There is also the kfolds() option, which sets the number of cross-validation folds. What, then, exactly are the repetitions if we are already setting the number of cross-validation folds?
John Moran

Join Date: Oct 2015

Posts: 17
#2

25 Sep 2024, 23:57

Hi Henry

I guess it depends on your model structure (and number of learners) and the number of observations. The 2024 SJ paper (page 38) and the Feb 2023 discussion paper (IZA Institute of Labor Economics, Paper 15963, page 32) give this example:

Initialize ddml model (N=2217, "market for automobiles"):
. set seed 123
. ddml init fiv, kfolds(4) reps(5)
Note that in the ddml init step, we include the option reps(5) which will result
in running the full cross-fitting procedure five times, each with a different random split
of the data. Replicating the procedure multiple times allows us to gauge the impact of
randomness due to the random splitting of the data into subsamples.

This is not the case for the "financial wealth" example (N=9915).

Note also that there is an option "njobs" in the learner syntax for parallelization.

I happen to use kfolds(5) and reps(5) with parallelization, but this may be overkill.

Hopefully we may be guided by an author comment.

john moran
Achim Ahrens

Join Date: Jun 2014

Posts: 49
#3

03 Jan 2025, 08:28

DML relies on random fold splitting (as part of the cross-fitting procedure). Each random fold split gives you a different DML estimate. All are equally valid, but to avoid unnecessary dependence on random fold splitting, you can repeat the random fold splitting and aggregate the DML estimates (take average or median). That's what `reps()` does. `reps()` refers to how often a DML estimation is repeated with different random fold splits. This is explained in Remark 2 in the SJ paper.

`kfolds()` refers to the number of folds. See Remark 1 in the SJ paper.
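
To make the distinction concrete, here is a minimal sketch of the idea in plain Python. It is not ddml's implementation: the data are simulated, the nuisance learners are simple linear fits, and the names `fit_line`, `dml_once` and `dml_repeated` are hypothetical. The `kfolds` argument controls how one random split is divided into folds for cross-fitting; the `reps` argument controls how many times the whole cross-fitting procedure is rerun with a fresh random split before the estimates are aggregated.

```python
import random
import statistics


def fit_line(xs, ys):
    """OLS of ys on xs with an intercept; returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def dml_once(x, d, y, kfolds, rng):
    """One cross-fitted estimate for a partially linear model, using one random fold split."""
    n = len(y)
    idx = list(range(n))
    rng.shuffle(idx)                                  # the random fold split
    folds = [idx[k::kfolds] for k in range(kfolds)]
    res_y, res_d = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit both nuisance functions on the other folds only (assumed linear learners).
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        ad, bd = fit_line([x[i] for i in train], [d[i] for i in train])
        # Residualize the held-out observations with the out-of-fold fits.
        for i in held_out:
            res_y[i] = y[i] - (ay + by * x[i])
            res_d[i] = d[i] - (ad + bd * x[i])
    # Regress residualized outcome on residualized treatment.
    return sum(a * b for a, b in zip(res_d, res_y)) / sum(a * a for a in res_d)


def dml_repeated(x, d, y, kfolds, reps, seed):
    """Rerun the cross-fitting `reps` times with different splits and aggregate."""
    rng = random.Random(seed)
    estimates = [dml_once(x, d, y, kfolds, rng) for _ in range(reps)]
    return estimates, statistics.fmean(estimates), statistics.median(estimates)


# Simulated data (an assumption for illustration): true effect of d on y is 1.0.
data_rng = random.Random(0)
n = 500
x = [data_rng.gauss(0, 1) for _ in range(n)]
d = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]
y = [1.0 * di + xi + data_rng.gauss(0, 1) for di, xi in zip(d, x)]

estimates, mean_est, median_est = dml_repeated(x, d, y, kfolds=4, reps=5, seed=123)
print("per-split estimates:", estimates)
print("mean:", mean_est, "median:", median_est)
```

Each entry in the printed list comes from a different random split and is a valid estimate on its own; the mean and median summarize them so that the reported number depends less on any single split.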

SJ paper: https://journals.sagepub.com/doi/ful...6867X241233641

Working paper version of our SJ paper: https://arxiv.org/abs/2301.09397

--
Tag me or email me for ddml/pdslasso/lassopack/pystacked related questions. I don't check Statalist.
Achim Ahrens

Join Date: Jun 2014

Posts: 49
#4

03 Jan 2025, 15:48

Finally, note that ddml supports both median and mean aggregation. See `ddml estimate`.

Last edited by Achim Ahrens; 03 Jan 2025, 15:51.
