
  • ddml: number of repetitions

    Does anybody here have experience using the excellent ddml package from Ahrens et al.?

    I am trying to figure out how large the number of repetitions should be. The help file confuses me. There is also the option kfolds(), which sets the number of cross-validation folds. What, then, exactly is the number of repetitions if we are already setting the number of cross-validation folds?

  • #2
    Hi Henry

    I guess it depends on your model structure (including the number of learners) and the number of observations. From the 2024 SJ paper (page 38) and the Feb 2023 discussion paper (IZA Institute of Labor Economics, Paper 15963, page 32):

    Initialize ddml model (N=2217, "market for automobiles")
    . set seed 123
    . ddml init fiv, kfolds(4) reps(5)
    Note that in the ddml init step, we include the option reps(5) which will result
    in running the full cross-fitting procedure five times, each with a different random split
    of the data. Replicating the procedure multiple times allows us to gauge the impact of
    randomness due to the random splitting of the data into subsamples.

    Such is not the case for the "financial wealth" example (N=9915).

    Note also that the learner syntax has an "njobs" option for parallelization.

    I happen to use kfolds(5) and reps(5) with parallelization, but this may be overkill.

    Hopefully we may be guided by an author comment.

    john moran




    • #3
      DML relies on random fold splitting (as part of the cross-fitting procedure). Each random fold split gives you a different DML estimate. All are equally valid, but to avoid unnecessary dependence on random fold splitting, you can repeat the random fold splitting and aggregate the DML estimates (take average or median). That's what `reps()` does. `reps()` refers to how often a DML estimation is repeated with different random fold splits. This is explained in Remark 2 in the SJ paper.

      `kfolds()` refers to the number of folds. See Remark 1 in the SJ paper.
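To make the distinction concrete, here is a minimal sketch in plain Python (not ddml itself, and not how ddml is implemented). It assumes a simulated partially linear model with a true effect of 0.5 and uses simple linear regression as a stand-in learner. The function names `fit_line`, `crossfit_once`, and `dml_with_reps` are hypothetical. The `kfolds` argument controls how one random split is cut into folds, while `reps` controls how many times the whole cross-fitting procedure is rerun with a fresh random split. Only point estimates are aggregated here.

```python
import random
import statistics


def fit_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def crossfit_once(y, d, x, kfolds, rng):
    """One DML estimate from a single random split into `kfolds` folds."""
    n = len(y)
    idx = list(range(n))
    rng.shuffle(idx)  # the random fold split
    folds = [idx[k::kfolds] for k in range(kfolds)]
    y_res = [0.0] * n
    d_res = [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        x_tr = [x[i] for i in train]
        # Fit both nuisance regressions on the other folds only.
        ay, by = fit_line(x_tr, [y[i] for i in train])
        ad, bd = fit_line(x_tr, [d[i] for i in train])
        # Residualize the held-out fold with out-of-sample predictions.
        for i in held_out:
            y_res[i] = y[i] - (ay + by * x[i])
            d_res[i] = d[i] - (ad + bd * x[i])
    # Residual-on-residual regression gives the effect estimate.
    num = sum(dr * yr for dr, yr in zip(d_res, y_res))
    den = sum(dr * dr for dr in d_res)
    return num / den


def dml_with_reps(y, d, x, kfolds=4, reps=5, seed=123):
    """Repeat cross-fitting `reps` times, then aggregate by mean and median."""
    rng = random.Random(seed)
    estimates = [crossfit_once(y, d, x, kfolds, rng) for _ in range(reps)]
    return estimates, statistics.fmean(estimates), statistics.median(estimates)


# Simulated data (an assumption for illustration): true effect of d on y is 0.5.
data_rng = random.Random(0)
n = 500
x = [data_rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * xi + data_rng.gauss(0, 1) for xi in x]
y = [0.5 * di + 1.0 * xi + data_rng.gauss(0, 1) for di, xi in zip(d, x)]

estimates, mean_est, median_est = dml_with_reps(y, d, x, kfolds=4, reps=5)
print("per-rep estimates:", [round(e, 4) for e in estimates])
print("mean:", round(mean_est, 4), "median:", round(median_est, 4))
```

The spread of the per-rep estimates shows how much a single estimate depends on the particular split. If the spread is small relative to the standard error, a modest number of repetitions is probably enough.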

      SJ paper: https://journals.sagepub.com/doi/ful...6867X241233641

      Working paper version of our SJ paper: https://arxiv.org/abs/2301.09397
      --
      Tag me or email me for ddml/pdslasso/lassopack/pystacked related questions. I don't check Statalist.



      • #4
        Finally, note that ddml supports both median and mean aggregation. See `ddml estimate`.
        Last edited by Achim Ahrens; 03 Jan 2025, 16:51.
