ddml crossfit error ending in r(7102)

Hannah Beilby

Join Date: Apr 2024
Posts: 16


07 Aug 2024, 01:31

Hi Statalist,

After spending all afternoon figuring out how to install Python packages (scikit-learn), create a virtual environment, and get Python to talk to Stata via the new ddml package, I am sad to have received the following long-winded error message:

Code:

 
. ddml crossfit
Cross-fitting E[y|X,D] equation: nllonely
Resample 1...
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Resample 2...
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Resample 3...
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Resample 4...
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Resample 5...
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Cross-fitting E[D|X] equation: source
Resample 1...
Cross-fitting fold 1 joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\jo
> blib\_utils.py", line 72, in __call__
    return self.func(**kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\jo
> blib\parallel.py", line 598, in __call__
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sk
> learn\utils\parallel.py", line 136, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sk
> learn\ensemble\_base.py", line 40, in _fit_single_estimator
    estimator.fit(X, y, **fit_params)
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sk
> learn\base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sk
> learn\pipeline.py", line 473, in fit
    self._final_estimator.fit(Xt, y, **last_step_params["fit"])
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sk
> learn\base.py", line 1466, in wrapper
    estimator._validate_params()
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sk
> learn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sk
> learn\utils\_param_validation.py", line 95, in validate_parameter_constraints
    raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'loss' parameter of Gradien
> tBoostingClassifier must be a str among {'exponential', 'log_loss'}. Got 'deviance' 
> instead.
"""

The above exception was the direct cause of the following exception:

# Repeat of above from "Traceback ... Got 'deviance' instead'"

r(7102);

After loading and cleaning my data, the rest of the relevant code is:

Code:

 
* DDML - interactive model
global Y nllonely
global D source
global X male dagecat2 dagecat3 dagecat4 dagecat5 dagecat6 fulltime parttime unemployed uni postgrad kids couplenodeps hshareother mid advant nbh_id_7pt pnq3
set seed 123

*estimate the model 5 times using randomly chosen folds.
ddml init interactive, kfolds(5) reps(5)

*consider two supervised learners: linear regression and gradient boosted trees, stacked using pystacked.
ddml E[Y|X,D]: pystacked $Y $X, type(reg) methods(ols gradboost)
ddml E[D|X]: pystacked $D $X, type(class) methods(logit gradboost)

*cross-fit - stops at E[D|X] equation: source Cross-fitting fold 1 
ddml crossfit

*estimate the average treatment effect (the default)
ddml estimate

Would anyone have any suggestions on how to solve this, other than going to the authors? Grateful for any tips.

Kind regards,
Hannah


Daniel Schaefer

Join Date: Mar 2020

Posts: 794
#2

07 Aug 2024, 16:36

The problem you are having is on the Python side, in sklearn.

sklearn.utils._param_validation.InvalidParameterError: The 'loss' parameter of GradientBoostingClassifier must be a str among {'exponential', 'log_loss'}. Got 'deviance' instead.

The problem is that GradientBoostingClassifier in sklearn has a parameter called "loss" that only accepts the strings 'exponential' or 'log_loss', but it is getting 'deviance' instead. If you roll back sklearn to version 1.2.2, you should see that GradientBoostingClassifier used to accept 'deviance'. Looks like the authors of the Stata package haven't been maintaining it.

You should be able to fix this issue by rolling back your version of sklearn to version 1.2.2. Your installation is probably on 1.5.1 currently. You might want to figure out which version of Python 1.2.2 was built for and roll back your Python installation as well. If it were me, I'd just see if I could do this in Python instead. Python is the best platform out there for any kind of machine learning. Just my two cents.

Edit: Actually, version 1.0 might be better, since the deviance option was deprecated before 1.2.2 rolled out. 1.0 is most likely to avoid other compatibility issues.
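
To see which side of the change an installation falls on, a short check can be run in the same Python installation that Stata is linked to. This is only a minimal sketch and not part of pystacked or ddml: the helper names major_minor and accepts_deviance are made up for illustration, and the cutoff assumes that scikit-learn deprecated loss='deviance' in 1.1 and removed it in 1.3.

```python
import re
from importlib import metadata


def major_minor(version):
    """Return (major, minor) from a version string such as '1.5.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def accepts_deviance(version):
    """Assumption: loss='deviance' was removed in scikit-learn 1.3."""
    return major_minor(version) < (1, 3)


# Look up the installed scikit-learn version without importing sklearn itself.
try:
    installed = metadata.version("scikit-learn")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("scikit-learn is not installed in this Python environment")
else:
    verdict = "accepts" if accepts_deviance(installed) else "rejects"
    print(f"scikit-learn {installed} {verdict} loss='deviance'")
```

Under that assumption, accepts_deviance("1.2.2") is true and accepts_deviance("1.5.1") is false, which matches the error in the first post.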

Last edited by Daniel Schaefer; 07 Aug 2024, 16:55.
Achim Ahrens

Join Date: Jun 2014

Posts: 49
#3

02 Jan 2025, 14:32

This is actually a pystacked issue, not a ddml issue. I am updating pystacked periodically to reflect changes in the sklearn syntax (they keep changing the syntax). I am recording in the pystacked help file which sklearn versions are supported. From the latest version:

pystacked requires at least Stata 16 (or higher), a Python installation and scikit-learn (0.24 or higher). pystacked has been tested with scikit-learn 0.24.2, 1.0.2, 1.1.3, 1.2.1 and 1.3. See here and here for how to set up Python for Stata on your system.

Also, see here.

--
Tag me or email me for ddml/pdslasso/lassopack/pystacked related questions. I don't check Statalist.
Achim Ahrens

Join Date: Jun 2014

Posts: 49
#4

03 Jan 2025, 16:53

The new version of pystacked (0.7.6) supports sklearn versions 0.24 to 1.6.0.

Make sure you update pystacked from github.

Please reach out if you run into issues.
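
A quick way to compare an installation against that range is sketched below. It is only an illustration: in_stated_range and version_tuple are hypothetical helpers, not pystacked functions, and the bounds are a literal reading of "0.24 to 1.6.0" from this post, so a version outside them is untested rather than known to fail.

```python
import re
from importlib import metadata

# Bounds taken literally from the post above (assumption: inclusive on both ends).
LOWEST = (0, 24, 0)
HIGHEST = (1, 6, 0)


def version_tuple(version):
    """Return (major, minor, patch); a missing patch number counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def in_stated_range(version):
    """True if the version lies within the range stated for pystacked 0.7.6."""
    return LOWEST <= version_tuple(version) <= HIGHEST


# Look up the installed scikit-learn version without importing sklearn itself.
try:
    installed = metadata.version("scikit-learn")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("scikit-learn is not installed in this Python environment")
elif in_stated_range(installed):
    print(f"scikit-learn {installed} is inside the stated range")
else:
    print(f"scikit-learn {installed} is outside the stated range")
```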

--
Tag me or email me for ddml/pdslasso/lassopack/pystacked related questions. I don't check Statalist.