  • ddml crossfit error ending in r(7102)

    Hi Statalist,

    I spent all afternoon figuring out how to install packages (scikit-learn) for Python, create a virtual environment, and get Python to talk to Stata via the new ddml package. I was sad to get the following long-winded error message:

    Code:
     
    . ddml crossfit
    Cross-fitting E[y|X,D] equation: nllonely
    Resample 1...
    Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
    Resample 2...
    Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
    Resample 3...
    Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
    Resample 4...
    Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
    Resample 5...
    Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
    Cross-fitting E[D|X] equation: source
    Resample 1...
    Cross-fitting fold 1 joblib.externals.loky.process_executor._RemoteTraceback: 
    """
    Traceback (most recent call last):
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\joblib\_utils.py", line 72, in __call__
        return self.func(**kwargs)
               ^^^^^^^^^^^^^^^^^^^
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\joblib\parallel.py", line 598, in __call__
        return [func(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\utils\parallel.py", line 136, in __call__
        return self.function(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\ensemble\_base.py", line 40, in _fit_single_estimator
        estimator.fit(X, y, **fit_params)
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\base.py", line 1473, in wrapper
        return fit_method(estimator, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\pipeline.py", line 473, in fit
        self._final_estimator.fit(Xt, y, **last_step_params["fit"])
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\base.py", line 1466, in wrapper
        estimator._validate_params()
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
        validate_parameter_constraints(
      File "C:\Users\uqhbeilb\AppData\Local\Programs\Python\Python312\Lib\site-packages\sklearn\utils\_param_validation.py", line 95, in validate_parameter_constraints
        raise InvalidParameterError(
    sklearn.utils._param_validation.InvalidParameterError: The 'loss' parameter of GradientBoostingClassifier must be a str among {'exponential', 'log_loss'}. Got 'deviance' instead.
    """
    
    The above exception was the direct cause of the following exception:
    
    # The traceback above (from "Traceback" to "Got 'deviance' instead.") is repeated here
    
    r(7102);

    After loading and cleaning my data, the rest of the relevant code is:

    Code:
     
    * DDML - interactive model
    global Y nllonely
    global D source
    global X male dagecat2 dagecat3 dagecat4 dagecat5 dagecat6 fulltime parttime unemployed uni postgrad kids couplenodeps hshareother mid advant nbh_id_7pt pnq3
    set seed 123
    
    *estimate the model 5 times using randomly chosen folds.
    ddml init interactive, kfolds(5) reps(5)
    
    *for each equation, consider two supervised learners: a linear model (OLS for Y, logit for D) and gradient boosted trees, stacked using pystacked.
    ddml E[Y|X,D]: pystacked $Y $X, type(reg) methods(ols gradboost)
    ddml E[D|X]: pystacked $D $X, type(class) methods(logit gradboost)
    
    *cross-fit: fails at the E[D|X] equation (source), cross-fitting fold 1
    ddml crossfit
    
    *estimate the average treatment effect (the default)
    ddml estimate

    Does anyone have suggestions on how to solve this, other than contacting the authors? Grateful for any tips.

    Kind regards,
    Hannah

  • #2
    The problem you are having is on the python side in sklearn.

    sklearn.utils._param_validation.InvalidParameterError: The 'loss' parameter of GradientBoostingClassifier must be a str among {'exponential', 'log_loss'}. Got 'deviance' instead.
    The problem is that GradientBoostingClassifier in sklearn has a parameter called "loss" that only accepts the strings 'exponential' or 'log_loss', but it is receiving 'deviance' instead. Older versions, such as 1.2.2, still accepted 'deviance'. It looks like the authors of the Stata package haven't been maintaining it.
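    The version dependence described above can be sketched in Python. This is a minimal sketch, not code from pystacked or scikit-learn, and the helper names are hypothetical. The cut-offs are an assumption taken from the scikit-learn changelogs rather than from this thread: 'log_loss' was added and 'deviance' deprecated in 1.1, and 'deviance' was removed in 1.3.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor[.micro]' of a version string like '1.5.1'.

    Suffixes such as 'rc1' or '.dev0' are ignored, which is enough for the
    coarse comparisons below.
    """
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, micro = match.groups()
    return int(major), int(minor), int(micro or 0)


def gb_classifier_log_loss_name(sklearn_version: str) -> str:
    """Hypothetical helper: the loss name a wrapper would pass to
    GradientBoostingClassifier(loss=...) for the logistic (log) loss.

    Assumes 'log_loss' exists from 1.1 onwards and 'deviance' before that.
    """
    return "log_loss" if parse_version(sklearn_version) >= (1, 1, 0) else "deviance"


def accepts_deviance(sklearn_version: str) -> bool:
    """Hypothetical helper: True if loss='deviance' is still accepted
    (with a deprecation warning in 1.1 and 1.2), assuming removal in 1.3."""
    return parse_version(sklearn_version) < (1, 3, 0)


print(accepts_deviance("1.2.2"), accepts_deviance("1.5.1"))
```

    Under these assumptions, a wrapper that hard-codes 'deviance' works up to 1.2.x and raises the InvalidParameterError above from 1.3 onwards. Choosing the name by version, as gb_classifier_log_loss_name does, avoids pinning sklearn.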

    You should be able to fix this issue by rolling back your version of sklearn to 1.2.2 (the current version is 1.5.1). You might also want to check which Python versions that release was built for and roll back your Python installation as well. If it were me, I'd just see if I can do this in Python instead. Python is the best platform out there for any kind of machine learning. Just my two cents.

    Edit: Actually, version 1.0 might be better, since the 'deviance' option was deprecated before 1.2.2 was released. 1.0 is the most likely to avoid other compatibility issues.
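    A rollback only helps if it happens in the environment Stata actually uses, so it is worth confirming the interpreter and the installed scikit-learn version first. The snippet below is a small sketch, not part of ddml or pystacked, and installed_version is a hypothetical helper name. It could be run inside a Stata python: block so that it reports on the same session pystacked uses. It relies only on the standard library (importlib.metadata, Python 3.8+), and "scikit-learn" is the distribution name pip uses for sklearn.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Report what this Python session sees; a rollback done in a different
# environment (e.g. another virtual environment) will not change these values.
print("Python executable:", sys.executable)
print("Python version:", sys.version.split()[0])
print("scikit-learn version:", installed_version("scikit-learn"))
```

    After a rollback with pip in that same environment (for example `pip install scikit-learn==1.0.2`), rerunning the snippet should show the new version. Stata may need to be restarted first, since a running Python session keeps the modules it has already imported.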
    Last edited by Daniel Schaefer; 07 Aug 2024, 16:55.



    • #3
      This is actually a pystacked issue, not a ddml issue. I update pystacked periodically to reflect changes in the sklearn syntax (they keep changing it), and I record in the pystacked help file which sklearn versions are supported. From the latest version:

      pystacked requires at least Stata 16 (or higher), a Python installation and scikit-learn (0.24 or higher). pystacked has been tested with scikit-learn 0.24.2, 1.0.2, 1.1.3, 1.2.1 and 1.3. See here and here for how to set up Python for Stata on your system.
      Also, see here.
      --
      Tag me or email me for ddml/pdslasso/lassopack/pystacked related questions. I don't check Statalist.



      • #4
        The new version of pystacked (0.7.6) supports sklearn versions 0.24 to 1.6.0.
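        To check an existing installation against this range, a sketch like the following could be used. The helper names are hypothetical. The bounds are treated as inclusive, which is an assumption: the post does not say whether patch releases after 1.6.0 (e.g. 1.6.1) are covered.

```python
import re
from typing import Tuple

# Range stated above for pystacked 0.7.6; inclusive bounds are an assumption.
PYSTACKED_076_MIN = (0, 24, 0)
PYSTACKED_076_MAX = (1, 6, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor[.micro]' of a version string like '1.5.1'."""
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, micro = match.groups()
    return int(major), int(minor), int(micro or 0)


def within_pystacked_076_range(sklearn_version: str) -> bool:
    """True if the sklearn version lies in the stated 0.24 to 1.6.0 range."""
    return PYSTACKED_076_MIN <= parse_version(sklearn_version) <= PYSTACKED_076_MAX


for v in ("0.23.2", "1.0.2", "1.5.1", "1.7.0"):
    print(v, within_pystacked_076_range(v))
```

        Tuples of integers compare element by element in Python, so (1, 5, 1) falls between (0, 24, 0) and (1, 6, 0) without needing a third-party version library.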

        Make sure you update pystacked from GitHub.

        Please reach out if you run into issues.

