doeren.ml package

Provide tools for machine learning pipelines.

Submodules

doeren.ml.pipeline module

Classes for running machine learning pipelines.

class doeren.ml.pipeline.PipelineRunner(*, data: Tuple[DataFrame | ndarray, DataFrame | ndarray, Series | ndarray, Series | ndarray], pipelines: Dict[str, Any])[source]

Bases: BaseModel

Class for optimizing and comparing machine learning pipelines.

data

The data to use for training and validation. Expected to be a tuple of (X_train, X_valid, y_train, y_valid).

Type:

Tuple[Union[pd.DataFrame, np.ndarray], Union[pd.DataFrame, np.ndarray], Union[pd.Series, np.ndarray], Union[pd.Series, np.ndarray]]

pipelines

The pipelines to run.

Type:

Dict[str, Any]
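
A minimal construction sketch, assuming the pipelines dict maps a name to a scikit-learn Pipeline plus a hyperparameter search space; that nested value layout is an illustration, not a documented schema:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    from doeren.ml.pipeline import PipelineRunner

    # Toy training/validation split; any DataFrame/ndarray pair works.
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    # The nested dict layout below (pipeline + search space) is an assumption.
    pipelines = {
        "logreg": {
            "pipeline": Pipeline(
                [("scale", StandardScaler()), ("clf", LogisticRegression())]
            ),
            "param_grid": {"clf__C": [0.1, 1.0, 10.0]},
        },
    }

    runner = PipelineRunner(
        data=(X_train, X_valid, y_train, y_valid),
        pipelines=pipelines,
    )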

property best_criterion: Dict[str, Any] | None

Return the best value of the optimization criterion for each pipeline.

property best_params: Dict[str, Any] | None

Return the best parameters for each pipeline.

property best_pipeline: Dict[str, Any] | None

Return the model trained with the best set of hyperparameters for each pipeline.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property optimizer: Tuple[Callable | None, Dict[str, Any]]

Return the optimizer and optimizer-related kwargs.

run_pipelines() → None[source]

Optimize and run all pipelines.

set_optimizer(optimizer: Callable, kwargs: Dict[str, Any]) → None[source]

Set the optimizer and optimizer-related kwargs to use for hyperparameter tuning.

Parameters:
  • optimizer (Callable) – A scikit-learn optimizer to use for hyperparameter tuning.

  • kwargs (Dict[str, Any]) – Optimizer-related keyword arguments.
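
The sketch below shows how set_optimizer(), run_pipelines(), and the best_* properties are intended to fit together end to end. The pipelines dict layout, the GridSearchCV call shape, and the per-pipeline keying of the results are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline

    from doeren.ml.pipeline import PipelineRunner

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    runner = PipelineRunner(
        data=(X_train, X_valid, y_train, y_valid),
        pipelines={
            "logreg": {
                "pipeline": Pipeline([("clf", LogisticRegression(max_iter=1000))]),
                "param_grid": {"clf__C": [0.1, 1.0, 10.0]},
            }
        },
    )

    # Choose the search strategy and its keyword arguments (assumed call shape).
    runner.set_optimizer(GridSearchCV, kwargs={"cv": 5, "scoring": "accuracy"})

    # Optimize and fit every configured pipeline.
    runner.run_pipelines()

    # Results are assumed to be keyed by pipeline name once the run completes.
    print(runner.best_criterion)   # e.g. {"logreg": 0.93}
    print(runner.best_params)      # e.g. {"logreg": {"clf__C": 1.0}}
    best_models = runner.best_pipeline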