doeren.ml package

Provide tools for machine learning pipelines.

Submodules

doeren.ml.pipeline module

Classes for running machine learning pipelines.

class doeren.ml.pipeline.PipelineRunner(*, data: Tuple[DataFrame | ndarray, DataFrame | ndarray, Series | ndarray, Series | ndarray], pipelines: Dict[str, Any])[source]

Bases: BaseModel

Class for optimizing and comparing machine learning pipelines.

data

The data to use for training and validation. Expected to be a tuple of (X_train, X_valid, y_train, y_valid).

Type:

Tuple[Union[pd.DataFrame, np.ndarray], Union[pd.DataFrame, np.ndarray], Union[pd.Series, np.ndarray], Union[pd.Series, np.ndarray]]

pipelines

The pipelines to run.

Type:

Dict[str, Any]
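
A minimal construction sketch, assuming the pipelines dict maps a name to a scikit-learn Pipeline plus a hyperparameter search space; that nested value layout is an illustration, not a documented schema:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    from doeren.ml.pipeline import PipelineRunner

    # Toy training/validation split; any DataFrame/ndarray pair works.
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    # The nested dict layout below (pipeline + search space) is an assumption.
    pipelines = {
        "logreg": {
            "pipeline": Pipeline(
                [("scale", StandardScaler()), ("clf", LogisticRegression())]
            ),
            "param_grid": {"clf__C": [0.1, 1.0, 10.0]},
        },
    }

    runner = PipelineRunner(
        data=(X_train, X_valid, y_train, y_valid),
        pipelines=pipelines,
    )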

property best_criterion: Dict[str, Any] | None

Return the best value of the optimization criterion for each pipeline.

property best_params: Dict[str, Any] | None

Return the best parameters for each pipeline.

property best_pipeline: Dict[str, Any] | None

Return the model trained with the best set of hyperparameters for each pipeline.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

property optimizer: Tuple[Callable | None, Dict[str, Any]]

Return the optimizer and optimizer-related kwargs.

run_pipelines() → None[source]

Optimize and run all pipelines.

set_optimizer(optimizer: Callable, kwargs: Dict[str, Any]) → None[source]

Set the optimizer and optimizer-related kwargs to use for hyperparameter tuning.

Parameters:
  • optimizer (Callable) – A scikit-learn optimizer to use for hyperparameter tuning.

  • kwargs (Dict[str, Any]) – Optimizer-related keyword arguments.
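
The sketch below shows how set_optimizer(), run_pipelines(), and the best_* properties are intended to fit together end to end. The pipelines dict layout, the GridSearchCV call shape, and the per-pipeline keying of the results are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline

    from doeren.ml.pipeline import PipelineRunner

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    runner = PipelineRunner(
        data=(X_train, X_valid, y_train, y_valid),
        pipelines={
            "logreg": {
                "pipeline": Pipeline([("clf", LogisticRegression(max_iter=1000))]),
                "param_grid": {"clf__C": [0.1, 1.0, 10.0]},
            }
        },
    )

    # Choose the search strategy and its keyword arguments (assumed call shape).
    runner.set_optimizer(GridSearchCV, kwargs={"cv": 5, "scoring": "accuracy"})

    # Optimize and fit every configured pipeline.
    runner.run_pipelines()

    # Results are assumed to be keyed by pipeline name once the run completes.
    print(runner.best_criterion)   # e.g. {"logreg": 0.93}
    print(runner.best_params)      # e.g. {"logreg": {"clf__C": 1.0}}
    best_models = runner.best_pipeline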