API Reference
Dataset Management
omnirec.recsys_data_set.RecSysDataSet(data: Optional[T] = None, meta: _DatasetMeta = _DatasetMeta())
Bases: Generic[T]
Source code in src\omnirec\recsys_data_set.py
use_dataloader(data_set: DataSet | str, raw_dir: Optional[PathLike | str] = None, canon_path: Optional[PathLike | str] = None, force_download=False, force_canonicalize=False) -> RecSysDataSet[RawData]
staticmethod
Loads a dataset using a registered DataLoader. If not already done, the dataset is downloaded and canonicalized. Canonicalization drops duplicates, normalizes identifiers, and saves the data in a standardized format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data_set | DataSet \| str | The name of the dataset from the DataSet enum. Must be a registered DataLoader name. | required |
| raw_dir | Optional[PathLike \| str] | Target directory where the raw data is stored. If not provided, the data is downloaded to the default raw data directory (_DATA_DIR). | None |
| canon_path | Optional[PathLike \| str] | Path where the canonicalized data should be saved. If not provided, the data is saved to the default canonicalized data directory (_DATA_DIR / "canon"). | None |
| force_download | bool | If True, forces re-downloading of the raw data even if it already exists. Defaults to False. | False |
| force_canonicalize | bool | If True, forces re-canonicalization of the data even if a canonicalized file exists. Defaults to False. | False |
Returns:

| Type | Description |
|---|---|
| RecSysDataSet[RawData] | The loaded dataset in canonicalized RawData format. |
Example
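A minimal sketch, assuming "ml-100k" is a registered loader name (the checksum example under DatasetInfo below also references ml-100k.zip); any name returned by list_datasets() works:

```python
from omnirec.recsys_data_set import RecSysDataSet

# Downloads and canonicalizes the data on first use; later calls reuse the cached files.
dataset = RecSysDataSet.use_dataloader("ml-100k")
```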
Source code in src\omnirec\recsys_data_set.py
save(file: str | PathLike)
Saves the RecSysDataSet object to a file with the default suffix .rsds.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file | str \| PathLike | The path where the file is saved. | required |
Source code in src\omnirec\recsys_data_set.py
load(file: str | PathLike) -> RecSysDataSet[T]
staticmethod
Loads a RecSysDataSet object from a file with the .rsds suffix.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file | str \| PathLike | The path to the .rsds file. | required |
Returns:

| Type | Description |
|---|---|
| RecSysDataSet[T] | The loaded RecSysDataSet object. |
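A hedged round-trip sketch for save and load; the dataset name and file name are illustrative:

```python
from omnirec.recsys_data_set import RecSysDataSet

dataset = RecSysDataSet.use_dataloader("ml-100k")  # assumed registered name
dataset.save("ml-100k.rsds")                       # serialize with the default .rsds suffix
restored = RecSysDataSet.load("ml-100k.rsds")      # restore the object later
```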
Source code in src\omnirec\recsys_data_set.py
Data Loaders
omnirec.data_loaders.base.Loader
Bases: ABC
info(name: str) -> DatasetInfo
abstractmethod
staticmethod
Provide metadata information about the dataset identified by name.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name under which the loader was registered. Different names may return different DatasetInfo implementations depending on the dataset. This is useful when multiple datasets share the same loading logic but have, for example, different download URLs or checksums. | required |
Returns:

| Name | Type | Description |
|---|---|---|
| DatasetInfo | DatasetInfo | Metadata including download URLs and optional checksum for verification. |
Source code in src\omnirec\data_loaders\base.py
load(source_dir: Path, name: str) -> pd.DataFrame
abstractmethod
staticmethod
Loads the dataset from the given directory into a pd.DataFrame.
The DataFrame should have the standard columns:
- user
- item
- rating
- timestamp
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| source_dir | Path | The directory that contains the downloaded dataset files. | required |
| name | str | The name under which the loader was registered. This allows selecting between different datasets that share the same loading logic but differ in structure or file naming. | required |
Returns:

| Type | Description |
|---|---|
| DataFrame | Loaded dataset as a pd.DataFrame with the expected columns. |
Source code in src\omnirec\data_loaders\base.py
omnirec.data_loaders.base.DatasetInfo(download_urls: Optional[str | list[str]] = None, checksum: Optional[str] = None, download_file_name: Optional[str] = None, verify_tls: bool = True, license_or_registration: bool = False)
dataclass
Metadata about a dataset.
Attributes

download_urls : Optional[str | list[str]]
URL or list of URLs to download the dataset. If a list is provided, the URLs are tried in order until one succeeds (skipping on checksum mismatch or HTTP errors).

checksum : Optional[str]
Optional SHA256 checksum to verify the downloaded file's integrity. If provided, the downloaded file will be hashed using SHA256 and compared to this value. Use e.g. hashlib.sha256() to compute the checksum in Python:

```python
import hashlib

hasher = hashlib.sha256()
with open("ml-100k.zip", "rb") as f:
    for chunk in iter(lambda: f.read(8192), b""):
        hasher.update(chunk)
print(hasher.hexdigest())
```

download_file_name : Optional[str]
Optional custom file name to use when saving the downloaded dataset. If not provided, the name will be inferred from the URL.

verify_tls : bool
Whether to verify TLS/SSL certificates when downloading. Default is True.

license_or_registration : bool
Indicates whether the dataset requires a license agreement or registration to access. Default is False.
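A hedged construction sketch; the URL, checksum, and file name below are placeholders rather than real dataset metadata:

```python
from omnirec.data_loaders.base import DatasetInfo

info = DatasetInfo(
    download_urls=["https://example.org/my-data.zip"],  # placeholder URL
    checksum="<sha256-hex-digest>",                     # placeholder SHA256 digest
    download_file_name="my-data.zip",
)
```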
omnirec.data_loaders.registry.register_dataloader(names: str | list[str], cls: type[Loader])
Register a data loader class under one or multiple names.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| names | str \| list[str] | Name(s) to register the loader under. | required |
| cls | type[Loader] | Loader class to register. Must inherit from the common Loader base class. | required |
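A minimal sketch of a custom loader; the dataset name "my-dataset", the placeholder URL, and the ratings.csv file layout are illustrative assumptions:

```python
from pathlib import Path

import pandas as pd

from omnirec.data_loaders.base import DatasetInfo, Loader
from omnirec.data_loaders.registry import register_dataloader


class MyLoader(Loader):
    @staticmethod
    def info(name: str) -> DatasetInfo:
        # Placeholder URL; a real loader would typically also set a checksum.
        return DatasetInfo(download_urls="https://example.org/my-data.zip")

    @staticmethod
    def load(source_dir: Path, name: str) -> pd.DataFrame:
        # Must return the standard user/item/rating/timestamp columns.
        return pd.read_csv(
            source_dir / "ratings.csv",
            names=["user", "item", "rating", "timestamp"],
        )


register_dataloader("my-dataset", MyLoader)
```

After registration, the dataset is available via use_dataloader("my-dataset") and appears in list_datasets().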
Source code in src\omnirec\data_loaders\registry.py
omnirec.data_loaders.registry.list_datasets() -> list[str]
List all registered dataset names.
Returns:

| Type | Description |
|---|---|
| list[str] | A list of all registered dataset names. |
Preprocessing Pipeline
omnirec.preprocess.base.Preprocessor()
Bases: ABC, Generic[T, U]
process(dataset: RecSysDataSet[T]) -> RecSysDataSet[U]
abstractmethod
Processes the dataset and returns a new dataset variant.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataset | RecSysDataSet[T] | The dataset to process. | required |
Returns:

| Type | Description |
|---|---|
| RecSysDataSet[U] | The processed dataset. |
Source code in src\omnirec\preprocess\base.py
omnirec.preprocess.subsample.Subsample(sample_size: int | float)
Bases: Preprocessor[RawData, RawData]
Subsamples the dataset to a specified size.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sample_size | int \| float | The size of the sample to draw from the dataset. int: the absolute number of interactions to include in the sample. float: the fraction of the dataset to include in the sample (between 0 and 1). | required |
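A short sketch of the two sample_size modes:

```python
from omnirec.preprocess.subsample import Subsample

sub_frac = Subsample(0.1)               # keep 10% of all interactions
sub_abs = Subsample(10_000)             # keep exactly 10,000 interactions
subsampled = sub_frac.process(dataset)  # dataset: a RecSysDataSet[RawData]
```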
Source code in src\omnirec\preprocess\subsample.py
omnirec.preprocess.core_pruning.CorePruning(core: int)
Bases: Preprocessor[RawData, RawData]
Prunes the dataset to the specified core. Core pruning with a threshold of, e.g., 5 means that only users and items with at least 5 interactions are kept in the pruned dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| core | int | The core threshold for pruning. | required |
Source code in src\omnirec\preprocess\core_pruning.py
omnirec.preprocess.feedback_conversion.MakeImplicit(threshold: int | float)
Bases: Preprocessor[RawData, RawData]
Converts explicit feedback to implicit feedback using the specified threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| threshold | int \| float | The threshold for converting feedback. int: used directly as the threshold, e.g. 3 -> only interactions with a rating of 3 or higher are included. float: interpreted as a fraction of the maximum rating, e.g. 0.5 -> only interactions with a rating of at least 50% of the maximum rating are included. | required |
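A short sketch of the two threshold modes:

```python
from omnirec.preprocess.feedback_conversion import MakeImplicit

implicit_abs = MakeImplicit(3)    # keep interactions rated 3 or higher
implicit_rel = MakeImplicit(0.5)  # keep interactions rated >= 50% of the maximum rating
implicit_dataset = implicit_abs.process(dataset)
```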
Source code in src\omnirec\preprocess\feedback_conversion.py
Filtering
omnirec.preprocess.filter.TimeFilter(start: Optional[pd.Timestamp] = None, end: Optional[pd.Timestamp] = None)
Bases: Preprocessor[RawData, RawData]
Filters the interactions by a time range. Only interactions within the specified start and end timestamps are retained.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| start | Optional[Timestamp] | The start timestamp for the filter. Defaults to None. | None |
| end | Optional[Timestamp] | The end timestamp for the filter. Defaults to None. | None |
Source code in src\omnirec\preprocess\filter.py
omnirec.preprocess.filter.RatingFilter(lower: Optional[int | float] = None, upper: Optional[int | float] = None)
Bases: Preprocessor[RawData, RawData]
Filters the interactions by rating values. Only interactions with ratings within the specified lower and upper bounds are retained.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| lower | Optional[int \| float] | The lower bound for the filter. Defaults to None. | None |
| upper | Optional[int \| float] | The upper bound for the filter. Defaults to None. | None |
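A hedged sketch combining both filters; the cut-offs are illustrative, and the bounds are assumed inclusive per the descriptions above:

```python
import pandas as pd

from omnirec.preprocess.filter import RatingFilter, TimeFilter

recent = TimeFilter(start=pd.Timestamp("2018-01-01"))  # drop interactions before 2018
positive = RatingFilter(lower=4)                       # keep interactions rated at least 4
filtered = positive.process(recent.process(dataset))
```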
Source code in src\omnirec\preprocess\filter.py
omnirec.preprocess.split.RandomCrossValidation(num_folds: int, validation_size: float | int)
Bases: DataSplit[RawData, FoldedData]
Applies random cross-validation to the dataset. Randomly splits the dataset into training, validation, and test sets for each fold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_folds | int | The number of folds to use for cross-validation. | required |
| validation_size | float \| int | float: the proportion (between 0 and 1) of the dataset to include in the validation split. int: the absolute number of interactions to include in the validation split. | required |
Source code in src\omnirec\preprocess\split.py
omnirec.preprocess.split.RandomHoldout(validation_size: float | int, test_size: float | int)
Bases: DataSplit[RawData, SplitData]
Applies a random holdout split to the dataset. Randomly splits the dataset into training, validation, and test sets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| validation_size | float \| int | float: the proportion (between 0 and 1) of the dataset to include in the validation split. int: the absolute number of interactions to include in the validation split. | required |
| test_size | float \| int | float: the proportion (between 0 and 1) of the dataset to include in the test split. int: the absolute number of interactions to include in the test split. | required |
Source code in src\omnirec\preprocess\split.py
omnirec.preprocess.split.UserCrossValidation(num_folds: int, validation_size: float | int)
Bases: DataSplit[RawData, FoldedData]
Applies user-based cross-validation to the dataset. Ensures that each user has interactions in the training, validation, and test sets in each fold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_folds | int | The number of folds to use for cross-validation. | required |
| validation_size | float \| int | float: the proportion (between 0 and 1) of the dataset to include in the validation split. int: the absolute number of interactions to include in the validation split. | required |
Source code in src\omnirec\preprocess\split.py
omnirec.preprocess.split.UserHoldout(validation_size: float | int, test_size: float | int)
Bases: DataSplit[RawData, SplitData]
Applies the user holdout split to the dataset. Ensures that each user has interactions in the training, validation, and test sets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| validation_size | float \| int | float: the proportion (between 0 and 1) of the dataset to include in the validation split. int: the absolute number of interactions to include in the validation split. | required |
| test_size | float \| int | float: the proportion (between 0 and 1) of the dataset to include in the test split. int: the absolute number of interactions to include in the test split. | required |
Source code in src\omnirec\preprocess\split.py
omnirec.preprocess.split.TimeBasedHoldout(validation: float | int | pd.Timestamp, test: float | int | pd.Timestamp)
Bases: DataSplit[RawData, SplitData]
Applies a time-based holdout split to the dataset. Splits the dataset into training, validation, and test sets based on the timestamp. Proportions, absolute numbers, or timestamps can be used as cut-off criteria.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| validation | float \| int \| Timestamp | float: the proportion (between 0 and 1) of the newest interactions in the dataset to include in the validation split. int: the absolute number of newest interactions to include in the validation split. pd.Timestamp: the cut-off timestamp for the validation split; interactions after this timestamp (newer) are included in the validation split. | required |
| test | float \| int \| Timestamp | float: the proportion (between 0 and 1) of the newest interactions in the dataset to include in the test split. int: the absolute number of newest interactions to include in the test split. pd.Timestamp: the cut-off timestamp for the test split; interactions after this timestamp (newer) are included in the test split. | required |
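A short sketch of the split variants; the sizes are illustrative:

```python
from omnirec.preprocess.split import RandomHoldout, TimeBasedHoldout, UserCrossValidation

random_split = RandomHoldout(validation_size=0.1, test_size=0.2)
time_split = TimeBasedHoldout(validation=0.1, test=0.1)  # newest interactions, per the proportion semantics above
cv = UserCrossValidation(num_folds=5, validation_size=0.1)
split_dataset = random_split.process(dataset)  # dataset: a RecSysDataSet[RawData]
```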
Source code in src\omnirec\preprocess\split.py
omnirec.preprocess.pipe.Pipe(*steps: Unpack[tuple[Unpack[Ts], Preprocessor[Any, T]]])
Bases: Generic[T]
Pipeline for automatically applying sequential preprocessing steps. Takes a sequence of Preprocessor objects as input. When process() is called, each step's process method is invoked in the order the steps were provided.
Example

```python
# Define preprocessing steps
pipe = Pipe(
    Subsample(0.1),
    MakeImplicit(3),
    CorePruning(5),
    UserCrossValidation(5, 0.1),
)
# Apply the steps
dataset = pipe.process(dataset)
```
Source code in src\omnirec\preprocess\pipe.py
Evaluation Metrics
omnirec.runner.evaluation.Evaluator(*metrics: Metric)
Initialize the Evaluator with metrics to compute on predictions.
The Evaluator computes specified metrics on algorithm predictions and accumulates
results across experiments. Use get_tables() to retrieve formatted result tables.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| *metrics | Metric | One or more metric instances to compute. Common metrics include NDCG, HR (Hit Rate), and Recall. Each metric can be configured with multiple k values (e.g., NDCG(k=[10, 20])). | () |
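A minimal construction sketch using the ranking metrics documented below:

```python
from omnirec.metrics.ranking import HR, NDCG, Recall
from omnirec.runner.evaluation import Evaluator

evaluator = Evaluator(NDCG([5, 10]), HR(10), Recall([5, 10]))
```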
Source code in src\omnirec\runner\evaluation.py
Metric Base Classes
omnirec.metrics.base.Metric
Bases: ABC
omnirec.metrics.base.Metric.calculate(predictions: pd.DataFrame, test: pd.DataFrame) -> MetricResult
abstractmethod
omnirec.metrics.base.MetricResult(name: str, result: float | dict[int, float])
dataclass
Represents the result of a metric calculation. It holds the name as str and either a single float result or a dictionary of results for multiple k values.
Ranking Metrics
omnirec.metrics.ranking.HR(k: int | list[int])
Bases: RankingMetric
Computes the HR metric. k is the number of top recommendations to consider. It can be a single integer or a list of integers, in which case the metric will be computed for each value of k.
It follows the formula:
\(HR@k = \frac{1}{|U|} \sum_{u \in U} \mathbf{1}\{\text{Rel}(u) \cap \text{Pred}_k(u) \neq \emptyset\}\)
where \(\text{Pred}_k(u)\) is the set of top-k predicted items for user u.
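For example, with \(k = 10\) and 100 users, 37 of whom have at least one relevant item among their top-10 predictions, \(HR@10 = 37/100 = 0.37\).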
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| k | int \| list[int] | The number of top recommendations to consider. | required |
Source code in src\omnirec\metrics\ranking.py
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Calculates the Hit Rate (HR) metric. Considers the top-k predictions for one or multiple k values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the top-k predictions for one or more users. | required |
| test | DataFrame | Contains the ground truth relevant items for one or more users. | required |
Returns:

| Name | Type | Description |
|---|---|---|
| MetricResult | MetricResult | The computed HR scores for each value of k. If multiple users are provided, the scores are averaged. |
Source code in src\omnirec\metrics\ranking.py
omnirec.metrics.ranking.NDCG(k: int | list[int])
Bases: RankingMetric
Initializes the NDCG (Normalized Discounted Cumulative Gain) metric. k is the number of top predictions to consider. It can be a single integer or a list of integers, in which case the metric will be computed for each value of k.
The NDCG considers the position of relevant items in a ranked list of predictions.
For a user u, the discounted cumulative gain at cutoff k is
\(DCG@k(u) = \sum_{i=1}^{k} \frac{\mathbf{1}\{\text{pred}_i \in \text{Rel}(u)\}}{\log_2(i+1)}\)
where \(\mathbf{1}\{\cdot\}\) is the indicator function and
\(\text{Rel}(u)\) is the set of relevant items for user u.
The ideal discounted cumulative gain is
\(IDCG@k = \sum_{i=1}^{k} \frac{1}{\log_2(i+1)}\)
The normalized score is
\(NDCG@k(u) = \frac{DCG@k(u)}{IDCG@k}\)
Finally, the reported score is averaged over all users:
\(\text{NDCG@k} = \frac{1}{|U|} \sum_{u \in U} NDCG@k(u)\)
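As a worked example, take \(k = 2\) with exactly one relevant item, ranked first: \(DCG@2 = \frac{1}{\log_2 2} = 1\), \(IDCG@2 = \frac{1}{\log_2 2} + \frac{1}{\log_2 3} \approx 1.631\), so \(NDCG@2 \approx 0.613\).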
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| k | int \| list[int] | The number of top predictions to consider. | required |
Source code in src\omnirec\metrics\ranking.py
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Computes the Normalized Discounted Cumulative Gain (NDCG). Considers the top-k predictions for one or multiple k values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the top-k predictions for one or more users. | required |
| test | DataFrame | Contains the ground truth relevant items for one or more users. | required |
Returns:

| Name | Type | Description |
|---|---|---|
| MetricResult | MetricResult | The computed NDCG scores for each value of k. If multiple users are provided, the scores are averaged. |
Source code in src\omnirec\metrics\ranking.py
omnirec.metrics.ranking.Recall(k: int | list[int])
Bases: RankingMetric
Calculates the average recall at k for one or multiple k values. Recall at k is defined as the proportion of relevant items that are found in the top-k recommendations.
It follows the formula:
\(Recall@k = \frac{1}{|U|} \sum_{u \in U} \frac{|\text{Rel}(u) \cap \text{Pred}_k(u)|}{\min(|\text{Rel}(u)|, k)}\)
where \(\text{Pred}_k(u)\) is the set of top-k predicted items for user u.
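For example, a user with 3 relevant items, 2 of which appear in the top-5 predictions, has \(Recall@5 = \frac{2}{\min(3, 5)} = \frac{2}{3}\).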
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| k | int \| list[int] | The number of top recommendations to consider. | required |
Source code in src\omnirec\metrics\ranking.py
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Calculates the Recall metric. Considers the top-k predictions for one or multiple k values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the top-k predictions for one or more users. | required |
| test | DataFrame | Contains the ground truth relevant items for one or more users. | required |
Returns:

| Type | Description |
|---|---|
| MetricResult | The computed Recall scores for each value of k. If multiple users are provided, the scores are averaged. |
Source code in src\omnirec\metrics\ranking.py
Prediction Metrics
omnirec.metrics.prediction.MAE()
Bases: PredictionMetric
Mean Absolute Error (MAE) metric. Calculates the average of the absolute differences between predicted and actual ratings, according to the formula: \(MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|\)
Source code in src\omnirec\metrics\prediction.py
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Calculates the MAE metric.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the predicted ratings. | required |
| test | DataFrame | Contains the ground truth ratings. | required |
Returns:

| Name | Type | Description |
|---|---|---|
| MetricResult | MetricResult | Contains the name of the metric and the computed MAE value. |
Source code in src\omnirec\metrics\prediction.py
omnirec.metrics.prediction.RMSE()
Bases: PredictionMetric
Root Mean Squared Error (RMSE) metric. Calculates the square root of the average of the squared differences between predicted and actual ratings, according to the formula:
\(RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}\)
Source code in src\omnirec\metrics\prediction.py
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Calculates the RMSE metric.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the predicted ratings. | required |
| test | DataFrame | Contains the ground truth ratings. | required |
Returns:

| Name | Type | Description |
|---|---|---|
| MetricResult | MetricResult | Contains the name of the metric and the computed RMSE value. |
Source code in src\omnirec\metrics\prediction.py
Experiment Planning
omnirec.runner.plan.ExperimentPlan(plan_name: Optional[str] = None)
Source code in src\omnirec\runner\plan.py
add_algorithm(algorithm: Algorithms | str, algorithm_config: Optional[AlgorithmConfig] = None, force=False)
Adds an algorithm to the experiment plan.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| algorithm | Algorithms \| str | The algorithm to add. | required |
| algorithm_config | Optional[AlgorithmConfig] | The configuration for the algorithm. The configuration format depends on the algorithm's origin library; we refer to that library's documentation for details about the algorithm hyperparameters. | None |
| force | bool | Whether to forcefully overwrite an existing algorithm config. Defaults to False. | False |
Example

```python
# Create a new experiment plan
plan = ExperimentPlan(plan_name="Example Plan")
# Define algorithm configuration based on the lenskit ItemKNNScorer parameters
lenskit_itemknn = {"max_nbrs": [10, 20], "min_nbrs": 5, "feedback": "implicit"}
# Add algorithm with configuration to the plan
plan.add_algorithm(Algorithms.ItemKNNScorer, lenskit_itemknn)
```
Source code in src\omnirec\runner\plan.py
update_algorithm(algorithm_name: str, algorithm_config: AlgorithmConfig)
Updates the configuration for an existing algorithm in the experiment plan.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| algorithm_name | str | The name of the algorithm to update. | required |
| algorithm_config | AlgorithmConfig | The new configuration for the algorithm. | required |
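A short sketch, assuming the algorithm was previously added under the name "ItemKNNScorer":

```python
# Replace the existing ItemKNNScorer configuration with new hyperparameters
plan.update_algorithm(
    "ItemKNNScorer",
    {"max_nbrs": [30], "min_nbrs": 5, "feedback": "implicit"},
)
```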
Source code in src\omnirec\runner\plan.py
Runner Function
omnirec.util.run.run_omnirec(datasets: RecSysDataSet[T] | Iterable[RecSysDataSet[T]], plan: ExperimentPlan, evaluator: Evaluator, slurm_script: Optional[PathLike | str] = None)
Run the OmniRec framework with the specified datasets, experiment plan, and evaluator.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| datasets | RecSysDataSet[T] \| Iterable[RecSysDataSet[T]] | The dataset(s) to use for the experiment. | required |
| plan | ExperimentPlan | The experiment plan to follow. | required |
| evaluator | Evaluator | The evaluator to use for the experiment. | required |
| slurm_script | Optional[PathLike \| str] | Path to a SLURM script used to schedule experiments on an HPC cluster. If not provided, the experiments are run locally in normal mode. | None |
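A hedged end-to-end sketch tying together the pieces documented above; the dataset name, algorithm name, and configuration values are illustrative:

```python
from omnirec.metrics.ranking import NDCG
from omnirec.preprocess.pipe import Pipe
from omnirec.preprocess.split import RandomHoldout
from omnirec.recsys_data_set import RecSysDataSet
from omnirec.runner.evaluation import Evaluator
from omnirec.runner.plan import ExperimentPlan
from omnirec.util.run import run_omnirec

# Load and split a dataset (name assumed to be registered)
dataset = RecSysDataSet.use_dataloader("ml-100k")
dataset = Pipe(RandomHoldout(0.1, 0.2)).process(dataset)

# Plan a single algorithm run (string form of the algorithm name)
plan = ExperimentPlan(plan_name="Example Plan")
plan.add_algorithm("ItemKNNScorer", {"max_nbrs": [10, 20], "min_nbrs": 5, "feedback": "implicit"})

# Run the experiments and retrieve formatted result tables
evaluator = Evaluator(NDCG([5, 10]))
run_omnirec(dataset, plan, evaluator)
tables = evaluator.get_tables()
```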
Source code in src\omnirec\util\run.py
Coordinator Class
omnirec.runner.coordinator.Coordinator(checkpoint_dir: PathLike | str = Path('./checkpoints'), tmp_dir: Optional[PathLike | str] = None)
Initialize the Coordinator for orchestrating recommendation algorithm experiments. The Coordinator manages the execution of experiments across multiple datasets, algorithms, and configurations. It handles environment isolation, checkpointing, progress tracking, and communication with framework-specific runners.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| checkpoint_dir | PathLike \| str | Directory for storing persistent experiment data including model checkpoints, predictions, and progress files. Directory is created if it doesn't exist. Defaults to "./checkpoints". | Path('./checkpoints') |
| tmp_dir | Optional[PathLike \| str] | Directory for temporary files such as intermediate CSV exports. If None, a temporary directory is created automatically and cleaned up on exit. Defaults to None. | None |
Note
- Automatically registers default runners (LensKit, RecBole, RecPack) on initialization
- Generates SSL certificates for secure RPC communication with runner subprocesses
- The checkpoint directory structure is:
checkpoint_dir/dataset-hash/config-hash/
Source code in src\omnirec\runner\coordinator.py
run(datasets: RecSysDataSet[T] | Iterable[RecSysDataSet[T]], config: ExperimentPlan, evaluator: Evaluator) -> Evaluator
Execute recommendation algorithm experiments across datasets and configurations. Orchestrates the complete experiment lifecycle: environment setup, model training, prediction generation, and evaluation. Supports automatic checkpointing and resuming of interrupted experiments.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| datasets | RecSysDataSet[T] \| Iterable[RecSysDataSet[T]] | Single dataset or list of datasets to run experiments on. Datasets must contain either SplitData (train/val/test) or FoldedData (cross-validation folds). Use preprocessing steps to create these splits. | required |
| config | ExperimentPlan | Experiment configuration specifying algorithms and their hyperparameters. Each algorithm in the plan will be executed with all specified parameter combinations. | required |
| evaluator | Evaluator | Evaluator instance containing metrics to compute on predictions. Results are accumulated across all experiments and accessible via get_tables(). | required |
Returns:

| Name | Type | Description |
|---|---|---|
| Evaluator | Evaluator | The same evaluator instance passed in, now containing results from all experiments. Use get_tables() to retrieve formatted result tables. |
Raises:

| Type | Description |
|---|---|
| SystemExit | If the experiment plan is empty or if runner/algorithm validation fails. |
Note
- Each algorithm runs in an isolated Python environment with framework-specific dependencies
- Progress is checkpointed after each phase (Fit, Predict, Eval) for fault tolerance
- Identical dataset/config combinations are cached and skipped automatically
- For cross-validation (FoldedData), experiments run sequentially across all folds
- Runner subprocesses are automatically started and terminated for each algorithm
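A minimal sketch of driving experiments through the Coordinator directly, reusing a dataset, plan, and evaluator built as in the run_omnirec example above:

```python
from omnirec.runner.coordinator import Coordinator

coordinator = Coordinator(checkpoint_dir="./checkpoints")
evaluator = coordinator.run(dataset, plan, evaluator)  # returns the same Evaluator, now holding results
```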
Source code in src\omnirec\runner\coordinator.py
Utility Functions
omnirec.util.util.set_random_state(random_state: int) -> None
Set the global random state for reproducibility.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| random_state | int | The random state seed. | required |
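A trivial reproducibility sketch:

```python
from omnirec.util.util import get_random_state, set_random_state

set_random_state(42)           # seed all downstream randomness
assert get_random_state() == 42
```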
omnirec.util.util.get_random_state() -> int
Get the global random state for reproducibility.
Returns:

| Name | Type | Description |
|---|---|---|
| int | int | The current random state seed. |