Evaluation Metrics
omnirec.runner.evaluation.Evaluator(*metrics: Metric)
Initialize the Evaluator with metrics to compute on predictions.
The Evaluator computes specified metrics on algorithm predictions and accumulates
results across experiments. Use get_tables() to retrieve formatted result tables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *metrics | Metric | One or more metric instances to compute. Common metrics include NDCG, HR (Hit Rate), and Recall. Each metric can be configured with multiple k values (e.g., …). | () |
Source code in src\omnirec\runner\evaluation.py
get_results() -> dict[str, DataFrame]
Return evaluation results grouped by dataset.
Returns:
| Type | Description |
|---|---|
| dict[str, DataFrame] | Mapping of dataset identifiers to their result tables. Keys are dataset names with a unique hash appended. Each value is a DataFrame containing the columns: |
get_tables() -> list[Table]
Return evaluation results as formatted Rich tables, one per dataset.
Each table has one row per algorithm (and per fold when cross-validation is used)
and one column per metric+k combination (e.g. NDCG@10). The tables are
automatically printed to the console by omnirec.util.run.run_omnirec()
after all experiments complete, so you only need to call this method directly
if you want to redisplay results (e.g. after load_results()).
Returns:
| Type | Description |
|---|---|
| list[Table] | list[rich.table.Table]: One Rich table per dataset. |
save_results(path: Path)
Persist evaluation results to a JSON file.
Serialises the internal results dictionary to JSON so that results can be
reloaded later with load_results() without re-running experiments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Destination file path. The file is created or overwritten. | required |
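As a rough illustration of the save-and-reload round trip, the sketch below writes a dictionary to JSON and reads it back using only the standard library. It does not call omnirec: the `results` dictionary, its keys, and its values are hypothetical placeholders, and the real internal structure may differ. One caveat holds for any JSON round trip: object keys are always strings, so integer k values have to be converted back after loading.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical results dictionary with placeholder values. The shape of
# omnirec's internal dictionary is an assumption made for this sketch.
results = {"my_dataset-1a2b3c": {"my_algorithm": {"NDCG": {5: 0.2, 10: 0.25}}}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.json"
    # Write: the file is created, or overwritten if it already exists.
    path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    # Read the same file back into a plain dictionary.
    restored = json.loads(path.read_text(encoding="utf-8"))

# JSON object keys are strings, so the k values come back as "5" and "10".
ndcg = restored["my_dataset-1a2b3c"]["my_algorithm"]["NDCG"]
# Convert them back to integers before comparing or sorting by k.
ndcg_int_keys = {int(k): v for k, v in ndcg.items()}
```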
load_results(path: Path)
Load previously saved evaluation results from a JSON file.
Restores results that were written by save_results(). After loading,
get_results() and get_tables() work as if the experiments had
just finished.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to a JSON file previously written by save_results(). | required |
Example

    from pathlib import Path

    from rich.console import Console

    # `evaluator` is an existing Evaluator instance
    evaluator.load_results(Path("results/my_experiment.json"))

    # Inspect raw DataFrames
    for dataset_id, df in evaluator.get_results().items():
        print(dataset_id)
        print(df)

    # Or redisplay the formatted tables
    console = Console()
    for table in evaluator.get_tables():
        console.print(table)
Metric Base Classes
omnirec.metrics.base.Metric
Bases: ABC
omnirec.metrics.base.Metric.calculate(predictions: pd.DataFrame, test: pd.DataFrame) -> MetricResult
abstractmethod
omnirec.metrics.base.MetricResult(name: str, result: float | dict[int, float])
dataclass
Represents the result of a metric calculation. It holds the name as str and either a single float result or a dictionary of results for multiple k values.
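The two shapes of `result` can be illustrated with a small stand-in. The dataclass below only mirrors the documented fields and is not the library class; the helper `flatten` and all numeric values are hypothetical, and the `NAME@k` labels simply follow the column naming mentioned for get_tables() (e.g. NDCG@10).

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class MetricResult:
    """Stand-in mirroring the documented fields (not the omnirec class)."""

    name: str
    # Equivalent to the documented annotation float | dict[int, float]
    result: Union[float, Dict[int, float]]


def flatten(res: MetricResult) -> Dict[str, float]:
    """Hypothetical helper: expand a result into 'NAME' or 'NAME@k' labels."""
    if isinstance(res.result, dict):
        return {f"{res.name}@{k}": value for k, value in res.result.items()}
    return {res.name: res.result}


# A prediction metric yields a single float (placeholder value).
flat_single = flatten(MetricResult("RMSE", 0.91))
# A ranking metric configured with several k values yields one score per k.
flat_per_k = flatten(MetricResult("NDCG", {5: 0.31, 10: 0.38}))
```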
Ranking Metrics
omnirec.metrics.ranking.HR(k: int | list[int])
Bases: RankingMetric
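Computes the Hit Rate (HR) metric: the fraction of users with at least one relevant item among their top-k recommendations. k is the number of top recommendations to consider. It can be a single integer or a list of integers, in which case the metric will be computed for each value of k.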
It follows the formula:
\(HR@k = \frac{1}{|U|} \sum_{u \in U} \mathbf{1}\{\text{Rel}(u) \cap \text{Pred}_k(u) \neq \emptyset\}\)
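where \(U\) is the set of evaluated users, \(\text{Rel}(u)\) is the set of relevant items for user u, \(\text{Pred}_k(u)\) is the set of top-k predicted items for user u, and \(\mathbf{1}\{\cdot\}\) is the indicator function.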
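The formula can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `hit_rate_at_k` and the dict-based inputs are assumptions made for the example (the documented calculate method takes DataFrames), and U is assumed to be the set of users present in the ground truth.

```python
def hit_rate_at_k(predictions, relevant, k):
    """HR@k over dict inputs.

    predictions: user -> ranked list of item ids (best first)
    relevant:    user -> set of relevant item ids
    """
    hits = 0
    for user, rel in relevant.items():
        top_k = set(predictions.get(user, [])[:k])
        # Indicator: 1 if any relevant item appears in the top-k list
        if rel & top_k:
            hits += 1
    return hits / len(relevant)


# Toy data for three users
predictions = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["g", "h", "i"]}
relevant = {"u1": {"c"}, "u2": {"x"}, "u3": {"g", "z"}}

# One score per cutoff, as when k is given as a list
hr = {k: hit_rate_at_k(predictions, relevant, k) for k in (1, 3)}
```

With this toy data only u3 has a hit at k = 1, while u1 and u3 both have one at k = 3, so the scores are 1/3 and 2/3.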
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | int \| list[int] | The number of top recommendations to consider. | required |
Source code in src\omnirec\metrics\ranking.py
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Calculates the Hit Rate (HR) metric. Considers the top-k predictions for one or multiple k values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the top k predictions for one or more users. | required |
| test | DataFrame | Contains the ground truth relevant items for one or more users. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| MetricResult | MetricResult | The computed HR scores for each value k. If multiple users are provided, the scores are averaged. |
omnirec.metrics.ranking.NDCG(k: int | list[int])
Bases: RankingMetric
Initializes the NDCG (Normalized Discounted Cumulative Gain) metric. k is the number of top predictions to consider. It can be a single integer or a list of integers, in which case the metric will be computed for each value of k.
The NDCG considers the position of relevant items in a ranked list of predictions.
For a user u, the discounted cumulative gain at cutoff k is
\(DCG@k(u) = \sum_{i=1}^{k} \frac{\mathbf{1}\{\text{pred}_i \in \text{Rel}(u)\}}{\log_2(i+1)}\)
where \(\mathbf{1}\{\cdot\}\) is the indicator function, \(\text{pred}_i\) is the item at rank i in the predictions for user u, and
\(\text{Rel}(u)\) is the set of relevant items for user u.
The ideal discounted cumulative gain is
\(IDCG@k = \sum_{i=1}^{k} \frac{1}{\log_2(i+1)}\)
The normalized score is
\(NDCG@k(u) = \frac{DCG@k(u)}{IDCG@k}\)
Finally, the reported score is averaged over all users:
\(\text{NDCG@k} = \frac{1}{|U|} \sum_{u \in U} NDCG@k(u)\)
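A short plain-Python sketch of these four steps follows. It is an illustration under stated assumptions rather than the library's code: `ndcg_at_k` and `mean_ndcg_at_k` are hypothetical names, the inputs are dicts instead of DataFrames, and IDCG is summed over all k positions exactly as the formula above is written.

```python
import math


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k for one user: binary gains with a log2(i + 1) discount."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    # Ideal DCG as written above: every one of the k positions is a hit
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, k + 1))
    return dcg / idcg


def mean_ndcg_at_k(predictions, relevant, k):
    """Average the per-user scores over all users in the ground truth."""
    scores = [ndcg_at_k(predictions.get(u, []), rel, k) for u, rel in relevant.items()]
    return sum(scores) / len(scores)


# Toy data: u1 hits at ranks 1 and 3, u2 hits at rank 2
predictions = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
relevant = {"u1": {"a", "c"}, "u2": {"e"}}

ndcg = {k: mean_ndcg_at_k(predictions, relevant, k) for k in (1, 3)}
```

In this toy case the two users together hit each of the three ranks exactly once, so their DCG values add up to IDCG@3 and the mean NDCG@3 is 0.5.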
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | int \| list[int] | The number of top predictions to consider. | required |
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Computes the Normalized Discounted Cumulative Gain (NDCG). Considers the top-k predictions for one or multiple k values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the top k predictions for one or more users. | required |
| test | DataFrame | Contains the ground truth relevant items for one or more users. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| MetricResult | MetricResult | The computed NDCG scores for each value k. If multiple users are provided, the scores are averaged. |
omnirec.metrics.ranking.Recall(k: int | list[int])
Bases: RankingMetric
Calculates the average recall at k for one or multiple k values. Recall at k is defined as the proportion of relevant items that are found in the top-k recommendations.
It follows the formula:
\(Recall@k = \frac{1}{|U|} \sum_{u \in U} \frac{|\text{Rel}(u) \cap \text{Pred}_k(u)|}{\min(|\text{Rel}(u)|, k)}\)
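where \(\text{Pred}_k(u)\) is the set of top-k predicted items for user u and \(\text{Rel}(u)\) is the set of relevant items for user u. The denominator \(\min(|\text{Rel}(u)|, k)\) is the largest number of hits achievable in k slots, so a top-k list that contains as many relevant items as possible scores 1.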
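The same formula in plain Python, as a minimal sketch. The function name `recall_at_k` and the dict-based inputs are assumptions for illustration (the documented calculate method takes DataFrames), and every user in the ground truth is assumed to have at least one relevant item so the denominator is never zero.

```python
def recall_at_k(predictions, relevant, k):
    """Recall@k with the min(|Rel(u)|, k) denominator, averaged over users.

    predictions: user -> ranked list of item ids (best first)
    relevant:    user -> non-empty set of relevant item ids
    """
    total = 0.0
    for user, rel in relevant.items():
        top_k = set(predictions.get(user, [])[:k])
        # Hits divided by the maximum number of hits possible in k slots
        total += len(rel & top_k) / min(len(rel), k)
    return total / len(relevant)


# Toy data: u1 has three relevant items, u2 has one
predictions = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
relevant = {"u1": {"a", "c", "x"}, "u2": {"e"}}

recall = {k: recall_at_k(predictions, relevant, k) for k in (2, 3)}
```

At k = 2, u1 scores 1/2 (one hit out of min(3, 2) = 2) and u2 scores 1, giving 0.75. At k = 3, u1 scores 2/3 and u2 scores 1, giving 5/6.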
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | int \| list[int] | The number of top recommendations to consider. | required |
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Calculates the Recall metric. Considers the top-k predictions for one or multiple k values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the top k predictions for one or more users. | required |
| test | DataFrame | Contains the ground truth relevant items for one or more users. | required |

Returns:

| Type | Description |
|---|---|
| MetricResult | The computed Recall scores for each value k. If multiple users are provided, the scores are averaged. |
Prediction Metrics
omnirec.metrics.prediction.MAE()
Bases: PredictionMetric
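Mean Absolute Error (MAE) metric. Calculates the average of the absolute differences between predicted and actual ratings, according to the formula:
\(MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|\)
where \(y_i\) is the actual rating, \(\hat{y}_i\) is the predicted rating, and n is the number of rating pairs compared.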
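A worked example of the MAE formula in plain Python. The function name `mean_absolute_error` and the list inputs are illustrative assumptions; the documented calculate method operates on DataFrames.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


# Toy ratings: absolute errors are 0.5, 0, 1 and 1
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

mae = mean_absolute_error(actual, predicted)  # (0.5 + 0 + 1 + 1) / 4 = 0.625
```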
Source code in src\omnirec\metrics\prediction.py
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Calculates the MAE metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the predicted ratings. | required |
| test | DataFrame | Contains the ground truth ratings. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| MetricResult | MetricResult | Contains the name of the metric and the computed MAE value. |
omnirec.metrics.prediction.RMSE()
Bases: PredictionMetric
Root Mean Squared Error (RMSE) metric. Calculates the square root of the average of the squared differences between predicted and actual ratings, according to the formula:
\(RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}\)
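A worked example of the RMSE formula in plain Python. The function name `root_mean_squared_error` and the list inputs are illustrative assumptions; the documented calculate method operates on DataFrames.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared difference between paired ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy ratings: squared errors are 0.25, 0, 1 and 1
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

rmse = root_mean_squared_error(actual, predicted)  # sqrt(2.25 / 4) = 0.75
```

Because the errors are squared before averaging, the two errors of size 1 dominate the result, which is why RMSE (0.75) comes out higher than the mean absolute error of the same data (0.625).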
calculate(predictions: DataFrame, test: DataFrame) -> MetricResult
Calculates the RMSE metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | DataFrame | Contains the predicted ratings. | required |
| test | DataFrame | Contains the ground truth ratings. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| MetricResult | MetricResult | Contains the name of the metric and the computed RMSE value. |