Data Loaders
omnirec.data_loaders.base.Loader
Bases: ABC
info(name: str) -> DatasetInfo
abstractmethod
staticmethod
Provide metadata information about the dataset identified by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name under which the loader was registered. Different names may return different DatasetInfo implementations depending on the dataset. This is useful when multiple datasets share the same loading logic but have, for example, different download URLs or checksums. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| DatasetInfo | DatasetInfo | Metadata including download URLs and optional checksum for verification. |
Source code in src\omnirec\data_loaders\base.py
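As a rough illustration of how one loader can serve several registered names, the sketch below returns different metadata depending on `name`. The `DatasetInfo` and `Loader` classes here are local stand-ins that mirror the signatures documented on this page, so the snippet runs without omnirec installed. The class `ExampleLoader`, the dataset names, and the URLs are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DatasetInfo:
    """Local stand-in mirroring the documented DatasetInfo fields."""
    download_urls: Optional[Union[str, list[str]]] = None
    checksum: Optional[str] = None
    download_file_name: Optional[str] = None
    verify_tls: bool = True
    license_or_registration: bool = False


class Loader(ABC):
    """Local stand-in for omnirec.data_loaders.base.Loader (info only)."""

    @staticmethod
    @abstractmethod
    def info(name: str) -> DatasetInfo: ...


class ExampleLoader(Loader):
    # Hypothetical dataset names and URLs, for illustration only.
    _URLS = {
        "example-small": "https://example.org/data/small.zip",
        "example-large": "https://example.org/data/large.zip",
    }

    @staticmethod
    def info(name: str) -> DatasetInfo:
        # Same loading logic, different download location per registered name.
        return DatasetInfo(download_urls=ExampleLoader._URLS[name])


small = ExampleLoader.info("example-small")
large = ExampleLoader.info("example-large")
```

Keeping the per-name differences in a lookup table means a new dataset variant only needs a new entry, not a new loader class.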
load(source_dir: Path, name: str) -> pd.DataFrame
abstractmethod
staticmethod
Load the dataset from the given directory into a pd.DataFrame.
The DataFrame should have the standard columns:
- user
- item
- rating
- timestamp
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source_dir | Path | The directory that contains the downloaded dataset files. | required |
| name | str | The name under which the loader was registered. This allows selecting between different datasets that share the same loading logic but differ in structure or file naming. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Loaded dataset as a pd.DataFrame with the expected columns. |
Source code in src\omnirec\data_loaders\base.py
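A minimal sketch of what a `load` body might look like, assuming a hypothetical on-disk layout of one headerless CSV file per dataset, named after the registered name. The file layout and the dataset name are assumptions for illustration. In a real loader this function would be a static method on a `Loader` subclass.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load(source_dir: Path, name: str) -> pd.DataFrame:
    """Sketch of a load() body returning the four standard columns."""
    # Hypothetical layout: <source_dir>/<name>.csv without a header row.
    return pd.read_csv(
        source_dir / f"{name}.csv",
        header=None,
        names=["user", "item", "rating", "timestamp"],
    )


# Build a tiny fake dataset so the sketch can be run end to end.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp)
    (source_dir / "example-small.csv").write_text(
        "1,10,4.0,1000\n2,20,3.5,1001\n"
    )
    frame = load(source_dir, "example-small")
```

Because `name` is passed in, the same function could branch on it to handle variants that use different file names or separators.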
omnirec.data_loaders.base.DatasetInfo(download_urls: Optional[str | list[str]] = None, checksum: Optional[str] = None, download_file_name: Optional[str] = None, verify_tls: bool = True, license_or_registration: bool = False)
dataclass
Metadata about a dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
| download_urls | Optional[Union[str, List[str]]] | URL or list of URLs to download the dataset. If a list is provided, URLs are tried in order until one succeeds (skipping on checksum mismatch or HTTP errors). |
| checksum | Optional[str] | Optional SHA256 checksum to verify the downloaded file's integrity. If provided, the downloaded file will be hashed using SHA256 and compared to this value. |
| download_file_name | Optional[str] | Optional custom file name to use when saving the downloaded dataset. If not provided, the name will be inferred from the URL. |
| verify_tls | bool | Whether to verify TLS/SSL certificates when downloading. Default is True. |
| license_or_registration | bool | Indicates if the dataset requires a license agreement or registration to access. Default is False. |
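The checksum and URL-fallback behavior described above can be sketched as follows. This is not omnirec's download code: `sha256_matches`, `download_first_valid`, and `fake_fetch` are hypothetical helpers, the mirror URLs are made up, and a raised `OSError` stands in for an HTTP error so that no network access is needed.

```python
import hashlib
from typing import Callable, Optional, Union


def sha256_matches(data: bytes, expected: Optional[str]) -> bool:
    """Return True if no checksum is given or the SHA256 hex digest matches."""
    if expected is None:
        return True
    return hashlib.sha256(data).hexdigest() == expected.lower()


def download_first_valid(
    urls: Union[str, list[str]],
    checksum: Optional[str],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Try each URL in order; skip on fetch errors or checksum mismatch."""
    if isinstance(urls, str):
        urls = [urls]
    for url in urls:
        try:
            data = fetch(url)
        except OSError:
            continue  # stands in for an HTTP error
        if sha256_matches(data, checksum):
            return data
    raise RuntimeError("no URL produced a file matching the checksum")


# Hypothetical in-memory "server": the first mirror fails, the second
# serves corrupted bytes, the third serves the expected payload.
payload = b"user,item,rating,timestamp\n"
expected = hashlib.sha256(payload).hexdigest()


def fake_fetch(url: str) -> bytes:
    if url == "https://mirror-a.example.org/data.csv":
        raise OSError("simulated HTTP error")
    if url == "https://mirror-b.example.org/data.csv":
        return b"corrupted"
    return payload


result = download_first_valid(
    [
        "https://mirror-a.example.org/data.csv",
        "https://mirror-b.example.org/data.csv",
        "https://mirror-c.example.org/data.csv",
    ],
    expected,
    fake_fetch,
)
```

Passing the fetch function as a parameter keeps the fallback logic testable without real downloads.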
omnirec.data_loaders.registry.register_dataloader(names: str | list[str], cls: type[Loader])
Register a data loader class under one or multiple names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| names | str \| list[str] | Name(s) to register the loader under. | required |
| cls | type[Loader] | Loader class to register. Must inherit from the common Loader base class. | required |
Source code in src\omnirec\data_loaders\registry.py
omnirec.data_loaders.registry.list_datasets() -> list[str]
List all registered dataset names.
Returns:
| Type | Description |
|---|---|
| list[str] | A list of all registered dataset names. |