Data Loaders

omnirec.data_loaders.base.Loader

Bases: ABC

info(name: str) -> DatasetInfo abstractmethod staticmethod

Provide metadata information about the dataset identified by name.

Parameters:

Name Type Description Default
name str

The name under which the loader was registered. Different names may return different DatasetInfo implementations depending on the dataset. This is useful when multiple datasets share the same loading logic but have, for example, different download URLs or checksums.

required

Returns:

Name Type Description
DatasetInfo DatasetInfo

Metadata including download URLs and optional checksum for verification.
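
As a rough illustration of name-dependent metadata, the sketch below shows one loader class returning a different DatasetInfo per registered name. The DatasetInfo and Loader classes here are local stand-ins that mirror the documented signatures so the example runs without omnirec installed. ExampleRatingsLoader, the dataset names, and the URLs are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Union


# Stand-in mirroring the documented DatasetInfo signature; real code would
# import it from omnirec.data_loaders.base instead.
@dataclass
class DatasetInfo:
    download_urls: Optional[Union[str, List[str]]] = None
    checksum: Optional[str] = None
    download_file_name: Optional[str] = None
    verify_tls: bool = True
    license_or_registration: bool = False


# Stand-in for the Loader base class; only info() is reproduced here.
class Loader(ABC):
    @staticmethod
    @abstractmethod
    def info(name: str) -> DatasetInfo: ...


class ExampleRatingsLoader(Loader):
    """Hypothetical loader registered under two names."""

    # Hypothetical URLs; example.org is a reserved documentation domain.
    _URLS = {
        "example-small": "https://example.org/data/example-small.zip",
        "example-large": "https://example.org/data/example-large.zip",
    }

    @staticmethod
    def info(name: str) -> DatasetInfo:
        # Same loading logic, different download location per registered name.
        return DatasetInfo(download_urls=ExampleRatingsLoader._URLS[name])


small = ExampleRatingsLoader.info("example-small")
large = ExampleRatingsLoader.info("example-large")
print(small.download_urls)
print(large.download_urls)
```

Because info is a static method, it can be called on the class itself without creating a loader instance.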

Source code in src\omnirec\data_loaders\base.py
@staticmethod
@abstractmethod
def info(name: str) -> DatasetInfo:
    """Provide metadata information about the dataset identified by `name`.

    Args:
        name (str): The name under which the loader was registered. Different names may return different DatasetInfo
            implementations depending on the dataset. This is useful when multiple
            datasets share the same loading logic but have, for example, different
            download URLs or checksums.

    Returns:
        DatasetInfo: Metadata including download URLs and optional checksum for verification.
    """

load(source_dir: Path, name: str) -> pd.DataFrame abstractmethod staticmethod

Loads dataset from the given directory into a pd.DataFrame. The DataFrame should have the standard columns:

- user
- item
- rating
- timestamp

Parameters:

Name Type Description Default
source_dir Path

The directory that contains the downloaded dataset files.

required
name str

The name under which the loader was registered. This allows selecting between different datasets that share the same loading logic but differ in structure or file naming.

required

Returns:

Type Description
DataFrame

The loaded dataset as a pd.DataFrame with the standard columns user, item, rating, and timestamp.
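
A load implementation might want to check its result against the standard columns before returning it. The helper below is a minimal sketch of such a check. The name missing_standard_columns is hypothetical and not part of omnirec, and plain lists stand in for DataFrame column labels so the example has no third-party dependencies.

```python
from typing import Iterable, List

# Column names taken from the load() documentation above.
STANDARD_COLUMNS = ("user", "item", "rating", "timestamp")


def missing_standard_columns(columns: Iterable[str]) -> List[str]:
    """Return the standard columns absent from `columns`, in documented order."""
    present = set(columns)
    return [c for c in STANDARD_COLUMNS if c not in present]


# Plain lists stand in for DataFrame column labels here.
print(missing_standard_columns(["user", "item", "rating", "timestamp"]))  # prints []
print(missing_standard_columns(["userId", "item", "rating"]))  # prints ['user', 'timestamp']
```

With a DataFrame df, the call would be missing_standard_columns(df.columns), since the column index is iterable.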

Source code in src\omnirec\data_loaders\base.py
@staticmethod
@abstractmethod
def load(source_dir: Path, name: str) -> pd.DataFrame:
    """
    Loads dataset from the given directory into a `pd.DataFrame`.
    The DataFrame should have the standard columns:
    - user
    - item
    - rating
    - timestamp

    Args:
        source_dir (Path): The directory that contains the downloaded dataset files.
        name (str): The name under which the loader was registered. This allows selecting between different
            datasets that share the same loading logic but differ in structure or
            file naming.

    Returns:
        pd.DataFrame: Loaded dataset as a pd.DataFrame with expected columns.
    """

omnirec.data_loaders.base.DatasetInfo(download_urls: Optional[str | list[str]] = None, checksum: Optional[str] = None, download_file_name: Optional[str] = None, verify_tls: bool = True, license_or_registration: bool = False) dataclass

Metadata about a dataset.

Attributes:

Name Type Description
download_urls Optional[Union[str, List[str]]]

URL or list of URLs to download the dataset from. If a list is provided, the URLs are tried in order until one succeeds; a URL that returns an HTTP error or fails the checksum is skipped.

checksum Optional[str]

Optional SHA256 checksum to verify the downloaded file's integrity. If provided, the downloaded file will be hashed using SHA256 and compared to this value.

download_file_name Optional[str]

Optional custom file name to use when saving the downloaded dataset. If not provided, the name will be inferred from the URL.

verify_tls bool

Whether to verify TLS/SSL certificates when downloading. Default is True.

license_or_registration bool

Indicates if the dataset requires a license agreement or registration to access. Default is False.
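
The attribute descriptions above imply a download routine roughly like the sketch below: normalize download_urls to a list, try each URL in order, and skip any that fails to fetch or does not match the SHA256 checksum. This is not omnirec's actual downloader. All function names are hypothetical, the file name rule (last path segment of the URL) is only one plausible way to infer a name, and a fake fetcher replaces real HTTP so the example needs no network.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Callable, List, Optional, Tuple, Union
from urllib.parse import urlparse


def as_url_list(download_urls: Optional[Union[str, List[str]]]) -> List[str]:
    """Normalize the documented str / list / None forms to a list."""
    if download_urls is None:
        return []
    if isinstance(download_urls, str):
        return [download_urls]
    return list(download_urls)


def infer_file_name(url: str) -> str:
    """Assumed rule: use the last path segment of the URL."""
    return PurePosixPath(urlparse(url).path).name


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_first_valid(
    download_urls: Optional[Union[str, List[str]]],
    fetch: Callable[[str], bytes],
    checksum: Optional[str] = None,
) -> Tuple[str, bytes]:
    """Try URLs in order; skip a URL on a fetch error or checksum mismatch."""
    for url in as_url_list(download_urls):
        try:
            data = fetch(url)
        except OSError:
            continue  # covers urllib.error.HTTPError, an OSError subclass
        if checksum is not None and sha256_hex(data) != checksum.lower():
            continue
        return url, data
    raise RuntimeError("no URL produced a valid download")


# Fake fetcher standing in for a real HTTP client (hypothetical URLs).
PAYLOADS = {"https://example.org/mirror-b/ratings.zip": b"ratings"}


def fake_fetch(url: str) -> bytes:
    if url not in PAYLOADS:
        raise OSError(f"cannot fetch {url}")
    return PAYLOADS[url]


urls = [
    "https://example.org/mirror-a/ratings.zip",  # fails, so it is skipped
    "https://example.org/mirror-b/ratings.zip",
]
url, data = fetch_first_valid(urls, fake_fetch, checksum=sha256_hex(b"ratings"))
print(url, infer_file_name(url))
```

For large files, a real implementation would more likely hash the saved file in chunks than hold the whole payload in memory.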

omnirec.data_loaders.registry.register_dataloader(names: str | list[str], cls: type[Loader])

Register a data loader class under one or multiple names.

Parameters:

Name Type Description Default
names str | list[str]

Name(s) to register the loader under.

required
cls type[Loader]

Loader class to register. Must inherit from the common Loader base class.

required

Source code in src\omnirec\data_loaders\registry.py
def register_dataloader(names: str | list[str], cls: type[Loader]):
    """Register a data loader class under one or multiple names.

    Args:
        names (str | list[str]): Name(s) to register the loader under.
        cls (type[Loader]): Loader class to register. Must inherit from the common `Loader` base class.
    """
    if type(names) is list:
        for n in names:
            _add_loader(n, cls)
    elif type(names) is str:
        _add_loader(names, cls)

omnirec.data_loaders.registry.list_datasets() -> list[str]

List all registered dataset names.

Returns:

Type Description
list[str]

A list of all registered dataset names.

Source code in src\omnirec\data_loaders\registry.py
def list_datasets() -> list[str]:
    """List all registered dataset names.

    Returns:
        list[str]: A list of all registered dataset names.
    """
    return list(_DATA_LOADERS.keys())