Dataset Module#

Module with all classes and methods to manage an MLOps dataset.

MLOpsDataset#

class mlops_codex.dataset.MLOpsDataset(*, login: str, password: str, url: str, dataset_hash: str, dataset_name: str, group: str, origin: str)[source]#

Bases: BaseModel

Dataset class representing an MLOps dataset.

Parameters:
  • login (str) – Login for authenticating with the client.

  • password (str) – Password for authenticating with the client.

  • url (str) – URL of the MLOps server. The default value is https://neomaril.datarisk.net; use it to test your deployment before switching to production. You can also set this via the MLOPS_URL environment variable.

  • dataset_hash (str) – Dataset hash to download.

  • dataset_name (str) – Name of the dataset.

  • group (str) – Name of the group in which to search for the dataset.

  • origin (str) – Origin of the dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”
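Example

A minimal construction sketch; the credentials, hash and names below are placeholders, and in practice MLOpsDataset instances are usually obtained through MLOpsDatasetClient rather than built by hand:

>>> from mlops_codex.dataset import MLOpsDataset
>>> dataset = MLOpsDataset(
...     login='user@example.com',
...     password='<password>',
...     url='https://neomaril.datarisk.net',
...     dataset_hash='<dataset_hash>',
...     dataset_name='my_dataset',
...     group='my_group',
...     origin='Training',
... )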

dataset_hash: str#
dataset_name: str#
download(*, path: str = './', filename: str = 'dataset') → None[source]#

Download a dataset from MLOps. The dataset will be a CSV or Parquet file.

Parameters:
  • path (str, optional) – Path to the downloaded dataset. Defaults to ‘./’.

  • filename (str, optional) – Name of the downloaded dataset file. Defaults to ‘dataset’; the file is saved as ‘dataset.csv’ or ‘dataset.parquet’ depending on its format.
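Example

A usage sketch (the path and filename are placeholders):

>>> dataset.download(path='./data', filename='my_dataset')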

group: str#
host_preprocessing(*, name: str, group: str, script_path: str, entrypoint_function_name: str, requirements_path: str, python_version: str | None = '3.9')[source]#

Host a preprocessing script via the dataset module. By default, this call hosts the script and waits for the hosting to finish. It returns an MLOpsPreprocessingAsyncV2 instance, which you can then run.

Parameters:
  • name (str) – Name of the new preprocessing script

  • group (str) – Group of the new preprocessing script

  • script_path (str) – Path to the python script

  • entrypoint_function_name (str) – Name of the entrypoint function in the python script

  • requirements_path (str) – Path to the requirements file

  • python_version (str) – Python version for the model environment. Available versions are 3.8, 3.9, 3.10. Defaults to ‘3.9’

Returns:

Async preprocessing object for the new preprocessing script.

Return type:

MLOpsPreprocessingAsyncV2
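Example

A hosting sketch; every name and path below is a placeholder:

>>> preprocessing = dataset.host_preprocessing(
...     name='my_preprocessing',
...     group='my_group',
...     script_path='./preprocess.py',
...     entrypoint_function_name='process',
...     requirements_path='./requirements.txt',
...     python_version='3.9',
... )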

login: str#
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

origin: str#
password: str#
run_preprocess(*, preprocessing_script_hash: str, execution_id: int)[source]#

Run a preprocessing script execution on a dataset. By default, this call runs the preprocessing script and waits until it completes.

Parameters:
  • preprocessing_script_hash (str) – Hash of the preprocessing script

  • execution_id (int) – Preprocessing Execution ID
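Example

A usage sketch (the script hash and execution ID are placeholders):

>>> dataset.run_preprocess(
...     preprocessing_script_hash='<preprocessing_script_hash>',
...     execution_id=1,
... )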

url: str#

MLOpsDatasetClient#

class mlops_codex.dataset.MLOpsDatasetClient(login: str, password: str, url: str)[source]#

Bases: BaseMLOpsClient

Class for performing actions on a dataset.

Parameters:
  • login (str) – Login for authenticating with the client. You can also set this via the MLOPS_USER environment variable.

  • password (str) – Password for authenticating with the client. You can also set this via the MLOPS_PASSWORD environment variable.

  • url (str) – URL of the MLOps server. The default value is https://neomaril.datarisk.net; use it to test your deployment before switching to production. You can also set this via the MLOPS_URL environment variable.
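Example

An instantiation sketch (the credentials are placeholders; the environment variables mentioned above may be used instead):

>>> from mlops_codex.dataset import MLOpsDatasetClient
>>> client = MLOpsDatasetClient(
...     login='user@example.com',
...     password='<password>',
...     url='https://neomaril.datarisk.net',
... )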

delete(group: str, dataset_hash: str) → None[source]#

Delete the dataset on MLOps. Use this action with care: it is irreversible!

Parameters:
  • group (str) – Group of the dataset to delete.

  • dataset_hash (str) – Dataset hash to delete.

Example

>>> client.delete(group='<group>', dataset_hash='<dataset_hash>')
download(group: str, dataset_hash: str, path: str | None = './', filename: str | None = 'dataset') → None[source]#

Download a dataset from MLOps. The dataset will be a CSV or Parquet file.

Parameters:
  • group (str) – Name of the group

  • dataset_hash (str) – Dataset hash

  • path (str, optional) – Path to the downloaded dataset. Defaults to ‘./’.

  • filename (str, optional) – Name of the downloaded dataset file. Defaults to ‘dataset’.

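Example

A usage sketch (group, hash, path and filename are placeholders):

>>> client.download(
...     group='my_group',
...     dataset_hash='<dataset_hash>',
...     path='./data',
...     filename='my_dataset',
... )
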
list_datasets(*, origin: str | None = None, origin_id: int | None = None, datasource_name: str | None = None, group: str | None = None) → List[source]#

List datasets, optionally filtering by origin, origin ID, datasource name or group.

Parameters:
  • origin (Optional[str]) – Origin of a dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”

  • origin_id (Optional[int]) – Integer that represents the ID of a dataset, given an origin

  • datasource_name (Optional[str]) – Name of the datasource

  • group (Optional[str]) – Name of the group in which to search for the dataset

Returns:

A list with information about the datasets.

Return type:

list

Example

>>> client.list_datasets()
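You can also filter the results; the values below are placeholders:

>>> client.list_datasets(origin='Training', group='my_group')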