Dataset Module#

Module with the classes and methods used to manage an MLOps dataset.

MLOpsDataset#

class mlops_codex.dataset.MLOpsDataset(login: str, password: str, base_url: str, hash: str, dataset_name: str, group: str)[source]#

Bases: object

Class that represents an MLOps dataset.

Parameters:
  • login (str) – Login for authenticating with the client.

  • password (str) – Password for authenticating with the client.

  • base_url (str) – URL of the MLOps server. The default value is https://neomaril.datarisk.net; test your deployment against it before switching to production. You can also set this with the MLOPS_URL environment variable.

  • hash (str) – Dataset hash to download.

  • dataset_name (str) – Name of the dataset.

  • group (str) – Name of the group in which to search for the dataset.

  • origin (str) – Origin of the dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”.

base_url: str#
dataset_name: str#
download(path: str | None = './', filename: str | None = 'dataset') None[source]#

Download a dataset from MLOps. The dataset is saved as a CSV or Parquet file.

Parameters:
  • path (str, optional) – Directory where the downloaded dataset is saved. Defaults to ‘./’.

  • filename (str, optional) – Name of the downloaded file. Defaults to ‘dataset’.

Raises:
group: str#
hash: str#
login: str#
password: str#
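
The constructor and download() documented above can be combined as in the following minimal sketch. The wrapper name `download_to` and all credential, hash and name values are hypothetical placeholders. Whether download() creates missing directories is not stated here, so the sketch creates the target directory first as a precaution.

```python
from pathlib import Path


def download_to(dataset, directory="./", filename="dataset"):
    """Hypothetical wrapper around MLOpsDataset.download.

    `dataset` is any object exposing download(path=..., filename=...).
    """
    target_dir = Path(directory)
    # Assumption: download() may not create missing directories, so do it here.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep a trailing slash to mirror the documented default path './'.
    dataset.download(path=f"{target_dir.as_posix()}/", filename=filename)
    return target_dir


def example():
    """Not called here: needs mlops_codex installed and a reachable server."""
    from mlops_codex.dataset import MLOpsDataset

    dataset = MLOpsDataset(
        login="user@example.com",  # placeholder
        password="secret",  # placeholder
        base_url="https://neomaril.datarisk.net",
        hash="<dataset_hash>",  # placeholder
        dataset_name="my_dataset",  # placeholder
        group="my_group",  # placeholder
    )
    return download_to(dataset, "./data", "my_dataset")
```

Passing the dataset object into the wrapper, rather than building it inside, keeps the directory handling testable without a server connection.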

MLOpsDatasetClient#

class mlops_codex.dataset.MLOpsDatasetClient(login: str | None = None, password: str | None = None, url: str | None = None)[source]#

Bases: BaseMLOpsClient

Client class for performing operations on datasets.

Parameters:
  • login (str) – Login for authenticating with the client. You can also set this with the MLOPS_USER environment variable.

  • password (str) – Password for authenticating with the client. You can also set this with the MLOPS_PASSWORD environment variable.

  • url (str) – URL of the MLOps server. The default value is https://neomaril.datarisk.net; test your deployment against it before switching to production. You can also set this with the MLOPS_URL environment variable.
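
The three arguments and their environment-variable fallbacks can also be resolved in user code before the client is created. The following is a minimal sketch of that idea, not the library's own resolution logic. The helper names `resolve_client_settings` and `build_client` are hypothetical. Only the variable names MLOPS_USER, MLOPS_PASSWORD and MLOPS_URL and the default URL come from the parameter list above.

```python
import os

DEFAULT_URL = "https://neomaril.datarisk.net"  # default documented above


def resolve_client_settings(login=None, password=None, url=None, env=None):
    """Hypothetical helper: explicit arguments win, then environment variables."""
    env = os.environ if env is None else env
    settings = {
        "login": login or env.get("MLOPS_USER"),
        "password": password or env.get("MLOPS_PASSWORD"),
        "url": url or env.get("MLOPS_URL") or DEFAULT_URL,
    }
    # Fail early with a clear message instead of sending empty credentials.
    missing = [key for key in ("login", "password") if not settings[key]]
    if missing:
        raise ValueError(f"Missing credentials: {', '.join(missing)}")
    return settings


def build_client(**overrides):
    """Not called here: needs mlops_codex installed and a reachable server."""
    from mlops_codex.dataset import MLOpsDatasetClient

    return MLOpsDatasetClient(**resolve_client_settings(**overrides))
```

Resolving the values explicitly makes it easy to see which server a script will talk to before any request is sent.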

delete(group: str, dataset_hash: str) None[source]#

Delete the dataset on MLOps. Use this with care: the action is irreversible!

Parameters:
  • group (str) – Group that contains the dataset to delete.

  • dataset_hash (str) – Dataset hash to delete.
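
Because deletion cannot be undone, calling code may want an explicit confirmation step in front of delete(). The following is a minimal sketch of such a guard. The function name `delete_dataset` and its `confirm` flag are hypothetical and not part of the library. `client` is assumed to be any object exposing delete(group=..., dataset_hash=...).

```python
def delete_dataset(client, group, dataset_hash, confirm=False):
    """Hypothetical guard around MLOpsDatasetClient.delete."""
    if not confirm:
        # Refuse by default so an accidental call cannot remove data.
        raise ValueError(
            f"Refusing to delete dataset {dataset_hash!r} in group {group!r}: "
            "pass confirm=True to proceed (deletion is irreversible)."
        )
    client.delete(group=group, dataset_hash=dataset_hash)
```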

Example

>>> dataset.delete(group="my_group", dataset_hash="<dataset_hash>")

list_datasets(*, origin: str | None = None, origin_id: int | None = None, datasource_name: str | None = None, group: str | None = None) None[source]#

List datasets from datasources.

Parameters:
  • origin (Optional[str]) – Origin of a dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”.

  • origin_id (Optional[int]) – Integer ID of the dataset within the given origin.

  • datasource_name (Optional[str]) – Name of the datasource.

  • group (Optional[str]) – Name of the group in which to search for the dataset.
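
All four filters are keyword-only, and origin accepts only the four values listed above. The following is a minimal sketch that validates origin and forwards only the filters that were actually set. The helper name `list_datasets_checked` is hypothetical. `client` is assumed to be any object exposing list_datasets with the keyword arguments documented above.

```python
VALID_ORIGINS = ("Training", "Preprocessing", "Datasource", "Model")


def list_datasets_checked(client, *, origin=None, origin_id=None,
                          datasource_name=None, group=None):
    """Hypothetical wrapper: validate `origin`, then call list_datasets."""
    if origin is not None and origin not in VALID_ORIGINS:
        raise ValueError(
            f"Unknown origin {origin!r}; expected one of {', '.join(VALID_ORIGINS)}"
        )
    candidates = {
        "origin": origin,
        "origin_id": origin_id,
        "datasource_name": datasource_name,
        "group": group,
    }
    # Forward only the filters that were set, always as keyword arguments.
    filters = {key: value for key, value in candidates.items() if value is not None}
    client.list_datasets(**filters)
    return filters
```

Checking origin locally surfaces a typo such as "training" before any request is made.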

Example

>>> dataset.list_datasets()
load_dataset(dataset_hash: str)[source]#