Dataset Module
Module with all classes and methods for managing an MLOps dataset.
MLOpsDataset
- class mlops_codex.dataset.MLOpsDataset(login: str, password: str, base_url: str, hash: str, dataset_name: str, group: str)
Bases: object
Class that represents a dataset stored in MLOps.
- Parameters:
login (str) – Login for authenticating with the client.
password (str) – Password for authenticating with the client.
base_url (str) – URL of the MLOps server. The default value is https://neomaril.datarisk.net; test your deployment against it before switching to production. Can also be set with the MLOPS_URL environment variable.
hash (str) – Hash of the dataset to download.
dataset_name (str) – Name of the dataset.
group (str) – Name of the group in which to search for the dataset.
origin (str) – Origin of the dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”.
- base_url: str
- dataset_name: str
- download(path: str | None = './', filename: str | None = 'dataset') -> None
Download the dataset from MLOps. The downloaded file is in CSV or Parquet format.
- Parameters:
path (str, optional) – Directory in which to save the downloaded dataset. Defaults to ‘./’.
filename (str, optional) – Name of the downloaded file. Defaults to ‘dataset’.
- Raises:
AuthenticationError – Raised if there is an authentication issue.
DatasetNotFoundError – Raised if there is no dataset with the given name.
ServerError – Raised if the server encounters an issue.
- group: str
- hash: str
- login: str
- password: str
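The download call described above can be wrapped so that the target directory exists before the file is written. The following is a minimal sketch, not part of mlops_codex: `download_into` and `FakeDataset` are hypothetical names, and the helper relies only on the `download(path=..., filename=...)` signature documented for MLOpsDataset. The fake object stands in for a real dataset so the sketch runs without a server.

```python
import tempfile
from pathlib import Path


def download_into(dataset, directory, filename="dataset"):
    """Create `directory` if needed, then ask `dataset` to download into it.

    `dataset` is any object exposing download(path=..., filename=...),
    which is the signature documented for MLOpsDataset.
    """
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    # Assumption: the trailing separator mirrors the documented default
    # path './'; the documentation does not say whether it is required.
    dataset.download(path=f"{target.as_posix()}/", filename=filename)
    return target


class FakeDataset:
    """Hypothetical stand-in that records calls instead of downloading."""

    def __init__(self):
        self.calls = []

    def download(self, path="./", filename="dataset"):
        self.calls.append((path, filename))


# Exercise the helper offline against the stand-in object.
with tempfile.TemporaryDirectory() as tmp:
    fake = FakeDataset()
    download_into(fake, Path(tmp) / "data" / "raw", filename="train")
    recorded_call = fake.calls[0]
```

With a real MLOpsDataset instance in place of `FakeDataset`, the same helper would forward the call to the server; the exceptions listed under Raises would then propagate to the caller unchanged.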
MLOpsDatasetClient
- class mlops_codex.dataset.MLOpsDatasetClient(login: str | None = None, password: str | None = None, url: str | None = None)
Bases: BaseMLOpsClient
Client class for performing operations on datasets.
- Parameters:
login (str) – Login for authenticating with the client. Can also be set with the MLOPS_USER environment variable.
password (str) – Password for authenticating with the client. Can also be set with the MLOPS_PASSWORD environment variable.
url (str) – URL of the MLOps server. The default value is https://neomaril.datarisk.net; test your deployment against it before switching to production. Can also be set with the MLOPS_URL environment variable.
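The interplay between explicit arguments, environment variables and the default URL can be illustrated with a small sketch. It is not the library's implementation: `resolve_client_settings` is a hypothetical helper, and the precedence it applies (explicit argument first, then environment variable, then the documented default URL) is an assumption, since the documentation only states that the variables can be used.

```python
import os

# Default URL quoted in the parameter description above.
DEFAULT_URL = "https://neomaril.datarisk.net"


def resolve_client_settings(login=None, password=None, url=None, env=None):
    """Return the login, password and url a client might end up using.

    Assumption: explicit arguments win over environment variables, and the
    documented default URL is used when neither is given.
    """
    env = os.environ if env is None else env
    settings = {
        "login": login or env.get("MLOPS_USER"),
        "password": password or env.get("MLOPS_PASSWORD"),
        "url": url or env.get("MLOPS_URL") or DEFAULT_URL,
    }
    missing = [key for key in ("login", "password") if not settings[key]]
    if missing:
        raise ValueError(f"missing credentials: {', '.join(missing)}")
    return settings


# A plain dict stands in for os.environ so the example has no side effects.
example = resolve_client_settings(
    env={"MLOPS_USER": "user@example.com", "MLOPS_PASSWORD": "not-a-real-password"}
)
```

Because the dictionary keys match the constructor parameters shown above, the result could be passed on as `MLOpsDatasetClient(**example)`.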
- delete(group: str, dataset_hash: str) -> None
Delete a dataset from MLOps. Use this with care: the action is irreversible.
- Parameters:
group (str) – Name of the group that contains the dataset.
dataset_hash (str) – Hash of the dataset to delete.
Example
>>> dataset.delete(group="my_group", dataset_hash="my_dataset_hash")
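Since deletion cannot be undone, callers may want a guard in front of it. The sketch below is one possible pattern, assuming only the `delete(group, dataset_hash)` signature documented above; `delete_if_confirmed` and `RecordingClient` are hypothetical names, and the stand-in client records calls instead of contacting a server.

```python
def delete_if_confirmed(client, group, dataset_hash, confirmation):
    """Call client.delete only when `confirmation` repeats the dataset hash.

    Returns True if the delete call was made, False otherwise.
    """
    if confirmation != dataset_hash:
        return False
    client.delete(group=group, dataset_hash=dataset_hash)
    return True


class RecordingClient:
    """Hypothetical stand-in that records delete calls instead of sending them."""

    def __init__(self):
        self.deleted = []

    def delete(self, group, dataset_hash):
        self.deleted.append((group, dataset_hash))


client = RecordingClient()
# No confirmation given: nothing is deleted.
skipped = delete_if_confirmed(client, "my_group", "my_dataset_hash", confirmation="")
# Confirmation repeats the hash: the delete call goes through.
done = delete_if_confirmed(
    client, "my_group", "my_dataset_hash", confirmation="my_dataset_hash"
)
```

Requiring the caller to repeat the hash is a design choice of this sketch, not a feature of the client itself.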
- list_datasets(*, origin: str | None = None, origin_id: int | None = None, datasource_name: str | None = None, group: str | None = None) -> None
List datasets from datasources.
- Parameters:
origin (Optional[str]) – Origin of a dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”.
origin_id (Optional[int]) – Integer ID of a dataset, given an origin.
datasource_name (Optional[str]) – Name of the datasource.
group (Optional[str]) – Name of the group in which to search for the dataset.
Example
>>> dataset.list_datasets()
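All filters of list_datasets are keyword-only and optional, so it can help to assemble them in one place before making the call. The following sketch assumes nothing beyond the parameter names and origin values listed above; `build_list_filters` is a hypothetical helper, and its strict, case-sensitive check on `origin` is a choice made here, since the documentation does not say how the server treats other spellings.

```python
VALID_ORIGINS = ("Training", "Preprocessing", "Datasource", "Model")


def build_list_filters(*, origin=None, origin_id=None, datasource_name=None, group=None):
    """Validate the filters and return only those that were actually set."""
    if origin is not None and origin not in VALID_ORIGINS:
        raise ValueError(f"origin must be one of {VALID_ORIGINS}, got {origin!r}")
    if origin_id is not None and not isinstance(origin_id, int):
        raise TypeError("origin_id must be an integer")
    candidates = {
        "origin": origin,
        "origin_id": origin_id,
        "datasource_name": datasource_name,
        "group": group,
    }
    # Drop unset filters so the call relies on the method's own defaults.
    return {name: value for name, value in candidates.items() if value is not None}


filters = build_list_filters(origin="Training", group="my_group")
```

The resulting dictionary could then be unpacked into the documented call, for example `dataset.list_datasets(**filters)`.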