DataSource module
Module containing all classes and methods for managing data sources and datasets.
MLOpsDataSourceClient
- class mlops_codex.datasources.MLOpsDataSourceClient(*, login: str | None = None, password: str | None = None, url: str | None = None)
Bases: BaseMLOpsClient
Client class for managing datasources.
- Parameters:
login (str) – Login for authenticating with the client. You can also set this with the MLOPS_USER environment variable.
password (str) – Password for authenticating with the client. You can also set this with the MLOPS_PASSWORD environment variable.
url (str) – URL of the MLOps server. The default value is https://neomaril.datarisk.net/; use it to test your deployment before moving to production. You can also set this with the MLOPS_URL environment variable.
- Raises:
ServerError – The database produced an unexpected error.
AuthenticationError – If the user is not in the master group.
CredentialError – If the cloud credential is invalid.
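The environment-variable fallback described in the parameters above can be sketched as follows. This is only an illustration: `resolve_client_settings` is a hypothetical helper, not part of mlops_codex, and the rule that an explicit argument wins over the environment variable is an assumption.

```python
import os

# Default URL as documented for the `url` parameter.
DEFAULT_URL = "https://neomaril.datarisk.net/"


def resolve_client_settings(login=None, password=None, url=None):
    """Hypothetical helper (not part of mlops_codex).

    Returns the keyword arguments that would be passed to the client,
    falling back to MLOPS_USER, MLOPS_PASSWORD and MLOPS_URL.
    Assumption: an explicit argument takes precedence over the environment.
    """
    return {
        "login": login if login is not None else os.environ.get("MLOPS_USER"),
        "password": password if password is not None else os.environ.get("MLOPS_PASSWORD"),
        "url": url if url is not None else os.environ.get("MLOPS_URL", DEFAULT_URL),
    }
```

The resulting dict matches the keyword-only constructor signature, so it could be unpacked as `MLOpsDataSourceClient(**resolve_client_settings())`.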
- credentials_to_json(input_data: dict) -> str
Save a dictionary as a JSON file.
- Parameters:
input_data (dict) – A dictionary to save.
- Returns:
Path to the credentials file.
- Return type:
str
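To illustrate what "save a dict as JSON and return the file path" might look like, here is a minimal standalone sketch. `save_credentials` is a hypothetical stand-in, not the library's implementation; the file name and the use of a temporary directory are assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def save_credentials(input_data: dict, directory: Optional[str] = None) -> str:
    """Hypothetical stand-in for the behaviour described above.

    Writes `input_data` to a JSON file and returns the file path.
    Assumption: the file is named credentials.json and goes into a fresh
    temporary directory unless `directory` is given.
    """
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / "credentials.json"
    with path.open("w", encoding="utf-8") as fh:
        json.dump(input_data, fh)
    return str(path)
```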
- get_datasource(*, datasource_name: str, provider: str, group: str)
Get an MLOpsDataSource object for performing datasource operations.
- Parameters:
datasource_name (str) – Name given previously to the datasource.
provider (str) – It can be “Azure”, “AWS” or “GCP”.
group (str) – Name of the group in which to search for the datasource.
- Returns:
An MLOpsDataSource object.
- Return type:
MLOpsDataSource
Example
>>> client.get_datasource(datasource_name='MyDataSourceName', provider='GCP', group='my_group')
- list_datasources(*, provider: str, group: str)
List all datasources in the group for the given provider.
- Parameters:
group (str) – Name of the group in which to search for datasources.
provider (str) – It can be “Azure”, “AWS” or “GCP”.
- Raises:
AuthenticationError – Raised if there is an authentication issue.
ServerError – Raised if the server encounters an issue.
InputError – Raised if something went wrong.
- Returns:
A list with information about each datasource.
- Return type:
list
Example
>>> client.list_datasources(provider='GCP', group='my_group')
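Because list_datasources takes one provider at a time, collecting the datasources of a group across all three providers needs a small loop. The sketch below relies only on the documented keyword-only signature; `list_all_datasources` is a hypothetical helper, and `client` is assumed to be any object exposing `list_datasources(*, provider, group)`.

```python
# Provider names as listed in the parameter documentation above.
PROVIDERS = ("Azure", "AWS", "GCP")


def list_all_datasources(client, group):
    """Hypothetical helper (not part of mlops_codex).

    Calls client.list_datasources once per provider and returns a dict
    mapping each provider name to whatever that call returned.
    """
    return {
        provider: client.list_datasources(provider=provider, group=group)
        for provider in PROVIDERS
    }
```

Errors such as AuthenticationError or ServerError are deliberately not caught here, so a failure for one provider surfaces immediately.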
- register_datasource(*, datasource_name: str, provider: str, cloud_credentials: dict | str, group: str)
Register the user's cloud credentials so that MLOps can use the provider to download the datasource.
- Parameters:
group (str) – Name of the group in which the datasource will be registered.
datasource_name (str) – Name to give to the datasource.
provider (str) – It can be “Azure”, “AWS” or “GCP”.
cloud_credentials (dict | str) – A dict with the credentials to access the provider, or the path to a JSON file containing them.
- Returns:
An MLOpsDataSource object.
- Return type:
MLOpsDataSource
Example
>>> client.register_datasource(
...     datasource_name='MyDataSourceName',
...     provider='GCP',
...     cloud_credentials='./gcp_credentials.json',
...     group='my_group'
... )
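Since cloud_credentials accepts either a dict or a path to a JSON file, it can be useful to check the arguments locally before calling register_datasource. The following is a sketch under stated assumptions: `check_register_args` is a hypothetical helper, provider names are assumed to be case-sensitive and limited to the three values listed above, and a credentials file is assumed to contain a JSON object.

```python
import json
from pathlib import Path

VALID_PROVIDERS = {"Azure", "AWS", "GCP"}


def check_register_args(provider, cloud_credentials):
    """Hypothetical pre-flight check (not part of mlops_codex).

    Validates the provider name and returns the credentials as a dict,
    loading them from disk when a path is given.
    """
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(VALID_PROVIDERS)}")
    if isinstance(cloud_credentials, dict):
        return cloud_credentials
    path = Path(cloud_credentials)
    if not path.is_file():
        raise FileNotFoundError(f"credentials file not found: {path}")
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError("credentials file must contain a JSON object")
    return data
```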
MLOpsDataset
- class mlops_codex.datasources.MLOpsDataset(login: str, password: str, base_url: str, hash: str, dataset_name: str, group: str)
Bases: object
Class representing an MLOps dataset.
- Parameters:
login (str) – Login for authenticating with the client.
password (str) – Password for authenticating with the client.
base_url (str) – URL of the MLOps server. The default value is https://neomaril.datarisk.net; use it to test your deployment before moving to production. You can also set this with the MLOPS_URL environment variable.
hash (str) – Dataset hash to download.
dataset_name (str) – Name of the dataset.
group (str) – Name of the group in which to search for the dataset.
origin (str) – Origin of the dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”.
- base_url: str
- dataset_name: str
- download(path: str | None = './', filename: str | None = 'dataset') -> None
Download a dataset from MLOps. The dataset will be a CSV or Parquet file.
- Parameters:
path (str, optional) – Directory in which to save the downloaded dataset. Defaults to ‘./’.
filename (str, optional) – Name of the downloaded dataset file. Defaults to ‘dataset’.
- Raises:
AuthenticationError – Raised if there is an authentication issue.
DatasetNotFoundError – Raised if there is no dataset with the given name.
ServerError – Raised if the server encounters an issue.
- group: str
- hash: str
- login: str
- password: str
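The download method documented above returns None, so the caller has to locate the file afterwards. A minimal sketch of one way to do that is shown below. `download_and_locate` is a hypothetical helper; it assumes that `dataset` exposes `download(path=..., filename=...)` as documented and that the file is saved as `filename` plus a `.csv` or `.parquet` extension, which the documentation does not state explicitly.

```python
from pathlib import Path


def download_and_locate(dataset, directory, filename="dataset"):
    """Hypothetical helper (not part of mlops_codex).

    Creates `directory` if needed, calls dataset.download, and returns
    the downloaded CSV/Parquet files whose stem equals `filename`.
    """
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    # A trailing slash mirrors the documented default './'; whether it is
    # required is an assumption.
    dataset.download(path=target.as_posix() + "/", filename=filename)
    return sorted(
        p for p in target.iterdir()
        if p.stem == filename and p.suffix in {".csv", ".parquet"}
    )
```

An empty list signals that no file with the assumed naming was found, which is a cue to inspect the directory by hand.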