DataSource module#

Module containing the classes and methods for managing data sources and datasets

MLOpsDataSourceClient#

class mlops_codex.datasources.MLOpsDataSourceClient(*, login: str | None = None, password: str | None = None, url: str | None = None)[source]#

Bases: BaseMLOpsClient

Client class for managing datasources

Parameters:
  • login (str) – Login for authenticating with the client. You can also use the env variable MLOPS_USER to set this

  • password (str) – Password for authenticating with the client. You can also use the env variable MLOPS_PASSWORD to set this

  • url (str) – URL of the MLOps server. Defaults to https://neomaril.datarisk.net/; use the default to test your deployment before switching to production. You can also use the env variable MLOPS_URL to set this

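Example

A minimal construction sketch; the login, password, and URL below are placeholder values. You can also set them through the MLOPS_USER, MLOPS_PASSWORD, and MLOPS_URL environment variables instead.

>>> from mlops_codex.datasources import MLOpsDataSourceClient
>>> client = MLOpsDataSourceClient(
...     login='user@example.com',
...     password='my_password',
...     url='https://neomaril.datarisk.net/'
... )
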
credentials_to_json(input_data: dict) → str[source]#

Save a credentials dictionary to a JSON file.

Parameters:

input_data (dict) – A dictionary to save.

Returns:

Path to the credentials file.

Return type:

str
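
Example

A minimal sketch; the dictionary keys below are placeholders, not a schema required by the method.

>>> path = client.credentials_to_json({'project_id': 'my-project', 'private_key': 'my-key'})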

get_datasource(*, datasource_name: str, provider: str, group: str)[source]#

Get an MLOpsDataSource object to perform datasource operations.

Parameters:
  • datasource_name (str) – Name given previously to the datasource.

  • provider (str) – It can be “Azure”, “AWS” or “GCP”.

  • group (str) – Name of the group in which to search for the datasource.

Returns:

A MLOpsDataSource object

Return type:

MLOpsDataSource

Example

>>> client.get_datasource(datasource_name='MyDataSourceName', provider='GCP', group='my_group')
list_datasources(*, provider: str, group: str)[source]#

List all datasources in the group for the given provider.

Parameters:
  • group (str) – Name of the group in which to search for datasources.

  • provider (str) – It can be “Azure”, “AWS” or “GCP”.

Returns:

A list of datasources information.

Return type:

list

Example

>>> client.list_datasources(provider='GCP', group='my_group')
register_datasource(*, datasource_name: str, provider: str, cloud_credentials: dict | str, group: str)[source]#

Register the user's cloud credentials so that MLOps can use the provider to download the datasource.

Parameters:
  • group (str) – Name of the group where the datasource will be registered.

  • datasource_name (str) – Name to identify the datasource.

  • provider (str) – It can be “Azure”, “AWS” or “GCP”.

  • cloud_credentials (dict | str) – Dict with the credentials to access the provider, or path to a JSON file containing them.

Returns:

A MLOpsDataSource object

Return type:

MLOpsDataSource

Example

>>> client.register_datasource(
...     datasource_name='MyDataSourceName',
...     provider='GCP',
...     cloud_credentials='./gcp_credentials.json',
...     group='my_group'
... )

MLOpsDataSource#


MLOpsDataset#

class mlops_codex.datasources.MLOpsDataset(*, login: str, password: str, url: str, dataset_hash: str, dataset_name: str, group: str, origin: str)[source]#

Bases: BaseModel

Class representing an MLOps dataset.

Parameters:
  • login (str) – Login for authenticating with the client.

  • password (str) – Password for authenticating with the client.

  • url (str) – URL of the MLOps server. Defaults to https://neomaril.datarisk.net; use the default to test your deployment before switching to production. You can also use the env variable MLOPS_URL to set this

  • dataset_hash (str) – Dataset hash to download.

  • dataset_name (str) – Name of the dataset.

  • group (str) – Name of the group in which to search for the dataset

  • origin (str) – Origin of the dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”
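
Example

A minimal construction sketch with placeholder values; in practice the dataset hash, name, group, and origin come from an existing dataset on the server.

>>> from mlops_codex.datasources import MLOpsDataset
>>> dataset = MLOpsDataset(
...     login='user@example.com',
...     password='my_password',
...     url='https://neomaril.datarisk.net',
...     dataset_hash='my_dataset_hash',
...     dataset_name='my_dataset',
...     group='my_group',
...     origin='Datasource'
... )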

class Config[source]#

Bases: object

arbitrary_types_allowed = True#

dataset_hash: str#

dataset_name: str#

download(*, path: str = './', filename: str = 'dataset') → None[source]#

Download the dataset from MLOps. The downloaded file will be in CSV or parquet format.

Parameters:
  • path (str, optional) – Directory where the dataset will be saved. Defaults to './'.

  • filename (str, optional) – Name of the downloaded file. Defaults to 'dataset'; the '.csv' or '.parquet' extension depends on the dataset format.
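
Example

A minimal sketch, assuming dataset is an MLOpsDataset instance and the target directory already exists:

>>> dataset.download(path='./data', filename='my_dataset')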

group: str#

host_preprocessing(*, name: str, group: str, script_path: str, entrypoint_function_name: str, requirements_path: str, python_version: str | None = '3.9')[source]#

Host a preprocessing script via the dataset module. By default, the call hosts the script and waits for hosting to complete. It returns a MLOpsPreprocessingAsyncV2 object, which you can then run (see the sketch below).

Parameters:
  • name (str) – Name of the new preprocessing script

  • group (str) – Group of the new preprocessing script

  • script_path (str) – Path to the python script

  • entrypoint_function_name (str) – Name of the entrypoint function in the python script

  • python_version (str) – Python version for the model environment. Available versions are 3.8, 3.9, 3.10. Defaults to ‘3.9’

  • requirements_path (str) – Path to the requirements file

Returns:

Preprocessing async version of the new preprocessing script.

Return type:

MLOpsPreprocessingAsyncV2
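
Example

A minimal sketch with placeholder paths and names; 'process' below is assumed to be the name of the entrypoint function defined in the script.

>>> preprocessing = dataset.host_preprocessing(
...     name='my_preprocessing',
...     group='my_group',
...     script_path='./preprocess.py',
...     entrypoint_function_name='process',
...     requirements_path='./requirements.txt',
...     python_version='3.9'
... )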

login: str#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None#

We need to both initialize private attributes and call the user-defined model_post_init method.

origin: str#

password: str#

run_model()[source]#

run_preprocess(*, preprocessing_script_hash: str, execution_id: int)[source]#

Run a preprocessing script execution on the dataset. By default, the call runs the preprocessing script and waits until it completes.

Parameters:
  • preprocessing_script_hash (str) – Hash of the preprocessing script

  • execution_id (int) – Preprocessing Execution ID
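
Example

A minimal sketch; the hash and execution ID below are placeholders for an already hosted preprocessing script.

>>> dataset.run_preprocess(
...     preprocessing_script_hash='my_preprocessing_hash',
...     execution_id=1
... )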

train()[source]#

url: str#