DataSource module#
Module containing the classes and methods to manage data sources and datasets.
MLOpsDataSourceClient#
- class mlops_codex.datasources.MLOpsDataSourceClient(*, login: str | None = None, password: str | None = None, url: str | None = None)[source]#
Bases:
BaseMLOpsClient
Client class for managing datasources.
- Parameters:
login (str) – Login for authenticating with the client. You can also use the env variable MLOPS_USER to set this
password (str) – Password for authenticating with the client. You can also use the env variable MLOPS_PASSWORD to set this
url (str) – URL of the MLOps server. Defaults to https://neomaril.datarisk.net/; use it to test your deployment before switching to production. You can also use the env variable MLOPS_URL to set this
- Raises:
ServerError – Database produced an unexpected error.
AuthenticationError – If user is not in the master group.
CredentialError – If the cloud credential is invalid.
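Example (a minimal instantiation sketch; the login and password values below are placeholders, and you can instead set MLOPS_USER, MLOPS_PASSWORD and MLOPS_URL as environment variables):
>>> from mlops_codex.datasources import MLOpsDataSourceClient
>>> client = MLOpsDataSourceClient(
>>>     login='user@example.com',
>>>     password='my_password',
>>>     url='https://neomaril.datarisk.net/'
>>> )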
- credentials_to_json(input_data: dict) → str [source]#
Save a dictionary as a JSON credentials file.
- Parameters:
input_data (dict) – A dictionary to save.
- Returns:
Path to the credentials file.
- Return type:
str
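Example (illustrative; the credential keys below are placeholders, not a real provider schema):
>>> creds = {'project_id': 'my-project', 'private_key': 'my-key'}
>>> credentials_path = client.credentials_to_json(creds)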
- get_datasource(*, datasource_name: str, provider: str, group: str)[source]#
Get a MLOpsDataSource to make datasource operations.
- Parameters:
datasource_name (str) – Name given previously to the datasource.
provider (str) – It can be “Azure”, “AWS” or “GCP”
group (str) – Name of the group where we will search the datasources
- Returns:
A MLOpsDataSource object
- Return type:
MLOpsDataSource
Example
>>> client.get_datasource(datasource_name='MyDataSourceName', provider='GCP', group='my_group')
- list_datasources(*, provider: str, group: str)[source]#
List all datasources of the group with this provider type.
- Parameters:
group (str) – Name of the group where we will search the datasources
provider (str) – Provider name; it can be “Azure”, “AWS” or “GCP”
- Raises:
AuthenticationError – Raised if there is an authentication issue.
ServerError – Raised if the server encounters an issue.
InputError – Raised if the provided input is invalid.
- Returns:
A list of datasources information.
- Return type:
list
Example
>>> client.list_datasources(provider='GCP', group='my_group')
- register_datasource(*, datasource_name: str, provider: str, cloud_credentials: dict | str, group: str)[source]#
Register the user cloud credentials to allow MLOps to use the provider to download the datasource.
- Parameters:
group (str) – Name of the group where we will search the datasources.
datasource_name (str) – Name given previously to the datasource.
provider (str) – It can be “Azure”, “AWS” or “GCP”
cloud_credentials (dict | str) – Path to a JSON file, or a dict, with the credentials to access the provider.
- Returns:
A MLOpsDataSource object
- Return type:
MLOpsDataSource
Example
>>> client.register_datasource(
>>>     datasource_name='MyDataSourceName',
>>>     provider='GCP',
>>>     cloud_credentials='./gcp_credentials.json',
>>>     group='my_group'
>>> )
MLOpsDataSource#
MLOpsDataset#
- class mlops_codex.datasources.MLOpsDataset(*, login: str, password: str, url: str, dataset_hash: str, dataset_name: str, group: str, origin: str)[source]#
Bases:
BaseModel
Dataset class representing an MLOps dataset.
- Parameters:
login (str) – Login for authenticating with the client.
password (str) – Password for authenticating with the client.
url (str) – URL of the MLOps server. Defaults to https://neomaril.datarisk.net; use it to test your deployment before switching to production. You can also use the env variable MLOPS_URL to set this
dataset_hash (str) – Dataset hash to download.
dataset_name (str) – Name of the dataset.
group (str) – Name of the group where we will search the dataset
origin (str) – Origin of the dataset. It can be “Training”, “Preprocessing”, “Datasource” or “Model”
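Example (a minimal construction sketch using the documented fields; all values below are placeholders):
>>> from mlops_codex.datasources import MLOpsDataset
>>> dataset = MLOpsDataset(
>>>     login='user@example.com',
>>>     password='my_password',
>>>     url='https://neomaril.datarisk.net',
>>>     dataset_hash='<dataset-hash>',
>>>     dataset_name='my_dataset',
>>>     group='my_group',
>>>     origin='Datasource'
>>> )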
- dataset_hash: str#
- dataset_name: str#
- download(*, path: str = './', filename: str = 'dataset') → None [source]#
Download a dataset from MLOps. The dataset is saved as a CSV or Parquet file.
- Parameters:
path (str, optional) – Path to the downloaded dataset. Defaults to ‘./’.
filename (str, optional) – Name of the downloaded dataset file, saved with a .csv or .parquet extension. Defaults to ‘dataset’.
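Example (illustrative; the path and filename are arbitrary, and the resulting extension depends on the dataset format):
>>> dataset.download(path='./data', filename='my_dataset')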
- group: str#
- host_preprocessing(*, name: str, group: str, script_path: str, entrypoint_function_name: str, requirements_path: str, python_version: str | None = '3.9')[source]#
Host a preprocessing script via the dataset module. By default, the call hosts the script and waits for the hosting to finish. It returns an MLOpsPreprocessingAsyncV2 object, which you can then run.
- Parameters:
name (str) – Name of the new preprocessing script
group (str) – Group of the new preprocessing script
script_path (str) – Path to the python script
entrypoint_function_name (str) – Name of the entrypoint function in the python script
python_version (str) – Python version for the model environment. Available versions are 3.8, 3.9, 3.10. Defaults to ‘3.9’
requirements_path (str) – Path to the requirements file
- Returns:
Preprocessing async version of the new preprocessing script.
- Return type:
MLOpsPreprocessingAsyncV2
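Example (a hedged sketch, assuming a local script preprocess.py with an entrypoint function named process and a requirements.txt next to it; all names are placeholders):
>>> preprocessing = dataset.host_preprocessing(
>>>     name='my_preprocessing',
>>>     group='my_group',
>>>     script_path='./preprocess.py',
>>>     entrypoint_function_name='process',
>>>     requirements_path='./requirements.txt',
>>>     python_version='3.9'
>>> )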
- login: str#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None #
We need to both initialize private attributes and call the user-defined model_post_init method.
- origin: str#
- password: str#
- run_preprocess(*, preprocessing_script_hash: str, execution_id: int)[source]#
Run a preprocessing script execution from a dataset. By default, the call runs the preprocessing script and waits until it completes.
- Parameters:
preprocessing_script_hash (str) – Hash of the preprocessing script
execution_id (int) – Preprocessing Execution ID
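Example (illustrative; the script hash and execution id below are placeholders):
>>> dataset.run_preprocess(
>>>     preprocessing_script_hash='<preprocessing-script-hash>',
>>>     execution_id=1
>>> )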
- url: str#