Training module#
Module with the classes that allow you to manage your training experiments.
MLOpsTrainingExecution#
- class mlops_codex.training.MLOpsTrainingExecution(*, training_id: str, group: str, exec_id: str, login: str | None = None, password: str | None = None, url: str | None = None)[source]#
Bases:
MLOpsExecution
Class to manage trained models.
- Parameters:
training_id (str) – Training id (hash) from the experiment you want to access
group (str) – Group the training is inserted in.
exec_id (str) – Execution id for that specific training run
login (str) – Login for authenticating with the client. You can also use the env variable MLOPS_USER to set this
password (str) – Password for authenticating with the client. You can also use the env variable MLOPS_PASSWORD to set this
environment (str) – Environment of MLOps you are using.
run_data (dict) – Metadata from the execution.
- Raises:
TrainingError – When the training can’t be accessed on the server
AuthenticationError – Invalid credentials
Example
from mlops_codex.training import MLOpsTrainingClient

# Credentials are read from the MLOPS_USER and MLOPS_PASSWORD environment variables
client = MLOpsTrainingClient()
client.create_group('ex_group', 'Group for example purpose')
training = client.create_training_experiment(experiment_name='Training example', model_type='Classification', group='ex_group')
print(client.get_training(training_id=training.training_id, group='ex_group').training_data)

data_path = './samples/train/'
run = training.run_training(run_name='First test', train_data=data_path+'dados.csv', training_reference='train_model', training_type='Custom', python_version='3.9', requirements_file=data_path+'requirements.txt', wait_complete=True)
print(training.get_training_execution(run.exec_id))
print(run.download_result())
run.promote_model(model_name='Teste notebook promoted custom', model_reference='score', source_file=data_path+'app.py', schema=data_path+'schema.json', input_type='csv')
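The example promotes the run with a scoring function named score in app.py. According to the promote_model parameters, that function must accept data and model_path. The sketch below shows what such a file might look like. Only the two parameter names come from this page; the JSON request body, the weights.json file name, the treatment of model_path as a directory and the returned dict are all assumptions made for illustration.

```python
import json
import os


def score(data, model_path):
    """Scoring entry point: `data` is the request body, `model_path` the model location."""
    # Assumption: the body is a JSON object (or an already parsed dict) of numeric features.
    features = json.loads(data) if isinstance(data, str) else data
    # Assumption: model_path is a directory containing a hypothetical weights.json file.
    with open(os.path.join(model_path, "weights.json")) as handle:
        weights = json.load(handle)
    # Weighted sum of the features; unknown feature names contribute nothing.
    total = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return {"score": total}
```

A real scoring function would load whatever artifact the training run produced; check the output of download_result() before relying on any file name.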
- get_status() dict [source]#
Gets the status of the related execution.
- Raises:
ExecutionError – Execution unavailable
- Returns:
Returns the execution status.
- Return type:
dict
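Since get_status() returns a dict, a caller may want to poll it until the execution finishes. The helper below is a sketch, not part of mlops_codex: the "Status" key and the terminal values "Succeeded" and "Failed" are assumptions, so inspect a real response and adjust them before use.

```python
import time


def wait_until_done(get_status, terminal=("Succeeded", "Failed"), key="Status",
                    interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() until the value under `key` is terminal or time runs out.

    `key` and `terminal` are assumptions; check a real response first.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status.get(key) in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"no terminal status after {timeout} seconds")
        sleep(interval)
        waited += interval
```

With an execution object in hand, this could be called as wait_until_done(run.get_status).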
- promote_model(*, model_name: str, operation: str = 'Sync', schema: str | dict | None = None, model_reference: str | None = None, source_file: str | None = None, extra_files: list | None = None, requirements_file: str | None = None, env: str | None = None, input_type: str | None = None) MLOpsModel [source]#
Upload models trained inside MLOps.
- Parameters:
model_name (str) – The name of the model; must be shorter than 32 characters
model_reference (Optional[str], optional) – The name of the scoring function inside the source file
source_file (Optional[str], optional) – Path of the source file. The file must have a scoring function that accepts two parameters: data (data for the request body of the model) and model_path (absolute path of where the file is located)
schema (Union[str, dict], optional) – Path to a JSON or XML file with a sample of the input for the entrypoint function. A dict with the sample input can be sent as well
extra_files (list, optional) – An optional list of additional file paths to upload. If the scoring function refers to these files, they will be in the same folder as the source file
requirements_file (str, optional) – Path of the requirements file. This will override the requirements used in training. Package versions must be pinned, e.g. pandas==1.0
env (str, optional) – Flag that chooses which environment (dev, staging, production) of MLOps you are using. Default is None
operation (str) – Defines which kind of operation is being executed (Sync or Async). Default value is Sync
input_type (str) – The type of the input file; must be ‘json’, ‘csv’ or ‘parquet’
- Raises:
TrainingError – The training execution must have succeeded before it can be promoted
- Returns:
The new training model
- Return type:
MLOpsModel
Example
>>> model = run.promote_model(model_name='Teste notebook promoted custom', model_reference='score', source_file='./samples/train/app.py', schema='./samples/train/schema.json', input_type='csv')
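The parameter descriptions above state a few constraints: the model name must be shorter than 32 characters, operation is Sync or Async, and input_type is ‘json’, ‘csv’ or ‘parquet’. A local check before calling promote_model can catch mistakes early. The helper below is hypothetical and not part of mlops_codex; the server may apply further rules not listed here.

```python
ALLOWED_OPERATIONS = ("Sync", "Async")
ALLOWED_INPUT_TYPES = ("json", "csv", "parquet")


def check_promote_args(model_name, operation="Sync", input_type=None):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if not model_name or len(model_name) >= 32:
        problems.append("model_name must be non-empty and shorter than 32 characters")
    if operation not in ALLOWED_OPERATIONS:
        problems.append("operation must be one of: " + ", ".join(ALLOWED_OPERATIONS))
    if input_type is not None and input_type not in ALLOWED_INPUT_TYPES:
        problems.append("input_type must be one of: " + ", ".join(ALLOWED_INPUT_TYPES))
    return problems
```

An empty list means the documented constraints are met; anything else can be shown to the user before any upload starts.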
MLOpsTrainingExperiment#
- class mlops_codex.training.MLOpsTrainingExperiment(*, training_id: str, login: str | None = None, password: str | None = None, group: str = 'datarisk', url: str = 'https://neomaril.datarisk.net/')[source]#
Bases:
BaseMLOps
Class to manage models being trained inside MLOps
- Parameters:
login (str) – Login for authenticating with the client. You can also use the env variable MLOPS_USER to set this
password (str) – Password for authenticating with the client. You can also use the env variable MLOPS_PASSWORD to set this
training_id (str) – Training id (hash) from the experiment you want to access
group (str) – Group the training is inserted in.
environment (str) – Flag that selects which MLOps environment you are using. Test your deployment first before changing to production. Default is True
executions (List[int]) – Ids for the executions in that training
- Raises:
TrainingError – When the training can’t be accessed on the server
AuthenticationError – Invalid credentials
Example
from mlops_codex.training import MLOpsTrainingClient

# Credentials are read from the MLOPS_USER and MLOPS_PASSWORD environment variables
client = MLOpsTrainingClient()
client.create_group('ex_group', 'Group for example purpose')
training = client.create_training_experiment(experiment_name='Training example', model_type='Classification', group='ex_group')
print(client.get_training(training_id=training.training_id, group='ex_group').training_data)

data_path = './samples/train/'
run = training.run_training(run_name='First test', train_data=data_path+'dados.csv', training_reference='train_model', training_type='Custom', python_version='3.9', requirements_file=data_path+'requirements.txt', wait_complete=True)
print(training.get_training_execution(run.exec_id))
print(run.download_result())
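The example passes a requirements file, and the run_training documentation requires package versions to be fixed (for example pandas==1.0). A small local check can flag unpinned lines before the file is uploaded. This is a sketch with a deliberately simple pattern; it is not how mlops_codex validates the file, and it does not cover every form pip accepts (URLs and environment markers are reported as unpinned).

```python
import re

# A line counts as pinned when it has the form name==version, optionally with extras.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s=]+$")


def unpinned_requirements(text):
    """Return requirement lines that are not pinned with '=='."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if not PINNED.match(line):
            bad.append(line)
    return bad
```

Reading the file with pathlib.Path(path).read_text() and passing the result to unpinned_requirements gives the list of lines to fix.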
- get_all_training_executions() List[MLOpsTrainingExecution] [source]#
Get all executions from that experiment.
- Returns:
All executions from that training
- Return type:
List[MLOpsTrainingExecution]
- get_training_execution(exec_id: str | None = None) MLOpsTrainingExecution [source]#
Get the execution instance.
- Parameters:
exec_id (Optional[str], optional) – Execution id. If not provided, the last execution is returned.
- Returns:
The chosen execution
- Return type:
MLOpsTrainingExecution
- log_train(*, name, X_train, y_train, description: str | None = None, save_path: str | None = None)[source]#
- run_training(*, run_name: str, training_type: str = 'External', description: str | None = None, train_data: str | None = None, dataset: str | MLOpsDataset | None = None, training_reference: str | None = None, python_version: str = '3.10', conf_dict: str | dict | None = None, source_file: str | None = None, requirements_file: str | None = None, extra_files: list | None = None, env: str | None = None, X_train=None, y_train=None, model_outputs=None, model_file: str | None = None, model_metrics: str | dict | None = None, model_params: str | dict | None = None, model_hash: str | None = None, wait_complete: bool | None = False) dict | MLOpsExecution [source]#
Runs a training execution for the current experiment.
- Parameters:
run_name (str) – The name of the run; must be shorter than 32 characters
train_data (str) – Path of the file with train data.
training_reference (Optional[str], optional) – The name of the training function inside the source file. Only used when training_type is Custom
training_type (str) – Can be Custom, AutoML or External
description (Optional[str], optional) – Description of the experiment
python_version (Optional[str], optional) – Python version for the training environment. Available versions are 3.8, 3.9, 3.10. Defaults to ‘3.10’
conf_dict (Union[str, dict]) – Path to a JSON file with the AutoML configuration. A dict can be sent as well. Only used when training_type is AutoML
source_file (Optional[str], optional) – Path of the source file. The file must have a training function that accepts one parameter: model_path (absolute path of where the file is located). Only used when training_type is Custom
requirements_file (str) – Path of the requirements file. Package versions must be pinned, e.g. pandas==1.0. Only used when training_type is Custom
env (Optional[str], optional) – .env file to be used in your training environment. This will be encrypted on the server.
extra_files (Optional[list], optional) – An optional list of additional file paths to upload. If the training function refers to these files, they will be in the same folder as the source file. Only used when training_type is Custom
wait_complete (Optional[bool], optional) – Whether to wait until the training is completed (True) or return immediately (False). Default value is False
- Raises:
InputError – Some input parameters are invalid
- Returns:
The return of the scoring function in the source file for Sync models or the execution class for Async models.
- Return type:
Union[dict, MLOpsExecution]
Example
>>> run = training.run_training(run_name='First test', train_data=data_path+'dados.csv', training_reference='train_model', training_type='Custom', python_version='3.9', requirements_file=data_path+'requirements.txt', wait_complete=True)
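Several run_training parameters are only used for one training type, and training_type defaults to External, so it is easy to pass an argument that is silently ignored. The helper below sketches a local check built only from the parameter descriptions above (allowed training types, Python versions, and the per-type notes). It is hypothetical, not part of mlops_codex, and the real server-side validation may differ.

```python
TRAINING_TYPES = ("Custom", "AutoML", "External")
PYTHON_VERSIONS = ("3.8", "3.9", "3.10")
# Parameters the descriptions mark as used only for one training type.
ONLY_FOR = {
    "training_reference": "Custom",
    "source_file": "Custom",
    "requirements_file": "Custom",
    "extra_files": "Custom",
    "conf_dict": "AutoML",
}


def check_run_args(run_name, training_type="External", python_version="3.10", **kwargs):
    """Hypothetical pre-flight check; returns (errors, warnings)."""
    errors, warnings = [], []
    if not run_name or len(run_name) >= 32:
        errors.append("run_name must be non-empty and shorter than 32 characters")
    if training_type not in TRAINING_TYPES:
        errors.append("training_type must be one of: " + ", ".join(TRAINING_TYPES))
    if python_version not in PYTHON_VERSIONS:
        errors.append("python_version must be one of: " + ", ".join(PYTHON_VERSIONS))
    # Warn about arguments that the chosen training type does not use.
    for name, value in kwargs.items():
        needed = ONLY_FOR.get(name)
        if value is not None and needed and needed != training_type:
            warnings.append(f"{name} is only used when training_type is {needed}")
    return errors, warnings
```

Calling it with training_reference set but training_type left at its default yields a warning, which is a hint to pass training_type='Custom' explicitly.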
MLOpsTrainingClient#
- class mlops_codex.training.MLOpsTrainingClient(*, login: str | None = None, password: str | None = None, url: str | None = None)[source]#
Bases:
BaseMLOpsClient
Client class for accessing MLOps and managing models
- Parameters:
login (str) – Login for authenticating with the client. You can also use the env variable MLOPS_USER to set this
password (str) – Password for authenticating with the client. You can also use the env variable MLOPS_PASSWORD to set this
url (str) – URL of the MLOps server. Default value is https://neomaril.datarisk.net; use it to test your deployment before changing to production. You can also use the env variable MLOPS_URL to set this
- Raises:
AuthenticationError – Invalid credentials
ServerError – Server unavailable
Example
from mlops_codex.training import MLOpsTrainingClient

# Credentials are read from the MLOPS_USER and MLOPS_PASSWORD environment variables
client = MLOpsTrainingClient()
client.create_group('ex_group', 'Group for example purpose')
training = client.create_training_experiment(experiment_name='Training example', model_type='Classification', group='ex_group')
print(client.get_training(training_id=training.training_id, group='ex_group').training_data)
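The parameters above can come either from arguments or from the MLOPS_USER, MLOPS_PASSWORD and MLOPS_URL environment variables. The sketch below illustrates one plausible resolution order, with explicit arguments taking precedence over the environment. That precedence is an assumption, and resolve_settings is a hypothetical helper, not a function of mlops_codex.

```python
import os

DEFAULT_URL = "https://neomaril.datarisk.net"  # default stated in the url parameter docs


def resolve_settings(login=None, password=None, url=None, environ=None):
    """Hypothetical helper: explicit arguments win, then environment variables."""
    env = os.environ if environ is None else environ
    settings = {
        "login": login or env.get("MLOPS_USER"),
        "password": password or env.get("MLOPS_PASSWORD"),
        "url": url or env.get("MLOPS_URL") or DEFAULT_URL,
    }
    # Fail early and name what is missing instead of sending an incomplete login.
    missing = [key for key in ("login", "password") if not settings[key]]
    if missing:
        raise ValueError("missing credentials: " + ", ".join(missing))
    return settings
```

Keeping credentials in the environment avoids hard-coding them in notebooks or scripts that call the client.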
- create_training_experiment(*, experiment_name: str, model_type: str, group: str, force: bool | None = False) MLOpsTrainingExperiment [source]#
Create a new training experiment on MLOps.
- Parameters:
experiment_name (str) – The name of the experiment; must be shorter than 32 characters
model_type (str) – The type of model the experiment trains, e.g. ‘Classification’
group (str) – Group the model is inserted in. Defaults to ‘datarisk’ (public group)
force (Optional[bool], optional) – Forces the creation of a new training even if one with the same model_type, experiment_name and group already exists
- Raises:
InputError – Some input parameters are invalid
ServerError – Unknown internal server error
- Returns:
An MLOpsTrainingExperiment instance with the training hash from training_id
- Return type:
MLOpsTrainingExperiment
Example
>>> training = client.create_training_experiment(experiment_name='Training example', model_type='Classification', group='ex_group')
- get_training(*, training_id: str, group: str) MLOpsTrainingExperiment [source]#
Access a training experiment using its id
- Parameters:
training_id (str) – Training id (hash) that needs to be accessed
group (str) – Group the model is inserted in.
- Raises:
TrainingError – Model unavailable
ServerError – Unknown return from server
- Returns:
An MLOpsTrainingExperiment instance with the training hash from training_id
- Return type:
MLOpsTrainingExperiment
Example
>>> training = client.get_training(training_id='Tfb3274827a24dc39d5b78603f348aee8d3dbfe791574dc4a6681a7e2a6622fa', group='ex_group')