Connecting to a Data Source

You can connect a cloud storage provider to MLOps as a data source. A data source is where you store and retrieve the data used to train and run models.

Currently, MLOps supports the following providers:

  • Google Cloud Platform (GCP)

  • AWS S3

  • Azure Blob Storage

Register a data source

To register a data source with MLOps, you need the provider's credentials JSON file, a data source name, and a group name:

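# Import path is an assumption; adjust it to match your installed MLOps client package.
from mlops_codex.datasources import MLOpsDataSourceClient

client = MLOpsDataSourceClient()
client.register_datasource(
    datasource_name='MyDataSourceName',
    provider='GCP',
    cloud_credentials='./gcp_credentials.json',  # path to the provider credentials JSON file
    group='my_group'
)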
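Because registration depends on a credentials JSON file, it can help to check that file before calling register_datasource. The sketch below is a hypothetical helper (check_credentials_file is not part of MLOps) built only on the Python standard library. It assumes the credentials file holds a JSON object, as GCP service account key files do.

```python
import json
from pathlib import Path


def check_credentials_file(path: str) -> dict:
    """Return the parsed credentials, or raise if the file is unusable.

    Hypothetical helper, not part of the MLOps client.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Credentials file is not valid JSON: {path}") from exc
    # Assumption: credentials are stored as a JSON object (key/value pairs).
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object in {path}")
    return data
```

Calling check_credentials_file('./gcp_credentials.json') before registering surfaces a missing or malformed file early, with a clear error message.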

To retrieve a data source that is already registered:

datasource = client.get_datasource(datasource_name='testeDataSource', provider='GCP', group='datarisk')

Once you are connected to MLOps and have registered a data source, you can list the available data sources:

client.list_datasource(provider='GCP', group='datarisk')

Importing a data set

Now that you have access to a data source, you can import a data set into it. A data source must be registered before you can import a data set into it.

You can import a data set from a URL:

dataset_uri = 'https://storage.cloud.google.com/projeto/arquivo.csv'

dataset = datasource.import_dataset(
    dataset_uri=dataset_uri,
    dataset_name='meudatasetcorreto'
)

The import generates a DHash, an identifier for the data set that you can use to retrieve it later.
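When several files need to go into the same data source, the import call can be repeated in a loop. The sketch below relies only on the import_dataset(dataset_uri=..., dataset_name=...) call shown above. import_datasets is a hypothetical helper, not part of MLOps, and it makes no assumption about what import_dataset returns beyond passing the result through.

```python
from typing import Any, Dict


def import_datasets(datasource: Any, uris_by_name: Dict[str, str]) -> Dict[str, Any]:
    """Import each URI under its data set name and return the results by name.

    Hypothetical helper: `datasource` is any object exposing the
    import_dataset(dataset_uri=..., dataset_name=...) call shown above.
    """
    imported = {}
    for name, uri in uris_by_name.items():
        # One import call per data set, exactly as in the single-file example.
        imported[name] = datasource.import_dataset(dataset_uri=uri, dataset_name=name)
    return imported
```

Keeping the results in a dictionary keyed by data set name makes it easy to look up each imported data set afterwards.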

If you have already connected your data source to MLOps and imported a data set, you can retrieve the data set by its DHash:

dataset = datasource.get_dataset(dataset_hash='D66c8bc440dc4882bfeff40c0dac11641c3583f3aa274293b15ed5db21000b49')

Deleting a data source or a data set

To remove a data source or a data set, you can do the following:

datasource.delete()
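For short-lived experiments, deletion can be tied to the end of a block of work so that the data source is removed even if an error occurs. This is a minimal sketch using contextlib from the Python standard library. temporary_datasource is a hypothetical name, and the only MLOps call it relies on is the delete() method shown above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_datasource(datasource: Any) -> Iterator[Any]:
    """Yield the data source, then delete it when the block exits.

    Hypothetical helper: relies only on the delete() method shown above.
    """
    try:
        yield datasource
    finally:
        # Runs on normal exit and when the block raises.
        datasource.delete()
```

It would be used as `with temporary_datasource(datasource) as ds:` followed by the work that needs the data source. Use it only for data sources you really intend to discard, since the deletion happens automatically.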