Add documentation examples for loading credentials externally #2299

Merged · 10 commits · Feb 8, 2023
1 change: 1 addition & 0 deletions RELEASE.md
@@ -29,6 +29,7 @@
* Fixed bug causing the `after_dataset_saved` hook only to be called for one output dataset when multiple are saved in a single node and async saving is in use.
* Log level for "Credentials not found in your Kedro project config" was changed from `WARNING` to `DEBUG`.
* Added safe extraction of tar files in `micropkg pull` to fix vulnerability caused by [CVE-2007-4559](https://github.com/advisories/GHSA-gw9q-c7gh-j9vm).
* Added example for loading external credentials to the Hooks documentation.

## Breaking changes to the API

69 changes: 69 additions & 0 deletions docs/source/hooks/common_use_cases.md
@@ -113,3 +113,72 @@ def after_dataset_loaded(self, dataset_name: str, data: Any) -> None:
        end = time.time()
        self._logger.info("Loading dataset %s ended at %0.3f", dataset_name, end)
```

## Use Hooks to load external credentials
We recommend using the `after_context_created` Hook to add credentials to the session's config loader instance from any external credentials manager. This example shows how to load credentials from [Azure KeyVault](https://learn.microsoft.com/en-us/azure/key-vault/general/).

Here is an example Key Vault instance; note the names of the Key Vault itself and of the secrets it stores:

![Example Azure Key Vault instance showing the Key Vault and secret names](../meta/images/example_azure_keyvault.png)

These credentials will be used to access the following datasets in the data catalog:

```yaml
weather:
type: spark.SparkDataSet
filepath: s3a://your_bucket/data/01_raw/weather*
file_format: csv
credentials: s3_creds

cars:
type: pandas.CSVDataSet
filepath: https://your_data_store.blob.core.windows.net/data/01_raw/cars.csv
file_format: csv
credentials: abs_creds
```
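
Without an external secrets manager, the `s3_creds` and `abs_creds` keys would typically be defined statically in `conf/local/credentials.yml`. A minimal sketch of that static equivalent, with placeholder values (the Hook below injects the same keys at runtime instead, each value taken verbatim from the corresponding Key Vault secret):

```yaml
# conf/local/credentials.yml -- illustrative static equivalent only.
s3_creds: <secret value>
abs_creds: <secret value>
```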

We can then use the following Hook implementation to fetch and inject these credentials:

```python
# hooks.py
import os

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

from kedro.framework.hooks import hook_impl


class AzureSecretsHook:
    @hook_impl
    def after_context_created(self, context) -> None:
        # Hard-coded here for clarity; use os.environ["KEY_VAULT_NAME"] to
        # supply the name through an environment variable instead.
        key_vault_name = "keyvault-0542abb"
        key_vault_uri = f"https://{key_vault_name}.vault.azure.net"

        my_credential = DefaultAzureCredential()
        client = SecretClient(vault_url=key_vault_uri, credential=my_credential)

        # Map credential names used in the data catalog to secret names in Key Vault.
        secrets = {
            "abs_creds": "azure-blob-store",
            "s3_creds": "s3-bucket-creds",
        }
        azure_creds = {
            cred_name: client.get_secret(secret_name).value
            for cred_name, secret_name in secrets.items()
        }

        # Merge the fetched secrets into the credentials already loaded from
        # the project's configuration.
        context.config_loader["credentials"] = {
            **context.config_loader["credentials"],
            **azure_creds,
        }
```
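
Because `azure_creds` is unpacked last, a secret fetched from Key Vault takes precedence over an entry with the same name in your local `credentials.yml`.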

Finally, [register the Hook](https://kedro.readthedocs.io/en/stable/hooks/introduction.html#registering-your-hook-implementations-with-kedro) in your `settings.py` file:

```python
from my_project.hooks import AzureSecretsHook

HOOKS = (AzureSecretsHook(),)
```
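
Once registered, the Hook runs every time a session context is created, for example at the start of `kedro run`. The example assumes the `azure-identity` and `azure-keyvault-secrets` packages are installed. To check interactively that the credentials are injected, a quick sketch (illustrative only, run from the project root):

```python
# check_credentials.py -- illustrative check, not part of the Hook itself.
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())
with KedroSession.create(project_path=Path.cwd()) as session:
    context = session.load_context()  # triggers after_context_created
    # The injected keys should appear alongside any from credentials.yml.
    print(sorted(context.config_loader["credentials"]))
```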

```{note}
`DefaultAzureCredential()` is Azure's recommended approach to authorise access to data in your storage accounts. For more information, consult the [documentation about how to authenticate to Azure and authorize access to blob data](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python).
```