Question: how to access credentials inside a node? #575

Closed
mnowotnik opened this issue Oct 20, 2020 · 8 comments
Comments

@mnowotnik

mnowotnik commented Oct 20, 2020

What are you trying to do?

I am trying to perform ETL operations inside a node. To do this, I need access to database credentials.

Workaround

As a workaround, I can dynamically add a SQLAlchemy engine to the DataCatalog in the register_data_catalog hook, but I don't think DataCatalog should be used this way.

        catalog.add_feed_dict(
            {"mssql_engine": MemoryDataSet(self._mssql_engine, copy_mode='assign')}
        )

@limdauto
Contributor

Hi @mnowotnik, thank you for your question. Could you explain a bit more why your ETL operations, which interface with a DB, can't use a dataset?

@bensdm

bensdm commented Oct 28, 2020

I'm interested in the answer too. What if I need credentials to use, let's say, a Google API inside a node? How can I pass them as a param?

@mnowotnik
Author

mnowotnik commented Nov 5, 2020

Hi @mnowotnik, thank you for your question. Could you explain a bit more why your ETL operations, which interface with a DB, can't use a dataset?

Thanks for taking an interest in my concern, @limdauto.
I need, among other things, to run a procedure in a MySQL database to prepare data before it can be loaded as a dataset. Since Kedro does not shy away from both read and write operations on external sources, I assume my use case is not entirely outside Kedro's intended scope.

Moreover, I want to execute this operation specifically in the scope of a node, as opposed to, e.g., the _load method of a custom dataset implementation, so that I can leverage the hook mechanism for easy task tracking.

@atgmello

atgmello commented Jan 3, 2021

I have a similar use case to the one mentioned by @bensdm. I also could not find a built-in way of getting this data from credentials.yml. For now I simply wrap my credential loading in a node and pass the resulting dict as an input wherever it's needed. Something along the lines of:

# credentials.yml
# (...)
app:
    client_id: abc
    client_secret: xyz

# nodes.py
import yaml
# (...)
def get_app_credentials():
    # Read only the "app" entry; note that the path is hard-coded.
    with open("./conf/local/credentials.yml") as cred:
        cred_dict = yaml.safe_load(cred).get("app")
    return cred_dict

def authenticate_user(credentials):
    # (...)
    return something

# pipeline.py
from kedro.pipeline import Pipeline, node
from .nodes import authenticate_user, get_app_credentials
# (...)
def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(
                func=get_app_credentials,
                inputs=None,
                outputs="credentials",
            ),
            node(
                func=authenticate_user,
                inputs="credentials",
                outputs="something",
            ),
            # (...)
        ]
    )

Any thoughts or comments on whether this is an appropriate work-around or not are very much welcome!

@javierabosch2

javierabosch2 commented Feb 2, 2021

I believe the proper way to do this is to implement a custom DataSet or extend an existing one, such as APIDataSet, but I agree with the OP that there are other use cases, like the one @bensdm mentioned above, or managing a session. Not to mention, it is just more convenient.

You could also load the credentials inside ProjectHooks and then set them as environment variables.
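
A minimal sketch of the environment-variable idea, assuming the credentials have already been loaded into a nested dict (for example, inside a hook). The helper names `export_credentials` and `read_app_credentials`, and the `APP_CLIENT_ID`-style naming scheme, are hypothetical and not part of Kedro:

```python
import os
from typing import Any, Dict, MutableMapping, Optional


def export_credentials(
    credentials: Dict[str, Any],
    env: Optional[MutableMapping[str, str]] = None,
    prefix: str = "",
) -> None:
    """Flatten a nested credentials dict into upper-case environment variables.

    Hypothetical helper: {"app": {"client_id": "abc"}} becomes APP_CLIENT_ID=abc.
    """
    env = os.environ if env is None else env
    for key, value in credentials.items():
        name = f"{prefix}{key}".upper()
        if isinstance(value, dict):
            # Recurse into nested entries, extending the variable-name prefix.
            export_credentials(value, env, prefix=f"{name}_")
        else:
            env[name] = str(value)


def read_app_credentials(env: Optional[MutableMapping[str, str]] = None) -> Dict[str, str]:
    """Read the exported values back, e.g. from inside a node."""
    env = os.environ if env is None else env
    return {
        "client_id": env["APP_CLIENT_ID"],
        "client_secret": env["APP_CLIENT_SECRET"],
    }
```

Passing `env` explicitly keeps the helpers easy to test; in a real project you would leave it out so that `os.environ` is used.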

@limdauto
Contributor

limdauto commented Feb 3, 2021

@bensdm

To run a procedure in a MySQL database to prepare data before it can be loaded as a dataset

I think the idiomatic way is to have a node called prepare_data and execute the procedure in MySQL through a dataset. The node can return a status code as its output for task-tracking purposes, or some custom string, such as the name of the output table that you want to use in the next node.

Having said that, if you still want to access credentials from credentials.yml in a node, you would generally need a ConfigLoader instance, unless you want to hard-code the path to the credentials file as in @atgmello's workaround. In Kedro 0.17, you can retrieve the current Kedro session with get_current_session, load the context from it, and read the credentials either directly or through the context's config_loader:

from kedro.framework.session import get_current_session
session = get_current_session()
context = session.load_context()
credentials = context._get_config_credentials()

# or credentials = context.config_loader.get("credentials*", "credentials*/**", "**/credentials*")

But with great power comes great responsibility here. Coupling your node to the global session is something you should do only sparingly.

A last workaround: instead of using credentials, you can use parameters. For example:

$ kedro run --params api_token=<my-api-token>

You then get access to that value in the node through the params:api_token input. Alternatively, you can inject this value into parameters.yml through an environment variable. I have written a tutorial on injecting environment variables into your configuration here: https://kedrozerotohero.com/programming-patterns/how-to-inject-secrets-into-your-kedro-configuration
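
To illustrate the general idea of environment-variable injection, here is a rough sketch. It is not necessarily the approach from the linked tutorial and it is not a Kedro API: `inject_env` is a hypothetical helper that fills `${VAR}` placeholders in an already-loaded parameters dict, and `call_api` stands in for a node that would receive the value through the params:api_token input. The `Bearer` header format is also an assumption.

```python
import os
from string import Template
from typing import Any, Dict, Mapping, Optional


def inject_env(config: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} placeholders in string values (hypothetical helper).

    Raises KeyError if a referenced variable is not set. Strings containing a
    literal "$" that is not a placeholder must escape it as "$$".
    """
    env = os.environ if env is None else env
    if isinstance(config, dict):
        return {key: inject_env(value, env) for key, value in config.items()}
    if isinstance(config, list):
        return [inject_env(item, env) for item in config]
    if isinstance(config, str):
        return Template(config).substitute(env)
    return config  # numbers, booleans, None are left untouched


def call_api(api_token: str) -> Dict[str, str]:
    """Node-style function: builds request headers from the injected token."""
    return {"Authorization": f"Bearer {api_token}"}
```

The node itself stays unaware of where the token came from, which keeps it decoupled from the session and the configuration loader.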

Hope this helps!

@limdauto
Contributor

limdauto commented Feb 3, 2021

Since there are a number of alternatives to accomplish what you are after, I will close this issue but please feel free to re-open it if you need more support.

@sami-sweng

sami-sweng commented Feb 7, 2024

A generic solution that adapts to all types of config is as follows:

Create a dataset of type CredentialsDataset, under src/<project>/datasets/credentials.py.

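from typing import Any, Dict, Optional
from kedro.io import AbstractDataset

class CredentialsDataset(AbstractDataset):
    def __init__(self, credentials: Optional[Dict[str, Any]] = None):
        # The catalog's `credentials` key is resolved to the matching entry
        # before it is passed in here.
        self._credentials = credentials

    def _load(self) -> Optional[Dict[str, Any]]:
        return self._credentials

    def _save(self, data: Any) -> None:
        # Credentials are meant to be read-only; saving only prints a message.
        print("save")

    def _describe(self) -> Dict[str, Any]:
        # Caution: the description may show up wherever the dataset is printed
        # or logged, so consider masking the values.
        return dict(credentials=self._credentials)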

Create the empty file src/<project>/datasets/__init__.py if it does not exist.

In the catalog declare an entry with the credential you would like to load:

my_cred_input:
  type: <project>.datasets.credentials.CredentialsDataset
  credentials: test_creds # name of the credential entry

Then your node can use my_cred_input as input and get the credential.
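
As a sketch of the consuming side, the node below receives the dict loaded from my_cred_input. It assumes the credential entry holds `client_id` and `client_secret` keys, as in the `app` example earlier in this thread, and that the node needs an HTTP Basic authorization header; both are assumptions for illustration only.

```python
import base64
from typing import Any, Dict


def authenticate_user(credentials: Dict[str, Any]) -> Dict[str, str]:
    """Build an HTTP Basic auth header from a credentials dict (illustrative node)."""
    # Fail early with a clear message if the catalog entry is incomplete.
    missing = {"client_id", "client_secret"} - set(credentials)
    if missing:
        raise KeyError(f"Missing credential keys: {sorted(missing)}")

    raw = f"{credentials['client_id']}:{credentials['client_secret']}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because the node only sees a plain dict, it can be unit-tested without a catalog or a session.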
