Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update table with Kedro DataCatalog or accesss credentials programmatically #545

Closed
noklam opened this issue Oct 5, 2020 · 6 comments
Closed

Comments

@noklam
Copy link
Contributor

noklam commented Oct 5, 2020

What are you trying to do?

I want to update a certain table with Kedro, which require

  1. Delete some rows base on some condition
  2. Insert new rows

Trials:

  1. Use Kedro DataCatalog API
table:
  type: pandas.SQLQueryDataSet
  sql: delete * from table where x=1
  credentials: oracle

Failed: Kedro DataCatalog API seems does not let you just delete, it expects rows return.

  1. Use plain pd.read_sql(), I cannot find a way to access to the credentials.yml
@noklam noklam changed the title Update SQL dataset instead of create a new Update table with Kedro DataCatalog or accesss credentials programmatically Oct 5, 2020
@lorenabalan
Copy link
Contributor

Please see this Stackoverflow thread. Since pandas doesn't support it, the deletion part should be done within a node, where inputs and outputs are sql datasets.
Please consider directing this type of questions to Stackoverflow and tag them with kedro, as we monitor that as well and it's a platform better suited for this purpose.

@noklam
Copy link
Contributor Author

noklam commented Oct 6, 2020

yes, I end up overwriting ProjectContext to make credentials part of parameters, since the crendentials is not accesible outside of data catalog, not sure if this is the ideal way.

@lorenabalan
Copy link
Contributor

Thanks for getting back to us. Credentials should definitely not be exposed in the parameters as it's sensitive information. The SQLTableDataSet can handle the credentials and loading/saving so no need to access them in the node. The node can just read the table, operate on the dataframe to delete the rows, then write the resulting dataframe back to sql.

@noklam
Copy link
Contributor Author

noklam commented Oct 6, 2020

@lorenabalan
I see, so your suggestion is instead of doing a "delete" operation in database, I should read the table, delete in pandas, then just overwrite with a new table?

I think it should work for this case, but it may not be an efficient way if I am only updating a small amount of data in a table.

@lorenabalan
Copy link
Contributor

Yup exactly. Depending on the use case it may not be the optimal workflow. I believe Kedro is not very well suited for a sql-only or sql-heavy workflow. I'll leave this thread here for more inspiration: https://www.thetopsites.net/article/58169947.shtml

@lorenabalan
Copy link
Contributor

Closing as answered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants