-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to blacklist params. also sanitise keys #595
Add ability to blacklist params. also sanitise keys #595
Conversation
Hi, I definitely understand why you don't want to log secrets and I am not strictly against this change, but I think the main issue at stake here is that they should not be params in the first place. Avoid logging in mlflow is likely the least of your problems : if some secrets are prams, they will be logged to your remote version control forge alongside with configuration. Even if you use it in a local folder, this means these secrets will be required in a non crypted file in production. Can you elaborate on why you need to have sensitive data in the params in the first place? I suggest you use environment variable to retrieve these sensitive data directly in the node, so you have control over what is logged. |
@Galileo-Galilei let me get back on the secrets one, I think you have a point although I think kedro works this way (I agree ENV variables injection is better and we do do that but we still pass them through the catalog into the python functions to keep them pure) On the sanitization you're OK? The error was that we had properties in a data-fabricator that were something like |
Clearly an active discussion, spotted your name again here @merelcht @Galileo-Galilei given this plugin takes all params and sends off to 3rd party service, but kedro recommends passing credentials through parameters into nodes, wdyt is best here? Maybe just use Pydantic Settings and create a It does feel like a broader "what's kedros' stance" here. I like the recommendation of not passing secrets through the catalog by Galileo but then we need to also articulate an alternative in kedro docs. |
I removed the secrets removal from the parameters as that feels like a separate and much larger problem than the one where the parameters can have special chars in the key in kedro but not on MLflow. I hope this smaller change is less controversial |
Hello @pascalwhoop definitely ok to sanitize parameters like this, but I'd like to ensure consistency with mlflow. I was pretty sure it has existed at some point in the past, not sure why it's no longer the case. This issue mlflow/mlflow#9425 refers to the code with the exact regex mlflow uses under the hood (and explains the rationale behind this) , and we should use the same one. The implementation should import the mlflow function that performs validation so we don't have to mirror the evolution of the behaviour along with mlflow. Could you update that? EDIT: The regex is available here: https://github.com/mlflow/mlflow/blob/e40e782b6fcab473159e6d4fee85bc0fc10f78fd/mlflow/utils/validation.py#L140C1-L148C44 , and we can spot a slight difference between Windows and other OS. We cannot import this function, but we should mimic it and force replacement. Optionnaly, this should be an entry in the |
done @Galileo-Galilei |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, just a couple of questions / suggestions, but I'm ok to merge it without any change if you disagree.
bootstrap_project(kedro_project) | ||
with KedroSession.create( | ||
project_path=kedro_project, | ||
) as session: | ||
context = session.load_context() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nitpick, but can we just ignore all this? Since you invoke the hook manually, I don't think it's necessary creating a whole session and contexte, unless I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, this is needed to call the after_context_created hook, which format and stores the entire configuration.
) as session: | ||
context = session.load_context() | ||
mlflow_node_hook = MlflowHook() | ||
mlflow_node_hook.after_context_created(context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure you need after_context_created since you already set mlflow_tracking_uri
up just above.
Overall really like the test👍
pyproject.toml
Outdated
@@ -1,5 +1,5 @@ | |||
[tool.pytest.ini_options] | |||
addopts = "--cov=kedro_mlflow --cov-report=html tests/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't it necessary anymore? Don't remember exactly but I wonder if this is used by codecov to upload the coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah my bad, I had removed this to be able to call pytest ..path..to.test
manually, the tests/
at the end blocked that from being possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just for the record, I guess you're searching pytest -k "regex"
to execute only some tests based on a regex)
return name | ||
else: | ||
# Replace invalid characters with underscore | ||
return re.sub(r"[^/\w.\- :]", "_", name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a logger.warning
to explain that the parameter is renamed? This might be confusing.
Almost good to go, I just don't understand why he test fails with "valid:param" -> "valid:param". I get I'll resume tomorrow, else it's good to go. |
It's fixed, it was a windows issue as usual :) I'll merge it tomorrow |
fc584c9
into
Galileo-Galilei:master
hey @Galileo-Galilei thanks, I was a bit swamped and failed to respond. Appreciate the merge |
Description
special chars need to be sanitized before sending to MLflow
I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.