Add new function to specify namespace for KedroSession and kedro run #2306
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Super cool!
Nice work ✅
Do we need a specific test for the namespace option? I don't think it's needed as it is more or less equal to other arguments and the filtering function is tested in the test_pipeline.py already.
I think the updated test_run_with_pipeline_filters makes an extra test unnecessary.
The ordering of these CLI options and of KedroSession.run is kind of arbitrary. For KedroSession it is the code API, so it makes sense to add the new option as the last argument to keep it backward compatible.
I think the ordering of click.option decorators is consistent with the order in which their passed values appear in the function signature of run, so personally I think it should be added to the end if possible.
I noticed that some arguments have inconsistent names, for example kedro run --pipeline vs KedroSession.run(pipeline_name=xxx).
I'll take a look if this is on our radar, if not I'll open a ticket.
@@ -336,6 +337,7 @@ def run( # pylint: disable=too-many-arguments,too-many-locals
used as an end point of the new ``Pipeline``.
load_versions: An optional flag to specify a particular dataset
version timestamp to load.
namespace: The namespace of the nodes that is being run.
Suggested change:
- namespace: The namespace of the nodes that is being run.
+ namespace: An optional string specifying a namespace to run. If specified, only the nodes in this namespace will be run.
Nice! @jmholzer
@@ -353,6 +355,7 @@ def activate_nbstripout(
callback=_reformat_load_versions,
)
@click.option("--pipeline", "-p", type=str, default=None, help=PIPELINE_ARG_HELP)
@click.option("--namespace", "-ns", type=str, default=None, help=NAMESPACE_ARG_HELP)
To keep the ordering of click.option decorators consistent with the arguments in the function signature, I think this should be moved to the bottom.
Do we actually follow the order? The first click option is from_input, but the first argument of the function would be tag. I think the order only matters for kedro run -h, which lists the options in the order in which the function is decorated. In general, I think it should follow the order of importance, in which case I would expect pipeline to come first.
I don't have a strong opinion about moving it to the end; the only reason I kept it next to pipeline is that I feel these two arguments are in the same category. Similarly, the node-related arguments are grouped together, and so are the config-related arguments. @merelcht or @AntonyMilneQB, is this something we've discussed in the past?
Fully agree with you here @noklam 👍 We should go for what makes sense to the user when you do kedro run --help; no need to be consistent with the order of KedroSession.run (which, as you've said, is somewhat arbitrary).
LGTM ⭐
On the questions you ask:
- Agreed with @jmholzer, tests look sufficient 👍
- Agreed with @noklam. We should aim for a good CLI UX first and consistency with the Python API second. Adding the argument to the end of KedroSession.run is correct to maintain backwards compatibility, but it doesn't need to be reflected in the click run command; as you noticed, we already don't have consistency there. (Backwards compatibility is kind of an edge case here because I doubt anyone calls KedroSession.run with positional arguments, but technically what you've done is right.)
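The backwards-compatibility point can be illustrated with a small sketch. The three functions below are hypothetical, trimmed-down stand-ins (the real KedroSession.run takes more parameters); they only show why appending namespace at the end keeps existing positional calls intact, while inserting it next to pipeline_name would silently rebind later arguments.

```python
from typing import Any, Dict


def run_before(pipeline_name=None, tags=None, runner=None) -> Dict[str, Any]:
    """Hypothetical, trimmed-down signature before the change."""
    return {"pipeline_name": pipeline_name, "tags": tags, "runner": runner}


def run_appended(pipeline_name=None, tags=None, runner=None, namespace=None) -> Dict[str, Any]:
    """New parameter added last: existing positional calls keep their meaning."""
    return {"pipeline_name": pipeline_name, "tags": tags, "runner": runner, "namespace": namespace}


def run_inserted(pipeline_name=None, namespace=None, tags=None, runner=None) -> Dict[str, Any]:
    """New parameter inserted next to pipeline_name: later positional arguments shift by one."""
    return {"pipeline_name": pipeline_name, "tags": tags, "runner": runner, "namespace": namespace}


# A positional call that was written against run_before.
old_call_args = ("data_science", ["de"])

print(run_before(*old_call_args)["tags"])    # ['de']
print(run_appended(*old_call_args)["tags"])  # ['de']
print(run_inserted(*old_call_args)["tags"])  # None: ['de'] was bound to namespace instead
```

Keyword calls such as run(pipeline_name="data_science", tags=["de"]) behave the same with any parameter order. On the CLI side, click passes option values to the command function as keyword arguments, so the order of the click.option decorators only affects how options are listed in kedro run --help, not which parameter receives which value.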
Going beyond this PR, it would be nice to eventually have consistency between the Python API and the click run command. More important than the order is the naming of the arguments. This is something I've noticed before and it's on our radar here: #2247. Note that the pipeline_name vs. pipeline question follows the same pattern as the node_name vs. node one noted there.
In fact, I now have another thought on the ordering of arguments - will add it to that issue to keep it in one place.
kedro/framework/cli/project.py
NAMESPACE_ARG_HELP = """Name of the namespace to run.
If not set, no namespace is used."""
Suggested change:
- NAMESPACE_ARG_HELP = """Name of the namespace to run.
- If not set, no namespace is used."""
+ NAMESPACE_ARG_HELP = """Name of the node namespace to run."""
IMO the "If not set, no namespace is used" part just makes this less clear, since it can't really be explained fully in one line.
Thank you @AntonyMilneQB, I will update the docs 🙂
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
@@ -562,9 +565,13 @@ def test_run_with_pipeline_filters(
):
from_nodes = ["--from-nodes", "splitting_data"]
to_nodes = ["--to-nodes", "training_model"]
tags = ["--tag", "de"]
namespace = ["--namespace", "fake_namespace"]
tags = ["--tags", "de"]
Looks like tags is being overwritten here 🤔
…un` (#2306)

* Add new namespace options for KedroSession and `kedro run`
  Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
* Change the shortcut from `--ns` to `-ns` to be consistent
  Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
* Fix lint
  Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
* Fix broken tests and update new test for namespace argument
  Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
* update docs and release notes
  Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
* Fix tests
  Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
* Fix test for incorrect merge
  Signed-off-by: Nok <nok_lam_chan@mckinsey.com>

---------

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Description
Close #2145
Currently there are ways to select different sections of a pipeline to run, but selecting by namespace is missing. We should add a way for users to run a specific sub-pipeline (namespace) from the terminal, e.g. kedro run --namespace <namespace>.
Development notes
- Modified session.py to add namespace as record data, and updated the test accordingly.
- Modified project.py to add the new -ns / --namespace option.
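To make "namespace as record data" concrete, here is a minimal sketch, assuming (as in Kedro 0.18) that the record data is the dict KedroSession.run passes to the before_pipeline_run hook as run_params. build_record_data and RecordNamespaceHook are hypothetical names, and only a few of the real keys are shown; a real Kedro hook method would also be marked with @hook_impl from kedro.framework.hooks.

```python
from typing import Any, Dict, List, Optional


def build_record_data(
    session_id: str,
    pipeline_name: Optional[str] = None,
    tags: Optional[List[str]] = None,
    namespace: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the run arguments into one dict for hooks."""
    return {
        "session_id": session_id,
        "pipeline_name": pipeline_name,
        "tags": tags,
        "namespace": namespace,  # the new entry (key name assumed to match the argument)
    }


class RecordNamespaceHook:
    """Hypothetical hook that remembers which namespace each run used."""

    def __init__(self) -> None:
        self.namespaces: List[Optional[str]] = []

    def before_pipeline_run(
        self, run_params: Dict[str, Any], pipeline: Any = None, catalog: Any = None
    ) -> None:
        # The hook only sees the record data, so the namespace must be included in it.
        self.namespaces.append(run_params.get("namespace"))


hook = RecordNamespaceHook()
hook.before_pipeline_run(run_params=build_record_data("run-1", namespace="active_modelling"))
hook.before_pipeline_run(run_params=build_record_data("run-2"))
print(hook.namespaces)
```

Without the namespace entry, a hook (for example one that logs or tracks runs) would have no way to tell a namespaced run from a full run of the same pipeline.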
How is this tested?
It was tested manually with a modified version of spaceflights. Since the current starters do not use namespaces, I followed the modular pipeline documentation for testing.
Steps
- kedro new -s spaceflights
Kedro CLI
- kedro run --namespace active_modelling should run 3 nodes.
- kedro run --namespace active_modelling --pipeline data_science should also run 3 nodes, since the namespace is only present in one pipeline.
- kedro run --namespace 1234 should fail with ValueError: Pipeline does not contain nodes with namespace '1234'
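The three CLI checks above can be mirrored in a small pure-Python sketch that does not use Kedro. Node, select_nodes and the pipeline contents are hypothetical stand-ins loosely modelled on a namespaced spaceflights project, and the matching rule (namespace equal to, or nested under, the requested one) is an assumption about the intended semantics rather than Kedro's actual code. Only the error message is taken from the output quoted above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Kedro node: a name and an optional namespace."""

    name: str
    namespace: Optional[str] = None


def in_namespace(node: Node, namespace: str) -> bool:
    """True if the node's namespace equals `namespace` or is nested under it (assumed rule)."""
    ns = node.namespace
    return ns is not None and (ns == namespace or ns.startswith(namespace + "."))


def select_nodes(
    pipelines: Dict[str, List[Node]],
    pipeline_name: Optional[str] = None,
    namespace: Optional[str] = None,
) -> List[Node]:
    """Pick a pipeline (default "__default__"), then optionally keep a single namespace."""
    nodes = pipelines[pipeline_name or "__default__"]
    if namespace is None:
        return list(nodes)
    selected = [node for node in nodes if in_namespace(node, namespace)]
    if not selected:
        # Same wording as the error observed in the manual test.
        raise ValueError(f"Pipeline does not contain nodes with namespace '{namespace}'")
    return selected


# Hypothetical project: only the data science nodes live in a namespace.
DATA_PROCESSING = [
    Node("preprocess_companies_node"),
    Node("preprocess_shuttles_node"),
    Node("create_model_input_table_node"),
]
DATA_SCIENCE = [
    Node("split_data_node", "active_modelling"),
    Node("train_model_node", "active_modelling"),
    Node("evaluate_model_node", "active_modelling"),
]
PIPELINES = {
    "data_processing": DATA_PROCESSING,
    "data_science": DATA_SCIENCE,
    "__default__": DATA_PROCESSING + DATA_SCIENCE,
}
```

With this data, select_nodes(PIPELINES, namespace="active_modelling") and select_nodes(PIPELINES, "data_science", "active_modelling") both return the same 3 nodes, because the namespace only occurs in one pipeline, while select_nodes(PIPELINES, namespace="1234") raises ValueError.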
KedroSession
Similar tests were run with kedro ipython. Run %reload_kedro to refresh the session, then:
- session.run(namespace="active_modelling")
- session.run(namespace="active_modelling", pipeline_name="data_science")
- session.run(namespace="1234") should fail with ValueError: Pipeline does not contain nodes with namespace '1234'
Questions for Reviewers
- Do we need a specific test for the namespace option? I don't think it's needed, as it is more or less equivalent to the other arguments and the filtering function is already tested in test_pipeline.py.
- The ordering of these CLI options and of KedroSession.run is kind of arbitrary. For KedroSession it is the code API, so it makes sense to add the new option as the last argument to keep it backward compatible. I also noticed that some arguments have inconsistent names, for example kedro run --pipeline vs KedroSession.run(pipeline_name=xxx).
Checklist
- Added a description of this change in the RELEASE.md file