Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate further inconsistencies between the CLI and Kedro API #2247

Closed
AhdraMeraliQB opened this issue Jan 23, 2023 · 6 comments
Closed
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Design: Research roadmap

Comments

@AhdraMeraliQB
Copy link
Contributor

Description

#2014 Explored inconsistencies within the CLI itself, but it stands that the CLI is tightly coupled with the API. This ticket looks at investigating any inconsistencies in this wider context with the aim of streamlining our user experience. For example:

  • Parameters in KedroSession.run() vs CLI kedro run
  • Parameter names in config.yml supplied to kedro run vs CLI kedro run flags

This list is far from exhaustive, but serves to highlight areas of consideration when conducting this investigation.

@AhdraMeraliQB AhdraMeraliQB changed the title Investigate inconsistencies between the CLI and Kedro API Investigate further inconsistencies between the CLI and Kedro API Jan 23, 2023
@merelcht merelcht added Component: CLI Issue/PR that addresses the CLI for Kedro Design: Research labels Jan 26, 2023
@astrojuanlu
Copy link
Member

Potentially useful: https://clig.dev/

@antonymilne
Copy link
Contributor

Copied from #2301 (comment):

Fine to keep node_names for now in #2245 but as part of #2247 I think we should consider just calling this nodes. (Just while I remember, Ivan's argument for why we might not want to do this: it's deliberately written as node_names to deter people from trying to pass node objects themselves to the underlying session.run. My argument for it is consistency and simplicity, particularly when we eventually return to #1423)

@antonymilne
Copy link
Contributor

antonymilne commented Feb 14, 2023

From discussion with @jmholzer and @noklam in #2306: there's inconsistent ordering between our click definition of run and the Python KedroSession.run API. This isn't a huge problem but would be nice to have them the same.

Proposed solution:

  1. Decide what the correct order of arguments is in click definition of run (which determines order flags appear in kedro run --help). Maybe this is already correct
  2. Re-order arguments in KedroSession.run to match
  3. Make KedroSession.run arguments keyword-only (i.e. def run (self, *, ...)) so we can alter the order or insert new arguments wherever we see fit in the future without it being a breaking change

I doubt anyone calls KedroSession.run without using keywords at the moment, but points 2 and 3 above are breaking changes so should only be done in 0.19. It would make sense to do any renaming of arguments at the same time as this in 0.19 anyway.

Probably the same applies to pipeline.filter and that should become keyword-only and match also.

wdyt @jmholzer @noklam?

@noklam
Copy link
Contributor

noklam commented Feb 16, 2023

I like the keyword-only idea and it should make things more flexible, if we shuffle the order in the future it won't be a breaking change anymore, is it correct?

On 1. we may leverage the HEAP statistic to arrange them according to usage, but the same group of arguments shoul go together, i.e. from-inputs, to-outputs. I expect the top arguments would be --pipeline, --env, --params,--tags

I agree that 2/3 should be done in 0.19, I don't know if people are using KedroSession.run without the keywords, it's still possible since the most common argument pipeline_name is the first argument. I would stay on the safe side and do it in 0.19

@antonymilne
Copy link
Contributor

I like the keyword-only idea and it should make things more flexible, if we shuffle the order in the future it won't be a breaking change anymore, is it correct?

Exactly. We should do the same for kedro-datasets at some point, since those are called (almost?) always with keyword arguments anyway, and it means we can reorder arguments and add new ones it however we like in the future. At the moment they're a bit of a mess.

On 1. we may leverage the HEAP statistic to arrange them according to usage, but the same group of arguments shoul go together, i.e. from-inputs, to-outputs. I expect the top arguments would be --pipeline, --env, --params,--tags`

Looking at the HEAP statistics is a great idea here although maybe there's the question about correlation vs. causation: what if some arguments are already used more frequently more frequently because of the position they appear in kedro run --help 🤔 I doubt that's really the case though, so fully agree about re-ordering them according to how popular they are and grouping together similar arguments.

I agree that 2/3 should be done in 0.19, I don't know if people are using KedroSession.run without the keywords, it's still possible since the most common argument pipeline_name is the first argument. I would stay on the safe side and do it in 0.19

👍

@merelcht merelcht added this to Roadmap Mar 28, 2024
@merelcht merelcht moved this to Future in Roadmap Mar 28, 2024
@merelcht merelcht moved this from Future to Near term in Roadmap Mar 28, 2024
@merelcht merelcht moved this from Near term to Backburner in Roadmap Jul 1, 2024
@astrojuanlu
Copy link
Member

It's the 2 year anniversary of this issue and this is always the least important thing to do. I think it's easier to just close this issue and whenever we spot an inconsistency, flag it individually.

@astrojuanlu astrojuanlu closed this as not planned Won't fix, can't repro, duplicate, stale Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Design: Research roadmap
Projects
No open projects
Status: Backburner
Development

No branches or pull requests

5 participants