Allow Running kedro-mlflow projects with an MLflow orchestrator #358

Closed
takikadiri opened this issue Sep 13, 2022 · 3 comments · Fixed by #359
Labels
enhancement New feature or request

Comments

@takikadiri
Collaborator

takikadiri commented Sep 13, 2022

Description

Kedro projects are mostly executed with kedro itself, and kedro-mlflow is responsible for starting a new MLflow run/session with the given configuration.
There are some scenarios where the kedro project could be executed by some sort of orchestrator, such as an MLflow project or an Airflow pipeline. These orchestrators can start an MLflow run themselves to take control of the overall session. For example:

  • An MLflow project that starts an MLflow run, in which it records the whole execution context, before running the kedro project
  • An Airflow job that starts a run, executes the kedro project, then retrieves the results from the run to register or deploy the model

Context

We want to use MLflow projects so that we can run the kedro project from a remote repository (for reproducibility) and store the Python environment alongside the fitted model (for accurate code dependencies).

This feature could also enable the integration of kedro-mlflow with other upstream tools.

Possible Implementation

Maybe we can check here whether MLflow already has an active run; if so, we can reuse it when starting the kedro-mlflow run.
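A minimal sketch of the check proposed above, written without an MLflow backend so it can run anywhere. start_or_reuse_run and its start_new_run callable are hypothetical names, not kedro-mlflow APIs: active_run_id stands for the id of an already active run (or None), and start_new_run stands for the usual "open a run from mlflow.yml" behaviour.

```python
from typing import Callable, Optional, Tuple


def start_or_reuse_run(
    active_run_id: Optional[str],
    start_new_run: Callable[[], str],
) -> Tuple[str, bool]:
    """Return (run_id, created_here) for the kedro-mlflow session.

    active_run_id: id of a run that is already active, or None.
    start_new_run: hypothetical callable standing for the usual kedro-mlflow
        behaviour (open a run from mlflow.yml); it returns the new run id.
    """
    if active_run_id is not None:
        # An upstream orchestrator (MLflow project, Airflow task, ...) already
        # opened a run: log into it and do not open a second one.
        return active_run_id, False
    # No active run: fall back to the usual behaviour and open a fresh run.
    return start_new_run(), True


# Usage with stand-in values instead of a real MLflow backend.
print(start_or_reuse_run("orchestrator-run", lambda: "new-run"))  # ('orchestrator-run', False)
print(start_or_reuse_run(None, lambda: "new-run"))                # ('new-run', True)
```

In a real hook, active_run_id would be derived from mlflow.active_run(), which returns None when no run is active and otherwise exposes the id as run.info.run_id. The created_here flag matters later, when deciding who should close the run.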

Galileo-Galilei self-assigned this Sep 13, 2022
Galileo-Galilei added the enhancement (New feature or request) label Sep 13, 2022
Galileo-Galilei added this to the 0.11.4 milestone Sep 13, 2022
@Galileo-Galilei
Owner

Galileo-Galilei commented Sep 20, 2022

In such a situation, what is the expected behaviour at the end of the pipeline? Do we expect the run to be closed? Another problem is that if mlflow is not properly configured by the orchestrator, the active run may live under a different tracking_uri than the one specified in the configuration, which raises an mlflow.exceptions.MlflowException: Run 'xxx' not found error.

The easiest way to inject this behaviour would be to pass the run id through tracking.run.id in the configuration, but that requires the orchestrator to modify the config...
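The config-based workaround can be sketched as follows, assuming the orchestrator has already loaded mlflow.yml into a nested Python dict (YAML parsing, and how the patched config reaches the kedro project, are left out). with_run_id is a hypothetical helper, not part of kedro-mlflow; only the tracking.run.id key path comes from this discussion. It makes the drawback concrete: the orchestrator has to know and rewrite the project's own configuration.

```python
import copy
from typing import Any, Dict


def with_run_id(mlflow_config: Dict[str, Any], run_id: str) -> Dict[str, Any]:
    """Return a copy of a parsed mlflow.yml dict whose tracking.run.id is run_id.

    Hypothetical helper: only the tracking -> run -> id key path comes from the
    discussion; the other keys in the example below are illustrative.
    """
    config = copy.deepcopy(mlflow_config)  # never mutate the project's own config
    tracking = config.get("tracking") or {}  # tolerate a missing or null section
    run = tracking.get("run") or {}
    run["id"] = run_id
    tracking["run"] = run
    config["tracking"] = tracking
    return config


project_config = {"tracking": {"run": {"id": None, "name": None}}}
patched = with_run_id(project_config, "orchestrator-run")
print(patched["tracking"]["run"])         # {'id': 'orchestrator-run', 'name': None}
print(project_config["tracking"]["run"])  # {'id': None, 'name': None}
```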

@Galileo-Galilei
Owner

So the final decision is:

  • if an active mlflow run exists, we ignore all configuration in mlflow.yml and use the configuration from the environment
  • the pipeline logs into this active run
  • the mlflow run is NOT closed at the end of the kedro run
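The three rules of this decision can be condensed into a small policy object. This is a hedged sketch, not the code merged in the fixing pull request: RunPolicy and decide_run_policy are hypothetical names, and "configuration from the environment" is taken to mean MLflow settings already in place, for instance the MLFLOW_TRACKING_URI environment variable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunPolicy:
    run_id: Optional[str]   # run to log into; None means "create one from mlflow.yml"
    apply_mlflow_yml: bool  # whether the mlflow.yml settings are used at all
    end_run_after: bool     # whether kedro-mlflow ends the run once the pipeline finishes


def decide_run_policy(active_run_id: Optional[str]) -> RunPolicy:
    if active_run_id is not None:
        # The entity that opened the run owns it: ignore mlflow.yml, keep the MLflow
        # settings already present in the environment, log into the active run and
        # leave it open for the owner to close.
        return RunPolicy(run_id=active_run_id, apply_mlflow_yml=False, end_run_after=False)
    # No active run: usual kedro-mlflow behaviour, which opens and closes its own run.
    return RunPolicy(run_id=None, apply_mlflow_yml=True, end_run_after=True)


print(decide_run_policy("orchestrator-run"))
# RunPolicy(run_id='orchestrator-run', apply_mlflow_yml=False, end_run_after=False)
```

In an after_pipeline_run hook, mlflow.end_run() would then only be called when end_run_after is True, so an MLflow project or Airflow task can keep using the run after the kedro pipeline returns.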

@takikadiri
Collaborator Author

That looks good to me. It makes sense to delegate the entire session to the entity that created the run in the first place.

Projects
Status: ✅ Done

2 participants