Kedro-mlflow with kedro 0.19.11 produces multiple runs #624

gozderam · 2025-02-06T10:03:39Z

Description

After upgrading kedro to 0.19.11, kedro-mlflow started producing multiple runs in MLFlow.

Context

Running kedro run creates additional runs in MLFlow, although only one is expected. Downgrading kedro back to 0.19.10 makes the problem disappear.

Steps to Reproduce

Create a new kedro project with kedro==0.19.11, for example the spaceflights starter project.
Add the kedro-mlflow package and an mlflow.yml file.
In the catalog, make one of the datasets an MLFlow artifact, e.g. for spaceflights:

preprocessed_companies:
  type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
  dataset:
    type: pandas.ParquetDataset
    filepath: data/02_intermediate/preprocessed_companies.parquet

Run kedro run.

Expected Result

One run is produced in MLFlow.

Actual Result

MLFlow produces two runs, one with "default" name and the other one with some random name.

Your Environment

kedro and kedro-mlflow version used (pip show kedro and pip show kedro-mlflow):
kedro: 0.19.11 (this problem does not occur in 0.19.10!)
kedro-mlflow: 0.14.0
Python version used (python -V):
3.11
Operating system and version:
macOS Sonoma 14.4

Does the bug also happen with the last version on master?

yes


Galileo-Galilei · 2025-02-06T22:39:36Z

Thanks for the bug report.

The problem does indeed come from kedro==0.19.11; I'll have a look this weekend. Several other people have complained about this on Slack. In the meantime, you should probably downgrade kedro. I'm tagging @ankatiyar and @astrojuanlu for reference, because this is very likely an unexpected behaviour of kedro itself.

astrojuanlu · 2025-02-07T08:28:34Z

Thanks both, we'll look into this. Unless kedro-mlflow was using a private API, this regression shouldn't have happened.

ankatiyar · 2025-02-07T18:33:19Z

I've been able to narrow down which change caused the regression and it's kedro-org/kedro#4353 (cc @merelcht)

Galileo-Galilei · 2025-02-08T14:11:03Z

I can reproduce the bug, even without the mlflow.yml and artifacts. Investigations are ongoing here: https://github.com/Galileo-Galilei/kedro_mlflow_624

pip install uv
uv tool install kedro
uvx kedro new -s spaceflights-pandas -n spaceflights-pandas --telemetry no
cd spaceflights-pandas
kedro run 
kedro mlflow ui

Galileo-Galilei · 2025-02-08T17:22:32Z

@ankatiyar @astrojuanlu: I can confirm the bug appears even with a very simple hook that just starts mlflow (see the repo above for code and results). My best guess so far is that the before_pipeline_run hook is triggered in a different thread from the one the nodes run in (even with the sequential runner). Since mlflow is now thread-safe, a run started in one thread cannot be accessed from the other thread (and I cannot even close it from within the hook, so the run stays in a running state indefinitely!)
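The suspected mechanism can be illustrated without mlflow at all. The following is a minimal sketch, assuming the tracking library keeps its "active run" in thread-local storage and starts a new run when a logging call finds none (this is inferred from the discussion in this thread, not copied from mlflow's internals). `FakeTracker` and its methods are hypothetical names used only for illustration.

```python
import threading
import uuid


class FakeTracker:
    """Toy stand-in for a tracking library with a thread-local active run."""

    def __init__(self):
        self._local = threading.local()  # each thread sees its own attributes
        self._lock = threading.Lock()
        self.created_runs = []           # every run ever created, across threads

    def start_run(self):
        run_id = uuid.uuid4().hex
        with self._lock:
            self.created_runs.append(run_id)
        self._local.run_id = run_id
        return run_id

    def active_run(self):
        return getattr(self._local, "run_id", None)

    def log_param(self, key, value):
        # If the calling thread has no active run, one is created implicitly.
        if self.active_run() is None:
            self.start_run()
        return self.active_run()


tracker = FakeTracker()

# Plays the role of before_pipeline_run: executed in the main thread.
main_run = tracker.start_run()

# Plays the role of before_node_run: executed in a worker thread,
# as happens when the runner delegates nodes to an executor pool.
seen = {}


def node_hook():
    seen["run"] = tracker.log_param("alpha", 0.1)


worker = threading.Thread(target=node_hook)
worker.start()
worker.join()

print(len(tracker.created_runs))  # 2: the worker could not see the main thread's run
print(seen["run"] == main_run)    # False
```

The second run appears only because the logging call happens in a thread that never saw the first run, which matches the "default" run plus randomly named run reported above.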

merelcht · 2025-02-10T10:30:33Z

I will work on fixing this.

merelcht · 2025-02-10T14:53:43Z

@Galileo-Galilei just to double check that what I've observed is correct: the behaviour described here with the extra run being created was already always the case when using ThreadRunner and ParallelRunner?

Galileo-Galilei · 2025-02-10T14:55:45Z

Not with ThreadRunner

merelcht · 2025-02-13T11:29:36Z

Not with ThreadRunner

I've tried all versions back to 0.19.7 and it always creates two runs for ThreadRunner and ParallelRunner.

The issue here is that after my refactoring in kedro-org/kedro#4353, SequentialRunner is more similar to the other runners and we open an executor pool to manage the execution of the tasks. I dug into how mlflow and kedro-mlflow work and, from what I also remember from working with mlflow earlier, mlflow creates its own runs when needed. Run creation seems to be tied to whichever thread is running. Previously, the run created in the before_pipeline_run hook was the same one used in before_node_run for logging the parameters; now a new one is created, because by that point the executor pool has been created.

I fiddled around a bit and managed to get this working by saving the active run ID and passing it on at the point where the parameters are logged, by calling MlflowClient().log_param(self.active_run, name, value) directly (self.active_run is something I added). Before opening a PR I wanted to check with you @Galileo-Galilei if this sounds like a reasonable approach to you? Tests will need to be adjusted.
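The approach described above (save the run ID once, then log against it explicitly) could look roughly like this. It is a sketch under assumptions: the hook method signatures are simplified and do not match Kedro's real hook specs, and `ExplicitRunParamLogger` and `RecordingClient` are hypothetical names. The only real API relied on is `log_param(run_id, key, value)`, which `mlflow.tracking.MlflowClient` provides; here a recording stand-in is used so the example runs without a tracking server.

```python
import threading


class ExplicitRunParamLogger:
    """Logs parameters against a stored run ID instead of the thread's active run."""

    def __init__(self, client):
        # `client` must expose log_param(run_id, key, value),
        # as mlflow.tracking.MlflowClient does.
        self.client = client
        self.active_run = None  # run ID saved when the pipeline starts

    def before_pipeline_run(self, run_id):
        # Simplified signature: the real hook would obtain the ID from the started run.
        self.active_run = run_id

    def before_node_run(self, params):
        # The run ID is passed explicitly, so the calling thread does not matter.
        for name, value in params.items():
            self.client.log_param(self.active_run, name, value)


class RecordingClient:
    """Hypothetical stand-in for MlflowClient that records calls."""

    def __init__(self):
        self.calls = []

    def log_param(self, run_id, key, value):
        self.calls.append((run_id, key, value))


client = RecordingClient()
hook = ExplicitRunParamLogger(client)
hook.before_pipeline_run(run_id="abc123")  # main thread

# The node hook fires in a worker thread, yet logs to the saved run.
worker = threading.Thread(target=hook.before_node_run, args=({"test_size": 0.2},))
worker.start()
worker.join()

print(client.calls)  # [('abc123', 'test_size', 0.2)]
```

The trade-off is that every logging call has to go through the client with an explicit run ID, so any code relying on the fluent, active-run-based API would still end up in a separate run.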

Another solution would be to revert part of my earlier refactoring and not create an executor pool for SequentialRunner, but then the issue with the two runs would still remain for the ThreadRunner.

Galileo-Galilei · 2025-02-13T19:36:15Z

Hum, weird. Let me look at it this weekend and I'll get back to you so we can decide what's best once we fully understand the issue.

astrojuanlu · 2025-02-17T15:44:21Z

xref kedro-org/kedro#4486

… to ensure all the tracking is done within the same run (#623, #624)

…ode (#623, #624, #638)
* Make mlflow not thread safe by reopening the same run before each run to ensure all the tracking is done within the same run (#623, #624)
* fix test
* fix test with thread runner
* no cover for exception catching
* 📝 Fix broken link

Galileo-Galilei · 2025-02-17T22:17:51Z

@merelcht @astrojuanlu I finally fixed it on kedro-mlflow's side, because it's mostly due to the recent changes that made mlflow thread-safe. Before mlflow 2.18, this would not have been a problem.
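The idea named in the linked PR title, reopening the same run before each node, can be sketched as follows. This is an illustration only and not the code merged in #638. The hook signatures are simplified, and `ReopenRunHook` and `FakeTracking` are hypothetical names. `FakeTracking` imitates just the two calls the hook needs from the `mlflow` module, `active_run()` and `start_run(run_id=...)`, with a thread-local active run assumed.

```python
import threading
from types import SimpleNamespace


class ReopenRunHook:
    """Re-attaches the current thread to the pipeline's run before each node."""

    def __init__(self, tracking):
        # `tracking` is duck-typed on the mlflow module.
        self.tracking = tracking
        self.run_id = None

    def before_pipeline_run(self):
        self.run_id = self.tracking.start_run().info.run_id

    def before_node_run(self):
        # In a fresh thread nothing is active, so resume the stored run
        # instead of letting a logging call create a new one.
        if self.tracking.active_run() is None:
            self.tracking.start_run(run_id=self.run_id)


class FakeTracking:
    """Stand-in for the mlflow module with a thread-local active run."""

    def __init__(self):
        self._local = threading.local()
        self.run_ids = set()
        self._counter = 0

    def active_run(self):
        return getattr(self._local, "run", None)

    def start_run(self, run_id=None):
        if run_id is None:  # no ID given: create a brand-new run
            self._counter += 1
            run_id = f"run-{self._counter}"
        self.run_ids.add(run_id)
        self._local.run = SimpleNamespace(info=SimpleNamespace(run_id=run_id))
        return self._local.run


tracking = FakeTracking()
hook = ReopenRunHook(tracking)
hook.before_pipeline_run()  # main thread

observed = []


def run_node():
    hook.before_node_run()  # worker thread
    observed.append(tracking.active_run().info.run_id)


worker = threading.Thread(target=run_node)
worker.start()
worker.join()

print(observed == [hook.run_id])  # True
print(tracking.run_ids)           # {'run-1'}
```

With the real library, `tracking` would be the `mlflow` module itself, whose `start_run` accepts a `run_id` argument to resume an existing run.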

That said, I think it still needs some work on kedro's side:

It's really confusing that the "SequentialRunner" uses 2 threads and not one. It launches a new thread to run the nodes, which I find very dangerous and hard to debug. The MatplotlibWriter issue is a good warning of things that will happen in the future. I think this should be fixed on kedro's side.
That's another problem, but I've found that you can't use ParallelRunner with MlflowArtifactDataset; you get the following error:

AttributeError: The following datasets cannot be used with multiprocessing: ['model_input_table']
In order to utilize multiprocessing you need to make sure all datasets are serialisable, i.e. datasets should not make use of lambda functions, nested functions, closures etc.
If you are using custom decorators ensure they are correctly decorated using functools.wraps().
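To make the serialisability requirement in that message concrete, here is a small sketch using the standard `pickle` module. It is an assumption that this approximates what the runner checks; Kedro's own validation may differ, and nothing here explains why MlflowArtifactDataset specifically fails. `is_picklable` and the two example objects are hypothetical.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialised with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Plain data: only strings, so it can cross a process boundary.
plain_config = {
    "type": "pandas.ParquetDataset",
    "filepath": "data/02_intermediate/preprocessed_companies.parquet",
}

# Same data plus a lambda: lambdas cannot be pickled by reference.
config_with_lambda = {**plain_config, "transform": lambda df: df}

print(is_picklable(plain_config))        # True
print(is_picklable(config_with_lambda))  # False
```

Running a check like this on a dataset object before switching to ParallelRunner is one way to find out which attribute prevents multiprocessing.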

merelcht · 2025-02-18T09:30:08Z

Thanks @Galileo-Galilei, that makes sense. I agree it's confusing SequentialRunner is using 2 threads now, we briefly discussed in backlog grooming yesterday how that could be changed. We'll make sure to prioritise this asap!

github-project-automation bot added this to kedro-mlflow roadmap Feb 6, 2025

github-project-automation bot moved this to 🆕 New in kedro-mlflow roadmap Feb 6, 2025

Galileo-Galilei moved this from 🆕 New to 🔖 Ready in kedro-mlflow roadmap Feb 6, 2025

Galileo-Galilei added the bug Something isn't working label Feb 6, 2025

Galileo-Galilei mentioned this issue Feb 6, 2025

Parent run not more present in i.e. after_node_run #623

Closed

Galileo-Galilei mentioned this issue Feb 9, 2025

MlflowArtifactDataset - loading from a specific run_id after the run_id was enforced for all MlflowArtifactDatasets #622

Closed

merelcht mentioned this issue Feb 10, 2025

Fix regression in Kedro 0.19.11 causing multiple runs in kedro-mlflow kedro-org/kedro#4474

Closed

This was referenced Feb 14, 2025

Ugly crash when using MatplotlibWriter in Kedro 0.19.11 kedro-org/kedro-plugins#1012

Closed

SequentialRunner might fail with non thread-safe code kedro-org/kedro#4486

Closed

Galileo-Galilei added a commit that referenced this issue Feb 17, 2025

Make mlflow not thread safe by reopening the same run before each run…

0ebc472

… to ensure all the tracking is done within the same run (#623, #624)

Galileo-Galilei linked a pull request Feb 17, 2025 that will close this issue

Make mlflow not thread safe by reopening the same run before each node #638

Merged

6 tasks

Galileo-Galilei mentioned this issue Feb 17, 2025

Make mlflow not thread safe by reopening the same run before each node #638

Merged

6 tasks

Galileo-Galilei closed this as completed in #638 Feb 17, 2025

github-project-automation bot moved this from 🔖 Ready to ✅ Done in kedro-mlflow roadmap Feb 17, 2025

merelcht mentioned this issue Feb 20, 2025

Change runner logic to not create pool for sequential runner kedro-org/kedro#4502

Merged

7 tasks
