Kedro-viz needs too much to run properly #1159

foxale · 2022-11-02T20:45:52Z

Description

So according to this #1125 kedro-viz requires

all plugins to be working (incl. mlflow connection to remote tracking server if leveraged)
all python packages to be installed

For me it just doesn't make sense. Why would I need to have the python env ready to visualize project flow? The original issue seems dead, but kedro-viz is still in some way a core kedro feature, so..

deepyaman · 2022-11-03T01:05:20Z

+1 that this is annoying. I remember in the past, when had nodes with hard-to-install dependencies (e.g. a node that called R code using rpy2, because epidemiologists don't write their libraries in Python), would comment out imports in order to get kedro-viz working locally.

What I recall:

kedro-viz is going to trawl through your pipeline/node definitions, so if you have imports at the top of files, they need to be resolvable. You could move the imports into nodes, but that's ugly.
Re kedro-mlflow, I wonder when (in the project lifecycle) is it making a call to set up the connection? Maybe it can be delayed as much as possible. I don't think kedro-viz can entirely ignore hooks, because some could modify the DAG, but at the same time just loading the kedro-mlflow plugin maybe shouldn't create a connection. Would need to look further into how this works.

merelcht · 2022-11-08T14:21:46Z

Transferred this issue to the Kedro-Viz repo, because it's not framework related.

Galileo-Galilei · 2022-11-08T21:00:48Z

Hi @foxale, Unfortunately I don't have a solution to your problem better than "turn off the mlflow hook".

I feel a bit responsible about this issue, both because I am the maintainer of the kedro-mlflow plugin, and because the root cause of this is due to a bunch of issues and suggestions I made to the core framework. I will try to explain what's going on here and the rationale behind this behaviour.

In these issues: kedro-org/kedro#506 and kedro-org/kedro#1431, there was a huge discussion about enabling hooks to access the config_loader. The pros and cons are discussed in the issues and I won't sum everything up here, but the key idea is that many plugins and hooks need to access proprietary config files or credentials.

After these discussion we decided to introduce an after_context_created hook in kedro-org/kedro#1434, and this is described in kedro-org/kedro#1458. This hook exposes the entire context to solve a lot of hook/plugins authors complaints (see the different discussions to understnad the inner details). One of the main argument was the need to be able to instantiate an external connection "once for all" at the beginning of the pipeline (e.g. for spark or mlflow).

The main advantage of creating connection after context creation is that when you are using the session interactively (in a script of a jupyter notebook), this is done automatically: you don't have to setup manually all connections.

On the other hand, the biggest drawback is that this connection is always instantiated, including when calling kedro viz (which loads a session). This was identified and discussed in this comment : kedro-org/kedro#506 (comment). @AntonyMilneQB and I concluded that this is a very uncommon scenario (it means that you run kedro-mlflow on a machine which cannot instantiate the mlflow connection: by definition, this can't be the machine where you launch kedro run because if you installed kedro-mlflow it was exactly because you wanted to have an mlflow connection. Since kedro-viz is a package that is more likely to be used in a development rather a production environment and considering the advantages, we did not investigate further.

Maybe @noklam or @AntonyMilneQB of the core team can add furhter help, and eventually suggest a solution ?

noklam · 2022-11-09T16:48:22Z

all plugins to be working (incl. mlflow connection to remote tracking server if leveraged)
all python packages to be installed

@foxale Thank you for your question. The answer to that is pretty much covered. tl;dr

Viz needs to load up the pipeline/nodes object, so the files would be loaded and the dependencies need to be installed. I don't see any way to get around it since you need to execute the code to get the Python representation of the pipeline.
In most case - plugins aren't relevant, but as @deepyaman mentioned, the hook can actually change the DAG, so viz need to execute that. Otherwise, we can skip all the hooks and it would avoid the mlflow connection problem.

foxale · 2022-11-09T23:27:52Z

Well, it still looks like a fixable problem on the kedro-viz side. One solution would be to add a parameter to kedro viz that would skip all import errors and after_catalog_created hooks and render the "raw" DAG.

tynandebold · 2022-11-10T12:24:09Z

@foxale would you feel comfortable opening a PR that tries to solve this problem? We'd certainly appreciate it, as we welcome any and all contributions from the community!

noklam · 2022-11-10T14:14:39Z

This is certainly doable if the team thinks we should do this. It's not the most elegant solution you can find, but I think simply

do something like

if SOME_FLAG:
    session._hook_manager = _NullPluginManager()
# Before this line
context = session.load_context()

This will remove all the hooks and should avoid this problem.

kedro-viz/package/kedro_viz/integrations/kedro/data_loader.py

Lines 72 to 91 in cc11edb

    
           with KedroSession.create( 
        
               project_path=project_path, 
        
               env=env, 
        
               save_on_close=False, 
        
               extra_params=extra_params, 
        
           ) as session: 
        
               context = session.load_context() 
        
               session_store = session._store 
        
               session_store_location = None 
        
               if isinstance(session_store, SQLiteStore): 
        
                   session_store_location = session_store.location 
        
               catalog = context.catalog 
        
               # Pipelines is a lazy dict-like object, so we force it to populate here 
        
               # in case user doesn't have an active session down the line when it's first accessed. 
        
               # Useful for users who have `get_current_session` in their `register_pipelines()`. 
        
               pipelines_dict = dict(pipelines) 
        
           return catalog, pipelines_dict, session_store_location

astrojuanlu · 2023-08-16T16:49:48Z

Potential solution for this: #1459 (comment)

rashidakanchwala · 2024-03-11T13:20:58Z

Reopening this issue, as there's some issues around working with Spark that --ignore--plugins does no resolve

ravi-kumar-pilla · 2024-08-23T21:24:20Z

Hi @foxale , We implemented a possible solution here which mocks missing dependencies. This will be merged in the upcoming Kedro-Viz release.

I know it is a long overdue ticket, but it is finally ready :D . I hope this will resolve the dependency problems. Try running kedro viz --lite next time. Cheers !

ravi-kumar-pilla · 2024-10-14T17:59:53Z

Hi @foxale , Could you please let us know if you have used kedro viz --lite and was it helpful. Thank you

rashidakanchwala · 2024-10-23T15:56:24Z

Closing this issue as we have some solutions to run kedro-viz easily such as kedro viz --lite

foxale changed the title ~~<Title>~~ Kedro-viz needs too much to run properly Nov 2, 2022

merelcht transferred this issue from kedro-org/kedro Nov 8, 2022

tynandebold mentioned this issue Nov 9, 2022

Why kedro-viz needs connection to mlflow to open? #1125

Closed

tynandebold added the Enhancement label Feb 13, 2023

Galileo-Galilei mentioned this issue Jun 14, 2023

[KED-2143] Adding a ConfigLoader instance into hook specs params kedro-org/kedro#506

Closed

noklam mentioned this issue Jul 31, 2023

perf(datasets): lazily load datasets in init files kedro-org/kedro-plugins#277

Merged

4 tasks

tynandebold added the Type: Needs Investigation label Jul 31, 2023

ravi-kumar-pilla self-assigned this Sep 11, 2023

tynandebold added the Community label Sep 18, 2023

ravi-kumar-pilla mentioned this issue Sep 29, 2023

Skip optional plugins while running Kedro Viz #1544

Merged

5 tasks

tynandebold closed this as completed in #1544 Oct 16, 2023

rashidakanchwala added this to the Lightweight Kedro-Viz milestone Feb 29, 2024

rashidakanchwala reopened this Feb 29, 2024

NeroOkwa mentioned this issue Mar 4, 2024

Investigate the minimum requirements for Kedro-Viz to run #1783

Closed

ravi-kumar-pilla mentioned this issue May 9, 2024

Introduce --include-hooks option and remove --ignore-plugins in Kedro-Viz #1818

Merged

5 tasks

rashidakanchwala closed this as completed Oct 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kedro-viz needs too much to run properly #1159

Kedro-viz needs too much to run properly #1159

foxale commented Nov 2, 2022

deepyaman commented Nov 3, 2022

merelcht commented Nov 8, 2022

Galileo-Galilei commented Nov 8, 2022

noklam commented Nov 9, 2022 •

edited

Loading

foxale commented Nov 9, 2022 •

edited

Loading

tynandebold commented Nov 10, 2022

noklam commented Nov 10, 2022

astrojuanlu commented Aug 16, 2023

rashidakanchwala commented Mar 11, 2024

ravi-kumar-pilla commented Aug 23, 2024

ravi-kumar-pilla commented Oct 14, 2024

rashidakanchwala commented Oct 23, 2024

Kedro-viz needs too much to run properly #1159

Kedro-viz needs too much to run properly #1159

Comments

foxale commented Nov 2, 2022

Description

deepyaman commented Nov 3, 2022

merelcht commented Nov 8, 2022

Galileo-Galilei commented Nov 8, 2022

noklam commented Nov 9, 2022 • edited Loading

foxale commented Nov 9, 2022 • edited Loading

tynandebold commented Nov 10, 2022

noklam commented Nov 10, 2022

astrojuanlu commented Aug 16, 2023

rashidakanchwala commented Mar 11, 2024

ravi-kumar-pilla commented Aug 23, 2024

ravi-kumar-pilla commented Oct 14, 2024

rashidakanchwala commented Oct 23, 2024

noklam commented Nov 9, 2022 •

edited

Loading

foxale commented Nov 9, 2022 •

edited

Loading