Lightweight Kedro Viz Experimentation using AST #1966

Merged
merged 63 commits into from
Sep 3, 2024
Changes from 4 commits
Commits
0e7f24d
merge main from remote
ravi-kumar-pilla Apr 25, 2024
c1aae75
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Apr 26, 2024
177ccbc
merging remote
ravi-kumar-pilla May 1, 2024
8ecf9bf
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 2, 2024
37f3bf4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 8, 2024
499d8c4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 14, 2024
b3ab479
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 16, 2024
e295e92
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 20, 2024
905b198
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 21, 2024
490a89f
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 30, 2024
c1a099b
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 31, 2024
573e3c0
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 10, 2024
5a12c65
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 13, 2024
960c113
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 18, 2024
49c05b1
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 21, 2024
354e024
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 21, 2024
60e2f27
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 26, 2024
52c2060
partially working parser - WIP
ravi-kumar-pilla Jun 27, 2024
cfd99a7
partial working commit
ravi-kumar-pilla Jun 29, 2024
de4a4ef
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 3, 2024
7125927
testing show code
ravi-kumar-pilla Jul 3, 2024
bff5a4c
adjust file permissions
ravi-kumar-pilla Jul 3, 2024
3038afd
update comments and rename parser file
ravi-kumar-pilla Jul 3, 2024
0e91504
remove gitignore
ravi-kumar-pilla Jul 3, 2024
a4b3b1a
handle func lambda case
ravi-kumar-pilla Jul 3, 2024
0a80f6c
mocking working draft proposal
ravi-kumar-pilla Jul 12, 2024
e31242f
reuse session with mock modules
ravi-kumar-pilla Jul 15, 2024
8b8e337
wip integration tests
ravi-kumar-pilla Jul 17, 2024
8e0ae73
sporadic working needs testing
ravi-kumar-pilla Jul 18, 2024
38782e3
update sys modules with patch
ravi-kumar-pilla Jul 18, 2024
1fc1faf
fix lint and pytests
ravi-kumar-pilla Jul 18, 2024
98361e3
add dataset factories test
ravi-kumar-pilla Jul 22, 2024
e120ccc
add e2e test
ravi-kumar-pilla Jul 22, 2024
a711cf0
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 22, 2024
b7a1862
fix CI
ravi-kumar-pilla Jul 22, 2024
c5a6f2a
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 22, 2024
06e35bf
dataset factory pattern support in lite mode
ravi-kumar-pilla Jul 23, 2024
78cd413
add doc strings
ravi-kumar-pilla Jul 23, 2024
f2dda93
add e2e test and clear unused func
ravi-kumar-pilla Jul 24, 2024
bfe069f
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 24, 2024
35f1ed5
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Jul 24, 2024
1cffd8a
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Jul 25, 2024
fc8f7e4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 30, 2024
c31fbda
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Aug 9, 2024
bc4aea2
testing relative to absolute imports
ravi-kumar-pilla Aug 13, 2024
60f9cd3
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 16, 2024
8162147
testing relative imports
ravi-kumar-pilla Aug 16, 2024
840cb9f
working draft for relative imports multi-level
ravi-kumar-pilla Aug 17, 2024
76e3c2b
remove resolving relative dependencies
ravi-kumar-pilla Aug 19, 2024
2d18e9a
test
ravi-kumar-pilla Aug 19, 2024
16e1ef5
working draft
ravi-kumar-pilla Aug 19, 2024
8c6d878
modify test and standalone support for lite
ravi-kumar-pilla Aug 19, 2024
f9de2fe
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 19, 2024
db1b416
improve readability
ravi-kumar-pilla Aug 20, 2024
fe09d20
fix lint and pytest
ravi-kumar-pilla Aug 20, 2024
fefafa6
revert link redirect
ravi-kumar-pilla Aug 21, 2024
ae94f1e
remove side effects
ravi-kumar-pilla Aug 21, 2024
57ea66a
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 22, 2024
45da624
pr suggestions addressed
ravi-kumar-pilla Aug 22, 2024
bcdd304
fix dict issue
ravi-kumar-pilla Aug 22, 2024
f4cd1dd
merge main
ravi-kumar-pilla Aug 22, 2024
050bff2
moved package check under dirs and add exception block
ravi-kumar-pilla Aug 22, 2024
63b9fd3
merge main
ravi-kumar-pilla Sep 3, 2024
41 changes: 31 additions & 10 deletions package/kedro_viz/integrations/kedro/data_loader.py
@@ -9,7 +9,7 @@
import logging
import sys
from pathlib import Path
from typing import Any, Dict, Optional, Tuple
from typing import Any, Dict, Optional, Set, Tuple
from unittest.mock import patch

from kedro import __version__
@@ -80,7 +80,22 @@ def _load_data_helper(
extra_params: Optional[Dict[str, Any]] = None,
is_lite: bool = False,
):
"""Helper to load data from a Kedro project."""
"""Helper to load data from a Kedro project.

Args:
project_path: the path where the Kedro project is located.
env: the Kedro environment to load the data. If not provided,
it will use Kedro default, which is local.
include_hooks: A flag to include all registered hooks in your Kedro Project.
extra_params: Optional dictionary containing extra project parameters
for underlying KedroContext. If specified, will update (and therefore
take precedence over) the parameters retrieved from the project
configuration.
is_lite: A flag to run Kedro-Viz in lite mode.
Returns:
A tuple containing the data catalog, pipeline dictionary, session store
and dataset stats dictionary.
"""

with KedroSession.create(
project_path=project_path,
@@ -132,8 +147,8 @@ def load_data(
configuration.
is_lite: A flag to run Kedro-Viz in lite mode.
Returns:
A tuple containing the data catalog and the pipeline dictionary
and the session store.
A tuple containing the data catalog, pipeline dictionary, session store
and dataset stats dictionary.
"""
if package_name:
configure_project(package_name)
@@ -142,10 +157,19 @@ def load_data(
bootstrap_project(project_path)

if is_lite:
lite_parser = LiteParser(project_path, package_name)
mocked_modules = lite_parser.get_mocked_modules()
lite_parser = LiteParser(package_name)
unresolved_imports = lite_parser.parse(project_path)
sys_modules_patch = sys.modules.copy()

if unresolved_imports and len(unresolved_imports) > 0:
modules_to_mock: Set[str] = set()

for unresolved_module_set in unresolved_imports.values():
modules_to_mock = modules_to_mock.union(unresolved_module_set)

mocked_modules = lite_parser.create_mock_modules(modules_to_mock)
sys_modules_patch.update(mocked_modules)

if len(mocked_modules):
logger.warning(
"Kedro-Viz has mocked the following dependencies for lite-mode.\n"
"%s \n"
@@ -154,9 +178,6 @@ def load_data(
list(mocked_modules.keys()),
)

sys_modules_patch = sys.modules.copy()
sys_modules_patch.update(mocked_modules)

# Patch actual sys modules
with patch.dict("sys.modules", sys_modules_patch):
return _load_data_helper(
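The lite-mode branch of `load_data` above merges the per-file sets of unresolved imports, builds one `MagicMock` per module name, and patches `sys.modules` for the duration of the load. The following is a minimal standalone sketch of that flow, not the Kedro-Viz implementation: the file paths and the module name `hypothetical_missing_pkg_xyz` are invented for illustration and assumed not to be installed.

```python
import importlib
import sys
from typing import Dict, Set
from unittest.mock import MagicMock, patch

# Hypothetical parser output: file path -> names of modules that cannot be imported.
unresolved_imports: Dict[str, Set[str]] = {
    "/hypothetical/src/pkg/nodes.py": {"hypothetical_missing_pkg_xyz"},
    "/hypothetical/src/pkg/pipeline.py": {
        "hypothetical_missing_pkg_xyz",
        "hypothetical_missing_pkg_xyz.sub",
    },
}

# Merge all per-file sets into one set of module names to mock.
modules_to_mock: Set[str] = set().union(*unresolved_imports.values())

# One MagicMock per unresolved module name.
mocked_modules = {name: MagicMock() for name in modules_to_mock}

# Start from a copy of the real module table and add the mocks on top.
sys_modules_patch = sys.modules.copy()
sys_modules_patch.update(mocked_modules)

# Inside the context manager, importing a mocked name returns the MagicMock.
with patch.dict("sys.modules", sys_modules_patch):
    module = importlib.import_module("hypothetical_missing_pkg_xyz.sub")
    resolved_to_mock = isinstance(module, MagicMock)

# On exit, patch.dict restores sys.modules, so the mocks are gone again.
restored = "hypothetical_missing_pkg_xyz" not in sys.modules
```

Because `patch.dict` restores the original mapping on exit, the mocks only affect code that runs inside the `with` block, which is why the diff wraps the `_load_data_helper` call in it.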
194 changes: 130 additions & 64 deletions package/kedro_viz/integrations/kedro/lite_parser.py
@@ -4,7 +4,7 @@
import importlib.util
import logging
from pathlib import Path
from typing import Dict, Union
from typing import Dict, List, Set, Union
from unittest.mock import MagicMock

logger = logging.getLogger(__name__)
@@ -14,16 +14,11 @@ class LiteParser:
"""Represents a Kedro Parser which uses AST

Args:
project_path (Path): the path where the Kedro project is located.
package_name (Union[str, None]): The name of the current package
"""

def __init__(
self, project_path: Path, package_name: Union[str, None] = None
) -> None:
self._project_path = project_path
def __init__(self, package_name: Union[str, None] = None) -> None:
self._package_name = package_name
self._project_file_paths = set(self._project_path.rglob("*.py"))

@staticmethod
def _is_module_importable(module_name: str) -> bool:
@@ -60,75 +55,114 @@ def _is_module_importable(module_name: str) -> bool:
)
return False

def _is_relative_import(self, module_name: str):
@staticmethod
def _get_module_parts(module_name: str) -> List[str]:
"""Creates a list of module parts to check for importability

Args:
module_name (str): The module name to split

Returns:
A list of module parts

Example:
>>> LiteParser._get_module_parts("kedro.framework.project")
["kedro", "kedro.framework", "kedro.framework.project"]

"""
module_split = module_name.split(".")
full_module_name = ""
module_parts = []

for idx, sub_module_name in enumerate(module_split):
full_module_name = (
sub_module_name if idx == 0 else f"{full_module_name}.{sub_module_name}"
)
module_parts.append(full_module_name)

return module_parts

def _is_relative_import(self, module_name: str, project_file_paths: Set[Path]):
"""Checks if a module is a relative import. This is needed
in dev or standalone mode when the package_name is None and
internal package files have unresolved external dependencies

Args:
module_name (str): The name of the module to check
importability

Example:
>>> lite_parser_obj = LiteParser("path/to/kedro/project")
>>> module_name = "kedro_project_package.pipelines.reporting.nodes"
>>> lite_parser_obj._is_relative_import(module_name)
True
project_file_paths (Set[Path]): A set of project file paths

Returns:
Whether the module is a relative import starting
from the root package dir

Example:
>>> lite_parser_obj = LiteParser()
>>> module_name = "kedro_project_package.pipelines.reporting.nodes"
>>> project_file_paths = set([Path("/path/to/relative/file")])
>>> lite_parser_obj._is_relative_import(module_name, project_file_paths)
True
"""
relative_module_path = module_name.replace(".", "/")

# Check if the relative_module_path
# is a substring of current project file path
is_relative_import_path = any(
relative_module_path in str(project_file_path)
for project_file_path in self._project_file_paths
for project_file_path in project_file_paths
)

return is_relative_import_path

def _create_mock_imports(
self, module_name: str, mocked_modules: Dict[str, MagicMock]
def _populate_missing_dependencies(
self, module_name: str, missing_dependencies: Set[str]
) -> None:
"""Creates mock modules for unresolvable imports and adds them to the
dictionary of mocked_modules
"""Helper to populate missing dependencies

Args:
module_name (str): The module name to be mocked
mocked_modules (Dict[str, MagicMock]): A dictionary of mocked imports
module_name (str): The module name to check if it is importable
missing_dependencies (Set[str]): A set of missing dependencies

"""
module_parts = module_name.split(".")
full_module_name = ""

# Try to import each sub-module starting from the root module
# Example: module_name = sklearn.linear_model
# We will try to find spec for sklearn, sklearn.linear_model
for idx, sub_module_name in enumerate(module_parts):
full_module_name = (
sub_module_name if idx == 0 else f"{full_module_name}.{sub_module_name}"
)
module_name_parts = self._get_module_parts(module_name)
for module_name_part in module_name_parts:
if (
not self._is_module_importable(full_module_name)
and full_module_name not in mocked_modules
not self._is_module_importable(module_name_part)
and module_name_part not in missing_dependencies
):
mocked_modules[full_module_name] = MagicMock()
missing_dependencies.add(module_name_part)

def _populate_mocked_modules(
self,
parsed_content_ast_node: ast.Module,
mocked_modules: Dict[str, MagicMock],
) -> None:
"""Populate mocked_modules with missing external dependencies
def _get_unresolved_imports(
self, file_path: Path, project_file_paths: Union[Set[Path], None] = None
) -> Set[str]:
"""Parse the file using AST and return any missing dependencies
in the current file

Args:
parsed_content_ast_node (ast.Module): The AST node to
extract import statements
mocked_modules (Dict[str, MagicMock]): A dictionary of mocked imports
file_path (Path): The file path to parse
project_file_paths (Union[Set[Path], None]): A set of project file paths

Returns:
A set of missing dependencies
"""

missing_dependencies: Set[str] = set()

# Read the file
with open(file_path, "r", encoding="utf-8") as file:
file_content = file.read()

# parse file content using ast
parsed_content_ast_node: ast.Module = ast.parse(file_content)
file_path = file_path.resolve()

# Ensure the package name is in the file path
if self._package_name and self._package_name not in file_path.parts:
# we are only mocking the dependencies
# inside the package
return missing_dependencies

# Explore each node in the AST tree
for node in ast.walk(parsed_content_ast_node):
# Handling dependencies that starts with "import "
# Example: import logging
@@ -137,7 +171,9 @@ def _populate_mocked_modules(
if isinstance(node, ast.Import):
for alias in node.names:
module_name = alias.name
self._create_mock_imports(module_name, mocked_modules)
self._populate_missing_dependencies(
module_name, missing_dependencies
)

# Handling dependencies that starts with "from "
# Example: from typing import Dict, Union
@@ -160,7 +196,8 @@ def _populate_mocked_modules(
if (self._package_name and self._package_name in module_name) or (
# dev or standalone mode
not self._package_name
and self._is_relative_import(module_name)
and project_file_paths
and self._is_relative_import(module_name, project_file_paths)
):
continue

@@ -169,31 +206,60 @@ def _populate_mocked_modules(
# from typing import Dict, Union
# from sklearn.linear_model import LinearRegression
if level == 0:
self._create_mock_imports(module_name, mocked_modules)
self._populate_missing_dependencies(
module_name, missing_dependencies
)

def get_mocked_modules(self) -> Dict[str, MagicMock]:
"""Returns mocked modules for all the dependency errors
as a dictionary for each file in your Kedro project
return missing_dependencies

def create_mock_modules(self, unresolved_imports: Set[str]) -> Dict[str, MagicMock]:
"""Creates mock modules for unresolved imports

Args:
unresolved_imports (Set[str]): A set of unresolved imports

Returns:
A dictionary of mocked modules for the unresolved imports
"""
mocked_modules: Dict[str, MagicMock] = {}

for file_path in self._project_file_paths:
with open(file_path, "r", encoding="utf-8") as file:
file_content = file.read()
for unresolved_import in unresolved_imports:
mocked_modules[unresolved_import] = MagicMock()

# parse file content using ast
parsed_content_ast_node: ast.Module = ast.parse(file_content)
file_path = file_path.resolve()
return mocked_modules

# Ensure the package name is in the file path
if self._package_name and self._package_name not in file_path.parts:
# we are only mocking the dependencies
# inside the package
continue
def parse(self, target_path: Path) -> Union[Dict[str, Set[str]], None]:
"""Parses the file(s) in the target path and returns
any unresolved imports for all the dependency errors
as a dictionary of file(s) in the target path and a set of module names

self._populate_mocked_modules(
parsed_content_ast_node,
mocked_modules,
Args:
target_path (Path): The path to parse file(s)

Returns:
A dictionary of file(s) in the target path and a set of module names
"""

if not target_path.exists():
logger.warning("Path `%s` does not exist", str(target_path))
return None

unresolved_imports: Dict[str, Set[str]] = {}

if target_path.is_file():
missing_dependencies = self._get_unresolved_imports(target_path)
if len(missing_dependencies) > 0:
unresolved_imports[str(target_path)] = missing_dependencies
return unresolved_imports

# handling directories
_project_file_paths = set(target_path.rglob("*.py"))

for file_path in _project_file_paths:
missing_dependencies = self._get_unresolved_imports(
file_path, _project_file_paths
)
if len(missing_dependencies) > 0:
unresolved_imports[str(file_path)] = missing_dependencies

return mocked_modules
return unresolved_imports
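The core idea of the parser above (walk the AST, collect absolute imports, and test each dotted prefix for importability) can be sketched in isolation. This is a simplified illustration under assumptions, not the `LiteParser` code: the helper names `module_parts`, `is_importable` and `unresolved_imports` are hypothetical, the body of `_is_module_importable` is not shown in this diff so the `find_spec` check here is a guess at its spirit, and `hypothetical_missing_pkg_xyz` is assumed not to be installed.

```python
import ast
import importlib.util
from itertools import accumulate
from typing import List, Set


def module_parts(module_name: str) -> List[str]:
    """Expand 'a.b.c' into ['a', 'a.b', 'a.b.c']."""
    return list(
        accumulate(module_name.split("."), lambda parent, child: f"{parent}.{child}")
    )


def is_importable(module_name: str) -> bool:
    """Return True if a spec can be found for the module without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first, so a missing parent raises here.
        return False


def unresolved_imports(source: str) -> Set[str]:
    """Collect every dotted prefix of an absolute import that cannot be resolved."""
    missing: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute "from x import y"; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            missing.update(part for part in module_parts(name) if not is_importable(part))
    return missing


sample = (
    "import logging\n"
    "from hypothetical_missing_pkg_xyz.sub import thing\n"
    "from . import sibling\n"
)
result = unresolved_imports(sample)
```

Here `logging` resolves, the relative import is ignored, and both `hypothetical_missing_pkg_xyz` and `hypothetical_missing_pkg_xyz.sub` are reported, which mirrors why the diff checks every prefix returned by `_get_module_parts` rather than only the full module name.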
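For dev or standalone mode, `_is_relative_import` in the diff decides whether a dotted module name refers to a file inside the project by turning dots into slashes and looking for that string in the project file paths. A small sketch of that check follows; the function is written as a free function, the `demo_pkg` paths are invented, and `PurePosixPath` is used only so the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath
from typing import Set


def is_relative_import(module_name: str, project_file_paths: Set[PurePosixPath]) -> bool:
    """Return True if the module's path form is a substring of any project file path."""
    # "demo_pkg.pipelines.reporting.nodes" -> "demo_pkg/pipelines/reporting/nodes"
    relative_module_path = module_name.replace(".", "/")
    return any(relative_module_path in str(path) for path in project_file_paths)


# Hypothetical project layout.
project_file_paths = {
    PurePosixPath("/hypothetical/project/src/demo_pkg/pipelines/reporting/nodes.py"),
    PurePosixPath("/hypothetical/project/src/demo_pkg/settings.py"),
}

internal = is_relative_import("demo_pkg.pipelines.reporting.nodes", project_file_paths)
external = is_relative_import("sklearn.linear_model", project_file_paths)
```

Because this is a plain substring test, a short external module name that happens to appear somewhere in a project path would also be treated as internal in this sketch.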