FIX #62 - Migrate to 0.16.5 #94

Merged
merged 1 commit into from
Oct 25, 2020
1 change: 0 additions & 1 deletion .flake8
@@ -3,5 +3,4 @@ ignore = E203, E266, E501, W503
max-line-length = 88
max-complexity = 18
select = B,C,E,F,W,T4,B9
exclude = kedro_mlflow/template/project/run.py
per-file-ignores = **/__init__.py:F401
2 changes: 1 addition & 1 deletion .isort.cfg
@@ -7,4 +7,4 @@ line_length=88
ensure_newline_before_comments=True
sections=FUTURE,STDLIB,THIRDPARTY,FIRSTPARTY,LOCALFOLDER
known_first_party=kedro_mlflow
known_third_party=black,click,cookiecutter,flake8,isort,jinja2,kedro,mlflow,pandas,pytest,pytest_lazyfixture,setuptools,yaml
known_third_party=anyconfig,click,cookiecutter,jinja2,kedro,mlflow,packaging,pandas,pytest,pytest_lazyfixture,setuptools,yaml
9 changes: 7 additions & 2 deletions CHANGELOG.md
@@ -4,7 +4,8 @@

### Added

-
- `kedro-mlflow` now supports kedro 0.16.5 (#62)
- `kedro-mlflow` hooks can now be declared in `.kedro.yml` or `pyproject.toml` by adding `kedro_mlflow.framework.hooks.mlflow_pipeline_hook` and `kedro_mlflow.framework.hooks.mlflow_node_hook` into the hooks entry. _Only for kedro>=0.16.5_

### Fixed

@@ -16,7 +17,11 @@

### Changed

- `MlflowNodeHook` have now a before_pipeline_run hook which stores the ProjectContext and enable to retrieve configuration.
- `MlflowNodeHook` now has a `before_pipeline_run` hook which stores the `ProjectContext` and enables retrieving the configuration.

### Removed

- The `kedro mlflow init` command no longer declares hooks in `run.py`. You must now [register your hooks manually](docs/source/03_tutorial/02_setup.md#declaring-kedro-mlflow-hooks) in ``run.py`` (kedro >= 0.16.0), ``.kedro.yml`` (kedro >= 0.16.5) or ``pyproject.toml`` (kedro >= 0.16.5).

## [0.3.0] - 2020-10-11

1 change: 0 additions & 1 deletion docs/source/02_hello_world_example/02_first_steps.md
@@ -7,7 +7,6 @@ kedro mlflow init
You have the following message:
```console
'conf/base/mlflow.yml' successfully updated.
'run.py' successfully updated
```

The ``conf/base`` folder is updated:
91 changes: 42 additions & 49 deletions docs/source/03_tutorial/02_setup.md
@@ -25,15 +25,13 @@ This plugins must be used in an existing kedro project. If you do not have a ked

For this tutorial and if you do not have a real-world project, I strongly suggest that you accept to include the proposed example to make a demo of this plugin out of the box.

## Update the template of your kedro project
In order to use the ``kedro-mlflow`` plugin, you need to perform 2 actions:
1. Create an ``mlflow.yml`` file for [configuring mlflow in a dedicated file](../05_python_objects/05_Configuration.md).
2. Update the ``src/PYTHON_PACKAGE/run.py`` to add the [necessary hooks](../05_python_objects/02_Hooks.md) to the project context. The ``MlflowPipelineHook`` manages the configuration and registers the PipelineML, while the ``MlflowNodeHook`` autolog the parameters.
## Activate `kedro-mlflow` in your kedro project
In order to use the ``kedro-mlflow`` plugin, you need to set up its configuration and declare its hooks. These 2 actions are detailed in the following paragraphs.

## Automatic template update (recommended)
### Default situation
The first and recommended possibility to setup this context is to use a [dedicated command line](../05_python_objects/04_CLI.md) offered by the plugin.
Position yourself at the root (i.e. the folder with the ``.kedro.yml`` file)
### Setting up the kedro-mlflow configuration file
``kedro-mlflow`` is [configured](../05_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the kedro-mlflow CLI](../05_python_objects/04_CLI.md).

Set the working directory to the root of your kedro project (i.e. the folder containing the ``.kedro.yml`` file):

```console
$ cd path/to/your/project
@@ -44,48 +42,21 @@ Run the init command :
```console
$ kedro mlflow init
```

*Note : If the warning ``"You have not updated your template yet. This is mandatory to use 'kedro-mlflow' plugin. Please run the following command before you can access to other commands : '$ kedro mlflow init'`` is raised, this is a bug to be corrected and you can safely ignore it.*
If you have never modified your ``run.py`` manually, it should run smoothly and you should get the following message:
You should see the following message:
```console
'conf/base/mlflow.yml' successfully updated.
'run.py' successfully updated
```

### Special case: what happens if you have a custom ``run.py`` ?

You may have modified the ``run.py`` manually since the creation of the project. This may happen in the following situations:
- you have added ``hooks`` (of another plugin for instance)
- you have modified the ``ConfigLoader``, for instance to use a ``TemplatedConfigLoader`` to make your configuration dynamic and link the files with one another
- you have modified the ``get_pipelines`` function to implement specific logic
- ...
These are advanced features of ``Kedro``, and if you have made such modifications they were very likely deliberate; however, some other plugins may have modified this file without any warning.

Whatever the reason, if your ``run.py`` was modified since the project creation, the [previous process](#default-situation) will return the following warning message:
```console
You have modified your 'run.py' since project creation.
In order to use kedro-mlflow, you must either:
- set up your run.py with the following instructions :
INSERT_DOC_URL
- call the following command:
$ kedro mlflow init --force
```
In this situation, the ``mlflow.yml`` is still created, but the ``run.py`` is left unchanged to avoid messing up with your own changes. You can still erase your ``run.py`` and replace it with the one of the plugin with below command.

```console
kedro mlflow init --force
```
**USE AT YOUR OWN RISK: This will erase definitely all the modifications you made to your own ``run.py`` with no possible recovery.** In consequence, this is not the recommended way to setup the project if you have a custom ``run.py``. The best way to continue the setup is to [set up the hooks manually](#manual-update).
### Declaring kedro-mlflow hooks

## Manual update
``kedro_mlflow`` hook implementations must be registered with Kedro. There are three ways of registering [hooks](https://kedro.readthedocs.io/en/latest/07_extend_kedro/04_hooks.html?highlight=hooks).

The ``MlflowPipelineHook`` and ``MlflowNodeHook`` hooks need to be registered in the ``run.py`` file. The kedro documentation explains in detail [how to register a hook](https://kedro.readthedocs.io/en/latest/04_user_guide/15_hooks.html#registering-your-hook-implementations-with-kedro).
#### - Declaring hooks through code, in ``ProjectContext``

Your run.py should look like the following code snippet :
By declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in the ``ProjectContext`` of ``src/package_name/run.py``:

```python
from kedro_mlflow.framework.hooks import MlflowNodeHook, MlflowPipelineHook
from <python_package>.pipeline import create_pipelines
from kedro_mlflow.framework.hooks import mlflow_pipeline_hook, mlflow_node_hook

class ProjectContext(KedroContext):
"""Users can override the remaining methods from the parent class here,
@@ -95,13 +66,35 @@ class ProjectContext(KedroContext):
project_name = "<project-name>"
project_version = "0.16.X" # must be >=0.16.0
hooks = (
MlflowNodeHook(flatten_dict_params=False),
MlflowPipelineHook(model_name="<python_package>",
conda_env="src/requirements.txt")
) # <-- the new lines to add
mlflow_pipeline_hook,
mlflow_node_hook
)
```
#### - Declaring hooks through static configuration in `.kedro.yml` or `pyproject.toml` **[Only for kedro >= 0.16.5]**

By declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in ``.kedro.yml`` :

```yaml
context_path: km_example.run.ProjectContext
project_name: "km_example"
project_version: "0.16.5"
package_name: "km_example"
hooks:
- km_example.hooks.project_hooks
- kedro_mlflow.framework.hooks.mlflow_pipeline_hook
- kedro_mlflow.framework.hooks.mlflow_node_hook
```

Or by declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in ``pyproject.toml`` :

```toml
# <your_project>/pyproject.toml
[tool.kedro]
hooks=["kedro_mlflow.framework.hooks.mlflow_pipeline_hook",
"kedro_mlflow.framework.hooks.mlflow_node_hook"]
```

#### - Declaring hooks through auto-discovery **[Coming soon]**


Pay attention to the following elements:
- if you have other hooks (custom, from other plugins...), you can just add them to the hooks tuple
- you **must register both hooks** for the plugin to work
- the hooks are highly parametrizable, you can find a [detailed description of their parameters here](../05_python_objects/02_Hooks.md).
**Note that you must register both hooks for the plugin to work**
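
The dotted strings used in the `.kedro.yml` and `pyproject.toml` examples above each name a module-level object. As a minimal sketch of how such a string can be turned into an object, the snippet below imports the module part and fetches the attribute. This is an illustration only: `resolve_dotted_path` is a hypothetical helper, not Kedro's actual loader, and a standard-library path is used as a stand-in so the snippet runs without Kedro installed.

```python
import importlib


def resolve_dotted_path(path: str):
    """Import the module part of ``path`` and return the named attribute.

    Hypothetical helper for illustration; Kedro's own loading logic may differ.
    """
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"'{path}' is not a dotted path to an object")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in target from the standard library. In a project, the string would be
# something like "kedro_mlflow.framework.hooks.mlflow_node_hook".
join = resolve_dotted_path("os.path.join")
```

Because the string must resolve to an object that already exists at import time, static configuration can only reference instances such as `mlflow_pipeline_hook` and `mlflow_node_hook`, not a class that still needs constructor arguments.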
Binary file modified docs/source/imgs/initialized_project.png
80 changes: 12 additions & 68 deletions kedro_mlflow/framework/cli/cli.py
@@ -1,17 +1,20 @@
import os
import subprocess
from pathlib import Path

import click
from kedro import __file__ as KEDRO_PATH
from kedro import __version__ as kedro_version
from kedro.framework.context import load_context
from packaging import version

from kedro_mlflow.framework.cli.cli_utils import (
render_jinja_template,
write_jinja_template,
)
from kedro_mlflow.framework.cli.cli_utils import write_jinja_template
from kedro_mlflow.framework.context import get_mlflow_config
from kedro_mlflow.utils import _already_updated, _get_project_globals, _is_kedro_project
from kedro_mlflow.utils import _already_updated, _is_kedro_project

try:
from kedro.framework.context import get_static_project_data
except ImportError: # pragma: no cover
from kedro_mlflow.utils import _get_project_globals as get_static_project_data # pragma: no cover


TEMPLATE_FOLDER_PATH = Path(__file__).parent.parent.parent / "template" / "project"
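
The `try`/`except ImportError` block above is a compatibility shim: it prefers the newer Kedro helper and falls back to the plugin's own one. Since the call site in `init` now passes `project_path` and reads the `"package_name"` key, the fallback needs a matching signature and key; whether `_get_project_globals` was adapted accordingly is not visible in this diff. A minimal sketch of such an adapter, assuming a legacy helper that takes no argument and returns the old `"python_package"` key (all names below are hypothetical stand-ins):

```python
from pathlib import Path


def _legacy_project_globals():
    """Hypothetical stand-in for an older helper: no argument, old key name."""
    return {"python_package": "km_example", "project_name": "km_example"}


try:
    # Hypothetical import that does not exist, so the fallback branch runs here.
    from json import get_static_project_data
except ImportError:

    def get_static_project_data(project_path):
        """Expose the legacy helper under the newer signature and key name."""
        data = dict(_legacy_project_globals())
        data["package_name"] = data.pop("python_package")
        return data


# The call site can then use one signature and one key on every version.
project_globals = get_static_project_data(Path.cwd())
package_name = project_globals["package_name"]
```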

@@ -88,7 +91,7 @@ def init(force, silent):

# get constants
project_path = Path().cwd()
project_globals = _get_project_globals()
project_globals = get_static_project_data(project_path)
context = load_context(project_path)
conf_root = context.CONF_ROOT

@@ -99,73 +102,14 @@ def init(force, silent):
src=TEMPLATE_FOLDER_PATH / mlflow_yml,
is_cookiecutter=False,
dst=project_path / conf_root / "base" / mlflow_yml,
python_package=project_globals["python_package"],
python_package=project_globals["package_name"],
)
if not silent:
click.secho(
click.style(
f"'{conf_root}/base/mlflow.yml' successfully updated.", fg="green"
)
)
# make a check whether the project run.py is strictly identical to the template
# if yes, replace the script by the template silently
# if no, raise a warning and send a message to INSERT_DOC_URL
flag_erase_runpy = force
runpy_project_path = (
project_path
/ "src"
/ (Path(project_globals["context_path"]).parent.as_posix() + ".py")
)
if not force:
kedro_path = Path(KEDRO_PATH).parent
runpy_template_path = (
kedro_path
/ "templates"
/ "project"
/ "{{ cookiecutter.repo_name }}"
/ "src"
/ "{{ cookiecutter.python_package }}"
/ "run.py"
)
kedro_runpy_template = render_jinja_template(
src=runpy_template_path,
is_cookiecutter=True,
python_package=project_globals["python_package"],
project_name=project_globals["project_name"],
kedro_version=project_globals["kedro_version"],
)

with open(runpy_project_path, mode="r") as file_handler:
kedro_runpy_project = file_handler.read()

# beware : black formatting could change slightly this test which is very strict
if kedro_runpy_project == kedro_runpy_template:
flag_erase_runpy = True

if flag_erase_runpy:
os.remove(runpy_project_path)
write_jinja_template(
src=TEMPLATE_FOLDER_PATH / "run.py",
dst=runpy_project_path,
is_cookiecutter=True,
python_package=project_globals["python_package"],
project_name=project_globals["project_name"],
kedro_version=project_globals["kedro_version"],
)
if not silent:
click.secho(click.style("'run.py' successfully updated", fg="green"))
else:
click.secho(
click.style(
"You have modified your 'run.py' since project creation.\n"
+ "In order to use kedro-mlflow, you must either:\n"
+ " - set up your run.py with the following instructions :\n"
+ "INSERT_DOC_URL\n"
+ " - call the following command:\n"
+ "$ kedro mlflow init --force",
fg="yellow",
)
)


@mlflow_commands.command()
4 changes: 2 additions & 2 deletions kedro_mlflow/framework/hooks/__init__.py
@@ -1,2 +1,2 @@
from .node_hook import MlflowNodeHook
from .pipeline_hook import MlflowPipelineHook
from .node_hook import MlflowNodeHook, mlflow_node_hook
from .pipeline_hook import MlflowPipelineHook, mlflow_pipeline_hook
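
With this change the package exports ready-made instances alongside the hook classes. A minimal sketch of that pattern follows, using a hypothetical `DemoHook` class rather than the real `MlflowNodeHook`. The `flatten_dict_params` argument is borrowed from the tutorial snippet in this PR; the default shown here is an assumption.

```python
class DemoHook:
    """Hypothetical hook class standing in for MlflowNodeHook."""

    def __init__(self, flatten_dict_params: bool = False):
        self.flatten_dict_params = flatten_dict_params


# Module-level instance built with default arguments, so it can be imported
# directly or referenced by its dotted path in static configuration.
demo_hook = DemoHook()


class ProjectContextSketch:
    """Hypothetical context showing both ways of registering a hook."""

    # Reuse the shared default instance...
    hooks = (demo_hook,)
    # ...or instantiate the class when non-default parameters are needed.
    custom_hooks = (DemoHook(flatten_dict_params=True),)
```

The shared instance covers the default configuration. A project that needs other parameters can presumably still import the class and build its own instance in `run.py`.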
3 changes: 3 additions & 0 deletions kedro_mlflow/framework/hooks/node_hook.py
@@ -90,6 +90,9 @@ def before_node_run(
mlflow.log_params(params_inputs)


mlflow_node_hook = MlflowNodeHook()


def flatten_dict(d, recursive: bool = True, sep="."):
def expand(key, value):
if isinstance(value, dict):
3 changes: 3 additions & 0 deletions kedro_mlflow/framework/hooks/pipeline_hook.py
@@ -185,6 +185,9 @@ def on_pipeline_error(
mlflow.end_run()


mlflow_pipeline_hook = MlflowPipelineHook()


def _generate_kedro_command(
tags, node_names, from_nodes, to_nodes, from_inputs, load_versions, pipeline_name
):
69 changes: 0 additions & 69 deletions kedro_mlflow/template/project/run.py

This file was deleted.