
Commit 3de5710

deepyaman authored and lvijnck committed
Document distribution of Kedro pipelines with Dask (kedro-org#1248)
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
1 parent 8fc667e commit 3de5710

10 files changed: +342 -19 lines changed

RELEASE.md

+4 -5

@@ -1,11 +1,11 @@
 # Release 0.17.8

 ## Major features and improvements
+* Documented distribution of Kedro pipelines with Dask.

-* Added option to `SparkDataSet` to specify a `schema` load argument that allows for supplying a user-defined schema as opposed to relying on the schema inference of Spark.
+## Bug fixes and other changes

-## Thanks for supporting contributions
-[Laurens Vijnck](https://github.com/lvijnck)
+## Upcoming deprecations for Kedro 0.18.0

 # Release 0.17.7

@@ -24,7 +24,6 @@
 * Added `astro-iris` as alias for `astro-airflow-iris`, so that old tutorials can still be followed.
 * Added details about [Kedro's Technical Steering Committee and governance model](https://kedro.readthedocs.io/en/0.17.7/14_contribution/technical_steering_committee.html).

-
 ## Upcoming deprecations for Kedro 0.18.0
 * `kedro pipeline pull` and `kedro pipeline package` will be deprecated. Please use `kedro micropkg` instead.

@@ -415,7 +414,7 @@ Check your source directory. If you defined a different source directory (`sourc

 ## Major features and improvements

-* Added documentation with a focus on single machine and distributed environment deployment; the series includes Docker, Argo, Prefect, Kubeflow, AWS Batch, AWS Sagemaker and extends our section on Databricks
+* Added documentation with a focus on single machine and distributed environment deployment; the series includes Docker, Argo, Prefect, Kubeflow, AWS Batch, AWS Sagemaker and extends our section on Databricks.
 * Added [kedro-starter-spaceflights](https://github.com/kedro-org/kedro-starter-spaceflights/) alias for generating a project: `kedro new --starter spaceflights`.

 ## Bug fixes and other changes

docs/conf.py

+1

@@ -192,6 +192,7 @@
 # some of these complain that the sections don't exist (which is not true),
 # too many requests, or forbidden URL
 linkcheck_ignore = [
+    "http://127.0.0.1:8787/status", # Dask's diagnostics dashboard
     "https://datacamp.com/community/tutorials/docstrings-python", # "forbidden" url
     "https://github.com/argoproj/argo/blob/master/README.md#quickstart",
     "https://console.aws.amazon.com/batch/home#/jobs",

docs/source/03_tutorial/05_visualise_pipeline.md

+2 -3

@@ -16,8 +16,7 @@ You should be in your project root directory, and once Kedro-Viz is installed yo
 kedro viz
 ```

-This command will run a server on http://127.0.0.1:4141 that will open up your visualisation on a browser. You should
-be able to see the following:
+This command will run a server on http://127.0.0.1:4141 that will open up your visualisation on a browser. You should be able to see the following:

 ![](../meta/images/pipeline_visualisation.png)

@@ -113,7 +112,7 @@ We have also used the Plotly integration to allow users to [visualise metrics fr

 You need to update requirements.txt in your Kedro project and add the following datasets to enable plotly for your project.

-`kedro[plotly.PlotlyDataSet, plotly.JSONDataSet]==0.17.7`
+`kedro[plotly.PlotlyDataSet, plotly.JSONDataSet]==0.17.7`


 You can view Plotly charts in Kedro-Viz when you use Kedro's plotly datasets.

docs/source/10_deployment/01_deployment_guide.md

+2 -1

@@ -15,9 +15,10 @@ We also provide information to help you deploy to the following:
 * to [Kubeflow Workflows](06_kubeflow.md)
 * to [AWS Batch](07_aws_batch.md)
 * to [Databricks](08_databricks.md)
+* to [Dask](dask.md)

 <!--- There has to be some non-link text in the bullets above, if it's just links, there's a Sphinx bug that fails the build process-->

 In addition, we also provide instructions on [how to integrate a Kedro project with Amazon SageMaker](09_aws_sagemaker.md).

-![](../meta/images/deployments.png)
+![](../meta/images/deployments.png) <!-- TODO(deepyaman): Add Dask to deployment flowchart. -->
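
The `dask.md` page linked above is added elsewhere in this commit and is not shown in this view. As a rough, hedged illustration of the general pattern such a guide covers, distributing work across a cluster with `dask.distributed`, the sketch below submits a node's underlying Python function to a Dask scheduler. The scheduler address and the `clean_names` function are placeholders, not taken from this commit.

```python
# Hypothetical sketch only: submit the plain Python function behind a Kedro
# node to a running Dask cluster. Assumes a scheduler at the given address
# and that `dask.distributed` is installed.
from dask.distributed import Client


def clean_names(names):
    # Stand-in for the kind of function a Kedro node would wrap.
    return [name.strip().lower() for name in names]


client = Client("tcp://127.0.0.1:8786")  # address of your Dask scheduler
future = client.submit(clean_names, ["  Alpha ", "Beta"])
print(future.result())  # ['alpha', 'beta'] once a worker completes the task
client.close()
```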

docs/source/10_deployment/04_argo.md

+1 -1

@@ -1,6 +1,6 @@
 # Deployment with Argo Workflows

-This page explains how to convert your Kedro pipeline to use [Argo Workflows](https://github.com/argoproj/argo-workflows), an open source container-native workflow engine for orchestrating parallel jobs on [Kubernetes](https://kubernetes.io/).
+This page explains how to convert your Kedro pipeline to use [Argo Workflows](https://github.com/argoproj/argo-workflows), an open-source container-native workflow engine for orchestrating parallel jobs on [Kubernetes](https://kubernetes.io/).

 ## Why would you use Argo Workflows?


docs/source/10_deployment/05_prefect.md

+2 -2

@@ -1,8 +1,8 @@
 # Deployment with Prefect

-This page explains how to run your Kedro pipeline using [Prefect Core](https://www.prefect.io/products/core/), an open source workflow management system.
+This page explains how to run your Kedro pipeline using [Prefect Core](https://www.prefect.io/products/core/), an open-source workflow management system.

-In scope of this deployment we are interested in [Prefect Server](https://docs.prefect.io/orchestration/server/overview.html#what-is-prefect-server) which is an open-source backend that makes it easy to monitor and execute your Prefect flows and automatically extends the Prefect Core.
+In scope of this deployment, we are interested in [Prefect Server](https://docs.prefect.io/orchestration/server/overview.html#what-is-prefect-server), an open-source backend that makes it easy to monitor and execute your Prefect flows and automatically extends the Prefect Core.

 ```eval_rst
 .. note:: Prefect Server ships out-of-the-box with a fully featured user interface.

docs/source/10_deployment/07_aws_batch.md

+11 -7

@@ -118,12 +118,14 @@ Now that all the resources are in place, it's time to submit jobs to Batch progr

 #### Create a custom runner

-Create a new Python package `runner` in your `src` folder, i.e. `kedro_tutorial/src/kedro_tutorial/runner/`. Make sure there is an `__init__.py` file at this location and add another file named `batch_runner.py`, which will contain the implementation of your custom runner, `AWSBatchRunner`. The `AWSBatchRunner` will submit and monitor jobs asynchronously, surfacing any errors that occur on Batch.
+Create a new Python package `runner` in your `src` folder, i.e. `kedro_tutorial/src/kedro_tutorial/runner/`. Make sure there is an `__init__.py` file at this location, and add another file named `batch_runner.py`, which will contain the implementation of your custom runner, `AWSBatchRunner`. The `AWSBatchRunner` will submit and monitor jobs asynchronously, surfacing any errors that occur on Batch.

-Make sure the `__init__.py` file in the `runner` folder includes the following import:
+Make sure the `__init__.py` file in the `runner` folder includes the following import and declaration:

 ```python
-from .batch_runner import AWSBatchRunner # NOQA
+from .batch_runner import AWSBatchRunner
+
+__all__ = ["AWSBatchRunner"]
 ```

 Copy the contents of the script below into `batch_runner.py`:

@@ -286,13 +288,13 @@ def _track_batch_job(job_id: str, client: Any) -> None:

 #### Set up Batch-related configuration

-You'll need to set the Batch-related configuration that the runner will use. Add a `parameters.yml` file inside the `conf/aws_batch/` directory created as part of the prerequisites steps, which will include the following keys:
+You'll need to set the Batch-related configuration that the runner will use. Add a `parameters.yml` file inside the `conf/aws_batch/` directory created as part of the prerequisites with the following keys:

 ```yaml
 aws_batch:
-  job_queue: "spaceflights_queue"
-  job_definition: "kedro_run"
-  max_workers: 2
+  job_queue: "spaceflights_queue"
+  job_definition: "kedro_run"
+  max_workers: 2
 ```

 #### Update CLI implementation

@@ -315,6 +317,7 @@ def run(tag, env, parallel, ...):
     node_names = _get_values_as_tuple(node_names) if node_names else node_names

     with KedroSession.create(env=env, extra_params=params) as session:
+        context = session.load_context()
         runner_instance = _instantiate_runner(runner, is_async, context)
         session.run(
             tags=tag,

@@ -323,6 +326,7 @@ def run(tag, env, parallel, ...):
             from_nodes=from_nodes,
             to_nodes=to_nodes,
             from_inputs=from_inputs,
+            to_outputs=to_outputs,
             load_versions=load_version,
             pipeline_name=pipeline,
         )
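
`_instantiate_runner` is referenced by the hunk above but defined elsewhere in the AWS Batch guide, so it does not appear in this diff. As a hedged sketch of what such a factory might do with the `aws_batch` block from `conf/aws_batch/parameters.yml` (the names and structure below are assumptions, not taken from this commit):

```python
# Hypothetical sketch of the runner factory; the version in the guide may differ.
# `project_context.params` holds the merged Kedro parameters, so `aws_batch`
# here is the block defined in conf/aws_batch/parameters.yml.
from kedro.utils import load_obj


def _instantiate_runner(runner, is_async, project_context):
    runner = runner or "SequentialRunner"
    runner_class = load_obj(runner, "kedro.runner")  # e.g. "kedro_tutorial.runner.AWSBatchRunner"
    runner_kwargs = dict(is_async=is_async)

    if runner.endswith("AWSBatchRunner"):
        batch_kwargs = project_context.params.get("aws_batch") or {}
        runner_kwargs.update(batch_kwargs)

    return runner_class(**runner_kwargs)
```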
