RELEASE.md +1
@@ -1,6 +1,7 @@
 # Upcoming Release 0.19.7

 ## Major features and improvements
+* Exposed `load` and `save` publicly for each dataset in the core `kedro` library, and enabled other datasets to do the same. If a dataset doesn't expose `load` or `save` publicly, Kedro will fall back to using `_load` or `_save`, respectively.

 ## Bug fixes and other changes
 * Updated error message for invalid catalog entries.
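A minimal sketch of the fallback described in this release note follows. It is not Kedro's implementation: `resolve_method`, `load_from`, `save_to`, `LegacyDataset`, and `PublicDataset` are hypothetical names, and the sketch assumes that "exposing `load` publicly" simply means the object has a callable `load` attribute. It does not model how Kedro detects this on its own base class.

```python
from typing import Any, Callable


def resolve_method(dataset: Any, public: str, private: str) -> Callable[..., Any]:
    """Return dataset.<public> if it is callable, otherwise dataset.<private>."""
    method = getattr(dataset, public, None)
    if callable(method):
        return method
    # Hypothetical fallback mirroring the documented behaviour, not Kedro's code.
    return getattr(dataset, private)


def load_from(dataset: Any) -> Any:
    return resolve_method(dataset, "load", "_load")()


def save_to(dataset: Any, data: Any) -> None:
    resolve_method(dataset, "save", "_save")(data)


class LegacyDataset:
    """Hypothetical dataset that only implements the private hooks."""

    def __init__(self) -> None:
        self._data: Any = None

    def _load(self) -> Any:
        return self._data

    def _save(self, data: Any) -> None:
        self._data = data


class PublicDataset(LegacyDataset):
    """Hypothetical dataset that also exposes load/save publicly."""

    def load(self) -> Any:
        # Tag the result so it is visible that the public method was used.
        return ("public", self._data)

    def save(self, data: Any) -> None:
        self._data = data
```

With these definitions, `save_to(LegacyDataset(), ...)` routes to `_save`, while `PublicDataset` is served by its public `load` and `save` even though it inherits the private hooks.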
docs/source/data/how_to_create_a_custom_dataset.md +17 -17
@@ -4,7 +4,7 @@

 ## AbstractDataset

-If you are a contributor and would like to submit a new dataset, you must extend the {py:class}`~kedro.io.AbstractDataset` interface or {py:class}`~kedro.io.AbstractVersionedDataset` interface if you plan to support versioning. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.
+If you are a contributor and would like to submit a new dataset, you must extend the {py:class}`~kedro.io.AbstractDataset` interface or {py:class}`~kedro.io.AbstractVersionedDataset` interface if you plan to support versioning. It requires subclasses to implement the `load` and `save` methods while providing wrappers that enrich the corresponding methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.


 ## Scenario
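The "wrappers that enrich the corresponding methods with uniform error handling" can be pictured as a decorator that re-raises any failure from `load` or `save` as a single exception type. In this sketch `DatasetError` is a local stand-in class, and `with_uniform_errors` and `FlakyDataset` are hypothetical; Kedro's real wrappers and error messages may differ.

```python
import functools
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class DatasetError(Exception):
    """Local stand-in for a dataset-level error type."""


def with_uniform_errors(operation: str) -> Callable[[F], F]:
    """Wrap a load/save method so every failure surfaces as DatasetError."""

    def decorator(method: F) -> F:
        @functools.wraps(method)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            try:
                return method(self, *args, **kwargs)
            except DatasetError:
                raise  # already uniform, pass it through unchanged
            except Exception as exc:
                raise DatasetError(f"{operation} failed for {self!r}: {exc}") from exc

        return wrapper  # type: ignore[return-value]

    return decorator


class FlakyDataset:
    """Hypothetical dataset whose load always fails."""

    @with_uniform_errors("load")
    def load(self) -> Any:
        raise FileNotFoundError("missing.png")

    def __repr__(self) -> str:
        return "FlakyDataset()"
```

Because the original exception is chained with `from exc`, the underlying `FileNotFoundError` remains available as `__cause__` for debugging.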
@@ -31,8 +31,8 @@ Consult the [Pillow documentation](https://pillow.readthedocs.io/en/stable/insta

 At the minimum, a valid Kedro dataset needs to subclass the base {py:class}`~kedro.io.AbstractDataset` and provide an implementation for the following abstract methods:

-* `_load`
-* `_save`
+* `load`
+* `save`
 * `_describe`

 `AbstractDataset` is generically typed with an input data type for saving data, and an output data type for loading data.
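The contract above (public `load` and `save` plus `_describe`, with separate input and output types) can be sketched with the standard library alone. `DatasetContract` is a hypothetical stand-in for `AbstractDataset` so the example runs without Kedro installed, and `LinesDataset` is an illustrative dataset that accepts a `List[str]` on save and returns a `str` on load.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, List, Optional, TypeVar

_DI = TypeVar("_DI")  # input type: what save() accepts
_DO = TypeVar("_DO")  # output type: what load() returns


class DatasetContract(ABC, Generic[_DI, _DO]):
    """Hypothetical stand-in for the abstract methods listed above."""

    @abstractmethod
    def load(self) -> _DO:
        ...

    @abstractmethod
    def save(self, data: _DI) -> None:
        ...

    @abstractmethod
    def _describe(self) -> Dict[str, Any]:
        ...


class LinesDataset(DatasetContract[List[str], str]):
    """Illustrative dataset: saves a list of lines, loads them as one string."""

    def __init__(self) -> None:
        self._text: Optional[str] = None

    def load(self) -> str:
        if self._text is None:
            raise ValueError("Nothing has been saved yet.")
        return self._text

    def save(self, data: List[str]) -> None:
        self._text = "\n".join(data)

    def _describe(self) -> Dict[str, Any]:
        return {"num_chars": 0 if self._text is None else len(self._text)}
```

Leaving out any of the three methods makes instantiation raise `TypeError`, which is how `abc` enforces the contract.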
@@ -70,15 +70,15 @@ class ImageDataset(AbstractDataset[np.ndarray, np.ndarray]):
         """
         self._filepath = filepath

-    def _load(self) -> np.ndarray:
+    def load(self) -> np.ndarray:
         """Loads data from the image file.

         Returns:
             Data from the image file as a numpy array.
         """
         ...

-    def _save(self, data: np.ndarray) -> None:
+    def save(self, data: np.ndarray) -> None:
         """Saves image data to the specified filepath"""
         ...

@@ -96,11 +96,11 @@ src/kedro_pokemon/datasets
 └── image_dataset.py
 ```

-## Implement the `_load` method with `fsspec`
+## Implement the `load` method with `fsspec`

 Many of the built-in Kedro datasets rely on [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) as a consistent interface to different data sources, as described earlier in the section about the [Data Catalog](../data/data_catalog.md#dataset-filepath). In this example, it's particularly convenient to use `fsspec` in conjunction with `Pillow` to read image data, since it allows the dataset to work flexibly with different image locations and formats.

-Here is the implementation of the `_load` method using `fsspec` and `Pillow` to read the data of a single image into a `numpy` array:
+Here is the implementation of the `load` method using `fsspec` and `Pillow` to read the data of a single image into a `numpy` array:

 <details>
 <summary><b>Click to expand</b></summary>
@@ -130,7 +130,7 @@ class ImageDataset(AbstractDataset[np.ndarray, np.ndarray]):
         self._filepath = PurePosixPath(path)
         self._fs = fsspec.filesystem(self._protocol)

-    def _load(self) -> np.ndarray:
+    def load(self) -> np.ndarray:
         """Loads data from the image file.

         Returns:
@@ -168,14 +168,14 @@ In [2]: from PIL import Image
 In [3]: Image.fromarray(image).show()
 ```

-## Implement the `_save` method with `fsspec`
+## Implement the `save` method with `fsspec`

 Similarly, we can implement the `_save` method as follows:
@@ -312,7 +312,7 @@ To add versioning support to the new dataset we need to extend the
 {py:class}`~kedro.io.AbstractVersionedDataset` to:

 * Accept a `version` keyword argument as part of the constructor
-* Adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively
+* Adapt the `load` and `save` method to use the versioned data path obtained from `_get_load_path` and `_get_save_path` respectively

 The following amends the full implementation of our basic `ImageDataset`. It now loads and saves data to and from a versioned subfolder (`data/01_raw/pokemon-images-and-types/images/images/pikachu.png/<version>/pikachu.png` with `version` being a datetime-formatted string `YYYY-MM-DDThh.mm.ss.sssZ` by default):

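To make the versioned layout concrete, the helpers below rebuild the example path and a version string in the `YYYY-MM-DDThh.mm.ss.sssZ` format using only the standard library. `make_version` and `versioned_path` are hypothetical; in the dataset itself the paths come from `_get_load_path` and `_get_save_path`, and by default the version string is produced by Kedro, not by code like this.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def make_version(moment: datetime) -> str:
    """Format a datetime (converted to UTC) as YYYY-MM-DDThh.mm.ss.sssZ."""
    moment = moment.astimezone(timezone.utc)  # naive values are treated as local time
    millis = moment.microsecond // 1000  # keep millisecond precision only
    return moment.strftime("%Y-%m-%dT%H.%M.%S.") + f"{millis:03d}Z"


def versioned_path(filepath: str, version: str) -> PurePosixPath:
    """Nest the file under a folder named after itself, then under the version."""
    path = PurePosixPath(filepath)
    return path / version / path.name
```

For a version of `2024-07-01T09.05.03.123Z`, `versioned_path` turns `data/01_raw/pokemon-images-and-types/images/images/pikachu.png` into `data/01_raw/pokemon-images-and-types/images/images/pikachu.png/2024-07-01T09.05.03.123Z/pikachu.png`.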
@@ -359,7 +359,7 @@ class ImageDataset(AbstractVersionedDataset[np.ndarray, np.ndarray]):
             glob_function=self._fs.glob,
         )

-    def _load(self) -> np.ndarray:
+    def load(self) -> np.ndarray:
         """Loads data from the image file.

         Returns:
@@ -370,7 +370,7 @@ class ImageDataset(AbstractVersionedDataset[np.ndarray, np.ndarray]):