
Commit b9696ef

chore: Rename FileOfflineStore to DaskOfflineStore (feast-dev#4349)
rename file offline store to dask

Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
1 parent 92d17de commit b9696ef

13 files changed (+46 -81 lines)

docs/SUMMARY.md (+2 -2)

@@ -76,7 +76,7 @@
 * [Azure Synapse + Azure SQL (contrib)](reference/data-sources/mssql.md)
 * [Offline stores](reference/offline-stores/README.md)
 * [Overview](reference/offline-stores/overview.md)
-* [File](reference/offline-stores/file.md)
+* [Dask](reference/offline-stores/dask.md)
 * [Snowflake](reference/offline-stores/snowflake.md)
 * [BigQuery](reference/offline-stores/bigquery.md)
 * [Redshift](reference/offline-stores/redshift.md)
@@ -119,7 +119,7 @@
 * [Feature servers](reference/feature-servers/README.md)
 * [Python feature server](reference/feature-servers/python-feature-server.md)
 * [\[Alpha\] Go feature server](reference/feature-servers/go-feature-server.md)
-* [Offline Feature Server](reference/feature-servers/offline-feature-server)
+* [Offline Feature Server](reference/feature-servers/offline-feature-server.md)
 * [\[Beta\] Web UI](reference/alpha-web-ui.md)
 * [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
 * [\[Alpha\] Vector Database](reference/alpha-vector-database.md)

docs/reference/offline-stores/file.md → docs/reference/offline-stores/dask.md (+8 -9)

@@ -1,9 +1,8 @@
-# File offline store
+# Dask offline store
 
 ## Description
 
-The file offline store provides support for reading [FileSources](../data-sources/file.md).
-It uses Dask as the compute engine.
+The Dask offline store provides support for reading [FileSources](../data-sources/file.md).
 
 {% hint style="warning" %}
 All data is downloaded and joined using Python and therefore may not scale to production workloads.
@@ -17,28 +16,28 @@ project: my_feature_repo
 registry: data/registry.db
 provider: local
 offline_store:
-    type: file
+    type: dask
 ```
 {% endcode %}
 
-The full set of configuration options is available in [FileOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.file.FileOfflineStoreConfig).
+The full set of configuration options is available in [DaskOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.dask.DaskOfflineStoreConfig).
 
 ## Functionality Matrix
 
 The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
-Below is a matrix indicating which functionality is supported by the file offline store.
+Below is a matrix indicating which functionality is supported by the dask offline store.
 
-| | File |
+| | Dask |
 | :-------------------------------- | :-- |
 | `get_historical_features` (point-in-time correct join) | yes |
 | `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
 | `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
 | `offline_write_batch` (persist dataframes to offline store) | yes |
 | `write_logged_features` (persist logged features to offline store) | yes |
 
-Below is a matrix indicating which functionality is supported by `FileRetrievalJob`.
+Below is a matrix indicating which functionality is supported by `DaskRetrievalJob`.
 
-| | File |
+| | Dask |
 | --------------------------------- | --- |
 | export to dataframe | yes |
 | export to arrow table | yes |

docs/reference/offline-stores/overview.md (+3 -3)

@@ -25,13 +25,13 @@ The first three of these methods all return a `RetrievalJob` specific to an offl
 
 ## Functionality Matrix
 
-There are currently four core offline store implementations: `FileOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
+There are currently four core offline store implementations: `DaskOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
 There are several additional implementations contributed by the Feast community (`PostgreSQLOfflineStore`, `SparkOfflineStore`, and `TrinoOfflineStore`), which are not guaranteed to be stable or to match the functionality of the core implementations.
 Details for each specific offline store, such as how to configure it in a `feature_store.yaml`, can be found [here](README.md).
 
 Below is a matrix indicating which offline stores support which methods.
 
-| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
+| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
 | :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
 | `get_historical_features` | yes | yes | yes | yes | yes | yes | yes |
 | `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes |
@@ -42,7 +42,7 @@ Below is a matrix indicating which offline stores support which methods.
 
 Below is a matrix indicating which `RetrievalJob`s support what functionality.
 
-| | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
+| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
 | --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- |
 | export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes |
 | export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes |

sdk/python/docs/source/feast.infra.contrib.rst (-8)

@@ -4,14 +4,6 @@ feast.infra.contrib package
 Submodules
 ----------
 
-feast.infra.contrib.azure\_provider module
--------------------------------------------
-
-.. automodule:: feast.infra.contrib.azure_provider
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 feast.infra.contrib.grpc\_server module
 ---------------------------------------
 
sdk/python/docs/source/feast.infra.feature_servers.rst (-2)

@@ -7,8 +7,6 @@ Subpackages
 .. toctree::
    :maxdepth: 4
 
-   feast.infra.feature_servers.aws_lambda
-   feast.infra.feature_servers.gcp_cloudrun
    feast.infra.feature_servers.local_process
    feast.infra.feature_servers.multicloud
 
sdk/python/docs/source/feast.infra.offline_stores.rst (+6 -6)

@@ -28,18 +28,18 @@ feast.infra.offline\_stores.bigquery\_source module
    :undoc-members:
    :show-inheritance:
 
-feast.infra.offline\_stores.duckdb module
-------------------------------------------
+feast.infra.offline\_stores.dask module
+---------------------------------------
 
-.. automodule:: feast.infra.offline_stores.duckdb
+.. automodule:: feast.infra.offline_stores.dask
    :members:
    :undoc-members:
    :show-inheritance:
 
-feast.infra.offline\_stores.file module
-----------------------------------------
+feast.infra.offline\_stores.duckdb module
+-----------------------------------------
 
-.. automodule:: feast.infra.offline_stores.file
+.. automodule:: feast.infra.offline_stores.duckdb
    :members:
    :undoc-members:
    :show-inheritance:

sdk/python/docs/source/feast.infra.registry.contrib.rst (-1)

@@ -8,7 +8,6 @@ Subpackages
    :maxdepth: 4
 
    feast.infra.registry.contrib.azure
-   feast.infra.registry.contrib.postgres
 
 Module contents
 ---------------

sdk/python/docs/source/feast.infra.rst (-24)

@@ -19,22 +19,6 @@ Subpackages
 Submodules
 ----------
 
-feast.infra.aws module
-----------------------
-
-.. automodule:: feast.infra.aws
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
-feast.infra.gcp module
-----------------------
-
-.. automodule:: feast.infra.gcp
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 feast.infra.infra\_object module
 --------------------------------
 
@@ -51,14 +35,6 @@ feast.infra.key\_encoding\_utils module
    :undoc-members:
    :show-inheritance:
 
-feast.infra.local module
-------------------------
-
-.. automodule:: feast.infra.local
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 feast.infra.passthrough\_provider module
 ----------------------------------------
 
sdk/python/feast/infra/offline_stores/file.py → sdk/python/feast/infra/offline_stores/dask.py (+14 -14)

@@ -39,20 +39,20 @@
 from feast.saved_dataset import SavedDatasetStorage
 from feast.utils import _get_requested_feature_views_to_features_dict
 
-# FileRetrievalJob will cast string objects to string[pyarrow] from dask version 2023.7.1
+# DaskRetrievalJob will cast string objects to string[pyarrow] from dask version 2023.7.1
 # This is not the desired behavior for our use case, so we set the convert-string option to False
 # See (https://github.com/dask/dask/issues/10881#issuecomment-1923327936)
 dask.config.set({"dataframe.convert-string": False})
 
 
-class FileOfflineStoreConfig(FeastConfigBaseModel):
-    """Offline store config for local (file-based) store"""
+class DaskOfflineStoreConfig(FeastConfigBaseModel):
+    """Offline store config for dask store"""
 
-    type: Literal["file"] = "file"
+    type: Union[Literal["dask"], Literal["file"]] = "dask"
     """ Offline store type selector"""
 
 
-class FileRetrievalJob(RetrievalJob):
+class DaskRetrievalJob(RetrievalJob):
     def __init__(
         self,
         evaluation_function: Callable,
@@ -122,7 +122,7 @@ def supports_remote_storage_export(self) -> bool:
         return False
 
 
-class FileOfflineStore(OfflineStore):
+class DaskOfflineStore(OfflineStore):
     @staticmethod
     def get_historical_features(
         config: RepoConfig,
@@ -133,7 +133,7 @@ def get_historical_features(
         project: str,
         full_feature_names: bool = False,
     ) -> RetrievalJob:
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         for fv in feature_views:
             assert isinstance(fv.batch_source, FileSource)
 
@@ -283,7 +283,7 @@ def evaluate_historical_retrieval():
 
             return entity_df_with_features.persist()
 
-        job = FileRetrievalJob(
+        job = DaskRetrievalJob(
             evaluation_function=evaluate_historical_retrieval,
             full_feature_names=full_feature_names,
             on_demand_feature_views=OnDemandFeatureView.get_requested_odfvs(
@@ -309,7 +309,7 @@ def pull_latest_from_table_or_query(
         start_date: datetime,
         end_date: datetime,
     ) -> RetrievalJob:
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         assert isinstance(data_source, FileSource)
 
         # Create lazy function that is only called from the RetrievalJob object
@@ -372,7 +372,7 @@ def evaluate_offline_job():
             return source_df[list(columns_to_extract)].persist()
 
         # When materializing a single feature view, we don't need full feature names. On demand transforms aren't materialized
-        return FileRetrievalJob(
+        return DaskRetrievalJob(
             evaluation_function=evaluate_offline_job,
             full_feature_names=False,
         )
@@ -387,10 +387,10 @@ def pull_all_from_table_or_query(
         start_date: datetime,
         end_date: datetime,
     ) -> RetrievalJob:
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         assert isinstance(data_source, FileSource)
 
-        return FileOfflineStore.pull_latest_from_table_or_query(
+        return DaskOfflineStore.pull_latest_from_table_or_query(
             config=config,
             data_source=data_source,
             join_key_columns=join_key_columns
@@ -410,7 +410,7 @@ def write_logged_features(
         logging_config: LoggingConfig,
         registry: BaseRegistry,
     ):
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         destination = logging_config.destination
         assert isinstance(destination, FileLoggingDestination)
 
@@ -441,7 +441,7 @@ def offline_write_batch(
         table: pyarrow.Table,
         progress: Optional[Callable[[int], Any]],
     ):
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         assert isinstance(feature_view.batch_source, FileSource)
 
         pa_schema, column_names = get_pyarrow_schema_from_batch_source(
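The new `type` annotation lets `DaskOfflineStoreConfig` accept both the new `"dask"` selector and the legacy `"file"` selector, with `"dask"` as the default. Feast validates this field through its pydantic-based `FeastConfigBaseModel`. The sketch below is only a stdlib approximation, not Feast's code: it unpacks the same annotation with `typing.get_args` to show which values it admits. The helpers `allowed_values` and `check_type_selector` are hypothetical.

```python
from typing import Any, Literal, Set, Union, get_args, get_origin

# Same annotation as DaskOfflineStoreConfig.type after this commit.
OfflineStoreType = Union[Literal["dask"], Literal["file"]]
DEFAULT_TYPE = "dask"


def allowed_values(annotation: Any) -> Set[str]:
    """Collect every value admitted by a Literal or a Union of Literals."""
    if get_origin(annotation) is Literal:
        return set(get_args(annotation))
    values: Set[str] = set()
    for member in get_args(annotation):  # members of the Union, if any
        values |= allowed_values(member)
    return values


def check_type_selector(value: str = DEFAULT_TYPE) -> str:
    """Hypothetical validator: accept "dask" or the legacy "file" selector."""
    if value not in allowed_values(OfflineStoreType):
        raise ValueError(f"unsupported offline store type: {value!r}")
    return value


print(sorted(allowed_values(OfflineStoreType)))  # ['dask', 'file']
```

Because `"file"` stays in the union, an existing `feature_store.yaml` that still says `type: file` should continue to validate against the renamed config class.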

sdk/python/feast/repo_config.py (+4 -3)

@@ -68,7 +68,8 @@
 }
 
 OFFLINE_STORE_CLASS_FOR_TYPE = {
-    "file": "feast.infra.offline_stores.file.FileOfflineStore",
+    "file": "feast.infra.offline_stores.dask.DaskOfflineStore",
+    "dask": "feast.infra.offline_stores.dask.DaskOfflineStore",
     "bigquery": "feast.infra.offline_stores.bigquery.BigQueryOfflineStore",
     "redshift": "feast.infra.offline_stores.redshift.RedshiftOfflineStore",
     "snowflake.offline": "feast.infra.offline_stores.snowflake.SnowflakeOfflineStore",
@@ -205,7 +206,7 @@ def __init__(self, **data: Any):
         self.registry_config = data["registry"]
 
         self._offline_store = None
-        self.offline_config = data.get("offline_store", "file")
+        self.offline_config = data.get("offline_store", "dask")
 
         self._online_store = None
         self.online_config = data.get("online_store", "sqlite")
@@ -348,7 +349,7 @@ def _validate_offline_store_config(cls, values: Any) -> Any:
 
         # Set the default type
         if "type" not in values["offline_store"]:
-            values["offline_store"]["type"] = "file"
+            values["offline_store"]["type"] = "dask"
 
         offline_store_type = values["offline_store"]["type"]
 
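After this change, `"file"` and `"dask"` both map to the same class in `OFFLINE_STORE_CLASS_FOR_TYPE`, and a missing `type` now defaults to `"dask"`. The sketch below mimics that resolution with a trimmed copy of the mapping, limited to the entries visible in this commit, and a hypothetical `resolve_offline_store` helper. Feast performs this inside `RepoConfig`, and its real lookup and import logic differ.

```python
from typing import Any, Dict

# Trimmed copy of OFFLINE_STORE_CLASS_FOR_TYPE after this commit
# (only the entries visible in this commit's diff).
OFFLINE_STORE_CLASS_FOR_TYPE = {
    "file": "feast.infra.offline_stores.dask.DaskOfflineStore",
    "dask": "feast.infra.offline_stores.dask.DaskOfflineStore",
    "bigquery": "feast.infra.offline_stores.bigquery.BigQueryOfflineStore",
    "redshift": "feast.infra.offline_stores.redshift.RedshiftOfflineStore",
    "snowflake.offline": "feast.infra.offline_stores.snowflake.SnowflakeOfflineStore",
}


def resolve_offline_store(offline_store: Dict[str, Any]) -> str:
    """Hypothetical helper: map an offline_store config dict to a class path."""
    config = dict(offline_store)  # copy so the caller's dict is not mutated
    # Mirrors _validate_offline_store_config: a missing type now defaults to "dask".
    config.setdefault("type", "dask")
    store_type = config["type"]
    if store_type not in OFFLINE_STORE_CLASS_FOR_TYPE:
        # Simplification: Feast's handling of custom store classes is not reproduced.
        raise ValueError(f"unknown offline store type: {store_type!r}")
    return OFFLINE_STORE_CLASS_FOR_TYPE[store_type]


legacy = resolve_offline_store({"type": "file"})
current = resolve_offline_store({"type": "dask"})
default = resolve_offline_store({})
print(legacy == current == default)  # True
```

Because `"file"` is kept as an alias rather than dropped, existing `feature_store.yaml` files should keep working. Python code that imports from `feast.infra.offline_stores.file` is a different matter, because that module is renamed to `dask.py` here. That is why the test imports in this commit are updated.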
sdk/python/tests/integration/feature_repos/universal/data_sources/file.py (+3 -3)

@@ -20,8 +20,8 @@
 from feast.data_format import DeltaFormat, ParquetFormat
 from feast.data_source import DataSource
 from feast.feature_logging import LoggingDestination
+from feast.infra.offline_stores.dask import DaskOfflineStoreConfig
 from feast.infra.offline_stores.duckdb import DuckDBOfflineStoreConfig
-from feast.infra.offline_stores.file import FileOfflineStoreConfig
 from feast.infra.offline_stores.file_source import (
     FileLoggingDestination,
     SavedDatasetFileStorage,
@@ -84,7 +84,7 @@ def get_prefixed_table_name(self, suffix: str) -> str:
         return f"{self.project_name}.{suffix}"
 
     def create_offline_store_config(self) -> FeastConfigBaseModel:
-        return FileOfflineStoreConfig()
+        return DaskOfflineStoreConfig()
 
     def create_logged_features_destination(self) -> LoggingDestination:
         d = tempfile.mkdtemp(prefix=self.project_name)
@@ -334,7 +334,7 @@ def get_prefixed_table_name(self, suffix: str) -> str:
         return f"{suffix}"
 
     def create_offline_store_config(self) -> FeastConfigBaseModel:
-        return FileOfflineStoreConfig()
+        return DaskOfflineStoreConfig()
 
     def teardown(self):
         self.minio.stop()

sdk/python/tests/unit/infra/offline_stores/test_offline_store.py (+4 -4)

@@ -23,7 +23,7 @@
 from feast.infra.offline_stores.contrib.trino_offline_store.trino import (
     TrinoRetrievalJob,
 )
-from feast.infra.offline_stores.file import FileRetrievalJob
+from feast.infra.offline_stores.dask import DaskRetrievalJob
 from feast.infra.offline_stores.offline_store import RetrievalJob, RetrievalMetadata
 from feast.infra.offline_stores.redshift import (
     RedshiftOfflineStoreConfig,
@@ -100,7 +100,7 @@ def metadata(self) -> Optional[RetrievalMetadata]:
 @pytest.fixture(
     params=[
         MockRetrievalJob,
-        FileRetrievalJob,
+        DaskRetrievalJob,
         RedshiftRetrievalJob,
         SnowflakeRetrievalJob,
         AthenaRetrievalJob,
@@ -112,8 +112,8 @@ def metadata(self) -> Optional[RetrievalMetadata]:
     ]
 )
 def retrieval_job(request, environment):
-    if request.param is FileRetrievalJob:
-        return FileRetrievalJob(lambda: 1, full_feature_names=False)
+    if request.param is DaskRetrievalJob:
+        return DaskRetrievalJob(lambda: 1, full_feature_names=False)
     elif request.param is RedshiftRetrievalJob:
         offline_store_config = RedshiftOfflineStoreConfig(
             cluster_id="feast-int-bucket",
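`DaskRetrievalJob` is built around an `evaluation_function` that runs only later. The store's own comment describes it as a "lazy function that is only called from the RetrievalJob object", which is presumably why the fixture can pass a bare `lambda: 1`. The sketch below isolates that deferred-evaluation pattern with a hypothetical `LazyRetrievalJob` class and `to_result` method. It is not Feast's `RetrievalJob` interface, which also exports to dataframes and Arrow tables and handles on-demand feature views.

```python
from typing import Any, Callable, List


class LazyRetrievalJob:
    """Hypothetical stand-in for a retrieval job that defers its work."""

    def __init__(self, evaluation_function: Callable[[], Any], full_feature_names: bool):
        # Store the callable; nothing is evaluated at construction time.
        self.evaluation_function = evaluation_function
        self.full_feature_names = full_feature_names

    def to_result(self) -> Any:
        # The work happens only when a caller asks for the result.
        return self.evaluation_function()


calls: List[str] = []


def expensive_join() -> List[int]:
    calls.append("join")  # record each evaluation
    return [1, 2, 3]


job = LazyRetrievalJob(expensive_join, full_feature_names=False)
assert calls == []  # constructing the job did not run the join
print(job.to_result())  # [1, 2, 3]
```

This sketch re-runs the function on every call and does no caching. Whether Feast's job caches results is not shown in this diff.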

sdk/python/tests/unit/infra/online_store/test_dynamodb_online_store.py (+2 -2)

@@ -5,7 +5,7 @@
 import pytest
 from moto import mock_dynamodb
 
-from feast.infra.offline_stores.file import FileOfflineStoreConfig
+from feast.infra.offline_stores.dask import DaskOfflineStoreConfig
 from feast.infra.online_stores.dynamodb import (
     DynamoDBOnlineStore,
     DynamoDBOnlineStoreConfig,
@@ -40,7 +40,7 @@ def repo_config():
         provider=PROVIDER,
         online_store=DynamoDBOnlineStoreConfig(region=REGION),
         # online_store={"type": "dynamodb", "region": REGION},
-        offline_store=FileOfflineStoreConfig(),
+        offline_store=DaskOfflineStoreConfig(),
         entity_key_serialization_version=2,
     )
 