From f0db9583f7c13dd4674657cf19dc8ef2a6ddec21 Mon Sep 17 00:00:00 2001 From: Francisco Arceo Date: Wed, 2 Oct 2024 14:15:55 -0400 Subject: [PATCH 1/3] Update beta-on-demand-feature-view.md --- docs/reference/beta-on-demand-feature-view.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/reference/beta-on-demand-feature-view.md b/docs/reference/beta-on-demand-feature-view.md index 6b4c3c667a0..1478e72d3b6 100644 --- a/docs/reference/beta-on-demand-feature-view.md +++ b/docs/reference/beta-on-demand-feature-view.md @@ -6,14 +6,14 @@ On demand feature views allows data scientists to use existing features and request time data (features only available at request time) to transform and create new features. Users define python transformation logic which is executed in both historical retrieval and online retrieval paths. -Currently, these transformations are executed locally. This is fine for online serving, but does not scale well offline. +Currently, these transformations are executed on reads and locally. This is fine for online serving, but does not scale well offline. ### Why use on demand feature views? This enables data scientists to easily impact the online feature retrieval path. For example, a data scientist could 1. Call `get_historical_features` to generate a training dataframe -2. Iterate in notebook on feature engineering in Pandas +2. Iterate in notebook on feature engineering in Pandas/Python 3. Copy transformation logic into on demand feature views and commit to a dev branch of the feature repository 4. Verify with `get_historical_features` (on a small dataset) that the transformation gives expected output over historical data 5. Verify with `get_online_features` on dev branch that the transformation correctly outputs online features From 5c625480b6e68ee59d82e7603aa6b0b3fbcf562f Mon Sep 17 00:00:00 2001 From: Francisco Javier Arceo Date: Sun, 6 Oct 2024 21:44:54 -0400 Subject: [PATCH 2/3] updated docs Signed-off-by: Francisco Javier Arceo --- docs/reference/beta-on-demand-feature-view.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/reference/beta-on-demand-feature-view.md b/docs/reference/beta-on-demand-feature-view.md index 1478e72d3b6..6ccc648616c 100644 --- a/docs/reference/beta-on-demand-feature-view.md +++ b/docs/reference/beta-on-demand-feature-view.md @@ -4,9 +4,9 @@ ## Overview -On demand feature views allows data scientists to use existing features and request time data (features only available at request time) to transform and create new features. Users define python transformation logic which is executed in both historical retrieval and online retrieval paths. - -Currently, these transformations are executed on reads and locally. This is fine for online serving, but does not scale well offline. +On demand feature views allows data scientists to use existing features and request time data (features only available +at request time) to transform and create new features at the time the data is read from the online store. Users define +python transformation logic which is executed in both historical retrieval and online retrieval paths. ### Why use on demand feature views? From a1fa7bda159a68162f8abdf9eea36f84d0409907 Mon Sep 17 00:00:00 2001 From: Francisco Javier Arceo Date: Sun, 6 Oct 2024 22:10:09 -0400 Subject: [PATCH 3/3] feat: Adding documentation for On Demand Feature Transformations with writes Signed-off-by: Francisco Javier Arceo --- docs/reference/beta-on-demand-feature-view.md | 76 +++++++++++++++++-- 1 file changed, 69 insertions(+), 7 deletions(-) diff --git a/docs/reference/beta-on-demand-feature-view.md b/docs/reference/beta-on-demand-feature-view.md index 6ccc648616c..55fe534446e 100644 --- a/docs/reference/beta-on-demand-feature-view.md +++ b/docs/reference/beta-on-demand-feature-view.md @@ -4,9 +4,16 @@ ## Overview -On demand feature views allows data scientists to use existing features and request time data (features only available -at request time) to transform and create new features at the time the data is read from the online store. Users define -python transformation logic which is executed in both historical retrieval and online retrieval paths. +On Demand Feature Views (ODFVs) allow data scientists to use existing features and request-time data (features only +available at request time) to transform and create new features. Users define Python transformation logic which is +executed during both historical retrieval and online retrieval. Additionally, ODFVs provide flexibility in +applying transformations either during data ingestion (at write time) or during feature retrieval (at read time), +controlled via the `write_to_online_store` parameter. + +By setting `write_to_online_store=True`, transformations are applied during data ingestion, and the transformed +features are stored in the online store. This can improve online feature retrieval performance by reducing computation +during reads. Conversely, if `write_to_online_store=False` (the default if omitted), transformations are applied during +feature retrieval. ### Why use on demand feature views? @@ -14,10 +21,11 @@ This enables data scientists to easily impact the online feature retrieval path. 1. Call `get_historical_features` to generate a training dataframe 2. Iterate in notebook on feature engineering in Pandas/Python -3. Copy transformation logic into on demand feature views and commit to a dev branch of the feature repository +3. Copy transformation logic into ODFVs and commit to a development branch of the feature repository 4. Verify with `get_historical_features` (on a small dataset) that the transformation gives expected output over historical data -5. Verify with `get_online_features` on dev branch that the transformation correctly outputs online features -6. Submit a pull request to the staging / prod branches which impact production traffic +5. Decide whether to apply the transformation on writes or on reads by setting the `write_to_online_store` parameter accordingly. +6. Verify with `get_online_features` on dev branch that the transformation correctly outputs online features +7. Submit a pull request to the staging / prod branches which impact production traffic ## CLI @@ -32,10 +40,18 @@ See [https://github.com/feast-dev/on-demand-feature-views-demo](https://github.c ### **Registering transformations** -On Demand Transformations support transformations using Pandas and native Python. Note, Native Python is much faster but not yet tested for offline retrieval. +On Demand Transformations support transformations using Pandas and native Python. Note, Native Python is much faster +but not yet tested for offline retrieval. + +When defining an ODFV, you can control when the transformation is applied using the write_to_online_store parameter: + +- `write_to_online_store=True`: The transformation is applied during data ingestion (on write), and the transformed features are stored in the online store. +- `write_to_online_store=False` (default when omitted): The transformation is applied during feature retrieval (on read). We register `RequestSource` inputs and the transform in `on_demand_feature_view`: +## Example of an On Demand Transformation on Read + ```python from feast import Field, RequestSource from feast.types import Float64, Int64 @@ -100,6 +116,52 @@ def transformed_conv_rate_python(inputs: Dict[str, Any]) -> Dict[str, Any]: return output ``` +## Example of an On Demand Transformation on Write + +```python +from feast import Field, on_demand_feature_view +from feast.types import Float64 +import pandas as pd + +# Existing Feature View +driver_hourly_stats_view = ... + +# Define an ODFV without RequestSource +@on_demand_feature_view( + sources=[driver_hourly_stats_view], + schema=[ + Field(name='conv_rate_adjusted', dtype=Float64), + ], + mode="pandas", + write_to_online_store=True, # Apply transformation during write time +) +def transformed_conv_rate(features_df: pd.DataFrame) -> pd.DataFrame: + df = pd.DataFrame() + df['conv_rate_adjusted'] = features_df['conv_rate'] * 1.1 # Adjust conv_rate by 10% + return df +``` +Then to ingest the data with the new feature view make sure to include all of the input features required for the +transformations: + +```python +from feast import FeatureStore +import pandas as pd + +store = FeatureStore(repo_path=".") + +# Data to ingest +data = pd.DataFrame({ + "driver_id": [1001], + "event_timestamp": [pd.Timestamp.now()], + "conv_rate": [0.5], + "acc_rate": [0.8], + "avg_daily_trips": [10], +}) + +# Ingest data to the online store +store.push("driver_hourly_stats_view", data) +``` + ### **Feature retrieval** {% hint style="info" %}