Support options in OpenSearch sink to prevent duplicates by querying OpenSearch #5442

graytaylor0 · 2025-02-18T22:07:55Z

Is your feature request related to a problem? Please describe.
As a user of the OpenSearch sink for OpenSearch Serverless vector and time series collections, which do not support custom document IDs, I would like to prevent duplicate data from entering OpenSearch.

Describe the solution you'd like
Configuration options in the OpenSearch sink that enable querying OpenSearch for documents that may already exist, so that duplicate documents are not written.

- opensearch:
        ....
        query_for_existing_document:
          query_when: 'getMetadata("potential_duplicate") == true'
          query_term: 'id'
          action_on_found: drop # only option currently
          query_duration: PT3M
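
To make the intended behavior concrete, here is a rough sketch of how the sink might apply these options. It is not Data Prepper code: `Event`, `QueryForExistingDocument`, `filter_duplicates`, and the `term_exists` callable are hypothetical names used only for illustration, and `term_exists` stands in for whatever term query the sink would actually send to OpenSearch. The meaning of `query_duration` is not defined above, so the sketch only passes it through to the lookup without interpreting it.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Any, Callable, Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for a pipeline event with data and metadata."""
    data: Dict[str, Any]
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class QueryForExistingDocument:
    """Mirrors the proposed query_for_existing_document options."""
    query_when: Callable[[Event], bool]   # condition that triggers a lookup
    query_term: str                       # event field to match on, e.g. "id"
    query_duration: timedelta             # passed to the lookup; semantics assumed
    action_on_found: str = "drop"         # only option in the proposal


# Assumed signature: (field name, value, query_duration) -> True if a document exists.
TermExists = Callable[[str, Any, timedelta], bool]


def filter_duplicates(
    events: List[Event],
    config: QueryForExistingDocument,
    term_exists: TermExists,
) -> List[Event]:
    """Return the events that should still be written to the index."""
    kept = []
    for event in events:
        # Only events matching query_when pay the cost of a lookup.
        if not config.query_when(event):
            kept.append(event)
            continue

        value = event.data.get(config.query_term)
        found = value is not None and term_exists(
            config.query_term, value, config.query_duration
        )

        # "drop" skips the write when a matching document already exists.
        if found and config.action_on_found == "drop":
            continue
        kept.append(event)
    return kept
```

With `query_when` set to a check on the `potential_duplicate` metadata flag, only flagged events trigger a query, which keeps the extra load on OpenSearch limited to events the pipeline already suspects are duplicates.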



dlvenable · 2025-02-25T21:01:10Z

@graytaylor0, could we solve this by implementing OpenSearch as a processor to find the existing document, then using the drop_events processor to drop the event? We have #1984 open as a request to query data from OpenSearch. That could help solve this more generically.

graytaylor0 added the untriaged label Feb 18, 2025

github-project-automation bot added this to Data Prepper Tracking Board Feb 18, 2025

github-project-automation bot moved this to Unplanned in Data Prepper Tracking Board Feb 18, 2025

graytaylor0 added enhancement and removed untriaged labels Feb 18, 2025

github-actions bot added the untriaged label Feb 18, 2025

dlvenable removed the untriaged label Feb 25, 2025

dlvenable added the follow up label Feb 25, 2025
