Support options in OpenSearch sink to prevent duplicates by querying OpenSearch #5442

Open
graytaylor0 opened this issue Feb 18, 2025 · 1 comment
Labels
enhancement New feature or request follow up

Comments

@graytaylor0
Member

Is your feature request related to a problem? Please describe.
As a user of the OpenSearch sink for OpenSearch Serverless vector and time series collections, which do not support custom document IDs, I would like to prevent duplicate data from entering OpenSearch.

Describe the solution you'd like
Configuration options in the OpenSearch sink that enable querying OpenSearch for documents that may already exist, so that duplicate documents are not written.

- opensearch:
        ....
        query_for_existing_document:
          query_when: 'getMetadata("potential_duplicate") == true'
          query_term: 'id'
          action_on_found: drop # only option currently
          query_duration: PT3M
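
To make the intended behavior concrete, here is a minimal sketch of the drop decision, written in Python against an in-memory stand-in for the index. Everything in it is an assumption for illustration: `filter_duplicates`, the event layout, and `existing_values` are hypothetical names, not Data Prepper APIs. `query_duration` is left out because its exact semantics are not defined above.

```python
# Hypothetical sketch of the proposed behavior; not Data Prepper code.
def filter_duplicates(events, existing_values, query_term="id"):
    """Return the events that should still be written to the index."""
    to_write = []
    for event in events:
        data, metadata = event["data"], event["metadata"]
        # query_when: only query for events flagged as potential duplicates.
        if metadata.get("potential_duplicate") is True:
            # Stand-in for a term query on `query_term` against OpenSearch.
            if data.get(query_term) in existing_values:
                continue  # action_on_found: drop
        to_write.append(event)
    return to_write


events = [
    {"data": {"id": "a"}, "metadata": {"potential_duplicate": True}},
    {"data": {"id": "b"}, "metadata": {"potential_duplicate": True}},
    {"data": {"id": "a"}, "metadata": {}},
]
# Assume a document with id "a" is already indexed.
kept = filter_duplicates(events, existing_values={"a"})
print([e["data"]["id"] for e in kept])
```

Only events that match `query_when` pay the cost of a lookup; unflagged events are written without a query, even if their `id` already exists.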

Additional context
Add any other context or screenshots about the feature request here.

@dlvenable
Member

@graytaylor0, could we solve this by implementing OpenSearch as a processor to find the existing document, then using the drop_events processor to drop the duplicate? We have #1984 open as a request to query data from OpenSearch. That could help solve this more generically.
