There appears to be a significant difference in deduplication behavior between Pinot v1.2.0 and v1.3.0. The behavior change affects how records are deduplicated based on the dedupTimeColumn and metadataTTL settings.
Environment
Affected Pinot Versions:
v1.3.0 (new behavior)
v1.2.0 (previous behavior)
Deduplication Behavior Differences
In v1.3.0:
Records are only deduped if at least one ingested record's dedupTimeColumn value is no more than metadataTTL older than the current time
If a record within the TTL is inserted, then deduping works
Records outside the TTL are successfully inserted even if the data is the same (potential duplicates)
Once one record within the TTL is encountered, the primary key is created and all future records with the same primary key value get deduped
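The v1.3.0 behavior described above can be sketched as a small toy model. This is only an illustration of what was observed, not Pinot's actual implementation: the class name `ObservedV130Dedup`, the `ingest` method, and the use of plain numeric timestamps are all hypothetical, and the TTL check against "current time" follows the description in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class ObservedV130Dedup:
    """Toy model of the v1.3.0 behavior reported in this issue (hypothetical names)."""

    metadata_ttl: float  # same unit as the dedup time column values
    seen_keys: set = field(default_factory=set)

    def ingest(self, primary_key, dedup_time, now):
        """Return True if the record is stored, False if it is dropped as a duplicate."""
        # A key that was already registered dedupes every later record.
        if primary_key in self.seen_keys:
            return False
        # The key is only registered when the record falls within the TTL.
        if now - dedup_time <= self.metadata_ttl:
            self.seen_keys.add(primary_key)
        # Records outside the TTL are stored without registering the key,
        # so identical records can be stored again.
        return True
```

With `metadata_ttl=60` and `now=1000`, two records for the same key with time `100` would both be stored under this model, while a record with time `990` registers the key and causes later records for that key to be dropped.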
In v1.2.0:
The dedupTimeColumn doesn't seem to affect deduplication
Any record inserted into Pinot gets the primary key generated irrespective of time column value
Future records with the same primary key value get deduped consistently
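For comparison, the v1.2.0 behavior described above amounts to plain primary-key deduplication. The sketch below is a toy model of that observation only, not Pinot code; `ObservedV120Dedup` and `ingest` are hypothetical names, and the time value is accepted but deliberately ignored.

```python
class ObservedV120Dedup:
    """Toy model of the v1.2.0 behavior reported in this issue (hypothetical names)."""

    def __init__(self):
        self.seen_keys = set()

    def ingest(self, primary_key, dedup_time=None):
        """Return True if the record is stored, False if it is dropped as a duplicate."""
        # The time column value plays no role: every stored record registers its key.
        if primary_key in self.seen_keys:
            return False
        self.seen_keys.add(primary_key)
        return True
```

Under this model the first record for a key is always stored and every later record with that key is dropped, whatever its time column value.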
Expected Behavior
Deduplication should work consistently across versions and should properly deduplicate records based on the primary key, regardless of the time column values.
Table Configuration
Table Schema
Table Config
Observations
When using v1.2.0, the following warning appears during table addition, suggesting that the dedupTimeColumn and metadataTTL properties might not be recognized or used in this version:
Impact
This behavior change can lead to:
Proposed Solution
Either:
Additional Information
Related Slack thread with more info: https://apache-pinot.slack.com/archives/C011C9JHN7R/p1740757158048619