Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix platform instance support on Druid ingestion #12716

Merged
merged 4 commits into from
Feb 26, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/how/updating-datahub.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ This file documents any backwards-incompatible changes in DataHub and assists pe

- #12671: The `priority` field of the Incident entity is changed from an integer to an enum. This field was previously completely unused in UI and API, so this change should not affect existing deployments.

- #12716: Fix the `platform_instance` being added twice to the URN. If you want to have the previous behavior back, you need to add your platform_instance twice (i.e. `plat.plat`).


### Known Issues

Expand Down
6 changes: 1 addition & 5 deletions metadata-ingestion/src/datahub/ingestion/source/sql/druid.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,11 +50,7 @@
"""

def get_identifier(self, schema: str, table: str) -> str:
return (
f"{self.platform_instance}.{table}"
if self.platform_instance
else f"{table}"
)
return f"{table}"

Check warning on line 53 in metadata-ingestion/src/datahub/ingestion/source/sql/druid.py

View check run for this annotation

Codecov / codecov/patch

metadata-ingestion/src/datahub/ingestion/source/sql/druid.py#L53

Added line #L53 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is restoring the original implementation, from 2021 https://github.com/datahub-project/datahub/pull/2284/files
which was updated on 2022 https://github.com/datahub-project/datahub/pull/3996/files#diff-d2b27025056d73e6bf1508f17c065c63dd4249caacf26de6199aaeaf446eec60

Have we been producing "duplicated" platform instance since then? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, pretty sure that is the case, or something along the way changed.
But you can see that the default SQLAlchemy has no mention of a platform_instance:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py#L576

If you check any other SQL ingestion with the platform_instance support, e.g. mssql, you can see that also there there is no need to specify the platform_instance at this level:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/mssql/source.py#L731

The Druid ingestion is probably not used a lot, because you will other big inconsistencies on the ingestion. For example, the Druid table is always hardcoded to druid/v2/sql because of the sql alchemy implementation that by default will use the URI to compute the table name. This is probably something that should be fixed at some point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integration tests for druid source were missed; I just added them here #12717
And indeed, the golden file shows the platform instance appears twice in the generated URNs for datasets.

Let's merge PR with integration tests first and then yours. Thanks for raising the issue and the fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will wait on #12717 to get merged, and rebase this PR with your changes.
Thanks a lot on the support!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #12717 merged



@platform_name("Druid")
Expand Down
Loading
Loading