URL encoded partitions are getting re-encoded hence throwing not found errors #1896

Closed
mustafahasankhan opened this issue Nov 21, 2023 · 1 comment
Labels
bug Something isn't working

Comments

mustafahasankhan commented Nov 21, 2023

Environment

Delta-rs version: 0.13.0

Binding: Python

Environment: Local and K8s Pod both

  • Cloud provider: AWS
  • OS: macOS Sonoma
  • Other:

Bug

What happened:

  • My Delta Lake table is partitioned by a date column, so my file paths look like:
    date_partition_col=2023-11-05 00%3A00%3A00/myfile.parquet

Notice that Spark automatically URL-encodes the partition value when writing it (the colons are stored as %3A).

  • When I query this Delta table with delta-rs and list its files:
from deltalake import DeltaTable
dt = DeltaTable(delta_lake_path)
print(dt.files())

It gives this output:
['date_partition_col=2023-11-05%2000%253A00%253A00/myfile.parquet']
Notice that the output is a URL-encoded form of my actual file path, which was already encoded: the space became %20 and each % became %25.
As a result, reading these files fails with a 404 error:

HTTP status client error (404 Not Found) for url (https://s3.ap-south-1.amazonaws.com/my_bucket/my_delta_lake/date_partition_col=2023-11-05%2000%253A00%253A00/myfile.parquet)
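
To illustrate the double encoding, here is a minimal sketch that uses only Python's standard library. It does not call delta-rs; the helper name `reencode` and the choice of `safe` characters are assumptions, picked so the result matches the listing above.

```python
from urllib.parse import quote, unquote

# Path as it is stored in the bucket (copied from the report above).
stored = "date_partition_col=2023-11-05 00%3A00%3A00/myfile.parquet"


def reencode(path: str) -> str:
    """Percent-encode a path that is already partly encoded (hypothetical helper).

    '/' and '=' are left untouched; ' ' becomes %20 and '%' becomes %25,
    so an existing %3A turns into %253A.
    """
    return quote(path, safe="/=")


reencoded = reencode(stored)
print(reencoded)

# Decoding exactly once recovers the name that actually exists in storage.
print(unquote(reencoded) == stored)
```

The re-encoded string points at an object key that does not exist, which is consistent with the 404 response.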

What you expected to happen:

  • Already-encoded file paths should not be re-encoded.

How to reproduce it:

  • List or read files from a Delta table whose partition paths are URL-encoded.

More details:
I see that a fix for this issue for the filesystem was already included in the v0.10.2 release, but I am facing exactly the same issue on v0.13.0 (Python), so I had to create a new issue.

@mustafahasankhan mustafahasankhan added the bug Something isn't working label Nov 21, 2023
@ion-elgreco
Collaborator

Should be resolved now
