[Python] Expose setting basename_template_functor in Python or make basename_template padding-compatible #45851

Open
jonded94 opened this issue Mar 18, 2025 · 0 comments

jonded94 commented Mar 18, 2025

Describe the enhancement requested

In this PR, a basename_template_functor was added to the C++ dataset writer. With it, it is possible to define arbitrary filenames, for example filenames with 0-padding, as the documentation of the feature itself describes.

Writing datasets in Python, however, only exposes basename_template in the write_dataset method. As far as I know, this makes it fundamentally impossible to write a dataset with 0-padded filenames. This is a problem because writing files without padding and reading them back in does not preserve the order of rows, even though preserving it would be trivial.
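To illustrate the ordering problem, here is a minimal sketch in plain Python. It assumes the files are read back in lexicographic path order; the file names and the count of 12 are made up for the example and are not taken from any real dataset.

```python
# Names as they would look with an unpadded counter such as "part-{i}.parquet".
unpadded = [f"part-{i}.parquet" for i in range(12)]

# Lexicographic sorting (assumed to be how the files are ordered on read)
# places "part-10" and "part-11" ahead of "part-2".
unpadded_sorted = sorted(unpadded)

# The same counter with 0-padding keeps write order and sort order identical.
padded = [f"part-{i:03d}.parquet" for i in range(12)]
padded_sorted = sorted(padded)

print(unpadded_sorted[:4])
print(padded_sorted[:4])
```

With unpadded names the sorted order no longer matches the order in which the files were written, while the padded names sort back into write order.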

To address this, the basename_template parameter could be made f-string-like, so that users could define custom 0-padding with, for example, 'part-{i:03d}.parquet'. Alternatively, and even better, users could be allowed to pass an arbitrary Callable[[int], str] here.
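As a rough sketch of the semantics I have in mind (not how pyarrow works today), the helper below accepts either a format-style template or a callable. The names render_basename and BasenameTemplate are hypothetical and exist only for this illustration.

```python
from typing import Callable, Union

# Hypothetical type for the proposed parameter: a template string or a callable.
BasenameTemplate = Union[str, Callable[[int], str]]


def render_basename(template: BasenameTemplate, i: int) -> str:
    """Return the file name for the i-th written file (illustrative only)."""
    if callable(template):
        # Callable[[int], str]: the user has full control over the name.
        return template(i)
    # str.format honors format specs, so "{i:03d}" pads while a plain
    # "{i}" keeps producing the unpadded name.
    return template.format(i=i)


print(render_basename("part-{i}.parquet", 7))
print(render_basename("part-{i:03d}.parquet", 7))
print(render_basename(lambda i: f"chunk_{i:05d}.parquet", 42))
```

Under these semantics a plain '{i}' template would render the same names as before, so existing calls would presumably keep working.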

Component(s)

Python
