In this PR, a `basename_template_functor` was added to the C++ dataset writer. With that, it's possible to define arbitrary filenames, for example filenames with 0-padding, as the documentation of the feature itself describes.
Writing datasets in Python, however, only exposes `basename_template` in the `write_dataset` method. This means that, as far as I know, it's fundamentally impossible to write a dataset with 0-padded filenames. This is a problem, since writing files without padding and reading them back in does not preserve the order of rows, even though this could be trivially achieved.
To fix that, it would help if the `basename_template` parameter could be f-string-ish, in the sense that users could define custom 0-padding with, for example, `'part-{i:03d}.parquet'`. Alternatively, it would be even better if users could pass an arbitrary `Callable[[int], str]` here.
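To illustrate the ordering problem and the two proposed options, here is a minimal pure-Python sketch. It does not call pyarrow at all: `current_basename`, `fstringish_basename`, and `padded_basename` are hypothetical helpers, and the plain `{i}` substitution is only my understanding of how the existing template behaves. It also assumes files are picked up in lexicographic filename order when read back.

```python
from typing import Callable, List


def current_basename(i: int) -> str:
    # Assumed current behaviour: the literal "{i}" token is replaced by the index.
    return "part-{i}.parquet".replace("{i}", str(i))


def fstringish_basename(i: int) -> str:
    # Proposal 1 (hypothetical): treat the template as a format string.
    return "part-{i:03d}.parquet".format(i=i)


def padded_basename(i: int) -> str:
    # Proposal 2 (hypothetical): any Callable[[int], str] supplied by the user.
    return f"part-{i:03d}.parquet"


def make_names(make: Callable[[int], str], count: int) -> List[str]:
    # Filenames in the order the writer would produce them.
    return [make(i) for i in range(count)]


unpadded = make_names(current_basename, 12)
padded = make_names(padded_basename, 12)

# Lexicographic order puts part-10 before part-2, so write order is lost.
print(sorted(unpadded) == unpadded)  # False
# With zero-padding, lexicographic order matches write order.
print(sorted(padded) == padded)  # True
```

With 12 files, the unpadded names sort as `part-0`, `part-1`, `part-10`, `part-11`, `part-2`, and so on, while the padded names stay in write order.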
Component(s)
Python