Correct truncation of AnyValues when using strings or bytes #9269

Open
wants to merge 3 commits into
base: main


timsaucer
Contributor

Related

Closes #8781

What

This PR adjusts the way we process AnyValues. Without this change, there are cases where numpy sets a fixed-length data type based on the first value it receives, which ends up truncating longer values in the data.

Additionally, we cannot simply call pa.array() on every value, because strings and bytes are themselves iterable and would be turned into an array of characters or bytes, respectively.
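To make the two failure modes concrete, here is a minimal sketch. It is not taken from the PR diff; it assumes only stock numpy behavior and plain Python iteration, and the variable names are illustrative.

```python
import numpy as np

# Strings and bytes are themselves iterable, so any code path that treats
# the input as a generic sequence splits them into their elements.
chars = list("abc")        # one single-character string per element
byte_vals = list(b"abc")   # one integer per byte

# NumPy infers a fixed-width unicode dtype from the data it is given.
arr = np.array(["ab"])
width_dtype = arr.dtype    # 2-character unicode dtype

# Writing a longer string into that fixed-width array silently truncates it.
arr[0] = "abcdef"
truncated = str(arr[0])

# An object dtype stores references to Python strings, so nothing is cut off.
obj_arr = np.array(["ab"], dtype=object)
obj_arr[0] = "abcdef"
preserved = obj_arr[0]
```

Whether the SDK hits truncation through exactly this assignment path is an assumption; the sketch only shows that a fixed-width dtype chosen from an early value cannot hold a longer later one.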

This PR makes up to three attempts at converting each value, in order:

  • Call pa.array() directly, with a special case that skips strings and bytes
  • Cast the value to a pyarrow Scalar and build an array from that scalar
  • Fall back to numpy, which handles a wide variety of mixes between lists, tuples, etc.

Added a unit test that captures the failure mode described in the linked issue.

@timsaucer self-assigned this Mar 12, 2025
@timsaucer added the 🪳 bug, include in changelog, and 🐍 Python API labels Mar 12, 2025
@timsaucer
Contributor Author

@rerun-bot full-check


Successfully merging this pull request may close these issues.

PythonAPI: Adding additional AnyValues with lists of strings sometimes give (empty) or truncated text