refactor!: do not default the struct array length to 0 in Struct::try_new #7247

Open
wants to merge 3 commits into
base: main

westonpace
Member

Which issue does this PR close?

Closes #7246.

Rationale for this change

See PR

What changes are included in this PR?

  • StructArray::try_new will now return an error if there are no arrays provided
  • StructArray::new will panic if there are no arrays provided
  • StructArray::from(vec![]) will panic
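To illustrate the behavior listed above, here is a minimal sketch of the length-inference rule. `ToyStruct` and its `Vec<i64>` children are hypothetical stand-ins, not the arrow-rs types, and the error messages are invented for the example. The point is only that a constructor which infers the row count from its first child has nothing to infer from when no children are given, so it reports an error rather than defaulting to 0.

```rust
/// Hypothetical stand-in for a struct array; each child is a plain Vec<i64>.
/// This is NOT the arrow-rs implementation, only a model of the length rule.
#[derive(Debug)]
pub struct ToyStruct {
    len: usize,
    children: Vec<Vec<i64>>,
}

impl ToyStruct {
    /// Infers the row count from the first child and errors when there is none.
    pub fn try_new(children: Vec<Vec<i64>>) -> Result<Self, String> {
        let len = match children.first() {
            Some(first) => first.len(),
            // No children: the length is unknowable, so refuse to guess 0.
            None => return Err("cannot infer length: no child arrays".to_string()),
        };
        // Every child must agree with the inferred length.
        for (i, child) in children.iter().enumerate() {
            if child.len() != len {
                return Err(format!("child {i} has length {}, expected {len}", child.len()));
            }
        }
        Ok(Self { len, children })
    }

    /// Panicking counterpart, mirroring the new / try_new split.
    pub fn new(children: Vec<Vec<i64>>) -> Self {
        Self::try_new(children).expect("invalid struct children")
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn num_columns(&self) -> usize {
        self.children.len()
    }
}
```

With this model, `ToyStruct::try_new(vec![])` returns an `Err`, and `ToyStruct::new(vec![])` panics, which matches the three bullets above in spirit.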

Are there any user-facing changes?

BREAKING CHANGE: StructArray::try_new will now return an error if no child arrays are provided.

github-actions bot added the arrow (Changes to the arrow crate) label on Mar 6, 2025
@tustvold
Contributor

tustvold commented Mar 6, 2025

Since this is a breaking change regardless, I think we should probably just add the length as an explicit argument and avoid potential errors/panics. I don't feel strongly, though. Personally, I think empty StructArrays are something the Arrow spec probably shouldn't permit, but that ship has sailed...

Perhaps @alamb has thoughts on the matter.

@westonpace
Member Author

Ah, as you were typing that I saw the suggestion in #6732 to add try_new_with_length (yet another alternative). I've implemented that and changed the docs for try_new to encourage try_new_with_length, but I agree it might be more straightforward to force users to do the inference themselves.
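A rough sketch of how the two constructors could coexist, assuming the inferring one simply delegates to the explicit one. `ToyStruct` is again a hypothetical stand-in with `Vec<i64>` children, and the parameter order shown for `try_new_with_length` is an assumption made for the example, not taken from the PR.

```rust
/// Hypothetical stand-in for a struct array (not the arrow-rs type).
#[derive(Debug)]
pub struct ToyStruct {
    len: usize,
    children: Vec<Vec<i64>>,
}

impl ToyStruct {
    /// Explicit length: valid even when there are zero children.
    /// The argument order here is an assumption for illustration.
    pub fn try_new_with_length(children: Vec<Vec<i64>>, len: usize) -> Result<Self, String> {
        for (i, child) in children.iter().enumerate() {
            if child.len() != len {
                return Err(format!("child {i} has length {}, expected {len}", child.len()));
            }
        }
        Ok(Self { len, children })
    }

    /// Inferring variant: takes the length from the first child, then delegates.
    /// With no children it returns an error instead of assuming 0.
    pub fn try_new(children: Vec<Vec<i64>>) -> Result<Self, String> {
        let len = children
            .first()
            .map(|c| c.len())
            .ok_or_else(|| "no child arrays; use try_new_with_length".to_string())?;
        Self::try_new_with_length(children, len)
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn num_columns(&self) -> usize {
        self.children.len()
    }
}
```

In this sketch a caller who really wants an empty struct with 5 rows writes `ToyStruct::try_new_with_length(vec![], 5)`, while `ToyStruct::try_new(vec![])` fails with a message pointing at the explicit variant.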

@tustvold
Contributor

tustvold commented Mar 6, 2025

which suggested adding try_new_with_length (yet another alternative)

I'm personally less of a fan of this, as it isn't materially different from the current state of play, where there are multiple alternative methods.

I guess I was thinking something along the lines of

pub fn try_new(
    len: usize,
    fields: Fields,
    arrays: Vec<ArrayRef>,
    nulls: Option<NullBuffer>,
) -> Result<Self, ArrowError>

That being said, this would potentially be quite a disruptive change, and I am not sure how common empty StructArrays actually are. I can't help feeling that most of the time they'd arise from a deficiency in a projection system, rather than from someone actually creating them...
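To make the projection case concrete, here is a small sketch of a single constructor that takes the length first, as in the signature above. `ToyBatch`, its named `Vec<i64>` columns, and `project` are all hypothetical names invented for this example. It shows how projecting onto zero columns can keep the row count when the length is passed explicitly rather than inferred.

```rust
/// Hypothetical stand-in for a struct array with named i64 columns.
#[derive(Debug)]
pub struct ToyBatch {
    len: usize,
    columns: Vec<(String, Vec<i64>)>,
}

impl ToyBatch {
    /// Single constructor with an explicit length, so nothing is inferred.
    pub fn try_new(len: usize, columns: Vec<(String, Vec<i64>)>) -> Result<Self, String> {
        for (name, values) in &columns {
            if values.len() != len {
                return Err(format!("column {name} has length {}, expected {len}", values.len()));
            }
        }
        Ok(Self { len, columns })
    }

    /// Keeps only the named columns; the row count is carried over explicitly,
    /// so an empty projection still reports the original number of rows.
    pub fn project(&self, names: &[&str]) -> Result<Self, String> {
        let mut picked = Vec::with_capacity(names.len());
        for &name in names {
            let col = self
                .columns
                .iter()
                .find(|(n, _)| n.as_str() == name)
                .ok_or_else(|| format!("no column named {name}"))?;
            picked.push(col.clone());
        }
        Self::try_new(self.len, picked)
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn num_columns(&self) -> usize {
        self.columns.len()
    }
}
```

The trade-off is the one raised above: every call site has to supply `len`, which is more disruptive but leaves no case where the constructor has to guess.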

For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in.

Labels
arrow Changes to the arrow crate

Successfully merging this pull request may close these issues.

StructArray::try_new behavior can be unexpected when there are no child arrays
2 participants