Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-45862: Fix FixedSizeListBuilder behavior for null slots #45889

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

amoeba
Copy link
Member

@amoeba amoeba commented Mar 22, 2025

Rationale for this change

vectorFromArray could produce an incorrect result when used on a FixedSizeList<T> with one or more null slots in trailing positions.

What changes are included in this PR?

Fixed behavior of the FixedSizeListBuilder.

Are these changes tested?

Yes. This PR includes specific test cases to cover this bug fix.

Are there any user-facing changes?

No.

Copy link

⚠️ GitHub issue #45862 has been automatically assigned in GitHub to PR creator.

@amoeba
Copy link
Member Author

amoeba commented Mar 22, 2025

Hi @domoritz @trxcllnt, do either of you have time to review this? This was my first real dive into the JS codebase so extra scrutiny is probably warranted.

Copy link
Contributor

@trxcllnt trxcllnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks for the PR! Just one suggestion on the implementation.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 23, 2025
Co-authored-by: Paul Taylor <178183+trxcllnt@users.noreply.github.com>
@amoeba amoeba requested a review from trxcllnt March 23, 2025 21:46
@amoeba
Copy link
Member Author

amoeba commented Mar 23, 2025

Thanks for the quick review @trxcllnt. I'll let CI run and I've re-requested a review in case you want another look or just to sign off.

@amoeba amoeba changed the title GH-45862: Modify FixedSizeListBuilder behavior for null slots GH-45862: Fix FixedSizeListBuilder behavior for null slots Mar 23, 2025
@pitrou
Copy link
Member

pitrou commented Mar 24, 2025

Not directly related, but the C++ FixedSizeListBuilder has null-related issues as well: #41188

@amoeba
Copy link
Member Author

amoeba commented Mar 24, 2025

Thanks for pointing that out @pitrou.

Comment on lines +333 to +338
new FixedSizeList(3, new Field('item', new Int32())),
);
let child = vector.getChildAt(0);

expect(child).toHaveLength(6);
expect(child?.nullCount).toBe(3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem here: In Typescript, Field() constructor defaults to non-nullable. So the child isn't supposed to have nulls in this case.

Technically, what I was expecting is that the child would just be filled with arbitrary (zero?) values, while the outer array masked them out in the bitmap. I think if the inner array is nullable, it's a good idea to pass down the nulls. But if it's not supposed to be nullable, I think it could cause problems with IPC if we do set nulls.

Copy link
Contributor

@trxcllnt trxcllnt Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue this PR fixes is that the child Builder's nullBitmap isn't being expanded to represent all the slots when a null value is written into the parent Builder. So if you append 1000 null values to the FixedSizeListBuilder, the child's nullBitmap remain zero-sized. Since we initialize the nullBitmap with zeroes, all we need to do is grow it both when we write non-nulls as well as nulls.

In terms of Field metadata, we don't use that anywhere in JS aside from faithfully propagating it in the IPC format. We assume the contents of the nullBitmap doesn't matter to other implementations if field.nullable == false (because the description of Field.nullable here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. I reported this issue because I found the Arrow Rust implementation was rejecting the IPC batches being produced by this code path. I was worried that the nullability would cause incompatibilities as well, but I've just done some additional tests and it doesn't seem to be an issue. I'm good with merging these changes as is. 👍

Comment on lines 24 to 30
public setValue(index: number, value: T['TValue']) {
const [child] = this.children;
const start = index * this.stride;
for (let i = -1, n = value.length; ++i < n;) {
for (let i = -1, n = this.stride; ++i < n;) {
child.set(start + i, value[i]);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing where the nulls are handled. This seems like it would error if value was null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contract is that setValue() is only called if the value is not null.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 24, 2025
Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @amoeba !

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants