-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-45862: Fix FixedSizeListBuilder behavior for null slots #45889
base: main
Are you sure you want to change the base?
GH-45862: Fix FixedSizeListBuilder behavior for null slots #45889
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, thanks for the PR! Just one suggestion on the implementation.
Co-authored-by: Paul Taylor <178183+trxcllnt@users.noreply.github.com>
Thanks for the quick review @trxcllnt. I'll let CI run and I've re-requested a review in case you want another look or just to sign off. |
Not directly related, but the C++ FixedSizeListBuilder has null-related issues as well: #41188 |
Thanks for pointing that out @pitrou. |
new FixedSizeList(3, new Field('item', new Int32())), | ||
); | ||
let child = vector.getChildAt(0); | ||
|
||
expect(child).toHaveLength(6); | ||
expect(child?.nullCount).toBe(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem here: In Typescript, Field()
constructor defaults to non-nullable. So the child isn't supposed to have nulls in this case.
Technically, what I was expecting is that the child would just be filled with arbitrary (zero?) values, while the outer array masked them out in the bitmap. I think if the inner array is nullable, it's a good idea to pass down the nulls. But if it's not supposed to be nullable, I think it could cause problems with IPC if we do set nulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue this PR fixes is that the child Builder's nullBitmap isn't being expanded to represent all the slots when a null value is written into the parent Builder. So if you append 1000 null values to the FixedSizeListBuilder
, the child's nullBitmap remain zero-sized. Since we initialize the nullBitmap with zeroes, all we need to do is grow it both when we write non-nulls as well as nulls.
In terms of Field metadata, we don't use that anywhere in JS aside from faithfully propagating it in the IPC format. We assume the contents of the nullBitmap doesn't matter to other implementations if field.nullable == false
(because the description of Field.nullable
here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. I reported this issue because I found the Arrow Rust implementation was rejecting the IPC batches being produced by this code path. I was worried that the nullability would cause incompatibilities as well, but I've just done some additional tests and it doesn't seem to be an issue. I'm good with merging these changes as is. 👍
public setValue(index: number, value: T['TValue']) { | ||
const [child] = this.children; | ||
const start = index * this.stride; | ||
for (let i = -1, n = value.length; ++i < n;) { | ||
for (let i = -1, n = this.stride; ++i < n;) { | ||
child.set(start + i, value[i]); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing where the nulls are handled. This seems like it would error if value
was null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract is that setValue()
is only called if the value is not null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @amoeba !
Rationale for this change
vectorFromArray
could produce an incorrect result when used on aFixedSizeList<T>
with one or more null slots in trailing positions.What changes are included in this PR?
Fixed behavior of the FixedSizeListBuilder.
Are these changes tested?
Yes. This PR includes specific test cases to cover this bug fix.
Are there any user-facing changes?
No.