
[JS] Incorrect FSL values length when last element is null #45862

Open
wjones127 opened this issue Mar 19, 2025 · 2 comments

@wjones127
Member

Describe the bug, including details regarding any error messages, version, and platform.

`vectorFromArray` can produce an invalid FixedSizeList array whose child values array does not have length `list_size * length`. Other implementations then error when they receive an IPC batch written from JS.
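The invariant being violated fits in a one-line check. The sketch below is a hypothetical helper, not an apache-arrow API, and assumes an unsliced array (offset 0). For an Arrow JS vector `v` you would pass `v.length`, `v.type.listSize`, and `v.getChildAt(0).length`; the numbers used here come from the two vectors in this report.

```javascript
// Hypothetical helper (not an apache-arrow API): checks the FixedSizeList
// invariant for an unsliced array, i.e. child length === listSize * length.
function checkFixedSizeListLengths({ length, listSize, childLength }) {
  const expected = listSize * length;
  return { ok: childLength === expected, expected, actual: childLength };
}

// Numbers from this report: 2 rows, list_size 3.
const bad = checkFixedSizeListLengths({ length: 2, listSize: 3, childLength: 3 });
const good = checkFixedSizeListLengths({ length: 2, listSize: 3, childLength: 6 });
console.log(bad);  // { ok: false, expected: 6, actual: 3 }
console.log(good); // { ok: true, expected: 6, actual: 6 }
```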

This only seems to happen when the null is at the end. If it's at the beginning, it works fine.

```js
let arrow = require("apache-arrow");

let badArray = arrow.vectorFromArray(
    [[1, 2, 3], null],
    new arrow.FixedSizeList(3, new arrow.Field("item", new arrow.Float32())),
);
badArray.getChildAt(0)
```

```
Vector [FloatVector<Float>] {
  isValid: [Function (anonymous)],
  get: [Function (anonymous)],
  set: [Function (anonymous)],
  indexOf: [Function (anonymous)],
  _offsets: [ 0, 3 ],
  data: [
    Data {
      type: [Float32 [Float]],
      children: [],
      dictionary: undefined,
      offset: 0,
      length: 3,
      _nullCount: 0,
      stride: 1,
      values: [Float32Array],
      nullBitmap: Uint8Array(0) []
    }
  ],
  type: Float32 [Float] { typeId: 3, precision: 1 },
  stride: 1,
  numChildren: 0,
  length: 3 // <-- This is incorrect!
}
```

```js
let goodArray = arrow.vectorFromArray(
    [null, [1, 2, 3]],
    new arrow.FixedSizeList(3, new arrow.Field("item", new arrow.Float32())),
);
goodArray.getChildAt(0) // null
```

```
Vector [FloatVector<Float>] {
  isValid: [Function (anonymous)],
  get: [Function (anonymous)],
  set: [Function (anonymous)],
  indexOf: [Function (anonymous)],
  _offsets: [ 0, 6 ],
  data: [
    Data {
      type: [Float32 [Float]],
      children: [],
      dictionary: undefined,
      offset: 0,
      length: 6,
      _nullCount: 3,
      stride: 1,
      values: [Float32Array],
      nullBitmap: [Uint8Array]
    }
  ],
  type: Float32 [Float] { typeId: 3, precision: 1 },
  stride: 1,
  numChildren: 0,
  length: 6 // Correct :)
}
```
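For comparison, here is a toy model of the expected behavior: every null row still reserves `listSize` child slots, written at the moment the null is appended, so a trailing null cannot leave the child short. This is a hypothetical sketch in plain JavaScript, not the apache-arrow `FixedSizeListBuilder`, and it makes no claim about how that builder is implemented; the helper name and the placeholder value `0` are assumptions. Marking the placeholder slots invalid mirrors the `_nullCount: 3` seen on `goodArray`'s child.

```javascript
// Hypothetical toy model (not the apache-arrow FixedSizeListBuilder):
// flattens FixedSizeList rows into a child values array, eagerly writing
// listSize invalid placeholder slots for every null row.
function buildFixedSizeList(rows, listSize) {
  const values = [];     // flattened child values
  const childValid = []; // child validity, one entry per child slot
  const valid = [];      // parent validity, one entry per row
  for (const row of rows) {
    if (row === null) {
      valid.push(false);
      for (let i = 0; i < listSize; i++) {
        values.push(0); // placeholder; its content is never read
        childValid.push(false);
      }
    } else {
      if (row.length !== listSize) {
        throw new RangeError(`expected ${listSize} values, got ${row.length}`);
      }
      valid.push(true);
      for (const v of row) {
        values.push(v);
        childValid.push(true);
      }
    }
  }
  return {
    length: rows.length,
    nullCount: valid.filter((v) => !v).length,
    childLength: values.length,
    childNullCount: childValid.filter((v) => !v).length,
    values,
  };
}

const trailing = buildFixedSizeList([[1, 2, 3], null], 3);
const leading = buildFixedSizeList([null, [1, 2, 3]], 3);
console.log(trailing.childLength, leading.childLength); // 6 6
```

Writing the placeholders when the null is appended, rather than lazily when a later row arrives, is what keeps the null position from mattering.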

We caught this when reading via IPC into arrow-rs.

Component(s)

JavaScript

@wjones127
Member Author

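This can be used as a workaround for FixedSizeList columns of floats. It appends a non-null dummy row so the last element is never null, then slices that row back off: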

```js
const arrow = require("apache-arrow");

function patchedVectorFromArray(data, type) {
    // Only patch FixedSizeList types whose values are floats
    if (arrow.DataType.isFixedSizeList(type) && arrow.DataType.isFloat(type.valueType)) {
        // Append a non-null dummy row so the last element is never null...
        let extendedData = [...data, new Array(type.listSize).fill(0.0)];
        let array = arrow.vectorFromArray(extendedData, type);
        // ...then drop the dummy row again
        return array.slice(0, data.length);
    } else {
        return arrow.vectorFromArray(data, type);
    }
}
```
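A variation on this workaround, sketched under the assumption (taken from the report) that only a trailing null triggers the bug: pad only when the last row is null, and let the caller choose the fill value so the trick is not tied to float lists. `padTrailingNull` is a hypothetical helper that only prepares the input; it does not call Arrow.

```javascript
// Hypothetical helper: appends a filler row only when the last row is null.
// fillValue must be valid for the list's value type (e.g. 0 for numbers).
function padTrailingNull(rows, listSize, fillValue) {
  const originalLength = rows.length;
  const lastIsNull = originalLength > 0 && rows[originalLength - 1] === null;
  if (!lastIsNull) {
    // Nothing to do: hand back the caller's array unchanged.
    return { rows, originalLength, padded: false };
  }
  const filler = new Array(listSize).fill(fillValue);
  return { rows: [...rows, filler], originalLength, padded: true };
}

const out = padTrailingNull([[1, 2, 3], null], 3, 0);
console.log(out.rows.length, out.originalLength, out.padded); // 3 2 true
```

With Arrow JS you would then call `arrow.vectorFromArray(out.rows, type)` and, when `out.padded` is true, `.slice(0, out.originalLength)` on the result, as `patchedVectorFromArray` does.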

@amoeba
Member

amoeba commented Mar 22, 2025

Hi @wjones127, I put up a PR for this. When testing, I noticed the issue wasn't constrained to FixedSizeList and that it was a more general behavior of the FixedSizeListBuilder. I didn't tag you as a reviewer but if you wanted to have a look that'd be very welcome.
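Since the behavior appears to be more general than the original repro, a regression check could cover every null placement rather than only the first and last row. The harness below is a hypothetical sketch, not taken from the Arrow test suite. `build` is any function returning `{ length, childLength }`; for Arrow JS it could wrap `arrow.vectorFromArray(rows, type)` and read `v.length` and `v.getChildAt(0).length`. `makeBuggyBuild` is a stand-in that only mimics the two cases in this report (trailing nulls lose their child slots) and makes no claim about how apache-arrow is implemented.

```javascript
// Hypothetical regression harness (not from the Arrow test suite).
// For each position of a single null among n rows, plus an all-null case,
// call build(rows) and flag results where childLength !== length * listSize.
function nullPlacementCases(n, listSize) {
  const row = Array.from({ length: listSize }, (_, i) => i + 1); // e.g. [1, 2, 3]
  const cases = [];
  for (let pos = 0; pos < n; pos++) {
    cases.push(Array.from({ length: n }, (_, i) => (i === pos ? null : row)));
  }
  cases.push(new Array(n).fill(null)); // every row null
  return cases;
}

function findBadCases(build, n, listSize) {
  const failures = [];
  for (const rows of nullPlacementCases(n, listSize)) {
    const { length, childLength } = build(rows);
    if (childLength !== length * listSize) {
      failures.push({ rows, length, childLength });
    }
  }
  return failures;
}

// Illustrative stand-in reproducing only the reported symptom: child slots
// are counted up to the last non-null row, so trailing nulls add nothing.
function makeBuggyBuild(listSize) {
  return (rows) => {
    let last = rows.length - 1;
    while (last >= 0 && rows[last] === null) last--;
    return { length: rows.length, childLength: (last + 1) * listSize };
  };
}

const failures = findBadCases(makeBuggyBuild(3), 3, 3);
console.log(failures.length); // 2 (null in the last row, and all rows null)
```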
