Commit 689cd27

ARROW-245: add endianness to RecordBatch
Author: Julien Le Dem <julien@dremio.com>

Closes #113 from julienledem/arrow_245_endianness and squashes the following commits:

e4cd749 [Julien Le Dem] fix linter error
c727844 [Julien Le Dem] Fix NOTICE; typo; doc wording
88aaee3 [Julien Le Dem] move endianness to Schema
e5f7355 [Julien Le Dem] clarifying big endian support
36caf3c [Julien Le Dem] autodetect endianness
7477de1 [Julien Le Dem] update Layout.md endianness; add image source file
eea3edd [Julien Le Dem] update cpp to use the new field
9b56874 [Julien Le Dem] ARROW-245: add endianness to RecordBatch

1 parent e8724f8 commit 689cd27

6 files changed: +42 -3 lines changed

NOTICE.txt
+5

@@ -7,3 +7,8 @@ The Apache Software Foundation (http://www.apache.org/).
 This product includes software from the SFrame project (BSD, 3-clause).
 * Copyright (C) 2015 Dato, Inc.
 * Copyright (c) 2009 Carnegie Mellon University.
+
+This product includes software from the Numpy project (BSD-new)
+https://github.com/numpy/numpy/blob/e1f191c46f2eebd6cb892a4bfe14d9dd43a06c4e/numpy/core/src/multiarray/multiarraymodule.c#L2910
+* Copyright (c) 1995, 1996, 1997 Jim Hugunin, hugunin@mit.edu
+* Copyright (c) 2005 Travis E. Oliphant oliphant@ee.byu.edu Brigham Young University

cpp/src/arrow/ipc/metadata-internal.cc
+18 -2

@@ -243,6 +243,17 @@ Status FieldFromFlatbuffer(const flatbuf::Field* field, std::shared_ptr<Field>*
 
 // Implement MessageBuilder
 
+// will return the endianness of the system we are running on
+// based the NUMPY_API function. See NOTICE.txt
+flatbuf::Endianness endianness() {
+  union {
+    uint32_t i;
+    char c[4];
+  } bint = {0x01020304};
+
+  return bint.c[0] == 1 ? flatbuf::Endianness_Big : flatbuf::Endianness_Little;
+}
+
 Status MessageBuilder::SetSchema(const Schema* schema) {
   header_type_ = flatbuf::MessageHeader_Schema;
 
@@ -254,7 +265,11 @@ Status MessageBuilder::SetSchema(const Schema* schema) {
     field_offsets.push_back(offset);
   }
 
-  header_ = flatbuf::CreateSchema(fbb_, fbb_.CreateVector(field_offsets)).Union();
+  header_ = flatbuf::CreateSchema(
+      fbb_,
+      endianness(),
+      fbb_.CreateVector(field_offsets))
+      .Union();
   body_length_ = 0;
   return Status::OK();
 }
@@ -263,7 +278,8 @@ Status MessageBuilder::SetRecordBatch(int32_t length, int64_t body_length,
     const std::vector<flatbuf::FieldNode>& nodes,
     const std::vector<flatbuf::Buffer>& buffers) {
   header_type_ = flatbuf::MessageHeader_RecordBatch;
-  header_ = flatbuf::CreateRecordBatch(fbb_, length, fbb_.CreateVectorOfStructs(nodes),
+  header_ = flatbuf::CreateRecordBatch(fbb_, length,
+      fbb_.CreateVectorOfStructs(nodes),
       fbb_.CreateVectorOfStructs(buffers))
       .Union();
   body_length_ = body_length;
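
Note on the `endianness()` helper in this file: it probes byte order by writing a `uint32_t` into a union and reading the first `char` back. Mainstream compilers support that kind of union type punning, but ISO C++ does not guarantee it. A minimal sketch of the same first-byte probe using `std::memcpy` is shown here for comparison. The `Endianness` enum and the `HostEndianness` name are hypothetical stand-ins for illustration, not the generated `flatbuf` types and not the code in this commit.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the generated flatbuf::Endianness enum.
enum class Endianness : int { Little = 0, Big = 1 };

// Probes host byte order by copying a known 32-bit value into a byte array
// and inspecting the byte stored at the lowest address.
inline Endianness HostEndianness() {
  const std::uint32_t probe = 0x01020304;
  unsigned char bytes[sizeof(probe)];
  std::memcpy(bytes, &probe, sizeof(probe));
  // A big-endian host stores the most significant byte (0x01) first.
  return bytes[0] == 0x01 ? Endianness::Big : Endianness::Little;
}
```

The probe value and the comparison against the first byte mirror the diff; only the mechanism for reading the byte differs.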

format/Arrow.graffle
3.56 KB
Binary file not shown.

format/Arrow.png
84.6 KB
Binary file not shown.

format/Layout.md
+8 -1

@@ -78,7 +78,14 @@ Base requirements
 
 ## Byte Order ([Endianness][3])
 
-The Arrow format is little endian.
+The Arrow format is little endian by default.
+The Schema metadata has an endianness field indicating endianness of RecordBatches.
+Typically this is the endianness of the system where the RecordBatch was generated.
+The main use case is exchanging RecordBatches between systems with the same Endianness.
+At first we will return an error when trying to read a Schema with an endianness
+that does not match the underlying system. The reference implementation is focused on
+Little Endian and provides tests for it. Eventually we may provide automatic conversion
+via byte swapping.
 
 ## Alignment and Padding
 
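
The Layout.md wording in this hunk says a reader should return an error when a Schema's endianness does not match the underlying system. That check could look roughly like the sketch below. Everything here is an assumption for illustration: the `Endianness` enum, `HostEndianness`, and `CheckSchemaEndianness` are hypothetical names, and the error is modeled as a plain string rather than Arrow's `Status` type.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for the generated flatbuf::Endianness enum.
enum class Endianness : int { Little = 0, Big = 1 };

// Detects host byte order from the first stored byte of a known value.
inline Endianness HostEndianness() {
  const std::uint32_t probe = 0x01020304;
  unsigned char bytes[sizeof(probe)];
  std::memcpy(bytes, &probe, sizeof(probe));
  return bytes[0] == 0x01 ? Endianness::Big : Endianness::Little;
}

// Returns an empty string when the schema can be read on this host,
// otherwise a human-readable error message (no byte swapping is attempted).
inline std::string CheckSchemaEndianness(Endianness schema_endianness) {
  if (schema_endianness == HostEndianness()) {
    return "";
  }
  return "Schema endianness does not match the host; conversion is not supported";
}
```

A reader would run a check like this once per Schema, before touching any RecordBatch buffers, so that mismatched data is rejected instead of being misinterpreted.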

format/Message.fbs
+11

@@ -87,10 +87,21 @@ table Field {
   children: [Field];
 }
 
+/// ----------------------------------------------------------------------
+/// Endianness of the platform that produces the RecordBatch
+
+enum Endianness:int { Little, Big }
+
 /// ----------------------------------------------------------------------
 /// A Schema describes the columns in a row batch
 
 table Schema {
+
+  /// endianness of the buffer
+  /// it is Little Endian by default
+  /// if endianness doesn't match the underlying system then the vectors need to be converted
+  endianness: Endianness=Little;
+
   fields: [Field];
 }
 
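
The schema comment in this hunk notes that vectors need to be converted when the endianness does not match the underlying system. This commit does not implement that conversion. As a rough sketch of what byte swapping a buffer of 32-bit values involves, under the assumption that the values are already loaded into a `std::vector<std::uint32_t>` (the helper names `ByteSwap32` and `SwapInPlace` are hypothetical):

```cpp
#include <cstdint>
#include <vector>

// Reverses the byte order of a single 32-bit value.
inline std::uint32_t ByteSwap32(std::uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Converts every element of a 32-bit value buffer to the opposite byte order.
inline void SwapInPlace(std::vector<std::uint32_t>& values) {
  for (auto& v : values) {
    v = ByteSwap32(v);
  }
}
```

Swapping is its own inverse, so the same routine serves both directions. Each fixed-width type would need a swap of the matching width, and single-byte data needs none.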
