Big reboot #1

Merged
merged 182 commits into from
Sep 9, 2019

alecmocatta
Member

This mega-PR includes:

  • Updating dependencies
  • Migrating to 2018 edition
  • cargo clippy + fmt clean
  • Switch from travis/appveyor/circle to Azure Pipelines
  • Passing CI tests on Linux + macOS
  • Vendoring forked parquet implementation
  • Restructure into sub-crates amadeus-{aws, core, derive, parquet, postgres, serde, types}
  • #[derive(Data)]

andygrove and others added 30 commits March 31, 2018 13:37
… Arrow

This PR implements a subset of data types and array functionality, supporting primitives, strings, and structs.

I am actively developing this code and relying on it in my DataFusion project.

It isn't fully Arrow compatible yet but I think this is a solid foundation to build from.
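To make "primitives, strings, and structs" concrete, here is a minimal sketch of a logical type enum in Rust. `DataType` and `Field` are names that appear later in this thread, but the variants, the struct fields, and the `Field::new` signature shown here are assumptions for illustration, not this crate's actual API.

```rust
// Hypothetical sketch: a simplified logical type system. The variant list
// is illustrative and much smaller than a real Arrow implementation's.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Boolean,
    Int32,
    Int64,
    Float64,
    // Variable-length UTF-8 strings.
    Utf8,
    // A nested record whose children are themselves fields.
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type,
            nullable,
        }
    }
}

/// Builds a two-column schema in which `address` is a nested struct.
pub fn example_schema() -> Vec<Field> {
    let address = DataType::Struct(vec![
        Field::new("street", DataType::Utf8, false),
        Field::new("zip", DataType::Int32, true),
    ]);
    vec![
        Field::new("name", DataType::Utf8, false),
        Field::new("address", address, true),
    ]
}
```

Because `Struct` holds a `Vec<Field>`, nesting can go arbitrarily deep without any special casing.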

Author: Andy Grove <andygrove73@gmail.com>

Closes #1804 from andygrove/agrove/rust_contribution and squashes the following commits:

0623c0a <Andy Grove> re-implement Utf8 using List<T>
6aeb605 <Andy Grove> add test for creating schema with nested struct type
20dacc5 <Andy Grove> clean up imports
0711d35 <Andy Grove> bitmap uses buffer
3f5a2fd <Andy Grove> Refactor ArrayData to use Buffer<T>
4a4d696 <Andy Grove> add buffer type
847451f <Andy Grove> add buffer type
1de2db4 <Andy Grove> implement math ops on two same-typed arrays
9d95183 <Andy Grove> implement binary comparison ops for arrays
9364934 <Andy Grove> compare array
04dae97 <Andy Grove> compare array
e23517b <Andy Grove> add comment
bfb07c3 <Andy Grove> use macros to remove duplicate boilerplate code per primitive type
dcc287d <Andy Grove> convert all primitive arrays to immutable aligned memory
86daf11 <Andy Grove> bug fixes
944e270 <Andy Grove> i32 now using aligned memory correctly
9905170 <Andy Grove> make validity bitmap optional
96699ec <Andy Grove> use allocated mem for array of i32 as example - need to do same for others, also fixed bitmap logic and default values
98b083f <Andy Grove> add memory util to allocate aligned memory using libc
693d269 <Andy Grove> packaging
3269d0d <Andy Grove> rename back to arrow
5c1dfee <Andy Grove> rename from arrow to apache-arrow
322539a <Andy Grove> update author and email to Apache Arrow and dev@arrow.apache.org
0e60948 <Andy Grove> add comments to address PR feedback
aefa3f4 <Andy Grove> update authors
7702da6 <Andy Grove> add README with license header
fb26398 <Andy Grove> remove readme for now - need to figure out how make rat allow it
ecfa371 <Andy Grove> add license to cargo.toml
fc3b5b7 <Andy Grove> example code
c0cee6d <Andy Grove> README
a2a51a5 <Andy Grove> packaging
c85fbbc <Andy Grove> save
04ba046 <Andy Grove> struct test
e80b440 <Andy Grove> save
e2d563d <Andy Grove> initial code
Note that this PR also moves some tests for comparing arrays from `Array` to `Buffer<T>`, and removes some redundant code that was implemented before it was possible to get a type-safe `Iterator` from `Buffer<T>`.

This change was made in this PR because the serde_json crate's macros pretty much forced me to address this now.

Author: Andy Grove <andygrove73@gmail.com>

Closes #1829 from andygrove/schema_json and squashes the following commits:

6b5281f <Andy Grove> fix issues that stopped code compiling with Rust 1.25.0
6af8963 <Andy Grove> rustfmt
ce2e56d <Andy Grove> remove commented out code
0ba3a77 <Andy Grove> can parse types and fields from json
c9ace3f <Andy Grove> implement to_json for DataType and Field
… aligned memory

Also adds our first example

Author: Andy Grove <andygrove73@gmail.com>

Closes #1838 from andygrove/buffer_builder and squashes the following commits:

940ee5e <Andy Grove> add missing file, also rustfmt again
b649f3a <Andy Grove> add missing file
e4347c1 <Andy Grove> move builder into separate file
7fac96e <Andy Grove> rename Builder build() to finish()
ee29eab <Andy Grove> examples
00fe0da <Andy Grove> Improve examples, add support for creating Array from Buffer
c1383a8 <Andy Grove> ran rustfmt using nightly
89a9317 <Andy Grove> update README with real example
3d68f9c <Andy Grove> Create Builder<T> for building buffers with zero-copy on build
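
The zero-copy `finish()` step in the commits above can be illustrated with a minimal sketch, assuming the builder simply owns a `Vec<T>`: because `finish()` takes `self` by value, the vector is moved into the resulting buffer rather than copied. `Builder` and `Buffer` here are simplified, hypothetical stand-ins; the commits in this thread mention aligned allocation via libc, which this sketch does not attempt.

```rust
/// Hypothetical immutable, typed buffer.
pub struct Buffer<T> {
    data: Vec<T>,
}

impl<T> Buffer<T> {
    pub fn len(&self) -> usize {
        self.data.len()
    }

    pub fn is_empty(&self) -> bool {
        self.data.is_empty()
    }

    pub fn as_slice(&self) -> &[T] {
        &self.data
    }
}

/// Hypothetical builder that accumulates values before freezing them.
pub struct Builder<T> {
    data: Vec<T>,
}

impl<T> Builder<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        Builder {
            data: Vec::with_capacity(capacity),
        }
    }

    pub fn push(&mut self, value: T) {
        self.data.push(value);
    }

    /// Consumes the builder. The Vec is moved, not cloned, so the same
    /// heap allocation now backs the returned buffer.
    pub fn finish(self) -> Buffer<T> {
        Buffer { data: self.data }
    }
}
```

Consuming the builder also means it cannot be mutated after `finish()`, which is what makes handing over the allocation safe.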
Author: Andy Grove <andygrove73@gmail.com>

Closes #1860 from andygrove/benches and squashes the following commits:

9cdfce1 <Andy Grove> rustfmt
123bc86 <Andy Grove> add benchmark for creating array from builder
cdfe796 <Andy Grove> Add first benches and fix bug where memory was never released
Author: Chao Sun <sunchao@apache.org>

Closes #2014 from sunchao/code-coverage-badge and squashes the following commits:

c39e91c8 <Chao Sun> ARROW-2557:  Add badge for code coverage in README
Author: Andy Grove <andygrove73@gmail.com>

Closes #2321 from andygrove/rust_0_10_0 and squashes the following commits:

d17eb73d <Andy Grove> update version to 0.10.0
Author: Paddy Horan <paddyhoran@hotmail.com>

Closes #2418 from paddyhoran/Issue-3035 and squashes the following commits:

cbe4b17 <Paddy Horan> Fixed typo
7a7b64c <Paddy Horan> Fixed issues with Rust examples
This changes the existing `Buffer` type to be non-generic over type `T`, since a `Buffer` should just represent a plain byte array; interpretation of the data within the buffer should happen at a higher level, such as in `Array`.

While working on this, I found that I also needed to make significant changes to the `Array` and `List` types, since they are currently tightly coupled to the `Buffer<T>` implementation. The new implementation follows arrow-cpp and defines an `ArrayData` struct, which provides the common operations on an Arrow array. Subtypes of `Array` then provide operations specific to the types they represent. For instance, one can get a primitive value at index `i` from a `PrimitiveArray`, or a column at index `i` from a `StructArray`.

I removed `List` since it's no longer necessary. Removed `PrimitiveArray::{min,max}` for now but plan to add them back.
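A minimal sketch of the layering described above, under simplifying assumptions: `Buffer` holds untyped bytes, `ArrayData` carries the length and buffers common to all array kinds, and a generic `PrimitiveArray<T>` interprets the bytes only when `value(i)` is called. The `NativeType` trait, the explicit little-endian encoding, and the absence of offsets and validity bitmaps are choices made for this illustration; a real implementation would typically read aligned memory directly instead of decoding bytes with safe code.

```rust
use std::convert::TryInto;
use std::marker::PhantomData;

/// Untyped storage: just bytes, with no knowledge of what they encode.
#[derive(Clone, Debug)]
pub struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    pub fn as_bytes(&self) -> &[u8] {
        &self.data
    }
}

/// Metadata shared by every array kind (simplified: no offset, no nulls).
pub struct ArrayData {
    len: usize,
    buffers: Vec<Buffer>,
}

/// Hypothetical trait describing how a primitive type maps to bytes.
pub trait NativeType: Copy {
    const WIDTH: usize;
    fn write_le(self, out: &mut Vec<u8>);
    fn read_le(bytes: &[u8]) -> Self;
}

impl NativeType for i32 {
    const WIDTH: usize = 4;
    fn write_le(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
    fn read_le(bytes: &[u8]) -> Self {
        i32::from_le_bytes(bytes.try_into().unwrap())
    }
}

impl NativeType for f64 {
    const WIDTH: usize = 8;
    fn write_le(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
    fn read_le(bytes: &[u8]) -> Self {
        f64::from_le_bytes(bytes.try_into().unwrap())
    }
}

/// Typed view over an untyped buffer.
pub struct PrimitiveArray<T: NativeType> {
    data: ArrayData,
    _type: PhantomData<T>,
}

impl<T: NativeType> PrimitiveArray<T> {
    pub fn from_slice(values: &[T]) -> Self {
        let mut bytes = Vec::with_capacity(values.len() * T::WIDTH);
        for &v in values {
            v.write_le(&mut bytes);
        }
        PrimitiveArray {
            data: ArrayData {
                len: values.len(),
                buffers: vec![Buffer { data: bytes }],
            },
            _type: PhantomData,
        }
    }

    pub fn len(&self) -> usize {
        self.data.len
    }

    /// Interprets the bytes at slot `i` as a `T`.
    pub fn value(&self, i: usize) -> T {
        assert!(i < self.data.len, "index {} out of bounds", i);
        let start = i * T::WIDTH;
        T::read_le(&self.data.buffers[0].as_bytes()[start..start + T::WIDTH])
    }
}
```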

Author: Chao Sun <sunchao@uber.com>

Closes #2330 from sunchao/ARROW-2583 and squashes the following commits:

91c580b8 <Chao Sun> Fix lint
0e8a8dc9 <Chao Sun> Address review comments
21b8d1df <Chao Sun> Fix lint
2493d122 <Chao Sun> Fix a few more issues and add more tests
383cc3ef <Chao Sun> More refactoring
2ee3cf95 <Chao Sun> Fix lint
a29ae4a2 <Chao Sun> Fix test for is_aligned
c1941651 <Chao Sun> Fix Buffer offset and test for Array alignment
a3206cc5 <Chao Sun> Address review comments
18634481 <Chao Sun> Fix lint
1e8dab51 <Chao Sun> In is_aligned(), should use align_of instead of size_of
363e7cfc <Chao Sun> Fix bench. Change Buffer#copy() to Buffer#clone()
042796b4 <Chao Sun> Add check for pointer alignment
18e5dead <Chao Sun> Address comments
51327fed <Chao Sun> Address comments
ac782f14 <Chao Sun> Remove commented out code
08fb8479 <Chao Sun> Fix to_bytes() collision and test failure
c3c0f6c5 <Chao Sun> Fix style
83e1a1fd <Chao Sun> Bring back min and max for PrimitiveArray
7e57fd0d <Chao Sun>  ARROW-2583:  Buffer should be typeless
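
The `is_aligned()` fix in the list above (using `align_of` instead of `size_of`) comes down to a single modulus check. A minimal sketch, assuming a free function that takes a raw byte pointer (the name and signature are hypothetical): size and alignment differ for many types. For example, `[u8; 3]` has size 3 but alignment 1, so dividing by the size would wrongly reject valid addresses.

```rust
use std::mem::align_of;

/// Returns true if `ptr` is suitably aligned to be read as a `T`.
///
/// Hypothetical helper for illustration. The divisor must be the type's
/// alignment: using `size_of::<T>()` instead would, for `[u8; 3]`, reject
/// every address that is not a multiple of 3, even though any address works.
pub fn is_aligned<T>(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}
```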
system. Add unit test label granularity options, ability to add component group
targets like 'make parquet' that build libraries and tests

Change-Id: I250fd10da3a9505952115b3cec18cd7cb5589bdb
This change enables support for DECIMALs backed by BYTE_ARRAYs on disk. It does this by creating a TransferFunctor routine that transforms a ByteArrayType to an ::arrow::Decimal128Type.

The routine does this by:
1. Creating an arrow::BinaryArray from the RecordReader's builder
2. Allocating a buffer for the arrow::Decimal128Array
3. Converting the big-endian bytes in each BinaryArray entry to two 64-bit integers
   representing the high and low halves of each decimal value.
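The byte-level part of step 3 can be sketched in Rust, assuming Parquet's encoding of BYTE_ARRAY decimals as big-endian two's-complement integers of at most 16 bytes. The function below is hypothetical and is not the C++ `TransferFunctor`: it sign-extends the input to 16 bytes and splits the result into a signed high half and an unsigned low half, mirroring how Arrow C++ stores a `Decimal128` as a signed high word and an unsigned low word.

```rust
use std::convert::TryInto;

/// Converts a big-endian two's-complement byte string (1 to 16 bytes) into
/// the (high, low) 64-bit halves of a 128-bit integer.
/// Returns None for empty or over-long input. Hypothetical helper.
pub fn be_bytes_to_decimal128(bytes: &[u8]) -> Option<(i64, u64)> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // Sign-extend: pad with 0xFF if the most significant bit is set.
    let fill: u8 = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    // The first 8 bytes carry the sign, the last 8 are an unsigned magnitude.
    let high = i64::from_be_bytes(buf[..8].try_into().unwrap());
    let low = u64::from_be_bytes(buf[8..].try_into().unwrap());
    Some((high, low))
}
```

Recombining the halves as `((high as i128) << 64) | (low as i128)` recovers the original value, which is a convenient way to check the conversion.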

Author: Ted Haining <thaining@xcalar.com>

Closes #2646 from thaining/parquet-1160-byte-array-decimals and squashes the following commits:

0bad382e <Ted Haining> Updated parquet-testing to SHA that includes necessary test files.
30f3a278 <Ted Haining> This change enables support for DECIMALs backed by BYTE_ARRAYs on disk. It does this by creating a TransferFunctor routine that transforms a ByteArrayType to an ::arrow::Decimal128Type.
Author: Chao Sun <sunchao@apache.org>
Author: Krisztián Szűcs <szucs.krisztian@gmail.com>

Closes #2903 from sunchao/ARROW-3664 and squashes the following commits:

0cdc0e1f <Krisztián Szűcs> fmt
c9d94de7 <Chao Sun> ARROW-3664:  Add benchmark for PrimitiveArrayBuilder
Author: Andy Grove <andygrove73@gmail.com>
Author: Wes McKinney <wesm+git@apache.org>

Closes #2823 from andygrove/ARROW-3601 and squashes the following commits:

5e67eb6cb <Wes McKinney> Slight tweaks, formatting
062a338e4 <Andy Grove> fix typo
5e99cff89 <Andy Grove> fix typo
eb8c39296 <Andy Grove> Add instructions for publishing to crates.io
I will follow up with examples of `ListArrayBuilder` and `BinaryBuilder` once this is merged. The information in the README keeps going out of date, so it's probably better to build up the examples (which are tested by CI) and redirect new users there.

Author: Paddy Horan <paddyhoran@hotmail.com>

Closes #2969 from paddyhoran/ARROW-3796 and squashes the following commits:

46699f97 <Paddy Horan> Fixed lint and comment
26cec305 <Paddy Horan> Updated CI to run new example
72d6b3ee <Paddy Horan> Updated readme.
2a0fe8ae <Paddy Horan> Added example
This adds a CSV reader and an example that accesses the loaded data by downcasting arrays to specific types. The CSV reader supports all primitive types plus strings (`List<u8>`).
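Downcasting from a type-erased array to a concrete one is typically done in Rust through `std::any::Any`. The sketch below assumes an `Array` trait exposing `as_any()`; the trait, the `Vec`-backed `Int32Array` and `BinaryArray`, and the `city_names` helper are simplified, hypothetical stand-ins rather than the types added in this PR.

```rust
use std::any::Any;

/// Hypothetical type-erased array trait.
pub trait Array {
    fn len(&self) -> usize;
    /// Exposes the concrete type so callers can downcast.
    fn as_any(&self) -> &dyn Any;
}

pub struct Int32Array {
    pub values: Vec<i32>,
}

pub struct BinaryArray {
    pub values: Vec<Vec<u8>>,
}

impl BinaryArray {
    pub fn value(&self, i: usize) -> &[u8] {
        &self.values[i]
    }
}

impl Array for Int32Array {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for BinaryArray {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the column's values as strings if it is a BinaryArray,
/// or None if the column has some other concrete type.
pub fn city_names(column: &dyn Array) -> Option<Vec<String>> {
    let names = column.as_any().downcast_ref::<BinaryArray>()?;
    Some(
        (0..names.len())
            .map(|i| String::from_utf8_lossy(names.value(i)).into_owned())
            .collect(),
    )
}
```

Because `downcast_ref` returns an `Option`, asking for the wrong concrete type is a recoverable `None` rather than a panic.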

Author: Andy Grove <andygrove73@gmail.com>

Closes #2992 from andygrove/ARROW-3726 and squashes the following commits:

4d1bf98 <Andy Grove> Exclude Rust test data csv files from rat
70140c6 <Andy Grove> re-export csv::Reader
4674651 <Andy Grove> add missing file after rename
3ab0a47 <Andy Grove> create module for csv::reader
26857a3 <Andy Grove> Update Windows CI to run new example
5d43b1f <Andy Grove> Remove Arc<>
ab1b20f <Andy Grove> use isEmpty() instead of len() == 0
2b9d9e1 <Andy Grove> Remove support for Float16
6167223 <Andy Grove> Remove support for Float16
80c44ca <Andy Grove> Use BinaryArray instead of List<u8>
f726559 <Andy Grove> Merge branch 'master' into ARROW-3726
3928d6d <Andy Grove> Add documentation, rename CsvFile to CsvReader
247092d <Andy Grove> cargo fmt
aae53aa <Andy Grove> add value_slice method, clean up example code
e539814 <Andy Grove> update example to print city names to demonstrate usage of List<u8>
8974c60 <Andy Grove> Example displays data
638159d <Andy Grove> fix test
9e88791 <Andy Grove> Update CI script
517da28 <Andy Grove> Implement csv reader
This PR makes `Schema`, `Field`, and `DataType` serializable using the serde crate, so they can be serialized to any binary or text format that has a serde implementation.

The main benefit is to allow users of the Arrow crate to serialize structs that reference Arrow types (for example, allowing a logical query plan to be serialized and sent over the network).

Note that this does not change the custom JSON serialization that is already in place for serializing in the specific format specified in `format/Metadata.md`.

Author: Andy Grove <andygrove73@gmail.com>

Closes #3016 from andygrove/ARROW-3855 and squashes the following commits:

329da92 <Andy Grove> Merge branch 'master' into ARROW-3855
d988cc6 <Andy Grove> cargo fmt
bd8375d <Andy Grove> Schema/Field/Datatype now have derived serde traits
Author: Andy Grove <andygrove73@gmail.com>

Closes #3105 from andygrove/ARROW-3883 and squashes the following commits:

e01383548 <Andy Grove> update README
See https://hacks.mozilla.org/2018/12/rust-2018-is-here/ for more info.

Changes:

- Add `edition = "2018"` to Cargo.toml
- Change many import statements to account for changes to the module system in Rust 2018
- Remove `mut` from one variable assignment
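The import changes follow from how Rust 2018 resolves `use` paths. A minimal single-file sketch with hypothetical module names: in Rust 2015 a `use` path was always resolved from the crate root, so `use buffer::Buffer;` worked from any module, whereas Rust 2018 needs an explicit `crate::` prefix for crate-local paths that are not otherwise in scope. The same edition change is why `extern crate` statements for Cargo dependencies could be dropped.

```rust
mod buffer {
    pub struct Buffer {
        pub len: usize,
    }
}

mod array {
    // Rust 2015 style; in Rust 2018 this no longer resolves from here:
    // use buffer::Buffer;

    // Rust 2018 style: start crate-local paths with `crate::`.
    use crate::buffer::Buffer;

    pub fn empty_buffer() -> Buffer {
        Buffer { len: 0 }
    }
}
```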

Author: Andy Grove <andygrove73@gmail.com>

Closes #3119 from andygrove/ARROW-3952 and squashes the following commits:

bfebf9e4 <Andy Grove> cargo fmt
bda17699 <Andy Grove> remove unnecessary extern crate statements
048b00cd <Andy Grove> Upgrade to Rust 2018
f843ad46 <Andy Grove> Save
c60ac679 <Andy Grove> Specify Rust 2018
Author: Andy Grove <andygrove73@gmail.com>

Closes #3096 from andygrove/ARROW-3885 and squashes the following commits:

7d15ee77 <Andy Grove> add commit step
0d98c2cf <Andy Grove> revert to 0.11.0 ready for next prepare step
a7f60835 <Andy Grove> update release prepare step to increment Rust version
ac6e5fc0 <Andy Grove> Set version to 0.11.0 and update prepare script
b39b7c4b <Andy Grove> Update Rust version to 0.12.0
Author: Andy Grove <andygrove73@gmail.com>

Closes #3033 from andygrove/ARROW-3880 and squashes the following commits:

17cd418 <Andy Grove> merge from master
afb3518 <Andy Grove> Move min and max to array_ops
0c77c61 <Andy Grove> code cleanup
f8bfb41 <Andy Grove> move comparison ops to array_ops
7a5975e <Andy Grove> Move math ops into new array_ops source file
7946142 <Andy Grove> Address PR feedback
adfe4b0 <Andy Grove> merge from master and fix conflicts
5ed5f6e <Andy Grove> add comparison operations
42c68af <Andy Grove> re-implement with generics
963def6 <Andy Grove> Merge branch 'master' into ARROW-3880
729cd9a <Andy Grove> fix formatting
405c63e <Andy Grove> re-implement using macros
5876fb7 <Andy Grove> save work
a2b87e2 <Andy Grove> merge from master, comment out new methods
2a43b3f <Andy Grove> merge from master
06bbc4a <Andy Grove> improve handling of divide by zero, format for rust nightly
1ea98cf <Andy Grove> Improve error handling
dcad28a <Andy Grove> cargo fmt
12dc05b <Andy Grove> Implement simple math operations for numeric arrays
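
The divide-by-zero handling mentioned in the commits above can be sketched as an element-wise kernel that returns an error instead of panicking. This is a minimal sketch under stated assumptions: it works on plain `i32` slices with no null bitmap, and `ArithmeticError` and `divide` are hypothetical names, not this crate's API. It also reports `i32::MIN / -1`, which overflows, as a separate error.

```rust
#[derive(Debug, PartialEq)]
pub enum ArithmeticError {
    LengthMismatch,
    DivideByZero,
    Overflow,
}

/// Divides `left[i]` by `right[i]` for every index, stopping at the
/// first error. Hypothetical kernel for illustration.
pub fn divide(left: &[i32], right: &[i32]) -> Result<Vec<i32>, ArithmeticError> {
    if left.len() != right.len() {
        return Err(ArithmeticError::LengthMismatch);
    }
    left.iter()
        .zip(right)
        .map(|(&l, &r)| {
            if r == 0 {
                Err(ArithmeticError::DivideByZero)
            } else {
                // With r != 0, checked_div only fails on overflow (MIN / -1).
                l.checked_div(r).ok_or(ArithmeticError::Overflow)
            }
        })
        .collect()
}
```

Collecting an iterator of `Result`s into `Result<Vec<_>, _>` short-circuits on the first failing element, so no partial output is returned.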
alecmocatta merged commit 10399b2 into master Sep 9, 2019
alecmocatta deleted the wip branch September 9, 2019 09:25
alecmocatta added a commit that referenced this pull request Aug 1, 2020
Improve performance including SIMD-acceleration
alecmocatta pushed a commit that referenced this pull request Aug 1, 2020