Big reboot #1
Merged
… Arrow This PR implements a subset of data types and array functionality, supporting primitives, strings, and structs. I am actively developing this code and relying on it in my DataFusion project. It isn't fully Arrow compatible yet but I think this is a solid foundation to build from. Author: Andy Grove <andygrove73@gmail.com> Closes #1804 from andygrove/agrove/rust_contribution and squashes the following commits: 0623c0a <Andy Grove> re-implement Utf8 using List<T> 6aeb605 <Andy Grove> add test for creating schema with nested struct type 20dacc5 <Andy Grove> clean up imports 0711d35 <Andy Grove> bitmap uses buffer 3f5a2fd <Andy Grove> Refactor ArrayData to use Buffer<T> 4a4d696 <Andy Grove> add buffer type 847451f <Andy Grove> add buffer type 1de2db4 <Andy Grove> implement math ops on two same-typed arrays 9d95183 <Andy Grove> implement binary comparison ops for arrays 9364934 <Andy Grove> compare array 04dae97 <Andy Grove> compare array e23517b <Andy Grove> add comment bfb07c3 <Andy Grove> use macros to remove duplicate boilerplate code per primitive type dcc287d <Andy Grove> convert all primitive arrays to immutable aligned memory 86daf11 <Andy Grove> bug fixes 944e270 <Andy Grove> i32 now using aligned memory correctly 9905170 <Andy Grove> make validity bitmap optional 96699ec <Andy Grove> use allocated mem for array of i32 as example - need to do same for others, also fixed bitmap logic and default values 98b083f <Andy Grove> add memory util to allocate aligned memory using libc 693d269 <Andy Grove> packaging 3269d0d <Andy Grove> rename back to arrow 5c1dfee <Andy Grove> rename from arrow to apache-arrow 322539a <Andy Grove> update author and email to Apache Arrow and dev@arrow.apache.org 0e60948 <Andy Grove> add comments to address PR feedback aefa3f4 <Andy Grove> update authors 7702da6 <Andy Grove> add README with license header fb26398 <Andy Grove> remove readme for now - need to figure out how make rat allow it ecfa371 <Andy Grove> add license to 
cargo.toml fc3b5b7 <Andy Grove> example code c0cee6d <Andy Grove> README a2a51a5 <Andy Grove> packaging c85fbbc <Andy Grove> save 04ba046 <Andy Grove> struct test e80b440 <Andy Grove> save e2d563d <Andy Grove> initial code
…-data comment about memory alignment (#1817)
Note that this PR also moves some tests for comparing arrays from Array to Buffer<T> and removes some redundant code that was implemented before it was possible to get a type-safe Iterator from Buffer<T>. This change was made in this PR because the serde_json crate's macros pretty much forced me to address this now. Author: Andy Grove <andygrove73@gmail.com> Closes #1829 from andygrove/schema_json and squashes the following commits: 6b5281f <Andy Grove> fix issues that stopped code compiling with Rust 1.25.0 6af8963 <Andy Grove> rustfmt ce2e56d <Andy Grove> remove commented out code 0ba3a77 <Andy Grove> can parse types and fields from json c9ace3f <Andy Grove> implement to_json for DataType and Field
… aligned memory Also adds our first example Author: Andy Grove <andygrove73@gmail.com> Closes #1838 from andygrove/buffer_builder and squashes the following commits: 940ee5e <Andy Grove> add missing file, also rustfmt again b649f3a <Andy Grove> add missing file e4347c1 <Andy Grove> move builder into separate file 7fac96e <Andy Grove> rename Builder build() to finish() ee29eab <Andy Grove> examples 00fe0da <Andy Grove> Improve examples, add support for creating Array from Buffer c1383a8 <Andy Grove> ran rustfmt using nightly 89a9317 <Andy Grove> update README with real example 3d68f9c <Andy Grove> Create Builder<T> for building buffers with zero-copy on build
Author: Andy Grove <andygrove73@gmail.com> Closes #1860 from andygrove/benches and squashes the following commits: 9cdfce1 <Andy Grove> rustfmt 123bc86 <Andy Grove> add benchmark for creating array from builder cdfe796 <Andy Grove> Add first benches and fix bug where memory was never released
Author: Chao Sun <sunchao@apache.org> Closes #2014 from sunchao/code-coverage-badge and squashes the following commits: c39e91c8 <Chao Sun> ARROW-2557: Add badge for code coverage in README
Author: Andy Grove <andygrove73@gmail.com> Closes #2321 from andygrove/rust_0_10_0 and squashes the following commits: d17eb73d <Andy Grove> update version to 0.10.0
Author: Paddy Horan <paddyhoran@hotmail.com> Closes #2418 from paddyhoran/Issue-3035 and squashes the following commits: cbe4b17 <Paddy Horan> Fixed typo 7a7b64c <Paddy Horan> Fixed issues with Rust examples
This changes the existing `Buffer` class to be non-generic over type `T`, since a `Buffer` class should just represent a plain byte array and interpretation of the data within the buffer should be done at a higher level, such as in `Array`. While working on this, I found that I also need to make significant changes to the `Array` and `List` types, since they are currently heavily tied to the `Buffer<T>` implementation. The new implementation follows arrow-cpp and defines an `ArrayData` struct which provides the common operations on an Arrow array. Subtypes of `Array` then provide specific operations for the types they represent. For instance, one can get a primitive value at index `i` for `PrimitiveArray` type, or can get a column at index `i` for `StructArray`. I removed `List` since it's no longer necessary. Removed `PrimitiveArray::{min,max}` for now but plan to add them back. Author: Chao Sun <sunchao@uber.com> Closes #2330 from sunchao/ARROW-2583 and squashes the following commits: 91c580b8 <Chao Sun> Fix lint 0e8a8dc9 <Chao Sun> Address review comments 21b8d1df <Chao Sun> Fix lint 2493d122 <Chao Sun> Fix a few more issues and add more tests 383cc3ef <Chao Sun> More refactoring 2ee3cf95 <Chao Sun> Fix lint a29ae4a2 <Chao Sun> Fix test for is_aligned c1941651 <Chao Sun> Fix Buffer offset and test for Array alignment a3206cc5 <Chao Sun> Address review comments 18634481 <Chao Sun> Fix lint 1e8dab51 <Chao Sun> In is_aligned(), should use align_of instead of size_of 363e7cfc <Chao Sun> Fix bench. Change Buffer#copy() to Buffer#clone() 042796b4 <Chao Sun> Add check for pointer alignment 18e5dead <Chao Sun> Address comments 51327fed <Chao Sun> Address comments ac782f14 <Chao Sun> Remove commented out code 08fb8479 <Chao Sun> Fix to_bytes() collision and test failure c3c0f6c5 <Chao Sun> Fix style 83e1a1fd <Chao Sun> Bring back min and max for PrimitiveArray 7e57fd0d <Chao Sun> ARROW-2583: Buffer should be typeless
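To illustrate the "typeless buffer, typed array" split described in the commit message above, here is a minimal sketch using only the standard library. The names `ByteBuffer` and `Int32View` are hypothetical and are not the Arrow crate's types; the sketch also copies bytes through `from_le_bytes` rather than reinterpreting aligned memory, which is an assumption made to keep it free of `unsafe`.

```rust
/// A plain byte buffer with no element type attached (hypothetical name).
struct ByteBuffer {
    bytes: Vec<u8>,
}

/// A typed view layered on top of the typeless buffer (hypothetical name).
/// Interpretation of the bytes lives here, not in the buffer.
struct Int32View {
    data: ByteBuffer,
    len: usize,
}

impl Int32View {
    /// Encodes the values as little-endian bytes into a typeless buffer.
    fn from_values(values: &[i32]) -> Self {
        let mut bytes = Vec::with_capacity(values.len() * 4);
        for v in values {
            bytes.extend_from_slice(&v.to_le_bytes());
        }
        Int32View {
            data: ByteBuffer { bytes },
            len: values.len(),
        }
    }

    /// Number of bytes held by the underlying buffer.
    fn byte_len(&self) -> usize {
        self.data.bytes.len()
    }

    /// Decodes the value at index `i`, or returns `None` if out of range.
    fn value(&self, i: usize) -> Option<i32> {
        if i >= self.len {
            return None;
        }
        let start = i * 4;
        let mut chunk = [0u8; 4];
        chunk.copy_from_slice(&self.data.bytes[start..start + 4]);
        Some(i32::from_le_bytes(chunk))
    }
}
```

With this layout, `Int32View::from_values(&[1, -2, 300]).value(1)` yields `Some(-2)`, while the buffer itself only knows that it holds 12 bytes.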
system. Add unit test label granularity options, ability to add component group targets like 'make parquet' that build libraries and tests Change-Id: I250fd10da3a9505952115b3cec18cd7cb5589bdb
This change enables support for DECIMALs backed by BYTE_ARRAYs on disk. It does this by creating a TransferFunctor routine that transforms a ByteArrayType to an ::arrow::Decimal128Type. The routine does this by: 1. Creating an arrow::BinaryArray from the RecordReader's builder 2. Allocating a buffer for the arrow::Decimal128Array 3. Converting the big-endian bytes in each BinaryArray entry to two integers representing the high and low bits of each decimal value. Author: Ted Haining <thaining@xcalar.com> Closes #2646 from thaining/parquet-1160-byte-array-decimals and squashes the following commits: 0bad382e <Ted Haining> Updated parquet-testing to SHA that includes necessary test files. 30f3a278 <Ted Haining> This change enables support for DECIMALs backed by BYTE_ARRAYs on disk. It does this by creating a TransferFunctor routine that transforms a ByteArrayType to an ::arrow::Decimal128Type.
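Step 3 above (big-endian bytes to high and low integers) can be sketched roughly as follows. This is not the C++ TransferFunctor from the commit; it is a standalone Rust illustration, and the function name `decimal_halves_from_be_bytes` is hypothetical. It assumes the input is a two's-complement big-endian value of 1 to 16 bytes, so shorter inputs are sign-extended before splitting.

```rust
/// Splits a big-endian two's-complement byte string (1 to 16 bytes) into the
/// (high, low) 64-bit halves of a 128-bit value. Returns `None` for inputs
/// that are empty or longer than 16 bytes.
fn decimal_halves_from_be_bytes(bytes: &[u8]) -> Option<(i64, u64)> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // Sign-extend: pad the leading bytes with 0xFF for negative values,
    // 0x00 otherwise.
    let fill: u8 = if (bytes[0] & 0x80) != 0 { 0xFF } else { 0x00 };
    let mut padded = [fill; 16];
    padded[16 - bytes.len()..].copy_from_slice(bytes);

    let value = i128::from_be_bytes(padded);
    let high = (value >> 64) as i64; // upper 64 bits, keeps the sign
    let low = value as u64; // lower 64 bits, truncating cast
    Some((high, low))
}
```

For example, the single byte `0xFF` is read as -1, which gives a high half of `-1` and a low half of `u64::MAX`.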
Author: Chao Sun <sunchao@apache.org> Author: Krisztián Szűcs <szucs.krisztian@gmail.com> Closes #2903 from sunchao/ARROW-3664 and squashes the following commits: 0cdc0e1f <Krisztián Szűcs> fmt c9d94de7 <Chao Sun> ARROW-3664: Add benchmark for PrimitiveArrayBuilder
Author: Andy Grove <andygrove73@gmail.com> Author: Wes McKinney <wesm+git@apache.org> Closes #2823 from andygrove/ARROW-3601 and squashes the following commits: 5e67eb6cb <Wes McKinney> Slight tweaks, formatting 062a338e4 <Andy Grove> fix typo 5e99cff89 <Andy Grove> fix typo eb8c39296 <Andy Grove> Add instructions for publishing to crates.io
I will follow up with examples of `ListArrayBuilder` and `BinaryBuilder` when merged. The info in the readme keeps going out of date so it's probably better to build up the examples (which are tested by CI) and re-direct new users there. Author: Paddy Horan <paddyhoran@hotmail.com> Closes #2969 from paddyhoran/ARROW-3796 and squashes the following commits: 46699f97 <Paddy Horan> Fixed lint and comment 26cec305 <Paddy Horan> Updated CI to run new example 72d6b3ee <Paddy Horan> Updated readme. 2a0fe8ae <Paddy Horan> Added example
This adds a CSV reader and an example that accesses the loaded data by downcasting arrays to specific types. The CSV reader supports all primitive types + string (`List<u8>`). Author: Andy Grove <andygrove73@gmail.com> Closes #2992 from andygrove/ARROW-3726 and squashes the following commits: 4d1bf98 <Andy Grove> Exclude Rust test data csv files from rat 70140c6 <Andy Grove> re-export csv::Reader 4674651 <Andy Grove> add missing file after rename 3ab0a47 <Andy Grove> create module for csv::reader 26857a3 <Andy Grove> Update Windows CI to run new example 5d43b1f <Andy Grove> Remove Arc<> ab1b20f <Andy Grove> use isEmpty() instead of len() == 0 2b9d9e1 <Andy Grove> Remove support for Float16 6167223 <Andy Grove> Remove support for Float16 80c44ca <Andy Grove> Use BinaryArray instead of List<u8> f726559 <Andy Grove> Merge branch 'master' into ARROW-3726 3928d6d <Andy Grove> Add documentation, rename CsvFile to CsvReader 247092d <Andy Grove> cargo fmt aae53aa <Andy Grove> add value_slice method, clean up example code e539814 <Andy Grove> update example to print city names to demonstrate usage of List<u8> 8974c60 <Andy Grove> Example displays data 638159d <Andy Grove> fix test 9e88791 <Andy Grove> Update CI script 517da28 <Andy Grove> Implement csv reader
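The downcasting pattern mentioned in the commit message above can be sketched with `std::any::Any`. The trait `Column` and the types `Int32Column` and `StringColumn` are hypothetical stand-ins, not the Arrow crate's array types; the sketch only shows how a caller holding a trait object might recover the concrete type.

```rust
use std::any::Any;

/// Hypothetical column trait; `as_any` exposes the value for downcasting.
trait Column {
    fn as_any(&self) -> &dyn Any;
}

struct Int32Column(Vec<i32>);
struct StringColumn(Vec<String>);

impl Column for Int32Column {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Column for StringColumn {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Inspects a type-erased column by trying each concrete type in turn.
fn describe(column: &dyn Column) -> String {
    if let Some(ints) = column.as_any().downcast_ref::<Int32Column>() {
        format!("{} ints, sum = {}", ints.0.len(), ints.0.iter().sum::<i32>())
    } else if let Some(strings) = column.as_any().downcast_ref::<StringColumn>() {
        format!("{} strings: {}", strings.0.len(), strings.0.join(", "))
    } else {
        String::from("unsupported column type")
    }
}
```

A reader that returns columns as trait objects lets one function handle every column, at the cost of a runtime type check at each access site.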
…ly pre-empt already running tasks
This PR makes `Schema`, `Field`, `DataType` serializable using the serde crate. This approach supports serialization to numerous binary and text formats supported by the serde crate. The main benefit is to allow users of the Arrow crate to serialize structs that reference Arrow types (for example, allowing a logical query plan to be serialized and sent over the network). Note that this does not change the custom JSON serialization that is already in place for serializing in the specific format specified in `format/Metadata.md`. Author: Andy Grove <andygrove73@gmail.com> Closes #3016 from andygrove/ARROW-3855 and squashes the following commits: 329da92 <Andy Grove> Merge branch 'master' into ARROW-3855 d988cc6 <Andy Grove> cargo fmt bd8375d <Andy Grove> Schema/Field/Datatype now have derived serde traits
Author: Andy Grove <andygrove73@gmail.com> Closes #3105 from andygrove/ARROW-3883 and squashes the following commits: e01383548 <Andy Grove> update README
See https://hacks.mozilla.org/2018/12/rust-2018-is-here/ for more info. Changes: - Add `edition = "2018"` to Cargo.toml - Change lots of import statements due to changes in module system in Rust 2018 - Remove `mut` from one variable assignment Author: Andy Grove <andygrove73@gmail.com> Closes #3119 from andygrove/ARROW-3952 and squashes the following commits: bfebf9e4 <Andy Grove> cargo fmt bda17699 <Andy Grove> remove unnecessary extern crate statements 048b00cd <Andy Grove> Upgrade to Rust 2018 f843ad46 <Andy Grove> Save c60ac679 <Andy Grove> Specify Rust 2018
Author: Andy Grove <andygrove73@gmail.com> Closes #3096 from andygrove/ARROW-3885 and squashes the following commits: 7d15ee77 <Andy Grove> add commit step 0d98c2cf <Andy Grove> revert to 0.11.0 ready for next prepare step a7f60835 <Andy Grove> update release prepare step to increment Rust version ac6e5fc0 <Andy Grove> Set version to 0.11.0 and update prepare script b39b7c4b <Andy Grove> Update Rust version to 0.12.0
Author: Andy Grove <andygrove73@gmail.com> Closes #3033 from andygrove/ARROW-3880 and squashes the following commits: 17cd418 <Andy Grove> merge from master afb3518 <Andy Grove> Move min and max to array_ops 0c77c61 <Andy Grove> code cleanup f8bfb41 <Andy Grove> move comparison ops to array_ops 7a5975e <Andy Grove> Move math ops into new array_ops source file 7946142 <Andy Grove> Address PR feedback adfe4b0 <Andy Grove> merge from master and fix conflicts 5ed5f6e <Andy Grove> add comparison operations 42c68af <Andy Grove> re-implement with generics 963def6 <Andy Grove> Merge branch 'master' into ARROW-3880 729cd9a <Andy Grove> fix formatting 405c63e <Andy Grove> re-implement using macros 5876fb7 <Andy Grove> save work a2b87e2 <Andy Grove> merge from master, comment out new methods 2a43b3f <Andy Grove> merge from master 06bbc4a <Andy Grove> improve handling of divide by zero, format for rust nightly 1ea98cf <Andy Grove> Improve error handling dcad28a <Andy Grove> cargo fmt 12dc05b <Andy Grove> Implement simple math operations for numeric arrays
…sync; simplify the SerDe traits with ProcessSend
alecmocatta added a commit that referenced this pull request Aug 1, 2020
Improve performance including SIMD-acceleration
This mega-PR includes:
- cargo clippy + fmt clean
- #[derive(Data)]