ARROW-456: Add jemalloc based MemoryPool #270
Change-Id: I649da045944f87571c2b84e0e3f710b93958bd2b
+1, very cool results. One minor question about the PyArrow memory pool: it doesn't need to be addressed in this patch, but if it isn't, we should create a follow-up JIRA.
bytes_allocated_ += new_size - old_size;

return Status::OK();
}
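For readers without the full diff, the accounting in the snippet above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `SketchPool` and `SketchStatus` are hypothetical names, plain `std::malloc`/`std::realloc` stand in for the real allocator, and alignment is ignored. The point it shows is that `Reallocate` adjusts the counter by the size delta rather than by the full new size.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a status type; not Arrow's Status class.
struct SketchStatus {
  bool ok;
  static SketchStatus OK() { return {true}; }
  static SketchStatus OutOfMemory() { return {false}; }
};

// Hypothetical pool that tracks how many bytes are currently outstanding.
// Assumption: all sizes passed in are positive and no alignment is required.
class SketchPool {
 public:
  SketchStatus Allocate(int64_t size, uint8_t** out) {
    assert(size > 0);
    void* p = std::malloc(static_cast<std::size_t>(size));
    if (p == nullptr) return SketchStatus::OutOfMemory();
    *out = static_cast<uint8_t*>(p);
    bytes_allocated_ += size;
    return SketchStatus::OK();
  }

  SketchStatus Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) {
    assert(old_size > 0 && new_size > 0);
    void* p = std::realloc(*ptr, static_cast<std::size_t>(new_size));
    // On failure std::realloc leaves the original block untouched,
    // so neither the pointer nor the counter is modified.
    if (p == nullptr) return SketchStatus::OutOfMemory();
    *ptr = static_cast<uint8_t*>(p);
    // Only the difference is added; it is negative when shrinking.
    bytes_allocated_ += new_size - old_size;
    return SketchStatus::OK();
  }

  void Free(uint8_t* buffer, int64_t size) {
    std::free(buffer);
    bytes_allocated_ -= size;
  }

  int64_t bytes_allocated() const { return bytes_allocated_.load(); }

 private:
  std::atomic<int64_t> bytes_allocated_{0};
};
```

Under these assumptions, allocating 64 bytes and then reallocating to 256 leaves `bytes_allocated()` at 256, and freeing the buffer returns it to 0.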
Perhaps we should simply compose the default Arrow memory pool in this class (while accepting any implementation) and account for our own Python memory allocations? Effectively this would be a suballocator, though we haven't implemented general child allocators yet. This way we can use whatever allocator we want in Python.
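One way to read the composition suggestion is sketched below. This is only an illustration under stated assumptions, not PyArrow's implementation: `PoolInterface`, `MallocPool`, and `AccountingPool` are hypothetical names, and the interface is reduced to `Allocate`/`Free` with a `bool` result. The wrapper delegates every call to whatever parent pool it is given and keeps its own byte count on the side.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical minimal interface; not Arrow's actual MemoryPool class.
class PoolInterface {
 public:
  virtual ~PoolInterface() = default;
  virtual bool Allocate(int64_t size, uint8_t** out) = 0;
  virtual void Free(uint8_t* buffer, int64_t size) = 0;
  virtual int64_t bytes_allocated() const = 0;
};

// Hypothetical base pool backed by std::malloc (positive sizes assumed).
class MallocPool : public PoolInterface {
 public:
  bool Allocate(int64_t size, uint8_t** out) override {
    assert(size > 0);
    void* p = std::malloc(static_cast<std::size_t>(size));
    if (p == nullptr) return false;
    *out = static_cast<uint8_t*>(p);
    bytes_allocated_ += size;
    return true;
  }
  void Free(uint8_t* buffer, int64_t size) override {
    std::free(buffer);
    bytes_allocated_ -= size;
  }
  int64_t bytes_allocated() const override { return bytes_allocated_; }

 private:
  int64_t bytes_allocated_ = 0;
};

// Wraps any PoolInterface and counts only the bytes requested through
// this wrapper. The parent is borrowed and must outlive the wrapper.
class AccountingPool : public PoolInterface {
 public:
  explicit AccountingPool(PoolInterface* parent) : parent_(parent) {
    assert(parent != nullptr);
  }
  bool Allocate(int64_t size, uint8_t** out) override {
    if (!parent_->Allocate(size, out)) return false;
    bytes_allocated_ += size;
    return true;
  }
  void Free(uint8_t* buffer, int64_t size) override {
    parent_->Free(buffer, size);
    bytes_allocated_ -= size;
  }
  int64_t bytes_allocated() const override { return bytes_allocated_; }

 private:
  PoolInterface* parent_;
  int64_t bytes_allocated_ = 0;
};
```

With two `AccountingPool` instances sharing one `MallocPool`, each wrapper reports only its own allocations while the parent reports the total, which is the suballocator-style accounting described in the comment.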
Made https://issues.apache.org/jira/browse/ARROW-457 and https://issues.apache.org/jira/browse/ARROW-458 as follow-ups for this.
I am working on ARROW-865 which exposes these to Python users

Closes apache#270

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#311 from wesm/PARQUET-915 and squashes the following commits:

0a89639 [Wes McKinney] Add test for time64[ns]
6331d8c [Wes McKinney] Cast time32[second] to time32[millisecond]
37c1b42 [Wes McKinney] cpplint
5167a7a [Wes McKinney] Add unit test for date64->date32 cast
440b40f [Wes McKinney] Add unit test for date/time types that write without implicit casts
e626ebd [Wes McKinney] Use inline visitor in LevelBuilder
2ab7f12 [Wes McKinney] Plumbing and expansions for rest of Arrow date/time types
3aa64fa [Wes McKinney] Add conversion for TIMESTAMP_MICROS
I am working on ARROW-865 which exposes these to Python users

Closes apache#270

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#311 from wesm/PARQUET-915 and squashes the following commits:

0a89639 [Wes McKinney] Add test for time64[ns]
6331d8c [Wes McKinney] Cast time32[second] to time32[millisecond]
37c1b42 [Wes McKinney] cpplint
5167a7a [Wes McKinney] Add unit test for date64->date32 cast
440b40f [Wes McKinney] Add unit test for date/time types that write without implicit casts
e626ebd [Wes McKinney] Use inline visitor in LevelBuilder
2ab7f12 [Wes McKinney] Plumbing and expansions for rest of Arrow date/time types
3aa64fa [Wes McKinney] Add conversion for TIMESTAMP_MICROS

Change-Id: I37aade098d3e893e6987a212affdd5dccd33cc07
Jemalloc with Reallocate shows an improvement on a single thread. What happens when there are multiple concurrent allocations? My understanding is that jemalloc tends to maintain separate heaps, similar to glibc's arenas, partly to facilitate concurrency. So concurrent allocations would not interfere with each other, especially with large-ish allocations and a total memory pool size sufficient to result in unused-page placements? Are there hints one can give jemalloc that one intends to grow the block, to increase the chance of growing the allocation in place?
I would suggest raising this on the mailing list or in a new GitHub issue.
Runtimes of the builder-benchmark:

With an aligned Reallocate, the jemalloc version is 50% faster and even outperforms std::vector: