Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-456: Add jemalloc based MemoryPool #270

Closed
wants to merge 14 commits into from

Conversation

xhochy
Copy link
Member

@xhochy xhochy commented Jan 6, 2017

Runtimes of the builder-benchmark:

BM_BuildPrimitiveArrayNoNulls/repeats:3               901 ms        889 ms          1   576.196MB/s
BM_BuildPrimitiveArrayNoNulls/repeats:3               833 ms        829 ms          1     617.6MB/s
BM_BuildPrimitiveArrayNoNulls/repeats:3               825 ms        821 ms          1   623.855MB/s
BM_BuildPrimitiveArrayNoNulls/repeats:3_mean          853 ms        846 ms          1   605.884MB/s
BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev         34 ms         30 ms          0    21.147MB/s
BM_BuildVectorNoNulls/repeats:3                       712 ms        701 ms          1   729.866MB/s
BM_BuildVectorNoNulls/repeats:3                       671 ms        670 ms          1   764.464MB/s
BM_BuildVectorNoNulls/repeats:3                       688 ms        681 ms          1   751.285MB/s
BM_BuildVectorNoNulls/repeats:3_mean                  690 ms        684 ms          1   748.538MB/s
BM_BuildVectorNoNulls/repeats:3_stddev                 17 ms         13 ms          0   14.2578MB/s

With an aligned Reallocate, the jemalloc version is 50% faster and even outperforms std::vector:

BM_BuildPrimitiveArrayNoNulls/repeats:3               565 ms        559 ms          1   916.516MB/s
BM_BuildPrimitiveArrayNoNulls/repeats:3               540 ms        537 ms          1   952.727MB/s
BM_BuildPrimitiveArrayNoNulls/repeats:3               544 ms        543 ms          1   942.948MB/s
BM_BuildPrimitiveArrayNoNulls/repeats:3_mean          550 ms        546 ms          1   937.397MB/s
BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev         11 ms          9 ms          0   15.2949MB/s

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, very cool results, with the minor question about the PyArrow memory pool (which doesn't need to be addressed in this patch, but if not we should create a follow up JIRA)

bytes_allocated_ += new_size - old_size;

return Status::OK();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should simply compose the default Arrow memory pool (but accept any implementation) in this class and account for our own Python memory allocations (effectively this is a suballocator, then, but we haven't implemented general child allocators yet)? This way we can use whatever allocator we want in Python

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asfgit asfgit closed this in 5bf6ae4 Jan 6, 2017
@xhochy xhochy deleted the ARROW-456 branch January 6, 2017 14:58
wesm added a commit to wesm/arrow that referenced this pull request Sep 2, 2018
I am working on ARROW-865 which exposes these to Python users

Closes apache#270

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#311 from wesm/PARQUET-915 and squashes the following commits:

0a89639 [Wes McKinney] Add test for time64[ns]
6331d8c [Wes McKinney] Cast time32[second] to time32[millisecond]
37c1b42 [Wes McKinney] cpplint
5167a7a [Wes McKinney] Add unit test for date64->date32 cast
440b40f [Wes McKinney] Add unit test for date/time types that write without implicit casts
e626ebd [Wes McKinney] Use inline visitor in LevelBuilder
2ab7f12 [Wes McKinney] Plumbing and expansions for rest of Arrow date/time types
3aa64fa [Wes McKinney] Add conversion for TIMESTAMP_MICROS
wesm added a commit to wesm/arrow that referenced this pull request Sep 4, 2018
I am working on ARROW-865 which exposes these to Python users

Closes apache#270

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#311 from wesm/PARQUET-915 and squashes the following commits:

0a89639 [Wes McKinney] Add test for time64[ns]
6331d8c [Wes McKinney] Cast time32[second] to time32[millisecond]
37c1b42 [Wes McKinney] cpplint
5167a7a [Wes McKinney] Add unit test for date64->date32 cast
440b40f [Wes McKinney] Add unit test for date/time types that write without implicit casts
e626ebd [Wes McKinney] Use inline visitor in LevelBuilder
2ab7f12 [Wes McKinney] Plumbing and expansions for rest of Arrow date/time types
3aa64fa [Wes McKinney] Add conversion for TIMESTAMP_MICROS

Change-Id: I37aade098d3e893e6987a212affdd5dccd33cc07
wesm added a commit to wesm/arrow that referenced this pull request Sep 6, 2018
I am working on ARROW-865 which exposes these to Python users

Closes apache#270

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#311 from wesm/PARQUET-915 and squashes the following commits:

0a89639 [Wes McKinney] Add test for time64[ns]
6331d8c [Wes McKinney] Cast time32[second] to time32[millisecond]
37c1b42 [Wes McKinney] cpplint
5167a7a [Wes McKinney] Add unit test for date64->date32 cast
440b40f [Wes McKinney] Add unit test for date/time types that write without implicit casts
e626ebd [Wes McKinney] Use inline visitor in LevelBuilder
2ab7f12 [Wes McKinney] Plumbing and expansions for rest of Arrow date/time types
3aa64fa [Wes McKinney] Add conversion for TIMESTAMP_MICROS

Change-Id: I37aade098d3e893e6987a212affdd5dccd33cc07
wesm added a commit to wesm/arrow that referenced this pull request Sep 7, 2018
I am working on ARROW-865 which exposes these to Python users

Closes apache#270

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#311 from wesm/PARQUET-915 and squashes the following commits:

0a89639 [Wes McKinney] Add test for time64[ns]
6331d8c [Wes McKinney] Cast time32[second] to time32[millisecond]
37c1b42 [Wes McKinney] cpplint
5167a7a [Wes McKinney] Add unit test for date64->date32 cast
440b40f [Wes McKinney] Add unit test for date/time types that write without implicit casts
e626ebd [Wes McKinney] Use inline visitor in LevelBuilder
2ab7f12 [Wes McKinney] Plumbing and expansions for rest of Arrow date/time types
3aa64fa [Wes McKinney] Add conversion for TIMESTAMP_MICROS

Change-Id: I37aade098d3e893e6987a212affdd5dccd33cc07
wesm added a commit to wesm/arrow that referenced this pull request Sep 8, 2018
I am working on ARROW-865 which exposes these to Python users

Closes apache#270

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#311 from wesm/PARQUET-915 and squashes the following commits:

0a89639 [Wes McKinney] Add test for time64[ns]
6331d8c [Wes McKinney] Cast time32[second] to time32[millisecond]
37c1b42 [Wes McKinney] cpplint
5167a7a [Wes McKinney] Add unit test for date64->date32 cast
440b40f [Wes McKinney] Add unit test for date/time types that write without implicit casts
e626ebd [Wes McKinney] Use inline visitor in LevelBuilder
2ab7f12 [Wes McKinney] Plumbing and expansions for rest of Arrow date/time types
3aa64fa [Wes McKinney] Add conversion for TIMESTAMP_MICROS

Change-Id: I37aade098d3e893e6987a212affdd5dccd33cc07
@jon-chuang
Copy link

jon-chuang commented Feb 5, 2022

Jemalloc with Reallocate shows an improvement on a single thread. What happens when there are multiple concurrent allocations?

(my understanding is that jemalloc tends to maintain separate heaps, similar to glibc's arenas, partly to facilitate concurrrency. So concurrent allocations would not interfere? Especially with large-ish allocations and sufficient total memory pool size resulting in unused-page placements)

Are there hints one can drop to jemalloc that one intends to grow the block to increase the chance of growing the allocation in-place

@wesm
Copy link
Member

wesm commented Feb 16, 2022

I would suggest raising this on the mailing list or in a new GitHub issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants