Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed get count threads for multi-cpu system with NUMA architecture #84842

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

GermanAizek
Copy link

Fixed very old problem that only on any Windows NT and modern Windows Server 😆

https://developercommunity.visualstudio.com/t/hardware-concurrency-returns-an-incorrect-result/350854

https://stackoverflow.com/questions/31209256/reliable-way-to-programmatically-get-the-number-of-hardware-threads-on-windows

Why this commit is useful not only for server configurations, now a very cheap PC configuration is building from Xeon E54xx, X34xx, E3-xxxx, E5-xxxx, E7-xxx, any Silver, any Gold, any Platinum series, cheapest on LGA 2011v3 socket two-socket board with NUMA support is cheap on Alibaba, Baidu or Aliexpress.

Example: https://www.alibaba.com/product-detail/DDR4-x99-dual-cpu-Lga2011-V3_1600443686429.html

AThousandShips
AThousandShips previously approved these changes Nov 13, 2023
#if defined(_WIN32) || defined(_WIN64)
DWORD length = 0;
int concurrency = 0;
const auto validConcurrency = [&concurrency]() noexcept -> int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We avoid lambdas unless necessary

}
DWORD i = 0;
while (i < length) {
const auto *proc = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(mem + i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto is not allowed

@akien-mga
Copy link
Member

core/os/os.h is the abstraction which all platform build upon. You should add Windows specific code there.

It's a virtual method, so you should update the Windows specific one in platform/windows/os_windows.cpp.

@AThousandShips AThousandShips dismissed their stale review November 13, 2023 13:15

Accidentally approved

Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, misclick as approve

@AThousandShips AThousandShips added this to the 4.x milestone Nov 13, 2023
Copy link
Member

@bruvzg bruvzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetLogicalProcessorInformationEx usage would be reasonable (I do not have any hardware to test it) if the goal was to get physical core count. But limitation is likely intentional, since std::thread::hardware_concurrency is intended for thread creation (same for OS::get_processor_count()) and thread are usually limited to the one node.

It's something that should be accounted in #76932 implementation, but not in get_processor_count().

@@ -359,7 +362,46 @@ String OS::get_unique_id() const {
}

int OS::get_processor_count() const {
#if defined(_WIN32) || defined(_WIN64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows specific code should be override in OS_Widnows.

#if defined(_WIN32) || defined(_WIN64)
DWORD length = 0;
int concurrency = 0;
const auto validConcurrency = [&concurrency]() noexcept -> int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not use auto, and lamdas only used if absolutely necessary.

if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
return validConcurrency();
}
std::unique_ptr<void, void (*)(void *)> buffer(std::malloc(length), std::free);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for std::unique_ptr, use memalloc / memfree instead of custom std::malloc. Or Vector<uint8_t>.

@GermanAizek
Copy link
Author

@bruvzg

It's something that should be accounted in #76932 implementation, but not in get_processor_count().

Should I continue fix my pr with your comments? Or have they already started doing this before me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants