Fails with CPU does not support x86-64-v2 #49

Closed
rrnewton opened this issue Apr 4, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@rrnewton
Contributor

rrnewton commented Apr 4, 2024

Describe the bug

On some OS/arch/glibc combinations, hermit now fails immediately with this message:

Fatal glibc error: CPU does not support x86-64-v2

To Reproduce

Running examples/rand.py is sufficient.

Environment

  • linux 5.12.0
  • Intel(R) Xeon(R) CPU E5-2680
  • glibc 2.34 (stable release)

Initial Investigation Notes

Running with hermit run --log=info shows that this is an issue with arch_prctl returning EINVAL:

2024-04-04T15:56:54.694800Z  INFO detcore: DETLOG [syscall][detcore, dtid 3] finish syscall #3: arch_prctl(12289, 0x7fffffffd340) = Err(Errno(EINVAL))
rrnewton added the bug label Apr 4, 2024
@CookieComputing

Just a note from when I was investigating this:

It seems that AMD hosts do not have this issue because they bypass CPUID interception:

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  166
  On-line CPU(s) list:   0-165
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC-Milan Processor
$ ./hermit run date
2024-04-04T17:55:26.176575Z  WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.177386Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.178854Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.199113Z  WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.200272Z  WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.204414Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.209334Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.245555Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
Fri Dec 31 03:59:59 PM PST 2021
2024-04-04T17:55:26.283112Z  WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (2 tasks). Need to record this for reproducibility.

An Intel host, however, seems to intercept cpuid properly and runs into the issue as described:

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  72
  On-line CPU(s) list:   0-71
Vendor ID:               GenuineIntel
  Model name:            Intel Core Processor (Broadwell)
$ ./hermit run date
Fatal glibc error: CPU does not support x86-64-v2

If I run with --no-virtualize-cpuid on the Intel host, I can confirm that it works:

$ ./hermit run --no-virtualize-cpuid date
Fri Dec 31 03:59:59 PM PST 2021
2024-04-04T17:58:23.540958Z  WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (3 tasks). Need to record this for reproducibility.

This makes me think that the CPUID bits in cpuid.rs are not set properly. It could, however, also be something related to intercept_cpuid.

@CookieComputing

I think it's just an issue with what CPUID flags we expose. According to https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels, we need to enable flags for SSE3 and SSE4, among other things.

Using https://www.felixcloutier.com/x86/cpuid to find the appropriate bits to toggle, I confirmed that the change works on an Intel host.
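
For reference, here is a minimal sketch (not hermit code) of the CPUID leaf 1 ECX bits that x86-64-v2 depends on, going by the two pages linked above. The constant and function names are made up for illustration, and LAHF/SAHF is left out because it is reported in a different leaf.

```rust
/// Feature bits in CPUID leaf 1, ECX, that the x86-64-v2 level relies on.
/// Hypothetical table for illustration; LAHF/SAHF lives in another leaf.
const V2_LEAF1_ECX_BITS: &[(u32, &str)] = &[
    (0, "SSE3"),
    (9, "SSSE3"),
    (13, "CMPXCHG16B"),
    (19, "SSE4.1"),
    (20, "SSE4.2"),
    (23, "POPCNT"),
];

/// Returns the names of the listed features whose bit is clear in `ecx`.
fn missing_v2_features(ecx: u32) -> Vec<&'static str> {
    V2_LEAF1_ECX_BITS
        .iter()
        .filter(|(bit, _)| (ecx & (1u32 << bit)) == 0)
        .map(|(_, name)| *name)
        .collect()
}
```

Calling `missing_v2_features` on a virtualized leaf 1 ECX value lists whichever of these features are still masked off; an empty result means none of these six bits is the blocker.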

I can make hermit run with this change:

$ hg diff
diff --git a/hermetic_infra/hermit/detcore/src/cpuid.rs b/hermetic_infra/hermit/detcore/src/cpuid.rs
--- a/hermetic_infra/hermit/detcore/src/cpuid.rs
+++ b/hermetic_infra/hermit/detcore/src/cpuid.rs
@@ -40,7 +40,12 @@
 // masked off to prevent non-determinism.
 const CPUIDS: &[CpuIdResult] = &[
     cpuid_result(0x0000000D, 0x756E6547, 0x6C65746E, 0x49656E69),
-    cpuid_result(0x00000663, 0x00000800, 0x90202001, 0x078BFBFD),
+    cpuid_result(
+        0x00000663,
+        0x00000800,
+        0x90202001 | (1 << 0) | (1 << 9) | (1 << 13) | (1 << 19) | (1 << 20) | (1 << 23),
+        0x078BFBFD,
+    ),
     cpuid_result(0x00000001, 0x00000000, 0x0000004D, 0x002C307D),
     cpuid_result(0x00000000, 0x00000000, 0x00000000, 0x00000000),
     cpuid_result(0x00000120, 0x01C0003F, 0x0000003F, 0x00000001),

$ buck2 run //hermetic_infra/hermit/hermit-cli:hermit -- run date
...
BUILD SUCCEEDED
Fri Dec 31 03:59:59 PM PST 2021
2024-04-08T15:23:28.398395Z  WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (3 tasks). Need to record this for reproducibility.

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  72
  On-line CPU(s) list:   0-71
Vendor ID:               GenuineIntel
  Model name:            Intel Core Processor (Broadwell)
...
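
As a sanity check on the arithmetic in the diff, the sketch below works out which of the ORed bits actually change the exposed ECX value. The two constants are copied from the diff; the helper name is hypothetical and this is not part of cpuid.rs.

```rust
/// Leaf 1 ECX value exposed before the change (copied from the diff).
const OLD_ECX: u32 = 0x9020_2001;

/// Bits ORed in by the change (copied from the diff).
const ADDED_MASK: u32 =
    (1 << 0) | (1 << 9) | (1 << 13) | (1 << 19) | (1 << 20) | (1 << 23);

/// Bit positions that are set in `mask` but were clear in `old`.
/// Hypothetical helper, for illustration only.
fn newly_set_bits(old: u32, mask: u32) -> Vec<u32> {
    (0..32)
        .filter(|bit| ((mask & !old) & (1u32 << bit)) != 0)
        .collect()
}
```

With these inputs, `OLD_ECX | ADDED_MASK` is `0x90B82201` and `newly_set_bits(OLD_ECX, ADDED_MASK)` yields bits 9, 19, 20 and 23 (SSSE3, SSE4.1, SSE4.2, POPCNT). Bits 0 and 13 (SSE3, CMPXCHG16B) were already set in the old value, so including them in the mask is harmless but has no effect.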

facebook-github-bot pushed a commit that referenced this issue Apr 10, 2024
Summary:
Without this diff, standard Linux binaries fail on Intel hosts with the following error:

```
$ hermit run date
Fatal glibc error: CPU does not support x86-64-v2
```

AMD hosts behave differently because they do not actually intercept cpuid instructions (another issue we should deal with...). The required features therefore appear enabled there, since this is equivalent to running with `--no-virtualize-cpuid`.

Looking online, it seems we just need to enable the flags required to support that microarchitecture level. More details in #49 (comment)

Reviewed By: jasonwhite

Differential Revision: D55874608

fbshipit-source-id: 619116fe2f3a9d0bcfd667f1c6db26c031cee640
@CookieComputing

Commit bd3153b should close this.
