Improved sha256 and sha512 assembler code versions available for x86 / x64 #82

invd · 2025-01-25T11:37:47Z

I recently noticed that @nayuki, the author of some of the assembler code used in this crate, published performance-improved versions of their code in 2024 at https://github.com/nayuki/Nayuki-web-published-code/tree/master/fast-sha2-hashes-in-x86-assembly. The individual improvement steps are described in the git commit messages.

I'm aware of the "maintenance mode" status of the asm-hashes repository and the general goal of moving to Rust with inline assembly, but still wanted to flag this potential code improvement to the maintainers and potential other users who are interested in this crate.

For sha2/src/sha512_x64.S, the net performance improvement seen on two different AMD Zen3 CPUs was minor, in the range of ~1%. Performance improvements could be different on other CPUs or x86 architectures, though.

For modern CPUs produced in the last ~10 years, there are significant additional speedups possible, which I'll document in a separate issue.


newpavlov · 2025-01-25T13:14:43Z

The new SHA512 instructions should be leveraged using intrinsics in the sha2 crate. Unfortunately, the relevant intrinsics and target features are currently unstable, so this new backend would have to be experimental (i.e., gated on a crate feature or configuration flag).
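For readers unfamiliar with this kind of gating, here is a minimal sketch of what an opt-in backend switch could look like. It is not code from sha2: the `sha2_experimental_x86` cfg flag and the `Backend` enum are hypothetical names, and AVX2 detection is used only as a stand-in for the SHA512 feature check, since that target feature is the unstable part.

```rust
/// Which SHA-512 compression backend to use (hypothetical enum, not the sha2 API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Portable,
    ExperimentalX86,
}

/// Pure decision logic: the experimental backend is used only when the
/// builder opted in AND the CPU reports support at runtime.
pub fn choose(opted_in: bool, cpu_ok: bool) -> Backend {
    if opted_in && cpu_ok {
        Backend::ExperimentalX86
    } else {
        Backend::Portable
    }
}

#[allow(unexpected_cfgs)]
pub fn select_backend() -> Backend {
    // Hypothetical configuration flag, set with
    // RUSTFLAGS='--cfg sha2_experimental_x86'. Assumption, not a real sha2 flag.
    let opted_in = cfg!(sha2_experimental_x86);

    // Stand-in check: AVX2 is detected here because it is available on stable
    // Rust. A real backend would test for the SHA512 extension instead.
    #[cfg(target_arch = "x86_64")]
    let cpu_ok = std::is_x86_feature_detected!("avx2");
    #[cfg(not(target_arch = "x86_64"))]
    let cpu_ok = false;

    choose(opted_in, cpu_ok)
}
```

Without the flag, `select_backend()` always returns `Backend::Portable`, so default builds would be unaffected by the experimental path.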

The linked assembly could be a good reference point and we may use it for an asm!-based implementation in sha2, but we do not plan to use .S files in future versions of our crates.

invd · 2025-01-25T13:43:43Z

@newpavlov thanks for your quick feedback. To clarify, this particular issue #82 is about "there are slightly improved versions available for the code you already use from Nayuki", which would be a drop-in replacement except for minor details (#if snippets, whitespace). As far as I'm aware, they do not make use of any new instruction types.

#83 is about "you could use other asm code implementations with modern CPU acceleration instructions" in asm-hashes, which would probably lead to a +50% improvement for SHA512, and even more for SHA1/SHA256. I understand that this is a heavier lift and doesn't match your strategic plans, but wanted to document the possibility and related observations on the lower-than-expected performance of asm-hashes. From your strategic perspective of moving users to a native hashes solution, the documentation aspect of "asm-hashes is already slower than the native code on some hardware" is perhaps the most relevant to you.

The third aspect is potentially improving the native hashes implementation to be more competitive with modern asm code in other projects - that's probably best discussed in the other repository 🙂

tarcieri · 2025-01-25T17:32:48Z

> The new SHA512 instructions should be leveraged using intrinsics in the sha2 crate. Unfortunately, the relevant intrinsics and target features are currently unstable, so this new backend would have to be experimental (i.e gated on crate feature or configuration flag).

We can do something similar to what we did for ARM for a while, and "polyfill" the unstable intrinsics by making them wrappers for small bits of inline assembly which would otherwise be emitted by the intrinsics. Though, are ZMM registers still unstable?
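To illustrate the polyfill idea, here is a rough sketch of a small wrapper function whose body is the single instruction an intrinsic would otherwise emit. It is an assumption-laden stand-in, not the ARM polyfill mentioned above and not a SHA512 intrinsic: it wraps the BMI2 `rorx` instruction (a rotate by 14, one of the rotations in SHA-512's Σ1 function) because that can be run on most current x86-64 CPUs. The function names `rorx14_asm` and `rotr14` are made up for this example.

```rust
/// Polyfill-style wrapper: one inline-assembly instruction behind a function
/// signature, the way an intrinsic would look to the caller.
///
/// Safety: the caller must ensure the CPU supports BMI2.
#[cfg(target_arch = "x86_64")]
unsafe fn rorx14_asm(x: u64) -> u64 {
    let out: u64;
    unsafe {
        core::arch::asm!(
            // rorx dst, src, imm8: rotate right without touching flags.
            "rorx {out}, {inp}, 14",
            out = lateout(reg) out,
            inp = in(reg) x,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    out
}

/// Safe entry point: use the asm wrapper when BMI2 is detected at runtime,
/// otherwise fall back to the portable rotate.
pub fn rotr14(x: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("bmi2") {
            // BMI2 support was just checked, so the call is sound.
            return unsafe { rorx14_asm(x) };
        }
    }
    x.rotate_right(14)
}
```

A SHA512 version would follow the same shape with vector register operands, and the wrapper could be swapped for the real intrinsic once it is stable.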

newpavlov · 2025-01-25T21:50:49Z

_mm256_sha512* intrinsics work on YMM registers, so I guess it should be possible to polyfill them. Also, surprisingly, std does not have those intrinsics yet (or they were removed).

invd mentioned this issue Jan 25, 2025

sha256 and sha512 asm code does not use modern x64 CPU instructions #83

Open

invd mentioned this issue Jan 25, 2025

Potential speed improvements for SHA512 via BMI2 instructions RustCrypto/hashes#640

Open
