Improved sha256 and sha512 assembler code versions available for x86 / x64 #82

Open
invd opened this issue Jan 25, 2025 · 4 comments

Comments

@invd

invd commented Jan 25, 2025

I recently noticed that @nayuki, the author of some of the assembler code used in this crate, published performance-improved versions of their code in 2024 at https://github.com/nayuki/Nayuki-web-published-code/tree/master/fast-sha2-hashes-in-x86-assembly. Each improvement step is described in the corresponding git commit message.

I'm aware of the "maintenance mode" status of the asm-hashes repository and the general goal of moving to Rust with inline assembly, but I still wanted to flag this potential code improvement to the maintainers and to other users who are interested in this crate.

For sha2/src/sha512_x64.S, the net performance improvement I measured on two different AMD Zen3 CPUs was minor, around 1%. The gains could differ on other CPUs or x86 microarchitectures, though.

On modern CPUs produced in the last ~10 years, significant additional speedups are possible; I'll document them in a separate issue.

@newpavlov
Member

newpavlov commented Jan 25, 2025

The new SHA512 instructions should be leveraged using intrinsics in the sha2 crate. Unfortunately, the relevant intrinsics and target features are currently unstable, so this new backend would have to be experimental (i.e., gated on a crate feature or configuration flag).

The linked assembly could be a good reference point and we may use it for an asm!-based implementation in sha2, but we do not plan to use .S files in future versions of our crates.
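A minimal sketch of the gating idea in Rust, assuming a hypothetical `sha2_backend = "sha512ni"` configuration value and a hypothetical `Backend` enum (neither is taken from the sha2 crate): the experimental path is compiled only when the flag is set, and the portable backend remains the default.

```rust
/// Compression backends a hypothetical SHA-512 implementation might choose from.
#[allow(dead_code)] // `Sha512Ni` is only constructed when the experimental cfg is set
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    /// Portable software implementation; always compiled in.
    Soft,
    /// Hypothetical experimental backend built on the new SHA512 instructions.
    Sha512Ni,
}

/// Chooses a backend at runtime.
///
/// The experimental branch is compiled only when the hypothetical cfg flag
/// `sha2_backend = "sha512ni"` is set, e.g. with
/// `RUSTFLAGS='--cfg sha2_backend="sha512ni"'`; otherwise this always
/// returns `Backend::Soft`.
fn select_backend() -> Backend {
    #[cfg(all(sha2_backend = "sha512ni", target_arch = "x86_64"))]
    {
        // Assumption: a (nightly) toolchain on which runtime detection of the
        // `sha512` target feature is available.
        if std::arch::is_x86_feature_detected!("sha512") {
            return Backend::Sha512Ni;
        }
    }
    // Default: the stable, portable code path.
    Backend::Soft
}
```

A Cargo feature would use `#[cfg(feature = "...")]` in the same position instead of a custom cfg. On recent Cargo versions, a custom cfg like this one should also be declared via `check-cfg` under `[lints.rust] unexpected_cfgs` to avoid a warning.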

@invd
Author

invd commented Jan 25, 2025

@newpavlov, thanks for your quick feedback. To clarify, this particular issue #82 is about "there are slightly improved versions available for the code you already use from Nayuki", which would be a drop-in replacement except for minor details (#if snippets, whitespace). As far as I'm aware, they do not make use of any new instruction types.

#83 is about "you could use other asm code implementations with modern CPU acceleration instructions" in asm-hashes, which would probably yield a +50% improvement for SHA512 and even more for SHA1/SHA256. I understand that this is a heavier lift and doesn't match your strategic plans, but I wanted to document the possibility and my related observations on the lower-than-expected performance of asm-hashes. Given your strategy of moving users to a native hashes solution, the documentation aspect ("asm-hashes is already slower than the native code on some hardware") is perhaps the most relevant point for you.

The third aspect, potentially making the native hashes implementation more competitive with modern asm code in other projects, is probably best discussed in the other repository 🙂

@tarcieri
Member

> The new SHA512 instructions should be leveraged using intrinsics in the sha2 crate. Unfortunately, the relevant intrinsics and target features are currently unstable, so this new backend would have to be experimental (i.e., gated on crate feature or configuration flag).

We can do something similar to what we did for ARM for a while and "polyfill" the unstable intrinsics by making them wrappers around the small bits of inline assembly the intrinsics would otherwise emit. Though, are ZMM registers still unstable?

@newpavlov
Member

_mm256_sha512* intrinsics work on YMM registers, so I guess it should be possible to polyfill them. Also, surprisingly, std does not have those intrinsics yet (or they were removed).
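For reference, the `_mm256_sha512*` intrinsics (VSHA512MSG1, VSHA512MSG2, VSHA512RNDS2) accelerate the SHA-512 message schedule and round computations defined in FIPS 180-4, so any polyfilled or intrinsic-based backend has to produce exactly what portable scalar code produces. A minimal sketch of the scalar schedule expansion such a backend would replace follows; the function names are illustrative and not taken from the sha2 crate.

```rust
/// SHA-512 small sigma 0 (FIPS 180-4): ROTR^1 ^ ROTR^8 ^ SHR^7.
fn small_sigma0(x: u64) -> u64 {
    x.rotate_right(1) ^ x.rotate_right(8) ^ (x >> 7)
}

/// SHA-512 small sigma 1 (FIPS 180-4): ROTR^19 ^ ROTR^61 ^ SHR^6.
fn small_sigma1(x: u64) -> u64 {
    x.rotate_right(19) ^ x.rotate_right(61) ^ (x >> 6)
}

/// Expands one 128-byte (already padded) block into the 80-word message schedule.
fn message_schedule(block: &[u8; 128]) -> [u64; 80] {
    let mut w = [0u64; 80];
    // W[0..16]: the block read as sixteen big-endian 64-bit words.
    for (t, chunk) in block.chunks_exact(8).enumerate() {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(chunk);
        w[t] = u64::from_be_bytes(bytes);
    }
    // W[16..80]: the recurrence that the VSHA512MSG1/MSG2 instructions speed up.
    for t in 16..80 {
        w[t] = small_sigma1(w[t - 2])
            .wrapping_add(w[t - 7])
            .wrapping_add(small_sigma0(w[t - 15]))
            .wrapping_add(w[t - 16]);
    }
    w
}
```

For the padded single-block message "abc", W16 equals W0 because W1, W9 and W14 are all zero, which makes a convenient spot check when comparing a new backend against this reference.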


3 participants