Improved sha256 and sha512 assembler code versions available for x86 / x64 #82
Comments
The new SHA512 instructions should be leveraged using intrinsics in the The linked assembly could be a good reference point and we may use it for an
@newpavlov thanks for your quick feedback. To clarify, this particular issue #82 is about "there are slightly improved versions available for the code you already use from Nayuki", which would be a drop-in replacement except for minor details ( #83 is about "you could use other asm code implementations with modern CPU acceleration instructions" in The third aspect is potentially improving the native
We can do something similar to what we did for ARM for a while, and "polyfill" the unstable intrinsics by making them wrappers for small bits of inline assembly which would otherwise be emitted by the intrinsics. Though, are ZMM registers still unstable?
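To make the "polyfill" idea concrete, here is a minimal sketch of the wrapper pattern, not code from this repository. The name `ror64_polyfill` is hypothetical, and the plain `ror` instruction is only a stand-in for an instruction whose intrinsic is still unstable (it is chosen so the sketch runs on any x86_64 CPU). The wrapper exposes an ordinary function signature and emits the instruction through stable inline assembly; other targets fall back to portable Rust.

```rust
// Hypothetical polyfill: a safe wrapper that emits one instruction via
// stable inline assembly. `ror` is a stand-in for an instruction whose
// intrinsic is not yet stable; the wrapper shape is the point here.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
pub fn ror64_polyfill(x: u64, n: u8) -> u64 {
    use core::arch::asm;

    let mut out = x;
    // SAFETY: `ror` only touches the named registers and the flags;
    // it does not access memory or the stack.
    unsafe {
        asm!(
            "ror {val}, cl",          // rotate right by the count held in CL
            val = inout(reg) out,     // value is rotated in place
            in("rcx") u64::from(n),   // CL is the low byte of RCX
            options(pure, nomem, nostack)
        );
    }
    out
}

// Portable fallback for targets without the instruction (assumption:
// the same semantics are acceptable there, only slower or different codegen).
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
pub fn ror64_polyfill(x: u64, n: u8) -> u64 {
    x.rotate_right(u32::from(n))
}
```

Callers use `ror64_polyfill(x, 14)` like any other function. Once the real intrinsic is stabilized, the body can be swapped for it without touching call sites. An instruction that needs a specific CPU feature would additionally need a runtime or compile-time feature check, which this sketch leaves out.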
I recently noticed that @nayuki , the author of some of the assembler code used in this crate, has published performance-improved versions of their code at https://github.com/nayuki/Nayuki-web-published-code/tree/master/fast-sha2-hashes-in-x86-assembly in 2024. The different improvement steps are described in the git commit messages.
I'm aware of the "maintenance mode" status of the asm-hashes repository and the general goal of moving to Rust with inline assembly, but still wanted to flag this potential code improvement to the maintainers and potential other users who are interested in this crate.
For sha2/src/sha512_x64.S, the net performance improvement seen on two different AMD Zen3 CPUs was minor, in the range of ~1%. Performance improvements could be different on other CPUs or x86 architectures, though.
For modern CPUs produced in the last ~10 years, there are significant additional speedups possible, which I'll document in a separate issue.