-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in Sha512 AVX(2) implementation #344
Comments
You can check that solving the alignment actually fixes the bug by applying the following (hacky) patch: diff --git a/sha2/src/sha512/x86.rs b/sha2/src/sha512/x86.rs
index 5f555c2..c5c411f 100644
--- a/sha2/src/sha512/x86.rs
+++ b/sha2/src/sha512/x86.rs
@@ -25,6 +25,13 @@ pub fn compress(state: &mut [u64; 8], blocks: &[[u8; 128]]) {
}
}
+#[repr(C, align(16))]
+struct AlignedRounds(RoundStates);
+
+#[repr(C, align(16))]
+#[derive(Default)]
+struct AlignedMsgSchedule(MsgSchedule);
+
#[target_feature(enable = "avx2")]
unsafe fn sha512_compress_x86_64_avx2(state: &mut [u64; 8], blocks: &[[u8; 128]]) {
let mut start_block = 0;
@@ -34,22 +41,23 @@ unsafe fn sha512_compress_x86_64_avx2(state: &mut [u64; 8], blocks: &[[u8; 128]]
start_block += 1;
}
- let mut ms: MsgSchedule = Default::default();
- let mut t2: RoundStates = [0u64; SHA512_ROUNDS_NUM];
+ let mut ms: AlignedMsgSchedule = Default::default();
+ let mut t2 = AlignedRounds([0u64; SHA512_ROUNDS_NUM]);
let mut x = [_mm256_setzero_si256(); 8];
+ use core::mem::transmute;
for i in (start_block..blocks.len()).step_by(2) {
- load_data_avx2(&mut x, &mut ms, &mut t2, blocks.as_ptr().add(i) as *const _);
+ load_data_avx2(&mut x, transmute(&mut ms), transmute(&mut t2), blocks.as_ptr().add(i) as *const _);
// First block
let mut current_state = *state;
- rounds_0_63_avx2(&mut current_state, &mut x, &mut ms, &mut t2);
- rounds_64_79(&mut current_state, &ms);
+ rounds_0_63_avx2(&mut current_state, &mut x, transmute(&mut ms), transmute(&mut t2));
+ rounds_64_79(&mut current_state, transmute(&ms));
accumulate_state(state, ¤t_state);
// Second block
current_state = *state;
- process_second_block(&mut current_state, &t2);
+ process_second_block(&mut current_state, transmute(&t2));
accumulate_state(state, ¤t_state);
}
} |
@elichai thanks for the report! Would you mind turning your proposed fix (however hacky) into a PR for further discussion? |
Can't we simply redefine We probably should also backport fix to v0.9. cc @Rexagon |
The 0.9.8 release was yanked due to a bug in the AVX512 implementation. See RustCrypto/hashes#344
#### Problem The binary built with `cargo install --locked spl-token-cli` has a segfault, likely due to RustCrypto/hashes#344. The normal build with `cargo install spl-token-cli` does work, however. It's not clear why this issue has only started now, but the spl-token-cli build is not working on the Solana 2.0.4 and 2.0.5 releases. #### Solution Bump sha2 to 0.9.9.
#### Problem The binary built with `cargo install --locked spl-token-cli` has a segfault, likely due to RustCrypto/hashes#344. The normal build with `cargo install spl-token-cli` does work, however. It's not clear why this issue has only started now, but the spl-token-cli build is not working on the Solana 2.0.4 and 2.0.5 releases. #### Solution Bump sha2 to 0.9.9.
I found a bug in Sha512 AVX2 implementation and I'm not sure what's the nicest fix.
to trigger the bug I ran the following code:
With these flags:
RUSTFLAGS='-Clink-arg=-fuse-ld=lld -Clinker-plugin-lto' cargo r
Output:
Segmentation fault (core dumped)
The source of the bug is here: https://github.com/RustCrypto/hashes/blob/c478cbb/sha2/src/sha512/x86.rs#L117
_mm_store_si128
is used with a pointer to a u64 array, which is aligned to 8 and not to 16 as the instruction requires.(there are a lot of places that assume that
MsgSchedule
andRoundStates
are aligned to 16 bytes while this is in fact false)We need to make them aligned correctly, there are a few ways I see how this can be done:
repr(align(16))
and implement Index/IndexMut with all theRange*
variants.repr(align(16))
and transmute their mutable reference to a mutable reference to u64's.Align16
type and replace these with arrays of that type, and add a function to receive the lower and higher part of that type.(we can cut boilerplate of 1 in half via const generics)
Would appreciate thoughts before I implement anything as the first one has a lot of boilerplate code in it.
(P.S. You might notice that
&mut ms[2 * $i] as *mut u64 as *mut _
is also a bug because it takes a mutable reference to a single u64 and casts it into a pointer with size 2*64, so reading from that violates Stack Borrows, but that's easier to fix viams[2 * $i.. 2 * $i + 2].as_mut_ptr() as *mut u64 as *mut _
)The text was updated successfully, but these errors were encountered: