Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in Sha512 AVX(2) implementation #344

Closed
elichai opened this issue Jan 6, 2022 · 3 comments · Fixed by #345
Closed

Bug in Sha512 AVX(2) implementation #344

elichai opened this issue Jan 6, 2022 · 3 comments · Fixed by #345

Comments

@elichai
Copy link

elichai commented Jan 6, 2022

I found a bug in Sha512 AVX2 implementation and I'm not sure what's the nicest fix.
to trigger the bug I ran the following code:

use sha2::{Digest, Sha512};
fn main() {
    let mut h = Sha512::new();
    h.update(&[0u8; 300]);
    println!("{:?}", h.finalize());
}

With these flags: RUSTFLAGS='-Clink-arg=-fuse-ld=lld -Clinker-plugin-lto' cargo r
Output:
Segmentation fault (core dumped)

The source of the bug is here: https://github.com/RustCrypto/hashes/blob/c478cbb/sha2/src/sha512/x86.rs#L117
_mm_store_si128 is used with a pointer to a u64 array, which is aligned to 8 and not to 16 as the instruction requires.
(there are a lot of places that assume that MsgSchedule and RoundStates are aligned to 16 bytes while this is in fact false)

We need to make them aligned correctly, there are a few ways I see how this can be done:

  1. Make 2 structs with repr(align(16)) and implement Index/IndexMut with all the Range* variants.
  2. Make 2 structs with repr(align(16)) and transmute their mutable reference to a mutable reference to u64's.
  3. Make a Align16 type and replace these with arrays of that type, and add a function to receive the lower and higher part of that type.

(we can cut boilerplate of 1 in half via const generics)

Would appreciate thoughts before I implement anything as the first one has a lot of boilerplate code in it.

(P.S. You might notice that &mut ms[2 * $i] as *mut u64 as *mut _ is also a bug because it takes a mutable reference to a single u64 and casts it into a pointer with size 2*64, so reading from that violates Stack Borrows, but that's easier to fix via ms[2 * $i.. 2 * $i + 2].as_mut_ptr() as *mut u64 as *mut _ )

@elichai
Copy link
Author

elichai commented Jan 6, 2022

You can check that solving the alignment actually fixes the bug by applying the following (hacky) patch:

diff --git a/sha2/src/sha512/x86.rs b/sha2/src/sha512/x86.rs
index 5f555c2..c5c411f 100644
--- a/sha2/src/sha512/x86.rs
+++ b/sha2/src/sha512/x86.rs
@@ -25,6 +25,13 @@ pub fn compress(state: &mut [u64; 8], blocks: &[[u8; 128]]) {
     }
 }
 
+#[repr(C, align(16))]
+struct AlignedRounds(RoundStates);
+
+#[repr(C, align(16))]
+#[derive(Default)]
+struct AlignedMsgSchedule(MsgSchedule);
+
 #[target_feature(enable = "avx2")]
 unsafe fn sha512_compress_x86_64_avx2(state: &mut [u64; 8], blocks: &[[u8; 128]]) {
     let mut start_block = 0;
@@ -34,22 +41,23 @@ unsafe fn sha512_compress_x86_64_avx2(state: &mut [u64; 8], blocks: &[[u8; 128]]
         start_block += 1;
     }
 
-    let mut ms: MsgSchedule = Default::default();
-    let mut t2: RoundStates = [0u64; SHA512_ROUNDS_NUM];
+    let mut ms: AlignedMsgSchedule = Default::default();
+    let mut t2 = AlignedRounds([0u64; SHA512_ROUNDS_NUM]);
     let mut x = [_mm256_setzero_si256(); 8];
+    use core::mem::transmute;
 
     for i in (start_block..blocks.len()).step_by(2) {
-        load_data_avx2(&mut x, &mut ms, &mut t2, blocks.as_ptr().add(i) as *const _);
+        load_data_avx2(&mut x, transmute(&mut ms), transmute(&mut t2), blocks.as_ptr().add(i) as *const _);
 
         // First block
         let mut current_state = *state;
-        rounds_0_63_avx2(&mut current_state, &mut x, &mut ms, &mut t2);
-        rounds_64_79(&mut current_state, &ms);
+        rounds_0_63_avx2(&mut current_state, &mut x, transmute(&mut ms), transmute(&mut t2));
+        rounds_64_79(&mut current_state, transmute(&ms));
         accumulate_state(state, &current_state);
 
         // Second block
         current_state = *state;
-        process_second_block(&mut current_state, &t2);
+        process_second_block(&mut current_state, transmute(&t2));
         accumulate_state(state, &current_state);
     }
 }

@tarcieri
Copy link
Member

tarcieri commented Jan 6, 2022

@elichai thanks for the report! Would you mind turning your proposed fix (however hacky) into a PR for further discussion?

@newpavlov
Copy link
Member

Can't we simply redefine RoundStates and MsgSchedule as arrays of __m256i? It also would allow us to replace _mm_store_si128/_mm_load_si128 calls with simple indexing, thus also addressing your P.S. concern. IIRC it should compile down to aligned loads/stores without any issues.

We probably should also backport fix to v0.9.

cc @Rexagon

rillian added a commit to brave/brave-core that referenced this issue Jan 19, 2022
The 0.9.8 release was yanked due to a bug in the AVX512 implementation.
See RustCrypto/hashes#344
joncinque added a commit to joncinque/solana-program-library that referenced this issue Aug 12, 2024
#### Problem

The binary built with `cargo install --locked spl-token-cli` has a
segfault, likely due to RustCrypto/hashes#344.
The normal build with `cargo install spl-token-cli` does work, however.
It's not clear why this issue has only started now, but the
spl-token-cli build is not working on the Solana 2.0.4 and 2.0.5
releases.

#### Solution

Bump sha2 to 0.9.9.
joncinque added a commit to solana-labs/solana-program-library that referenced this issue Aug 12, 2024
#### Problem

The binary built with `cargo install --locked spl-token-cli` has a
segfault, likely due to RustCrypto/hashes#344.
The normal build with `cargo install spl-token-cli` does work, however.
It's not clear why this issue has only started now, but the
spl-token-cli build is not working on the Solana 2.0.4 and 2.0.5
releases.

#### Solution

Bump sha2 to 0.9.9.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants