Bug in Sha512 AVX(2) implementation #344

elichai · 2022-01-06T13:44:12Z

I found a bug in Sha512 AVX2 implementation and I'm not sure what's the nicest fix.
to trigger the bug I ran the following code:

use sha2::{Digest, Sha512};
fn main() {
    let mut h = Sha512::new();
    h.update(&[0u8; 300]);
    println!("{:?}", h.finalize());
}

With these flags: RUSTFLAGS='-Clink-arg=-fuse-ld=lld -Clinker-plugin-lto' cargo r
Output:
Segmentation fault (core dumped)

The source of the bug is here: https://github.com/RustCrypto/hashes/blob/c478cbb/sha2/src/sha512/x86.rs#L117
_mm_store_si128 is used with a pointer to a u64 array, which is aligned to 8 and not to 16 as the instruction requires.
(there are a lot of places that assume that MsgSchedule and RoundStates are aligned to 16 bytes while this is in fact false)

We need to make them aligned correctly, there are a few ways I see how this can be done:

Make 2 structs with repr(align(16)) and implement Index/IndexMut with all the Range* variants.
Make 2 structs with repr(align(16)) and transmute their mutable reference to a mutable reference to u64's.
Make a Align16 type and replace these with arrays of that type, and add a function to receive the lower and higher part of that type.

(we can cut boilerplate of 1 in half via const generics)

Would appreciate thoughts before I implement anything as the first one has a lot of boilerplate code in it.

(P.S. You might notice that &mut ms[2 * $i] as *mut u64 as *mut _ is also a bug because it takes a mutable reference to a single u64 and casts it into a pointer with size 2*64, so reading from that violates Stack Borrows, but that's easier to fix via ms[2 * $i.. 2 * $i + 2].as_mut_ptr() as *mut u64 as *mut _ )

The text was updated successfully, but these errors were encountered:

elichai · 2022-01-06T13:44:20Z

You can check that solving the alignment actually fixes the bug by applying the following (hacky) patch:

diff --git a/sha2/src/sha512/x86.rs b/sha2/src/sha512/x86.rs
index 5f555c2..c5c411f 100644
--- a/sha2/src/sha512/x86.rs
+++ b/sha2/src/sha512/x86.rs
@@ -25,6 +25,13 @@ pub fn compress(state: &mut [u64; 8], blocks: &[[u8; 128]]) {
     }
 }
 
+#[repr(C, align(16))]
+struct AlignedRounds(RoundStates);
+
+#[repr(C, align(16))]
+#[derive(Default)]
+struct AlignedMsgSchedule(MsgSchedule);
+
 #[target_feature(enable = "avx2")]
 unsafe fn sha512_compress_x86_64_avx2(state: &mut [u64; 8], blocks: &[[u8; 128]]) {
     let mut start_block = 0;
@@ -34,22 +41,23 @@ unsafe fn sha512_compress_x86_64_avx2(state: &mut [u64; 8], blocks: &[[u8; 128]]
         start_block += 1;
     }
 
-    let mut ms: MsgSchedule = Default::default();
-    let mut t2: RoundStates = [0u64; SHA512_ROUNDS_NUM];
+    let mut ms: AlignedMsgSchedule = Default::default();
+    let mut t2 = AlignedRounds([0u64; SHA512_ROUNDS_NUM]);
     let mut x = [_mm256_setzero_si256(); 8];
+    use core::mem::transmute;
 
     for i in (start_block..blocks.len()).step_by(2) {
-        load_data_avx2(&mut x, &mut ms, &mut t2, blocks.as_ptr().add(i) as *const _);
+        load_data_avx2(&mut x, transmute(&mut ms), transmute(&mut t2), blocks.as_ptr().add(i) as *const _);
 
         // First block
         let mut current_state = *state;
-        rounds_0_63_avx2(&mut current_state, &mut x, &mut ms, &mut t2);
-        rounds_64_79(&mut current_state, &ms);
+        rounds_0_63_avx2(&mut current_state, &mut x, transmute(&mut ms), transmute(&mut t2));
+        rounds_64_79(&mut current_state, transmute(&ms));
         accumulate_state(state, &current_state);
 
         // Second block
         current_state = *state;
-        process_second_block(&mut current_state, &t2);
+        process_second_block(&mut current_state, transmute(&t2));
         accumulate_state(state, &current_state);
     }
 }

tarcieri · 2022-01-06T16:06:49Z

@elichai thanks for the report! Would you mind turning your proposed fix (however hacky) into a PR for further discussion?

newpavlov · 2022-01-06T16:26:26Z

Can't we simply redefine RoundStates and MsgSchedule as arrays of __m256i? It also would allow us to replace _mm_store_si128/_mm_load_si128 calls with simple indexing, thus also addressing your P.S. concern. IIRC it should compile down to aligned loads/stores without any issues.

We probably should also backport fix to v0.9.

cc @Rexagon

The 0.9.8 release was yanked due to a bug in the AVX512 implementation. See RustCrypto/hashes#344

#### Problem The binary built with `cargo install --locked spl-token-cli` has a segfault, likely due to RustCrypto/hashes#344. The normal build with `cargo install spl-token-cli` does work, however. It's not clear why this issue has only started now, but the spl-token-cli build is not working on the Solana 2.0.4 and 2.0.5 releases. #### Solution Bump sha2 to 0.9.9.

newpavlov mentioned this issue Jan 6, 2022

sha2: fix AVX backend #345

Merged

newpavlov closed this as completed in #345 Jan 6, 2022

mulkieran mentioned this issue Jan 10, 2022

Reminder regarding yanked sha2 crate stratis-storage/project#387

Closed

rillian added a commit to brave/brave-core that referenced this issue Jan 19, 2022

Update sku deps to sha2 0.9.9.

cf37392

The 0.9.8 release was yanked due to a bug in the AVX512 implementation. See RustCrypto/hashes#344

madadam mentioned this issue Jun 14, 2022

Fix stack overflow when using deadlock detection in Pool::acquire equalitie/ouisync#48

Closed

joncinque mentioned this issue Aug 12, 2024

deps: Bump sha2 to 0.9.9 solana-labs/solana-program-library#7138

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug in Sha512 AVX(2) implementation #344

Bug in Sha512 AVX(2) implementation #344

elichai commented Jan 6, 2022 •

edited

Loading

elichai commented Jan 6, 2022 •

edited

Loading

tarcieri commented Jan 6, 2022

newpavlov commented Jan 6, 2022

Bug in Sha512 AVX(2) implementation #344

Bug in Sha512 AVX(2) implementation #344

Comments

elichai commented Jan 6, 2022 • edited Loading

elichai commented Jan 6, 2022 • edited Loading

tarcieri commented Jan 6, 2022

newpavlov commented Jan 6, 2022

elichai commented Jan 6, 2022 •

edited

Loading

elichai commented Jan 6, 2022 •

edited

Loading