Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VecDeque's Hash::hash implementation is incorrect #80303

Closed
SvetlinZarev opened this issue Dec 22, 2020 · 11 comments · Fixed by #81170
Closed

VecDeque's Hash::hash implementation is incorrect #80303

SvetlinZarev opened this issue Dec 22, 2020 · 11 comments · Fixed by #81170
Assignees
Labels
A-collections Area: `std::collections` C-bug Category: This is a bug. P-medium Medium priority T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@SvetlinZarev
Copy link

SvetlinZarev commented Dec 22, 2020

VecDeque's hashcode changes after rotation. In the following example, the VecDeque should be in its original ordering, yet it produces a different hashcode.

I've also opened rust-lang/hashbrown#218 but I was advised to open a new issue here.

I tried this code:

use ahash::AHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};

fn hash(x: &impl Hash) -> u64 {
    let mut hasher = AHasher::default();
    Hash::hash(x, &mut hasher);
    hasher.finish()
}

fn main() {
    let deque: VecDeque<i32> = (0..10).collect();

    let mut deque2 = deque.clone();
    deque2.rotate_left(5);
    deque2.rotate_left(5);

    assert_eq!(deque, deque2);
    assert_eq!(hash(&deque), hash(&deque2)); // fails!
}

I expected to see this happen:
The hashcode should not change and the example should terminate successfully

Instead, this happened:
The hashcode is different and the application panics.

Meta

Yes, the bug exists in beta and nightly.

rustc --version --verbose:

rustc 1.48.0 (7eac88abb 2020-11-16)
binary: rustc
commit-hash: 7eac88abb2e57e752f3302f02be5f3ce3d7adfb4
commit-date: 2020-11-16
host: x86_64-unknown-linux-gnu
release: 1.48.0
LLVM version: 11.0

The issue is caused by an ambiguity in the Hash::hash_slice() contract - it's not clear whether the hashcode should be sensitive to the slice's length. The default hasher is not sensitive, while ahash is.

@SvetlinZarev SvetlinZarev added the C-bug Category: This is a bug. label Dec 22, 2020
@camelid camelid added A-collections Area: `std::collections` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Dec 22, 2020
@camelid

This comment has been minimized.

@camelid camelid added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Dec 22, 2020
@camelid
Copy link
Member

camelid commented Dec 22, 2020

Ugh, not another VecDeque bug 😄

@LeSeulArtichaut LeSeulArtichaut added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Dec 22, 2020
@cuviper
Copy link
Member

cuviper commented Dec 22, 2020

I also filed #80304 with code references, but I closed in favor of this issue. My conclusions:

  • If both Hash::hash_slice and Hasher::write may consider length, then Hash for VecDeque is incorrect.
  • If Hash::hash_slice should hash without length, but Hasher::write is allowed to use length, then the integer optimization (Hash::hash_slice calling Hasher::write as &[u8]) is incorrect.
  • If both Hash::hash_slice and Hasher::write are supposed to be independent of length, hashing identically whether it's one big slice or any partitioned slices of the same data, then AHasher::write is incorrect.

@LeSeulArtichaut
Copy link
Contributor

So ultimately this falls into the T-libs realm?

@Stupremee
Copy link
Member

@rustbot claim

I will work on this

@cuviper
Copy link
Member

cuviper commented Dec 22, 2020

@Stupremee I do think this needs a @rust-lang/libs decision first on the specified behavior.

@cuviper
Copy link
Member

cuviper commented Dec 22, 2020

  • If Hash::hash_slice should hash without length, but Hasher::write is allowed to use length, then the integer optimization (Hash::hash_slice calling Hasher::write as &[u8]) is incorrect.

This may also be suspect if a hasher is endian-sensitive -- e.g. if it would normalize write_u64 to big/little-endian, but the cast to &[u8] makes it always hash native-endian. But I suppose this is not a problem if you're always consistent in whether you use hash_slice or not.

@LeSeulArtichaut LeSeulArtichaut added I-needs-decision Issue: In need of a decision. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 22, 2020
@Amanieu
Copy link
Member

Amanieu commented Jan 16, 2021

We discussed this in the libs meeting 2 weeks ago and came to the conclusion that VecDeque is in the wrong here: Hasher only guarantees hash equivalence for the exact same set of calls to Hasher methods. This allows hasher to perform various optimizations when hashing slices, such as using unaligned load instructions.

@Amanieu Amanieu removed I-nominated I-needs-decision Issue: In need of a decision. labels Jan 16, 2021
@Amanieu
Copy link
Member

Amanieu commented Jan 16, 2021

Incidentally hash_slice could use some additional documentation to clarify that it is not necessarily equivalent to hashing each element individually in sequence.

@LeSeulArtichaut LeSeulArtichaut added I-prioritize Issue: Indicates that prioritization has been requested for this issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 16, 2021
@sfackler
Copy link
Member

We could potentially add a new method to Hasher that does behave that way, but I don't know if it'd really be useful in many contexts outside of something like VecDeque.

@apiraino apiraino added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jan 20, 2021
@apiraino
Copy link
Contributor

Assigning P-medium as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collections` C-bug Category: This is a bug. P-medium Medium priority T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants