-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FrozenDict
as an immutable wrapper around native dictionaries
#9366
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
It doesn't work as well as we'd like. For example, `Type` (e.g. `str`) is not seen as hashable, even though it is hashable.
jsirois
approved these changes
Mar 21, 2020
…attribute John raises good points that the eager validation is very desirable and it's a pre-mature optimization to make this validation be lazy.
jsirois
approved these changes
Mar 21, 2020
src/python/pants/util/frozendict.py
Outdated
"""Creates a `FrozenDict` from a mapping object or a sequence of tuples representing | ||
entries. | ||
|
||
These values must be immutable and hashable, which we proactively validate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you can perfectly well have a mutable object that's hashable - the mutable part just shouldn't (but could be :/) part of the hash. It might be best to just drop immutable here - you're not guaranteeing that.
[ci skip-jvm-tests] # No JVM changes made.
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] # No JVM changes made.
huonw
added a commit
that referenced
this pull request
Nov 30, 2023
…#20221) This adjusts `FrozenDict` to have the comparisons like `__eq__` and `__lt__` and hashing operate in a slightly less error-prone way, to fix #20219: - they're order-insensitive, For instance, `FrozenDict({1: 2, 3: 4}) == FrozenDict({3: 4, 1: 2})` now compares `True`, but previously was `False` - `__eq__` works against `dict`. For instance, `FrozenDict({1: 2}) == {1: 2}` now compares `True`, but previously was `False`. (This matches behaviour of stdlib `dict` subclasses, e.g. `__import__('collections').Counter([1, 1]) == {1: 2}` is `True`.) This requires two notable adjustments: - `__lt__` is significantly more expensive now, because it has to sort both LHS and RHS (discussed and potentially revised in #20243) - Computing the hash changes value and is arguably less "safe", because it now aggregates multiple independent hashes with xor/`^` rather than hashing one huge tuple and letting Python's tuple hashing magic handle it (https://github.com/python/cpython/blob/3faf8e586d36e73faba13d9b61663afed6a24cb4/Objects/tupleobject.c#L291-L344). I don't think this matters, as, I believe, this hash is only used for (effectively) being a dict key, and so collisions are at worst a denial-of-service, because they're always double-checked with `__eq__`; cryptographic hashing is handled via sha256 etc. in the Rust code. This potentially improves `@rule` memoisation, because previously some rule inputs might've been computed differently, but be semantically identical. Reminder of history, in case there's a critical reason why this behaviour should be preserved: - `FrozenDict` implemented in #9366, including tests that explicitly call out "Order matters" - Order-sensitive `__lt__` added in #12898, including tests that explicitly call out "Order matters" - `xfail`ed test about `__eq__` being order dependent added in #13389 In theory, a low-level data structure like this would be a good candidate for property testing with a tool like Hypothesis, with properties like "if `a == b`, then `hash(a) == hash(b)`". It seems like Pants doesn't currently have any tests using Hypothesis so I'm not inclined to add one here.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem
We have no ergonomic way to work with dictionaries with the engine, as
dict
is mutable and not hashable. We've resorted to using custom constructors in dataclasses that allow the user to pass aDict
, and then, in the constructor, converting it to aTuple[Tuple[K, V], ...]
, which is still clunky to use.pants/src/python/pants/engine/isolated_process.py
Lines 24 to 62 in eeb9d74
Solution
Add
FrozenDict
as a very lightweight wrapper around nativedict
, with the only differences that it removes all entry points for mutation (e.g.d.update()
) and it implements__hash__()
.Rejected alternative: proactive casting of unhashable elements
Toolchain has a
FrozenDict
implementation that proactively attempts to convert hashable elements to hashable, and has additional guarantees to ensure the dict truly is frozen. For example, it will convert{'a': [1]}
to{'a': (1,)}
.Instead, we take a much simpler approach to validate and give a nice error message when using unhashable elements, so that the rule author may fix their code. This is simpler, more performant, and more transparent.
Does not use
typing.Hashable
This is a way in MyPy to express that something must must be hashable. However, it does not work as well as expected, e.g. complaining about
typing.Type
, such asstr
, not being hashable, even thoughhash(str)
works.Result
In a followup, we will audit the current engine code to see where we can clean things up by using
FrozenDict
.