Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FrozenDict as an immutable wrapper around native dictionaries #9366

Merged
merged 6 commits into from
Mar 24, 2020

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Mar 21, 2020

Problem

We have no ergonomic way to work with dictionaries with the engine, as dict is mutable and not hashable. We've resorted to using custom constructors in dataclasses that allow the user to pass a Dict, and then, in the constructor, converting it to a Tuple[Tuple[K, V], ...], which is still clunky to use.

@frozen_after_init
@dataclass(unsafe_hash=True)
class ExecuteProcessRequest:
"""Request for execution with args and snapshots to extract."""
# TODO: add a method to hack together a `process_executor` invocation command line which
# reproduces this process execution request to make debugging remote executions effortless!
argv: Tuple[str, ...]
input_files: Digest
description: str
working_directory: Optional[str]
env: Tuple[str, ...]
output_files: Tuple[str, ...]
output_directories: Tuple[str, ...]
timeout_seconds: Union[int, float]
unsafe_local_only_files_because_we_favor_speed_over_correctness_for_this_rule: Digest
jdk_home: Optional[str]
is_nailgunnable: bool
def __init__(
self,
argv: Tuple[str, ...],
*,
input_files: Digest,
description: str,
working_directory: Optional[str] = None,
env: Optional[Dict[str, str]] = None,
output_files: Optional[Tuple[str, ...]] = None,
output_directories: Optional[Tuple[str, ...]] = None,
timeout_seconds: Union[int, float] = _default_timeout_seconds,
unsafe_local_only_files_because_we_favor_speed_over_correctness_for_this_rule: Digest = EMPTY_DIRECTORY_DIGEST,
jdk_home: Optional[str] = None,
is_nailgunnable: bool = False,
) -> None:
self.argv = argv
self.input_files = input_files
self.description = description
self.working_directory = working_directory
self.env = tuple(itertools.chain.from_iterable((env or {}).items()))

Solution

Add FrozenDict as a very lightweight wrapper around native dict, with the only differences that it removes all entry points for mutation (e.g. d.update()) and it implements __hash__().

Rejected alternative: proactive casting of unhashable elements

Toolchain has a FrozenDict implementation that proactively attempts to convert hashable elements to hashable, and has additional guarantees to ensure the dict truly is frozen. For example, it will convert {'a': [1]} to {'a': (1,)}.

Instead, we take a much simpler approach to validate and give a nice error message when using unhashable elements, so that the rule author may fix their code. This is simpler, more performant, and more transparent.

Does not use typing.Hashable

This is a way in MyPy to express that something must must be hashable. However, it does not work as well as expected, e.g. complaining about typing.Type, such as str, not being hashable, even though hash(str) works.

Result

In a followup, we will audit the current engine code to see where we can clean things up by using FrozenDict.

It doesn't work as well as we'd like. For example, `Type` (e.g. `str`) is not seen as hashable, even though it is hashable.
…attribute

John raises good points that the eager validation is very desirable and it's a pre-mature optimization to make this validation be lazy.
"""Creates a `FrozenDict` from a mapping object or a sequence of tuples representing
entries.

These values must be immutable and hashable, which we proactively validate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you can perfectly well have a mutable object that's hashable - the mutable part just shouldn't (but could be :/) part of the hash. It might be best to just drop immutable here - you're not guaranteeing that.

# Delete this line to force CI to run the JVM tests.
[ci skip-jvm-tests]  # No JVM changes made.
@Eric-Arellano Eric-Arellano merged commit 1ab05a1 into pantsbuild:master Mar 24, 2020
@Eric-Arellano Eric-Arellano deleted the frozendict branch March 24, 2020 06:42
huonw added a commit that referenced this pull request Nov 30, 2023
…#20221)

This adjusts `FrozenDict` to have the comparisons like `__eq__` and
`__lt__` and hashing operate in a slightly less error-prone way, to fix
#20219:

- they're order-insensitive, For instance, `FrozenDict({1: 2, 3: 4}) ==
FrozenDict({3: 4, 1: 2})` now compares `True`, but previously was
`False`
- `__eq__` works against `dict`. For instance, `FrozenDict({1: 2}) ==
{1: 2}` now compares `True`, but previously was `False`. (This matches
behaviour of stdlib `dict` subclasses, e.g.
`__import__('collections').Counter([1, 1]) == {1: 2}` is `True`.)

This requires two notable adjustments:

- `__lt__` is significantly more expensive now, because it has to sort
both LHS and RHS (discussed and potentially revised in #20243)
- Computing the hash changes value and is arguably less "safe", because
it now aggregates multiple independent hashes with xor/`^` rather than
hashing one huge tuple and letting Python's tuple hashing magic handle
it
(https://github.com/python/cpython/blob/3faf8e586d36e73faba13d9b61663afed6a24cb4/Objects/tupleobject.c#L291-L344).
I don't think this matters, as, I believe, this hash is only used for
(effectively) being a dict key, and so collisions are at worst a
denial-of-service, because they're always double-checked with `__eq__`;
cryptographic hashing is handled via sha256 etc. in the Rust code.

This potentially improves `@rule` memoisation, because previously some
rule inputs might've been computed differently, but be semantically
identical.

Reminder of history, in case there's a critical reason why this
behaviour should be preserved:

- `FrozenDict` implemented in #9366, including tests that explicitly
call out "Order matters"
- Order-sensitive `__lt__` added in #12898, including tests that
explicitly call out "Order matters"
- `xfail`ed test about `__eq__` being order dependent added in #13389

In theory, a low-level data structure like this would be a good
candidate for property testing with a tool like Hypothesis, with
properties like "if `a == b`, then `hash(a) == hash(b)`". It seems like
Pants doesn't currently have any tests using Hypothesis so I'm not
inclined to add one here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants