Allow StringTable to use the full 32-bit address space #137

Merged on Oct 5, 2020 (4 commits)

michaelwoerister
Member

This PR changes the encoding for string references within the StringTable to use 5 bytes instead of 4 bytes. This way we can address 4 GB of data instead of 1 GB. The code becomes quite a bit simpler because we can remove a lot of manual bit-fiddling operations.
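As a rough illustration of the 5-byte scheme described above, here is a minimal sketch. The tag value `0xFF`, the little-endian byte order, and the function names are assumptions made for this example and are not taken from the PR. The point it shows is that a dedicated tag byte leaves all 32 bits of the following word free for the address.

```rust
// Assumed values for illustration only; the real constants may differ.
const STRING_REF_TAG: u8 = 0xFF;
const STRING_REF_ENCODED_SIZE: usize = 5;

/// Writes a tagged string reference into the first 5 bytes of `bytes`
/// and returns the remainder of the buffer.
/// Panics if `bytes` is shorter than 5 bytes.
fn encode_string_ref(bytes: &mut [u8], addr: u32) -> &mut [u8] {
    bytes[0] = STRING_REF_TAG;
    // All 32 bits of `addr` are stored; no bits are reserved for tagging.
    bytes[1..5].copy_from_slice(&addr.to_le_bytes());
    &mut bytes[5..]
}

/// Reads a string reference back, or returns `None` if the buffer is
/// too short or does not start with the tag byte.
fn decode_string_ref(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < STRING_REF_ENCODED_SIZE || bytes[0] != STRING_REF_TAG {
        return None;
    }
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&bytes[1..5]);
    Some(u32::from_le_bytes(raw))
}
```

With this layout, `u32::MAX` round-trips unchanged, which is what makes the full 32-bit range addressable.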

The only downside of the new approach is that string-reference-heavy data (e.g. from recording query keys) becomes about 5% larger. Between the larger address space support and the maintainability improvements, this seems like an acceptable tradeoff to me. (I also have a half-done implementation of the same functionality without the additional space requirements, but that turned out to make for such messy de-/serialization code that I abandoned it.)

The size of profiling data that is collected without query keys should not be affected by this change.

r? @wesleywiser

wesleywiser self-assigned this on Oct 1, 2020

&mut bytes[0..4].copy_from_slice(&tagged.to_be_bytes());
&mut bytes[4..]
assert!(STRING_REF_ENCODED_SIZE == 5);
Member

Is this the assert we actually want here? (It's trivially true, and rustc actually seems to optimize it out before codegen even occurs, but I'm not sure why we'd assert that in this method.)

Member Author

Yeah, the assert! is supposed to be a self-testing explanation of the magic numbers below. I think the best thing to do is to (1) leave the assertion in, (2) add a comment that explains its purpose, and (3) apply the changes you suggest below.

For this kind of low-level byte-shuffling code, I like to add tripwires like this that reduce the chance of overlooking necessary adaptations when the encoding changes.
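The tripwire idea can be sketched roughly as follows. The function name `write_ref`, the tag value `0xFF`, and the little-endian byte order are assumptions for this example. The compile-time `const` assertion is a variant that is not part of the PR; it is included only to show that the same check can also fail the build rather than run at runtime.

```rust
const STRING_REF_ENCODED_SIZE: usize = 5;

// Compile-time variant (not from the PR): the build fails if the
// constant changes without this code being revisited.
const _: () = assert!(STRING_REF_ENCODED_SIZE == 5);

/// Hypothetical writer that hard-codes the 5-byte layout.
/// Returns the number of bytes written.
fn write_ref(bytes: &mut [u8], id: u32) -> usize {
    // Tripwire: the literal indices 0, 1, and 5 below assume a 5-byte
    // encoding (1 tag byte + 4 value bytes). If the encoded size ever
    // changes, this assertion points at the code that must be adapted.
    assert!(STRING_REF_ENCODED_SIZE == 5);

    bytes[0] = 0xFF; // assumed tag value
    bytes[1..5].copy_from_slice(&id.to_le_bytes());
    STRING_REF_ENCODED_SIZE
}
```

Because the condition only involves a constant, the compiler can remove the runtime check entirely, so the assertion documents the assumption at no runtime cost.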

Member Author

I added a comment but ended up not replacing the literal 5 with STRING_REF_ENCODED_SIZE. I don't know about you, but writing STRING_REF_ENCODED_SIZE did not actually seem to make the code more readable.

What might make it more readable would be to have a bunch of constants like STRING_REF_TAG_OFFSET, STRING_REF_VALUE_START_OFFSET, and STRING_REF_VALUE_END_OFFSET, so that the encoding is kind of explained in one place. I don't think that's quite worth the trouble though.
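For comparison, the named-constant alternative mentioned above might look something like this. The constant names come from the comment, while their values, the tag value `0xFF`, the little-endian byte order, and the function `encode_ref` are assumptions for this sketch. It is not code from the PR, which kept the literals.

```rust
// Assumed layout: one tag byte followed by a 4-byte value.
const STRING_REF_TAG: u8 = 0xFF; // assumed tag value
const STRING_REF_TAG_OFFSET: usize = 0;
const STRING_REF_VALUE_START_OFFSET: usize = 1;
const STRING_REF_VALUE_END_OFFSET: usize = 5; // exclusive end
const STRING_REF_ENCODED_SIZE: usize = STRING_REF_VALUE_END_OFFSET;

/// Hypothetical encoder that describes the layout through constants
/// instead of literal indices.
fn encode_ref(bytes: &mut [u8; STRING_REF_ENCODED_SIZE], id: u32) {
    bytes[STRING_REF_TAG_OFFSET] = STRING_REF_TAG;
    bytes[STRING_REF_VALUE_START_OFFSET..STRING_REF_VALUE_END_OFFSET]
        .copy_from_slice(&id.to_le_bytes());
}
```

The layout is now explained in one place, at the cost of four extra constants and longer indexing expressions, which is the tradeoff the comment weighs.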

Member

That seems reasonable to me 👍

wesleywiser merged commit 6792001 into rust-lang:master on Oct 5, 2020