Allow StringTable to use the full 32-bit address space #137

michaelwoerister · 2020-10-01T13:18:50Z

This PR changes the encoding for string references within the StringTable to use 5 bytes instead of 4 bytes. This way we can address 4 GB of data instead of 1 GB. The code becomes quite a bit simpler because we can remove a lot of manual bit-fiddling operations.
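To make the size arithmetic concrete, here is a minimal sketch of a 5-byte reference layout. It assumes one tag byte followed by a little-endian `u32` address. The tag value `0xFE`, the function names, and the byte order are illustrative assumptions, not taken from the PR. The 30-bit figure for the old format is likewise an assumption (two of the 32 bits reserved for tagging), chosen because it matches the 1 GB limit mentioned above.

```rust
// Hypothetical names and tag value; only the 5-byte size comes from the PR.
const STRING_REF_TAG: u8 = 0xFE; // assumed marker byte
const STRING_REF_ENCODED_SIZE: usize = 5;

/// Encodes a 32-bit address as one tag byte plus four little-endian bytes.
fn encode_string_ref(addr: u32) -> [u8; STRING_REF_ENCODED_SIZE] {
    let mut out = [0u8; STRING_REF_ENCODED_SIZE];
    out[0] = STRING_REF_TAG;
    out[1..5].copy_from_slice(&addr.to_le_bytes());
    out
}

/// Decodes a reference; returns `None` if the tag byte does not match.
fn decode_string_ref(bytes: &[u8; STRING_REF_ENCODED_SIZE]) -> Option<u32> {
    if bytes[0] != STRING_REF_TAG {
        return None;
    }
    let mut addr = [0u8; 4];
    addr.copy_from_slice(&bytes[1..5]);
    Some(u32::from_le_bytes(addr))
}

/// Number of bytes addressable with `bits` address bits.
fn addressable_bytes(bits: u32) -> u64 {
    1u64 << bits
}
```

With the tag in its own byte, all 32 bits of the value are available for the address, so `addressable_bytes(32)` is four times `addressable_bytes(30)`, and decoding needs no masking or shifting.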

The only downside of the new approach is that string-reference-heavy data (e.g. from recording query keys) becomes about 5% larger. Between the larger address space support and the maintainability improvements, this seems like an acceptable tradeoff to me. (I also have a half-done implementation of the same functionality without the additional space requirements -- but that turned out to make for such messy de-/serialization code that I abandoned it.)

The size of profiling data that is collected without query keys should not be affected by this change.

r? @wesleywiser

wesleywiser · 2020-10-03T19:18:30Z

measureme/src/stringtable.rs

-
-                &mut bytes[0..4].copy_from_slice(&tagged.to_be_bytes());
-                &mut bytes[4..]
+                assert!(STRING_REF_ENCODED_SIZE == 5);


Is this the assert we actually want here? (It's trivially true and rustc actually seems to optimize it out before codegen even occurs but I'm not sure why we'd assert that in this method.)

Yeah, the assert! is supposed to be a self-testing explanation of the magic numbers below. I think the best thing to do is to (1) leave the assertion in, (2) add a comment that explains its purpose, and (3) apply the changes you suggest below.

For this kind of low-level byte shuffling stuff I like to add these kinds of tripwires that decrease the possibility of overlooking necessary adaptations when the encoding changes.
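As a rough illustration of the tripwire idea, the sketch below keeps the literal indices but guards them with an assertion on the size constant. The tag value, the function name, and the little-endian byte order are assumptions made for this example. Only `STRING_REF_ENCODED_SIZE` and the `assert!` line appear in the diff.

```rust
const STRING_REF_TAG: u8 = 0xFE; // assumed marker byte, not from the PR
const STRING_REF_ENCODED_SIZE: usize = 5;

/// Writes a string reference at the start of `bytes` and returns the
/// unwritten remainder of the buffer. (Hypothetical helper for illustration.)
fn write_string_ref(bytes: &mut [u8], addr: u32) -> &mut [u8] {
    // Tripwire: the literal indices below (0, 1..5, 5..) assume a 5-byte
    // encoding. If the constant ever changes, this fails and points the
    // reader at the code that needs to be adapted.
    assert!(STRING_REF_ENCODED_SIZE == 5);

    bytes[0] = STRING_REF_TAG;
    bytes[1..5].copy_from_slice(&addr.to_le_bytes());
    &mut bytes[5..]
}
```

The assertion costs nothing at runtime once the compiler folds it away. It exists for whoever edits the constant later.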

I added a comment but ended up not replacing the literal 5 with STRING_REF_ENCODED_SIZE. I don't know about you but writing STRING_REF_ENCODED_SIZE did not actually seem to make the code more readable.

What might make it more readable would be to have a bunch of constants like STRING_REF_TAG_OFFSET, STRING_REF_VALUE_START_OFFSET, and STRING_REF_VALUE_END_OFFSET, so that the encoding is kind of explained in one place. I don't think that's quite worth the trouble though.

That seems reasonable to me 👍
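For comparison, the named-offset alternative floated in this thread (and not adopted) might look roughly like the sketch below. The constant names are the ones suggested in the comment. Their values, the tag byte, and the function name are assumptions.

```rust
// Assumed layout: one tag byte at offset 0, value bytes at offsets 1..5.
const STRING_REF_TAG: u8 = 0xFE; // assumed marker byte
const STRING_REF_TAG_OFFSET: usize = 0;
const STRING_REF_VALUE_START_OFFSET: usize = 1;
const STRING_REF_VALUE_END_OFFSET: usize = 5;
const STRING_REF_ENCODED_SIZE: usize = STRING_REF_VALUE_END_OFFSET;

/// Writes a string reference using only named offsets and returns the
/// unwritten remainder of the buffer. (Hypothetical helper for illustration.)
fn write_string_ref_named(bytes: &mut [u8], addr: u32) -> &mut [u8] {
    bytes[STRING_REF_TAG_OFFSET] = STRING_REF_TAG;
    bytes[STRING_REF_VALUE_START_OFFSET..STRING_REF_VALUE_END_OFFSET]
        .copy_from_slice(&addr.to_le_bytes());
    &mut bytes[STRING_REF_ENCODED_SIZE..]
}
```

This puts the layout in one place, at the price of four extra constants and longer lines, which is the tradeoff the comment above weighs.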

michaelwoerister added 3 commits October 1, 2020 15:00

Use 5-byte encoding for string-refs so we can use the full 32 bit address space.

ea03582

Simplify StringTable decoding after switching to 5-byte string ref encoding.

ce5c7e7

Bump measureme file format version after switching to 5-byte encoding for string refs.

72cb8fa

wesleywiser self-assigned this Oct 1, 2020

wesleywiser reviewed Oct 3, 2020

Address PR feedback for 5-byte encoding of string refs.

f80765f

wesleywiser approved these changes Oct 5, 2020

wesleywiser merged commit 6792001 into rust-lang:master Oct 5, 2020
