Allow StringTable to use the full 32-bit address space #137
```rust
&mut bytes[0..4].copy_from_slice(&tagged.to_be_bytes());
&mut bytes[4..]
assert!(STRING_REF_ENCODED_SIZE == 5);
```
Is this the assert we actually want here? (It's trivially true, and rustc even seems to optimize it out before codegen occurs, but I'm not sure why we'd assert that in this method.)
Yeah, the `assert!` is supposed to be a self-testing explanation of the magic numbers below. I think the best thing to do is to (1) leave the assertion in, (2) add a comment that explains its purpose, and (3) apply the changes you suggest below.
For this kind of low-level byte shuffling stuff I like to add these kinds of tripwires that decrease the possibility of overlooking necessary adaptations when the encoding changes.
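A minimal sketch of the tripwire idea, assuming a string reference is encoded as one tag byte followed by a 4-byte little-endian `u32`. Only `STRING_REF_ENCODED_SIZE == 5` comes from this PR; the tag value `0xFE`, the byte order, and the function name `encode_string_ref` are hypothetical and not taken from the actual `StringTable` code.

```rust
/// Hypothetical tag byte marking a string reference (assumed value).
const STRING_REF_TAG: u8 = 0xFE;
/// Number of bytes a serialized string reference occupies.
const STRING_REF_ENCODED_SIZE: usize = 5;

/// Writes a string reference into the front of `bytes` and returns the
/// remaining, unwritten part of the buffer.
fn encode_string_ref(bytes: &mut [u8], string_id: u32) -> &mut [u8] {
    // Tripwire: the literal offsets below (0, 1..5, 5..) assume a 5-byte
    // encoding of one tag byte followed by a 4-byte u32. If the encoded
    // size ever changes, this assertion fails and points at the code that
    // needs to be adapted.
    assert!(STRING_REF_ENCODED_SIZE == 5);

    bytes[0] = STRING_REF_TAG;
    bytes[1..5].copy_from_slice(&string_id.to_le_bytes());
    &mut bytes[5..]
}

fn main() {
    let mut buf = [0u8; 8];
    // Encode one reference; three bytes of the buffer stay unwritten.
    let remaining = encode_string_ref(&mut buf, 0x0102_0304).len();
    assert_eq!(remaining, 3);
    // Tag byte first, then the u32 in little-endian order.
    assert_eq!(&buf[..5], &[0xFE_u8, 0x04, 0x03, 0x02, 0x01][..]);
}
```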
I added a comment but ended up not replacing the literal `5` with `STRING_REF_ENCODED_SIZE`. I don't know about you, but writing `STRING_REF_ENCODED_SIZE` did not actually seem to make the code more readable.
What might make it more readable would be a set of constants like `STRING_REF_TAG_OFFSET`, `STRING_REF_VALUE_START_OFFSET`, and `STRING_REF_VALUE_END_OFFSET`, so that the encoding is explained in one place. I don't think that's quite worth the trouble, though.
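For illustration, roughly what the named-offset variant could look like. The three offset constant names come from the comment above; the tag value `0xFE`, the little-endian byte order, and the `encode_string_ref`/`decode_string_ref` helpers are assumptions made for this sketch, not code from the PR.

```rust
// Hypothetical layout: one tag byte followed by a little-endian u32 value.
const STRING_REF_TAG: u8 = 0xFE; // assumed tag value
const STRING_REF_TAG_OFFSET: usize = 0;
const STRING_REF_VALUE_START_OFFSET: usize = STRING_REF_TAG_OFFSET + 1;
const STRING_REF_VALUE_END_OFFSET: usize =
    STRING_REF_VALUE_START_OFFSET + std::mem::size_of::<u32>();
// The total size is derived from the offsets instead of being a literal.
const STRING_REF_ENCODED_SIZE: usize = STRING_REF_VALUE_END_OFFSET;

/// Writes a string reference and returns the unwritten rest of `bytes`.
fn encode_string_ref(bytes: &mut [u8], value: u32) -> &mut [u8] {
    bytes[STRING_REF_TAG_OFFSET] = STRING_REF_TAG;
    bytes[STRING_REF_VALUE_START_OFFSET..STRING_REF_VALUE_END_OFFSET]
        .copy_from_slice(&value.to_le_bytes());
    &mut bytes[STRING_REF_ENCODED_SIZE..]
}

/// Returns the referenced value if `bytes` starts with an encoded reference.
fn decode_string_ref(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < STRING_REF_ENCODED_SIZE
        || bytes[STRING_REF_TAG_OFFSET] != STRING_REF_TAG
    {
        return None;
    }
    let mut value = [0u8; 4];
    value.copy_from_slice(&bytes[STRING_REF_VALUE_START_OFFSET..STRING_REF_VALUE_END_OFFSET]);
    Some(u32::from_le_bytes(value))
}

fn main() {
    assert_eq!(STRING_REF_ENCODED_SIZE, 5);
    let mut buf = [0u8; STRING_REF_ENCODED_SIZE];
    encode_string_ref(&mut buf, u32::MAX);
    assert_eq!(decode_string_ref(&buf), Some(u32::MAX));
}
```

Deriving `STRING_REF_ENCODED_SIZE` from the offsets removes the magic numbers, at the cost of several extra constants to read, which is the tradeoff weighed above.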
That seems reasonable to me 👍
This PR changes the encoding for string references within the `StringTable` to use 5 bytes instead of 4. This way we can address 4 GB of data instead of 1 GB. The code also becomes quite a bit simpler because we can remove a lot of manual bit-fiddling operations. The only downside of the new approach is that string-reference-heavy data (e.g. from recording query keys) becomes about 5% larger. Between the larger address space and the maintainability improvements, this seems like an acceptable tradeoff to me. (I also have a half-done implementation of the same functionality without the additional space requirements, but it made for such messy de-/serialization code that I abandoned it.)
The size of profiling data that is collected without query keys should not be affected by this change.
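The address-space arithmetic can be sketched as follows. The old layout here is an inference, not taken from the code: a 1 GB limit is what you get if 2 of the 32 bits in a 4-byte reference are reserved for a tag, leaving 30 bits (2^30 bytes). The new layout assumes one full tag byte plus a 4-byte `u32` (2^32 bytes). Tag values, byte orders, and all function names are hypothetical.

```rust
// Old (assumed): a 4-byte big-endian word whose top 2 bits hold a tag,
// leaving 30 bits for the address.
const OLD_TAG_BITS: u32 = 2;
const OLD_ADDR_MASK: u32 = (1 << (32 - OLD_TAG_BITS)) - 1;
const OLD_REF_TAG: u32 = 0b10; // hypothetical tag value

fn encode_old(addr: u32) -> Option<[u8; 4]> {
    if addr > OLD_ADDR_MASK {
        return None; // address does not fit into 30 bits
    }
    let tagged = (OLD_REF_TAG << (32 - OLD_TAG_BITS)) | addr;
    Some(tagged.to_be_bytes())
}

/// Returns (tag, address), undoing the bit packing.
fn decode_old(bytes: [u8; 4]) -> (u32, u32) {
    let word = u32::from_be_bytes(bytes);
    (word >> (32 - OLD_TAG_BITS), word & OLD_ADDR_MASK)
}

// New (assumed): a whole tag byte followed by a full u32 address.
const NEW_REF_TAG: u8 = 0xFE; // hypothetical tag value

fn encode_new(addr: u32) -> [u8; 5] {
    let mut bytes = [NEW_REF_TAG, 0, 0, 0, 0];
    bytes[1..].copy_from_slice(&addr.to_le_bytes());
    bytes
}

/// Returns (tag, address); no masking or shifting needed.
fn decode_new(bytes: [u8; 5]) -> (u8, u32) {
    (bytes[0], u32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]))
}

fn main() {
    // Addressable bytes: 2^30 (1 GiB) vs. 2^32 (4 GiB).
    assert_eq!(u64::from(OLD_ADDR_MASK) + 1, 1u64 << 30);
    assert_eq!(u64::from(u32::MAX) + 1, 1u64 << 32);

    // An address just past 1 GiB fits only into the new encoding.
    let addr = 1u32 << 30;
    assert!(encode_old(addr).is_none());
    assert_eq!(decode_new(encode_new(addr)), (NEW_REF_TAG, addr));

    // Small addresses round-trip through both encodings.
    assert_eq!(decode_old(encode_old(42).unwrap()), (OLD_REF_TAG, 42));
}
```

The extra byte per reference is the size cost described above; in exchange, decoding is a plain byte read plus `u32::from_le_bytes` instead of shifts and masks.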
r? @wesleywiser