Optimize HashPrefixStore #28099
// Serialize the vectors and the main table.
auto flat_offsets = builder.CreateVector(offsets.data(), offsets.size());
auto all_suffixes = builder.CreateVector(
    reinterpret_cast<const uint8_t*>(all_suffixes_data.data()),
reported by reviewdog 🐶
[semgrep] Using reinterpret_cast against some data types may lead to undefined behaviour. In general, when needing to do these conversions, check how Chromium upstream does them. Most of the times a reinterpret_cast is wrong and there's no guarantee the compiler will generate the code that you thought it would.
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/reinterpret_cast.yaml
Cc @stoletheminerals @thypon @cdesouza-chromium
Flatbuffers is sad... can any of this be realistically spanified?
@cdesouza-chromium
Flatbuffers supports span, and I use it here.
The problem is that FB doesn't support char for arrays, only int8_t or uint8_t.
Because the surrounding code uses std::string_view, we have to convert span<uint8_t> to span/string_view<char> somewhere.
I've managed to remove reinterpret_cast. I also guarded the rest with a static_assert on the source type.
efadeed to 1b367cc
std::string hash = crypto::SHA256HashString(value);
hash.resize(prefix_size_);
static_assert(std::is_same_v<const uint8_t*, decltype(all_suffixes.data())>);
std::string_view data(reinterpret_cast<const char*>(all_suffixes.data()),
reported by reviewdog 🐶
[semgrep] Using reinterpret_cast against some data types may lead to undefined behaviour. In general, when needing to do these conversions, check how Chromium upstream does them. Most of the times a reinterpret_cast is wrong and there's no guarantee the compiler will generate the code that you thought it would.
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/reinterpret_cast.yaml
Cc @stoletheminerals @thypon @cdesouza-chromium
We don't need the reinterpret_cast here. It should be something like:
auto data = base::as_string_view(base::span(all_suffixes));
Example corrected above. But I suspect it could be even simpler as base::as_string_view(all_suffixes). You're gonna need to test.
There is no as_string_view() in base::span<const uint8_t>. It's available only for span`
const uint16_t first_bytes =
    256u * static_cast<uint8_t>(hash[0]) + static_cast<uint8_t>(hash[1]);
Are these ops safe? Is there any chance this could overflow?
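On the overflow question, a standalone sketch may help. Both operands are promoted to unsigned int before the arithmetic, so the largest possible value is 256 * 255 + 255 = 65535, which is exactly the maximum of uint16_t. The helper name FirstBytesIndex below is hypothetical and not part of the PR; it only mirrors the expression under review.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string_view>

// Hypothetical helper mirroring the expression under review.
inline uint16_t FirstBytesIndex(std::string_view hash) {
  assert(hash.size() >= 2);
  // static_cast<uint8_t> avoids sign extension where char is signed;
  // the 256u literal makes the arithmetic happen in unsigned int.
  return static_cast<uint16_t>(256u * static_cast<uint8_t>(hash[0]) +
                               static_cast<uint8_t>(hash[1]));
}

// The worst case fits exactly into uint16_t.
static_assert(256u * 255u + 255u == std::numeric_limits<uint16_t>::max());
```

The one precondition the expression itself does not check is that the hash holds at least two bytes, which the sketch makes explicit with an assert.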
for (size_t i = 0; i < prefix_count.value(); i++) {
  const auto prefix = prefixes_view.substr(i * prefix_size, prefix_size);
  const auto [index, suffix] = GetIndexAndSuffix(prefix);
  suffix_arrays[index].insert(suffix_arrays[index].end(), suffix.begin(),
We are doing a double lookup here: suffix_arrays[index] is evaluated twice.
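One way to address the repeated lookup is to bind the bucket to a reference once and reuse it. The sketch below assumes suffix_arrays is a std::vector<std::string>; the actual container type is not visible in this hunk, and the helper name AppendSuffix is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper; the container type of suffix_arrays is an assumption.
inline void AppendSuffix(std::vector<std::string>& suffix_arrays,
                         size_t index,
                         std::string_view suffix) {
  assert(index < suffix_arrays.size());
  auto& bucket = suffix_arrays[index];  // looked up once
  bucket.insert(bucket.end(), suffix.begin(), suffix.end());
}
```

If the container were a map rather than a vector, the same pattern would save a second tree or hash lookup instead of a second index operation.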
const std::string_view prefixes_view(prefixes);

for (size_t i = 0; i < prefix_count.value(); i++) {
  const auto prefix = prefixes_view.substr(i * prefix_size, prefix_size);
This is pretty tricky code. TBH, I'm leaning towards preferring the simpler current approach for now. What would you think about leaving as-is for a couple of releases, and then introducing the optimization later, if necessary?
Also, since we're changing the file format, we're going to need another pref migration to reset the fetch timer.
The PR:
The structure:
Each prefix is split into its first 2 bytes (used as an index) and the remaining suffix.
The first 2 bytes index into the 'offsets' array (size 256*256, one entry for
every possible 2-byte value).
Each offset points to the position in 'all_suffixes' where the suffixes
for that prefix index begin.
Suffixes are stored contiguously so that each group can be searched with binary search.
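The layout described above can be sketched in plain C++ as follows. This is a simplified illustration rather than the PR's code: it uses standard containers instead of FlatBuffers, assumes 4-byte offsets, and the names PrefixTable, IndexOf, BuildTable and Contains are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Simplified stand-in for the serialized table. Assumption: plain standard
// containers instead of the FlatBuffers types used in the PR.
struct PrefixTable {
  size_t prefix_size = 0;         // bytes per full prefix
  std::vector<uint32_t> offsets;  // 256 * 256 byte offsets into all_suffixes
  std::string all_suffixes;       // suffixes grouped by index, sorted per group
};

// Index formed by the first two bytes of a prefix (big-endian).
inline size_t IndexOf(std::string_view prefix) {
  assert(prefix.size() >= 2);
  return 256u * static_cast<uint8_t>(prefix[0]) +
         static_cast<uint8_t>(prefix[1]);
}

// Hypothetical builder: sorts the prefixes, then records where each group
// of suffixes starts.
inline PrefixTable BuildTable(std::vector<std::string> prefixes,
                              size_t prefix_size) {
  assert(prefix_size > 2);
  const size_t suffix_size = prefix_size - 2;
  // std::string compares bytes as unsigned char, so this order matches
  // IndexOf() first and the suffix second.
  std::sort(prefixes.begin(), prefixes.end());

  PrefixTable table;
  table.prefix_size = prefix_size;
  std::vector<uint32_t> counts(256 * 256, 0);
  for (const std::string& prefix : prefixes) {
    assert(prefix.size() == prefix_size);
    ++counts[IndexOf(prefix)];
    table.all_suffixes.append(prefix, 2, suffix_size);
  }
  table.offsets.resize(counts.size());
  uint32_t running = 0;
  for (size_t i = 0; i < counts.size(); ++i) {
    table.offsets[i] = running;
    running += counts[i] * static_cast<uint32_t>(suffix_size);
  }
  return table;
}

// Binary search within the group selected by the first two bytes.
inline bool Contains(const PrefixTable& table, std::string_view prefix) {
  if (table.prefix_size <= 2 || prefix.size() != table.prefix_size) {
    return false;
  }
  const size_t suffix_size = table.prefix_size - 2;
  const size_t index = IndexOf(prefix);
  const size_t begin = table.offsets[index];
  const size_t end = index + 1 < table.offsets.size()
                         ? table.offsets[index + 1]
                         : table.all_suffixes.size();
  const std::string_view all(table.all_suffixes);
  const std::string_view suffix = prefix.substr(2);
  const size_t count = (end - begin) / suffix_size;
  size_t lo = 0;
  size_t hi = count;
  while (lo < hi) {
    const size_t mid = lo + (hi - lo) / 2;
    if (all.substr(begin + mid * suffix_size, suffix_size) < suffix) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo < count &&
         all.substr(begin + lo * suffix_size, suffix_size) == suffix;
}
```

With 256*256 offsets, the end of a group is taken from the next offset, and the last group ends at the end of all_suffixes. Whether the PR stores an extra sentinel offset instead is not stated in the description.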
Memory usage before = 1.7M * 4 bytes ≈ 6.5 MB
Memory usage after = header + 256 KB (offsets) + 1.7M * 2 bytes (prefix_count * (prefix_size - 2)) ≈ 3.5 MB
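The arithmetic behind these estimates can be checked with a small sketch. It assumes roughly 1.7 million prefixes of 4 bytes each and 4-byte offsets (which is what makes the offsets array 256 KB), and it leaves the header out because its size is not given. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 256 * 256 offsets; assumption: each offset is a 4-byte integer.
constexpr uint64_t kOffsetsBytes = 256 * 256 * sizeof(uint32_t);

// Flat layout: every prefix is stored in full.
constexpr uint64_t BytesBefore(uint64_t prefix_count, uint64_t prefix_size) {
  return prefix_count * prefix_size;
}

// Indexed layout: offsets table plus suffixes only (header not counted).
constexpr uint64_t BytesAfter(uint64_t prefix_count, uint64_t prefix_size) {
  assert(prefix_size > 2);
  return kOffsetsBytes + prefix_count * (prefix_size - 2);
}

static_assert(kOffsetsBytes == 256 * 1024);                // 256 KB
static_assert(BytesBefore(1'700'000, 4) == 6'800'000);     // about 6.5 MB
static_assert(BytesAfter(1'700'000, 4) == 3'662'144);      // about 3.5 MB
```

The MB figures match the description when 1 MB is read as 1024 * 1024 bytes.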
When this version is installed, the old RewardsCreators.db will be dropped and the prefixes will be re-downloaded.
Submitter Checklist:
Test Plan: