Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code structure follow up #4 #439

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Code structure follow up #4 #439

wants to merge 7 commits into from

Conversation

boocmp
Copy link
Collaborator

@boocmp boocmp commented Mar 3, 2025

  1. The implementation of NetworkFilterList has been moved to network_filter_list.rs.
  2. The bitflags dependency version has been bumped to 2.9.0.
  3. The flatbuffers dependency has been added.
  4. FlatBuffers schema of the network filter list and the corresponding generated file have been added.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust Benchmark

Benchmark suite Current: d73f091 Previous: af0c65a Ratio
rule-match-browserlike/brave-list 1699804376 ns/iter (± 9731234) 1657920369 ns/iter (± 13523152) 1.03
rule-match-first-request/brave-list 1001595 ns/iter (± 7687) 970929 ns/iter (± 12529) 1.03
blocker_new/brave-list 207292155 ns/iter (± 4960395) 227719857 ns/iter (± 8671111) 0.91
memory-usage/brave-list-initial 41408849 ns/iter (± 3) 41408849 ns/iter (± 3) 1
memory-usage/brave-list-after-1000-requests 44004875 ns/iter (± 3) 44004875 ns/iter (± 3) 1

This comment was automatically generated by workflow using github-action-benchmark.

@@ -28,7 +28,7 @@ url = "2.5"
percent-encoding = "2.1"
once_cell = "1.8"
regex = "1.5"
bitflags = "1.3"
bitflags = { version = "2.9.0", features = ["serde"] }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonok-edm I’ve bumped the bitflags version because flatbuffers depends on it, and I want to avoid depending on two different versions. However, this change breaks backward compatibility. There is a crate called bitflags-serde-legacy that provides compatibility for older versions. Should I revert this change and stick with 1.3? What do you think?

Copy link
Collaborator

@atuchin-m atuchin-m Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, it makes sense to use the recent 2.9.0 because we're going to break backward compatibility anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a crate called bitflags-serde-legacy that provides compatibility for older versions. Should I revert this change and stick with 1.3? What do you think?

I recall looking into this previously - that crate wasn't even working correctly in my testing, but I wasn't able to find the reason for it.

IMHO, it makes sense to use the recent 2.9.0 because we're going to break backward compatibility anyway.

Technically there's backward compatibility w/ the API, and then there's backward compatibility with the serialized format. So far we've been careful to keep the serialized format at least continuously usable across minor adblock-rust versions.

We no longer strictly require either for the sake of Brave Browser. I'm not sure how other downstream dependents of adblock-rust treat the serialized formats but I'm not convinced it's worth it to maintain that compatibility going forwards. It only makes sense in a scenario where the underlying lists are not being distributed; other projects likely also have access to the list text.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, if we're doing this, we may as well also upgrade seahash which was held back for a similar reason (doesn't have to be in the same PR)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then I'll update bitflags and seahash crates and fix the corresponding tests.

table NetworkFilter {
mask: uint32; // NetworkFilterMask (network.rs)

// These arrays contain sorted(asc) indecies in the |unique_domains_hashes|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indecies => indices
itself => themselves

Maybe something like:
These arrays contain the sorted (ascending) indices of elements in the |unique_domains_hashes| array, rather than the hashes themselves.

@boocmp boocmp force-pushed the code_structure_5 branch from 7d17397 to 5633dd8 Compare March 3, 2025 12:14
self._tab
.get::<u32>(NetworkFilter::VT_MASK, Some(0))
.unwrap()
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

NetworkFilter::VT_OPT_DOMAINS,
None,
)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

NetworkFilter::VT_OPT_NOT_DOMAINS,
None,
)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

self._tab.get::<flatbuffers::ForwardsUOffset<
flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(NetworkFilter::VT_PATTERNS, None)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

NetworkFilter::VT_MODIFIER_OPTION,
None,
)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

unsafe {
self._tab
.get::<flatbuffers::ForwardsUOffset<&str>>(NetworkFilter::VT_HOSTNAME, None)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

unsafe {
self._tab
.get::<flatbuffers::ForwardsUOffset<&str>>(NetworkFilter::VT_TAG, None)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<NetworkFilter>>,
>>(NetworkFilterList::VT_NETWORK_FILTERS, None)
.unwrap()
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

None,
)
.unwrap()
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

src/blocker.rs Outdated
use crate::filters::network::{NetworkFilter, NetworkFilterMaskHelper, NetworkMatchable};
use crate::optimizer;
use crate::filters::network::{NetworkFilter, NetworkFilterMaskHelper};
pub(crate) use crate::network_filter_list::NetworkFilterList;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's drop the pub(crate) here, and change other imports to refer to it directly from crate::network_filter_list

src/lib.rs Outdated
@@ -24,6 +24,7 @@ mod data_format;
mod engine;
pub mod filters;
pub mod lists;
pub mod network_filter_list;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove pub here; nothing inside the module is public

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover? Do we use it somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iOS: I’ve just made the live tests an optional step in the GitHub workflow. At some point, we’ll need to update the CI (or the place where the .dat file for iOS is generated, I’m not sure) to generate the new version. This might also require an update to the browser code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonok-edm Do you know how and where the latest.dat file for iOS is generated? Is it part of the iOS browser, or is it a downloadable component?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover? Do we use it somewhere?

Right, we can remove rs-ABPFilterParserData.dat and related tests safely if we are no longer guaranteeing that older serialized versions can be deserialized. We only need to continue to test that the round trip from list text -> serialized -> Engine still works as expected.

Do you know how and where the latest.dat file for iOS is generated?

iOS switched to using plaintext files quite some time ago so that should be safe to remove as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve just made the live tests an optional step in the GitHub workflow.

IIRC some of the live tests use the list text, we should keep those but anything using .dat files can be removed.

@boocmp boocmp force-pushed the code_structure_5 branch from d8f525f to 0c9f488 Compare March 14, 2025 08:47
@boocmp boocmp force-pushed the code_structure_5 branch from 0c9f488 to cf5b754 Compare March 14, 2025 08:51
Copy link

[puLL-Merge] - brave/adblock-rust@439

Description

This PR makes several significant changes to the adblock-rust codebase:

  1. Removes Brave-specific tests and deserialization data
  2. Moves NetworkFilterList into its own module
  3. Upgrades dependencies (notably bitflags and seahash)
  4. Adds FlatBuffers support for potential future storage format

The main motivation appears to be code organization and modernization while removing company-specific test cases.

Changes

Changes

  1. .github/workflows/ci.yml:
  • Removes Brave-specific test configuration and execution
  1. Cargo.toml and Cargo.lock:
  • Updates bitflags to 2.9.0 with serde support
  • Updates seahash to 4.1.0
  • Adds flatbuffers 25.2.10 as optional dependency
  • Adds new feature flag flatbuffers-storage
  1. Source code organization:
  • Moves NetworkFilterList from blocker.rs to new network_filter_list.rs
  • Updates related imports and dependencies
  • Adds FlatBuffers schema and generated code
  • Updates test organization to match new module structure
  1. Dependencies:
  • Data files:
    • Removes rs-ABPFilterParserData.dat
sequenceDiagram
    participant Client
    participant Blocker
    participant NetworkFilterList
    participant Request
    participant Filter
    participant RegexManager

    Client->>Blocker: check_network_request(request)
    Blocker->>NetworkFilterList: check(request, tags, regex_mgr)
    NetworkFilterList->>Request: get_tokens_for_match()
    Request-->>NetworkFilterList: tokens
    loop For each token
        NetworkFilterList->>Filter: matches(request, regex_mgr)
        Filter->>RegexManager: check_matches(pattern)
        RegexManager-->>Filter: match result
        Filter-->>NetworkFilterList: match result
    end
    NetworkFilterList-->>Blocker: matching filter
    Blocker-->>Client: block decision
Loading

Security Hotspots

The seahash version upgrade from v3 to v4 introduces a breaking hash algorithm change, which could affect filter matching behavior and introduce edge cases where previously blocked content may pass through.

@atuchin-m
Copy link
Collaborator

@antonok-edm Could you please clarify what happens on iOS if we merge this version to b-c?

I suppose the browser will fail to decode the store .dat files and compile the engine from plaintext lists?
https://github.com/brave/brave-core/blob/master/ios/brave-ios/Sources/Brave/WebFilters/AdBlock/AdBlockGroupsManager.swift#L147-L164

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants