Error Handling, Update Stability, Improved Java SDK #402
Merged
Fixes ClickHouse/ClickHouse#61780 Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
Relates to #377 and the comment: #377 (comment). This temporarily disables the failing CI pipeline that generates and updates the docs.
This commit drops the `std::vector` dependency, shortening compilation time and making error handling uniform across abstraction layers.
In the past, if we got "too lucky" traversing the graph, we could exit early before accumulating K top matches, even if the index had more than K entries. This patch changes that behavior, making the output more predictable.
This patch addresses issue #399, originally observed in the Swift layer. Reimplementing it in C++ helped locate the issue and led to refactoring the `update` procedure in the lowest-layer `index_gt`. Now, `add` and `update` share less code. The `add` is one branch shorter (not that it would be noticeable), and `update` brings additional logic to avoid spilling `updated_slot` into the top results and consequently introducing self-loops. Closes #399
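A minimal sketch of the self-loop guard, not the actual `index_gt` code. It assumes the candidates for reconnecting an updated node arrive as (slot, distance) pairs, and it drops the updated node from its own candidate list before any edges are rebuilt. The names `candidate_t` and `drop_self` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate record: a neighbor slot and its distance to the query.
struct candidate_t {
    std::size_t slot;
    float distance;
};

// Removes the node being updated from its own candidate list, so that
// reconnecting it can never create an edge from the node to itself.
inline void drop_self(std::vector<candidate_t>& candidates, std::size_t updated_slot) {
    candidates.erase(
        std::remove_if(candidates.begin(), candidates.end(),
                       [updated_slot](candidate_t const& c) { return c.slot == updated_slot; }),
        candidates.end());
}

// Usage: slot 3 is the closest match to its own vector and must be filtered out.
inline void drop_self_example() {
    std::vector<candidate_t> candidates{{3, 0.0f}, {7, 0.1f}, {9, 0.4f}};
    drop_self(candidates, 3);
    assert(candidates.size() == 2 && candidates.front().slot == 7);
}
```

When a vector is updated in place, its old entry tends to be the nearest match to the new value, which is why the filter is needed at all.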
Relates to #355
Both `view` and `load` would `reset` the thread contexts. After that, the very first `search` or `add` would fail, as no thread-local contexts were initialized. Recovering required a `reserve` call with a non-zero second argument, which defines the number of concurrent threads for which the queues and buffers need to be allocated. That design is counter-intuitive, so this patch re-initializes the same number of threads as existed before the `load` or `view`, or one if none existed.
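The rule described above can be sketched with a toy class that models only the bookkeeping. `toy_index_t` and all of its members are hypothetical and are not part of the USearch API. No real queues or buffers are allocated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy stand-in for an index that owns per-thread scratch contexts.
class toy_index_t {
    std::size_t contexts_ = 0;

  public:
    void reserve_threads(std::size_t threads) { contexts_ = threads; }
    std::size_t threads() const { return contexts_; }

    // Old behavior: loading drops every context and leaves none behind.
    void load_resetting() { contexts_ = 0; }

    // New behavior: restore as many contexts as existed before the load,
    // or a single one if none had been reserved.
    void load_preserving() {
        std::size_t const before = contexts_;
        contexts_ = 0; // the internal reset still happens
        contexts_ = std::max<std::size_t>(before, 1);
    }

    // A search or an insertion needs at least one initialized context.
    bool can_search() const { return contexts_ > 0; }
};

// Usage: a freshly loaded index is immediately usable with the new rule.
inline void toy_index_example() {
    toy_index_t index;
    index.load_preserving();
    assert(index.can_search());
}
```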
Co-authored-by: Adolfo Garcia <1250775+adolfogc@users.noreply.github.com>
As noted by Robert Schulze, we can avoid populating `slot_lookup_` during insertions, if `enable_key_lookups` is not set. This would lead to lower memory consumption for large indexes of tiny vectors, particularly common in GIS. Co-authored-by: Robert Schulze <robert@clickhouse.com>
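A simplified sketch of the idea, assuming a hash map from keys to slots is the only structure that key lookups require. `toy_dense_index_t` and its members are hypothetical. Only the names `slot_lookup_` and `enable_key_lookups` come from the commit message, and the real container types may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical dense index: every entry occupies a slot, and an optional
// hash map translates external keys back to slot numbers.
class toy_dense_index_t {
    bool enable_key_lookups_;
    std::vector<std::uint64_t> keys_by_slot_;                    // slot -> key
    std::unordered_map<std::uint64_t, std::size_t> slot_lookup_; // key -> slot

  public:
    explicit toy_dense_index_t(bool enable_key_lookups) : enable_key_lookups_(enable_key_lookups) {}

    void add(std::uint64_t key) {
        std::size_t const slot = keys_by_slot_.size();
        keys_by_slot_.push_back(key);
        // Skip the extra hash-map entry when the caller keeps its own mapping.
        if (enable_key_lookups_)
            slot_lookup_.emplace(key, slot);
    }

    std::size_t size() const { return keys_by_slot_.size(); }
    std::size_t lookup_entries() const { return slot_lookup_.size(); }
    bool contains(std::uint64_t key) const {
        return enable_key_lookups_ && slot_lookup_.count(key) != 0;
    }
};

// Usage: with lookups disabled, insertions leave the map empty.
inline void toy_dense_index_example() {
    toy_dense_index_t index(false);
    index.add(42);
    assert(index.size() == 1 && index.lookup_entries() == 0);
}
```

The per-entry cost of the map is independent of the vector size, so the saving matters most when the vectors themselves are tiny.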
This fixes a GitHub CI warning about the deprecated NodeJS version.
Closes #453 Co-authored-by: Andrew Dirksen <2702854+bddap@users.noreply.github.com>
The lack of capacity data is intended. Reserving memory is a non-persistent operation by nature, and we shouldn't save that metadata on the disk. Closes #452 Co-authored-by: Christopher Yim <4638193+GoodKnight@users.noreply.github.com>
This bug was tough to spot. Apple Clang was the only one that caught it. The `-O0` flags were explicitly added to expose more symbols for debugging. More `uint40_t` tests were added.
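For context, a sketch of what a 40-bit slot ID type can look like, assuming (as the name `uint40_t` suggests) an unsigned integer stored in five bytes. The class `packed_u40_t` below is hypothetical and is not the library's implementation. It reads and writes bytes explicitly, so it does not depend on host endianness or on compiler-specific aliasing behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 40-bit unsigned integer stored in exactly five bytes,
// least-significant byte first.
class packed_u40_t {
    std::uint8_t bytes_[5] = {0, 0, 0, 0, 0};

  public:
    // Largest representable value: 2^40 - 1.
    static constexpr std::uint64_t max() { return (std::uint64_t(1) << 40) - 1; }

    packed_u40_t() = default;
    explicit packed_u40_t(std::uint64_t value) {
        // Bits above the 40th are silently truncated.
        for (int i = 0; i < 5; ++i)
            bytes_[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }

    std::uint64_t value() const {
        std::uint64_t result = 0;
        for (int i = 0; i < 5; ++i)
            result |= std::uint64_t(bytes_[i]) << (8 * i);
        return result;
    }
};

static_assert(sizeof(packed_u40_t) == 5, "five bytes, no padding");

// Usage: values round-trip through the five-byte representation.
inline void packed_u40_example() {
    assert(packed_u40_t(0x0102030405ull).value() == 0x0102030405ull);
}
```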
Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com>
The USearch implementation had 2 layers: the core HNSW structure implemented in `index.hpp`, and the high-level wrapper for dense equidimensional vectors in `index_dense.hpp`. In this release, we've made the top layer thinner and cleaner, making the two APIs more similar, error handling more consistent, and builds faster.

Reducing Dependencies & Accelerating Builds
Previously, `index_dense.hpp` had the following STL includes:
- `<functional>`
- `<vector>`
- `<numeric>`
- `<thread>`

Those are some of the most common includes in C++ projects, and also the ones I like the least. The "C++ Compile Health Watchdog" lists a couple of reasons to hate them.

Reduced Memory Consumption for DBMS-like Users
Most databases using USearch would prefer to have a smaller index at the cost of some functionality. As mentioned by @rschu1ze, if `enable_key_lookups` is disabled, and some external DBMS is responsible for "key to vector" mappings, the memory consumption of the `index_dense_gt` can be further reduced.

Other Improvements
- `get`, `loadFromPath`, and `viewFromPath` APIs for Java by @mccullocht.
- `#pragma region` warning was resolved by @mbautin.
- `uint40_t` slot IDs have been fixed by @Ngalstyan4.
- `b1x8_t` vectors have been fixed by @Ngalstyan4.