-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: emit INFO when performing kmeans #219
Conversation
Signed-off-by: usamoi <usamoi@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds logging for build phase names when performing kmeans, updates error emission and validation checks (including for residual_quantization), and renames operator classes and related functions from the deprecated “maxsim_ip” variants to their current “maxsim” forms.
- Introduces a new BuildPhase enum and emits progress information during index build.
- Adjusts k-means iteration thresholds and updates threadpool and check function signatures.
- Renames operator functions and their callers, and refines validation messages related to vector options.
Reviewed Changes
Copilot reviewed 9 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
src/index/functions.rs | Refactored index opening code to encapsulate the raw pointer, improving resource management. |
crates/algorithm/src/types.rs | Modified vector options validation range, removing an explicit upper bound and tightening valid limits. |
crates/k_means/src/lib.rs | Updated k-means iteration thresholds and adjusted the check callback to accept an index parameter. |
src/index/scanners/maxsim.rs | Updated to use renamed maxsim variants instead of the deprecated “maxsim_ip” operator classes. |
src/index/opclass.rs | Renamed functions and matching operator classes to reflect the removal of “_ip” suffixes. |
src/datatype/operators_vector.rs | Renamed the maxsim operator function consistent with other changes. |
src/datatype/operators_halfvec.rs | Similar renaming applied for halfvec operator functions. |
src/index/am/am_build.rs | Added build phase emission for progress reporting and updated various validation and logging calls. |
Files not reviewed (3)
- src/sql/finalize.sql: Language not supported
- tests/logic/multivector.slt: Language not supported
- tests/logic/partition.slt: Language not supported
Comments suppressed due to low confidence (4)
crates/algorithm/src/types.rs:54
- The previous upper range limit (max = 1_048_575) was removed. Please verify that removing the maximum boundary is intentional and that inputs exceeding 60000 are handled appropriately in downstream logic.
#[validate(range(min = 1))]
crates/k_means/src/lib.rs:43
- Thresholds in the condition have been updated from 1000 to 1024. Confirm that this change aligns with the intended algorithm behavior and that all related components reflect this new threshold.
if n >= 1024 && c >= 1024 {
src/index/am/am_build.rs:20
- [nitpick] The BuildPhase names (e.g. 'Initializing', 'InternalBuild', 'Build') are somewhat similar. Consider providing more distinct names or descriptions to clearly differentiate each phase in the logs and progress updates.
pub enum BuildPhase {
src/index/am/am_build.rs:170
- The explicit checks for 'vector_options.dims' being 0 or exceeding an upper limit have been removed. Ensure that the new validation logic in Validate::validate fully covers these conditions.
if vector_options.dims == 0 {
WARNING should be fine?
Why? How should prewarm work with partition table and index? |
It never works. This pr improves error message. It could work as long as prewarm function prewarms all partitions. |
Dot
andresidual_quantiztion
are usedtuples_done
by estimated value before insertionstuples_done
after insertions are donevchordrq_prewarm
is given a partition indexmaxsim_ip
tomaxsim