Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: emit INFO when performing kmeans #219

Merged
merged 1 commit into from
Mar 24, 2025

Conversation

usamoi
Copy link
Contributor

@usamoi usamoi commented Mar 24, 2025

  1. add build phase names
  2. emit kmeans log
  3. emit errors if both Dot and residual_quantiztion are used
  4. update tuples_done by estimated value before insertions
  5. update tuples_done after insertions are done
  6. emit errors if vchordrq_prewarm is given a partition index
  7. rename maxsim_ip to maxsim

Signed-off-by: usamoi <usamoi@outlook.com>
@usamoi usamoi requested a review from Copilot March 24, 2025 10:09
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request adds logging for build phase names when performing kmeans, updates error emission and validation checks (including for residual_quantization), and renames operator classes and related functions from the deprecated “maxsim_ip” variants to their current “maxsim” forms.

  • Introduces a new BuildPhase enum and emits progress information during index build.
  • Adjusts k-means iteration thresholds and updates threadpool and check function signatures.
  • Renames operator functions and their callers, and refines validation messages related to vector options.

Reviewed Changes

Copilot reviewed 9 out of 12 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/index/functions.rs Refactored index opening code to encapsulate the raw pointer, improving resource management.
crates/algorithm/src/types.rs Modified vector options validation range, removing an explicit upper bound and tightening valid limits.
crates/k_means/src/lib.rs Updated k-means iteration thresholds and adjusted the check callback to accept an index parameter.
src/index/scanners/maxsim.rs Updated to use renamed maxsim variants instead of the deprecated “maxsim_ip” operator classes.
src/index/opclass.rs Renamed functions and matching operator classes to reflect the removal of “_ip” suffixes.
src/datatype/operators_vector.rs Renamed the maxsim operator function consistent with other changes.
src/datatype/operators_halfvec.rs Similar renaming applied for halfvec operator functions.
src/index/am/am_build.rs Added build phase emission for progress reporting and updated various validation and logging calls.
Files not reviewed (3)
  • src/sql/finalize.sql: Language not supported
  • tests/logic/multivector.slt: Language not supported
  • tests/logic/partition.slt: Language not supported
Comments suppressed due to low confidence (4)

crates/algorithm/src/types.rs:54

  • The previous upper range limit (max = 1_048_575) was removed. Please verify that removing the maximum boundary is intentional and that inputs exceeding 60000 are handled appropriately in downstream logic.
    #[validate(range(min = 1))]

crates/k_means/src/lib.rs:43

  • Thresholds in the condition have been updated from 1000 to 1024. Confirm that this change aligns with the intended algorithm behavior and that all related components reflect this new threshold.
if n >= 1024 && c >= 1024 {

src/index/am/am_build.rs:20

  • [nitpick] The BuildPhase names (e.g. 'Initializing', 'InternalBuild', 'Build') are somewhat similar. Consider providing more distinct names or descriptions to clearly differentiate each phase in the logs and progress updates.
pub enum BuildPhase {

src/index/am/am_build.rs:170

  • The explicit checks for 'vector_options.dims' being 0 or exceeding an upper limit have been removed. Ensure that the new validation logic in Validate::validate fully covers these conditions.
if vector_options.dims == 0 {

@usamoi usamoi merged commit 7edb287 into tensorchord:main Mar 24, 2025
15 checks passed
@VoVAllen
Copy link
Member

VoVAllen commented Mar 24, 2025

emit errors if both Dot and residual_quantiztion are used

WARNING should be fine?

emit errors if vchordrq_prewarm is given a partition index

Why? How should prewarm work with partition table and index?

@usamoi
Copy link
Contributor Author

usamoi commented Mar 25, 2025

Why? How should prewarm work with partition table and index?

It never works. This pr improves error message.

It could work as long as prewarm function prewarms all partitions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants