Add notes on indexing to kNN search guide #83188
This change adds a new 'indexing considerations' section that explains why index calls can be slow and how force merge can help search latency.
Similarly, you can decrease `num_candidates` for faster searches with
potentially less accurate results.

[discrete]
[[knn-indexing-considerations]]
==== Indexing considerations
I'm very open to suggestions for a better name :)
What is the reason we put this section in the kNN search guide? Isn't the indexing section of the `dense-vector` document a better place for it?
Or do we provide these kinds of suggestions only in reference guides like this one?
I was wondering this myself and would appreciate @jrodewig's opinion here. I was thinking of this reference as a combined "how to" for kNN search, where we offer guidance about what method to choose, other best practices, etc. I think of the API and field type docs more as specific references that don't give high-level guidance like this.
I'm very open to suggestions for a better name :)
I think `Indexing considerations` is fine. However, you could do `Indexing speed` if wanted.
I was thinking of this reference as a combined "how to" for kNN search, where we offer guidance about what method to choose, other best practices, etc. I think of the API and field type docs more as specific references that don't give high-level guidance like this.
Yep. That's pretty much my model of these docs. I'm okay with linking from the `dense-vector` docs to this section, but I think this content is a little too dense and specific to approximate kNN search for the `dense-vector` reference docs.
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-docs (Team:Docs)
@jtibshirani Thanks Julie, the changes LGTM. I am just not sure what's the best place for this next section.
LGTM overall with @mayya-sharipova's edits.
I left some suggestions that swap the `ANN` reference to HNSW graphs, but I'd like for you to take a look before accepting them. The other suggestions are non-blocking. Feel free to ignore those if wanted.
Thanks @jtibshirani!
Indexing vectors for approximate kNN search can take substantial time because
of how expensive it is to build the ANN index structures. You may need to
increase the client request timeout for index and bulk requests.
Instead of `ANN`, we may want to mention HNSW graphs here. We talk about them in the next paragraph without much of an introduction.
I took a stab at a suggestion, but feel free to edit or ignore if wanted.
- Indexing vectors for approximate kNN search can take substantial time because
- of how expensive it is to build the ANN index structures. You may need to
- increase the client request timeout for index and bulk requests.
+ {es} shards are composed of segments, which are internal storage elements in the
+ index. For approximate kNN search, {es} stores the dense vector values of each
+ segment as an https://arxiv.org/abs/1603.09320[HNSW graph]. Indexing vectors for
+ approximate kNN search can take substantial time because of how expensive it is
+ to build these graphs. You may need to increase the client request timeout for
+ index and bulk requests.
This reorganization works nicely, I'll adopt it.
Thanks for the helpful comments!
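For readers following the thread, here is a minimal sketch of the two practices the new section describes: giving bulk requests a longer client-side timeout, and force merging afterwards. It uses only the Python standard library against the `_bulk` and `_forcemerge` REST endpoints. The cluster URL, the index name `my-knn-index`, the field name `my_vector`, and the 120-second timeout are all assumptions chosen for illustration, not values from the PR. The idea that fewer segments means fewer HNSW graphs to search is inferred from the suggested text above, so treat it as a working assumption.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster
INDEX = "my-knn-index"            # hypothetical index name


def build_bulk_body(docs, index, field="my_vector"):
    """Serialize (doc_id, vector) pairs as NDJSON for the _bulk API."""
    lines = []
    for doc_id, vector in docs:
        # Action line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({field: vector}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def bulk_request(docs, index=INDEX, base_url=ES_URL):
    """Build (but do not send) a POST request to the _bulk endpoint."""
    return urllib.request.Request(
        f"{base_url}/_bulk",
        data=build_bulk_body(docs, index).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def force_merge_request(index=INDEX, base_url=ES_URL, max_num_segments=1):
    """Build (but do not send) a force merge request for the index."""
    return urllib.request.Request(
        f"{base_url}/{index}/_forcemerge?max_num_segments={max_num_segments}",
        method="POST",
    )


def send(request, timeout_seconds=120):
    """Send a request with a generous client-side timeout (value is a guess)."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read())


# Usage against a running cluster (not executed here):
#   send(bulk_request([("1", [0.1, 0.2, 0.3])]))
#   send(force_merge_request(), timeout_seconds=600)
```

The request builders are kept separate from `send` so the timeout can be tuned per call; a force merge on a large index may need a much longer timeout than a single bulk request.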