Add notes on indexing to kNN search guide #83188

Merged: 3 commits merged on Jan 28, 2022

jtibshirani (Contributor)

This change adds a new 'indexing considerations' section that explains why index
calls can be slow and how force merge can help search latency.
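To make the force-merge point concrete, here is a simplified Python sketch of a shard-level kNN search split across segments. It assumes, as the suggested wording later in this thread states, that each segment stores its own HNSW graph. Each segment is searched separately and the per-segment candidates are merged into a global top `k`. This is a sketch, not Elasticsearch's implementation: `search_segment` is a hypothetical brute-force stand-in for an HNSW graph search, and dot product is an assumed similarity.

```python
import heapq


def dot(a, b):
    """Dot product, used here as an assumed similarity score."""
    return sum(x * y for x, y in zip(a, b))


def search_segment(segment, query, num_candidates):
    """Hypothetical stand-in for searching one segment's HNSW graph.

    `segment` is a list of (doc_id, vector) pairs. Brute force is used for
    brevity; a real HNSW search visits only part of the graph.
    """
    scored = ((dot(vector, query), doc_id) for doc_id, vector in segment)
    return heapq.nlargest(num_candidates, scored)


def knn_search_shard(segments, query, k, num_candidates):
    """Search every segment on its own, then merge into a global top k.

    Returns the hits as (score, doc_id) pairs plus the number of
    per-segment searches performed, which equals the segment count.
    """
    candidates = []
    for segment in segments:
        candidates.extend(search_segment(segment, query, num_candidates))
    return heapq.nlargest(k, candidates), len(segments)


segments = [
    [("a", [1.0, 0.0]), ("b", [0.0, 1.0])],
    [("c", [0.9, 0.1])],
]
hits, searches = knn_search_shard(segments, [1.0, 0.0], k=2, num_candidates=2)

# After a force merge, the same documents live in a single segment.
merged = [segments[0] + segments[1]]
hits_merged, searches_merged = knn_search_shard(merged, [1.0, 0.0], k=2, num_candidates=2)
```

Both layouts return the same hits in this toy example, but the merged layout needs one graph search instead of two. Because an HNSW search's cost grows only slowly with graph size, many small graphs generally cost more per query than one large one, which is why fewer segments tend to lower search latency.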

Similarly, you can decrease `num_candidates` for faster searches with
potentially less accurate results.
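A Python sketch of that trade-off, building two request bodies for the same query: one with a small `num_candidates` (faster, potentially less accurate) and one with a larger value. The body shape follows the `_knn_search` API introduced in 8.0 (`field`, `query_vector`, `k`, `num_candidates`); the field name `my_vector`, the query vector, and the numbers are hypothetical.

```python
import json


def build_knn_body(field, query_vector, k, num_candidates):
    """Build a request body for the _knn_search API.

    Each shard considers `num_candidates` nearest-neighbor candidates and
    the top `k` hits are returned. A smaller `num_candidates` is faster but
    may miss some true nearest neighbors.
    """
    # Mirror the server-side requirement that num_candidates is at least k.
    if k <= 0 or num_candidates < k:
        raise ValueError("need k > 0 and num_candidates >= k")
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }


query = [0.12, -0.5, 0.33]  # hypothetical 3-dimensional query vector
faster = build_knn_body("my_vector", query, k=10, num_candidates=20)
more_accurate = build_knn_body("my_vector", query, k=10, num_candidates=100)
print(json.dumps(faster))
```

Either body would be serialized with `json.dumps` and sent to the target index's `_knn_search` endpoint.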

[discrete]
[[knn-indexing-considerations]]
==== Indexing considerations
jtibshirani (Contributor Author)

I'm very open to suggestions for a better name :)

mayya-sharipova (Contributor)

Why are we putting this section in the kNN search guide? Wouldn't the indexing section of the dense-vector docs be a better place for it?
Or do we provide these kinds of suggestions only in reference guides like this one?

jtibshirani (Contributor Author)

I was wondering this myself and would appreciate @jrodewig's opinion here. I was thinking of this reference as a combined "how to" for kNN search, where we offer guidance about what method to choose, other best practices, etc. I think of the API and field type docs more as specific references that don't give high-level guidance like this.

jrodewig (Contributor)

> I'm very open to suggestions for a better name :)

I think "Indexing considerations" is fine. However, you could use "Indexing speed" if you want.

> I was thinking of this reference as a combined "how to" for kNN search, where we offer guidance about what method to choose, other best practices, etc. I think of the API and field type docs more as specific references that don't give high-level guidance like this.

Yep. That's pretty much my model of these docs. I'm okay with linking from the dense-vector docs to this section, but I think this content is a little too dense and specific to approximate kNN search for the dense-vector reference docs.

jtibshirani added the :Search/Search (Search-related issues that do not fall into other categories), >docs (General docs changes), and v8.0.0 labels on Jan 27, 2022
elasticmachine added the Team:Search (Meta label for search team) and Team:Docs (Meta label for docs team) labels on Jan 27, 2022
elasticmachine (Collaborator)

Pinging @elastic/es-search (Team:Search)

elasticmachine (Collaborator)

Pinging @elastic/es-docs (Team:Docs)

jrodewig self-requested a review on January 27, 2022 02:10
mayya-sharipova (Contributor) left a comment

@jtibshirani Thanks Julie, the changes LGTM. I am just not sure what's the best place for this next section.


jrodewig (Contributor) left a comment

LGTM overall with @mayya-sharipova's edits.

I left some suggestions that swap the ANN reference for HNSW graphs, but I'd like you to take a look before accepting it. The other suggestions are non-blocking. Feel free to ignore those if you like.

Thanks @jtibshirani!


Comment on lines 166 to 168
Indexing vectors for approximate kNN search can take substantial time because
of how expensive it is to build the ANN index structures. You may need to
increase the client request timeout for index and bulk requests.
jrodewig (Contributor)

Instead of ANN, we may want to mention HNSW graphs here. We talk about them in the next paragraph without much of an introduction.

I took a stab at a suggestion, but feel free to edit or ignore if wanted.

Suggested change, replacing:

Indexing vectors for approximate kNN search can take substantial time because
of how expensive it is to build the ANN index structures. You may need to
increase the client request timeout for index and bulk requests.

with:

{es} shards are composed of segments, which are internal storage elements in the
index. For approximate kNN search, {es} stores the dense vector values of each
segment as an https://arxiv.org/abs/1603.09320[HNSW graph]. Indexing vectors for
approximate kNN search can take substantial time because of how expensive it is
to build these graphs. You may need to increase the client request timeout for
index and bulk requests.
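A minimal sketch of raising the client-side timeout for bulk indexing, using only Python's standard library rather than an official Elasticsearch client. The cluster URL, index name, field name `my_vector`, and the 120-second timeout are all hypothetical; a suitable timeout depends on vector dimensions, document count, and hardware.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical cluster address
BULK_TIMEOUT_SECONDS = 120  # hypothetical value; tune for your data and hardware


def build_bulk_body(index, docs, field="my_vector"):
    """Build an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc_id, vector in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({field: vector}))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


def build_bulk_request(index, docs):
    """Wrap the bulk body in a POST request to the _bulk endpoint."""
    return urllib.request.Request(
        f"{ES_URL}/_bulk",
        data=build_bulk_body(index, docs).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send_bulk(request, timeout=BULK_TIMEOUT_SECONDS):
    """Send the request, waiting longer than a typical default before giving
    up, since building HNSW graphs can make bulk calls slow to return."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Only the client-side wait changes here; the request itself is an ordinary bulk call. With an official client, the equivalent is that client's own request-timeout option, such as `request_timeout` in the Python `elasticsearch` client.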

jtibshirani (Contributor Author)

This reorganization works nicely; I'll adopt it.

jtibshirani (Contributor Author)

Thanks for the helpful comments!

jtibshirani merged commit e7ba03e into elastic:master on Jan 28, 2022
jtibshirani deleted the knn-docs branch on January 28, 2022 18:23
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Jan 28, 2022
This change adds a new 'indexing considerations' section that explains why index
calls can be slow and how force merge can help search latency.
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jan 31, 2022
* upstream/master: (100 commits)
  Avoid duplicate _type fields in v7 compat layer (elastic#83239)
  Bump bundled JDK to 17.0.2+8 (elastic#83243)
  [DOCS] Correct header syntax (elastic#83275)
  Add unit tests for indices.recovery.max_bytes_per_sec default values (elastic#83261)
  [DOCS] Add note that write indices are not replicated (elastic#82997)
  Add notes on indexing to kNN search guide (elastic#83188)
  Fix get-snapshot-api :docs:integTest (elastic#83273)
  FilterPathBasedFilter support match fieldname with dot (elastic#83178)
  Fix compilation issues in example-plugins (elastic#83258)
  fix ClusterStateListener javadoc (elastic#83246)
  Speed up Building Indices Lookup in Metadata (elastic#83241)
  Mute whole suite for elastic#82502 (elastic#83252)
  Make PeerFinder log messages happier (elastic#83222)
  [Docs] Add supported _terms_enum field types (elastic#83244)
  Add an aggregator for IPv4 and IPv6 subnets (elastic#82410)
  [CI] Fix 70_time_series/default sort yaml test failures (elastic#83217)
  Update test-failure Issue Template to include "needs:triage" label elastic#83226
  Add an index->step cache to the PolicyStepsRegistry (elastic#82316)
  Improve support for joda datetime to java datetime transition in Painless (elastic#83099)
  Fix joda migration for week based methods in Painless (elastic#83232)
  ...

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/TransportRollupAction.java
jtibshirani added the :Search Relevance/Vectors (Vector search) label and removed the :Search/Search (Search-related issues that do not fall into other categories) label on Jul 21, 2022
Labels: >docs, :Search Relevance/Vectors, Team:Docs, Team:Search, v8.0.0, v8.1.0

5 participants