[Backport 2.x] Lucene Based k-NN search support - merge feature branch #486

@martin-gaievski martin-gaievski commented Aug 2, 2022

Description

Backport of #485 to 2.x. This had to be done manually because the CI bot cannot backport multiple individual commits.

Issues Resolved

#380

Check List

  • New functionality includes testing.
  • All tests pass
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jmazanec15 and others added 8 commits August 2, 2022 16:01
Adds Lucene as another available k-NN engine. Adds one supported
method to the engine, "hnsw", with parameters "ef_construction" and "m".
In addition, adds some layers of abstraction to reduce code
duplication.

Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit 7a76d11)
…project#454)

Adds support for Lucene's KnnVectorQuery type in KNNQueryBuilder.
KNNQueryBuilder logic was enhanced to detect when an engine does not
need to use KNNQuery type and build the Lucene KnnVectorQuery
instead.

Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit 67a1bce)
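
The engine-based dispatch described in this commit message can be pictured roughly as follows. This is a minimal sketch, not the plugin's code: the `Engine` enum, its `needsCustomQuery` flag, and `queryTypeFor` are hypothetical names invented for illustration, and the method returns only the name of the query type rather than building a real Lucene query.

```java
public class QueryDispatchSketch {

    // Hypothetical engine list; the flag marks engines that are assumed to
    // need the plugin's own KNNQuery type instead of a native Lucene query.
    enum Engine {
        NMSLIB(true),
        FAISS(true),
        LUCENE(false);

        final boolean needsCustomQuery;

        Engine(boolean needsCustomQuery) {
            this.needsCustomQuery = needsCustomQuery;
        }
    }

    // Returns the name of the query type a builder would construct for the
    // given engine. A real builder would return the query object itself.
    static String queryTypeFor(Engine engine) {
        return engine.needsCustomQuery ? "KNNQuery" : "KnnVectorQuery";
    }

    public static void main(String[] args) {
        for (Engine engine : Engine.values()) {
            System.out.println(engine + " -> " + queryTypeFor(engine));
        }
    }
}
```

Keeping the decision on the engine rather than in the builder means a new engine only has to declare which query type it needs.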
…h-project#456)

* Add field mapper and per field format

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit 1fb8047)
…-project#458)

* Refactor per field format to a class, add tests

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit 77d3b2e)
…#457)

Builds VectorField for LuceneFieldMapper so that indices using Lucene
can still use painless scripting.

Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit 7a7be09)
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit 39a07dd)
Adds query integration tests for the Lucene engine. Moves the SpaceType to
Lucene VectorSimilarityFunction translation into the SpaceType enum for
testability. Makes fields in KNNMethodContext non-null.

Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit 37bdc74)
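
One way to keep the space-type translation on the enum itself, so it can be unit tested without going through a field mapper or query builder, is sketched below. The `VectorSimilarity` enum is a stand-in for Lucene's `VectorSimilarityFunction` so the example runs without Lucene on the classpath, and the set of constants and which of them map to a Lucene similarity are assumptions, not taken from the plugin.

```java
public class SpaceTypeSketch {

    // Stand-in for Lucene's VectorSimilarityFunction (assumption, so the
    // sketch compiles with the JDK alone).
    enum VectorSimilarity { EUCLIDEAN, COSINE, DOT_PRODUCT }

    // Each constant that has an equivalent overrides the translation method;
    // every other constant falls through to the default, which throws.
    enum SpaceType {
        L2("l2") {
            @Override
            VectorSimilarity getVectorSimilarityFunction() {
                return VectorSimilarity.EUCLIDEAN;
            }
        },
        COSINESIMIL("cosinesimil") {
            @Override
            VectorSimilarity getVectorSimilarityFunction() {
                return VectorSimilarity.COSINE;
            }
        },
        L1("l1"); // assumed to have no equivalent in this sketch

        private final String value;

        SpaceType(String value) {
            this.value = value;
        }

        String getValue() {
            return value;
        }

        VectorSimilarity getVectorSimilarityFunction() {
            throw new UnsupportedOperationException(
                "Space type [" + value + "] has no similarity function in this sketch");
        }
    }

    public static void main(String[] args) {
        System.out.println(SpaceType.L2.getVectorSimilarityFunction());
    }
}
```

With the translation on the enum, a test can assert the mapping for each constant directly.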
…rch-project#483)

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit 8624725)
@martin-gaievski martin-gaievski added Enhancements Increases software capabilities beyond original client specifications 2.2.0 labels Aug 2, 2022
@martin-gaievski martin-gaievski requested a review from a team August 2, 2022 23:06
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
codecov-commenter commented Aug 3, 2022

Codecov Report

Merging #486 (8493cac) into 2.x (b217105) will decrease coverage by 0.03%.
The diff coverage is 85.51%.

@@             Coverage Diff              @@
##                2.x     #486      +/-   ##
============================================
- Coverage     84.09%   84.06%   -0.04%     
- Complexity      979     1019      +40     
============================================
  Files           140      146       +6     
  Lines          4031     4186     +155     
  Branches        362      373      +11     
============================================
+ Hits           3390     3519     +129     
- Misses          475      491      +16     
- Partials        166      176      +10     
Impacted Files Coverage Δ
...n/java/org/opensearch/knn/common/KNNConstants.java 93.33% <ø> (ø)
...ava/org/opensearch/knn/index/KNNMethodContext.java 92.50% <ø> (ø)
...earch/knn/index/memory/NativeMemoryAllocation.java 82.92% <ø> (ø)
.../java/org/opensearch/knn/index/query/KNNQuery.java 83.33% <ø> (ø)
...org/opensearch/knn/index/query/KNNQueryResult.java 100.00% <ø> (ø)
...java/org/opensearch/knn/index/query/KNNScorer.java 69.23% <ø> (ø)
...java/org/opensearch/knn/index/query/KNNWeight.java 78.94% <ø> (ø)
...main/java/org/opensearch/knn/jni/FaissService.java 85.71% <ø> (ø)
...c/main/java/org/opensearch/knn/jni/JNIService.java 85.71% <ø> (ø)
...ain/java/org/opensearch/knn/jni/NmslibService.java 85.71% <ø> (ø)
... and 24 more

for (int j = 0; j < k; j++) {
    float[] primitiveArray = Floats.toArray(Arrays.stream(knnResults.get(j).getVector()).collect(Collectors.toList()));
    float distance = TestUtils.computeDistFromSpaceType(spaceType, primitiveArray, queryVector);
    float rawScore = VECTOR_SIMILARITY_TO_SCORE.get(spaceType.getVectorSimilarityFunction()).apply(distance);
Member Author


This change is required to address a breaking change in Lucene 9.3: the method convertToScore
exists in Lucene 9.2 but has been removed in Lucene 9.3.
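
A test-side replacement for the removed method could be a lookup table from similarity function to a score-conversion function, which matches how `VECTOR_SIMILARITY_TO_SCORE` is used in the snippet above. The sketch below is an assumption about what such a table might contain: the `VectorSimilarity` enum stands in for Lucene's `VectorSimilarityFunction`, and the formulas are what I understand the Lucene 9.2 conversion to have been; neither is confirmed by this pull request.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

public class ScoreConversionSketch {

    // Stand-in for Lucene's VectorSimilarityFunction (assumption).
    enum VectorSimilarity { EUCLIDEAN, COSINE, DOT_PRODUCT }

    // Maps each similarity function to a conversion from the raw
    // distance or similarity value to a score.
    static final Map<VectorSimilarity, Function<Float, Float>> TO_SCORE =
        new EnumMap<>(VectorSimilarity.class);

    static {
        // Assumed formulas: smaller Euclidean distance gives a larger score,
        // while cosine and dot product are shifted from [-1, 1] into [0, 1].
        TO_SCORE.put(VectorSimilarity.EUCLIDEAN, d -> 1 / (1 + d));
        TO_SCORE.put(VectorSimilarity.COSINE, s -> (1 + s) / 2);
        TO_SCORE.put(VectorSimilarity.DOT_PRODUCT, s -> (1 + s) / 2);
    }

    static float toScore(VectorSimilarity similarity, float raw) {
        return TO_SCORE.get(similarity).apply(raw);
    }

    public static void main(String[] args) {
        System.out.println(toScore(VectorSimilarity.EUCLIDEAN, 1.0f));
    }
}
```

Holding the conversions in the test code keeps the expected scores independent of whichever Lucene version is on the classpath.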

@martin-gaievski martin-gaievski merged commit aebff80 into opensearch-project:2.x Aug 3, 2022
@junqiu-lei junqiu-lei added Features Introduces a new unit of functionality that satisfies a requirement and removed Enhancements Increases software capabilities beyond original client specifications labels Aug 4, 2022
@heemin32 heemin32 added v2.2.0 and removed 2.2.0 labels Nov 2, 2022
Labels
Features Introduces a new unit of functionality that satisfies a requirement v2.2.0

6 participants