Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SearchParameters support for IndexBinaryFlat #4055

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

gustavz
Copy link

@gustavz gustavz commented Dec 4, 2024

Context issue: #3503

We need search params support for binary flat index to be able to use it in RAG applications that support search with pre-filtering.

@facebook-github-bot
Copy link
Contributor

Hi @gustavz!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@gustavz gustavz changed the title Search params support for IndexBinaryFlat SearchParameters support for IndexBinaryFlat Dec 4, 2024
@gustavz
Copy link
Author

gustavz commented Dec 11, 2024

@mnorris11 whats the process of getting this PR reviewed and merged?

@mnorris11
Copy link

@gustavz We triaged it internally (the implementation tag), so someone will review it soon. We have an internal task tracking its review.

Copy link
Contributor

@asadoughi asadoughi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request! It looks like the PR is missing the changes to hamming.h for the additional IDSelector argument. Could you add those, please?

@gustavz gustavz force-pushed the gustavz/search_params_support_for_index_binary_flat branch from 2717b86 to d70e64b Compare December 12, 2024 08:23
@gustavz gustavz requested a review from asadoughi December 12, 2024 08:23
Copy link
Contributor

@asadoughi asadoughi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just a few more tweaks

@gustavz gustavz force-pushed the gustavz/search_params_support_for_index_binary_flat branch from 539dfae to cd8b5d3 Compare December 16, 2024 09:00
@gustavz gustavz requested a review from asadoughi December 16, 2024 09:13
@gustavz gustavz requested a review from asadoughi December 24, 2024 08:07
@mnorris11
Copy link

Hi @gustavz seems like there is a small compilation error, it's asking for faiss::IDSelector instead of IDSelector?

@gustavz
Copy link
Author

gustavz commented Jan 14, 2025

sorry was on vacation over the new year, pls re-review @asadoughi @mnorris11

@mnorris11
Copy link

(except in the last step, instead of conda installing faiss, run the cmake commands from INSTALL.md)

@mnorris11 after creating the conda env with the steps from your comment from which step needs the INSTALL.md need to be run? all?

Update: It hangs currently at make -C build install with

make -C build install
(faiss_env) (base) gustav.vonzitzewitz@gvonzitzewitz-me-PVYYY faiss % make -C build install
make: Entering directory '/Users/gustav.vonzitzewitz/oss/faiss/build'
[ 55%] Built target faiss
[ 55%] Built target faiss_python_callbacks
[ 58%] Built target swigfaiss_swig_compilation
[ 58%] Built target swigfaiss
[ 58%] Built target faiss_example_external_module_swig_compilation
[ 58%] Built target faiss_example_external_module
[ 62%] Built target gtest
[ 62%] Built target gtest_main
[ 62%] Building CXX object tests/CMakeFiles/faiss_test.dir/test_hamming.cpp.o
/Users/gustav.vonzitzewitz/oss/faiss/tests/test_hamming.cpp:277:24: error: cannot initialize a member subobj
ect of type 'TI *' (aka 'long long *') with an rvalue of type 'value_type *' (aka 'long *')
  277 |                 na, k, ids_gen.data(), dist_gen.data()};
      |                        ^~~~~~~~~~~~~~
/Users/gustav.vonzitzewitz/oss/faiss/tests/test_hamming.cpp:286:13: error: no viable overloaded '='
  286 |         res = {na, k, ids_ham_knn.data(), dist_ham_knn.data()};
      |         ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/gustav.vonzitzewitz/oss/faiss/faiss/utils/Heap.h:471:8: note: candidate function (the implicit copy a
ssignment operator) not viable: cannot convert initializer list argument to 'const HeapArray<CMax<int, long 
long>>'
  471 | struct HeapArray {
      |        ^~~~~~~~~
/Users/gustav.vonzitzewitz/oss/faiss/faiss/utils/Heap.h:471:8: note: candidate function (the implicit move a
ssignment operator) not viable: cannot convert initializer list argument to 'HeapArray<CMax<int, long long>>
'
  471 | struct HeapArray {
      |        ^~~~~~~~~
2 errors generated.
make[2]: *** [tests/CMakeFiles/faiss_test.dir/build.make:513: tests/CMakeFiles/faiss_test.dir/test_hamming.c
pp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:2440: tests/CMakeFiles/faiss_test.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
make: Leaving directory '/Users/gustav.vonzitzewitz/oss/faiss/build'

Got it, will hold off on running CI until compilation error is fixed.

@gustavz
Copy link
Author

gustavz commented Feb 20, 2025

@mnorris11 I dont really get this error, it does not seem to be related to my PR.
could it be that I built faiss with the wrong flags or something? Also this error didnt show up in CI so far, right?

Can you share a full flow of how to build and install faiss locally and how to run tests on an ARM macbook?

In your comment #3865 (comment) your command conda install -y -q python=3.11 cmake make swig mkl=2023 mkl-devel=2023 numpy scipy pytest gxx_linux-64=11.2 sysroot_linux-64 gflags cant be executed on ARM mac, also this should install python 3.11, but in your comment you show an env with Python 3.12.5 running. Also gxx_linux-64=11.2 sysroot_linux-64 doesnt look like a fit for ARM macs...

It can very well be that I am missing something, but so far I wasnt able to get to running tests :/

@mnorris11
Copy link

@mnorris11 I dont really get this error, it does not seem to be related to my PR. could it be that I built faiss with the wrong flags or something? Also this error didnt show up in CI so far, right?

Can you share a full flow of how to build and install faiss locally and how to run tests on an ARM macbook?

In your comment #3865 (comment) your command conda install -y -q python=3.11 cmake make swig mkl=2023 mkl-devel=2023 numpy scipy pytest gxx_linux-64=11.2 sysroot_linux-64 gflags cant be executed on ARM mac, also this should install python 3.11, but in your comment you show an env with Python 3.12.5 running. Also gxx_linux-64=11.2 sysroot_linux-64 doesnt look like a fit for ARM macs...

It can very well be that I am missing something, but so far I wasnt able to get to running tests :/

Yes, mkl is an Intel library, so it won't work on ARM. Instead try those steps but with openblas instead of mkl=2023 mkl-devel=2023.

@mnorris11
Copy link

mnorris11 commented Feb 20, 2025

You can also find the dependencies that build libfaiss and faiss-cpu here: https://github.com/facebookresearch/faiss/blob/main/conda/faiss/meta.yaml

You would be interested in items marked
# [osx and arm64]
or
# [not x86_64]
or having no selector (the comment on the same line as a dependency).

@gustavz
Copy link
Author

gustavz commented Feb 21, 2025

So I found a playbook that works to get tests running. You might want to share this with other macOS users @mnorris11 :

Prepare Environment

  1. brew install miniconda llvm openblas libomp
  2. conda remove --name faiss_env --all -y in case there's an existing one
  3. conda create -n faiss_env
  4. conda activate faiss_env
  5. conda install -y -q -c conda-forge python=3.11 cmake make swig numpy scipy pytest gflags blas-devel openblas
  6. which python3 should print somewhere like miniconda3/bin/python3

Build faiss and run tests

  1. cmake -B build . -DBUILD_TESTING=OFF -DFAISS_ENABLE_GPU=OFF
  2. make -C build -j faiss
  3. make -C build -j swigfaiss
  4. (cd build/faiss/python && python setup.py install)
  5. PYTHONPATH="$(ls -d ./build/faiss/python/build/lib*/)" pytest tests/test_*.py

@gustavz
Copy link
Author

gustavz commented Feb 21, 2025

In terms of tests, there seems to be something wrong with search result order.

@mnorris11
Copy link

Thanks for documenting the Mac steps!

In terms of tests, there seems to be something wrong with search result order.

Let us know if you want assistance with the test. Feel free to attach more details.

@gustavz
Copy link
Author

gustavz commented Feb 24, 2025

@mnorris11 I wasnt able to fix the tests. Out of all binary test cases only the range case passes:

    def test_BinaryFlat(self):
        self.do_test_id_selector("BinaryFlat")

    def test_BinaryFlat_id_range(self):
        self.do_test_id_selector("BinaryFlat", id_selector_type="range")

    def test_BinaryFlat_id_array(self):
        self.do_test_id_selector("BinaryFlat", id_selector_type="array")

    def test_BinaryFlat_no_heap(self):
        self.do_test_id_selector("BinaryFlat", use_heap=False)

To my current understanding the key issue appears to be in how the binary index handles search results with equal distances:

  • For binary (Hamming) distances, many vectors can have exactly the same distance since distances are discrete integers
  • When multiple vectors have the same distance, their ordering becomes important
  • The range selector preserves order because it's just checking a simple numeric range
  • The other selectors may change the effective ordering of results when filtering, leading to different results than the reference implementation

Is my assessment correct and do you know how to best solve this?

@facebook-github-bot
Copy link
Contributor

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mnorris11
Copy link

@mnorris11 I wasnt able to fix the tests. Out of all binary test cases only the range case passes:

    def test_BinaryFlat(self):
        self.do_test_id_selector("BinaryFlat")

    def test_BinaryFlat_id_range(self):
        self.do_test_id_selector("BinaryFlat", id_selector_type="range")

    def test_BinaryFlat_id_array(self):
        self.do_test_id_selector("BinaryFlat", id_selector_type="array")

    def test_BinaryFlat_no_heap(self):
        self.do_test_id_selector("BinaryFlat", use_heap=False)

To my current understanding the key issue appears to be in how the binary index handles search results with equal distances:

  • For binary (Hamming) distances, many vectors can have exactly the same distance since distances are discrete integers
  • When multiple vectors have the same distance, their ordering becomes important
  • The range selector preserves order because it's just checking a simple numeric range
  • The other selectors may change the effective ordering of results when filtering, leading to different results than the reference implementation

Is my assessment correct and do you know how to best solve this?

You're right that they are equal distances, printing out Dref vs Dnew is exactly the same.

Since the distances are equal, either can be returned and it is still correct. There is nothing technically wrong with the results.

For binary indexes in do_test_id_selector, we can just relax the id check. As long as the returned IDs are valid, and as long as the distances match, the test should pass.

@gustavz
Copy link
Author

gustavz commented Feb 28, 2025

@mnorris11 great, updated tests, all pass now, gtg!

@mnorris11 mnorris11 dismissed asadoughi’s stale review February 28, 2025 17:02

changes were made. Clicking "dismiss" to try to bring back Import to fbsource button...

@facebook-github-bot
Copy link
Contributor

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@gustavz
Copy link
Author

gustavz commented Mar 4, 2025

Anything to do on my side @mnorris11 ?

@gtwang01
Copy link
Contributor

gtwang01 commented Mar 6, 2025

Hey Gustav - can you fix the lint errors and update the branch? I'll help you merge this PR from there.

@gustavz
Copy link
Author

gustavz commented Mar 7, 2025

@gtwang01 I cant see the tests
Screenshot 2025-03-07 at 08 29 15

@gustavz
Copy link
Author

gustavz commented Mar 10, 2025

Hey Gustav - can you fix the lint errors and update the branch? I'll help you merge this PR from there.

Please provide info how I can reproduce/see the lint errors locally @gtwang01

@gtwang01
Copy link
Contributor

I'll just provide them here:

Trailing whitespace
fbcode/faiss/tests/test_search_params.py
Line 26
Line 27
Line 28
Line 29
Line 30

Blank line contains whitespace
fbcode/faiss/tests/test_search_params.py
Line 154
Line 159
Line 169
Line 171
Line 183
Line 190
Line 192

@gustavz
Copy link
Author

gustavz commented Mar 11, 2025

@gtwang01 I think it's good to merge now

@gtwang01 gtwang01 requested review from gtwang01 and removed request for mnorris11 March 11, 2025 21:06
@gtwang01 gtwang01 self-assigned this Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants