-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RaBitQ implementation #4235
base: main
Are you sure you want to change the base?
RaBitQ implementation #4235
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks for deferring the SIMD optimization to later.
I left a few comments.
faiss/IndexIVF.h
Outdated
*/ | ||
virtual InvertedListScanner* get_InvertedListScanner( | ||
bool store_pairs = false, | ||
const IDSelector* sel = nullptr) const; | ||
|
||
/** Get a scanner for this index (store_pairs means ignore labels). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to replace get_InvertedListScanner altogether with the version that takes IVFSearchParameters (and no sel, since IDSelector is a field of IVFSearchParameters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the backward compatibility? This is why I introduced get_InvertedListScanner_2
. I mean that I can upgrade the method signature Faiss-wide, but what about the external code?
FAISS_ASSERT(codes != nullptr); | ||
FAISS_ASSERT(x != nullptr); | ||
|
||
if (n == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this test before the asserts
faiss/impl/RaBitQuantizer.cpp
Outdated
|
||
struct FactorsData { | ||
// ||or - c|| | ||
float factor_0 = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not give the fields proper names? Otherwise you might as well use float factors[4]
faiss/impl/RaBitQuantizer.cpp
Outdated
const uint8_t* query_j = rearranged_rotated_qq.data() + j * di_8b; | ||
|
||
// process 64-bit popcounts | ||
unsigned long long count = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use explicity sized integer types (eg uint64_t)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do. As far as I remember, I've used unsigned long long
, bcz there were problems with compilations on MacOS
for these __builtin_popcount
functions
float factor_2 = 0; | ||
// ||or||^2 | ||
float factor_3 = 0; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my implementation there are only 2 floats per database vector. Do you store additional ones for efficiency?
@mdouze two more comments after the discussion First. virtual InvertedListScanner* get_InvertedListScanner(
bool store_pairs = false,
const IDSelector* sel = nullptr,
const IVFSearchParameters* params = nullptr) const; because of the following logic that can override void IndexIVF::search_preassigned(...) const {
...
IDSelector* sel = params ? params->sel : nullptr;
const IDSelectorRange* selr = dynamic_cast<const IDSelectorRange*>(sel);
if (selr) {
if (selr->assume_sorted) {
sel = nullptr; // use special IDSelectorRange processing
} else {
selr = nullptr; // use generic processing
}
}
....
} Please let me know your thoughts. Second. RaBitQ uses 3 factors struct FactorsData {
float or_minus_c_l2sqr = 0;
float dp_multiplier = 0;
// this is needed to support BOTH L2 and IP on the same data
float or_l2sqr = 0;
}; The third one |
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
This is a reference implementation of the https://arxiv.org/pdf/2405.12497
The goal is to correctly set up the internals using Faiss.
The following comments for the implementation:
RaBitQ
includes random matrix rotation as a part of it, but I've decided to rely on externalfaiss::IndexPreTransform
andfaiss::RandomRotationMatrix
facilities.faiss::IndexIVF
, but I did that as least invasive as possible, without breaking the backward compatibility.METRIC_INNER_PRODUCT
is supported as wellsimdlib
facilities may be added later, if neededHere's how to use IndexRaBitQ
Here's how to use IndexIVFRaBitQ