VectorChord Queries Extremely Slow Compared to pgvecto.rs (~7.5M Entries, v0.2.1 in Docker) #208
Comments
The nlists value is too large; for 7.5M vectors, I think 8192 is enough. And for the query, can you show the results of something like
returns:
(The two query plans in my initial message were run with those settings.) I think the table count will probably reach around 20-30 million at some point; that is why I already set it to 20000. Is it unwise to already have it at this count now?
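For reference, rebuilding the index with the smaller list count suggested above could look roughly like this. The table and column names are placeholders, and the options block follows the general shape from the VectorChord docs; verify the exact syntax for your version:

```sql
-- Hypothetical schema: images(id bigint, embedding vector(768)).
-- lists = 8192 per the suggestion above for ~7.5M vectors.
CREATE INDEX images_embedding_idx ON images
USING vchordrq (embedding vector_l2_ops)
WITH (options = $$
residual_quantization = true
[build.internal]
lists = [8192]
$$);
```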
Do you get the result as fast as the first query, or is it done over multiple rounds? Can you also add the result with
6s is too slow. It should be within the 10ms level for 7.5M vectors. One possible reason is that some I/O occurred to read data from the HDD, so it's slow.
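One way to check whether disk I/O is the culprit is to run the query with buffer statistics: nonzero "read" counts in the plan mean pages were fetched from disk rather than from shared buffers. Table, column, and vector here are placeholders, not the reporter's actual schema:

```sql
-- BUFFERS adds "shared hit=... read=..." counters to each plan node;
-- large "read" values point at cold data being pulled from the HDD.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM images                                        -- placeholder table
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'::vector   -- placeholder query vector
LIMIT 10;
```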
I am not quite sure what you mean by your first question.
Edit: Ah, and for this query JIT is off, vchordrq.probes = 2000, effective_io_concurrency = 10.
Edit 2: This is a second query, with probes = 1000, with an embedding from another random image:
It's expected that the data is cached. Otherwise it needs to read pages from the HDD, which is expected to be super slow. You can see it's a similar case for pgvecto.rs: the latency assumes everything is already in memory, without any reads from disk.
What would be the correct syntax for running it?
Not that way.
vchordrq_prewarm is for the disk-based scenario, which puts the quantized vectors in memory and the full vectors on disk. However, we expect users to use a performant SSD in this scenario; an HDD is too slow.
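If I read the VectorChord docs correctly, prewarming is a single function call that takes the index; the index name below is a placeholder and the exact signature may differ by version, so treat this as a sketch:

```sql
-- Assumed index name; per the comment above, this loads the
-- quantized vectors of the vchordrq index into memory.
SELECT vchordrq_prewarm('images_embedding_idx');
```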
Thank you for your time and all the information! The commands completed, but it seems the DB has insufficient memory available. The previous EXPLAIN command with BUFFERS, for another embedding:
I guess that to obtain high query speeds, I will either need to allocate more RAM (but the mainboard does not support more than 128GB) or add some SSDs to the server and move the database to a purely SSD-backed volume. As this is expected behavior and I know what next steps I can take, feel free to close this.
Yes. You can try increasing the shared_buffers size or adding more hardware. And do
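A minimal sketch of the relevant postgresql.conf settings; the values are illustrative, not recommendations, and shared_buffers should be sized against the machine's actual RAM:

```
# postgresql.conf (illustrative values)
shared_buffers = 32GB            # more room to cache the index and vectors
effective_io_concurrency = 200   # higher values suit SSDs; keep low for HDDs
```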
I've recently migrated my image similarity database at home from pgvecto.rs to VectorChord, using the latest Docker image (tensorchord/vchord-postgres:pg16-v0.2.1). Since migrating, I've experienced significantly slower query times, often orders of magnitude slower than before, despite utilizing indexed queries.
Database & Setup Details:
tensorchord/vchord-postgres:pg16-v0.2.1
Table & Index Definition:
Example Query (Simple case):
Even the simplest similarity query:
is very slow, with an execution time of typically 6-60 seconds (previously near-instant with pgvecto.rs).
CPU and disk I/O utilization remain close to nonexistent during the query.
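The kind of query being described, for context: a nearest-neighbor search with a tunable probe count. All names and the vector literal are placeholders, not the reporter's actual schema:

```sql
-- vchordrq.probes is named earlier in this thread; the value here
-- matches the first query plan below.
SET vchordrq.probes = 500;

SELECT id
FROM images                                        -- placeholder table
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'::vector   -- placeholder embedding
LIMIT 10;
```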
Query Plan (500 probes):
Query Plan (2000 probes, ~10% lists, >120s):
Additional Notes:
I have experimented with different values of probes (tested range 15-3000) and epsilon (tested 1.0-1.9). The same queries were near-instant with pgvecto.rs.
Expected Behavior:
I expect queries utilizing the VChordRQ index to be at least in a comparable performance range to the previously used HNSW index provided by pgvecto.rs. Currently, query performance is prohibitively slow for practical use. On the other hand, the indexing speed is much higher, the disk space usage much lower, and the CPU usage also much lower, so that is great 👍.
Questions:
I assume I am doing something wrong, but as far as I can tell from the currently available documentation, I did everything as advised/documented.
Do you perhaps have any idea if I am missing something crucial in my configuration or usage?
Thanks in advance for your help!