
VectorChord Queries Extremely Slow Compared to pgvecto.rs (~7.5M Entries, v0.2.1 in Docker) #208

Closed
NetroScript opened this issue Mar 14, 2025 · 10 comments


@NetroScript

NetroScript commented Mar 14, 2025

I've recently migrated my image similarity database at home from pgvecto.rs to VectorChord, using the latest Docker image (tensorchord/vchord-postgres:pg16-v0.2.1). Since migrating, query times have become significantly slower, often orders of magnitude slower than before, even though the queries use the index.

Database & Setup Details:

  • Docker image: tensorchord/vchord-postgres:pg16-v0.2.1
  • Database Size: ~7.5 million entries.
  • Hardware:
    • 12-core CPU
    • 128 GB RAM
    • HDDs, with additional RAM used as disk cache
  • Docker Compose Configuration:
version: "3.8"
name: image-db
services:
  db:
    image: tensorchord/vchord-postgres:pg16-v0.2.1
    restart: always
    shm_size: 72gb
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    ports:
      - 5569:5432
    volumes:
      - /mnt/data/databases/image-data:/var/lib/postgresql/data/
    command:
      - postgres
      - -c
      - max_connections=72
      - -c
      - shared_buffers=48GB
      - -c
      - effective_cache_size=112GB
      - -c
      - maintenance_work_mem=2GB
      - -c
      - checkpoint_completion_target=0.9
      - -c
      - wal_buffers=16MB
      - -c
      - default_statistics_target=100
      - -c
      - random_page_cost=4
      - -c
      - effective_io_concurrency=2
      - -c
      - work_mem=209715kB
      - -c
      - huge_pages=try
      - -c
      - min_wal_size=1GB
      - -c
      - max_wal_size=4GB
      - -c
      - max_worker_processes=4
      - -c
      - max_parallel_workers_per_gather=2
      - -c
      - max_parallel_workers=4
      - -c
      - max_parallel_maintenance_workers=2
      - -c
      - vchordrq.prewarm_dim=64,128,256,384,512,768,1024,1152,1536
      - -c
      - shared_preload_libraries=vchord.so,pg_stat_statements
      - -c
      - search_path="public,vchord"
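
A quick way to confirm these settings actually took effect inside the container (a sketch; run via psql):

-- SHOW works for both core settings and extension-defined GUCs.
SHOW shared_buffers;
SHOW vchordrq.prewarm_dim;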

Table & Index Definition:

CREATE TABLE "ImageEmbedding" (
    id serial PRIMARY KEY,
    "mediaId" text NOT NULL REFERENCES "Media" ON UPDATE CASCADE ON DELETE CASCADE,
    embeddings vector(1152) NOT NULL
);

ALTER TABLE "ImageEmbedding" OWNER TO postgresql_db_user;

CREATE UNIQUE INDEX "ImageEmbedding_mediaId_key"
    ON "ImageEmbedding" ("mediaId");

CREATE INDEX "ImageEmbedding_mediaId_idx"
    ON "ImageEmbedding" ("mediaId");

CREATE INDEX "ImageEmbedding_embeddings_vchordrq" 
ON "ImageEmbedding" USING vchordrq ("embeddings" vector_cosine_ops) 
WITH (options = $$
residual_quantization = false
[build.internal]
lists = [20000]
spherical_centroids = true
$$);

Example Query (Simple case):

Even the simplest similarity query:

SET vchordrq.probes = 500; -- <3% of lists

SELECT * FROM "ImageEmbedding"
ORDER BY "embeddings" <=> ${embedding}::vector
LIMIT 25;

is very slow, with execution times of typically 6-60 seconds (previously near-instant with pgvecto.rs).
CPU and disk I/O utilization remain close to zero during the query.


Query Plan (500 probes):

Limit  (cost=0.00..0.60 rows=25 width=67) (actual time=349.245..6114.847 rows=25 loops=1)
  Output: id, "mediaId", embeddings, ((embeddings <=> '[ ... ]'::vector))
  ->  Index Scan using "ImageEmbedding_embeddings_vchordrq" on public."ImageEmbedding"  (cost=0.00..174736.06 rows=7313205 width=67) (actual time=349.243..6114.812 rows=25 loops=1)
        Output: id, "mediaId", embeddings, (embeddings <=> '[ ... ]'::vector)
        Order By: ("ImageEmbedding".embeddings <=> '[ ... ]'::vector)
Planning Time: 0.735 ms
Execution Time: 6115.028 ms

Query Plan (2000 probes, ~10% lists, >120s):

Limit  (cost=0.00..0.60 rows=25 width=67) (actual time=2210.406..122746.671 rows=25 loops=1)
  Output: id, "mediaId", embeddings, ((embeddings <=> '[ ... ]'::vector))
  ->  Index Scan using "ImageEmbedding_embeddings_vchordrq" on public."ImageEmbedding"  (cost=0.00..174752.84 rows=7313907 width=67) (actual time=2210.400..122746.392 rows=25 loops=1)
        Output: id, "mediaId", embeddings, (embeddings <=> '[ ... ]'::vector)
        Order By: ("ImageEmbedding".embeddings <=> '[ ... ]'::vector)
Planning Time: 0.526 ms
Execution Time: 122748.494 ms

Additional Notes:

  • Query speed doesn't improve significantly with different combinations of probes (tested range 15-3000) and epsilon (tested range 1.0-1.9).
  • The index was rebuilt after adding multiple million new embeddings to ensure balanced lists. However, performance did not improve noticeably.
  • Hardware does not appear bottlenecked.
  • I have a second table with 512-dimensional embeddings. Queries on it are slightly faster, but still very slow compared to the previous HNSW index in pgvecto.rs.

Expected Behavior:

I expect queries using the vchordrq index to perform at least comparably to the HNSW index previously provided by pgvecto.rs. Currently, query performance is prohibitively slow for practical use.
That said, indexing speed is much higher, disk usage is much lower, and CPU usage is also much lower, so that is great 👍.


Questions:

I assume I am doing something wrong, but as far as I can tell from the currently available documentation I did everything as advised / documented.

Do you perhaps have any idea if I am missing something crucial in my configuration or usage?


Thanks in advance for your help!

@VoVAllen
Member

The lists value is too large; for 7.5M vectors, I think 8192 is enough. And for the query, can you show the result of a plain EXPLAIN?

like

EXPLAIN 
SELECT * FROM "ImageEmbedding"
ORDER BY "embeddings" <=> ${embedding}::vector
LIMIT 25;
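
If you do rebuild, only the lists value needs to change relative to the DDL in your description, something like this sketch (note it triggers a full index rebuild):

-- Rebuild with the suggested smaller list count; all other
-- options are kept as in the original index definition.
DROP INDEX "ImageEmbedding_embeddings_vchordrq";
CREATE INDEX "ImageEmbedding_embeddings_vchordrq"
ON "ImageEmbedding" USING vchordrq ("embeddings" vector_cosine_ops)
WITH (options = $$
residual_quantization = false
[build.internal]
lists = [8192]
spherical_centroids = true
$$);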

@NetroScript
Author

NetroScript commented Mar 14, 2025

EXPLAIN SELECT * FROM "ImageEmbedding" ORDER BY "embeddings" <=> ${embedding}::vector LIMIT 25;

returns:

Limit  (cost=0.00..0.60 rows=25 width=67)
  ->  Index Scan using "ImageEmbedding_embeddings_vchordrq" on "ImageEmbedding"  (cost=0.00..174838.83 rows=7317506 width=67)
        Order By: (embeddings <=> '[-0.0202994,  ... ,0.04425988]'::vector)

(The two query plans in my initial message were run with EXPLAIN (ANALYSE,VERBOSE) SELECT * FROM "ImageEmbedding" ORDER BY "embeddings" <=> ${embedding}::vector LIMIT 25;)

I think the table will probably reach around 20-30 million rows at some point; that is why I already set it to 20000. Is it unwise to have it at this count already?
I just followed the 4*sqrt(len(vectors)) <= lists <= 16*4*sqrt(len(vectors)) guideline from the documentation.
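
For reference, a quick check of the lower bound of that heuristic for my current table size (run in psql):

-- 4 * sqrt(7,500,000) is roughly 10954, so lists = 20000 clears the
-- lower bound, while the 8192 suggested above sits slightly below it.
SELECT round(4 * sqrt(7500000)) AS heuristic_lower_bound;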

@VoVAllen
Member

Did you get this result on the first query, or only after multiple runs?

Can you also add the result of

EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM "ImageEmbedding" ORDER BY "embeddings" <=> ${embedding}::vector LIMIT 25;

6 s is too slow; it should be at the ~10 ms level for 7.5M vectors. One possible reason is that some I/O occurred to read data from the HDD, which is slow.

@NetroScript
Author

NetroScript commented Mar 14, 2025

I am not quite sure what you mean by your first question.
Each query I ran used a new embedding. If the same query with the same embedding is run again, the result seems cached and is very fast.
As embeddings I picked random images from the dataset.

EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM "ImageEmbedding" ORDER BY "embeddings" <=> ${embedding}::vector LIMIT 25; results in:

Limit  (cost=0.00..0.60 rows=25 width=67) (actual time=1152.861..6481.805 rows=25 loops=1)
  Buffers: shared hit=29307 read=803
  ->  Index Scan using "ImageEmbedding_embeddings_vchordrq" on "ImageEmbedding"  (cost=0.00..174876.56 rows=7319085 width=67) (actual time=1152.859..6481.691 rows=25 loops=1)
        Order By: (embeddings <=> '[-0.005576801,0.0024379352, ... ,0.014884236,0.034900967]'::vector)
        Buffers: shared hit=29307 read=803
Planning Time: 0.706 ms
Execution Time: 6482.023 ms

Edit: Ah, and for this query JIT is off, vchordrq.probes = 2000, effective_io_concurrency = 10.

Edit 2:

This is a second query, with probes = 1000 and an embedding from another random image:

Limit  (cost=0.00..0.60 rows=25 width=67) (actual time=1699.492..12400.945 rows=25 loops=1)
  Buffers: shared hit=20591 read=1667
  ->  Index Scan using "ImageEmbedding_embeddings_vchordrq" on "ImageEmbedding"  (cost=0.00..174884.96 rows=7319437 width=67) (actual time=1699.489..12400.744 rows=25 loops=1)
        Order By: (embeddings <=> '[0.034457225,0.02460603, ... ,0.0019907318,0.0009660904]'::vector)
        Buffers: shared hit=20591 read=1667
Planning Time: 0.691 ms
Execution Time: 12401.203 ms

@VoVAllen
Member

It's expected to be cached. Otherwise it needs to read pages from the HDD, which is expected to be super slow. You can see Buffers: shared hit=20591 read=1667, which means 20591 pages were in memory and 1667 pages were read from the HDD. You can do a prewarm to load everything into memory for better performance, like SELECT pg_prewarm('test.test_table_idx'); and SELECT pg_prewarm('test.test_table');. Since you have a 48 GB shared buffer, it should be easy to keep everything in memory.

It's a similar case for pgvecto.rs: its latency numbers assume everything is already in memory, with no reads from disk.

@NetroScript
Author

NetroScript commented Mar 14, 2025

What would be the correct syntax for vchordrq_prewarm? The documentation seems to have one " too many, and no combination of the index name (ImageEmbedding_embeddings_vchordrq) or the table name (ImageEmbedding) seems to work.

Running SELECT vchordrq_prewarm('ImageEmbedding_embeddings_vchordrq'::regclass); for example fails with
ERROR: relation "imageembedding_embeddings_vchordrq" does not exist

@VoVAllen
Member

Not vchordrq_prewarm; that only prewarms the bit-quantized vectors, not the full vectors. pg_prewarm will load all the files into memory. Can you try pg_prewarm("public.ImageEmbedding") and pg_prewarm("public.imageembedding_embeddings_vchordrq")?
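
One note on quoting, which is likely why the earlier regclass cast failed: unquoted names are folded to lowercase, and the table and index were created with mixed-case names, so the name inside the pg_prewarm argument needs embedded double quotes:

-- Mixed-case relation names must be double-quoted inside the
-- regclass string, otherwise Postgres folds them to lowercase.
SELECT pg_prewarm('public."ImageEmbedding"');
SELECT pg_prewarm('public."ImageEmbedding_embeddings_vchordrq"');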

@VoVAllen
Member

vchordrq_prewarm is for the disk-based scenario, which keeps the quantized vectors in memory and the full vectors on disk. However, we expect users to run performant SSDs in that scenario; an HDD is too slow.
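
For completeness, the earlier vchordrq_prewarm attempt most likely failed only because of the same case folding; with the index name double-quoted it should resolve (untested sketch, and only useful in the disk-based scenario described above):

SELECT vchordrq_prewarm('"ImageEmbedding_embeddings_vchordrq"'::regclass);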

@NetroScript
Author

Thank you for your time and all the information!

The commands completed, but it seems the DB still has insufficient memory to cache everything. The previous EXPLAIN command with buffers, for another embedding:

Limit  (cost=0.00..0.60 rows=25 width=67) (actual time=298.376..4905.615 rows=25 loops=1)
  Buffers: shared hit=25049 read=796
  ->  Index Scan using "ImageEmbedding_embeddings_vchordrq" on "ImageEmbedding"  (cost=0.00..174943.67 rows=7321894 width=67) (actual time=298.375..4905.544 rows=25 loops=1)
        Order By: (embeddings <=> '[-0.023124589,-0.011638013, ... ,0.0071100155,0.029197253,-0.0006587554]'::vector)
        Buffers: shared hit=25049 read=796
Planning Time: 0.427 ms
Execution Time: 4905.768 ms

I guess to obtain high query speeds, I will either need to allocate more RAM (but the mainboard does not support more than 128GB) or add some SSDs to the server and move the database to a purely SSD volume.

As this is expected behavior and I know what next steps I can take, you can feel free to close this.
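
For anyone else debugging this: to see how much of the table and index actually sit in shared buffers, the pg_buffercache extension can be used. This is a sketch adapted from the example in the PostgreSQL docs and assumes the default 8 kB block size:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Count cached pages per relation for the table and its vector index.
SELECT c.relname,
       count(*) AS buffered_pages,
       pg_size_pretty(count(*) * 8192) AS buffered_size
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
  AND c.relname IN ('ImageEmbedding', 'ImageEmbedding_embeddings_vchordrq')
GROUP BY c.relname;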

@VoVAllen
Member

Yes. You can try increasing the shared_buffers size or adding more hardware. Also, running pg_prewarm("public.imageembedding_embeddings_vchordrq") after pg_prewarm("public.ImageEmbedding") may help, since the later statement can evict pages loaded by the earlier one.
