
MuopDB - A vector database for AI memories

Introduction

MuopDB is a vector database for machine learning. Currently, it supports:

  • Index types: HNSW, IVF, SPANN, and multi-user SPANN, all on-disk with mmap
  • Quantization: product quantization
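Product quantization compresses vectors by splitting them into subvectors and storing, for each subvector, only the index of its nearest centroid in a per-subspace codebook. A minimal sketch of the idea (toy hard-coded codebooks, not MuopDB's implementation — real systems learn codebooks with k-means):

```python
# Toy product quantization: encode a vector as one centroid index per subspace.

def pq_encode(vector, codebooks):
    """Return one centroid index per subspace."""
    m = len(codebooks)                      # number of subspaces
    d = len(vector) // m                    # dimensions per subvector
    codes = []
    for i, codebook in enumerate(codebooks):
        sub = vector[i * d:(i + 1) * d]
        # pick the centroid with the smallest squared L2 distance
        best = min(range(len(codebook)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, codebook[c])))
        codes.append(best)
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector from centroid indices."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out

# 4-dim vectors, 2 subspaces, 2 centroids each: each vector is stored as 2 small ints.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # centroids for dims 0..1
    [[0.0, 1.0], [1.0, 0.0]],   # centroids for dims 2..3
]
codes = pq_encode([0.9, 1.1, 0.1, 0.9], codebooks)   # -> [1, 0]
approx = pq_decode(codes, codebooks)                 # -> [1.0, 1.0, 0.0, 1.0]
```

The reconstruction is lossy: the decoded vector is the concatenation of the chosen centroids, which is what makes distance estimation against compressed vectors cheap.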

Why MuopDB?

MuopDB supports multiple users by default: each user has its own vector index within the same collection. The intended use case is building memory for LLMs. Think of it as:

  • Each user has its own memory.
  • Each user can still search a shared knowledge base.

All users' indices are stored in a small number of files, reducing operational complexity.
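Conceptually, a multi-user collection routes every insert and search through the index belonging to one user ID. A minimal in-memory sketch of that scoping (the brute-force "index" and user IDs below are purely illustrative; MuopDB persists all users' indices together on disk):

```python
# Minimal sketch of a multi-user collection: one index per user ID.

class Collection:
    def __init__(self):
        self.indices = {}            # user_id -> {doc_id: vector}

    def insert(self, user_id, doc_id, vector):
        self.indices.setdefault(user_id, {})[doc_id] = vector

    def search(self, user_id, query, top_k):
        """Search only within this user's index, nearest (L2) first."""
        index = self.indices.get(user_id, {})
        scored = sorted(index.items(),
                        key=lambda kv: sum((a - b) ** 2 for a, b in zip(kv[1], query)))
        return [doc_id for doc_id, _ in scored[:top_k]]

col = Collection()
col.insert(user_id=0, doc_id=100, vector=[1.0, 0.0])   # e.g. a shared knowledge base
col.insert(user_id=7, doc_id=200, vector=[0.0, 1.0])   # e.g. one user's private memory
# Each user's search only sees its own documents.
```

A query that should cover both private memory and a shared knowledge base can simply fan out over both user IDs.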

Quick Start

  • Build MuopDB (see the Building section below).
  • Prepare the data and indices directories. On macOS, you may want to use different directories (e.g. ~/mnt/muopdb) since the root directory is read-only:
mkdir -p /mnt/muopdb/indices
mkdir -p /mnt/muopdb/data
  • Start the MuopDB index_server with the directories you just prepared, using one of these methods:
# Start server locally. This is recommended for Mac.
cd target/release
RUST_LOG=info ./index_server --node-id 0 --index-config-path /mnt/muopdb/indices --index-data-path /mnt/muopdb/data --port 9002

# Start server with Docker. Only use this option on Linux.
docker-compose up --build
  • You now have a running MuopDB index_server.
    • You can send gRPC requests to this server (possibly with Postman).
    • You can use Server Reflection in Postman - it will automatically detect the RPCs for MuopDB.

Examples using Postman

  1. Create collection
{
    "collection_name": "test-collection-2",
    "num_features": 10,
    "wal_file_size": 1024000000,
    "max_time_to_flush_ms": 5000,
    "max_pending_ops": 10
}
  2. Insert some data
{
    "collection_name": "test-collection-2",
    "doc_ids": [
        {
            "high_id": 0,
            "low_id": 100
        }
    ],
    "user_ids": [
        {
            "high_id": 0,
            "low_id": 0
        }
    ],
    "vectors": [
        100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0
    ]
}
  3. Search
{
    "collection_name": "test-collection-2",
    "ef_construction": 200,
    "record_metrics": false,
    "top_k": 1,
    "user_ids": [
        {
            "high_id": 0,
            "low_id": 0
        }
    ],
    "vector": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
}
  4. Remove
{
    "collection_name": "test-collection-2",
    "doc_ids": [
        {
            "low_id": 100,
            "high_id": 0
        }
    ],
    "user_ids": [
        {
            "low_id": 0,
            "high_id": 0
        }
    ]
}
  5. Search again
{
    "collection_name": "test-collection-2",
    "ef_construction": 200,
    "record_metrics": false,
    "top_k": 1,
    "user_ids": [
        {
            "high_id": 0,
            "low_id": 0
        }
    ],
    "vector": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
}

This time the search should return a different result, since the vector with doc ID 100 was removed.
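In the requests above, `doc_ids` and `user_ids` each carry a 128-bit ID as two 64-bit halves (`high_id`, `low_id`). Splitting and reassembling such an ID is plain integer arithmetic, independent of MuopDB:

```python
MASK64 = (1 << 64) - 1

def split_id(id128):
    """Split a 128-bit integer into (high_id, low_id) 64-bit halves."""
    return (id128 >> 64) & MASK64, id128 & MASK64

def join_id(high_id, low_id):
    """Reassemble the 128-bit integer from its halves."""
    return (high_id << 64) | low_id

high, low = split_id(100)
# -> high == 0, low == 100, matching {"high_id": 0, "low_id": 100} above
```

IDs that fit in 64 bits (like doc ID 100) always have `high_id` 0; the split only matters for genuinely 128-bit identifiers such as UUIDs.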

Plans

Phase 0 (Done)

  • Query path
    • Vector similarity search
    • Hierarchical Navigable Small Worlds (HNSW)
    • Product Quantization (PQ)
  • Indexing path
    • Support periodic offline indexing
  • Database Management
    • Doc-sharding & query fan-out with aggregator-leaf architecture
    • In-memory & disk-based storage with mmap

Phase 1 (Done)

  • Query & Indexing
    • Inverted File (IVF)
    • Improve locality for HNSW
    • SPANN

Phase 2 (Done)

  • Query
    • Multiple index segments
    • L2 distance
  • Index
    • Optimizing index build time
    • Elias-Fano encoding for IVF
    • Multi-user SPANN index
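Elias-Fano encoding compresses a monotone integer sequence (such as a sorted posting list in an IVF index) by storing the low bits of each value verbatim and the high bits in unary. A minimal encode/decode sketch (illustrative only, not MuopDB's implementation):

```python
import math

def ef_encode(values, universe):
    """Encode a sorted list of ints < universe as (l, low_bits, high_unary)."""
    n = len(values)
    l = max(0, int(math.log2(universe / n)))        # width of the low-bit part
    lows = [v & ((1 << l) - 1) for v in values]
    highs = [v >> l for v in values]
    # unary part: for each value, one 0 per step its high part advances, then a 1
    unary, prev = [], 0
    for h in highs:
        unary.extend([0] * (h - prev) + [1])
        prev = h
    return l, lows, unary

def ef_decode(l, lows, unary):
    """Recover the original values by walking the unary high bits."""
    values, h, i = [], 0, 0
    for bit in unary:
        if bit == 0:
            h += 1
        else:
            values.append((h << l) | lows[i])
            i += 1
    return values

seq = [3, 4, 7, 13, 14, 15, 21, 43]
roundtrip = ef_decode(*ef_encode(seq, universe=44))
```

With this choice of low-bit width the encoding uses roughly 2 + log2(universe/n) bits per element, while still supporting efficient random access in real implementations.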

Phase 3 (Done)

  • Features
    • Delete vector from collection
  • Database Management
    • Segment optimizer framework
    • Write-ahead-log
    • Segments merger
    • Segments vacuum
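A write-ahead log makes writes durable before they reach an index segment: each operation is appended and flushed to a log file, and on restart the log is replayed to rebuild pending state. A minimal length-prefixed sketch (not MuopDB's on-disk format):

```python
import os
import struct
import tempfile

def wal_append(path, record: bytes):
    """Append one length-prefixed record and flush it to disk."""
    with open(path, "ab") as f:
        f.write(struct.pack("<I", len(record)) + record)
        f.flush()
        os.fsync(f.fileno())     # durable before acknowledging the write

def wal_replay(path):
    """Return every record in order, e.g. to rebuild state on startup."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            (length,) = struct.unpack("<I", header)
            records.append(f.read(length))
    return records

path = os.path.join(tempfile.mkdtemp(), "wal.bin")
wal_append(path, b"insert doc=100")
wal_append(path, b"remove doc=100")
```

Once the operations have been flushed into durable index segments, the replayed prefix of the log can be truncated; that is the role the WAL plays alongside the segment merger and vacuum above.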

Phase 4 (Ongoing)

  • Features
    • Hybrid search
  • Database Management
    • Optimizing deletion with bloom filter
    • Automatic segment optimizer
    • Cloud-native MuopDB (Kafka + S3)
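A Bloom filter can speed up deletion handling because it answers "was this doc ID possibly deleted?" with no false negatives: if the filter says no, the search path skips the more expensive tombstone lookup entirely, and a rare false positive only costs one extra lookup, never a wrong result. A minimal sketch (bit count and hash scheme are illustrative):

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: bytes):
        # derive k bit positions by salting a cryptographic hash
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely never added"; True means "possibly added"
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

deleted = BloomFilter()
deleted.add(b"doc:100")
```

Here `might_contain(b"doc:100")` is guaranteed True, while IDs that were never deleted almost always come back False, so most searches never touch the deletion records.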

Building

# MacOS (using Homebrew)
brew install hdf5 protobuf openblas

# Linux (Arch-based, e.g. EndeavourOS, CachyOS)
sudo pacman -Syu hdf5 protobuf openblas

# Linux (Debian-based)
sudo apt-get install libhdf5-dev libprotobuf-dev libopenblas-dev
  • Build from source:
git clone https://github.com/hicder/muopdb.git
cd muopdb

# Build
cargo build --release

# Run tests
cargo test --release

Contributions

This project is developed with TechCare Coaching. I am mentoring mentees who have made contributions to this project.