RAGTools.jl is a battle-tested package for building Retrieval-Augmented Generation (RAG) applications in Julia. Originally part of PromptingTools.jl, it has been carved out into a standalone package after proving its value in production use cases for over a year.
The package focuses on high-performance, in-memory RAG pipelines that leverage Julia's speed to avoid the complexity of cloud-hosted vector databases. It seamlessly integrates with PromptingTools.jl to support a wide range of AI models and providers. However, if you need vector database support, you can simply overload the necessary functions in the pipeline.
Import the package:

```julia
using RAGTools
```
Key functions:

- `build_index`: Create a RAG index from documents (returns `ChunkIndex`)
- `airag`: Generate answers using RAG (combines `retrieve` and `generate!`)
- `retrieve`: Find relevant document chunks for a question
- `generate!`: Create an answer from retrieved chunks
- `annotate_support`: Highlight which parts of answers are supported by documents
- `build_qa_evals`: Generate question-answer pairs for evaluating RAG performance
Index some documents:

```julia
# Create sample documents
sentences = [
    "The Distributed.jl package enables efficient parallel computing and workload distribution across multiple processes in Julia.",
    "DataFrames.jl provides comprehensive tools for data manipulation and analysis similar to pandas in Python.",
    "Plots.jl offers a powerful unified interface for creating publication-quality visualizations in Julia.",
]

# Build the index
index = build_index(sentences);
```
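The call returns a `ChunkIndex` held entirely in memory. A quick, hedged way to inspect it (the field names below are assumptions based on the `ChunkIndex` docstring and may differ across versions):

```julia
# Peek at the freshly built index (field names assumed; check the
# ChunkIndex docstring in your installed version)
typeof(index)         # ChunkIndex{...}
length(index.chunks)  # number of text chunks stored
index.sources         # where each chunk came from
```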
Generate an answer:

```julia
# Simple query
question = "What package to use for parallel computing in Julia?"
msg = airag(index; question)
# [ Info: Done with RAG. Total cost: $0.0
# AIMessage("You should use the Distributed.jl package for parallel computing in Julia.")

# Get detailed results including intermediate steps
result = airag(index; question, return_all=true)

# Pretty print with support annotations
pprint(result)
```
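With `return_all=true`, the detailed result bundles the intermediate steps of the pipeline. A minimal sketch of inspecting it, assuming the field names from the `RAGResult` docstring (they may differ across versions):

```julia
# Inspect the intermediate steps (field names are assumptions
# based on the RAGResult docstring; check your installed version)
result.question      # the original question
result.context       # the retrieved chunks passed to the model
result.sources       # provenance of the retrieved chunks
result.final_answer  # the answer after any refinement steps
```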
The package is designed to be modular and extensible:

- Use the default pipeline with `SimpleIndexer`:

  ```julia
  index = build_index(SimpleIndexer(), sentences)
  ```

- Or customize any step by implementing your own methods (see the sketch right after this list):

  ```julia
  # Example structure of the pipeline
  result = retrieve(index, question)  # Get relevant chunks
  result = generate!(index, result)   # Generate answer
  ```
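For example, here is a hedged sketch of plugging in a custom reranking step. `AbstractReranker` is the abstract type mentioned in the feature list below; the exact `rerank` method signature is an assumption based on the package docstrings and may differ in your version:

```julia
# Hypothetical no-op reranker illustrating the extension pattern:
# define a type, then add a method that dispatches on it.
struct KeepAllReranker <: RAGTools.AbstractReranker end

# Signature assumed from the package docstrings; adjust to your version.
function RAGTools.rerank(::KeepAllReranker, index, question, candidates; kwargs...)
    return candidates  # pass every candidate through unchanged
end
```

The core pipeline never changes: each step dispatches on the configuration type passed as the first argument.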
RAGTools provides powerful support annotation capabilities through its pretty-printing system. Use `pprint` to automatically analyze and display how well the generated answer is supported by the source documents:

```julia
pprint(result)
```
Example output (with color highlighting in the terminal):

```plaintext
--------------------
QUESTION(s)
--------------------
- What are the best practices for parallel computing in Julia?

--------------------
ANSWER
--------------------
Some of the best practices for parallel computing in Julia include:[1,0.7]
- Using [3,0.4]`@threads` for simple parallelism[1,0.34]
- Utilizing `Distributed` module for more complex parallel tasks[1,0.19]
- Avoiding excessive memory allocation
- Considering task granularity for efficient workload distribution

--------------------
SOURCES
--------------------
1. Doc8
2. Doc15
3. Doc5
4. Doc2
5. Doc9
```
The annotation system helps you validate the generated answers:

- Color coding:
  - Uncolored text: High match with source documents
  - Blue text: Partial match with sources
  - Magenta text: No match (model-generated)
- Source citations: `[3,0.4]` indicates source document #3 with a 40% match score
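You can also run the annotator directly. A minimal sketch, assuming the trigram-based `TrigramAnnotater` exported by the package is the default annotater (names may vary by version):

```julia
# Annotate the answer against the retrieved context
# (TrigramAnnotater is the assumed default annotater)
annotated = annotate_support(TrigramAnnotater(), result)
pprint(annotated)
```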
For web applications, use `print_html` to generate HTML-formatted output with styling:

```julia
print_html(result)  # Great for Genie.jl/Stipple.jl applications
```
RAGTools.jl offers a rich set of features for building production-ready RAG applications:
- Simple One-Line RAG
  - Quick setup with `build_index` and `airag` functions
  - Default pipeline with semantic search and basic generation
  - Seamless integration with PromptingTools.jl for various AI models
- Flexible Pipeline Components
  - Modular pipeline with consistent step names (`retrieve`, `rerank`, etc.)
  - Each step dispatches on custom types (e.g., inherit from `AbstractRetriever`, `AbstractReranker`, etc.)
  - Easy to extend by implementing new types and the corresponding methods, without changing the core pipeline
  - Dispatching kwarg & configuration always passed as the first argument for maximum flexibility
- Semantic Search
  - Cosine similarity with dense embeddings
  - BM25 text similarity for keyword-based search (see the sketch after this list)
  - Binary embeddings with Hamming distance for efficiency
  - Bit-packed binary embeddings for maximum space efficiency
  - Hybrid indices combining multiple similarity methods
- Query Enhancement
  - HyDE (Hypothetical Document Embedding) for query rephrasing
  - Multiple query variations for better coverage
- Ranking & Fusion
  - Reciprocal Rank Fusion for combining multiple rankings
  - Multiple ranking models:
    - Local ranking with FlashRank.jl
    - RankGPT for LLM-based reranking
    - Cohere Rerank API integration
    - Custom ranking model support
- Chunking & Embedding
  - Multiple chunking strategies
  - Batched embedding for efficiency
  - Support for various embedding models
  - Binary and bit-packed embedding compression
  - Embedding dimension truncation
- Tagging & Filtering
  - Tag-based filtering system
  - Custom tag generation support
  - Flexible tag matching strategies
- Generation & Refinement
  - Multiple generation strategies
  - Answer refinement steps
  - Customizable post-processing
  - Support for various AI models through PromptingTools.jl
- Quality & Analysis
  - Answer Support Analysis
    - Automatic source citation in `[source_id, score]` format
    - Support score calculation using trigram matching
    - Color-coded fact-checking visualization:
      - Uncolored: High confidence match with sources
      - Blue: Partial match with sources
      - Magenta: No source support (model-generated)
    - Sentence-level support analysis
    - Support threshold customization
    - Automated citation placement
    - Source document tracking
  - Visual Validation
    - Pretty printing with color-coded support levels
    - HTML output for web applications
    - Interactive source exploration
    - Support score distribution analysis
  - Evaluation Tools
    - Automated QA pair generation for evaluation
    - Support coverage metrics
    - Source utilization analysis
    - Answer consistency checking
- Integration & Observability
  - JSON logging of results and conversations
  - Integration with Spehulak.jl for RAG performance analysis
  - Cost tracking across API calls
  - Performance metrics and timing
- Utility Features
  - Tokenization utilities
  - Text splitting functions
  - Pretty printing with support annotations
  - Batch processing utilities
  - Cost tracking and optimization tools
- Extensibility
  - Modular pipeline design
  - Custom component support
  - Multiple pre-built configurations: `SimpleRetriever`, `SimpleBM25Retriever`, `AdvancedRetriever`
  - Easy integration with vector databases
- Performance Optimization
  - In-memory operation for speed
  - Efficient binary embedding compression
  - Batched operations for API calls
  - Multi-threading support
  - Memory-efficient data structures (e.g., bit-packed binary embeddings)
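As referenced in the Semantic Search item above, keyword-based (BM25) retrieval uses a separate index type. A hedged sketch, assuming the `KeywordsIndexer` indexer and the `SimpleBM25Retriever` configuration listed among the pre-built options (names may differ across versions):

```julia
# Build a keyword (BM25) index instead of an embeddings index
# (KeywordsIndexer is an assumption based on the package docstrings)
keyword_index = build_index(KeywordsIndexer(), sentences)

# Retrieve with the BM25-based retriever from the pre-built configurations
result = retrieve(SimpleBM25Retriever(), keyword_index, question)
```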
We welcome contributions to RAGTools.jl! Here are some guidelines to make the process smooth for everyone:
- JuliaFormatter.jl: Please format your code using JuliaFormatter.jl before submitting a PR:

  ```julia
  using JuliaFormatter
  format("path/to/changed/files", verbose=true)
  ```

  It is also available as a VS Code extension.
- Conventional Commits: We follow Conventional Commits v1.0.0 for clear and standardized commit messages.
Feel free to open an issue on the GitHub repository if you have any questions or feedback.
Alternatively, ask in the `#generative-ai` channel in the JuliaLang Slack.