Commit dffb77b

initial add of ragulate to ragstack (#528)
1 parent 8114ed4 commit dffb77b

41 files changed, with 2426 additions and 0 deletions.

libs/ragulate/Makefile

+4
@@ -0,0 +1,4 @@
# Sort imports and format python files
fmt:
	isort --profile black .
	black .

libs/ragulate/README.md

+175
@@ -0,0 +1,175 @@
# RAGulate

A tool for evaluating RAG pipelines

![ragulate_logo](https://raw.githubusercontent.com/datastax/ragstack-ai/main/libs/ragulate/images/logo_smaller.png)

## The Metrics

RAGulate currently reports 4 relevancy metrics: Answer Correctness, Answer Relevance, Context Relevance, and Groundedness.


![metrics_diagram](https://raw.githubusercontent.com/datastax/ragstack-ai/main/libs/ragulate/images/metrics.png)

* Answer Correctness
  * How well does the generated answer match the ground-truth answer?
  * This confirms how well the full system performed.
* Answer Relevance
  * Is the generated answer relevant to the query?
  * This shows if the LLM is responding in a way that is helpful to answer the query.
* Context Relevance
  * Does the retrieved context contain information to answer the query?
  * This shows how well the retrieval part of the process is performing.
* Groundedness
  * Is the generated response supported by the context?
  * Low scores here indicate that the LLM is hallucinating.

## Example Output

The tool outputs results as images like this:

![example_output](https://raw.githubusercontent.com/datastax/ragstack-ai/main/libs/ragulate/images/example.png)

These images show distribution box plots of the metrics for different test runs.

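Each box in such a plot summarizes the distribution of one metric's per-query scores for one test run. The following is a minimal sketch of that summary using only the Python standard library. The scores and the `five_number_summary` helper are hypothetical, and this is not Ragulate's own plotting code; real box plots may also draw whiskers and outliers differently.

```python
import statistics
from typing import Dict, List


def five_number_summary(scores: List[float]) -> Dict[str, float]:
    """Return the values a simple box plot is built from."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
    }


# Hypothetical groundedness scores (0.0 to 1.0) for one test run.
groundedness_scores = [0.2, 0.4, 0.6, 0.8, 1.0]
summary = five_number_summary(groundedness_scores)
print(summary)
```

Comparing these summaries across test runs is what the side-by-side boxes in the output image make visible at a glance.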
## Installation

```sh
pip install ragulate
```

## Initial Setup

1. Set your environment variables or create a `.env` file. You will need to set `OPENAI_API_KEY` and
   any other environment variables needed by your ingest and query pipelines.

1. Wrap your ingest pipeline in a single Python method. The method should take a `file_path` parameter and
   any other variables that you will pass during your experimentation. The method should ingest the passed
   file into your vector store.

   See the `ingest()` method in [open_ai_chunk_size_and_k.py](open_ai_chunk_size_and_k.py) as an example.
   This method configures an ingest pipeline using the parameter `chunk_size` and ingests the file passed.

1. Wrap your query pipeline in a single Python method, and return it. The method should have parameters for
   any variables that you will pass during your experimentation. Currently only LangChain LCEL query pipelines
   are supported.

   See the `query_pipeline()` method in [open_ai_chunk_size_and_k.py](open_ai_chunk_size_and_k.py) as an example.
   This method returns a LangChain LCEL pipeline configured by the parameters `chunk_size` and `k`.

Note: It is helpful to have a `**kwargs` param in your pipeline method definitions, so that if extra params
are passed, they can be safely ignored.

63+
## Usage
64+
65+
### Summary
66+
67+
```sh
68+
usage: ragulate [-h] {download,ingest,query,compare} ...
69+
70+
RAGu-late CLI tool.
71+
72+
options:
73+
-h, --help show this help message and exit
74+
75+
commands:
76+
download Download a dataset
77+
ingest Run an ingest pipeline
78+
query Run an query pipeline
79+
compare Compare results from 2 (or more) recipes
80+
run Run an experiment from a config file
81+
```
82+
83+
### Example
84+
85+
For the examples below, we will use the example experiment [open_ai_chunk_size_and_k.py](open_ai_chunk_size_and_k.py)
86+
and see how the RAG metrics change for changes in `chunk_size` and `k` (number of documents retrieved).
87+
88+
There are two ways to run Ragulate to run an experiment. Either define an experiment with a config file or execute it manually step by step.
89+
90+
#### Via Config File
91+
92+
**Note: Running via config file is a new feature and it is not as stable as running manually.**
93+
94+
1. Create a yaml config file with a similar format to the example config: [example_config.yaml](example_config.yaml). This defines the same test as shown manually below.
95+
96+
1. Execute it with a single command:
97+
98+
```
99+
ragulate run example_config.yaml
100+
```
101+
102+
This will:
103+
* Download the test datasets
104+
* Run the ingest pipelines
105+
* Run the query pipelines
106+
* Output an analysis of the results.
107+
108+
#### Manually

1. Download a dataset. See available datasets here: https://llamahub.ai/?tab=llama_datasets
   * If you are unsure where to start, recommended datasets are:
     * `BraintrustCodaHelpDesk`
     * `BlockchainSolana`

   Examples:
   * `ragulate download -k llama BraintrustCodaHelpDesk`
   * `ragulate download -k llama BlockchainSolana`

2. Ingest the datasets using different methods:

   Examples:
   * Ingest with `chunk_size=200`:
     ```
     ragulate ingest -n chunk_size_200 -s open_ai_chunk_size_and_k.py -m ingest \
     --var-name chunk_size --var-value 200 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
     ```
   * Ingest with `chunk_size=100`:
     ```
     ragulate ingest -n chunk_size_100 -s open_ai_chunk_size_and_k.py -m ingest \
     --var-name chunk_size --var-value 100 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
     ```

3. Run queries and evaluations on the datasets using different methods:

   Examples:
   * Query with `chunk_size=200` and `k=2`
     ```
     ragulate query -n chunk_size_200_k_2 -s open_ai_chunk_size_and_k.py -m query_pipeline \
     --var-name chunk_size --var-value 200 --var-name k --var-value 2 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
     ```

   * Query with `chunk_size=100` and `k=2`
     ```
     ragulate query -n chunk_size_100_k_2 -s open_ai_chunk_size_and_k.py -m query_pipeline \
     --var-name chunk_size --var-value 100 --var-name k --var-value 2 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
     ```

   * Query with `chunk_size=200` and `k=5`
     ```
     ragulate query -n chunk_size_200_k_5 -s open_ai_chunk_size_and_k.py -m query_pipeline \
     --var-name chunk_size --var-value 200 --var-name k --var-value 5 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
     ```

   * Query with `chunk_size=100` and `k=5`
     ```
     ragulate query -n chunk_size_100_k_5 -s open_ai_chunk_size_and_k.py -m query_pipeline \
     --var-name chunk_size --var-value 100 --var-name k --var-value 5 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
     ```

4. Run a compare to get the results:

   Example:
   ```
   ragulate compare -r chunk_size_100_k_2 -r chunk_size_200_k_2 -r chunk_size_100_k_5 -r chunk_size_200_k_5
   ```

   This will output 2 PNG files, one for each dataset.

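The manual query step sweeps a 2x2 grid of `chunk_size` and `k` values, so the four commands differ only in those values and in the run name. As a rough sketch, they could be generated rather than typed by hand. The `build_query_command` helper is hypothetical and not part of Ragulate; the flags are the ones used in the examples above.

```python
import itertools
import shlex
from typing import List

SCRIPT = "open_ai_chunk_size_and_k.py"
DATASETS = ["BraintrustCodaHelpDesk", "BlockchainSolana"]


def build_query_command(chunk_size: int, k: int) -> List[str]:
    """Build the argument list for one `ragulate query` run."""
    name = f"chunk_size_{chunk_size}_k_{k}"
    command = [
        "ragulate", "query", "-n", name, "-s", SCRIPT, "-m", "query_pipeline",
        "--var-name", "chunk_size", "--var-value", str(chunk_size),
        "--var-name", "k", "--var-value", str(k),
    ]
    for dataset in DATASETS:
        command += ["--dataset", dataset]
    return command


# One command per (chunk_size, k) combination in the grid.
commands = [
    build_query_command(chunk_size, k)
    for chunk_size, k in itertools.product([100, 200], [2, 5])
]
for command in commands:
    print(shlex.join(command))
```

Each argument list could be passed to `subprocess.run` to execute the run; printing them first makes it easy to check the run names against those later given to `ragulate compare`.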
## Current Limitations

* The evaluation model is locked to OpenAI GPT-3.5
* Only LangChain query pipelines are supported
* Only LlamaIndex datasets are supported
* There is no way to specify which metrics to evaluate
+157
@@ -0,0 +1,157 @@
import os
import time
from pathlib import Path
from typing import List

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from ragstack_colbert import (
    CassandraDatabase,
    Chunk,
    ColbertEmbeddingModel,
    ColbertVectorStore,
)
from ragstack_langchain.colbert import ColbertVectorStore as LangChainColbertVectorStore
from transformers import BertTokenizer

LLM_MODEL = "gpt-3.5-turbo"

batch_size = 640

astra_token = os.getenv("ASTRA_DB_TOKEN")
database_id = os.getenv("ASTRA_DB_ID")
keyspace = "colbert"

import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger("unstructured").setLevel(logging.ERROR)
logging.getLogger("cassandra").setLevel(logging.ERROR)
logging.getLogger("http").setLevel(logging.ERROR)
logging.getLogger("httpx").setLevel(logging.ERROR)


def get_embedding_model(chunk_size: int) -> ColbertEmbeddingModel:
    return ColbertEmbeddingModel(doc_maxlen=chunk_size, batch_size=batch_size)


def get_database(chunk_size: int) -> CassandraDatabase:
    table_name = f"colbert_chunk_size_{chunk_size}"

    return CassandraDatabase.from_astra(
        astra_token=astra_token,
        database_id=database_id,
        keyspace=keyspace,
        table_name=table_name,
        timeout=500,
    )


def get_lc_vector_store(chunk_size: int) -> LangChainColbertVectorStore:
    database = get_database(chunk_size=chunk_size)
    embedding_model = get_embedding_model(chunk_size=chunk_size)

    return LangChainColbertVectorStore(
        database=database,
        embedding_model=embedding_model,
    )


def get_vector_store(chunk_size: int) -> ColbertVectorStore:
    database = get_database(chunk_size=chunk_size)
    return ColbertVectorStore(database=database)


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")


def len_function(text: str) -> int:
    return len(tokenizer.tokenize(text))


async def ingest(file_path: str, chunk_size: int, **kwargs):
    doc_id = Path(file_path).name

    # Overlap is a quarter of the chunk size, capped at 64 tokens.
    # Integer division keeps the value an int for the text splitter.
    chunk_overlap = min(chunk_size // 4, 64)

    start = time.time()
    docs = UnstructuredFileLoader(
        file_path=file_path, mode="single", strategy="fast"
    ).load()
    duration = time.time() - start
    print(f"It took {duration} seconds to load and parse the document")

    # confirm only one document returned per file
    assert len(docs) == 1

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len_function,
    )

    start = time.time()
    chunked_docs = text_splitter.split_documents(docs)
    duration = time.time() - start
    print(
        f"It took {duration} seconds to split the document into {len(chunked_docs)} chunks"
    )

    texts = [doc.page_content for doc in chunked_docs]
    start = time.time()
    embeddings = get_embedding_model(chunk_size=chunk_size).embed_texts(texts=texts)
    duration = time.time() - start
    print(f"It took {duration} seconds to embed {len(chunked_docs)} chunks")

    colbert_vector_store = get_vector_store(chunk_size=chunk_size)

    # remove any chunks left over from a previous ingest of the same file
    await colbert_vector_store.adelete_chunks(doc_ids=[doc_id])

    chunks: List[Chunk] = []
    for i, doc in enumerate(chunked_docs):
        chunks.append(
            Chunk(
                doc_id=doc_id,
                chunk_id=i,
                text=doc.page_content,
                metadata={} if doc.metadata is None else doc.metadata,
                embedding=embeddings[i],
            )
        )

    start = time.time()
    await colbert_vector_store.aadd_chunks(chunks=chunks, concurrent_inserts=100)
    duration = time.time() - start
    print(
        f"It took {duration} seconds to insert {len(chunked_docs)} chunks into AstraDB"
    )


def query_pipeline(k: int, chunk_size: int, **kwargs):
    vector_store = get_lc_vector_store(chunk_size=chunk_size)
    llm = ChatOpenAI(model_name=LLM_MODEL)

    # build a prompt
    prompt_template = """
    Answer the question based only on the supplied context. If you don't know the answer, say: "I don't know".
    Context: {context}
    Question: {question}
    Your answer:
    """
    prompt = ChatPromptTemplate.from_template(prompt_template)

    rag_chain = (
        {
            "context": vector_store.as_retriever(search_kwargs={"k": k}),
            "question": RunnablePassthrough(),
        }
        | prompt
        | llm
        | StrOutputParser()
    )

    return rag_chain

libs/ragulate/images/example.png (41.9 KB)

libs/ragulate/images/logo.png (114 KB)

libs/ragulate/images/logo_smaller.png (114 KB)

libs/ragulate/images/metrics.png (17.2 KB)
