datastax
diff --git a/libs/ragulate/Makefile b/libs/ragulate/Makefile (+4)
diff --git a/libs/ragulate/README.md b/libs/ragulate/README.md (+175)
diff --git a/libs/ragulate/colbert_chunk_size_and_k.py b/libs/ragulate/colbert_chunk_size_and_k.py (+157)
diff --git a/libs/ragulate/images/example.png b/libs/ragulate/images/example.png (41.9 KB)
diff --git a/libs/ragulate/images/logo.png b/libs/ragulate/images/logo.png (114 KB)
diff --git a/libs/ragulate/images/logo_smaller.png b/libs/ragulate/images/logo_smaller.png (114 KB)
diff --git a/libs/ragulate/images/metrics.png b/libs/ragulate/images/metrics.png (17.2 KB)
@@ -0,0 +1,4 @@
+# Sort imports and format python files
+fmt:
+	isort --profile black .
+	black .
@@ -0,0 +1,175 @@
+# RAGulate
+
+A tool for evaluating RAG pipelines
+
+![ragulate_logo](https://raw.githubusercontent.com/datastax/ragstack-ai/main/libs/ragulate/images/logo_smaller.png)
+
+## The Metrics
+
+RAGulate currently reports four relevancy metrics: Answer Correctness, Answer Relevance, Context Relevance, and Groundedness.
+
+
+![metrics_diagram](https://raw.githubusercontent.com/datastax/ragstack-ai/main/libs/ragulate/images/metrics.png)
+
+* Answer Correctness
+  * How well does the generated answer match the ground-truth answer?
+  * This confirms how well the full system performed.
+* Answer Relevance
+  * Is the generated answer relevant to the query?
+  * This shows whether the LLM is responding in a way that helps answer the query.
+* Context Relevance
+  * Does the retrieved context contain information to answer the query?
+  * This shows how well the retrieval part of the process is performing.
+* Groundedness
+  * Is the generated response supported by the context?
+  * Low scores here indicate that the LLM is hallucinating.
+
+## Example Output
+
+The tool outputs results as images like this:
+
+![example_output](https://raw.githubusercontent.com/datastax/ragstack-ai/main/libs/ragulate/images/example.png)
+
+These images show distribution box plots of the metrics for different test runs.
+
+## Installation
+
+```sh
+pip install ragulate
+```
+
+## Initial Setup
+
+1. Set your environment variables or create a `.env` file. You will need to set `OPENAI_API_KEY` and
+  any other environment variables needed by your ingest and query pipelines.
+
+1. Wrap your ingest pipeline in a single Python method. The method should take a `file_path` parameter and
+  any other variables that you will pass during your experimentation. The method should ingest the passed
+  file into your vector store.
+
+   See the `ingest()` method in [open_ai_chunk_size_and_k.py](open_ai_chunk_size_and_k.py) as an example.
+   This method configures an ingest pipeline using the parameter `chunk_size` and ingests the file passed.
+
+1. Wrap your query pipeline in a single Python method that returns the pipeline. The method should have parameters for
+  any variables that you will pass during your experimentation. Currently only LangChain LCEL query pipelines
+  are supported.
+
+   See the `query_pipeline()` method in [open_ai_chunk_size_and_k.py](open_ai_chunk_size_and_k.py) as an example.
+   This method returns a LangChain LCEL pipeline configured by the parameters `chunk_size` and `k`.
+
+Note: It is helpful to have a `**kwargs` parameter in your pipeline method definitions, so that any extra
+  parameters that are passed can be safely ignored.
+
+## Usage
+
+### Summary
+
+```sh
+usage: ragulate [-h] {download,ingest,query,compare,run} ...
+
+RAGu-late CLI tool.
+
+options:
+  -h, --help            show this help message and exit
+
+commands:
+    download            Download a dataset
+    ingest              Run an ingest pipeline
+    query               Run a query pipeline
+    compare             Compare results from 2 (or more) recipes
+    run                 Run an experiment from a config file
+```
+
+### Example
+
+For the examples below, we will use the example experiment [open_ai_chunk_size_and_k.py](open_ai_chunk_size_and_k.py)
+and see how the RAG metrics change as `chunk_size` and `k` (number of documents retrieved) vary.
+
+There are two ways to run an experiment with RAGulate: define it in a config file, or execute it manually step by step.
+
+#### Via Config File
+
+**Note: Running via config file is a new feature and is not yet as stable as running manually.**
+
+1. Create a YAML config file with a format similar to the example config: [example_config.yaml](example_config.yaml). The example defines the same test as the manual steps below.
+
+1. Execute it with a single command:
+
+    ```
+    ragulate run example_config.yaml
+    ```
+
+    This will:
+    * Download the test datasets
+    * Run the ingest pipelines
+    * Run the query pipelines
+    * Output an analysis of the results
+
+
+#### Manually
+
+1. Download a dataset. See available datasets here: https://llamahub.ai/?tab=llama_datasets
+  * If you are unsure where to start, recommended datasets are:
+    * `BraintrustCodaHelpDesk`
+    * `BlockchainSolana`
+
+    Examples:
+    * `ragulate download -k llama BraintrustCodaHelpDesk`
+    * `ragulate download -k llama BlockchainSolana`
+
+2. Ingest the datasets using different parameter values:
+
+    Examples:
+    * Ingest with `chunk_size=200`:
+      ```
+      ragulate ingest -n chunk_size_200 -s open_ai_chunk_size_and_k.py -m ingest \
+      --var-name chunk_size --var-value 200 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
+      ```
+    * Ingest with `chunk_size=100`:
+      ```
+      ragulate ingest -n chunk_size_100 -s open_ai_chunk_size_and_k.py -m ingest \
+      --var-name chunk_size --var-value 100 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
+      ```
+
+3. Run queries and evaluations on the datasets using different parameter values:
+
+    Examples:
+    * Query with `chunk_size=200` and `k=2`
+      ```
+      ragulate query -n chunk_size_200_k_2 -s open_ai_chunk_size_and_k.py -m query_pipeline \
+      --var-name chunk_size --var-value 200  --var-name k --var-value 2 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
+      ```
+
+    * Query with `chunk_size=100` and `k=2`
+      ```
+      ragulate query -n chunk_size_100_k_2 -s open_ai_chunk_size_and_k.py -m query_pipeline \
+      --var-name chunk_size --var-value 100  --var-name k --var-value 2 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
+      ```
+
+    * Query with `chunk_size=200` and `k=5`
+      ```
+      ragulate query -n chunk_size_200_k_5 -s open_ai_chunk_size_and_k.py -m query_pipeline \
+      --var-name chunk_size --var-value 200  --var-name k --var-value 5 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
+      ```
+
+    * Query with `chunk_size=100` and `k=5`
+      ```
+      ragulate query -n chunk_size_100_k_5 -s open_ai_chunk_size_and_k.py -m query_pipeline \
+      --var-name chunk_size --var-value 100  --var-name k --var-value 5 --dataset BraintrustCodaHelpDesk --dataset BlockchainSolana
+      ```
+
+4. Run a comparison to get the results:
+
+    Example:
+      ```
+      ragulate compare -r chunk_size_100_k_2 -r chunk_size_200_k_2 -r chunk_size_100_k_5 -r chunk_size_200_k_5
+      ```
+
+    This will output two PNG files, one for each dataset.
+
+## Current Limitations
+
+* The evaluation model is locked to OpenAI GPT-3.5
+* Only LangChain query pipelines are supported
+* Only LlamaIndex datasets are supported
+* There is no way to specify which metrics to evaluate
@@ -0,0 +1,157 @@
+import logging
+import os
+import time
+from pathlib import Path
+from typing import List
+
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain_community.document_loaders import UnstructuredFileLoader
+from langchain_core.output_parsers import StrOutputParser
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.runnables import RunnablePassthrough
+from langchain_openai import ChatOpenAI
+from ragstack_colbert import (
+    CassandraDatabase,
+    Chunk,
+    ColbertEmbeddingModel,
+    ColbertVectorStore,
+)
+from ragstack_langchain.colbert import ColbertVectorStore as LangChainColbertVectorStore
+from transformers import BertTokenizer
+
+LLM_MODEL = "gpt-3.5-turbo"
+
+batch_size = 640
+
+astra_token = os.getenv("ASTRA_DB_TOKEN")
+database_id = os.getenv("ASTRA_DB_ID")
+keyspace = "colbert"
+
+# Keep INFO-level logging, but silence noisy third-party loggers.
+logging.basicConfig(level=logging.INFO)
+logging.getLogger("unstructured").setLevel(logging.ERROR)
+logging.getLogger("cassandra").setLevel(logging.ERROR)
+logging.getLogger("http").setLevel(logging.ERROR)
+logging.getLogger("httpx").setLevel(logging.ERROR)
+
+
+def get_embedding_model(chunk_size: int) -> ColbertEmbeddingModel:
+    return ColbertEmbeddingModel(doc_maxlen=chunk_size, batch_size=batch_size)
+
+
+def get_database(chunk_size: int) -> CassandraDatabase:
+    table_name = f"colbert_chunk_size_{chunk_size}"
+
+    return CassandraDatabase.from_astra(
+        astra_token=astra_token,
+        database_id=database_id,
+        keyspace=keyspace,
+        table_name=table_name,
+        timeout=500,
+    )
+
+
+def get_lc_vector_store(chunk_size: int) -> LangChainColbertVectorStore:
+    database = get_database(chunk_size=chunk_size)
+    embedding_model = get_embedding_model(chunk_size=chunk_size)
+
+    return LangChainColbertVectorStore(
+        database=database,
+        embedding_model=embedding_model,
+    )
+
+
+def get_vector_store(chunk_size: int) -> ColbertVectorStore:
+    database = get_database(chunk_size=chunk_size)
+    return ColbertVectorStore(database=database)
+
+
+tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+
+
+def len_function(text: str) -> int:
+    return len(tokenizer.tokenize(text))
+
+
+async def ingest(file_path: str, chunk_size: int, **kwargs):
+    doc_id = Path(file_path).name
+
+    chunk_overlap = min(chunk_size // 4, 64)  # the text splitter expects an int
+
+    start = time.time()
+    docs = UnstructuredFileLoader(
+        file_path=file_path, mode="single", strategy="fast"
+    ).load()
+    duration = time.time() - start
+    print(f"It took {duration} seconds to load and parse the document")
+
+    # confirm only one document returned per file
+    assert len(docs) == 1
+
+    text_splitter = RecursiveCharacterTextSplitter(
+        chunk_size=chunk_size,
+        chunk_overlap=chunk_overlap,
+        length_function=len_function,
+    )
+
+    start = time.time()
+    chunked_docs = text_splitter.split_documents(docs)
+    duration = time.time() - start
+    print(
+        f"It took {duration} seconds to split the document into {len(chunked_docs)} chunks"
+    )
+
+    texts = [doc.page_content for doc in chunked_docs]
+    start = time.time()
+    embeddings = get_embedding_model(chunk_size=chunk_size).embed_texts(texts=texts)
+    duration = time.time() - start
+    print(f"It took {duration} seconds to embed {len(chunked_docs)} chunks")
+
+    colbert_vector_store = get_vector_store(chunk_size=chunk_size)
+
+    await colbert_vector_store.adelete_chunks(doc_ids=[doc_id])
+
+    chunks: List[Chunk] = []
+    for i, doc in enumerate(chunked_docs):
+        chunks.append(
+            Chunk(
+                doc_id=doc_id,
+                chunk_id=i,
+                text=doc.page_content,
+                metadata={} if doc.metadata is None else doc.metadata,
+                embedding=embeddings[i],
+            )
+        )
+
+    start = time.time()
+    await colbert_vector_store.aadd_chunks(chunks=chunks, concurrent_inserts=100)
+    duration = time.time() - start
+    print(
+        f"It took {duration} seconds to insert {len(chunked_docs)} chunks into AstraDB"
+    )
+
+
+def query_pipeline(k: int, chunk_size: int, **kwargs):
+    vector_store = get_lc_vector_store(chunk_size=chunk_size)
+    llm = ChatOpenAI(model_name=LLM_MODEL)
+
+    # build a prompt
+    prompt_template = """
+    Answer the question based only on the supplied context. If you don't know the answer, say: "I don't know".
+    Context: {context}
+    Question: {question}
+    Your answer:
+    """
+    prompt = ChatPromptTemplate.from_template(prompt_template)
+
+    rag_chain = (
+        {
+            "context": vector_store.as_retriever(search_kwargs={"k": k}),
+            "question": RunnablePassthrough(),
+        }
+        | prompt
+        | llm
+        | StrOutputParser()
+    )
+
+    return rag_chain