LEGO-GraphRAG

Source code and supplementary materials for "LEGO-GraphRAG: Modularizing Graph-based Retrieval-Augmented Generation for Design Space Exploration".


Introduction

LEGO-GraphRAG is a modular framework designed for exploring design spaces using graph-based retrieval-augmented generation. This repository contains the source code and supplementary materials for the project, including data preprocessing, experiments, and evaluation tools.


Directory Structure

.
├── requirements.txt
├── Preprocess
├── SEPRAlign
├── Instance
├── CostRecord
├── Evaluation
├── Results
├── ExistingInstances
├── ApproximatePPR
└── FineTune

System Overview

The system is divided into several independently executed modules, one per top-level directory, each serving a specific purpose:

Preprocess

Handles data preprocessing tasks, including cleaning, formatting, and preparing datasets for further processing.

Instance

Responsible for generating LEGO-GraphRAG instances and running the Instance Experiments.

SEPRAlign

Aligns subgraph-extraction and path-retrieval modules to run the Module Experiments.

CostRecord

Records various runtime costs, such as computational resource consumption by LLMs and EEMs.

Evaluation

Provides tools to evaluate the performance of different components and generate visualizations, including charts and graphs.

Results

All the reported results in our paper are stored in this folder.

ExistingInstances

Reproduces instance alignment experiments, including:

  • KELP
  • RoG
  • ToG

ApproximatePPR

Reproduces personalized PageRank (PPR) experiments, including:

  • Fora
  • TopPPR
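
For readers unfamiliar with personalized PageRank, the quantity these experiments approximate can be illustrated with plain power iteration. This is only a minimal sketch and not the code in ApproximatePPR: Fora and TopPPR are approximate methods, whereas the example below computes PPR by exact iteration. The function name, the adjacency-dict graph format, and the `alpha` and `iterations` parameters are assumptions made for illustration.

```python
def personalized_pagerank(graph, seeds, alpha=0.15, iterations=50):
    """Power-iteration PPR on a dict {node: [out-neighbors]}.

    Illustrative only: every neighbor must also be a key of `graph`,
    and `seeds` must be a non-empty set of nodes in `graph`.
    """
    nodes = list(graph)
    # Restart distribution: uniform over the seed nodes.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        # With probability alpha the walk jumps back to a seed.
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if neighbors:
                # Otherwise it moves to a uniformly chosen out-neighbor.
                share = (1.0 - alpha) * scores[n] / len(neighbors)
                for m in neighbors:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seeds.
                for s in seeds:
                    nxt[s] += (1.0 - alpha) * scores[n] / len(seeds)
        scores = nxt
    return scores


toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
ppr = personalized_pagerank(toy_graph, seeds={"a"})
```

In a subgraph-extraction setting, the seeds would typically be the entities mentioned in the question, and the highest-scoring nodes would form the candidate subgraph.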

FineTune

Includes scripts and configurations for fine-tuning EEMs and LLMs.


Run the Instance Experiments

  1. Clone this repository:

    git clone https://github.com/gzy02/LEGO-GraphRAG.git
  2. Navigate to the project directory and install dependencies:

    cd LEGO-GraphRAG
    pip install -r requirements.txt
    cd Instance
  3. Prepare the graph and datasets by following the instructions in the Preprocess module.

  4. Deploy an LLM using the provided script:

    ./vllm_qwen.sh
  5. Edit the config.py file to set the desired configuration. The keys should include:

  • llm_url: the URL of the deployed LLM
  • reasoning_model: the reasoning model used in the LLM
  • reasoning_dataset: the dataset used in the experiment
  • subgraph_list: the subgraphs used in the experiment
  • emb_model_dir: the directory of the embedding model
  • temperature: the temperature used in the LLM generation
  • max_tokens: the maximum tokens generated by the LLM
  6. Run the Instance Experiments:
  • Subgraph Extraction Experiments:
    python SE_PPR.py
    python SE_EEM.py
    python SE_LLM.py
  • Path Retrieval Experiments:
    python Instances.py
  • Generate the results:
    python Generate.py
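
As a rough illustration of step 5, the configuration keys could be set as module-level variables in config.py along the following lines. This is a sketch under assumptions, not the repository's actual file: every value is a placeholder, the URL assumes the LLM is served locally on port 8000, the type of each key is a guess, and the `REQUIRED_KEYS` tuple and `missing_keys` helper are hypothetical additions for checking the settings before a run.

```python
# Hypothetical config.py sketch. All values are placeholders,
# not defaults shipped with the repository.
llm_url = "http://localhost:8000/v1"     # assumed address of the deployed LLM
reasoning_model = "your-model-name"      # placeholder model identifier
reasoning_dataset = "your-dataset-name"  # placeholder dataset identifier
subgraph_list = ["your-subgraph-name"]   # assumed to be a list of names
emb_model_dir = "/path/to/embedding/model"
temperature = 0.0
max_tokens = 256

# Keys named in the README; the checker itself is an illustrative helper.
REQUIRED_KEYS = (
    "llm_url",
    "reasoning_model",
    "reasoning_dataset",
    "subgraph_list",
    "emb_model_dir",
    "temperature",
    "max_tokens",
)


def missing_keys(settings):
    """Return the required keys that are absent or empty in `settings`."""
    missing = []
    for key in REQUIRED_KEYS:
        value = settings.get(key)
        if value is None or value == "" or value == []:
            missing.append(key)
    return missing


current = {key: globals().get(key) for key in REQUIRED_KEYS}
print(missing_keys(current))
```

Running a check like this before launching the experiment scripts catches a forgotten key early, instead of partway through a long run.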

License

This project is licensed under the terms specified in the LICENSE file.
