This repo contains our project "TPC-DI Benchmark Using DuckDB" for the course "Data Warehouses" at Université Libre de Bruxelles (ULB). In this project, we implement the TPC-DI benchmark on the DuckDB database management system.
- Clone the repo

      git clone https://github.com/hieunm44/dw-tpcdi-duckdb.git
      cd dw-tpcdi-duckdb
- Install the duckdb package

      pip install duckdb
- Go to https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp and download TPC-DI_Tools_v1.1.0.zip, then unzip it to a folder TPC-DI-Tool.
- Check the document TPC-DI_v1.1.0.pdf (also from the link above), the paper TPC-DI: The First Industry Benchmark for Data Integration, and the slides TPC-DI_Slides to get details about the TPC-DI benchmark.
- Give full access permission to the data folder

      chmod 777 generated_data
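
Before moving on, it can help to confirm that the duckdb package installed above is importable and can run a query. The following is a minimal sketch, not part of the repo; the function name `duckdb_smoke_test` is hypothetical, and it uses an in-memory database so no file is created.

```python
import importlib.util


def duckdb_smoke_test():
    """Return DuckDB's answer to SELECT 42, or None if duckdb is not installed.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    # Check for the package first so a missing install is reported cleanly.
    if importlib.util.find_spec("duckdb") is None:
        return None

    import duckdb

    # ":memory:" opens a throwaway in-memory database (nothing is written to disk).
    con = duckdb.connect(":memory:")
    try:
        return con.execute("SELECT 42").fetchone()[0]
    finally:
        con.close()


if __name__ == "__main__":
    result = duckdb_smoke_test()
    print("duckdb OK" if result == 42 else "duckdb is not installed")
```

If the script reports that duckdb is not installed, rerun the `pip install duckdb` step in the same Python environment that will run the benchmark.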
We only show examples for scale factor 3. Other scale factors can be run similarly.
- Data generation

      mv Tools/PDGF Tools/pdgf
      cd TPC-DI-Tool/Tool
      java -jar DIGen.jar -sf 3 -o ../../generated_data

  Data will be generated in the folder generated_data.
- Run the benchmark

      python3 main.py

  A database file sf_3.db will be created in the folder created_db. The result will be saved in the file results/result_sf3.csv.
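
The two steps above can be sketched in Python for readers who want to try another scale factor or inspect the result file. This is an illustrative sketch only: the helper names `digen_command` and `read_result_rows` are hypothetical, the command mirrors the `java -jar DIGen.jar` invocation shown above with only the `-sf` value changed, and the CSV reader assumes the result file has a header row (its column names are not documented here, so none are assumed).

```python
import csv


def digen_command(scale_factor, out_dir="../../generated_data"):
    """Build the DIGen argument list for a given scale factor.

    Mirrors the command from the data generation step; only -sf varies.
    Hypothetical helper, not part of this repo.
    """
    return ["java", "-jar", "DIGen.jar", "-sf", str(scale_factor), "-o", out_dir]


def read_result_rows(path="results/result_sf3.csv"):
    """Read the benchmark result CSV into a list of dicts.

    Assumes the first row is a header; columns are returned as-is.
    """
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Print the command for scale factor 5 instead of running it.
    print(" ".join(digen_command(5)))
```

To run the generator from Python, the list returned by `digen_command` could be passed to `subprocess.run(..., check=True)` from the same directory used in the data generation step. Whether `main.py` picks up a different scale factor without changes is not stated here, so check it before running other scales.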
We use the following repository as a reference: https://github.com/risg99/tpc-di-benchmark.