The File Data Connector creates datasets from files, so you can query locally accessible data stored in formats such as CSV, Parquet, and Markdown.
- Spice.ai CLI installed (see Getting Started)
Follow these steps to use a local Parquet file as a dataset.
wget https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet -O yellow_tripdata_2024-01.parquet
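Before pointing a dataset at the download, it can help to confirm the file really is Parquet and not a saved error page. A minimal sketch, relying on the fact that Parquet files begin and end with the 4-byte magic string PAR1; the helper name is_parquet is hypothetical and not part of the Spice CLI:

```shell
# Hypothetical helper (not part of Spice): succeed only if the file
# starts and ends with the Parquet magic bytes "PAR1".
is_parquet() {
  [ -f "$1" ] || return 1
  [ "$(head -c 4 "$1")" = "PAR1" ] && [ "$(tail -c 4 "$1")" = "PAR1" ]
}
```

For example, is_parquet yellow_tripdata_2024-01.parquet && echo "looks like Parquet". This only inspects the magic bytes; it does not validate the file's schema or contents.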
cat <<EOF > spicepod.yaml
version: v1
kind: Spicepod
name: file_recipe
datasets:
  - name: yellow_taxis
    from: file://yellow_tripdata_2024-01.parquet
EOF
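A dataset whose from path does not exist is an easy mistake to make, so a quick pre-flight check before starting the runtime may be useful. The sketch below is an assumption-laden illustration: it uses a simple sed pattern, not a real YAML parser, and the function names spicepod_file_sources and check_spicepod_sources are hypothetical. It treats each path as relative to the current directory, as in this recipe.

```shell
# Hypothetical helpers (not part of Spice).
# Print the path part of every "from: file:..." entry in a spicepod file.
spicepod_file_sources() {
  sed -n -E 's|^[[:space:]]*(- )?from:[[:space:]]*file:(//)?||p' "$1"
}

# Succeed only if every referenced path exists, resolved from the
# current directory. Missing paths are reported on stderr.
check_spicepod_sources() {
  spicepod_file_sources "$1" | (
    status=0
    while IFS= read -r path; do
      if [ ! -e "$path" ]; then
        echo "not found: $path" >&2
        status=1
      fi
    done
    exit "$status"
  )
}
```

Running check_spicepod_sources spicepod.yaml in the recipe directory should succeed silently when the Parquet file is in place.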
spice run
Open a new terminal and run the CLI command spice sql.
spice sql
Then execute a query on the yellow_taxis dataset.
select avg(passenger_count) from yellow_taxis;
You should see the following output:
sql> select avg(passenger_count) from yellow_taxis;
+-----------------------------------+
| avg(yellow_taxis.passenger_count) |
+-----------------------------------+
| 1.3392808966805005                |
+-----------------------------------+
Time: 0.0253585 seconds. 1 rows.
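The same query can also be issued without the REPL, which is handy in scripts. The sketch below assumes the runtime exposes an HTTP SQL endpoint at http://localhost:8090/v1/sql; that address is an assumption not stated in this recipe, so check the Spice HTTP API documentation for your version. The names spice_query and SPICE_HTTP_URL are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper (not part of the Spice CLI).
# Assumption: the runtime serves POST /v1/sql on localhost:8090.
# Set SPICE_HTTP_URL to override the base URL.
spice_query() {
  curl --silent --show-error --fail \
    --request POST \
    --data "$1" \
    "${SPICE_HTTP_URL:-http://localhost:8090}/v1/sql"
}
```

With the runtime from spice run still up, spice_query "select avg(passenger_count) from yellow_taxis" would send the query shown above.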
Close the running Spice runtime and Spice SQL REPL.
# Remove the spicepod.yaml
rm spicepod.yaml
# Remove the Parquet file
rm yellow_tripdata_2024-01.parquet
Follow these steps to use local Markdown files as a dataset.
base_url="https://raw.githubusercontent.com/spiceai/docs/refs/heads/trunk/website/docs/components/data-connectors"
files=(
"clickhouse.md"
"databricks.md"
"debezium.md"
"delta-lake.md"
)
for file in "${files[@]}"; do
curl -O "$base_url/$file"
done
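Because curl -O without --fail can leave an empty or missing file behind when a download goes wrong, a quick check of the results may save debugging later. A minimal sketch; check_downloads is a hypothetical helper name, not something the recipe or Spice provides:

```shell
# Hypothetical helper: report every expected file that is missing or
# empty, and return non-zero if any were found.
check_downloads() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

After the loop above, check_downloads "${files[@]}" should return success when all four files arrived. It only checks that each file exists and is non-empty, not that its contents are valid Markdown.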
cat <<EOF > spicepod.yaml
version: v1
kind: Spicepod
name: file_recipe
datasets:
  - name: docs
    from: file:./
    params:
      file_format: md
EOF
spice run
Open a new terminal and run the CLI command spice sql.
spice sql
Then execute a query on the docs dataset.
select location from docs;
You should see output similar to the following:
+---------------------------------------------+
| location                                    |
+---------------------------------------------+
| Users/lukim/dev/cookbook/file/debezium.md   |
| Users/lukim/dev/cookbook/file/databricks.md |
| Users/lukim/dev/cookbook/file/README.md     |
| Users/lukim/dev/cookbook/file/clickhouse.md |
| Users/lukim/dev/cookbook/file/delta-lake.md |
+---------------------------------------------+
Close the running Spice runtime and Spice SQL REPL.
# Remove the spicepod.yaml
rm spicepod.yaml
# Remove the downloaded Markdown files
rm clickhouse.md databricks.md debezium.md delta-lake.md
For more information, see the File Data Connector documentation.