The File Data Connector creates datasets from files, so you can query locally accessible data stored in formats such as CSV, Parquet, and Markdown.
- Spice CLI installed (see Getting Started)
Follow these steps to use a local Parquet file as a dataset.
wget -O yellow_tripdata_2024-01.parquet
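A failed or interrupted download can leave behind an empty file or an HTML error page under the `.parquet` name. As a rough sanity check before starting the runtime, the sketch below tests for the 4-byte magic number `PAR1` that an unencrypted Parquet file carries at both its start and its end. `is_parquet` is a hypothetical helper name, not part of the Spice CLI, and the check does not validate the file's contents.

```shell
# Hypothetical helper (not part of the Spice CLI).
# Returns 0 if the file exists and both starts and ends with the
# Parquet magic number "PAR1"; returns non-zero otherwise.
is_parquet() {
  local file="$1"
  [ -f "$file" ] || return 1
  [ "$(head -c 4 "$file")" = "PAR1" ] && [ "$(tail -c 4 "$file")" = "PAR1" ]
}
```

For example, `is_parquet yellow_tripdata_2024-01.parquet || echo "download looks incomplete"` prints a warning when the check fails.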
cat <<EOF > spicepod.yaml
version: v1
kind: Spicepod
name: file_recipe
datasets:
  - name: yellow_taxis
    from: file://yellow_tripdata_2024-01.parquet
EOF
spice run
Open a new terminal and run the CLI command spice sql
spice sql
Then execute a query on the yellow_taxis dataset:
select avg(passenger_count) from yellow_taxis;
You should see the following output:
sql> select avg(passenger_count) from yellow_taxis;
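+-----------------------------------+
| avg(yellow_taxis.passenger_count) |
+-----------------------------------+
| 1.3392808966805005                |
+-----------------------------------+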
Time: 0.0253585 seconds. 1 rows.
Close the running Spice runtime and Spice SQL REPL.
# Remove the spicepod.yaml
rm spicepod.yaml
# Remove the Parquet file
rm yellow_tripdata_2024-01.parquet
Follow these steps to use local Markdown files as a dataset.
for file in "${files[@]}"; do
  curl -O "$base_url/$file"
done
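The loop above uses plain `curl -O`, which by default saves whatever the server returns, including the body of an HTTP error page. A minimal sketch of a stricter variant follows. `download_all` is a hypothetical helper name, and the base URL and file names are passed in as arguments rather than assumed here.

```shell
# Hypothetical helper: download each named file from a base URL into the
# current directory, stopping at the first failure.
# Usage: download_all <base_url> <file>...
download_all() {
  local base_url="${1%/}"   # drop one trailing slash, if present
  shift
  local file
  for file in "$@"; do
    # --fail: exit non-zero on HTTP errors instead of saving the error page
    # --location: follow redirects
    # --remote-name: save under the remote file name (same as -O)
    curl --fail --silent --show-error --location --remote-name \
      "$base_url/$file" || return 1
  done
}
```

With the variables from the loop, the call would be `download_all "$base_url" "${files[@]}"`. Stopping at the first failure keeps a partially downloaded set from being loaded unnoticed.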
cat <<EOF > spicepod.yaml
version: v1
kind: Spicepod
name: file_recipe
datasets:
  - name: docs
    from: file:./
    params:
      file_format: md
EOF
spice run
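The Parquet and Markdown recipes write nearly identical `spicepod.yaml` files. Only the dataset name, the `from` path, and the optional `file_format` setting differ. The sketch below wraps that pattern in a small function. `write_spicepod` is a hypothetical helper name, and the layout it emits (a `datasets:` list with `file_format` nested under `params:`) is an assumption about the Spicepod schema that should be checked against the File Data Connector documentation.

```shell
# Hypothetical helper: write a one-dataset spicepod.yaml in the current
# directory. The YAML layout is an assumption; verify it against the docs.
# Usage: write_spicepod <dataset_name> <from_uri> [file_format]
write_spicepod() {
  local name="$1" from="$2" format="${3:-}"
  {
    printf 'version: v1\n'
    printf 'kind: Spicepod\n'
    printf 'name: file_recipe\n'
    printf 'datasets:\n'
    printf '  - name: %s\n' "$name"
    printf '    from: %s\n' "$from"
    # Only emit params when a file format was given.
    if [ -n "$format" ]; then
      printf '    params:\n'
      printf '      file_format: %s\n' "$format"
    fi
  } > spicepod.yaml
}
```

Under these assumptions, `write_spicepod docs file:./ md` would stand in for the Markdown heredoc, and `write_spicepod yellow_taxis file://yellow_tripdata_2024-01.parquet` for the Parquet one.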
Open a new terminal and run the CLI command spice sql
spice sql
Then execute a query on the docs dataset:
select location from docs;
You should see output similar to the following:
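+--------------------------------+
| location                       |
+--------------------------------+
| Users/lukim/dev/cookbook/file/ |
| Users/lukim/dev/cookbook/file/ |
| Users/lukim/dev/cookbook/file/ |
| Users/lukim/dev/cookbook/file/ |
| Users/lukim/dev/cookbook/file/ |
+--------------------------------+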
Close the running Spice runtime and Spice SQL REPL.
# Remove the spicepod.yaml
rm spicepod.yaml
# Remove the downloaded Markdown files
rm *.md
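Note that `rm *.md` deletes every Markdown file in the current directory, including any that were there before the recipe was run. One way to avoid that is to run the recipe inside a scratch directory and remove only that directory afterwards. The sketch below shows the idea. Both function names and the `file_recipe.` directory prefix are illustrative choices, not something the recipe or the Spice CLI defines.

```shell
# Hypothetical helpers; names and the directory prefix are illustrative.

# Create a scratch directory for the recipe and print its path.
recipe_dir_create() {
  mktemp -d "${TMPDIR:-/tmp}/file_recipe.XXXXXX"
}

# Remove a scratch directory created by recipe_dir_create.
# Refuses any path that does not carry the expected prefix.
recipe_dir_remove() {
  case "$1" in
    */file_recipe.*) rm -rf -- "$1" ;;
    *) echo "refusing to remove $1" >&2; return 1 ;;
  esac
}
```

A typical session would be `dir="$(recipe_dir_create)" && cd "$dir"`, then the download, `spicepod.yaml`, and `spice run` steps, and finally `cd - && recipe_dir_remove "$dir"`.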
For more information, see the File Data Connector documentation.