File Data Connector Recipe

The File Data Connector creates datasets from files, so you can query locally accessible data with SQL. Supported file formats include CSV, Parquet, and Markdown.

Prerequisites

The steps below use the spice CLI (spice run and spice sql), so Spice must be installed. The download steps also use wget and curl.

Query Parquet Files

Follow these steps to get started with using local Parquet files as a dataset.

Step 1: Download or Move a Parquet File Locally

wget https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet -O yellow_tripdata_2024-01.parquet
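
Before moving on, it can help to confirm that the download produced a Parquet file and not an error page. The sketch below relies on the Parquet format's 4-byte magic number, PAR1, which appears at both the start and the end of a file. The helper name is_parquet is hypothetical and not part of the Spice CLI. This is a quick sanity check only; it does not validate the file's contents.

```shell
# Hypothetical helper (not part of the Spice CLI): succeed only if the file
# begins and ends with the 4-byte Parquet magic number "PAR1".
is_parquet() {
  # A missing or empty file cannot be Parquet.
  [ -s "$1" ] || return 1
  # tr strips NUL bytes so the shell can hold the bytes in a variable.
  head_magic=$(head -c 4 "$1" | tr -d '\0')
  tail_magic=$(tail -c 4 "$1" | tr -d '\0')
  [ "$head_magic" = "PAR1" ] && [ "$tail_magic" = "PAR1" ]
}
```

For example, is_parquet yellow_tripdata_2024-01.parquet && echo "looks like Parquet" prints the message only when both magic numbers are present.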

Step 2: Create the Spicepod

cat <<EOF > spicepod.yaml
version: v1
kind: Spicepod
name: file_recipe
datasets:
  - name: yellow_taxis
    from: file://yellow_tripdata_2024-01.parquet
EOF
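
A dataset's from path that does not match a real file is an easy mistake to make here. As a minimal sketch, the helpers below list the file: paths in a Spicepod and report any that do not exist. Both function names are hypothetical. The parsing is line-based and handles only simple from: file:... lines like the one above. It also assumes that paths resolve relative to the current directory, which is how this recipe uses them.

```shell
# Hypothetical helpers (not part of the Spice CLI).

# Print the path of every "from: file:..." line, with or without "//".
spicepod_file_paths() {
  sed -n 's|^[[:space:]]*from:[[:space:]]*file:\(//\)\{0,1\}||p' "$1"
}

# Succeed only if every referenced path exists; name missing ones on stderr.
check_spicepod_files() {
  [ -f "$1" ] || { echo "no such Spicepod: $1" >&2; return 2; }
  spicepod_file_paths "$1" | {
    missing=0
    while IFS= read -r path; do
      if [ ! -e "$path" ]; then
        echo "missing: $path" >&2
        missing=1
      fi
    done
    # The status of this test becomes the status of the whole pipeline.
    [ "$missing" -eq 0 ]
  }
}
```

Running check_spicepod_files spicepod.yaml from the directory that holds the Spicepod returns a non-zero status if a referenced path is absent.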

Step 3: Start the Spice Runtime

spice run

Step 4: Query the Dataset Using SQL

Open a new terminal and run the CLI command spice sql.

spice sql

Then execute a query on the yellow_taxis dataset.

select avg(passenger_count) from yellow_taxis;

You should see the following output:

sql> select avg(passenger_count) from yellow_taxis;
+-----------------------------------+
| avg(yellow_taxis.passenger_count) |
+-----------------------------------+
| 1.3392808966805005                |
+-----------------------------------+

Time: 0.0253585 seconds. 1 rows.

Step 5: Terminate the Spice Runtime

Close the running Spice runtime and Spice SQL REPL.

Step 6: (Optional) Cleanup

# Remove the spicepod.yaml
rm spicepod.yaml

# Remove the Parquet file
rm yellow_tripdata_2024-01.parquet

Query Markdown Documents

Follow these steps to get started with using local Markdown files as a dataset.

Step 1: Download Markdown Documents

base_url="https://raw.githubusercontent.com/spiceai/docs/refs/heads/trunk/website/docs/components/data-connectors"

files=(
  "clickhouse.md"
  "databricks.md"
  "debezium.md"
  "delta-lake.md"
)

for file in "${files[@]}"; do
  curl -O "$base_url/$file"
done
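
Without the --fail option, curl saves the server's error page under the requested file name when a URL is wrong, so a bad download can pass unnoticed and end up in the dataset. A more defensive variant of the loop above is sketched below. The function name fetch_docs is hypothetical; the curl options are standard.

```shell
# Hypothetical helper: download each named file from a base URL into the
# current directory, stopping at the first failure.
fetch_docs() {
  base_url="$1"
  shift
  for file in "$@"; do
    # --fail: exit non-zero on HTTP errors instead of saving the error page.
    # --location: follow redirects. --remote-name: long form of -O.
    curl --fail --silent --show-error --location --remote-name \
      "$base_url/$file" || return 1
  done
}
```

With the variables defined above, the call would be fetch_docs "$base_url" "${files[@]}".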

Step 2: Create the Spicepod

cat <<EOF > spicepod.yaml
version: v1
kind: Spicepod
name: file_recipe
datasets:
  - name: docs
    from: file:./
    params:
      file_format: md
EOF

Step 3: Start the Spice Runtime

spice run

Step 4: Query the Dataset Using SQL

Open a new terminal and run the CLI command spice sql.

spice sql

Then execute a query on the docs dataset.

select location from docs;

You should see output similar to the following:

+---------------------------------------------+
| location                                    |
+---------------------------------------------+
| Users/lukim/dev/cookbook/file/debezium.md   |
| Users/lukim/dev/cookbook/file/databricks.md |
| Users/lukim/dev/cookbook/file/README.md     |
| Users/lukim/dev/cookbook/file/clickhouse.md |
| Users/lukim/dev/cookbook/file/delta-lake.md |
+---------------------------------------------+

README.md appears in this sample because the dataset path file:./ covers every Markdown file in the working directory, not only the four downloaded files.

Step 5: Terminate the Spice Runtime

Close the running Spice runtime and Spice SQL REPL.

Step 6: (Optional) Cleanup

# Remove the spicepod.yaml
rm spicepod.yaml

# Remove the downloaded Markdown files
rm clickhouse.md databricks.md debezium.md delta-lake.md

Additional Resources

For more information, see the File Data Connector documentation.