Claude DuckDB Skill Guide (2026)
Ten DuckDB recipes you can run today — query a CSV, join CSV against Parquet, flatten JSON, read a remote Parquet over HTTP, window functions, PIVOT, export to Parquet, query SQLite, a typed COPY load, and read an Excel sheet — each a single Claude prompt with the exact, copy-paste-runnable DuckDB SQL it produces.
Already know what skills are? Skip to the cookbook. First time? Read the explainer then come back. Need the install? It’s on the /skills/write-script-duckdb page.

On this page · 21 sections▾
- What this skill does
- The cookbook
- Install + README
- Watch it explained
- 01 · Query a CSV with zero setup
- 02 · Join a CSV against a Parquet file
- 03 · Read and flatten nested JSON
- 04 · Query a remote Parquet over HTTP
- 05 · Window functions for running totals & ranks
- 06 · PIVOT and UNPIVOT a table
- 07 · Export query results to Parquet
- 08 · Query a SQLite database in place
- 09 · COPY a CSV into a typed table
- 10 · Read an Excel sheet directly
- Community signal
- The contrarian take
- Where DuckDB ships
- Gotchas
- Pairs well with
- FAQ
- Sources
What this skill actually does
Sixty seconds of context before the cookbook — what the DuckDB skill is, what Claude returns when you invoke it, and the one thing it does NOT do for you.
What this skill actually does
“MUST use when writing DuckDB queries.”
— windmill-labs, the skill author · /skills/write-script-duckdb
What Claude returns
You get DuckDB SQL: a query (or COPY script) that reads files directly with read_csv(), read_parquet(), read_json()/read_ndjson(), read_xlsx(), or the FROM 'file.ext' shorthand - no CREATE TABLE required. The skill knows DuckDB's analytical surface: GROUP BY aggregates, window functions with frame clauses and QUALIFY, PIVOT/UNPIVOT, ATTACH for SQLite/Postgres/MySQL, httpfs for remote files over https:// and s3://, and COPY ... TO 'out.parquet' (FORMAT parquet) for export. It uses DuckDB's native COPY for writes, and declares script parameters as typed SQL comments (-- $name (text)).
What it does NOT do
It does not install DuckDB for you, and it writes SQL - not a running database server. You still need the DuckDB CLI or a client, and the skill's parameter/resource syntax (-- $name, $res:, s3object) is tuned for Windmill's script runner; lift the plain SQL out if you run DuckDB standalone.
How you trigger it
Write DuckDB SQL to find the top 10 countries by revenue in orders.csv.Join events.csv against users.parquet on user_id and count actions per plan.Read budget.xlsx sheet FY26 and return variance by department, then export to Parquet.Cost when idle
~100 tokens
One framing to hold onto: DuckDB is an in-process analytical SQL engine — think “SQLite for analytics.” There is no server to run and no schema to define before you query. You point SQL at files and get answers. That is what makes it a natural fit for an agent: the skill writes the query, you run one command.
The cookbook
Each entry below is a query you could run this afternoon. They run in rough order of how often you’ll reach for them — the early ones (read a CSV, join files) are daily-driver moves, the later ones lean on DuckDB features you only need when the shape gets weird (window frames, PIVOT, ATTACH). Every entry pairs with one or two skills or MCP servers you already have on mcp.directory. Every SQL snippet is copy-paste-runnable against current DuckDB.
Install + README
If the skill isn’t on your machine yet, here’s the one-liner. The full install panel (Codex, Copilot, Antigravity variants) is on the skill page.
One-line install · by windmill-labs
Open skill pageInstall
mkdir -p .claude/skills/write-script-duckdb && curl -L -o skill.zip "https://mcp.directory/api/skills/download/1544" && unzip -o skill.zip -d .claude/skills/write-script-duckdb && rm skill.zipInstalls to .claude/skills/write-script-duckdb
Watch it explained
A one-minute framing of why DuckDB exists and what “in-process analytical SQL” actually means — useful before the cookbook because it anchors the mental model the recipes assume.
Query a CSV with zero setup
Point DuckDB at a CSV on disk and run analytical SQL against it directly - no import, no schema, no database server.
ForAnyone who has a CSV and a question. Analysts, backend engineers, anyone who reaches for Excel and then regrets it past 100k rows.
The prompt
Use the write-script-duckdb skill. I have ./orders.csv (~2M rows, columns: order_id, customer_id, country, amount, created_at). Write DuckDB SQL that returns the top 10 countries by total revenue, with order count and average order value, sorted by revenue descending. Use the FROM 'file.csv' shorthand - DuckDB auto-detects the schema. Don't create a table.What slides.md looks like
-- DuckDB auto-detects types and headers from the CSV
SELECT
country,
count(*) AS orders,
sum(amount) AS revenue,
round(avg(amount),2) AS avg_order_value
FROM 'orders.csv'
GROUP BY country
ORDER BY revenue DESC
LIMIT 10;
-- Override detection only if needed:
-- FROM read_csv('orders.csv', header = true, sample_size = 20_000)One-line tweak
Prefix the whole thing with SUMMARIZE FROM 'orders.csv'; to get min/max/avg/null-rate per column before you write a single aggregate.
Join a CSV against a Parquet file
Join two files of different formats in one query - a CSV export against a Parquet dimension table - without loading either into a database first.
ForData engineers stitching a hand-exported CSV against a columnar dataset from the warehouse dump.
The prompt
Use the write-script-duckdb skill. Join ./events.csv (event_id, user_id, ts, action) against ./users.parquet (user_id, plan, signup_country) on user_id. Return action counts per plan tier for the last 30 days. Use read_csv() and read_parquet() in the same FROM clause. Files stay on disk; no CREATE TABLE.What slides.md looks like
SELECT
u.plan,
e.action,
count(*) AS n
FROM read_csv('events.csv') AS e
JOIN read_parquet('users.parquet') AS u
ON e.user_id = u.user_id
WHERE e.ts >= now() - INTERVAL 30 DAY
GROUP BY u.plan, e.action
ORDER BY u.plan, n DESC;One-line tweak
Swap the JOIN for a LEFT JOIN and add WHERE u.user_id IS NULL to find events from users missing in the dimension file - a one-line orphan check.
Read and flatten nested JSON
Parse a JSON or newline-delimited JSON file, extract nested fields, and unnest an array column into rows - all in SQL.
ForEngineers wrangling API dumps and log exports that arrive as JSON, not tidy tables.
The prompt
Use the write-script-duckdb skill. I have ./api_dump.json where each record has { id, customer: { name, tier }, items: [ { sku, qty } ] }. Write DuckDB SQL that returns one row per item with id, customer name, tier, sku, qty. Use read_json(), the -> / ->> JSON operators, and UNNEST for the items array.What slides.md looks like
SELECT
j.id,
j.customer->>'name' AS customer_name,
j.customer->>'tier' AS tier,
item->>'sku' AS sku,
(item->>'qty')::INT AS qty
FROM read_json('api_dump.json') AS j,
UNNEST(j.items) AS t(item);
-- For newline-delimited logs use read_ndjson('logs.ndjson')
-- or read_json(..., format = 'newline_delimited')One-line tweak
If the file is one giant JSON array per line that auto-detection mis-reads, force it with read_json('api_dump.json', format = 'auto') or point at read_ndjson() for line-delimited logs.
Query a remote Parquet over HTTP
Run SQL against a Parquet file sitting on a URL or object store without downloading it first - DuckDB streams only the columns and row groups it needs.
ForAnyone exploring a public dataset, a shared S3 bucket, or a Hugging Face Parquet without wanting a 4 GB local copy.
The prompt
Use the write-script-duckdb skill. Query this remote Parquet over HTTPS: https://example.com/data/nyc_taxi_2026.parquet. Return average trip distance and total fare by payment_type, for trips over $50. Read it straight from the URL with read_parquet() - the httpfs extension auto-loads on first remote read, so no manual INSTALL is needed.What slides.md looks like
-- httpfs auto-loads on the first https:// read
SELECT
payment_type,
count(*) AS trips,
round(avg(trip_distance), 2) AS avg_miles,
round(sum(fare_amount), 2) AS total_fares
FROM read_parquet('https://example.com/data/nyc_taxi_2026.parquet')
WHERE fare_amount > 50
GROUP BY payment_type
ORDER BY total_fares DESC;
-- Glob a folder of remote files and keep the source:
-- SELECT *, filename FROM read_parquet('s3://bucket/y=2026/*.parquet')One-line tweak
Add filename = true (or SELECT *, filename) so each row carries which partition file it came from - invaluable when globbing a dated folder.
Window functions for running totals & ranks
Compute a 7-day moving average, a running total, and per-group ranks in one pass using DuckDB's window functions and QUALIFY.
ForAnalysts who keep reaching for Excel's drag-fill. Window functions do it correctly and reproducibly.
The prompt
Use the write-script-duckdb skill. From ./daily_revenue.csv (date, region, revenue), compute per region: the 7-day moving average of revenue, the running cumulative total, and rank each day within its region by revenue. Use AVG/SUM OVER with frame clauses and row_number() OVER. Use QUALIFY to also return only each region's single best day.What slides.md looks like
SELECT
date, region, revenue,
round(avg(revenue) OVER w7, 2) AS revenue_7d_avg,
sum(revenue) OVER wrun AS revenue_running,
row_number() OVER wrank AS day_rank
FROM 'daily_revenue.csv'
WINDOW
w7 AS (PARTITION BY region ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),
wrun AS (PARTITION BY region ORDER BY date),
wrank AS (PARTITION BY region ORDER BY revenue DESC)
QUALIFY day_rank = 1; -- best day per region onlyOne-line tweak
Drop the QUALIFY line to keep every day with its rolling metrics; keep it to collapse to the one peak day per region. Same query, two deliverables.
PIVOT and UNPIVOT a table
Reshape long data to wide (a cross-tab) and back again using DuckDB's PIVOT / UNPIVOT statements - no hand-written CASE-WHEN columns.
ForAnyone building a quarter-by-region matrix for a deck, or normalizing a wide spreadsheet into long form for analysis.
The prompt
Use the write-script-duckdb skill. From ./sales.csv (country, year, population) build a cross-tab: one row per country, one column per year, cells = sum(population). Use DuckDB's PIVOT statement. Then show me the inverse with UNPIVOT to fold a wide year-columns table back into (country, year, population) long form.What slides.md looks like
-- Wide cross-tab: years become columns
PIVOT (FROM 'sales.csv')
ON year
USING sum(population)
GROUP BY country;
-- Inverse: fold year columns back to long form
UNPIVOT wide_sales
ON "2024", "2025", "2026"
INTO
NAME year
VALUE population;One-line tweak
PIVOT discovers the distinct ON values automatically. To freeze a fixed set of columns (e.g. only 2024-2026), use the SQL-standard form: PIVOT sales ON year IN (2024, 2025, 2026) USING sum(population).
Export query results to Parquet
Run a transformation and write the result to a compressed, columnar Parquet file - the format every downstream warehouse and Polars/pandas job reads fastest.
ForData engineers producing a cleaned, typed artifact for a pipeline instead of re-deriving it from raw CSV every run.
The prompt
Use the write-script-duckdb skill. Read ./raw_events.csv, drop rows where user_id IS NULL, cast ts to TIMESTAMP, keep only the columns user_id, ts, action, and write the cleaned result to ./clean/events.parquet using COPY ... TO with FORMAT parquet. Then also show the PARTITION_BY variant that writes one folder per action.What slides.md looks like
-- Single cleaned Parquet file
COPY (
SELECT user_id, ts::TIMESTAMP AS ts, action
FROM 'raw_events.csv'
WHERE user_id IS NOT NULL
) TO 'clean/events.parquet' (FORMAT parquet);
-- Hive-partitioned: one sub-folder per action value
COPY (SELECT * FROM 'raw_events.csv')
TO 'clean/events' (FORMAT parquet, PARTITION_BY (action));One-line tweak
Add COMPRESSION 'zstd' inside the options for a smaller file than the default Snappy, at a small write-time cost. Worth it for cold artifacts you read often.
Query a SQLite database in place
Attach an existing SQLite file and run analytical SQL across its tables - join them, aggregate them, without an ETL step into another store.
ForDevelopers whose app data lives in a SQLite file (mobile apps, local tools, Django/Rails dev DBs) who want fast analytics on it.
The prompt
Use the write-script-duckdb skill. I have ./app.sqlite with tables users(id, country) and sessions(user_id, started_at, duration_s). Attach it as a SQLite database and return median session duration and session count per country. Use INSTALL sqlite; LOAD sqlite; ATTACH ... (TYPE sqlite); then query the attached tables by qualified name.What slides.md looks like
INSTALL sqlite; -- once per environment
LOAD sqlite;
ATTACH 'app.sqlite' AS app (TYPE sqlite);
SELECT
u.country,
count(*) AS sessions,
median(s.duration_s) AS median_seconds
FROM app.sessions AS s
JOIN app.users AS u ON u.id = s.user_id
GROUP BY u.country
ORDER BY sessions DESC;One-line tweak
DuckDB ships ATTACH readers for SQLite, Postgres, and MySQL. Swap (TYPE sqlite) for (TYPE postgres) plus a connection string to run the same analytical SQL against a live Postgres without touching its write path.
COPY a CSV into a typed table
Define an explicit, typed schema and bulk-load a CSV into it with COPY - the path you want when types matter more than zero-setup convenience.
ForEngineers who need a column to be DECIMAL not DOUBLE, or a date parsed with a specific format, before any downstream query touches it.
The prompt
Use the write-script-duckdb skill. Create a typed DuckDB table 'transactions' (txn_id BIGINT, amount DECIMAL(12,2), currency VARCHAR, txn_date DATE) and bulk-load ./txns.csv into it with COPY ... FROM. The CSV has a header row and a non-default date format YYYY-MM-DD. After loading, return the row count and the sum by currency.What slides.md looks like
CREATE TABLE transactions (
txn_id BIGINT,
amount DECIMAL(12,2),
currency VARCHAR,
txn_date DATE
);
COPY transactions FROM 'txns.csv'
(HEADER true, DATEFORMAT '%Y-%m-%d');
SELECT currency, count(*) AS n, sum(amount) AS total
FROM transactions
GROUP BY currency
ORDER BY total DESC;One-line tweak
Add (HEADER true, IGNORE_ERRORS true) to skip malformed rows instead of failing the whole load, then query the rejects table DuckDB records to see what it dropped.
Read an Excel sheet directly
Query an .xlsx workbook the same way you query a CSV - pick a sheet, treat row one as headers, and run SQL with no manual Save-As-CSV dance.
ForAnyone handed a spreadsheet by finance, ops, or a vendor who needs answers out of it without opening Excel.
The prompt
Use the write-script-duckdb skill. From ./budget.xlsx, read the sheet named 'FY26' with the first row as headers, and return total budget and total actual by department, plus the variance. Use read_xlsx() with the sheet and header options - DuckDB infers the Excel extension automatically, no manual INSTALL.What slides.md looks like
SELECT
department,
sum(budget) AS budget,
sum(actual) AS actual,
sum(actual) - sum(budget) AS variance
FROM read_xlsx('budget.xlsx', sheet = 'FY26', header = true)
GROUP BY department
ORDER BY variance;
-- .xlsx only; .xls is not supported by the reader.One-line tweak
Wrap the read in a COPY (...) TO 'budget_fy26.parquet' (FORMAT parquet) to snapshot the spreadsheet into a stable Parquet the rest of the pipeline can trust.
Community signal
Three voices from people using DuckDB for real work. The first is the time-to-value story; the second is the daily-default shift; the third names exactly who it clicks for.
“Naively I started by looking at redshift, Athena/glue, etc. I spun my wheels for a week and barely made progress... So I decided to give DuckDB a shot and I got the whole project done in under a day.”
moribvndvs · Hacker News
An engineer tasked with giving a non-technical team arbitrary queries over a daily Parquet drop - a week of fragile cloud PoCs versus a day on DuckDB.
“Over the past few years, I've found myself using DuckDB more and more for data processing, to the point where I now use it almost exclusively”
Robin Linacre · Blog
Data scientist Robin Linacre on why DuckDB became his default tool for data processing - the conviction shift the cookbook is built around.
“I'm sure the use of duckdb may seem weird for normal developers, but for data people it really is game-changing, especially for data scientists or business analysts.”
rubenvanwyk · Hacker News
A commenter naming exactly who DuckDB clicks for - analysts and data scientists - even when it looks unusual to app developers.
The contrarian take
DuckDB earns real skepticism when people try to make it something it isn’t. The sharpest version, from sradman on Hacker News:
“I would not adopt DuckDB as an OLTP replacement for either SQLite or PostgreSQL without a strong OLAP component and some serious performance/load/integrity testing”
sradman · Hacker News
From a Hacker News thread on DuckDB's positioning.
DuckDB was never meant to replace your OLTP store - it is an in-process OLAP engine. The right architecture is Postgres (or your app DB) for writes, DuckDB for analytics over the exports. The maturity worry is also dated: this comment predates DuckDB 1.0. The skill in this cookbook leans into that split on purpose: every recipe is a read-heavy analytical query over files or an attached store, never a transactional write path. If you find yourself wanting concurrent writers, row-level locking, or a long-running server, you want Postgres — and you can still point DuckDB at it (use case 8) for the analytics.
One related framing worth naming: the “DuckDB vs Polars” debate is mostly a false choice. As one commenter put it, “Polars and duckdb interoperate so it’s not really one or the other.” Both build on Apache Arrow, so a frame moves between them with no copy. Pick SQL when the work is joins and windows; pick the dataframe API when you want Python-side logic. The cookbook pairs them deliberately.
Where DuckDB ships
Concrete examples of DuckDB in production and notable projects. None used the Claude skill specifically — they’re here to show the range of work the engine carries, so you have a sense of how far the in-process model reaches before you write your first query.
- Hugging Face — the Hub Dataset Viewer auto-converts datasets to Parquet and queries them with DuckDB (native hf:// support)
- dbt Labs — official dbt adapter for DuckDB for local and lakehouse transformations
- MotherDuck — managed cloud data warehouse 'powered by DuckDB' from the DuckDB creators
- Evidence.dev — BI-as-code reporting tool that connects to DuckDB as a SQL data source
- Harvard Law School Library Innovation Lab — DuckDB-WASM lets users query terabytes of legal data in the browser
- DeepSeek — Smallpond, a distributed data-processing framework built on DuckDB and the 3FS filesystem
Gotchas (the four that bite)
Sourced from the DuckDB documentation and the duckdb/duckdb issue tracker.
It is single-process, not a concurrent server
DuckDB is embedded: one writer at a time, no network protocol, no replication. It's for analytics, not for serving an app's concurrent writes. That's a feature, not a bug - but plan your architecture around it.
Large joins can exhaust memory
DuckDB spills to disk, but a huge join or aggregation can still blow past RAM. Set a memory_limit pragma, prefer Parquet over CSV (it reads only needed columns), and filter early in the query.
CSV type detection samples, then commits
Auto-detection reads a sample (default ~20k rows) to infer types. A rare string in an otherwise-numeric column further down can break the read. Bump sample_size, or define columns explicitly when the data is messy.
read_xlsx is .xlsx only
The Excel reader handles .xlsx, not the legacy .xls binary format. Convert the workbook to .xlsx first, or save the sheet as CSV - then the rest of the cookbook applies unchanged.
Pairs well with
Curated to match the cookbook’s actual integrations: the data-adjacent skills (data-analysis for Polars, sql, xlsx, matplotlib) plus the SQL MCP servers the later use cases lean on when the source data lives in a real database. There is also a DuckDB MCP server and a MotherDuck & DuckDB server if you want the engine running as a shared service rather than authoring SQL with the skill.
Related skills
Related MCP servers
Two posts that compose well with this cookbook: What are Claude Code skills? covers the underlying mechanism, and Claude Code best practices covers the orchestration patterns the multi-file recipes (2, 4, 7) lean on.
Frequently asked questions
Do I need to set up a database to use the DuckDB skill?
No. That is the whole point of DuckDB - it is an in-process engine, so there is no server to start and no schema to define. The skill writes SQL that reads CSV, Parquet, JSON, and Excel files straight off disk or a URL with the FROM 'file.ext' shorthand. You only CREATE TABLE when you deliberately want a typed, persistent schema (use case 9).
Can DuckDB query Parquet and CSV files without importing them first?
Yes - read_csv(), read_parquet(), read_json(), and read_xlsx() (plus the FROM 'file.ext' shorthand) let you run analytical SQL directly against files. DuckDB only reads the columns and row groups your query needs, so a query over a multi-GB Parquet does not load the whole file. Use cases 1, 2, and 4 cover this.
Does the DuckDB skill handle remote files over HTTP or S3?
Yes. DuckDB's httpfs extension auto-loads on the first read of an https:// or s3:// path - no manual INSTALL needed - so read_parquet('https://.../file.parquet') works as-is. Glob a folder (s3://bucket/*.parquet) and add filename=true to keep the source file per row. Use case 4 walks through it.
DuckDB vs Polars - which should I use?
They interoperate rather than compete; both build on Apache Arrow, so you can move a frame between them with no copy. Reach for DuckDB when the work is naturally SQL - joins across files, window functions, PIVOT. Reach for Polars (the data-analysis skill) when you want a Python dataframe API for plotting or row-wise logic. Most real pipelines use both.
Is DuckDB a replacement for Postgres?
No, and it is not trying to be. Postgres is an OLTP database built for many concurrent writers and transactional integrity; DuckDB is an in-process OLAP engine built for fast analytical reads. The clean pattern is Postgres for your application's writes and DuckDB for analytics over the exports - DuckDB can even ATTACH a live Postgres (use case 8) to query it without touching the write path.
Can DuckDB query a SQLite database directly?
Yes. INSTALL sqlite; LOAD sqlite; ATTACH 'app.sqlite' AS app (TYPE sqlite); then query app.table_name with full DuckDB SQL - joins, window functions, aggregates - against the SQLite tables in place. The same ATTACH mechanism works for Postgres and MySQL with (TYPE postgres) / (TYPE mysql). Use case 8 has the full script.
Is there a DuckDB MCP server, or should I use the skill?
Both exist on mcp.directory. The skill writes DuckDB SQL and costs ~100 tokens at idle - the lighter choice when you run the CLI yourself. The DuckDB MCP server (and MotherDuck & DuckDB) runs the engine as a service so multiple AI clients can query a shared instance, at the cost of tool schemas loaded on every turn. Use the server for shared state; the skill for authoring queries.
Sources
Primary
- windmill-labs write-script-duckdb SKILL.md (the skill manifest)
- DuckDB official documentation
- DuckDB CSV import reference
- DuckDB Parquet reference
- DuckDB window functions
- DuckDB PIVOT / UNPIVOT
Community
- moribvndvs — Hacker News
- Robin Linacre — Blog
- rubenvanwyk — Hacker News
- SmirkingRevenge — Hacker News
- nasretdinov — Hacker News
- David Breunig — Blog
Critical and contrarian
Internal