Updated June 2026Cookbook16 min read

Claude DuckDB Skill Guide (2026)

Ten DuckDB recipes you can run today — query a CSV, join CSV against Parquet, flatten JSON, read a remote Parquet over HTTP, window functions, PIVOT, export to Parquet, query SQLite, a typed COPY load, and read an Excel sheet — each a single Claude prompt with the exact, copy-paste-runnable DuckDB SQL it produces.

Already know what skills are? Skip to the cookbook. First time? Read the explainer then come back. Need the install? It’s on the /skills/write-script-duckdb page.

Editorial illustration: a stack of CSV and Parquet file glyphs on the left flowing into a yellow duck-silhouette SQL engine on the right, connected by a luminous teal flow arc, on a midnight navy background.
On this page · 21 sections
  1. What this skill does
  2. The cookbook
  3. Install + README
  4. Watch it explained
  5. 01 · Query a CSV with zero setup
  6. 02 · Join a CSV against a Parquet file
  7. 03 · Read and flatten nested JSON
  8. 04 · Query a remote Parquet over HTTP
  9. 05 · Window functions for running totals & ranks
  10. 06 · PIVOT and UNPIVOT a table
  11. 07 · Export query results to Parquet
  12. 08 · Query a SQLite database in place
  13. 09 · COPY a CSV into a typed table
  14. 10 · Read an Excel sheet directly
  15. Community signal
  16. The contrarian take
  17. Where DuckDB ships
  18. Gotchas
  19. Pairs well with
  20. FAQ
  21. Sources

What this skill actually does

Sixty seconds of context before the cookbook — what the DuckDB skill is, what Claude returns when you invoke it, and the one thing it does NOT do for you.

What this skill actually does

MUST use when writing DuckDB queries.

windmill-labs, the skill author · /skills/write-script-duckdb

What Claude returns

You get DuckDB SQL: a query (or COPY script) that reads files directly with read_csv(), read_parquet(), read_json()/read_ndjson(), read_xlsx(), or the FROM 'file.ext' shorthand - no CREATE TABLE required. The skill knows DuckDB's analytical surface: GROUP BY aggregates, window functions with frame clauses and QUALIFY, PIVOT/UNPIVOT, ATTACH for SQLite/Postgres/MySQL, httpfs for remote files over https:// and s3://, and COPY ... TO 'out.parquet' (FORMAT parquet) for export. It uses DuckDB's native COPY for writes, and declares script parameters as typed SQL comments (-- $name (text)).

What it does NOT do

It does not install DuckDB for you, and it writes SQL - not a running database server. You still need the DuckDB CLI or a client, and the skill's parameter/resource syntax (-- $name, $res:, s3object) is tuned for Windmill's script runner; lift the plain SQL out if you run DuckDB standalone.

How you trigger it

Write DuckDB SQL to find the top 10 countries by revenue in orders.csv.Join events.csv against users.parquet on user_id and count actions per plan.Read budget.xlsx sheet FY26 and return variance by department, then export to Parquet.

Cost when idle

~100 tokens

One framing to hold onto: DuckDB is an in-process analytical SQL engine — think “SQLite for analytics.” There is no server to run and no schema to define before you query. You point SQL at files and get answers. That is what makes it a natural fit for an agent: the skill writes the query, you run one command.

The cookbook

Each entry below is a query you could run this afternoon. They run in rough order of how often you’ll reach for them — the early ones (read a CSV, join files) are daily-driver moves, the later ones lean on DuckDB features you only need when the shape gets weird (window frames, PIVOT, ATTACH). Every entry pairs with one or two skills or MCP servers you already have on mcp.directory. Every SQL snippet is copy-paste-runnable against current DuckDB.

Install + README

If the skill isn’t on your machine yet, here’s the one-liner. The full install panel (Codex, Copilot, Antigravity variants) is on the skill page.

One-line install · by windmill-labs

Open skill page

Install

mkdir -p .claude/skills/write-script-duckdb && curl -L -o skill.zip "https://mcp.directory/api/skills/download/1544" && unzip -o skill.zip -d .claude/skills/write-script-duckdb && rm skill.zip

Installs to .claude/skills/write-script-duckdb

Watch it explained

A one-minute framing of why DuckDB exists and what “in-process analytical SQL” actually means — useful before the cookbook because it anchors the mental model the recipes assume.

01

Query a CSV with zero setup

Point DuckDB at a CSV on disk and run analytical SQL against it directly - no import, no schema, no database server.

ForAnyone who has a CSV and a question. Analysts, backend engineers, anyone who reaches for Excel and then regrets it past 100k rows.

The prompt

Use the write-script-duckdb skill. I have ./orders.csv (~2M rows, columns: order_id, customer_id, country, amount, created_at). Write DuckDB SQL that returns the top 10 countries by total revenue, with order count and average order value, sorted by revenue descending. Use the FROM 'file.csv' shorthand - DuckDB auto-detects the schema. Don't create a table.

What slides.md looks like

-- DuckDB auto-detects types and headers from the CSV
SELECT
    country,
    count(*)            AS orders,
    sum(amount)         AS revenue,
    round(avg(amount),2) AS avg_order_value
FROM 'orders.csv'
GROUP BY country
ORDER BY revenue DESC
LIMIT 10;

-- Override detection only if needed:
-- FROM read_csv('orders.csv', header = true, sample_size = 20_000)

One-line tweak

Prefix the whole thing with SUMMARIZE FROM 'orders.csv'; to get min/max/avg/null-rate per column before you write a single aggregate.

02

Join a CSV against a Parquet file

Join two files of different formats in one query - a CSV export against a Parquet dimension table - without loading either into a database first.

ForData engineers stitching a hand-exported CSV against a columnar dataset from the warehouse dump.

The prompt

Use the write-script-duckdb skill. Join ./events.csv (event_id, user_id, ts, action) against ./users.parquet (user_id, plan, signup_country) on user_id. Return action counts per plan tier for the last 30 days. Use read_csv() and read_parquet() in the same FROM clause. Files stay on disk; no CREATE TABLE.

What slides.md looks like

SELECT
    u.plan,
    e.action,
    count(*) AS n
FROM read_csv('events.csv') AS e
JOIN read_parquet('users.parquet') AS u
    ON e.user_id = u.user_id
WHERE e.ts >= now() - INTERVAL 30 DAY
GROUP BY u.plan, e.action
ORDER BY u.plan, n DESC;

One-line tweak

Swap the JOIN for a LEFT JOIN and add WHERE u.user_id IS NULL to find events from users missing in the dimension file - a one-line orphan check.

03

Read and flatten nested JSON

Parse a JSON or newline-delimited JSON file, extract nested fields, and unnest an array column into rows - all in SQL.

ForEngineers wrangling API dumps and log exports that arrive as JSON, not tidy tables.

The prompt

Use the write-script-duckdb skill. I have ./api_dump.json where each record has { id, customer: { name, tier }, items: [ { sku, qty } ] }. Write DuckDB SQL that returns one row per item with id, customer name, tier, sku, qty. Use read_json(), the -> / ->> JSON operators, and UNNEST for the items array.

What slides.md looks like

SELECT
    j.id,
    j.customer->>'name' AS customer_name,
    j.customer->>'tier' AS tier,
    item->>'sku'        AS sku,
    (item->>'qty')::INT AS qty
FROM read_json('api_dump.json') AS j,
     UNNEST(j.items) AS t(item);

-- For newline-delimited logs use read_ndjson('logs.ndjson')
-- or read_json(..., format = 'newline_delimited')

One-line tweak

If the file is one giant JSON array per line that auto-detection mis-reads, force it with read_json('api_dump.json', format = 'auto') or point at read_ndjson() for line-delimited logs.

04

Query a remote Parquet over HTTP

Run SQL against a Parquet file sitting on a URL or object store without downloading it first - DuckDB streams only the columns and row groups it needs.

ForAnyone exploring a public dataset, a shared S3 bucket, or a Hugging Face Parquet without wanting a 4 GB local copy.

The prompt

Use the write-script-duckdb skill. Query this remote Parquet over HTTPS: https://example.com/data/nyc_taxi_2026.parquet. Return average trip distance and total fare by payment_type, for trips over $50. Read it straight from the URL with read_parquet() - the httpfs extension auto-loads on first remote read, so no manual INSTALL is needed.

What slides.md looks like

-- httpfs auto-loads on the first https:// read
SELECT
    payment_type,
    count(*)               AS trips,
    round(avg(trip_distance), 2) AS avg_miles,
    round(sum(fare_amount), 2)   AS total_fares
FROM read_parquet('https://example.com/data/nyc_taxi_2026.parquet')
WHERE fare_amount > 50
GROUP BY payment_type
ORDER BY total_fares DESC;

-- Glob a folder of remote files and keep the source:
-- SELECT *, filename FROM read_parquet('s3://bucket/y=2026/*.parquet')

One-line tweak

Add filename = true (or SELECT *, filename) so each row carries which partition file it came from - invaluable when globbing a dated folder.

05

Window functions for running totals & ranks

Compute a 7-day moving average, a running total, and per-group ranks in one pass using DuckDB's window functions and QUALIFY.

ForAnalysts who keep reaching for Excel's drag-fill. Window functions do it correctly and reproducibly.

The prompt

Use the write-script-duckdb skill. From ./daily_revenue.csv (date, region, revenue), compute per region: the 7-day moving average of revenue, the running cumulative total, and rank each day within its region by revenue. Use AVG/SUM OVER with frame clauses and row_number() OVER. Use QUALIFY to also return only each region's single best day.

What slides.md looks like

SELECT
    date, region, revenue,
    round(avg(revenue) OVER w7, 2) AS revenue_7d_avg,
    sum(revenue)       OVER wrun   AS revenue_running,
    row_number()       OVER wrank  AS day_rank
FROM 'daily_revenue.csv'
WINDOW
    w7   AS (PARTITION BY region ORDER BY date
             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),
    wrun AS (PARTITION BY region ORDER BY date),
    wrank AS (PARTITION BY region ORDER BY revenue DESC)
QUALIFY day_rank = 1;   -- best day per region only

One-line tweak

Drop the QUALIFY line to keep every day with its rolling metrics; keep it to collapse to the one peak day per region. Same query, two deliverables.

06

PIVOT and UNPIVOT a table

Reshape long data to wide (a cross-tab) and back again using DuckDB's PIVOT / UNPIVOT statements - no hand-written CASE-WHEN columns.

ForAnyone building a quarter-by-region matrix for a deck, or normalizing a wide spreadsheet into long form for analysis.

The prompt

Use the write-script-duckdb skill. From ./sales.csv (country, year, population) build a cross-tab: one row per country, one column per year, cells = sum(population). Use DuckDB's PIVOT statement. Then show me the inverse with UNPIVOT to fold a wide year-columns table back into (country, year, population) long form.

What slides.md looks like

-- Wide cross-tab: years become columns
PIVOT (FROM 'sales.csv')
ON year
USING sum(population)
GROUP BY country;

-- Inverse: fold year columns back to long form
UNPIVOT wide_sales
ON "2024", "2025", "2026"
INTO
    NAME  year
    VALUE population;

One-line tweak

PIVOT discovers the distinct ON values automatically. To freeze a fixed set of columns (e.g. only 2024-2026), use the SQL-standard form: PIVOT sales ON year IN (2024, 2025, 2026) USING sum(population).

07

Export query results to Parquet

Run a transformation and write the result to a compressed, columnar Parquet file - the format every downstream warehouse and Polars/pandas job reads fastest.

ForData engineers producing a cleaned, typed artifact for a pipeline instead of re-deriving it from raw CSV every run.

The prompt

Use the write-script-duckdb skill. Read ./raw_events.csv, drop rows where user_id IS NULL, cast ts to TIMESTAMP, keep only the columns user_id, ts, action, and write the cleaned result to ./clean/events.parquet using COPY ... TO with FORMAT parquet. Then also show the PARTITION_BY variant that writes one folder per action.

What slides.md looks like

-- Single cleaned Parquet file
COPY (
    SELECT user_id, ts::TIMESTAMP AS ts, action
    FROM 'raw_events.csv'
    WHERE user_id IS NOT NULL
) TO 'clean/events.parquet' (FORMAT parquet);

-- Hive-partitioned: one sub-folder per action value
COPY (SELECT * FROM 'raw_events.csv')
TO 'clean/events' (FORMAT parquet, PARTITION_BY (action));

One-line tweak

Add COMPRESSION 'zstd' inside the options for a smaller file than the default Snappy, at a small write-time cost. Worth it for cold artifacts you read often.

08

Query a SQLite database in place

Attach an existing SQLite file and run analytical SQL across its tables - join them, aggregate them, without an ETL step into another store.

ForDevelopers whose app data lives in a SQLite file (mobile apps, local tools, Django/Rails dev DBs) who want fast analytics on it.

The prompt

Use the write-script-duckdb skill. I have ./app.sqlite with tables users(id, country) and sessions(user_id, started_at, duration_s). Attach it as a SQLite database and return median session duration and session count per country. Use INSTALL sqlite; LOAD sqlite; ATTACH ... (TYPE sqlite); then query the attached tables by qualified name.

What slides.md looks like

INSTALL sqlite;   -- once per environment
LOAD sqlite;
ATTACH 'app.sqlite' AS app (TYPE sqlite);

SELECT
    u.country,
    count(*)                        AS sessions,
    median(s.duration_s)            AS median_seconds
FROM app.sessions AS s
JOIN app.users    AS u ON u.id = s.user_id
GROUP BY u.country
ORDER BY sessions DESC;

One-line tweak

DuckDB ships ATTACH readers for SQLite, Postgres, and MySQL. Swap (TYPE sqlite) for (TYPE postgres) plus a connection string to run the same analytical SQL against a live Postgres without touching its write path.

09

COPY a CSV into a typed table

Define an explicit, typed schema and bulk-load a CSV into it with COPY - the path you want when types matter more than zero-setup convenience.

ForEngineers who need a column to be DECIMAL not DOUBLE, or a date parsed with a specific format, before any downstream query touches it.

The prompt

Use the write-script-duckdb skill. Create a typed DuckDB table 'transactions' (txn_id BIGINT, amount DECIMAL(12,2), currency VARCHAR, txn_date DATE) and bulk-load ./txns.csv into it with COPY ... FROM. The CSV has a header row and a non-default date format YYYY-MM-DD. After loading, return the row count and the sum by currency.

What slides.md looks like

CREATE TABLE transactions (
    txn_id   BIGINT,
    amount   DECIMAL(12,2),
    currency VARCHAR,
    txn_date DATE
);

COPY transactions FROM 'txns.csv'
    (HEADER true, DATEFORMAT '%Y-%m-%d');

SELECT currency, count(*) AS n, sum(amount) AS total
FROM transactions
GROUP BY currency
ORDER BY total DESC;

One-line tweak

Add (HEADER true, IGNORE_ERRORS true) to skip malformed rows instead of failing the whole load, then query the rejects table DuckDB records to see what it dropped.

10

Read an Excel sheet directly

Query an .xlsx workbook the same way you query a CSV - pick a sheet, treat row one as headers, and run SQL with no manual Save-As-CSV dance.

ForAnyone handed a spreadsheet by finance, ops, or a vendor who needs answers out of it without opening Excel.

The prompt

Use the write-script-duckdb skill. From ./budget.xlsx, read the sheet named 'FY26' with the first row as headers, and return total budget and total actual by department, plus the variance. Use read_xlsx() with the sheet and header options - DuckDB infers the Excel extension automatically, no manual INSTALL.

What slides.md looks like

SELECT
    department,
    sum(budget)            AS budget,
    sum(actual)            AS actual,
    sum(actual) - sum(budget) AS variance
FROM read_xlsx('budget.xlsx', sheet = 'FY26', header = true)
GROUP BY department
ORDER BY variance;

-- .xlsx only; .xls is not supported by the reader.

One-line tweak

Wrap the read in a COPY (...) TO 'budget_fy26.parquet' (FORMAT parquet) to snapshot the spreadsheet into a stable Parquet the rest of the pipeline can trust.

Community signal

Three voices from people using DuckDB for real work. The first is the time-to-value story; the second is the daily-default shift; the third names exactly who it clicks for.

Naively I started by looking at redshift, Athena/glue, etc. I spun my wheels for a week and barely made progress... So I decided to give DuckDB a shot and I got the whole project done in under a day.

moribvndvs · Hacker News

An engineer tasked with giving a non-technical team arbitrary queries over a daily Parquet drop - a week of fragile cloud PoCs versus a day on DuckDB.

Source
Over the past few years, I've found myself using DuckDB more and more for data processing, to the point where I now use it almost exclusively

Robin Linacre · Blog

Data scientist Robin Linacre on why DuckDB became his default tool for data processing - the conviction shift the cookbook is built around.

Source
I'm sure the use of duckdb may seem weird for normal developers, but for data people it really is game-changing, especially for data scientists or business analysts.

rubenvanwyk · Hacker News

A commenter naming exactly who DuckDB clicks for - analysts and data scientists - even when it looks unusual to app developers.

Source

The contrarian take

DuckDB earns real skepticism when people try to make it something it isn’t. The sharpest version, from sradman on Hacker News:

I would not adopt DuckDB as an OLTP replacement for either SQLite or PostgreSQL without a strong OLAP component and some serious performance/load/integrity testing

sradman · Hacker News

From a Hacker News thread on DuckDB's positioning.

Source

DuckDB was never meant to replace your OLTP store - it is an in-process OLAP engine. The right architecture is Postgres (or your app DB) for writes, DuckDB for analytics over the exports. The maturity worry is also dated: this comment predates DuckDB 1.0. The skill in this cookbook leans into that split on purpose: every recipe is a read-heavy analytical query over files or an attached store, never a transactional write path. If you find yourself wanting concurrent writers, row-level locking, or a long-running server, you want Postgres — and you can still point DuckDB at it (use case 8) for the analytics.

One related framing worth naming: the “DuckDB vs Polars” debate is mostly a false choice. As one commenter put it, “Polars and duckdb interoperate so it’s not really one or the other.” Both build on Apache Arrow, so a frame moves between them with no copy. Pick SQL when the work is joins and windows; pick the dataframe API when you want Python-side logic. The cookbook pairs them deliberately.

Where DuckDB ships

Concrete examples of DuckDB in production and notable projects. None used the Claude skill specifically — they’re here to show the range of work the engine carries, so you have a sense of how far the in-process model reaches before you write your first query.

Gotchas (the four that bite)

Sourced from the DuckDB documentation and the duckdb/duckdb issue tracker.

It is single-process, not a concurrent server

DuckDB is embedded: one writer at a time, no network protocol, no replication. It's for analytics, not for serving an app's concurrent writes. That's a feature, not a bug - but plan your architecture around it.

Large joins can exhaust memory

DuckDB spills to disk, but a huge join or aggregation can still blow past RAM. Set a memory_limit pragma, prefer Parquet over CSV (it reads only needed columns), and filter early in the query.

CSV type detection samples, then commits

Auto-detection reads a sample (default ~20k rows) to infer types. A rare string in an otherwise-numeric column further down can break the read. Bump sample_size, or define columns explicitly when the data is messy.

read_xlsx is .xlsx only

The Excel reader handles .xlsx, not the legacy .xls binary format. Convert the workbook to .xlsx first, or save the sheet as CSV - then the rest of the cookbook applies unchanged.

Pairs well with

Curated to match the cookbook’s actual integrations: the data-adjacent skills (data-analysis for Polars, sql, xlsx, matplotlib) plus the SQL MCP servers the later use cases lean on when the source data lives in a real database. There is also a DuckDB MCP server and a MotherDuck & DuckDB server if you want the engine running as a shared service rather than authoring SQL with the skill.

Two posts that compose well with this cookbook: What are Claude Code skills? covers the underlying mechanism, and Claude Code best practices covers the orchestration patterns the multi-file recipes (2, 4, 7) lean on.

Frequently asked questions

Do I need to set up a database to use the DuckDB skill?

No. That is the whole point of DuckDB - it is an in-process engine, so there is no server to start and no schema to define. The skill writes SQL that reads CSV, Parquet, JSON, and Excel files straight off disk or a URL with the FROM 'file.ext' shorthand. You only CREATE TABLE when you deliberately want a typed, persistent schema (use case 9).

Can DuckDB query Parquet and CSV files without importing them first?

Yes - read_csv(), read_parquet(), read_json(), and read_xlsx() (plus the FROM 'file.ext' shorthand) let you run analytical SQL directly against files. DuckDB only reads the columns and row groups your query needs, so a query over a multi-GB Parquet does not load the whole file. Use cases 1, 2, and 4 cover this.

Does the DuckDB skill handle remote files over HTTP or S3?

Yes. DuckDB's httpfs extension auto-loads on the first read of an https:// or s3:// path - no manual INSTALL needed - so read_parquet('https://.../file.parquet') works as-is. Glob a folder (s3://bucket/*.parquet) and add filename=true to keep the source file per row. Use case 4 walks through it.

DuckDB vs Polars - which should I use?

They interoperate rather than compete; both build on Apache Arrow, so you can move a frame between them with no copy. Reach for DuckDB when the work is naturally SQL - joins across files, window functions, PIVOT. Reach for Polars (the data-analysis skill) when you want a Python dataframe API for plotting or row-wise logic. Most real pipelines use both.

Is DuckDB a replacement for Postgres?

No, and it is not trying to be. Postgres is an OLTP database built for many concurrent writers and transactional integrity; DuckDB is an in-process OLAP engine built for fast analytical reads. The clean pattern is Postgres for your application's writes and DuckDB for analytics over the exports - DuckDB can even ATTACH a live Postgres (use case 8) to query it without touching the write path.

Can DuckDB query a SQLite database directly?

Yes. INSTALL sqlite; LOAD sqlite; ATTACH 'app.sqlite' AS app (TYPE sqlite); then query app.table_name with full DuckDB SQL - joins, window functions, aggregates - against the SQLite tables in place. The same ATTACH mechanism works for Postgres and MySQL with (TYPE postgres) / (TYPE mysql). Use case 8 has the full script.

Is there a DuckDB MCP server, or should I use the skill?

Both exist on mcp.directory. The skill writes DuckDB SQL and costs ~100 tokens at idle - the lighter choice when you run the CLI yourself. The DuckDB MCP server (and MotherDuck & DuckDB) runs the engine as a service so multiple AI clients can query a shared instance, at the cost of tool schemas loaded on every turn. Use the server for shared state; the skill for authoring queries.

Sources

Primary

Community

Critical and contrarian

Internal

Keep reading