Query remote Parquet files, local CSVs, org-mode tables, and Elisp data structures with SQL. Get results as native Elisp values. One function call, no intermediate files, no external scripts.
Query Remote Data, Filter with Elisp, Get Native Structures

Query joining a 336k-row remote Parquet file with an Elisp alist:

(let ((carriers '(((code . "UA") (name . "United Airlines"))
                  ((code . "AA") (name . "American Airlines"))
                  ((code . "DL") (name . "Delta Air Lines")))))
  (duckdb-query
   "SELECT c.name as airline,
           {'flights': COUNT(*),
            'avg_delay': ROUND(AVG(f.arr_delay), 1)} as stats
    FROM 'https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet' f
    JOIN @carriers c ON f.carrier = c.code
    GROUP BY c.name
    ORDER BY COUNT(*) DESC"
   :data `((carriers . ,carriers))
   :format :columnar))

Result (columnar format with nested structs):

((airline . ["United Airlines" "Delta Air Lines" "American Airlines"])
 (stats . [((flights . 58665) (avg_delay . 3.6))
           ((flights . 48110) (avg_delay . 1.6))
           ((flights . 32729) (avg_delay . 0.4))]))

One function call: remote Parquet + Elisp filter → nested Elisp structures
DuckDB is an embedded analytical database that reads Parquet, CSV, and JSON
directly from local paths, HTTP URLs, or cloud storage. This package makes
DuckDB’s capabilities available to Elisp programs. The duckdb-query function
executes SQL and returns alists, plists, vectors, or org-table-compatible lists.
Results are ready for mapcar, seq-filter, cl-loop, or direct insertion
into buffers.
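Results in the default alist format drop straight into ordinary sequence processing. A small sketch (the query is sample data; duckdb-query requires the duckdb CLI to be installed):

```elisp
;; Filter rows by score, then project the names -- plain seq-filter
;; and mapcar over the list-of-alists result shape described above.
(let ((rows (duckdb-query
             "SELECT 'Alice' AS name, 95 AS score
              UNION ALL SELECT 'Bob', 87
              UNION ALL SELECT 'Carol', 72")))
  (mapcar (lambda (row) (alist-get 'name row))
          (seq-filter (lambda (row) (> (alist-get 'score row) 80))
                      rows)))
;; => ("Alice" "Bob")
```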
The package treats all data sources uniformly. Elisp alists become SQL tables
via the :data parameter. Named org tables in your buffer resolve through
@org:table-name syntax. These combine freely with file paths and URLs in
joins, unions, and subqueries. A query can join a remote Parquet file with a
ten-row org table and a runtime Elisp variable, returning the result as an
alist you can immediately process.
| Code | Result |
|---|---|
;; List of alists (default)
(duckdb-query
"SELECT 'Alice' as name, 95 as score
UNION ALL SELECT 'Bob', 87") |
(((name . "Alice") (score . 95))
((name . "Bob") (score . 87))) |
;; Org-table format for display
(duckdb-query
"SELECT * FROM range(1,4) t(n),
(SELECT unnest(['a','b']) as letter)"
:format :org-table) |
((n letter)
 (1 "a") (2 "a") (3 "a")
 (1 "b") (2 "b") (3 "b")) |
;; Single value extraction
(duckdb-query-value
"SELECT COUNT(*) FROM range(1000)") |
1000 |
;; Single column as list
(duckdb-query-column
"SELECT unnest(
['red','green','blue']) as color") |
("red" "green" "blue") |
;; Columnar format for analysis
(duckdb-query
"SELECT * FROM range(5) t(n)"
:format :columnar) |
((n . [0 1 2 3 4])) |
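The columnar format pairs naturally with vector arithmetic. A sketch computing a column mean, assuming the ((n . [0 1 2 3 4])) result shape shown above:

```elisp
;; Mean of a column vector from :format :columnar.
(let* ((cols (duckdb-query "SELECT * FROM range(5) t(n)"
                           :format :columnar))
       (n (alist-get 'n cols)))   ; the column vector, e.g. [0 1 2 3 4]
  (/ (float (seq-reduce #'+ n 0)) (length n)))
;; => 2.0
```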
- Querying Elisp Data
- Querying Org Tables
- Combining Sources
- Output Formats
- Schema Introspection
- Database Persistence
- (NEW) Session-Based Execution
- Nested Types
- Performance
- Acknowledgements
- Related Packages
- Integration
Pass Elisp alists to queries via the :data parameter. The data becomes a table
named @data:
| Code | Result |
|---|---|
(duckdb-query
"SELECT name, score * 1.1 as curved
FROM @data
WHERE score > 85"
:data '(((name . "Aly") (score . 95))
((name . "Bob") (score . 87))
((name . "Carol") (score . 72)))) |
(((name . "Aly") (curved . 104.5))
((name . "Bob") (curved . 95.7))) |
For multiple tables, use named bindings with backquote to capture lexical variables:
| Code | Result |
|---|---|
(let ((users ;; users table
'(((id . 1) (name . "Alice"))
((id . 2) (name . "Bob"))))
(orders ;; orders table
'(((user_id . 1) (item . "Widget"))
((user_id . 2) (item . "Gadget"))
((user_id . 1) (item . "Gizmo")))))
(duckdb-query
"SELECT u.name, COUNT(*) as count
FROM @users u
JOIN @orders o ON u.id = o.user_id
GROUP BY u.name"
:data `((users . ,users)
(orders . ,orders)))) |
(((name . "Alice") (count . 2))
((name . "Bob") (count . 1))) |
This enables joining runtime Elisp state with static file data. Previously this required exporting to CSV, running an external query, and parsing results. Now it is one expression.
Reference named org tables with @org:table-name. Given this table in your
buffer:
#+NAME: inventory
| sku   | product | stock | price |
|-------+---------+-------+-------|
| A-100 | Widget  |    50 |  9.99 |
| B-200 | Gadget  |    30 | 24.99 |
| C-300 | Gizmo   |    75 | 14.99 |
| Code | Result |
|---|---|
(duckdb-query
"SELECT product,
stock * price as value
FROM @org:inventory
ORDER BY value DESC"
:format :org-table) |
((product value)
 ("Gizmo" 1124.25)
 ("Gadget" 749.7)
 ("Widget" 499.5)) |
Reference tables in other files with @org:path/to/file.org:table-name.
Join an org table with Elisp data:
| Code | Result |
|---|---|
(let ((sales '(((sku . "A-100") (qty . 10))
((sku . "A-100") (qty . 5))
((sku . "C-300") (qty . 20)))))
(duckdb-query
"SELECT i.product,
SUM(s.qty) as sold,
i.stock - SUM(s.qty) as remaining
FROM @org:inventory i
JOIN @sales s ON i.sku = s.sku
GROUP BY i.product, i.stock"
:data `((sales . ,sales)))) |
(((product . "Widget")
(sold . 15)
(remaining . 35))
((product . "Gizmo")
(sold . 20)
(remaining . 55))) |
Join a remote Parquet file with local Elisp data:
(let ((airports '(((code . "JFK")
(name . "JFK International"))
((code . "LGA")
(name . "LaGuardia")))))
(duckdb-query
"SELECT a.name,
COUNT(*) as departures,
ROUND(AVG(f.dep_delay), 1) as avg_delay
FROM 'https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet' as f
JOIN @airports a ON f.origin = a.code
GROUP BY a.name
ORDER BY departures DESC"
:data `((airports . ,airports))))
;; => (((name . "JFK International") (departures . 111279) (avg_delay . 12.1))
;;    ((name . "LaGuardia") (departures . 104662) (avg_delay . 10.3)))

The :format parameter controls result structure:
| Format | Structure | Use Case |
|---|---|---|
| :alist | List of alists | General Elisp processing |
| :plist | List of plists | Keyword-based access |
| :hash | List of hash tables | O(1) field lookup |
| :vector | Vector of alists | Index-based access |
| :columnar | Alist of column vectors | Numerical analysis |
| :org-table | List of lists with header | Display, org-mode integration |
| :raw | JSON string | Custom parsing |
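For example, :plist results support keyword lookup with plist-get. A sketch; that the keys serialize as keywords (:name, :score) is inferred from the "Keyword-based access" description above:

```elisp
;; Keyword-based field access on a :plist result.
;; Key style (:score rather than score) is an assumption.
(let ((rows (duckdb-query
             "SELECT 'Alice' AS name, 95 AS score"
             :format :plist)))
  (plist-get (car rows) :score))
```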
(duckdb-query-column-types "SELECT 1 as id, 'text' as name, 3.14 as value")
;; => (("id" . "INTEGER") ("name" . "VARCHAR") ("value" . "DECIMAL(3,2)"))

(duckdb-query-columns
"https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet")
;; => ("year" "month" "day" "dep_time" "sched_dep_time" "dep_delay" "arr_time"
;; "sched_arr_time" "arr_delay" "carrier" "flight" "tailnum" "origin" "dest"
;;    "air_time" "distance" "hour" "minute" "time_hour")

Tip
New in 0.7.0: duckdb-query-with-database accepts an optional spec list for
additional options. Both (duckdb-query-with-database "path" ...) and
(duckdb-query-with-database '("path" :readonly t) ...) are supported.
For workflows requiring state across queries, bind a database file:
| Code | Result |
|---|---|
(duckdb-query-with-transient-database
(duckdb-query
"CREATE TABLE events (id INT, name TEXT)")
(duckdb-query
"INSERT INTO events VALUES
(1, 'click'), (2, 'scroll')")
(duckdb-query "SELECT * FROM events")) |
(((id . 1) (name . "click"))
((id . 2) (name . "scroll"))) |
duckdb-query-with-transient-database creates a temporary file deleted after the
block. Use duckdb-query-with-database for persistent files:
(duckdb-query-with-database '("analytics.duckdb")
  (duckdb-query "SELECT COUNT(*) FROM events"))

The spec list accepts keyword options:
;; Open read-only to prevent write conflicts
(duckdb-query-with-database '("/shared/data.db" :readonly t)
(duckdb-query "SELECT * FROM reports"))
;; With variable path
(let ((db-path "~/project/app.db"))
(duckdb-query-with-database `(,db-path)
    (duckdb-query "SELECT * FROM users")))

These macros work with CLI execution. For persistent processes with lower latency, see “Session-Based Execution” below.
For repeated queries or workflows requiring persistent state, sessions eliminate DuckDB process spawn overhead (~10-20ms per query). A session maintains a long-running DuckDB process with its own temporary database:
;; Initialize sessions
(duckdb-query-session-start "work")
(duckdb-query-session-start "other-session")
(let ((result-set
(list :work-results
(duckdb-query-with-session "work"
(duckdb-query "CREATE TABLE events AS SELECT i, random() AS value FROM range(1000) t(i)")
(duckdb-query "SELECT AVG(value) as average FROM events"))
:other-results
(duckdb-query-with-session "other-session"
(duckdb-query "CREATE TABLE logs AS SELECT i, random() AS value FROM range(10) t(i)")
(duckdb-query "SELECT AVG(value) as average FROM logs"))
:back-to-work
;; Session state persists across scopes
(duckdb-query-with-session "work"
(duckdb-query "CREATE TABLE new_table AS SELECT i, random() AS value FROM range(100) t(i)")
(duckdb-query "SELECT AVG(value) as average FROM new_table")))))
;; Clean up when done
(duckdb-query-session-kill "work")
(duckdb-query-session-kill "other-session")
result-set)

For temporary sessions that clean up automatically:
(duckdb-query-with-transient-session
(duckdb-query "CREATE TABLE scratch AS SELECT random() as val FROM range(5)")
(duckdb-query "SELECT AVG(val) as mean FROM scratch"))
;; => (((mean . 0.33628298988435995)))

Sessions provide a 7-10x speedup for simple queries:
| Executor | Simple Query | 10k Rows |
|---|---|---|
| :cli | ~22ms | ~93ms |
| :session | ~2ms | ~20ms |
Attach external database files to a session for cross-database queries.
duckdb-query-with-database works seamlessly inside session scope:
(duckdb-query-session-start "analysis")
(unwind-protect
(duckdb-query-with-session "analysis"
;; Create local session table
(duckdb-query "CREATE TABLE local_summary AS SELECT 'session' as src")
;; Attach external database - automatically becomes default catalog
(duckdb-query-with-database '("/data/sales.db" :readonly t)
;; Unqualified names resolve to attached database
(duckdb-query "SELECT * FROM orders LIMIT 5")))
  (duckdb-query-session-kill "analysis"))

The macro handles catalog context automatically:
- ATTACHes the database with the specified read-only mode
- Executes USE alias.main to make it the default catalog
- Restores the previous catalog context after the block
- DETACHes the database on exit
For manual control, use duckdb-query-session-attach and duckdb-query-session-detach:
(duckdb-query-session-start "work")
(unwind-protect
(duckdb-query-with-session "work"
(let ((alias (duckdb-query-session-attach "work" "/data/sales.db" nil t)))
;; Access with qualified names: alias.table_name
(duckdb-query (format "SELECT COUNT(*) FROM %s.orders" alias))
(duckdb-query-session-detach "work" alias)))
  (duckdb-query-session-kill "work"))

duckdb-query-with-transient-database also adapts to session context:
(duckdb-query-with-session "work"
(duckdb-query "CREATE TABLE permanent AS SELECT 'stays' as val")
(duckdb-query-with-transient-database
;; Temp database attached and set as default catalog
(duckdb-query "CREATE TABLE scratch AS SELECT 'gone' as val")
(duckdb-query-value "SELECT val FROM scratch"))
;; => "gone"
;; Back to session context, scratch is gone
(duckdb-query-value "SELECT val FROM permanent"))
;; => "stays"

List active sessions:
(duckdb-query-session-list)
;; => (#s(duckdb-session :name "analytics" :status active :query-count 42 ...))

Interactive commands:
- M-x duckdb-query-session-display-status: show all sessions with statistics
- M-x duckdb-query-session-kill-interactive: kill a session with completion
- M-x duckdb-query-session-kill-all: kill all sessions
Session initialization commands run when sessions start:
(setq duckdb-query-session-init-commands
'("SET memory_limit = '4GB'"
"INSTALL spatial"
    "LOAD spatial"))

DuckDB supports STRUCT, LIST, MAP, and ARRAY types. These serialize correctly as nested Elisp structures:
| Code | Result |
|---|---|
(duckdb-query
"SELECT
{'x': 1, 'y': 2}::STRUCT(x INT, y INT)
as point") |
(((point (x . 1) (y . 2)))) |
;; Nested data survives roundtrips
(let ((data
'(((id . 1) (loc (x . 10) (y . 20)))
((id . 2) (loc (x . 30) (y . 40))))))
(duckdb-query
"SELECT id, loc.x + loc.y as sum
FROM @data"
:data data)) |
(((id . 1) (sum . 30))
((id . 2) (sum . 70))) |
The package writes results to temporary files by default, providing a 3-5x speedup on large results compared to streaming through stdout:
| Result Size | Default (file) | Streaming (pipe) | Speedup |
|-------------+----------------+------------------+---------|
| 10k rows    | 64ms           | 162ms            | 2.5x    |
| 100k rows   | 366ms          | 1.28s            | 3.5x    |
| 500k rows   | 1.32s          | 6.19s            | 4.7x    |
Force streaming mode with :output-via :pipe when needed. DDL statements
automatically use streaming since they cannot be wrapped in file output.
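Forcing pipe mode is a one-keyword change. A sketch, assuming :output-via combines freely with the other duckdb-query keywords shown above:

```elisp
;; Stream a small result through stdout instead of a temporary file.
;; Sensible when the result is small or temp-file I/O is undesirable.
(duckdb-query "SELECT * FROM range(10) t(n)"
              :output-via :pipe
              :format :columnar)
```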
Profile your specific queries with the benchmark module:
| Code | Result |
|---|---|
(duckdb-query-bench-output-strategies
"SELECT * FROM range(10000) t(i)"
:iterations 3) |
| strategy | mean  | min  | max  | n |
|----------|-------|------|------|---|
| :file    | 22ms  | 21ms | 23ms | 3 |
| :pipe    | 28ms  | 27ms | 29ms | 3 |
| :speedup | 1.27x |      |      |   | |
Other benchmark functions: duckdb-query-bench-formats (compare output formats),
duckdb-query-bench-nested-types (compare nested type handling),
duckdb-query-bench-data-formats (compare :data serialization).
- cl-duckdb by Ákos Kiss: The Common Lisp DuckDB wrapper that inspired this
  package’s design philosophy. The approach of treating query results as native
  Lisp data structures, the with-static-table pattern for querying Lisp data
  directly, and the with-transient-connection / with-default-connection macros
  for database context management all informed the design of duckdb-query’s
  :data parameter, @org: references, and the duckdb-query-with-transient-database /
  duckdb-query-with-database forms.
- ob-duckdb: Org Babel integration for DuckDB source blocks. Most of the
  functionality here will soon be ported to it via a specialized
  ob-duckdb-query executor, enabling org-table references in duckdb-sql source
  blocks and new output types.
Warning
UNDER CONSTRUCTION
As mentioned above, sessions allow low-latency execution, going as low as 2ms
for frequent, repeated small queries such as batched updates/inserts and
single-row extraction. Add persistent session processes that can communicate
with each other via database ATTACH/DETACH or external file input/output, and
we have the foundation for database-backed major modes, worker pools, and
orchestrated data pipelines within Emacs.
The duckdb-query-session-setup-buffer helper attaches a session to the current
buffer. It creates the session if needed, registers the buffer as an owner, and
sets up the buffer-local binding so all duckdb-query calls within that buffer
automatically route to the session. The session persists while any owning buffer
exists and is killed when the last owner closes:
(add-hook 'my-data-mode-hook
(lambda ()
(duckdb-query-session-setup-buffer "my-data-session")))
;; All my-data-mode buffers share one session
;; Session killed when last my-data-mode buffer closes

This pattern enables major modes that maintain persistent analytical state. A dashboard mode might cache expensive aggregations in session tables and refresh instantly on demand:
(define-derived-mode my-dashboard-mode special-mode "Dashboard"
(duckdb-query-session-setup-buffer "dashboard")
(setq-local revert-buffer-function #'my-dashboard-refresh))
(defun my-dashboard-refresh (&rest _)
"Refresh dashboard from session's cached aggregations."
(let ((data (duckdb-query "SELECT * FROM cached_metrics")))
    (my-dashboard-render data)))

Three hooks provide visibility into session activity, enabling logging, metrics collection, and cross-session coordination:
;; Log all queries with timing
(add-hook 'duckdb-query-session-query-executed-functions
(lambda (name query duration-ms)
(message "[%s] %.1fms: %s"
name duration-ms
(truncate-string-to-width query 60))))
;; Track session lifecycle
(add-hook 'duckdb-query-session-created-functions
(lambda (name _session)
(message "Session started: %s" name)))
(add-hook 'duckdb-query-session-killed-functions
(lambda (name)
(message "Session ended: %s" name)))These hooks will be essential for building a coordinator layer that manages worker sessions, tracks query performance across the system, and handles failover when sessions die unexpectedly.
This infrastructure opens up architectural patterns that go beyond single-query workflows. The ultimate goal is to enable data-centric Emacs packages and tools that can:
- quickly retrieve tabular data from most file types or from complex queries
- perform tens to hundreds of CRUD operations per second
- manage and orchestrate data-processing jobs across multiple workers from a central coordinator
- leverage graph queries, geospatial processing, vector similarity search, and other techniques inside Emacs
Some directions still in design:
- Database-backed buffers: Major modes where buffer content is backed by session tables, enabling SQL-powered search, filter, and transform operations that would be impractical with pure Elisp
- Worker pools: Multiple sessions processing jobs in parallel, coordinated
  through a shared database via ATTACH or message-passing through Elisp
- Live dashboards: Buffers that subscribe to query results and refresh automatically when underlying data changes
I’ll update this section as these experiments mature into usable packages.