duckdb-query.el - Query Anything, Get Elisp

Query remote Parquet files, local CSVs, org-mode tables, and Elisp data structures with SQL. Get results as native Elisp values. One function call, no intermediate files, no external scripts.

Query Remote Data, Filter with Elisp, Get Native Structures

Query joining 336k-row remote Parquet with Elisp alist:

(let ((carriers '(((code . "UA") (name . "United Airlines"))
                  ((code . "AA") (name . "American Airlines"))
                  ((code . "DL") (name . "Delta Air Lines")))))
  (duckdb-query
   "SELECT c.name as airline,
           {'flights': COUNT(*),
            'avg_delay': ROUND(AVG(f.arr_delay), 1)} as stats
    FROM 'https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet' f
    JOIN @carriers c ON f.carrier = c.code
    GROUP BY c.name
    ORDER BY COUNT(*) DESC"
   :data `((carriers . ,carriers))
   :format :columnar))

Result (columnar format with nested structs):

((airline . ["United Airlines" "Delta Air Lines" "American Airlines"])
 (stats . [((flights . 58665) (avg_delay . 3.6))
           ((flights . 48110) (avg_delay . 1.6))
           ((flights . 32729) (avg_delay . 0.4))]))

One function call: remote Parquet + Elisp filter → nested Elisp structures

DuckDB is an embedded analytical database that reads Parquet, CSV, and JSON directly from local paths, HTTP URLs, or cloud storage. This package makes DuckDB’s capabilities available to Elisp programs. The duckdb-query function executes SQL and returns alists, plists, vectors, or org-table-compatible lists. Results are ready for mapcar, seq-filter, cl-loop, or direct insertion into buffers.

The package treats all data sources uniformly. Elisp alists become SQL tables via the :data parameter. Named org tables in your buffer resolve through @org:table-name syntax. These combine freely with file paths and URLs in joins, unions, and subqueries. A query can join a remote Parquet file with a ten-row org table and a runtime Elisp variable, returning the result as an alist you can immediately process.

Quick Examples

Code	Result
;; List of alists (default) (duckdb-query "SELECT 'Alice' as name, 95 as score UNION ALL SELECT 'Bob', 87")	(((name . "Alice") (score . 95)) ((name . "Bob") (score . 87)))
;; Org-table format for display (duckdb-query "SELECT * FROM range(1,4) t(n), (SELECT unnest(['a','b']) as letter)" :format :org-table)	, rendered data ,\| n \| letter \| ((n letter) ,\|---\|--------\| (1 "a") (2 "a") ,\| 1 \| a \| (3 "a") (1 "b") ,\| 1 \| b \| (2 "b") (3 "b")) ,\| 2 \| a \| ,\| 2 \| b \| ,\| 3 \| a \| ,\| 3 \| b \|
;; Single value extraction (duckdb-query-value "SELECT COUNT(*) FROM range(1000)")	1000
;; Single column as list (duckdb-query-column "SELECT unnest( ['red','green','blue']) as color")	("red" "green" "blue")
;; Columnar format for analysis (duckdb-query "SELECT * FROM range(5) t(n)" :format :columnar)	((n . [0 1 2 3 4]))

Querying Elisp Data

Pass Elisp alists to queries via the :data parameter. The data becomes a table named @data:

Code

Result

(duckdb-query
 "SELECT name, score * 1.1 as curved
  FROM @data
  WHERE score > 85"
 :data '(((name . "Aly") (score . 95))
         ((name . "Bob") (score . 87))
         ((name . "Carol") (score . 72))))

(((name . "Aly") (curved . 104.5))
 ((name . "Bob") (curved . 95.7)))

For multiple tables, use named bindings with backquote to capture lexical variables:

Code

Result

 (let ((users ;; users table
       '(((id . 1) (name . "Alice"))
         ((id . 2) (name . "Bob"))))
      (orders ;; orders table
       '(((user_id . 1) (item . "Widget"))
         ((user_id . 2) (item . "Gadget"))
         ((user_id . 1) (item . "Gizmo")))))
  (duckdb-query
   "SELECT u.name, COUNT(*) as count
    FROM @users u
    JOIN @orders o ON u.id = o.user_id
    GROUP BY u.name"
   :data `((users . ,users)
           (orders . ,orders))))

(((name . "Alice") (count . 2))
 ((name . "Bob") (count . 1)))

This enables joining runtime Elisp state with static file data. Previously this required exporting to CSV, running an external query, and parsing results. Now it is one expression.

Querying Org Tables

Reference named org tables with @org:table-name. Given this table in your buffer:

Table
,#+NAME: inventory ,\| sku \| product \| stock \| price \| ,\|-------+---------+-------+-------\| ,\| A-100 \| Widget \| 50 \| 9.99 \| ,\| B-200 \| Gadget \| 30 \| 24.99 \| ,\| C-300 \| Gizmo \| 75 \| 14.99 \|

Code

Result

(duckdb-query
 "SELECT product,
    stock * price as value
  FROM @org:inventory
  ORDER BY value DESC"
 :format :org-table)

,| product |   value | ((product value)
,|---------+---------|  ("Gizmo" 1124.25)
,| Gizmo   | 1124.25 |  ("Gadget" 749.7)
,| Gadget  |   749.7 |  ("Widget" 499.5))
,| Widget  |   499.5 |

Reference tables in other files with @org:path/to/file.org:table-name.

Combining Sources

Join an org table with Elisp data:

Code

Result

(let ((sales '(((sku . "A-100") (qty . 10))
               ((sku . "A-100") (qty . 5))
               ((sku . "C-300") (qty . 20)))))
  (duckdb-query
   "SELECT i.product,
           SUM(s.qty) as sold,
           i.stock - SUM(s.qty) as remaining
    FROM @org:inventory i
    JOIN @sales s ON i.sku = s.sku
    GROUP BY i.product, i.stock"
   :data `((sales . ,sales))))

(((product . "Widget")
  (sold . 15)
  (remaining . 35))
 ((product . "Gizmo")
  (sold . 20)
  (remaining . 55)))

Join a remote Parquet file with local Elisp data:

 (let ((airports '(((code . "JFK")
                   (name . "JFK International"))
                  ((code . "LGA")
                   (name . "LaGuardia")))))
  (duckdb-query
   "SELECT a.name,
           COUNT(*) as departures,
           ROUND(AVG(f.dep_delay), 1) as avg_delay
    FROM 'https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet' as f
    JOIN @airports a ON f.origin = a.code
    GROUP BY a.name
    ORDER BY departures DESC"
   :data `((airports . ,airports))))

;; => (((name . "JFK International") (departures . 111279) (avg_delay . 12.1))
;;     ((name . "LaGuardia") (departures . 104662) (avg_delay . 10.3)))

Output Formats

The :format parameter controls result structure:

Format	Structure	Use Case
`:alist`	List of alists	General Elisp processing
`:plist`	List of plists	Keyword-based access
`:hash`	List of hash-tables	O(1) field lookup
`:vector`	Vector of alists	Index-based access
`:columnar`	Alist of column vectors	Numerical analysis
`:org-table`	List of lists with header	Display, org-mode integration
`:raw`	JSON string	Custom parsing

Schema Introspection

Discover structure before querying:

(duckdb-query-column-types "SELECT 1 as id, 'text' as name, 3.14 as value")
;; => (("id" . "INTEGER") ("name" . "VARCHAR") ("value" . "DECIMAL(3,2)"))

(duckdb-query-columns
 "https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet")

;; => ("year" "month" "day" "dep_time" "sched_dep_time" "dep_delay" "arr_time"
;;     "sched_arr_time" "arr_delay" "carrier" "flight" "tailnum" "origin" "dest"
;;     "air_time" "distance" "hour" "minute" "time_hour")

Database Persistence

Tip

New in 0.7.0: duckdb-query-with-database accepts an optional spec list for additional options. Both (duckdb-query-with-database "path" ...) and (duckdb-query-with-database '("path" :readonly t) ...) are supported.

For workflows requiring state across queries, bind a database file:

Code

Result

(duckdb-query-with-transient-database
  (duckdb-query
   "CREATE TABLE events (id INT, name TEXT)")
  (duckdb-query
   "INSERT INTO events VALUES
    (1, 'click'), (2, 'scroll')")
  (duckdb-query "SELECT * FROM events"))

(((id . 1) (name . "click"))
 ((id . 2) (name . "scroll")))

duckdb-query-with-transient-database creates a temporary file deleted after the block. Use duckdb-query-with-database for persistent files:

(duckdb-query-with-database '("analytics.duckdb")
  (duckdb-query "SELECT COUNT(*) FROM events"))

The spec list accepts keyword options:

;; Open read-only to prevent write conflicts
(duckdb-query-with-database '("/shared/data.db" :readonly t)
  (duckdb-query "SELECT * FROM reports"))

;; With variable path
(let ((db-path "~/project/app.db"))
  (duckdb-query-with-database `(,db-path)
    (duckdb-query "SELECT * FROM users")))

These macros work with CLI execution. For persistent processes with lower latency, see “Session-Based Execution” below.

Session-Based Execution

For repeated queries or workflows requiring persistent state, sessions eliminate DuckDB process spawn overhead (~10-20ms per query). A session maintains a long-running DuckDB process with its own temporary database:

;; Initialize sessions
(duckdb-query-session-start "work")
(duckdb-query-session-start "other-session")

(let ((result-set
       (list :work-results
             (duckdb-query-with-session "work"
               (duckdb-query "CREATE TABLE events AS SELECT i, random() AS value FROM range(1000) t(i)")
               (duckdb-query "SELECT AVG(value) as average FROM events"))
             :other-results
             (duckdb-query-with-session "other-session"
               (duckdb-query "CREATE TABLE logs AS SELECT i, random() AS value FROM range(10) t(i)")
               (duckdb-query "SELECT AVG(value) as average FROM logs"))
             :back-to-work
             ;; Session state persists across scopes
             (duckdb-query-with-session "work"
               (duckdb-query "CREATE TABLE new_table AS SELECT i, random() AS value FROM range(100) t(i)")
               (duckdb-query "SELECT AVG(value) as average FROM new_table")))))

  ;; Clean up when done
  (duckdb-query-session-kill "work")
  (duckdb-query-session-kill "other-session")
  result-set)

For temporary sessions that clean up automatically:

(duckdb-query-with-transient-session
  (duckdb-query "CREATE TABLE scratch AS SELECT random() as val FROM range(5)")
  (duckdb-query "SELECT AVG(val) as mean FROM scratch"))
;; => (((mean . 0.33628298988435995)))

Sessions provide 7-10x speedup for simple queries:

Executor	Simple Query	10k Rows
:cli	~22ms	~93ms
:session	~2ms	~20ms

Attaching External Databases

Attach external database files to a session for cross-database queries. duckdb-query-with-database works seamlessly inside session scope:

(duckdb-query-session-start "analysis")
(unwind-protect
    (duckdb-query-with-session "analysis"
      ;; Create local session table
      (duckdb-query "CREATE TABLE local_summary AS SELECT 'session' as src")

      ;; Attach external database - automatically becomes default catalog
      (duckdb-query-with-database '("/data/sales.db" :readonly t)
        ;; Unqualified names resolve to attached database
        (duckdb-query "SELECT * FROM orders LIMIT 5")))
  (duckdb-query-session-kill "analysis"))

The macro handles catalog context automatically:

ATTACHes the database with specified readonly mode
Executes USE alias.main to make it the default catalog
Restores previous catalog context after the block
DETACHes the database on exit

For manual control, use duckdb-query-session-attach and duckdb-query-session-detach:

(duckdb-query-session-start "work")
(unwind-protect
    (duckdb-query-with-session "work"
      (let ((alias (duckdb-query-session-attach "work" "/data/sales.db" nil t)))
        ;; Access with qualified names: alias.table_name
        (duckdb-query (format "SELECT COUNT(*) FROM %s.orders" alias))
        (duckdb-query-session-detach "work" alias)))
  (duckdb-query-session-kill "work"))

Transient Databases in Session Scope

duckdb-query-with-transient-database also adapts to session context:

(duckdb-query-with-session "work"
  (duckdb-query "CREATE TABLE permanent AS SELECT 'stays' as val")

  (duckdb-query-with-transient-database
    ;; Temp database attached and set as default catalog
    (duckdb-query "CREATE TABLE scratch AS SELECT 'gone' as val")
    (duckdb-query-value "SELECT val FROM scratch"))
  ;; => "gone"

  ;; Back to session context, scratch is gone
  (duckdb-query-value "SELECT val FROM permanent"))
;; => "stays"

Session Management

List active sessions:

(duckdb-query-session-list)
;; => (#s(duckdb-session :name "analytics" :status active :query-count 42 ...))

Interactive commands:

M-x duckdb-query-session-display-status - Show all sessions with statistics
M-x duckdb-query-session-kill-interactive - Kill session with completion
M-x duckdb-query-session-kill-all - Kill all sessions

Session initialization commands run when sessions start:

(setq duckdb-query-session-init-commands
      '("SET memory_limit = '4GB'"
        "INSTALL spatial"
        "LOAD spatial"))

Nested Types

DuckDB supports STRUCT, LIST, MAP, and ARRAY types. These serialize correctly as nested Elisp structures:

Code

Result

(duckdb-query
 "SELECT
    {'x': 1, 'y': 2}::STRUCT(x INT, y INT)
    as point")

(((point (x . 1) (y . 2))))

;; Nested data survives roundtrips
(let ((data
       '(((id . 1) (loc (x . 10) (y . 20)))
         ((id . 2) (loc (x . 30) (y . 40))))))
  (duckdb-query
   "SELECT id, loc.x + loc.y as sum
    FROM @data"
   :data data))

(((id . 1) (sum . 30))
 ((id . 2) (sum . 70)))

Performance

The package writes results to temporary files by default, providing 3-5x speedup on large results compared to streaming through stdout:

| Result Size | Default (file) | Streaming (pipe) | Speedup |
|-------------+----------------+------------------+---------|
| 10k rows    | 64ms           | 162ms            | 2.5x    |
| 100k rows   | 366ms          | 1.28s            | 3.5x    |
| 500k rows   | 1.32s          | 6.19s            | 4.7x    |

Force streaming mode with :output-via :pipe when needed. DDL statements automatically use streaming since they cannot be wrapped in file output.

Profile your specific queries with the benchmark module:

Code

Result

(duckdb-query-bench-output-strategies
 "SELECT * FROM range(10000) t(i)"
 :iterations 3)

,| strategy | mean  | min   | max   | n |
,|----------|-------|-------|-------|---|
,| :file    | 22ms  | 21ms  | 23ms  | 3 |
,| :pipe    | 28ms  | 27ms  | 29ms  | 3 |
,| :speedup | 1.27x |       |       |   |

Other benchmark functions: duckdb-query-bench-formats (compare output formats), duckdb-query-bench-nested-types (compare nested type handling), duckdb-query-bench-data-formats (compare :data serialization).

Acknowledgements

cl-duckdb by Ákos Kiss: The Common Lisp DuckDB wrapper that inspired this package’s design philosophy. The approach of treating query results as native Lisp data structures, the with-static-table pattern for querying Lisp data directly, and the with-transient-connection / with-default-connection macros for database context management all informed the design of duckdb-query’s :data parameter, @org: references, and duckdb-query-with-transient-database / duckdb-query-with-database forms.

Related Packages

ob-duckdb: Org Babel integration for DuckDB source blocks. SOON I’ll add most of the functionalities here to it via a specialized ob-duckdb-query executor, this will allow org-table reference in duckdb-sql source blocks and new output types.

Integration

Warning

UNDER CONSTRUCTION

As mentioned before, sessions allow low-latency, fast response times, going as low as 2ms on frequent, repeating small queries like batched updates/inserts and single row extraction. Add to this persistent session processes that can communicate between each other via database ATTACH~/~DETACH or external file output/input, and we have the foundation for database-backed major modes, worker pools, and orchestrated data pipelines within Emacs.

Buffer-Local Sessions

The duckdb-query-session-setup-buffer helper attaches a session to the current buffer. It creates the session if needed, registers the buffer as an owner, and sets up the buffer-local binding so all duckdb-query calls within that buffer automatically route to the session. The session persists while any owning buffer exists and is killed when the last owner closes:

(add-hook 'my-data-mode-hook
          (lambda ()
            (duckdb-query-session-setup-buffer "my-data-session")))
;; All my-data-mode buffers share one session
;; Session killed when last my-data-mode buffer closes

This pattern enables major modes that maintain persistent analytical state. A dashboard mode might cache expensive aggregations in session tables and refresh instantly on demand:

(define-derived-mode my-dashboard-mode special-mode "Dashboard"
  (duckdb-query-session-setup-buffer "dashboard")
  (setq-local revert-buffer-function #'my-dashboard-refresh))

(defun my-dashboard-refresh (&rest _)
  "Refresh dashboard from session's cached aggregations."
  (let ((data (duckdb-query "SELECT * FROM cached_metrics")))
    (my-dashboard-render data)))

Lifecycle Hooks

Three hooks provide visibility into session activity, enabling logging, metrics collection, and cross-session coordination:

;; Log all queries with timing
(add-hook 'duckdb-query-session-query-executed-functions
          (lambda (name query duration-ms)
            (message "[%s] %.1fms: %s"
                     name duration-ms
                     (truncate-string-to-width query 60))))

;; Track session lifecycle
(add-hook 'duckdb-query-session-created-functions
          (lambda (name _session)
            (message "Session started: %s" name)))

(add-hook 'duckdb-query-session-killed-functions
          (lambda (name)
            (message "Session ended: %s" name)))

These hooks will be essential for building a coordinator layer that manages worker sessions, tracks query performance across the system, and handles failover when sessions die unexpectedly.

Future Directions

This infrastructure opens up architectural patterns that go beyond single-query workflows. The ultimate goal is to enable the creation of data-centric emacs packages and tools, such as:

quickly retrieving tabular data from most filetypes or complex queries
perform tens to hundreds of CRUD operations per second
manage and orchestrate data processing jobs across multiple workers from a central coordinator
leverage graph queries, geo-spatial processing, vector similarity search and other techniques inside Emacs

Some directions still in design:

Database-backed buffers: Major modes where buffer content is backed by session tables, enabling SQL-powered search, filter, and transform operations that would be impractical with pure Elisp
Worker pools: Multiple sessions processing jobs in parallel, coordinated through a shared database via ATTACH or message-passing through elisp.
Live dashboards: Buffers that subscribe to query results and refresh automatically when underlying data changes

I’ll update this section as these experiments mature into usable packages.

Name		Name	Last commit message	Last commit date
Latest commit History 88 Commits
.dir-locals.el		.dir-locals.el
.gitignore		.gitignore
LICENSE		LICENSE
README.org		README.org
duckdb-query-bench.el		duckdb-query-bench.el
duckdb-query.el		duckdb-query.el

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

duckdb-query.el - Query Anything, Get Elisp

Quick Examples

Contents

Querying Elisp Data

Querying Org Tables

Combining Sources

Output Formats

Schema Introspection

Database Persistence

Session-Based Execution

Attaching External Databases

Transient Databases in Session Scope

Session Management

Nested Types

Performance

Acknowledgements

Related Packages

Integration

Buffer-Local Sessions

Lifecycle Hooks

Future Directions

About

Uh oh!

Releases

Packages

Languages

License

gggion/duckdb-query.el

Folders and files

Latest commit

History

Repository files navigation

duckdb-query.el - Query Anything, Get Elisp

Quick Examples

Contents

Querying Elisp Data

Querying Org Tables

Combining Sources

Output Formats

Schema Introspection

Database Persistence

Session-Based Execution

Attaching External Databases

Transient Databases in Session Scope

Session Management

Nested Types

Performance

Acknowledgements

Related Packages

Integration

Buffer-Local Sessions

Lifecycle Hooks

Future Directions

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages