Skip to content

DuckDB SQL query results as native Elisp data structures

License

Notifications You must be signed in to change notification settings

gggion/duckdb-query.el

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

88 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

duckdb-query.el - Query Anything, Get Elisp

Query remote Parquet files, local CSVs, org-mode tables, and Elisp data structures with SQL. Get results as native Elisp values. One function call, no intermediate files, no external scripts.

Query Remote Data, Filter with Elisp, Get Native Structures

Query joining 336k-row remote Parquet with Elisp alist:

(let ((carriers '(((code . "UA") (name . "United Airlines"))
                  ((code . "AA") (name . "American Airlines"))
                  ((code . "DL") (name . "Delta Air Lines")))))
  (duckdb-query
   "SELECT c.name as airline,
           {'flights': COUNT(*),
            'avg_delay': ROUND(AVG(f.arr_delay), 1)} as stats
    FROM 'https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet' f
    JOIN @carriers c ON f.carrier = c.code
    GROUP BY c.name
    ORDER BY COUNT(*) DESC"
   :data `((carriers . ,carriers))
   :format :columnar))

Result (columnar format with nested structs):

((airline . ["United Airlines" "Delta Air Lines" "American Airlines"])
 (stats . [((flights . 58665) (avg_delay . 3.6))
           ((flights . 48110) (avg_delay . 1.6))
           ((flights . 32729) (avg_delay . 0.4))]))

One function call: remote Parquet + Elisp filter → nested Elisp structures

DuckDB is an embedded analytical database that reads Parquet, CSV, and JSON directly from local paths, HTTP URLs, or cloud storage. This package makes DuckDB’s capabilities available to Elisp programs. The duckdb-query function executes SQL and returns alists, plists, vectors, or org-table-compatible lists. Results are ready for mapcar, seq-filter, cl-loop, or direct insertion into buffers.

The package treats all data sources uniformly. Elisp alists become SQL tables via the :data parameter. Named org tables in your buffer resolve through @org:table-name syntax. These combine freely with file paths and URLs in joins, unions, and subqueries. A query can join a remote Parquet file with a ten-row org table and a runtime Elisp variable, returning the result as an alist you can immediately process.

Quick Examples

Code Result
;; List of alists (default)
(duckdb-query
 "SELECT 'Alice' as name, 95 as score
  UNION ALL SELECT 'Bob', 87")
(((name . "Alice") (score . 95))
 ((name . "Bob") (score . 87)))
;; Org-table format for display
(duckdb-query
 "SELECT * FROM range(1,4) t(n),
   (SELECT unnest(['a','b']) as letter)"
 :format :org-table)
,  rendered       data
,| n | letter | ((n letter)
,|---|--------|  (1 "a") (2 "a")
,| 1 | a      |  (3 "a") (1 "b")
,| 1 | b      |  (2 "b") (3 "b"))
,| 2 | a      |
,| 2 | b      |
,| 3 | a      |
,| 3 | b      |
;; Single value extraction
(duckdb-query-value
 "SELECT COUNT(*) FROM range(1000)")
1000
;; Single column as list
(duckdb-query-column
 "SELECT unnest(
 ['red','green','blue']) as color")
("red" "green" "blue")
;; Columnar format for analysis
(duckdb-query
 "SELECT * FROM range(5) t(n)"
 :format :columnar)
((n . [0 1 2 3 4]))

Contents

Querying Elisp Data

Pass Elisp alists to queries via the :data parameter. The data becomes a table named @data:

Code Result
(duckdb-query
 "SELECT name, score * 1.1 as curved
  FROM @data
  WHERE score > 85"
 :data '(((name . "Aly") (score . 95))
         ((name . "Bob") (score . 87))
         ((name . "Carol") (score . 72))))
(((name . "Aly") (curved . 104.5))
 ((name . "Bob") (curved . 95.7)))

For multiple tables, use named bindings with backquote to capture lexical variables:

Code Result
 (let ((users ;; users table
       '(((id . 1) (name . "Alice"))
         ((id . 2) (name . "Bob"))))
      (orders ;; orders table
       '(((user_id . 1) (item . "Widget"))
         ((user_id . 2) (item . "Gadget"))
         ((user_id . 1) (item . "Gizmo")))))
  (duckdb-query
   "SELECT u.name, COUNT(*) as count
    FROM @users u
    JOIN @orders o ON u.id = o.user_id
    GROUP BY u.name"
   :data `((users . ,users)
           (orders . ,orders))))
(((name . "Alice") (count . 2))
 ((name . "Bob") (count . 1)))

This enables joining runtime Elisp state with static file data. Previously this required exporting to CSV, running an external query, and parsing results. Now it is one expression.

Querying Org Tables

Reference named org tables with @org:table-name. Given this table in your buffer:

Table
,#+NAME: inventory
,| sku   | product | stock | price |
,|-------+---------+-------+-------|
,| A-100 | Widget  |    50 |  9.99 |
,| B-200 | Gadget  |    30 | 24.99 |
,| C-300 | Gizmo   |    75 | 14.99 |
Code Result
(duckdb-query
 "SELECT product,
    stock * price as value
  FROM @org:inventory
  ORDER BY value DESC"
 :format :org-table)
,| product |   value | ((product value)
,|---------+---------|  ("Gizmo" 1124.25)
,| Gizmo   | 1124.25 |  ("Gadget" 749.7)
,| Gadget  |   749.7 |  ("Widget" 499.5))
,| Widget  |   499.5 |

Reference tables in other files with @org:path/to/file.org:table-name.

Combining Sources

Join an org table with Elisp data:

Code Result
(let ((sales '(((sku . "A-100") (qty . 10))
               ((sku . "A-100") (qty . 5))
               ((sku . "C-300") (qty . 20)))))
  (duckdb-query
   "SELECT i.product,
           SUM(s.qty) as sold,
           i.stock - SUM(s.qty) as remaining
    FROM @org:inventory i
    JOIN @sales s ON i.sku = s.sku
    GROUP BY i.product, i.stock"
   :data `((sales . ,sales))))
(((product . "Widget")
  (sold . 15)
  (remaining . 35))
 ((product . "Gizmo")
  (sold . 20)
  (remaining . 55)))

Join a remote Parquet file with local Elisp data:

 (let ((airports '(((code . "JFK")
                   (name . "JFK International"))
                  ((code . "LGA")
                   (name . "LaGuardia")))))
  (duckdb-query
   "SELECT a.name,
           COUNT(*) as departures,
           ROUND(AVG(f.dep_delay), 1) as avg_delay
    FROM 'https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet' as f
    JOIN @airports a ON f.origin = a.code
    GROUP BY a.name
    ORDER BY departures DESC"
   :data `((airports . ,airports))))

;; => (((name . "JFK International") (departures . 111279) (avg_delay . 12.1))
;;     ((name . "LaGuardia") (departures . 104662) (avg_delay . 10.3)))

Output Formats

The :format parameter controls result structure:

FormatStructureUse Case
:alistList of alistsGeneral Elisp processing
:plistList of plistsKeyword-based access
:hashList of hash-tablesO(1) field lookup
:vectorVector of alistsIndex-based access
:columnarAlist of column vectorsNumerical analysis
:org-tableList of lists with headerDisplay, org-mode integration
:rawJSON stringCustom parsing

Schema Introspection

Discover structure before querying:
(duckdb-query-column-types "SELECT 1 as id, 'text' as name, 3.14 as value")
;; => (("id" . "INTEGER") ("name" . "VARCHAR") ("value" . "DECIMAL(3,2)"))
(duckdb-query-columns
 "https://github.com/rfsaldanha/releases/releases/download/v1/flights.parquet")

;; => ("year" "month" "day" "dep_time" "sched_dep_time" "dep_delay" "arr_time"
;;     "sched_arr_time" "arr_delay" "carrier" "flight" "tailnum" "origin" "dest"
;;     "air_time" "distance" "hour" "minute" "time_hour")

Database Persistence

Tip

New in 0.7.0: duckdb-query-with-database accepts an optional spec list for additional options. Both (duckdb-query-with-database "path" ...) and (duckdb-query-with-database '("path" :readonly t) ...) are supported.

For workflows requiring state across queries, bind a database file:

Code Result
(duckdb-query-with-transient-database
  (duckdb-query
   "CREATE TABLE events (id INT, name TEXT)")
  (duckdb-query
   "INSERT INTO events VALUES
    (1, 'click'), (2, 'scroll')")
  (duckdb-query "SELECT * FROM events"))
(((id . 1) (name . "click"))
 ((id . 2) (name . "scroll")))

duckdb-query-with-transient-database creates a temporary file deleted after the block. Use duckdb-query-with-database for persistent files:

(duckdb-query-with-database '("analytics.duckdb")
  (duckdb-query "SELECT COUNT(*) FROM events"))

The spec list accepts keyword options:

;; Open read-only to prevent write conflicts
(duckdb-query-with-database '("/shared/data.db" :readonly t)
  (duckdb-query "SELECT * FROM reports"))

;; With variable path
(let ((db-path "~/project/app.db"))
  (duckdb-query-with-database `(,db-path)
    (duckdb-query "SELECT * FROM users")))

These macros work with CLI execution. For persistent processes with lower latency, see “Session-Based Execution” below.

Session-Based Execution

For repeated queries or workflows requiring persistent state, sessions eliminate DuckDB process spawn overhead (~10-20ms per query). A session maintains a long-running DuckDB process with its own temporary database:

;; Initialize sessions
(duckdb-query-session-start "work")
(duckdb-query-session-start "other-session")

(let ((result-set
       (list :work-results
             (duckdb-query-with-session "work"
               (duckdb-query "CREATE TABLE events AS SELECT i, random() AS value FROM range(1000) t(i)")
               (duckdb-query "SELECT AVG(value) as average FROM events"))
             :other-results
             (duckdb-query-with-session "other-session"
               (duckdb-query "CREATE TABLE logs AS SELECT i, random() AS value FROM range(10) t(i)")
               (duckdb-query "SELECT AVG(value) as average FROM logs"))
             :back-to-work
             ;; Session state persists across scopes
             (duckdb-query-with-session "work"
               (duckdb-query "CREATE TABLE new_table AS SELECT i, random() AS value FROM range(100) t(i)")
               (duckdb-query "SELECT AVG(value) as average FROM new_table")))))

  ;; Clean up when done
  (duckdb-query-session-kill "work")
  (duckdb-query-session-kill "other-session")
  result-set)

For temporary sessions that clean up automatically:

(duckdb-query-with-transient-session
  (duckdb-query "CREATE TABLE scratch AS SELECT random() as val FROM range(5)")
  (duckdb-query "SELECT AVG(val) as mean FROM scratch"))
;; => (((mean . 0.33628298988435995)))

Sessions provide 7-10x speedup for simple queries:

ExecutorSimple Query10k Rows
:cli~22ms~93ms
:session~2ms~20ms

Attaching External Databases

Attach external database files to a session for cross-database queries. duckdb-query-with-database works seamlessly inside session scope:

(duckdb-query-session-start "analysis")
(unwind-protect
    (duckdb-query-with-session "analysis"
      ;; Create local session table
      (duckdb-query "CREATE TABLE local_summary AS SELECT 'session' as src")

      ;; Attach external database - automatically becomes default catalog
      (duckdb-query-with-database '("/data/sales.db" :readonly t)
        ;; Unqualified names resolve to attached database
        (duckdb-query "SELECT * FROM orders LIMIT 5")))
  (duckdb-query-session-kill "analysis"))

The macro handles catalog context automatically:

  • ATTACHes the database with specified readonly mode
  • Executes USE alias.main to make it the default catalog
  • Restores previous catalog context after the block
  • DETACHes the database on exit

For manual control, use duckdb-query-session-attach and duckdb-query-session-detach:

(duckdb-query-session-start "work")
(unwind-protect
    (duckdb-query-with-session "work"
      (let ((alias (duckdb-query-session-attach "work" "/data/sales.db" nil t)))
        ;; Access with qualified names: alias.table_name
        (duckdb-query (format "SELECT COUNT(*) FROM %s.orders" alias))
        (duckdb-query-session-detach "work" alias)))
  (duckdb-query-session-kill "work"))

Transient Databases in Session Scope

duckdb-query-with-transient-database also adapts to session context:

(duckdb-query-with-session "work"
  (duckdb-query "CREATE TABLE permanent AS SELECT 'stays' as val")

  (duckdb-query-with-transient-database
    ;; Temp database attached and set as default catalog
    (duckdb-query "CREATE TABLE scratch AS SELECT 'gone' as val")
    (duckdb-query-value "SELECT val FROM scratch"))
  ;; => "gone"

  ;; Back to session context, scratch is gone
  (duckdb-query-value "SELECT val FROM permanent"))
;; => "stays"

Session Management

List active sessions:

(duckdb-query-session-list)
;; => (#s(duckdb-session :name "analytics" :status active :query-count 42 ...))

Interactive commands:

  • M-x duckdb-query-session-display-status - Show all sessions with statistics
  • M-x duckdb-query-session-kill-interactive - Kill session with completion
  • M-x duckdb-query-session-kill-all - Kill all sessions

Session initialization commands run when sessions start:

(setq duckdb-query-session-init-commands
      '("SET memory_limit = '4GB'"
        "INSTALL spatial"
        "LOAD spatial"))

Nested Types

DuckDB supports STRUCT, LIST, MAP, and ARRAY types. These serialize correctly as nested Elisp structures:

Code Result
(duckdb-query
 "SELECT
    {'x': 1, 'y': 2}::STRUCT(x INT, y INT)
    as point")
(((point (x . 1) (y . 2))))
;; Nested data survives roundtrips
(let ((data
       '(((id . 1) (loc (x . 10) (y . 20)))
         ((id . 2) (loc (x . 30) (y . 40))))))
  (duckdb-query
   "SELECT id, loc.x + loc.y as sum
    FROM @data"
   :data data))
(((id . 1) (sum . 30))
 ((id . 2) (sum . 70)))

Performance

The package writes results to temporary files by default, providing 3-5x speedup on large results compared to streaming through stdout:

| Result Size | Default (file) | Streaming (pipe) | Speedup |
|-------------+----------------+------------------+---------|
| 10k rows    | 64ms           | 162ms            | 2.5x    |
| 100k rows   | 366ms          | 1.28s            | 3.5x    |
| 500k rows   | 1.32s          | 6.19s            | 4.7x    |

Force streaming mode with :output-via :pipe when needed. DDL statements automatically use streaming since they cannot be wrapped in file output.

Profile your specific queries with the benchmark module:

Code Result
(duckdb-query-bench-output-strategies
 "SELECT * FROM range(10000) t(i)"
 :iterations 3)
,| strategy | mean  | min   | max   | n |
,|----------|-------|-------|-------|---|
,| :file    | 22ms  | 21ms  | 23ms  | 3 |
,| :pipe    | 28ms  | 27ms  | 29ms  | 3 |
,| :speedup | 1.27x |       |       |   |

Other benchmark functions: duckdb-query-bench-formats (compare output formats), duckdb-query-bench-nested-types (compare nested type handling), duckdb-query-bench-data-formats (compare :data serialization).

Acknowledgements

  • cl-duckdb by Ákos Kiss: The Common Lisp DuckDB wrapper that inspired this package’s design philosophy. The approach of treating query results as native Lisp data structures, the with-static-table pattern for querying Lisp data directly, and the with-transient-connection / with-default-connection macros for database context management all informed the design of duckdb-query’s :data parameter, @org: references, and duckdb-query-with-transient-database / duckdb-query-with-database forms.

Related Packages

  • ob-duckdb: Org Babel integration for DuckDB source blocks. SOON I’ll add most of the functionalities here to it via a specialized ob-duckdb-query executor, this will allow org-table reference in duckdb-sql source blocks and new output types.

Integration

Warning

UNDER CONSTRUCTION

As mentioned before, sessions allow low-latency, fast response times, going as low as 2ms on frequent, repeating small queries like batched updates/inserts and single row extraction. Add to this persistent session processes that can communicate between each other via database ATTACH~/~DETACH or external file output/input, and we have the foundation for database-backed major modes, worker pools, and orchestrated data pipelines within Emacs.

Buffer-Local Sessions

The duckdb-query-session-setup-buffer helper attaches a session to the current buffer. It creates the session if needed, registers the buffer as an owner, and sets up the buffer-local binding so all duckdb-query calls within that buffer automatically route to the session. The session persists while any owning buffer exists and is killed when the last owner closes:

(add-hook 'my-data-mode-hook
          (lambda ()
            (duckdb-query-session-setup-buffer "my-data-session")))
;; All my-data-mode buffers share one session
;; Session killed when last my-data-mode buffer closes

This pattern enables major modes that maintain persistent analytical state. A dashboard mode might cache expensive aggregations in session tables and refresh instantly on demand:

(define-derived-mode my-dashboard-mode special-mode "Dashboard"
  (duckdb-query-session-setup-buffer "dashboard")
  (setq-local revert-buffer-function #'my-dashboard-refresh))

(defun my-dashboard-refresh (&rest _)
  "Refresh dashboard from session's cached aggregations."
  (let ((data (duckdb-query "SELECT * FROM cached_metrics")))
    (my-dashboard-render data)))

Lifecycle Hooks

Three hooks provide visibility into session activity, enabling logging, metrics collection, and cross-session coordination:

;; Log all queries with timing
(add-hook 'duckdb-query-session-query-executed-functions
          (lambda (name query duration-ms)
            (message "[%s] %.1fms: %s"
                     name duration-ms
                     (truncate-string-to-width query 60))))

;; Track session lifecycle
(add-hook 'duckdb-query-session-created-functions
          (lambda (name _session)
            (message "Session started: %s" name)))

(add-hook 'duckdb-query-session-killed-functions
          (lambda (name)
            (message "Session ended: %s" name)))

These hooks will be essential for building a coordinator layer that manages worker sessions, tracks query performance across the system, and handles failover when sessions die unexpectedly.

Future Directions

This infrastructure opens up architectural patterns that go beyond single-query workflows. The ultimate goal is to enable the creation of data-centric emacs packages and tools, such as:

  • quickly retrieving tabular data from most filetypes or complex queries
  • perform tens to hundreds of CRUD operations per second
  • manage and orchestrate data processing jobs across multiple workers from a central coordinator
  • leverage graph queries, geo-spatial processing, vector similarity search and other techniques inside Emacs

Some directions still in design:

  • Database-backed buffers: Major modes where buffer content is backed by session tables, enabling SQL-powered search, filter, and transform operations that would be impractical with pure Elisp
  • Worker pools: Multiple sessions processing jobs in parallel, coordinated through a shared database via ATTACH or message-passing through elisp.
  • Live dashboards: Buffers that subscribe to query results and refresh automatically when underlying data changes

I’ll update this section as these experiments mature into usable packages.

About

DuckDB SQL query results as native Elisp data structures

Resources

License

Stars

Watchers

Forks

Packages

No packages published