Quack: DuckLake Local Deployment


This repository provides a Docker Compose setup for running a local DuckLake lakehouse stack, using PostgreSQL as the catalog database and MinIO as S3-compatible object storage. It uses the DuckLake extension for DuckDB to bootstrap a new lakehouse or attach to an existing one automatically.


Overview

DuckLake is an open‑source lakehouse solution built on top of DuckDB, providing ACID transactions, versioning, and metadata management via pluggable catalogs (e.g., PostgreSQL).

Our setup:

  • PostgreSQL: Stores the DuckLake catalog (metadata, transaction logs).
  • MinIO: Provides an S3‑compatible endpoint for Parquet data files.
  • ducklake-init: A Python uv script that:
    1. Waits for PostgreSQL & MinIO to be healthy
    2. Creates the S3 bucket if missing
    3. Runs DuckDB to ATTACH or initialize the lakehouse

On first run, DuckLake initializes the catalog tables in PostgreSQL and creates your data folder in MinIO. On subsequent runs, it detects existing metadata and simply re‑attaches to the lakehouse.
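The init flow above can be sketched in plain Python. This is an illustrative sketch only, not the repository's actual script; names like wait_for_port and build_attach_sql are hypothetical:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> None:
    """Step 1: block until a TCP port accepts connections (health wait)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=2):
                return
        except OSError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"{host}:{port} not ready")
            time.sleep(1)


def build_attach_sql(env: dict) -> str:
    """Step 3: assemble the ATTACH statement DuckDB runs."""
    return (
        "ATTACH 'ducklake:postgres:"
        f"dbname={env['POSTGRES_DB']} host=postgres "
        f"user={env['POSTGRES_USER']} password={env['POSTGRES_PASSWORD']}' "
        f"AS the_ducklake (DATA_PATH 's3://{env['BUCKET']}/lake/')"
    )


# Step 2 (creating the bucket if missing) would use an S3 client such as
# boto3 against AWS_ENDPOINT_URL; omitted to keep the sketch dependency-free.
sql = build_attach_sql({
    "POSTGRES_DB": "ducklake_catalog",
    "POSTGRES_USER": "ducklake",
    "POSTGRES_PASSWORD": "ducklake",
    "BUCKET": "ducklake",
})
```

Because ATTACH is idempotent for DuckLake, the same statement serves both the first-run bootstrap and every later re-attach.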

Architecture

flowchart TD
    subgraph Catalog[PostgreSQL Catalog]
      PG[(PostgreSQL)]
    end
    subgraph Storage[Object Storage]
      S3[(MinIO)]
    end
    subgraph Compute[DuckLake Init]
      Init[ducklake-init]
      DB[DuckDB]
    end

    Init -->|waits| PG
    Init -->|waits| S3
    Init -->|s3.create_bucket| S3
    Init -->|INSTALL ducklake; ATTACH ...| DB
    DB -->|catalog tables| PG
    DB -->|writes Parquet| S3

Prerequisites

  • Docker & Docker Compose v1.29+
  • Linux, macOS, or Windows with WSL2

Getting Started

  1. Clone the repo

    git clone git@github.com:MattOates/quack.git
    cd quack
  2. Launch the stack

    docker-compose up -d
  3. Connect to DuckDB

    docker-compose exec ducklake-init duckdb
  4. Verify the lakehouse

    INSTALL ducklake;
    INSTALL postgres;
    ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=postgres user=ducklake password=ducklake' AS the_ducklake (DATA_PATH 's3://ducklake/lake/');
    USE the_ducklake;
    SHOW ALL TABLES;

Configuration

All credentials and endpoints are controlled via environment variables in docker-compose.yml:

| Variable | Default | Purpose |
| --- | --- | --- |
| POSTGRES_USER | ducklake | PostgreSQL catalog user |
| POSTGRES_PASSWORD | ducklake | PostgreSQL catalog password |
| POSTGRES_DB | ducklake_catalog | PostgreSQL database name |
| AWS_ACCESS_KEY_ID | minioadmin | MinIO access key |
| AWS_SECRET_ACCESS_KEY | minioadmin | MinIO secret key |
| AWS_ENDPOINT_URL | http://minio:9000 | S3 endpoint |
| BUCKET | ducklake | S3 bucket for Parquet files |
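Assuming the init script resolves these settings with environment overrides falling back to the documented defaults, the lookup might resemble the following (an illustrative sketch; `setting` and `DEFAULTS` are hypothetical names, not the repository's actual code):

```python
import os

# Defaults mirror the table above; environment variables override them.
DEFAULTS = {
    "POSTGRES_USER": "ducklake",
    "POSTGRES_PASSWORD": "ducklake",
    "POSTGRES_DB": "ducklake_catalog",
    "AWS_ACCESS_KEY_ID": "minioadmin",
    "AWS_SECRET_ACCESS_KEY": "minioadmin",
    "AWS_ENDPOINT_URL": "http://minio:9000",
    "BUCKET": "ducklake",
}


def setting(name: str) -> str:
    """Return the environment override if set, else the documented default."""
    return os.environ.get(name, DEFAULTS[name])
```

Overriding any value in docker-compose.yml (or in the shell before `docker-compose up`) changes what the stack uses without editing code.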

Usage

  • Fresh Lake Initialization: The ATTACH command will auto‑bootstrap if no catalog tables are found.
  • Re‑attach to Existing Lake: Restarting ducklake-init will reconnect to your existing metadata and data without data loss.

For advanced options, such as DataFrame APIs, versioning, and catalog backends, see the DuckLake documentation.

