This repository provides a Docker Compose setup for running a local DuckLake lakehouse stack, with PostgreSQL as the catalog database and MinIO as S3-compatible object storage. It uses the DuckLake extension for DuckDB to bootstrap a new lakehouse automatically, or to attach to an existing one.
DuckLake is an open‑source lakehouse solution built on top of DuckDB, providing ACID transactions, versioning, and metadata management via pluggable catalogs (e.g., PostgreSQL).
Our setup:
- PostgreSQL: Stores the DuckLake catalog (metadata, transaction logs).
- MinIO: Provides an S3‑compatible endpoint for Parquet data files.
- ducklake-init: A Python `uv` script that:
  - Waits for PostgreSQL & MinIO to be healthy
  - Creates the S3 bucket if missing
  - Runs DuckDB to `ATTACH` or initialize the lakehouse
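The wait step can be approximated with a TCP reachability probe. The following is a minimal sketch under that assumption, not the script's actual code: `wait_for_port` is a hypothetical helper, and an open port is a weaker signal than a true health check (PostgreSQL, for example, may accept connections while it is still starting up).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds; return False once timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or hostname not resolvable yet: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the Compose network it would be called as `wait_for_port("postgres", 5432)` and `wait_for_port("minio", 9000)`. The hostnames are assumed from the `ATTACH` string and `AWS_ENDPOINT_URL` used elsewhere in this README, and 5432 is PostgreSQL's default port.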
On first run, DuckLake initializes the catalog tables in PostgreSQL and creates your data folder in MinIO. On subsequent runs, it detects existing metadata and simply re‑attaches to the lakehouse.
```mermaid
flowchart TD
    subgraph Catalog[PostgreSQL Catalog]
        PG[(PostgreSQL)]
    end
    subgraph Storage[Object Storage]
        S3[(MinIO)]
    end
    subgraph Compute[DuckLake Init]
        Init[ducklake-init]
        DB[DuckDB]
    end
    Init -->|waits| PG
    Init -->|waits| S3
    Init -->|s3.create_bucket| S3
    Init -->|"INSTALL ducklake; ATTACH ..."| DB
    DB -->|catalog tables| PG
    DB -->|writes Parquet| S3
```
Prerequisites:
- Docker & Docker Compose v1.29+
- Linux, macOS, or Windows with WSL2
- Clone the repo
  ```bash
  git clone git@github.com:MattOates/quack.git
  cd quack
  ```
- Launch the stack
  ```bash
  docker-compose up -d
  ```
- Connect to DuckDB
  ```bash
  docker-compose exec ducklake-init duckdb
  ```
- Verify the lakehouse
  ```sql
  INSTALL ducklake;
  INSTALL postgres;
  ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=postgres user=ducklake password=ducklake' AS the_ducklake (DATA_PATH 's3://ducklake/lake/');
  USE the_ducklake;
  -- DuckLake attaches its metadata catalog as __ducklake_metadata_<alias>
  SELECT * FROM __ducklake_metadata_the_ducklake.ducklake_schema;
  ```
All credentials and endpoints are controlled via environment variables in docker-compose.yml:
| Variable | Default | Purpose |
|---|---|---|
| POSTGRES_USER | ducklake | PostgreSQL catalog user |
| POSTGRES_PASSWORD | ducklake | PostgreSQL catalog password |
| POSTGRES_DB | ducklake_catalog | PostgreSQL database name |
| AWS_ACCESS_KEY_ID | minioadmin | MinIO access key |
| AWS_SECRET_ACCESS_KEY | minioadmin | MinIO secret key |
| AWS_ENDPOINT_URL | http://minio:9000 | S3 endpoint |
| BUCKET | ducklake | S3 bucket for Parquet files |
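The table maps fairly directly onto the DuckDB statements needed to reach both services. DuckDB also needs S3 credentials to read and write Parquet files on MinIO; one way to supply them is an S3 secret created with `CREATE SECRET`. The sketch below is illustrative only: `init_sql`, the secret name `minio`, the `postgres` hostname, the `the_ducklake` alias, and the `lake/` prefix are assumptions (the last three are taken from the verify step), and it does not quote libpq values that contain spaces.

```python
import os
from typing import Mapping
from urllib.parse import urlparse

# Defaults copied from the configuration table above.
DEFAULTS = {
    "POSTGRES_USER": "ducklake",
    "POSTGRES_PASSWORD": "ducklake",
    "POSTGRES_DB": "ducklake_catalog",
    "AWS_ACCESS_KEY_ID": "minioadmin",
    "AWS_SECRET_ACCESS_KEY": "minioadmin",
    "AWS_ENDPOINT_URL": "http://minio:9000",
    "BUCKET": "ducklake",
}


def quote(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def init_sql(env: Mapping[str, str] = os.environ, pg_host: str = "postgres") -> list[str]:
    """Build DuckDB statements from env, falling back to DEFAULTS (hypothetical helper)."""
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    endpoint = urlparse(cfg["AWS_ENDPOINT_URL"])
    # DuckDB's S3 ENDPOINT is host:port without a scheme; the scheme decides USE_SSL.
    use_ssl = "true" if endpoint.scheme == "https" else "false"
    # libpq-style key=value pairs; values containing spaces would need extra quoting.
    conninfo = (
        f"dbname={cfg['POSTGRES_DB']} host={pg_host} "
        f"user={cfg['POSTGRES_USER']} password={cfg['POSTGRES_PASSWORD']}"
    )
    return [
        "INSTALL ducklake;",
        "INSTALL postgres;",
        "CREATE OR REPLACE SECRET minio (TYPE s3, "
        f"KEY_ID {quote(cfg['AWS_ACCESS_KEY_ID'])}, "
        f"SECRET {quote(cfg['AWS_SECRET_ACCESS_KEY'])}, "
        f"ENDPOINT {quote(endpoint.netloc)}, URL_STYLE 'path', USE_SSL {use_ssl});",
        f"ATTACH {quote('ducklake:postgres:' + conninfo)} AS the_ducklake "
        f"(DATA_PATH {quote('s3://' + cfg['BUCKET'] + '/lake/')});",
    ]
```

With no overrides, the last statement reproduces the `ATTACH` from the verify step, and pointing `AWS_ENDPOINT_URL` at an `https://` URL switches `USE_SSL` to `true`. Path-style addressing (`URL_STYLE 'path'`) is what MinIO typically expects when it is reached by a plain hostname such as `minio`.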
- Fresh Lake Initialization: The `ATTACH` command will auto‑bootstrap if no catalog tables are found.
- Re‑attach to Existing Lake: Restarting `ducklake-init` will reconnect to your existing metadata and data without data loss.
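Restarting `ducklake-init` without data loss relies on its steps being safe to repeat. For the bucket step, that means checking before creating. This is a hypothetical sketch: `BucketClient`, `ensure_bucket`, and `InMemoryClient` are illustrative names, not part of the repository. With boto3, `bucket_exists` would typically wrap `head_bucket(Bucket=name)` (which raises `botocore.exceptions.ClientError` for a missing bucket), and `create_bucket` would call `create_bucket(Bucket=name)`.

```python
from typing import Protocol


class BucketClient(Protocol):
    """Hypothetical minimal interface over an S3 client."""

    def bucket_exists(self, name: str) -> bool: ...

    def create_bucket(self, name: str) -> None: ...


def ensure_bucket(client: BucketClient, name: str) -> bool:
    """Create the bucket only if it is missing; return True if this call created it."""
    if client.bucket_exists(name):
        return False  # re-run: keep the existing bucket and its Parquet files
    client.create_bucket(name)
    return True


class InMemoryClient:
    """Stand-in client for illustration; a real one would talk to MinIO."""

    def __init__(self) -> None:
        self.buckets: set[str] = set()

    def bucket_exists(self, name: str) -> bool:
        return name in self.buckets

    def create_bucket(self, name: str) -> None:
        self.buckets.add(name)
```

The check-then-create pattern is not atomic; if two init containers could run at once, a real implementation would also need to tolerate an "already exists" error from `create_bucket`.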
For advanced options, including DataFrame APIs, versioning, and catalog backends, see the DuckLake documentation.
- DuckLake Stable Docs: https://ducklake.select/docs/stable/
- DuckDB Extensions Guide: https://duckdb.org/docs/extensions/ducklake