**One command to rule them all:** `make dev-setup`
A batteries-included Docker Compose setup for running a complete DuckLake lakehouse locally. Get PostgreSQL catalog + MinIO object storage + DuckDB with zero configuration fuss.

## Features

- **Complete Lakehouse Stack** - PostgreSQL catalog + MinIO S3 storage + DuckDB compute
- **Zero Configuration** - Everything works out of the box with sensible defaults
- **Easy Makefile Interface** - Simple commands for all operations
- **ACID Transactions** - Full lakehouse capabilities with versioning
- **Demo Data Included** - Pre-loaded with 44k+ gene records for testing
- **Docker Everything** - No local dependencies except Docker
- **Data Persistence** - Your data survives container restarts
 
## Quick Start

```bash
# Clone and start everything
git clone https://github.com/MattOates/quack.git
cd quack
make demo

# Or for development setup
make dev-setup

# Connect and start querying
make shell
```

That's it! You now have a fully functional lakehouse running locally.

## Table of Contents

- Architecture
- Prerequisites
- Installation
- Usage
- Makefile Commands
- Configuration
- Examples
- Troubleshooting
- References
## Architecture

DuckLake is an open-source lakehouse solution built on DuckDB, providing ACID transactions, versioning, and metadata management via pluggable catalogs.

```mermaid
flowchart TD
    subgraph "Catalog Layer"
      PG[(PostgreSQL<br/>Metadata & Transactions)]
    end
    subgraph "Storage Layer"
      S3[(MinIO<br/>S3-Compatible Object Store)]
    end
    subgraph "Compute Layer"
      Init[ducklake-init<br/>Orchestrator]
      DB[DuckDB<br/>Query Engine]
    end
    subgraph "Interface"
      Make[Makefile<br/>Easy Commands]
      Shell[Interactive Shell]
    end
    Make -->|make shell| Shell
    Make -->|make up| Init
    Init -->|health checks| PG
    Init -->|health checks| S3
    Init -->|creates bucket| S3
    Init -->|ATTACH lakehouse| DB
    DB -->|metadata| PG
    DB -->|data files| S3
    Shell -->|queries| DB
```

- **PostgreSQL**: Stores lakehouse metadata, transaction logs, and schema information
- **MinIO**: S3-compatible object storage for Parquet data files
- **DuckDB**: High-performance analytical query engine with lakehouse extensions
- **ducklake-init**: Python orchestrator that configures and initializes everything (see the sketch below)

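The `ducklake-init` orchestrator is what ties these pieces together on `make up`: it waits for the catalog and object store to become healthy, makes sure the data bucket exists, and then attaches the lakehouse. The sketch below illustrates that flow in plain Python; it is not the actual `ducklake-init` code, the function names are hypothetical, and the credentials and endpoints are simply the stack defaults documented in the Configuration section, as seen from the host rather than from inside the Compose network.

```python
# Illustrative sketch of the init flow (not the real ducklake-init implementation).
# Assumes the default credentials/endpoints from the Configuration section.
import time

import boto3
import duckdb
import psycopg


def wait_for_postgres(dsn: str, attempts: int = 30) -> None:
    """Retry until the PostgreSQL catalog accepts connections."""
    for _ in range(attempts):
        try:
            psycopg.connect(dsn).close()
            return
        except psycopg.OperationalError:
            time.sleep(2)
    raise RuntimeError("PostgreSQL catalog never became healthy")


def ensure_bucket(endpoint: str, bucket: str) -> None:
    """Create the data bucket in MinIO if it does not already exist."""
    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id="minioadmin",
        aws_secret_access_key="minioadmin",
        region_name="us-east-1",
    )
    existing = {b["Name"] for b in s3.list_buckets().get("Buckets", [])}
    if bucket not in existing:
        s3.create_bucket(Bucket=bucket)


def attach_lakehouse() -> duckdb.DuckDBPyConnection:
    """ATTACH the DuckLake catalog + S3 data path, as in the examples below."""
    con = duckdb.connect()
    for ext in ("ducklake", "postgres", "httpfs"):
        con.execute(f"INSTALL {ext};")
    for key, value in {
        "s3_url_style": "path",
        "s3_endpoint": "localhost:9000",  # inside the Compose network this is minio:9000
        "s3_access_key_id": "minioadmin",
        "s3_secret_access_key": "minioadmin",
        "s3_region": "us-east-1",
        "s3_use_ssl": "false",
    }.items():
        con.execute(f"SET {key}='{value}';")
    con.execute(
        "ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost "
        "user=ducklake password=ducklake' AS the_ducklake (DATA_PATH 's3://ducklake/lake/');"
    )
    return con


if __name__ == "__main__":
    wait_for_postgres("host=localhost dbname=ducklake_catalog user=ducklake password=ducklake")
    ensure_bucket("http://localhost:9000", "ducklake")
    attach_lakehouse()
```
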
## Prerequisites

- Docker Desktop, or Docker + Docker Compose v2.0+
- 4GB+ RAM recommended
- macOS, Linux, or Windows with WSL2

## Installation

**Option 1: Demo with sample data**

```bash
git clone https://github.com/MattOates/quack.git
cd quack
make demo
```

This will:

- Build all Docker images
- Start PostgreSQL + MinIO + DuckDB
- Load 44k+ gene records for testing
- Show you the results

**Option 2: Development setup (no demo data)**

```bash
git clone https://github.com/MattOates/quack.git
cd quack
make dev-setup
```

This will build and start everything without demo data.

**Option 3: Manual setup**

```bash
git clone https://github.com/MattOates/quack.git
cd quack
make build    # Build Docker images
make up       # Start services
make shell    # Connect to DuckDB
```

## Usage

```bash
# Start everything
make up
# Connect to DuckDB shell (with DuckLake pre-attached)
make shell
# Check service health
make health
# View logs
make logs
# Stop everything
make down
```

Once connected via `make shell`, DuckLake is automatically attached as `the_ducklake`:

```sql
-- Your lakehouse is ready to use!
USE the_ducklake;
-- Create a table from remote data
CREATE TABLE my_data AS 
SELECT * FROM read_csv_auto('https://example.com/data.csv');
-- Query with full SQL support
SELECT COUNT(*) FROM my_data;
-- DuckLake handles ACID transactions automatically
INSERT INTO my_data VALUES ('new', 'row');
```
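
The last comment above points out that DuckLake handles ACID transactions automatically; you can also group several statements into an explicit transaction with standard `BEGIN`/`COMMIT`. Below is a minimal, illustrative Python sketch that uses DuckLake's DuckDB-file catalog so it runs without the Compose stack; the `acid_demo` and `txn_demo` names are made up, and the same pattern should apply to `the_ducklake` once attached (or typed directly at the `make shell` prompt).

```python
# Minimal sketch: explicit transactions against a DuckLake catalog.
# Uses a local DuckDB-file catalog so it runs without the Compose stack;
# the same BEGIN/COMMIT pattern applies to the PostgreSQL-backed the_ducklake.
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake;")

# Hypothetical local catalog + data path, purely for illustration.
con.execute("ATTACH 'ducklake:acid_demo.ducklake' AS acid_demo (DATA_PATH 'acid_demo_files/');")
con.execute("USE acid_demo;")

con.execute("CREATE OR REPLACE TABLE txn_demo (id INTEGER, label VARCHAR);")

con.execute("BEGIN TRANSACTION;")
try:
    con.execute("INSERT INTO txn_demo VALUES (1, 'first');")
    con.execute("INSERT INTO txn_demo VALUES (2, 'second');")
    con.execute("COMMIT;")    # both rows become visible atomically
except Exception:
    con.execute("ROLLBACK;")  # neither row is written on failure
    raise

print(con.execute("SELECT COUNT(*) FROM txn_demo;").fetchone())
con.close()
```
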

## Makefile Commands

**Setup & demo**

| Command | Description |
|---|---|
| `make help` | Show all available commands |
| `make dev-setup` | Complete development setup (build + start) |
| `make demo` | Run demo with sample gene data |
| `make shell` | Connect to DuckDB with DuckLake attached |

**Build & lifecycle**

| Command | Description |
|---|---|
| `make build` | Build all Docker images (clean) |
| `make build-quick` | Build using cache |
| `make up` | Start all services |
| `make down` | Stop all services |
| `make restart` | Restart everything |

**Monitoring & debugging**

| Command | Description |
|---|---|
| `make health` | Check service health |
| `make status` | Show service status |
| `make logs` | View all logs |
| `make logs-init` | View DuckLake init logs |
| `make test-connection` | Test DuckLake connection |

**Shells & access**

| Command | Description |
|---|---|
| `make shell` | DuckDB shell |
| `make psql` | PostgreSQL shell |
| `make minio-console` | Open MinIO web console |
| `make info` | Show connection details |

**Data management**

| Command | Description |
|---|---|
| `make backup-data` | Create timestamped backup |
| `make restore-data BACKUP_FILE=backup.tar.gz` | Restore from backup |
| `make clean-data` | Remove all data (with confirmation) |
| `make reset` | Complete reset (stop, clean, rebuild, start) |

**Maintenance**

| Command | Description |
|---|---|
| `make pull` | Pull latest Docker images |
| `make prune` | Clean up Docker resources |

## Configuration

All service configuration is controlled via environment variables in `docker-compose.yml`. The defaults work out of the box, but you can customize them as needed:

| Variable | Default | Purpose |
|---|---|---|
| `POSTGRES_USER` | `ducklake` | PostgreSQL catalog username |
| `POSTGRES_PASSWORD` | `ducklake` | PostgreSQL catalog password |
| `POSTGRES_DB` | `ducklake_catalog` | PostgreSQL database name |
| `AWS_ACCESS_KEY_ID` | `minioadmin` | MinIO access key |
| `AWS_SECRET_ACCESS_KEY` | `minioadmin` | MinIO secret key |
| `AWS_ENDPOINT_URL` | `http://minio:9000` | S3 endpoint URL |
| `BUCKET` | `ducklake` | S3 bucket for data files |

| Service | URL | Credentials |
|---|---|---|
| MinIO Console | http://localhost:9000 | admin/minioadmin |
| PostgreSQL | localhost:5432 | ducklake/ducklake |
| DuckDB Shell | `make shell` | Pre-configured |
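
If you want to sanity-check these endpoints from Python before running queries, something like the following should work, assuming the default credentials above; `boto3` and `psycopg` are the same libraries pulled in by the install commands further down in this README.

```python
# Quick connectivity check for the services listed above (default credentials assumed).
import boto3
import psycopg

# MinIO: the S3 API is exposed on localhost:9000 outside the Compose network.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
    region_name="us-east-1",
)
print("Buckets:", [b["Name"] for b in s3.list_buckets().get("Buckets", [])])

# PostgreSQL catalog: confirm the metadata database accepts connections.
with psycopg.connect(
    "host=localhost port=5432 dbname=ducklake_catalog user=ducklake password=ducklake"
) as conn:
    print("PostgreSQL:", conn.execute("SELECT version();").fetchone()[0])
```
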
## Examples

```sql
-- Connect via: make shell
-- Load remote CSV data into your lakehouse
CREATE TABLE companies AS
SELECT * FROM read_csv_auto(
    'https://example.com/companies.csv',
    HEADER => TRUE
);
-- Query with full analytical SQL
SELECT 
    industry, 
    COUNT(*) as company_count,
    AVG(revenue) as avg_revenue
FROM companies 
GROUP BY industry 
ORDER BY avg_revenue DESC;
```

```sql
-- Load Parquet files from S3/MinIO
CREATE TABLE events AS
SELECT * FROM read_parquet('s3://ducklake/raw/events/*.parquet');
-- Transform and store back to lakehouse
CREATE TABLE daily_summary AS
SELECT 
    DATE(timestamp) as date,
    event_type,
    COUNT(*) as event_count
FROM events
GROUP BY DATE(timestamp), event_type;
```

```sql
-- DuckLake supports versioning and time travel
SELECT * FROM my_table VERSION AS OF '2024-01-01 10:00:00';
-- View table history
SELECT * FROM table_history('my_table');
```

If you want to connect to the lakehouse from an external DuckDB client (outside the container), use this configuration:

```sql
-- Install required extensions
INSTALL ducklake;
INSTALL postgres;
INSTALL httpfs;
-- Configure S3 settings for MinIO
SET s3_url_style = 'path';
SET s3_endpoint = 'localhost:9000';  -- Note: localhost, not minio
SET s3_access_key_id = 'minioadmin';
SET s3_secret_access_key = 'minioadmin';
SET s3_region = 'us-east-1';
SET s3_use_ssl = false;
-- Attach to your lakehouse
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost user=ducklake password=ducklake'
AS my_lakehouse (DATA_PATH 's3://ducklake/lake/');
-- Now you can use it
USE my_lakehouse;
SHOW TABLES;
```

You can also connect to `the_ducklake` from your own Python applications. Here's a complete example.

Install the dependencies first:
```bash
uv add --prerelease=allow duckdb boto3 psycopg2-binary
```

or

```bash
pip install --pre duckdb boto3 psycopg2-binary
```

```python
#!/usr/bin/env -S uv run --script --prerelease=allow
#
# /// script
# requires-python = ">=3.13.5"
# dependencies = [
#   "boto3",
#   "psycopg",
#   "duckdb",
# ]
# ///
import duckdb
import os
def connect_to_ducklake():
    """Connect to the DuckLake lakehouse from Python."""
    
    # Create DuckDB connection
    con = duckdb.connect()
    
    # Install required extensions
    con.install_extension("ducklake")
    con.install_extension("postgres")   
    con.install_extension("httpfs")
    
    # Configure S3/MinIO settings
    s3_config = {
        "s3_url_style": "path",
        "s3_endpoint": "localhost:9000",  # External connection
        "s3_access_key_id": "minioadmin",
        "s3_secret_access_key": "minioadmin", 
        "s3_region": "us-east-1",
        "s3_use_ssl": "false"
    }
    
    for key, value in s3_config.items():
        con.execute(f"SET {key}='{value}';")
    
    # Attach to DuckLake
    attach_sql = """
    ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost user=ducklake password=ducklake'
    AS the_ducklake (DATA_PATH 's3://ducklake/lake/');
    """
    con.execute(attach_sql)
    con.execute("USE the_ducklake;")
    
    return con
def main():
    """Example usage of DuckLake connection."""
    
    # Make sure your quack stack is running first!
    # cd /path/to/quack && make up
    
    con = connect_to_ducklake()
    
    # Example 1: List available tables
    tables = con.execute("SHOW TABLES;").fetchall()
    print("Available tables:", tables)
    
    # Example 2: Create a sample table
    con.execute("""
        CREATE OR REPLACE TABLE python_example AS
        SELECT 
            'hello' as greeting,
            'world' as target,
            42 as meaning_of_life,
            current_timestamp as created_at
    """)
    
    # Example 3: Query the data
    result = con.execute("SELECT * FROM python_example;").fetchall()
    print("Sample data:", result)
    
    # Example 4: Load data from a CSV
    con.execute("""
        CREATE OR REPLACE TABLE sample_data AS
        SELECT 
            row_number() OVER () as id,
            'user_' || row_number() OVER () as username,
            random() * 100 as score
        FROM generate_series(1, 1000)
    """)
    
    # Example 5: Analytical query
    stats = con.execute("""
        SELECT 
            COUNT(*) as total_records,
            AVG(score) as avg_score,
            MIN(score) as min_score,
            MAX(score) as max_score
        FROM sample_data
    """).fetchone()
    
    print(f"Dataset stats: {stats}")
    
    # Example 6: Work with time series data
    con.execute("""
        CREATE OR REPLACE TABLE time_series AS
        SELECT 
            (current_date - INTERVAL (row_number() OVER ()) DAY) as date,
            random() * 1000 as value
        FROM generate_series(1, 365)
    """)
    
    monthly_summary = con.execute("""
        SELECT 
            date_trunc('month', date) as month,
            COUNT(*) as records,
            AVG(value) as avg_value,
            SUM(value) as total_value
        FROM time_series
        GROUP BY date_trunc('month', date)
        ORDER BY month
    """).fetchall()
    
    print("Monthly summary:", monthly_summary[:3])  # Show first 3 months
    
    # Don't forget to close the connection
    con.close()
    print("β
 DuckLake connection example completed!")
if __name__ == "__main__":
    main()
```

For production applications, consider using connection pooling and environment variables:

```python
import duckdb
import os
from contextlib import contextmanager
@contextmanager
def ducklake_connection():
    """Context manager for DuckLake connections."""
    con = None
    try:
        con = duckdb.connect()
        
        # Install extensions
        for ext in ["ducklake", "postgres", "httpfs"]:
            con.execute(f"INSTALL {ext};")
        
        # Configure from environment
        s3_config = {
            "s3_url_style": "path",
            "s3_endpoint": os.getenv("DUCKLAKE_S3_ENDPOINT", "localhost:9000"),
            "s3_access_key_id": os.getenv("DUCKLAKE_S3_ACCESS_KEY", "minioadmin"),
            "s3_secret_access_key": os.getenv("DUCKLAKE_S3_SECRET_KEY", "minioadmin"),
            "s3_region": os.getenv("DUCKLAKE_S3_REGION", "us-east-1"),
            "s3_use_ssl": "false"
        }
        
        for key, value in s3_config.items():
            con.execute(f"SET {key}='{value}';")
        
        # Connection details from environment
        pg_host = os.getenv("DUCKLAKE_PG_HOST", "localhost")
        pg_user = os.getenv("DUCKLAKE_PG_USER", "ducklake")
        pg_pass = os.getenv("DUCKLAKE_PG_PASSWORD", "ducklake") 
        pg_db = os.getenv("DUCKLAKE_PG_DB", "ducklake_catalog")
        bucket = os.getenv("DUCKLAKE_BUCKET", "ducklake")
        
        attach_sql = f"""
        ATTACH 'ducklake:postgres:dbname={pg_db} host={pg_host} user={pg_user} password={pg_pass}'
        AS the_ducklake (DATA_PATH 's3://{bucket}/lake/');
        """
        con.execute(attach_sql)
        con.execute("USE the_ducklake;")
        
        yield con
        
    finally:
        if con:
            con.close()
# Usage with context manager
def analyze_data():
    with ducklake_connection() as con:
        # Your analysis code here
        result = con.execute("SELECT COUNT(*) FROM your_table;").fetchone()
        return result
```

For the advanced example, you can set these environment variables:

```bash
export DUCKLAKE_S3_ENDPOINT="localhost:9000"
export DUCKLAKE_S3_ACCESS_KEY="minioadmin"  
export DUCKLAKE_S3_SECRET_KEY="minioadmin"
export DUCKLAKE_PG_HOST="localhost"
export DUCKLAKE_PG_USER="ducklake"
export DUCKLAKE_PG_PASSWORD="ducklake"
export DUCKLAKE_PG_DB="ducklake_catalog"
export DUCKLAKE_BUCKET="ducklake"
```

## Troubleshooting

```bash
# Check service status
make status
# Check health
make health
# View logs for issues
make logs
# Nuclear option - complete reset
make reset
```

```bash
# Test the connection
make test-connection
# Check if services are healthy
make health
# View specific service logs
make logs-postgres  # or logs-minio, logs-init
```

- Increase Docker memory: ensure Docker has at least 4GB RAM allocated
- Check disk space: ensure sufficient space in the `./data/` directory
- Monitor logs: use `make logs` to check for errors

```bash
# Create backup before troubleshooting
make backup-data
# Clean slate restart
make clean-data  # WARNING: this deletes all data!
make dev-setup
# Restore from backup if needed
make restore-data BACKUP_FILE=backup_20241201_120000.tar.gz
```

## Contributing

- Fork the repository
- Create a feature branch: `git checkout -b my-feature`
- Make changes and test with `make dev-setup`
- Commit changes: `git commit -am 'Add feature'`
- Push to branch: `git push origin my-feature`
- Open a Pull Request

## References

- DuckLake Documentation - Official DuckLake docs
- DuckDB Extensions Guide - DuckDB lakehouse extensions
- Docker Compose Reference - Container orchestration

Related projects:

- Apache Iceberg - Alternative lakehouse format
- DuckDB - Analytical database engine
- Apache Arrow - Columnar data format

🦆 Happy Quacking! 🦆

Made with ❤️ for the data community

Star this repo • Report Issues • Documentation