AI-powered mail scanning and organization pipeline.
Leave a scanner by your mail pile, feed in your documents, and walk away. It automatically splits, names, and organizes everything, then gives you a dashboard summarizing what needs your attention — with the original PDFs a click away.
Works with any scanner that uploads to a Samba share. Uses Claude AI for classification and includes a web portal for reviewing everything.
```
Epson RR-600W ──SMB──▶ RPi5 /opt/homemail/Raw/
                               │
                        pipeline.py (systemd)
                               │
                               ▼
              /opt/homemail/Organized/ + Reports/TODO.md
                               │
                       OwnCloud sync (cron)
                               │
                               ▼
              OwnCloud server ──▶ all devices
```
```
/opt/homemail/
├── Raw/          # Bronze layer — pristine scanner uploads (read-only)
├── Organized/    # Silver layer — AI-classified copies with smart filenames
├── Reports/      # TODO.md, document_index.csv, processing ledger, dashboard
└── _pipeline/    # Application code, config, installer
    ├── pipeline.py   # Main processing engine
    ├── config.toml   # User-editable settings (folders, thresholds, categories)
    ├── install.sh    # Automated installer
    ├── sync.sh       # OwnCloud sync script
    ├── setup.md      # Detailed manual setup guide
    └── .env          # ANTHROPIC_API_KEY (not committed)
```
```sh
git clone git@github.com:rbrenton/homemail.git /opt/homemail
cd /opt/homemail
sudo make install   # runs install.sh (sets up users, samba, deps, systemd)
```

After install, complete these steps:
1. Set your API key:

   ```sh
   echo "ANTHROPIC_API_KEY=sk-ant-..." > /opt/homemail/_pipeline/.env
   ```

2. Start the service:

   ```sh
   sudo systemctl start homemail
   ```

3. Configure the Epson RR-600W (scanner web UI):
   - SMB path: `\\<PI_IP>\HomeMail`
   - Username: `scanner`
   - Password: (set during install, max 20 characters)

4. Open the dashboard: `http://<PI_IP>:8080/Reports/`
Run on any machine with Docker Desktop, Podman, or Rancher Desktop — no host-level dependencies to install.
```sh
git clone git@github.com:rbrenton/homemail.git
cd homemail

# Create data directory
mkdir -p ~/homemail/Raw ~/homemail/Organized ~/homemail/Reports

# Set your API key
echo "ANTHROPIC_API_KEY=sk-ant-..." > _pipeline/.env

# Build and start
make docker-up   # or: docker compose up -d --build
```

The dashboard is at http://localhost:8080/Reports/.
The container bind-mounts `~/homemail/Raw/`, `~/homemail/Organized/`, and `~/homemail/Reports/` from the host, so all data stays outside the repo. Drop PDFs into `~/homemail/Raw/` and the pipeline picks them up automatically.
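That pick-up is plain polling rather than filesystem events. A minimal sketch of such a loop (hypothetical helper names, not the shipped `pipeline.py` code):

```python
import time
from pathlib import Path

def find_new_pdfs(raw_dir: Path, seen: set[str]) -> list[Path]:
    """Return PDFs in raw_dir that haven't been handed to the pipeline yet."""
    new = [p for p in sorted(raw_dir.glob("*.pdf")) if p.name not in seen]
    seen.update(p.name for p in new)
    return new

def watch(raw_dir: Path, poll_interval: int = 15) -> None:
    """Poll raw_dir forever, processing each new PDF exactly once."""
    seen: set[str] = set()
    while True:
        for pdf in find_new_pdfs(raw_dir, seen):
            print(f"processing {pdf.name}")  # the real pipeline splits/classifies here
        time.sleep(poll_interval)
```

A real implementation would also wait until the scanner has finished writing a file (for example, by checking that its size is stable between polls) before processing it.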
To store data elsewhere, set `HOMEMAIL_DATA` before starting:

```sh
HOMEMAIL_DATA=/mnt/nas/homemail make docker-up
```

To customize settings, copy `_pipeline/config.toml` and uncomment the volume mount in `docker/docker-compose.yml`:

```yaml
- ~/homemail/my-config.toml:/opt/homemail/_pipeline/config.toml:ro
```

| Runtime | Auto-start |
|---|---|
| Docker Desktop | Enable "Start Docker Desktop when you sign in" in settings |
| Podman Desktop | Enable "Start Podman Desktop on login" in preferences |
| Rancher Desktop | Enable "Start at login" in preferences |
| Linux dockerd | Enabled by default (systemctl enable docker) |
The container uses `restart: unless-stopped`, so it starts automatically whenever the container runtime is running.
`podman compose` works natively — no changes needed. On Podman 4.7+, the Docker Compose V2 syntax is also supported via the `podman-docker` compatibility package.
On Linux, files created by the container are owned by root on the host. If you need a specific UID/GID, run the container with `--user $(id -u):$(id -g)` or add `user: "1000:1000"` to `docker/docker-compose.yml`.
Settings live in _pipeline/config.toml. The installer creates this file on first install and never overwrites it — your edits are safe across upgrades.
Settings are loaded in three layers (last wins):

1. Built-in defaults (hardcoded in `pipeline.py`)
2. `config.toml` (overrides defaults)
3. CLI arguments (override everything)
```toml
# _pipeline/config.toml
[folders]
bronze = "/opt/homemail/Raw"
silver = "/opt/homemail/Organized"
tracking = "/opt/homemail/Reports"

[processing]
poll_interval = 15    # seconds between folder scans
ocr_if_needed = true
verify_copies = true

[ai]
enabled = true
split_method = "blank"   # "blank", "auto", or "ai"

[blank_detection]
threshold = 0.98       # 0-1, higher = more lenient
min_text_length = 10
```

To customize categories, uncomment the `[categories.*]` sections in the file. When present, they fully replace the built-in list — only the categories you define will be used:
```toml
[categories.bill]
label = "Bill"
description = "Any bill, invoice, or payment request"

[categories.medical]
label = "Medical"
description = "Medical records, lab results, prescriptions"
```

When you scan a batch of mail, the pipeline needs to figure out where one piece of
mail ends and the next begins. There are two methods, controlled by ai.split_method
in config.toml (or --split-method on the CLI):
| Method | How it works |
|---|---|
| blank (default) | Original behavior — requires a blank separator sheet between each piece of mail |
| auto | Sends page thumbnails to Claude Haiku for AI boundary detection, falls back to blank-page splitting if AI is unavailable |
| ai | AI only — fails if AI is unavailable |
AI splitting analyzes the actual content of each page — letterheads, dates, reference numbers, sender addresses — to detect where documents change. This means you can feed a stack of mail straight into the scanner without inserting blank pages between them.
Blank-page splitting looks for physical sheets where both sides are blank (the separator pages you insert between mail pieces). Single blank backsides are ignored, not treated as separators.
--no-ai disables both AI splitting and AI classification. To keep AI classification
but force blank-page splitting, use --split-method blank.
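The blank-sheet rule can be sketched as two pure predicates using the `[blank_detection]` settings shown earlier. In the real pipeline the inputs would come from PyMuPDF (extracted page text plus a rendered pixel sample); treating `threshold` as a minimum white-pixel ratio is an assumption of this sketch:

```python
def is_blank_page(extracted_text: str, white_pixel_ratio: float,
                  threshold: float = 0.98, min_text_length: int = 10) -> bool:
    """A page counts as blank if it has almost no text and is almost entirely white.

    threshold and min_text_length mirror the [blank_detection] config keys;
    white_pixel_ratio would come from a rendered thumbnail in the real pipeline.
    """
    return (len(extracted_text.strip()) < min_text_length
            and white_pixel_ratio >= threshold)

def is_separator_sheet(front_blank: bool, back_blank: bool) -> bool:
    """Only a sheet blank on BOTH sides is a separator; a lone blank backside
    is ignored, matching the behavior described above."""
    return front_blank and back_blank
```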
```sh
# Watch mode (default) — polls for new scans every 15s
uv run _pipeline/pipeline.py

# Process existing files and exit (or: make batch)
uv run _pipeline/pipeline.py --batch

# Skip AI classification (date-based filenames only)
uv run _pipeline/pipeline.py --no-ai

# Enable AI boundary detection (with blank-page fallback)
uv run _pipeline/pipeline.py --split-method auto

# Use a custom config file
uv run _pipeline/pipeline.py --config /path/to/config.toml

# Custom dashboard port (0 to disable)
uv run _pipeline/pipeline.py --port 9090

# Verbose logging
uv run _pipeline/pipeline.py -v
```

```sh
make install        # Full install (requires sudo)
make start          # Start the systemd service
make stop           # Stop the service
make restart        # Restart the service
make status         # Show service status
make logs           # Tail live journal logs
make batch          # One-shot batch processing
make sync           # Run OwnCloud sync manually
make test-smb       # Verify Samba share is accessible
make docker-build   # Build the Docker image
make docker-up      # Start the container (builds if needed)
make docker-down    # Stop and remove the container
make docker-logs    # Tail container logs
```
Docker: Just Docker Desktop, Podman, or Rancher Desktop. All other deps are included in the container image.
Bare-metal (RPi):
- System: Python 3.11+, Tesseract OCR, Samba, uv
- Python: pymupdf, anthropic, Pillow, pytesseract (declared inline via PEP 723 — `uv run` installs them automatically)
Install system deps manually with:

```sh
sudo apt install -y samba tesseract-ocr
curl -LsSf https://astral.sh/uv/install.sh | sh
```

```sh
sudo systemctl status homemail                # Check status
sudo systemctl restart homemail               # Restart after config changes
journalctl -u homemail -f                     # Tail live logs
journalctl -u homemail --since "1 hour ago"   # Recent logs
```

| Problem | Check |
|---|---|
| Scanner can't connect | `smbclient //localhost/HomeMail -U scanner` |
| Pipeline not detecting files | `systemctl status homemail` and `ls -la Raw/` |
| OCR not working | `tesseract --version` |
| All files named "Unsorted" | Verify `ANTHROPIC_API_KEY` in `_pipeline/.env` |
| Dashboard not loading | `curl http://localhost:8080/Reports/` |
| OwnCloud sync issues | `bash _pipeline/sync.sh` and check `Reports/sync.log` |
See `_pipeline/setup.md` for the full manual installation guide.