A beginner-friendly, learn-by-doing guide to understanding every part of this system.
This project is designed for beginners who are curious about how agents work. Instead of hiding the system behind a black box, it gives you a real agent you can run, inspect, and modify while you learn.
You type a message like "Create a Python script that sorts a list of names", and the assistant:
- Thinks about what to do
- Writes the code using tools (file editor, terminal, etc.)
- Runs it to verify
- Reports back with results
All of this happens in real time so you can learn by watching the agent think, write code, and use tools live.
┌─────────────────────────────────────────────────────────┐
│ YOU (the browser) │
│ │
│ "Create a Python script that sorts names" │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Chat Window │ ◄── you see live │
│ │ │ streaming text │
│ │ Agent: "I'll │ and tool results │
│ │ create a file │ │
│ │ called..." │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
(WebSocket)
│
▼
┌─────────────────────────────────────────────────────────┐
│ THE SERVER (backend) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Agent │───▶│ LLM │───▶│ Tools │ │
│ │ Loop │◀───│ API │ │ (bash, │ │
│ │ │ │ │ │ files) │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────┘
This project has three ways to interact with the agent, all talking to the same backend:
┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ │ │ │ │ │
│ DEVELOPER UI │ │ USER UI │ │ TERMINAL CLI │
│ (agent-ui/) │ │ (agent-user-ui/) │ │ (agent-cli/) │
│ │ │ │ │ │
│ For developers │ │ For end users │ │ For terminal │
│ Dark theme │ │ Forest Canopy │ │ lovers │
│ Tool blocks │ │ light theme │ │ │
│ Dev panel │ │ Activity indicators│ │ Rich REPL │
│ port 3500 │ │ port 3501 │ │ Direct call │
│ │ │ │ │ │
└──────────┬───────────┘ └──────────┬───────────┘ └──────────┬───────────┘
│ │ │
│ WebSocket │ WebSocket │ Direct
└────────────┬────────────┘ │
└──────────────────┬────────────────────┘
▼
┌──────────────────────┐
│ │
│ BACKEND │
│ (agent-api/) │
│ │
│ What THINKS │
│ and ACTS │
│ │
│ Python + FastAPI │
│ + LLM API │
│ │
└──────────────────────┘
Developer UI (agent-ui/) = full-featured chat interface with tool blocks, dev panel, and raw WebSocket inspector — built for developers building and debugging the agent
User UI (agent-user-ui/) = streamlined chat interface with activity indicators and simplified dialogs — built for end users who just want to use the agent
Terminal CLI (agent-cli/) = rich REPL with history, autocomplete, and vi mode — for terminal lovers
Backend = the brain that receives your messages, talks to the configured LLM provider, runs tools, and sends results back
They communicate in two ways:
| Method | What It Does | Analogy |
|---|---|---|
| REST API (HTTP) | Create conversations, list history, download files | Like sending a letter and getting a reply |
| WebSocket | Stream the agent's thinking and actions in real time | Like a phone call — always connected, instant |
The most important part of the entire system is the agent loop. It's surprisingly simple — just a while loop:
┌─────────────────────────────────────────────────────────────┐
│ THE AGENT LOOP │
│ │
│ START │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Send messages to │ │
│ │ the LLM │──── "Here's the conversation │
│ │ │ and available tools" │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ The LLM responds │ │
│ │ │ │
│ │ Either: │ │
│ │ A) Just text ─────── DONE! (agent decided it's │
│ │ B) Tool requests │ finished) │
│ └──────────┬───────────┘ │
│ │ (B) │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Execute the tools │──── Run bash commands, │
│ │ the LLM asked for │ read/write files, etc. │
│ │ (all in parallel!) │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Add tool results │ │
│ │ to the conversation │──── "Here's what happened │
│ │ │ when I ran your tools" │
│ └──────────┬───────────┘ │
│ │ │
│ └──────── Go back to the top ──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
In plain English:
- Send the full conversation + available tools to the LLM
- The LLM either gives a final answer (done!) or asks to use tools
- If tools were requested, run them all at the same time
- Send the results back to the LLM
- Repeat until the LLM gives a final answer (or we hit safety limits)
The agent uses the think tool to self-verify its work before finishing — reviewing tool results for errors, confirming all parts of the request are addressed, and fixing any issues it finds. No separate verification phase is needed; the agent handles this naturally within the loop.
This is the standard pattern used by tool-calling AI assistants and coding agents.
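The steps above can be sketched in a dozen lines of Python. This is an illustrative sketch, not the project's actual `loop.py`: the `llm` callable, the tool mapping, and the message shapes are hypothetical stand-ins.

```python
def agent_loop(llm, tools, messages, max_turns=20):
    """Minimal sketch of a tool-calling agent loop (illustrative)."""
    for _ in range(max_turns):                      # safety limit, not forever
        response = llm(messages, tools)             # send conversation + tools
        messages.append({"role": "assistant", "content": response["content"]})
        if not response["tool_calls"]:              # plain text -> final answer
            return response["content"]
        # Run every requested tool and feed the results back to the model
        results = [tools[c["name"]](**c["input"]) for c in response["tool_calls"]]
        messages.append({"role": "tool", "content": results})
    raise RuntimeError("hit the safety turn limit")
```

The real loop adds streaming, parallel execution, and event emission, but the control flow is exactly this shape.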
You say: "What files are in this directory?"
Turn 1:
You → LLM: "What files are in this directory?"
LLM → You: I'll check. [uses tool: bash("ls -la")]
System runs: bash("ls -la") → "file1.py file2.py README.md"
Turn 2:
You → LLM: [tool result: "file1.py file2.py README.md"]
LLM → You: "There are 3 files: file1.py, file2.py, and README.md"
(no more tool calls → DONE)
Today you might use Anthropic. Tomorrow you might want OpenAI, a self-hosted model, or some internal compatible endpoint. We don't want to rewrite the entire agent just to switch.
Think of electrical outlets around the world. Your laptop just needs electricity — it doesn't care whether the outlet is American, European, or British. A travel adapter handles the translation.
┌─────────────────────────────────────────────────────────────┐
│ │
│ WITHOUT adapter (tightly coupled — bad): │
│ │
│ Agent Loop ──── directly calls ──── Provider SDK/API │
│ │
│ (If you switch providers, you rewrite the entire loop) │
│ │
├─────────────────────────────────────────────────────────────┤
│ │
│ WITH adapter (loosely coupled — good): │
│ │
│ Agent Loop ──── calls ──── Universal Interface │
│ │ │
│ ├── ProviderAdapter A │
│ ├── ProviderAdapter B │
│ └── ProviderAdapter C │
│ │
│ (Switch provider by changing ONE line in main.py) │
│ │
└─────────────────────────────────────────────────────────────┘
Piece 1: The Contract — LLMClient Protocol
"Any LLM client must have these two methods:
create() → send messages, get a response (one shot)
stream() → send messages, get response piece by piece (live)
I don't care HOW you implement them. Just that they exist."
This is like saying "any electrical outlet must provide electricity at a certain voltage." The shape can vary — the adapter handles that.
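In Python, a contract like this is naturally expressed as a `typing.Protocol`. The sketch below is an assumption about how the real `llm.py` might phrase it; the two method names come from the text, but the exact signatures are illustrative.

```python
from typing import Protocol, AsyncIterator, runtime_checkable

@runtime_checkable
class LLMClient(Protocol):
    """The contract: any LLM client must offer these two methods.
    How they are implemented is entirely up to the adapter."""

    async def create(self, messages: list, tools: list) -> "LLMResponse":
        """Send messages, get one complete response back."""
        ...

    def stream(self, messages: list, tools: list) -> "AsyncIterator[LLMResponse]":
        """Send messages, receive the response piece by piece."""
        ...
```

Any class with matching `create` and `stream` methods satisfies the protocol automatically, with no inheritance required.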
Piece 2: The Universal Response — LLMResponse
Every provider returns data differently. This dataclass is the universal format that the loop always receives:
LLMResponse:
content → what the model said (text blocks, tool requests)
tool_calls → list of tools it wants to use
done → true if the model is finished (no tools requested)
input_tokens → how many tokens we sent
output_tokens → how many tokens we received
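As a dataclass, the universal format might look like this (field names follow the description above; the types and defaults are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class LLMResponse:
    """Universal response format the agent loop always receives."""
    content: list                                       # text blocks and tool-use blocks
    tool_calls: list = field(default_factory=list)      # tools the model wants to run
    done: bool = True                                   # True when no tools were requested
    input_tokens: int = 0                               # tokens we sent
    output_tokens: int = 0                              # tokens we received
```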
Piece 3: The Adapter — a provider adapter such as AnthropicAdapter or OpenAIAdapter
The actual translator. It does two things:
- Forward calls to the real provider SDK or API client
- Convert provider-specific responses into the universal
LLMResponse
┌─────────────┐ ┌───────────────────┐ ┌─────────────┐
│ Agent Loop │──────▶│ ProviderAdapter │──────▶│ Provider API │
│ │ │ │ │ │
│ "I just │ │ 1. Takes universal │ │ Returns │
│ need an │ │ format │ │ Provider- │
│ LLMResponse│ │ 2. Calls backend │ │ specific │
│ back" │◀──────│ 3. Converts back │◀──────│ response │
│ │ │ to LLMResponse │ │ object │
└─────────────┘ └───────────────────┘ └─────────────┘
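A provider adapter following this diagram might look like the sketch below. Everything here is hypothetical: the SDK client, its `create` signature, and the raw response shape are stand-ins for whatever a real provider returns.

```python
from dataclasses import dataclass

@dataclass
class LLMResponse:              # minimal stand-in for the universal format
    content: list
    tool_calls: list
    done: bool
    input_tokens: int
    output_tokens: int

class AnthropicAdapter:
    """Illustrative adapter: forwards calls to a provider SDK and converts
    its responses into the universal LLMResponse."""

    def __init__(self, sdk_client, model: str):
        self.sdk = sdk_client
        self.model = model

    async def create(self, messages: list, tools: list) -> LLMResponse:
        # 1-2. Take the universal format and call the real backend
        raw = await self.sdk.create(model=self.model, messages=messages, tools=tools)
        # 3. Convert the provider-specific response back to LLMResponse
        tool_calls = [b for b in raw["content"] if b.get("type") == "tool_use"]
        return LLMResponse(
            content=raw["content"],
            tool_calls=tool_calls,
            done=not tool_calls,
            input_tokens=raw["usage"]["input"],
            output_tokens=raw["usage"]["output"],
        )
```

Swapping providers means writing one more class like this; the loop never notices.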
| File | Knows about provider SDK details? | Role |
|---|---|---|
| `main.py` | No | Wires the configured provider into the application lifecycle |
| `llm.py` | Yes (adapter layer only) | Defines the contract, provider adapters, and lazy SDK imports |
| `loop.py` | No | The agent loop. Only sees LLMClient. |
| `memory.py` | No | Memory system. Only sees LLMClient. |
| `websocket.py` | No | WebSocket handler. Only sees LLMClient. |
To add another backend, you'd write a new adapter and change the provider wiring. Nothing else changes.
The agent isn't just a chatbot — it can take actions. Tools are the hands of the agent.
┌─────────────────────────────────────────────────────────────┐
│ TOOL REGISTRY │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ bash │ │ read_file │ │write_file │ │
│ │ │ │ │ │ │ │
│ │ Run shell │ │ Read any │ │ Create or │ │
│ │ commands │ │ file in │ │ overwrite │ │
│ │ (ls, git, │ │ workspace │ │ files │ │
│ │ python) │ │ │ │ │ │
│ └───────────┘ └───────────┘ └───────────┘ │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ edit_file │ │ think │ │ task │ │
│ │ │ │ │ │ │ │
│ │ Surgical │ │ Pause and │ │ Spawn a │ │
│ │ find & │ │ reason │ │ sub-agent │ │
│ │ replace │ │ step-by- │ │ for │ │
│ │ in files │ │ step │ │ subtasks │ │
│ └───────────┘ └───────────┘ └───────────┘ │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ todo_* │ │ task_* │ │ compact │ │
│ │ │ │ │ │ │ │
│ │ Track │ │ Persistent│ │ Manually │ │
│ │ progress │ │ tasks │ │ compress │ │
│ │ with a │ │ with deps │ │ context │ │
│ │ checklist │ │ on disk │ │ window │ │
│ └───────────┘ └───────────┘ └───────────┘ │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ list_skills │ │ read_skill │ │background_* │ │
│ │ │ │ │ │ │ │
│ │ See what │ │ Load expert │ │ Run long │ │
│ │ skills are │ │ knowledge │ │ commands in │ │
│ │ available │ │ (e.g. how to │ │ background │ │
│ │ │ │ make PDFs) │ │ (fire-and- │ │
│ └─────────────┘ └──────────────┘ │ forget) │ │
│ └──────────────┘ │
│ Plan mode tools: │
│ ┌────────────────┐ ┌─────────────────┐ │
│ │enter_plan_mode │ │ exit_plan_mode │ │
│ │ │ │ │ │
│ │Agent decides to│ │Signal that plan │ │
│ │enter read-only │ │is ready for │ │
│ │exploration mode│ │user approval │ │
│ │(sentinel-based)│ │(sentinel-based) │ │
│ └────────────────┘ └─────────────────┘ │
│ │
│ Team tools (when enabled): │
│ ┌──────────────┐ ┌──────────┐ ┌───────────┐ │
│ │spawn_teammate│ │send_ │ │ broadcast │ │
│ │ │ │ message │ │ │ │
│ │Create a named│ │Talk to a │ │Message │ │
│ │teammate agent│ │specific │ │all team │ │
│ │with its own │ │teammate │ │members at │ │
│ │agent loop │ │or lead │ │once │ │
│ └──────────────┘ └──────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────┘
Each tool has two parts:
- Definition — a JSON description that tells the LLM what the tool does and what inputs it needs (like a manual)
- Handler — the actual Python function that runs when the LLM calls the tool
When the LLM sees the tool definitions, it decides which ones to use. The system then calls the corresponding handler and sends the result back.
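Here is what the two parts might look like for a bash tool. The JSON schema shape and the handler body are a sketch, not the project's exact definitions:

```python
import subprocess

# Part 1: Definition — the "manual" the LLM reads (schema shape is an assumption)
BASH_TOOL_DEF = {
    "name": "bash",
    "description": "Run a shell command in the workspace and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}

# Part 2: Handler — the Python function that actually runs
def bash_handler(command: str) -> str:
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=60)   # the 60s timeout guardrail
    return result.stdout + result.stderr
```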
Tools aren't a free-for-all. There are guardrails:
- bash: Blocks dangerous commands (`rm -rf /`, `sudo`, `shutdown`). Has a 60-second timeout.
- file tools: Can only access files inside the workspace folder. Path traversal attacks (like `../../etc/passwd`) are blocked.
- task (sub-agents): Maximum depth of 2 — a sub-agent can't spawn more sub-agents forever.
- token budget: The whole system has a spending limit (default 200K tokens) so it doesn't run forever.
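The first two guardrails can be sketched in a few lines. The denylist contents and workspace path are illustrative; the path check is the standard resolve-then-compare pattern:

```python
from pathlib import Path

WORKSPACE = Path("/workspace")
BLOCKED = ("rm -rf /", "sudo", "shutdown")     # illustrative denylist, not exhaustive

def check_command(command: str) -> None:
    """Reject obviously dangerous shell commands before running them."""
    if any(bad in command for bad in BLOCKED):
        raise PermissionError(f"blocked command: {command}")

def resolve_safe(path: str) -> Path:
    """Resolve a path and refuse anything that escapes the workspace.
    Resolving first collapses any ../ segments, so traversal can't sneak by."""
    target = (WORKSPACE / path).resolve()
    if not target.is_relative_to(WORKSPACE.resolve()):
        raise PermissionError(f"path escapes workspace: {path}")
    return target
```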
With normal HTTP (REST), the flow would be:
You: "Write me a program"
.......... (wait 30 seconds, staring at blank screen) ..........
Server: "Here's everything I did" (one giant response)
With WebSocket, you see everything as it happens:
You: "Write me a program"
Server: "I'll start by..." (text_delta)
Server: "Using tool: write_file" (tool_call)
Server: "File created successfully" (tool_result)
Server: "Now let me test it..." (text_delta)
Server: "Using tool: bash" (tool_call)
Server: "Output: Hello World!" (tool_result)
Server: "All done! Here's what I did..." (text_delta)
Server: {type: "done", files: [...], usage: {...}} (done)
The server sends different types of JSON events through the WebSocket:
┌──────────────────┬──────────────────────────────────────────┐
│ Event │ What it means │
├──────────────────┼──────────────────────────────────────────┤
│ text_delta │ A chunk of the agent's response text │
│ │ (arrives word by word, like typing) │
├──────────────────┼──────────────────────────────────────────┤
│ tool_call │ The agent wants to use a tool │
│ │ (e.g. "running bash command: ls") │
├──────────────────┼──────────────────────────────────────────┤
│ tool_result │ The tool finished, here's the output │
├──────────────────┼──────────────────────────────────────────┤
│ subagent_start │ A sub-agent was spawned for a subtask │
├──────────────────┼──────────────────────────────────────────┤
│ subagent_end │ The sub-agent finished │
├──────────────────┼──────────────────────────────────────────┤
│ todo_update │ The task checklist was updated │
├──────────────────┼──────────────────────────────────────────┤
│ task_update │ A persistent task was created, updated, │
│ │ or completed │
├──────────────────┼──────────────────────────────────────────┤
│ background_result│ A background command finished running │
├──────────────────┼──────────────────────────────────────────┤
│ teammate_status │ A teammate agent changed state │
│ │ (spawned, working, idle, shutdown) │
├──────────────────┼──────────────────────────────────────────┤
│ compact │ Old messages were summarized to save │
│ │ memory (context window management) │
├──────────────────┼──────────────────────────────────────────┤
│ tool_approval │ The agent wants permission to run a tool │
│ _request │ (when approval is enabled) │
├──────────────────┼──────────────────────────────────────────┤
│ tool_approval │ User responded to an approval request │
│ _result │ │
├──────────────────┼──────────────────────────────────────────┤
│ plan_mode │ Plan mode was toggled on or off │
│ _changed │ │
├──────────────────┼──────────────────────────────────────────┤
│ plan_ready │ The agent finished planning and the │
│ │ plan is ready for review │
├──────────────────┼──────────────────────────────────────────┤
│ plan_approved │ User approved the plan │
├──────────────────┼──────────────────────────────────────────┤
│ plan_rejected │ User rejected the plan (with optional │
│ │ feedback) │
├──────────────────┼──────────────────────────────────────────┤
│ teams_changed │ Teams were toggled on or off │
├──────────────────┼──────────────────────────────────────────┤
│ approval_changed │ Tool approval was toggled on or off │
├──────────────────┼──────────────────────────────────────────┤
│ llm_request │ (tracing) The exact prompt sent to │
│ │ the LLM │
├──────────────────┼──────────────────────────────────────────┤
│ llm_response │ (tracing) The exact response from │
│ │ the LLM │
├──────────────────┼──────────────────────────────────────────┤
│ interrupted │ Agent was interrupted mid-stream by │
│ │ user feedback │
├──────────────────┼──────────────────────────────────────────┤
│ done │ The agent loop finished. Includes │
│ │ token usage and any files created. │
├──────────────────┼──────────────────────────────────────────┤
│ error │ Something went wrong │
└──────────────────┴──────────────────────────────────────────┘
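On the wire, each of these is a small JSON object with a `type` field plus event-specific payload. A sketch of how the server might serialize them (the helper name and exact payload keys are assumptions, following the examples earlier in this section):

```python
import json

def make_event(event_type: str, **payload) -> str:
    """Serialize one WebSocket event as JSON (shape inferred from the table)."""
    return json.dumps({"type": event_type, **payload})

# A few events from the table, as they might look on the wire:
make_event("text_delta", text="I'll create ")
make_event("tool_call", tool="bash", input={"command": "ls"})
make_event("done", usage={"input": 1234, "output": 89}, files=[])
```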
Let's trace what happens when you type "Create a hello world Python script" from start to finish:
STEP 1: You type and hit Send
═══════════════════════════════════════════════════════════
Browser (app.js)
│
├── Renders your message in the chat window
├── Disables the input box (streaming mode)
└── Sends via WebSocket:
{"type": "message", "content": "Create a hello world Python script"}
STEP 2: Server receives your message
═══════════════════════════════════════════════════════════
WebSocket handler (websocket.py)
│
├── Acquires per-conversation lock (prevents race conditions)
├── Loads conversation history from SQLite database
├── Appends your message to the history
├── Starts a cancel listener (in case you disconnect)
└── Calls agent_loop() ──────────────────────────────────┐
│
▼
STEP 3: The agent loop runs agent_loop()
═══════════════════════════════════════════════════════════
│
TURN 1: │
│ │
├── Builds system prompt: │
│ "You are a coding agent. Workspace: /workspace" │
│ + memory from past sessions │
│ + available skills list │
│ + available sub-agent types │
│ │
├── Calls the configured LLM via llm.stream(): │
│ - model: "configured model name" │
│ - system: (the system prompt above) │
│ - messages: [your message + any history] │
│ - tools: [bash, read_file, write_file, ...] │
│ │
├── Streams text back to browser: │
│ ◄── text_delta: "I'll create a hello world..." │
│ ◄── text_delta: "script for you." │
│ │
├── The LLM also requested a tool: │
│ write_file({path: "hello.py", content: "print(..."})│
│ │
│ ◄── tool_call: {tool: "write_file", input: {...}} │
│ │
├── Executes the tool: │
│ write_file handler creates the file on disk │
│ │
│ ◄── tool_result: {tool: "write_file", │
│ result: "File written: hello.py"} │
│ │
├── Appends tool result to conversation │
│ │
└── Continues to TURN 2... │
│
TURN 2: │
│ │
├── Calls the LLM again with updated history │
│ (now includes the tool result) │
│ │
├── The LLM responds with just text (no tools): │
│ ◄── text_delta: "I've created hello.py with..." │
│ │
├── response.done = true → EXIT LOOP │
│ │
└── Returns usage: {input_tokens: 1234, output_tokens: 89}
STEP 4: Cleanup and finalization
═══════════════════════════════════════════════════════════
WebSocket handler (websocket.py)
│
├── Saves all new messages to SQLite database
├── Saves token usage to database
├── Sets conversation title to "Create a hello world..."
├── Scans workspace for files the agent created
│ → finds: [{name: "hello.py", path: "hello.py", size: 28}]
│
└── Sends final event:
◄── done: {usage: {input: 1234, output: 89},
files: [{name: "hello.py", ...}]}
STEP 5: Browser updates
═══════════════════════════════════════════════════════════
Browser (app.js)
│
├── Re-enables the input box
├── Updates token counter in the header
├── Renders a "Files created" card with download link
└── Refreshes the conversation list in the sidebar
Instead of a separate verification phase injected by the loop, the agent uses the think tool to verify its own work before finishing. The system prompt instructs the agent to:
- Review tool results for unaddressed errors
- Confirm all parts of the original request were completed
- Re-read any files it created or modified
- Fix any issues it finds (using tools as needed)
- Summarize what was done
Agent writes file → Agent runs it → Agent uses think tool:
│
▼
"Let me verify:
- File created ✓
- Ran successfully ✓
- Output correct ✓
- All parts addressed ✓"
│
├── Finds issue → fixes with tools → re-verifies
└── All good → responds with summary → DONE!
Key design choices:
- Agent-driven: The agent decides when, what, and how to verify — not forced by the loop
- Natural flow: Verification happens within the normal tool loop, not as a separate phase
- No extra turns: For simple tasks, the agent may skip explicit verification and just respond
- Flexible depth: The agent can verify as many times as needed — no hardcoded round limits
LLMs have a limited "memory" per conversation (called a context window). As conversations get long, they might not fit. The system handles this with three layers of compaction, each progressively more aggressive:
Layer 1 — Micro-compact (every turn, zero LLM cost)
This is the cheapest trick in the book. Every turn, the system scans older tool results (anything over 100 characters, excluding the 3 most recent) and replaces each one with a short placeholder like `[Previous: used bash]`. No LLM call is needed, just string replacement, so it costs nothing. This keeps the conversation history from ballooning with huge tool outputs that the agent no longer needs to see.
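A sketch of micro-compaction, using a simplified message shape (the real history format is richer than these dicts):

```python
def micro_compact(messages: list, keep_last: int = 3, max_len: int = 100) -> None:
    """Replace old, large tool results with short placeholders in place.
    The last `keep_last` tool results stay intact."""
    tool_results = [m for m in messages if m.get("role") == "tool"]
    for m in tool_results[:-keep_last]:               # skip the most recent ones
        if len(m["content"]) > max_len:               # only trim large outputs
            m["content"] = f"[Previous: used {m['tool']}]"
```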
Layer 2 — Auto-compact with transcript preservation
When input tokens exceed a threshold, the system kicks in with a heavier approach. First, it saves the full conversation history to workspace/.transcripts/ as a backup (so you always have an audit trail). Then it uses the LLM to summarize the older messages into a concise recap. The last 8 messages are kept intact so the agent doesn't forget what it was just doing. If the LLM summary fails for any reason, the system falls back to hard truncation — it just drops the oldest messages to make things fit.
Layer 3 — Manual compact tool
The agent can also call the compact tool on demand to trigger compaction whenever it feels the context is getting cluttered. Under the hood, this uses the sentinel pattern: the tool returns __COMPACT_REQUESTED__, and the agent loop detects this special value and triggers the Layer 2 compaction process.
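The sentinel pattern is simple enough to show directly. The sentinel string comes from the text above; the loop-side handler is a simplified stand-in for what the real loop does:

```python
COMPACT_SENTINEL = "__COMPACT_REQUESTED__"

def compact_tool() -> str:
    """The tool body does no work: it just returns the sentinel string."""
    return COMPACT_SENTINEL

def handle_tool_result(result: str) -> str:
    """Sketch of the loop-side check for sentinel values."""
    if result == COMPACT_SENTINEL:
        # In the real loop, this is where Layer 2 compaction would be triggered.
        return "compaction triggered"
    return result
```

Because the sentinel travels through the normal tool-result channel, no special plumbing between tools and the loop is needed.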
Here's how the three layers look in practice:
Layer 1 — Micro-compact (every turn, free):
tool_result from 10 turns ago: "read_file → [487 lines of code...]"
becomes: "[Previous: used read_file]"
Layer 2 — Auto-compact (when context gets full):
BEFORE compaction (too long!):
┌────────────────────────────────────────┐
│ Message 1: "Create a web server" │
│ Message 2: [tool calls and results] │
│ Message 3: "Now add authentication" │ ← old messages
│ Message 4: [tool calls and results] │ (summarized)
│ Message 5: "Add a database" │
│ Message 6: [tool calls and results] │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ Message 7: "Now deploy it" │
│ Message 8: [tool calls and results] │ ← recent messages
│ Message 9: "Fix the Docker issue" │ (kept intact)
│ Message 10: [tool calls and results] │
└────────────────────────────────────────┘
AFTER compaction (fits!):
┌────────────────────────────────────────┐
│ [Summary]: "User built a web server │ ← LLM-generated
│ with auth and database. Files: │ summary
│ server.py, auth.py, db.py" │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ Message 7: "Now deploy it" │
│ Message 8: [tool calls and results] │ ← kept as-is
│ Message 9: "Fix the Docker issue" │
│ Message 10: [tool calls and results] │
└────────────────────────────────────────┘
Layer 3 — Manual compact (agent decides):
Agent calls compact tool → loop triggers Layer 2
The key insight is that these layers work together: Layer 1 quietly trims the fat every turn, Layer 2 kicks in automatically when things get tight, and Layer 3 lets the agent take control when it knows a big summarization would help.
When you disconnect, the system analyzes your entire conversation and saves key learnings to a file (.agent/memory.md). Next time you connect, these learnings are loaded:
SESSION 1:
You: "I always use pytest for testing"
You: "My project uses PostgreSQL"
[disconnect]
│
▼ (LLM analyzes and saves to .agent/memory.md)
SESSION 2:
System prompt now includes:
"Memory from previous sessions:
- User prefers pytest for testing
- Project uses PostgreSQL database"
The memory file is organized into sections: User Preferences, Project Context, Decisions, and Corrections.
For complex tasks, the main agent can spawn sub-agents — smaller, focused agents that handle a specific subtask:
┌─────────────────────────────────────────────────┐
│ MAIN AGENT │
│ │
│ "I need to refactor this codebase. │
│ Let me break it down..." │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Sub-agent: │ │ Sub-agent: │ │
│ │ "explore" │ │ "code" │ │
│ │ │ │ │ │
│ │ Investigates│ │ Does the │ │
│ │ the codebase│ │ actual │ │
│ │ (read-only) │ │ refactoring │ │
│ │ │ │ (full tools)│ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ ▼ ▼ │
│ "Found 3 files "Refactored │
│ that need work" successfully" │
│ │
└─────────────────────────────────────────────────┘
There are four types:
| Type | Purpose | Tools Available |
|---|---|---|
| explore | Investigate code (read-only) | bash, read_file, think |
| code | Write and modify code | All tools |
| plan | Analyze and plan (no changes) | bash, read_file, think |
| research | Deep investigation | bash, read_file, write_file, think |
Sub-agents run in isolation — they have their own conversation history, their own system prompt, and they can't spawn more sub-agents (max depth = 2, preventing infinite loops).
Skills are like instruction manuals the agent can read when it needs specialized knowledge:
skills/
├── code-review/SKILL.md ← How to do thorough code reviews
├── api-design/SKILL.md ← REST API design best practices
├── dockerfile-builder/SKILL.md ← Docker image best practices
├── pdf-writer/SKILL.md ← How to generate PDFs with Python
├── excel-writer/SKILL.md ← How to create Excel files
├── ppt-writer/SKILL.md ← How to make PowerPoint slides
├── docx-writer/SKILL.md ← How to create Word documents
└── design/SKILL.md ← Frontend design patterns
The agent doesn't load all skills at once (that would waste context). Instead, it sees a short description of each skill, and only loads the full content when it decides it needs it. This is called progressive disclosure:
Layer 1 (always visible): "pdf-writer — Create PDF documents with Python"
│
Agent thinks: "I need this!"
│
▼
Layer 2 (loaded on demand): Full SKILL.md with templates, code examples,
library APIs, and best practices
│
▼
Layer 3 (referenced): scripts/, assets/ folders with helper files
The agent can behave differently based on the preset chosen:
┌─────────────────────────────────────┐
│ "coding" preset (default) │
│ │
│ System: "You are a coding agent. │
│ Use tools to write code. │
│ Plan → Act → Report." │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ "work" preset │
│ │
│ System: "You are a work assistant. │
│ Help create documents, │
│ spreadsheets, presentations." │
└─────────────────────────────────────┘
Presets are stored as PROMPT.md files in the prompts/ directory. When you start a new chat, you pick a preset, and that determines the agent's system prompt (its "personality" and instructions).
Sometimes the agent needs to pause and think before acting — especially when analyzing complex tool outputs, weighing trade-offs, planning multi-step tasks, or verifying its own work before finishing. The think tool gives it a dedicated space to reason without side effects:
Agent receives complex error output from bash tool
│
▼
┌──────────────────────────────────────────────────┐
│ think({ │
│ thought: "The error says 'ModuleNotFoundError' │
│ for pandas. This means: │
│ 1. pandas isn't installed in this env │
│ 2. I should check if there's a requirements │
│ file first before pip installing │
│ 3. The user might have a venv I should use │
│ Let me check for requirements.txt and venvs." │
│ }) │
└──────────────────────────────────────────────────┘
│
▼
Agent now calls bash("ls requirements*.txt .venv/")
(makes a better decision because it thought first)
The tool is a no-op — it doesn't fetch new information or change anything. The thought is simply logged in the conversation. The value is in forcing the model to articulate its reasoning as a discrete step, which often leads to better decisions on complex tasks.
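Because it is a no-op, the entire handler could be one line. This is a sketch; the real handler's return format is an assumption, but the key property (no side effects, the thought just re-enters the conversation as a tool result) matches the description above:

```python
def think_handler(thought: str) -> str:
    """No-op handler: no file access, no commands, no state changes.
    The thought is simply echoed back so it lands in the conversation log."""
    return thought
```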
The think tool serves double duty as the agent's self-verification mechanism. Before finishing a task, the agent uses think to review its work — checking tool results for errors, confirming all parts of the request were addressed, and deciding whether to fix issues or respond. This replaces the need for a separate verification phase in the loop.
The think tool is available everywhere: the main agent loop, all sub-agent types, and all teammates.
For truly complex projects, a single agent isn't enough. The system can spawn named teammate agents that work in parallel, each with their own agent loop:
┌──────────────────────────────────────────────────────┐
│ LEAD AGENT │
│ │
│ "I need to build a REST API with tests. │
│ Let me spawn teammates for parallel work." │
│ │
│ spawn_teammate("alice", "backend", "Build the API") │
│ spawn_teammate("bob", "tester", "Write tests") │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ "alice" │ │ "bob" │ │
│ │ (backend) │ │ (tester) │ │
│ │ │ │ │ │
│ │ WORKING... │ │ WORKING... │ │
│ │ Building │ │ Writing │ │
│ │ the API │ │ tests │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ └──── message bus ────┘ │
│ (async inbox) │
│ │
│ Lead reads inbox: │
│ alice: "API ready at routes.py" │
│ bob: "Tests passing, 12 assertions" │
└──────────────────────────────────────────────────────┘
How teammates work:
Each teammate runs a WORK/IDLE state machine:
- WORK phase: Standard agent loop — call LLM, execute tools, drain inbox. When done, call the `idle` tool.
- IDLE phase: Poll every 5 seconds for new inbox messages or unclaimed tasks on the task board. If nothing arrives after 60 seconds, auto-shutdown.
Teammates have their own tools: bash, file tools, think, send_message, read_inbox, and idle. They cannot spawn more teammates, which prevents infinite spawning.
Communication happens through a message bus — an async inbox per agent. The lead can send targeted messages, broadcast to everyone, or request structured protocols (shutdown requests, plan approvals).
Teams are opt-in per conversation — you enable them when creating a new chat.
Some commands take a long time (builds, test suites, downloads). Instead of blocking the agent loop, the agent can run them in the background:
Agent: "I'll run the test suite in the background
while I work on the next feature."
background_run("pytest -v tests/")
│
├── Returns immediately: "task_id: bg_001"
│
│ Agent continues working on other things...
│
▼ (later, at start of next turn)
Loop automatically drains notifications:
"bg_001 completed: 42 passed, 1 failed"
Agent sees the result and can react to it.
Background tasks run as async subprocesses. Their results are collected in a notification queue and automatically injected into the conversation at the start of each turn.
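A minimal sketch of that mechanism, assuming `asyncio` subprocesses and a plain `asyncio.Queue` as the notification queue (class and method names here are hypothetical):

```python
import asyncio

class BackgroundRunner:
    """Sketch: fire-and-forget shell commands with a notification queue."""

    def __init__(self):
        self.notifications: asyncio.Queue = asyncio.Queue()

    async def run(self, task_id: str, command: str) -> None:
        """Run a command as an async subprocess; queue the result when done."""
        proc = await asyncio.create_subprocess_shell(
            command,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        out, _ = await proc.communicate()        # await without blocking the loop
        await self.notifications.put(f"{task_id} completed: {out.decode().strip()}")

    def drain(self) -> list:
        """Called at the start of each agent turn to collect finished results."""
        results = []
        while not self.notifications.empty():
            results.append(self.notifications.get_nowait())
        return results
```

The agent loop would schedule `run()` with `asyncio.create_task()` and call `drain()` each turn, injecting anything it finds into the conversation.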
For complex multi-step work, the agent can create persistent tasks that survive context compaction:
┌─────────────────────────────────────────────────┐
│ TASK BOARD (.tasks/ directory) │
│ │
│ #1 ✓ Set up project structure │
│ #2 ● Implement authentication ← in progress │
│ blockedBy: [] │
│ #3 ○ Add rate limiting ← pending │
│ blockedBy: [#2] ← can't start │
│ #4 ○ Write integration tests │
│ blockedBy: [#2, #3] ← blocked by 2 │
└─────────────────────────────────────────────────┘
Each task is stored as a JSON file on disk (workspace/.tasks/task_1.json). Tasks support:
- Dependencies: `blockedBy` and `blocks` arrays
- Cascade completion: Finishing task #2 automatically unblocks #3
- Ownership: Teammates can claim unclaimed tasks from the board
- Metadata: Arbitrary key-value pairs
This is separate from the simpler todo_* tools (which are in-memory checklists).
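Cascade completion can be sketched as a pass over the task files. The JSON field names follow the description above (`id`, `blockedBy`, a `status`); the function itself is a hypothetical simplification of the real task board:

```python
import json
from pathlib import Path

def complete_task(task_id: int, tasks_dir: Path) -> list:
    """Mark one task done and drop it from every other task's blockedBy list.
    Returns the ids of tasks that became fully unblocked."""
    unblocked = []
    for path in tasks_dir.glob("task_*.json"):
        task = json.loads(path.read_text())
        if task["id"] == task_id:
            task["status"] = "done"
        elif task_id in task.get("blockedBy", []):
            task["blockedBy"].remove(task_id)      # cascade: remove the dependency
            if not task["blockedBy"]:
                unblocked.append(task["id"])       # nothing blocking it anymore
        path.write_text(json.dumps(task))
    return unblocked
```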
For complex tasks, it's better to plan first rather than dive straight into writing code. Plan mode is a read-only exploration phase where the agent can look at the codebase but can't make any changes.
┌─────────────────────┐
User sends │ Normal Mode │
complex task ───► │ (all tools) │
└──────────┬──────────┘
│
Agent calls enter_plan_mode
(or user activates it)
│
▼
┌─────────────────────┐
│ Plan Mode │
│ (read-only tools) │◄──── revise
│ │ │
│ - read_file │ │
│ - think │ │
│ - task_create │ │
│ - exit_plan_mode │ │
│ │ │
│ ✗ bash │ │
│ ✗ write_file │ │
│ ✗ edit_file │ │
└──────────┬──────────┘ │
│ │
Agent calls exit_plan_mode │
│ │
▼ │
┌─────────────────────┐ │
│ Plan presented │ │
│ for approval │ │
└──┬──────┬───────┬───┘ │
│ │ │ │
Approve Feedback Reject │
│ │ │
│ └────────────────────┘
▼
┌─────────────────────┐
│ Execute plan │
│ (full tools) │
└─────────────────────┘
How it works under the hood:
Plan mode uses the sentinel pattern (same as the compact tool). The enter_plan_mode tool returns a magic string __ENTER_PLAN_MODE__. The agent loop detects this and:
- Sets `_plan_mode = True` (a mutable local variable)
- Recalculates which tools are available (only read-only ones)
- Appends `PLAN_MODE_PROMPT` to the system prompt
- Emits a `plan_mode_changed` event to the UI
When the agent is done planning, it calls exit_plan_mode (returns __PLAN_READY__). The loop:
- Extracts the plan text from the last assistant message
- Emits a `plan_ready` event
- Breaks out of the loop
The plan then goes through the approval flow — the user can approve (switches to execution), reject with feedback (agent revises), or just reject.
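The sentinel handling can be sketched like this. It's a simplified model: the real logic lives in loop.py and does more bookkeeping, but the shape (compare the tool result against magic strings, then mutate loop state) is the same:

```python
# Sentinel constants match the doc; the handler shape is illustrative.
ENTER_PLAN_MODE = "__ENTER_PLAN_MODE__"
PLAN_READY = "__PLAN_READY__"

READ_ONLY_TOOLS = {"read_file", "think", "task_create", "exit_plan_mode"}
ALL_TOOLS = READ_ONLY_TOOLS | {"bash", "write_file", "edit_file"}

def handle_tool_result(result, state, events):
    """Inspect a tool's return value for magic sentinel strings."""
    if result == ENTER_PLAN_MODE:
        state["plan_mode"] = True
        state["tools"] = READ_ONLY_TOOLS    # recalculate available tools
        events.append("plan_mode_changed")  # tell the UI
        return "continue"
    if result == PLAN_READY:
        events.append("plan_ready")         # UI shows the approval dialog
        return "break"                      # leave the agent loop
    return "continue"

state = {"plan_mode": False, "tools": ALL_TOOLS}
events = []
handle_tool_result(ENTER_PLAN_MODE, state, events)
handle_tool_result(PLAN_READY, state, events)
print(events)  # → ['plan_mode_changed', 'plan_ready']
```

Because the sentinel is just a return value, plan mode needs no special tool-calling machinery: the tool is an ordinary tool, and the loop is the only place that treats its output specially.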
Four ways to activate plan mode:
| Method | Who initiates |
|---|---|
| `enable_plan_mode` flag at chat creation | User (UI/API) |
| `/plan` slash command in CLI | User (CLI) |
| Toggle button in web UI header | User (UI) |
| `enter_plan_mode` tool call | Agent (autonomous) |
The last one is new — the agent can decide on its own that a task is complex enough to warrant planning first, just like a human developer would sketch out an approach before coding.
Some tools are harmless — reading a file or thinking out loud can't break anything. But running a shell command like rm -rf / or overwriting a config file? You probably want a say in that.
When tool approval is enabled for a conversation, the agent pauses before running any "dangerous" tool and asks you for permission first. Read-only tools like think, read_file, list_skills, and compact are in a SAFE_TOOLS set — they execute automatically without bothering you. But write tools like bash, write_file, and edit_file trigger a permission prompt.
Here's what happens under the hood:
Agent wants to run bash("rm old_files/")
│
▼
Is bash in SAFE_TOOLS? → No
│
▼
Send tool_approval_request to user
│
▼
┌─────────────────────────────────┐
│ "Agent wants to run: │
│ bash: rm old_files/ │
│ │
│ [Approve] [Deny] [Auto] │
└─────────────────────────────────┘
│
├── Approve → execute normally
├── Deny → "User denied this tool call" sent to LLM
└── Auto-approve → no more prompts this session
When the agent hits a tool that needs approval, it sends a tool_approval_request event over the WebSocket to the UI. Then it blocks — the agent loop waits on an asyncio.Queue until it hears back from you. You have three choices:
- Approve: The tool runs normally, and the agent continues.
- Deny: The tool is skipped, and the LLM receives "User denied this tool call" as the tool result. This lets the agent adjust its approach — maybe it'll try a safer command or ask you for guidance.
- Auto-approve: Approval is turned off for the rest of the session. Every tool runs without asking from that point on.
There's a 5-minute timeout on the approval prompt. If you disconnect or walk away, the agent won't hang forever — it times out and treats it as a denial.
One important detail: sub-agents don't get their own approval prompts. When you approve the task tool (which spawns a sub-agent), that sub-agent runs autonomously. The approval happened at the parent level — you approved the delegation, so the sub-agent is trusted to do its job.
Tool approval is opt-in per conversation. You enable it when creating a chat (via the API or a UI toggle), so your normal quick-and-dirty sessions aren't slowed down by constant prompts.
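Here's a toy model of the approve/deny/timeout flow built around an asyncio.Queue. The names and the short demo timeout are illustrative (the real prompt waits five minutes and the answer arrives over the WebSocket, not from a local queue):

```python
import asyncio

SAFE_TOOLS = {"think", "read_file", "list_skills", "compact"}

async def run_tool(name, decision_queue):
    if name in SAFE_TOOLS:
        return f"executed {name}"          # safe tools skip the prompt
    # Block until the user answers, but never hang forever
    # (real timeout: 5 minutes; 0.1s keeps this demo fast).
    try:
        decision = await asyncio.wait_for(decision_queue.get(), timeout=0.1)
    except asyncio.TimeoutError:
        decision = "deny"                  # timeout counts as a denial
    if decision == "approve":
        return f"executed {name}"
    return "User denied this tool call"    # sent back to the LLM

async def demo():
    q = asyncio.Queue()
    await q.put("approve")                 # user pre-approves one call
    return [
        await run_tool("bash", q),         # consumes the approval
        await run_tool("read_file", q),    # safe: no prompt needed
        await run_tool("write_file", q),   # queue empty: times out, denied
    ]

results = asyncio.run(demo())
print(results)
```

The important design point is that a denial is not an exception: it becomes an ordinary tool result, so the LLM sees it in context and can change course.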
Sometimes the agent starts heading in the wrong direction and you don't want to wait for it to finish before correcting course. Cancellation and interrupt let you stop the agent mid-stream and either bail out or redirect it.
Each WebSocket session creates an asyncio.Event called cancelled. This is the kill switch — the agent loop checks it before every LLM call and before every tool execution.
Agent is working... ──── user sends new message ────► Interrupt!
│ │
▼ ▼
Stops at next New agent turn starts with:
checkpoint "User interrupted: actually do X instead"
There are two flavors:
Cancel — just stop. The client sends {"type": "cancel"} over the WebSocket. The agent finishes whatever atomic operation it's in the middle of (it doesn't kill a running subprocess mid-byte), then stops at the next checkpoint and sends a done event. Clean exit.
Interrupt with feedback — stop and redirect. The client sends {"type": "interrupt", "content": "actually do X instead"}. The agent stops the same way, but instead of just quitting, it starts a new turn with your feedback as the user message. The agent sees something like "User interrupted: actually do X instead" and picks up from there with the new direction.
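The checkpoint pattern can be sketched with an asyncio.Event. This is illustrative (the real loop checks before each LLM call and each tool execution, and the event is set by the WebSocket handler):

```python
import asyncio

async def agent_loop(cancelled, steps):
    completed = []
    for step in steps:
        if cancelled.is_set():       # checkpoint: before LLM call / tool run
            completed.append("stopped early")
            break
        await asyncio.sleep(0)       # stand-in for real async work
        completed.append(step)
    return completed

async def demo():
    cancelled = asyncio.Event()
    task = asyncio.create_task(agent_loop(cancelled, ["llm", "tool", "llm"]))
    cancelled.set()                  # user hit cancel before the loop ran
    return await task

result = asyncio.run(demo())
print(result)  # → ['stopped early']
```

Because the loop only checks the event at safe points, cancellation never tears down a half-finished atomic operation; it just prevents the next one from starting.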
In the CLI, this works through Ctrl+C. When you interrupt, the CLI shows a feedback prompt — you can type a new direction (like "focus on the tests instead") or just press Enter to cancel without feedback.
In the web UI, there's a cancel button that appears in the input bar while the agent is streaming. Typing a new message while the agent is working triggers an interrupt with your message as the feedback.
One subtle detail: when the agent is interrupted mid-response, it might have produced a partial assistant message that includes a tool_use block without a matching tool_result. The provider message format requires these to be paired, so the system strips orphaned tool_use blocks from the interrupted message before starting the new turn. Without this cleanup, the next API call would fail validation.
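That cleanup step can be sketched as follows. The message shape mirrors the provider's content-block format; the function name is illustrative:

```python
# Drop trailing tool_use blocks that have no matching tool_result,
# so the next API call passes the provider's validation.
def strip_orphaned_tool_use(messages):
    if not messages or messages[-1]["role"] != "assistant":
        return messages
    last = messages[-1]
    kept = [b for b in last["content"] if b["type"] != "tool_use"]
    if kept:
        messages[-1] = {**last, "content": kept}
        return messages
    return messages[:-1]   # message was only tool_use blocks: drop it

history = [
    {"role": "user", "content": [{"type": "text", "text": "refactor this"}]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I'll start by..."},
        {"type": "tool_use", "id": "t1", "name": "bash", "input": {}},
    ]},
]
cleaned = strip_orphaned_tool_use(history)
print(cleaned[-1]["content"])   # only the text block survives
```

Note that any tool_use in the final assistant message is orphaned by definition here: its tool_result would have to live in a following message, and the interrupt means that message never got created.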
The agent comes with a solid set of built-in tools (bash, file read/write, think, etc.), but what if you want it to interact with GitHub, Slack, a database, or some custom internal service? That's where MCP (Model Context Protocol) comes in.
MCP is a plugin system for AI tools. It lets external servers expose tools that the agent can use, just like its built-in ones. Think of it like browser extensions — the browser works fine on its own, but extensions give it new powers.
Without MCP:
Agent --> built-in tools only (bash, files, think...)
With MCP:
Agent --> built-in tools + external MCP tools
|
|-- mcp__filesystem__read_file
|-- mcp__filesystem__write_file
|-- mcp__github__create_issue
+-- mcp__slack__send_message
How to set it up: Create a file called mcp_servers.json in the backend directory. No file = no MCP, zero performance impact. The agent works exactly the same without it.
Here's what a config looks like:
{
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
"env": { "NODE_ENV": "production" }
},
"github": {
"url": "https://mcp.github.example.com",
"headers": { "Authorization": "Bearer ${GITHUB_TOKEN}" }
}
}

Notice the `${GITHUB_TOKEN}` syntax — the config supports environment variable substitution, so you don't have to hardcode secrets.
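That substitution can be sketched in a few lines (illustrative; the project's actual config loader may handle edge cases differently, e.g. unset variables are left untouched here):

```python
import os
import re

_PATTERN = re.compile(r"\$\{([A-Z0-9_]+)\}")

def substitute_env(value):
    """Recursively replace ${VAR} with os.environ['VAR'] in config values."""
    if isinstance(value, str):
        return _PATTERN.sub(
            lambda m: os.environ.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: substitute_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v) for v in value]
    return value

os.environ["GITHUB_TOKEN"] = "ghp_example"
cfg = {"headers": {"Authorization": "Bearer ${GITHUB_TOKEN}"}}
print(substitute_env(cfg))
# → {'headers': {'Authorization': 'Bearer ghp_example'}}
```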
Two connection modes:
| Mode | How it works | When to use |
|---|---|---|
| Client-mode | The backend connects to MCP servers itself (local processes via stdio, or remote HTTP servers). Tools get namespaced as `mcp__{server}__{tool}` and injected into the ToolRegistry. | Works with any LLM provider. You control the servers. |
| Remote-mode | Server connection params are passed directly to Anthropic's API — the MCP server runs on Anthropic's side. | For Anthropic-hosted MCP services. |
In client-mode, when the backend starts up, it reads mcp_servers.json, connects to each configured server, discovers what tools they offer, and registers them in the ToolRegistry with namespaced names. From the agent's perspective, mcp__filesystem__read_file is just another tool — it doesn't know or care that the tool lives in an external process.
┌──────────────────────────────────────────────────────────┐
│ Backend (agent-api) │
│ │
│ ToolRegistry │
│ |-- bash (built-in) │
│ |-- write_file (built-in) │
│ |-- read_file (built-in) │
│ |-- think (built-in) │
│ |-- mcp__filesystem__read_file --+ │
│ |-- mcp__filesystem__write_file --+-- from MCP server │
│ |-- mcp__github__create_issue --+ │
│ +-- mcp__slack__send_message --+ │
│ │
└──────────────────────────────────────────────────────────┘
| | |
v v v
+-----------+ +-----------+ +-----------+
| filesystem| | github | | slack |
| MCP server| | MCP server| | MCP server|
| (stdio) | | (HTTP) | | (HTTP) |
+-----------+ +-----------+ +-----------+
The key insight: MCP is purely additive. It doesn't change how the agent loop works, doesn't affect the LLM wrapper chain, and doesn't touch existing tools. It just makes the ToolRegistry bigger.
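The namespacing step can be sketched as a tiny registry merge (names are illustrative). Note how the built-in read_file and the MCP server's read_file coexist, because the prefix keeps their names distinct:

```python
# Illustrative: register each discovered MCP tool under a namespaced name.
def register_mcp_tools(registry, server_name, discovered_tools):
    for tool in discovered_tools:
        registry[f"mcp__{server_name}__{tool}"] = ("mcp", server_name, tool)
    return registry

registry = {"bash": ("builtin",), "read_file": ("builtin",)}
register_mcp_tools(registry, "filesystem", ["read_file", "write_file"])
print(sorted(registry))
# → ['bash', 'mcp__filesystem__read_file',
#    'mcp__filesystem__write_file', 'read_file']
```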
What happens if the agent is halfway through writing code and runs out of output tokens? Or hits the turn limit while still working? You'd get an incomplete response — a half-written file, an unexplained stop, or no summary of what was accomplished.
The agent loop has three safety nets to make sure you always get a proper conclusion:
1. Truncation auto-continuation
When the model hits max_output_tokens mid-generation (the API returns stop_reason=max_tokens), the loop doesn't just give up. It injects a continuation prompt — essentially saying "keep going from where you left off" — and lets the model finish its thought. This can happen up to 3 times in a row, preventing the common problem where the model writes code but stops before it can execute it.
LLM response stopped: stop_reason = "max_tokens" (hit the limit!)
|
v
Continuation #1: "Please continue from where you left off."
|
v
LLM continues... stop_reason = "max_tokens" (still going!)
|
v
Continuation #2: "Please continue from where you left off."
|
v
LLM continues... stop_reason = "end_turn" (done naturally!)
2. Wrap-up nudge
When the agent is 3 turns away from the turn limit, the system prompt gets a little extra paragraph telling the model to start wrapping things up. It's a gentle hint, not a hard stop — the agent can still use tools and do real work, but it knows time is running out.
3. Forced final summary
If the agent exhausts all its turns and still hasn't stopped, the loop makes one last LLM call with all tools disabled. The model has no choice but to produce a text response — a summary of everything it accomplished (and anything it didn't finish). This guarantees you always get a readable conclusion, never a silent stop.
Here's how these three mechanisms work together near the end of a run:
Turn 47/50: "You're approaching the turn limit. Please wrap up."
(nudge added to system prompt)
Turn 48/50: Agent continues working...
Turn 49/50: Agent continues working...
Turn 50/50: FORCED: "Summarize everything you've done" (no tools)
--> Agent produces final summary --> DONE
The result: no matter what happens — token limits, turn limits, or the model just being verbose — you always get a meaningful final response.
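The decision logic above can be condensed into a toy function. The constants and return labels are illustrative, not the project's real values:

```python
# Toy model of the three safety nets near the end of a run.
MAX_TURNS = 50
NUDGE_MARGIN = 3          # start nudging 3 turns before the limit
MAX_CONTINUATIONS = 3     # cap on back-to-back auto-continuations

def next_action(turn, stop_reason, continuations):
    if stop_reason == "max_tokens" and continuations < MAX_CONTINUATIONS:
        return "continue"             # 1. truncation auto-continuation
    if turn >= MAX_TURNS:
        return "forced_summary"       # 3. last call with all tools disabled
    if turn >= MAX_TURNS - NUDGE_MARGIN:
        return "nudge"                # 2. add wrap-up hint to system prompt
    return "normal"

print(next_action(10, "end_turn", 0))    # → normal
print(next_action(20, "max_tokens", 1))  # → continue
print(next_action(47, "end_turn", 0))    # → nudge
print(next_action(50, "end_turn", 0))    # → forced_summary
```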
The system doesn't talk to a provider API directly from the loop. Instead, it wraps the adapter in transparent layers, each adding one concern:
┌───────────────────────────────────────────────────────┐
│ │
│ Agent Loop │
│ │ │
│ ▼ │
│ TracingLLMClient (optional, per-session) │
│ │ Emits llm_request/llm_response events │
│ │ to the WebSocket for the dev panel │
│ │ │
│ ▼ │
│ RetryingLLMClient │
│ │ Retries on 429 (rate limit), 500, 502, 503 │
│ │ Exponential backoff with jitter │
│ │ Max 3 retries │
│ │ │
│ ▼ │
│ ProviderAdapter │
│ │ Translates to/from provider SDK format │
│ │ │
│ ▼ │
│ Provider API │
│ │
└───────────────────────────────────────────────────────┘
Each wrapper satisfies the same LLMClient interface, so the agent loop doesn't know (or care) how many layers exist. You can add or remove layers by changing one line in main.py.
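Here's a minimal sketch of how such a chain composes: each layer takes an inner client and exposes the same method, so the loop only ever sees the outermost one. Class names follow the diagram; the `complete()` signature is simplified:

```python
# Innermost layer: stands in for the real provider SDK call.
class ProviderAdapter:
    def complete(self, messages):
        return f"provider answered {len(messages)} messages"

class RetryingLLMClient:
    def __init__(self, inner, max_retries=3):
        self.inner, self.max_retries = inner, max_retries
    def complete(self, messages):
        for attempt in range(self.max_retries + 1):
            try:
                return self.inner.complete(messages)
            except ConnectionError:
                if attempt == self.max_retries:
                    raise
        # (real version: exponential backoff with jitter, retry 429/5xx only)

class TracingLLMClient:
    def __init__(self, inner, trace):
        self.inner, self.trace = inner, trace
    def complete(self, messages):
        self.trace.append("llm_request")     # emitted to dev panel in reality
        result = self.inner.complete(messages)
        self.trace.append("llm_response")
        return result

trace = []
client = TracingLLMClient(RetryingLLMClient(ProviderAdapter()), trace)
print(client.complete([{"role": "user", "content": "hi"}]))
print(trace)  # → ['llm_request', 'llm_response']
```

Swapping the stack is one constructor change: drop the TracingLLMClient wrapper and nothing else in the loop needs to know.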
Everything is saved in a SQLite database (agent.db) with three tables:
┌─────────────────────────────────────────────────────────────┐
│ SQLite Database │
│ │
│ ┌─────────────────┐ │
│ │ conversations │ │
│ │─────────────────│ │
│ │ id (UUID) │──┐ │
│ │ title │ │ │
│ │ system_prompt │ │ one conversation │
│ │ preset │ │ has many messages │
│ │ enable_teams │ │ and many token usage records │
│ │ enable_tracing │ │ │
│ │ enable_approval │ │ │
│ │ enable_plan_mode│ │ │
│ │ created_at │ │ │
│ │ updated_at │ │ │
│ └─────────────────┘ │ │
│ │ │
│ ┌─────────────────┐ │ ┌─────────────────┐ │
│ │ messages │ │ │ token_usage │ │
│ │─────────────────│ │ │─────────────────│ │
│ │ id │ │ │ id │ │
│ │ conversation_id │◄─┤ │ conversation_id │◄──┘ │
│ │ role (user/ │ │ │ input_tokens │ │
│ │ assistant) │ │ │ output_tokens │ │
│ │ content (JSON) │ │ │ model │ │
│ │ created_at │ │ │ created_at │ │
│ └─────────────────┘ │ └─────────────────┘ │
│ │ │
└───────────────────────┴─────────────────────────────────────┘
This means you can close your browser, come back later, and your conversations are still there.
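A condensed sketch of the schema using Python's sqlite3 module (column lists abridged; the real tables carry more fields such as preset and the per-conversation feature flags):

```python
import sqlite3

db = sqlite3.connect(":memory:")  # the real file is agent.db
db.executescript("""
CREATE TABLE conversations (
    id TEXT PRIMARY KEY,
    title TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    role TEXT CHECK (role IN ('user', 'assistant')),
    content TEXT        -- JSON-encoded content blocks
);
CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    input_tokens INTEGER,
    output_tokens INTEGER
);
""")
db.execute("INSERT INTO conversations (id, title) VALUES ('c1', 'demo')")
db.execute("INSERT INTO messages (conversation_id, role, content) "
           "VALUES ('c1', 'user', '[{\"type\": \"text\"}]')")
count, = db.execute(
    "SELECT COUNT(*) FROM messages WHERE conversation_id = 'c1'").fetchone()
print(count)  # → 1
```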
There are two web frontends, both built with plain HTML, CSS, and JavaScript — no React, no Vue, no build step. Just open them in a browser.
┌─────────────────────────────────────────────────────────────────┐
│ ┌──────────┐ ┌──────────────────────────────────────────────┐ │
│ │ SIDEBAR │ │ CHAT AREA │ │
│ │ │ │ │ │
│ │ + New │ │ ┌─────────────────────────────────┐ │ │
│ │ │ │ │ You: Create a Python script │ │ │
│ │ Chat 1 │ │ └─────────────────────────────────┘ │ │
│ │ Chat 2 ● │ │ │ │
│ │ Chat 3 │ │ ┌─────────────────────────────────┐ │ │
│ │ │ │ │ Agent: I'll create that for you │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ │ ▼ Tool: write_file │ │ │
│ │ │ │ │ {path: "script.py", ...} │ │ │
│ │ │ │ │ ✓ File written successfully │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ │ ▼ Tool: bash │ │ │
│ │ │ │ │ python script.py │ │ │
│ │ │ │ │ ✓ Output: Hello World! │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ │ Done! Created script.py │ │ │
│ │ │ │ └─────────────────────────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ ┌─────────────────────────────────┐ │ │
│ │ │ │ │ 📄 Files: script.py (128 bytes) │ │ │
│ │ │ │ │ [Download] │ │ │
│ │ │ │ └─────────────────────────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ ┌──────────────────────────┐ [Send] │ │
│ │ │ │ │ Type a message... │ │ │
│ │ │ │ └──────────────────────────┘ │ │
│ └──────────┘ └──────────────────────────────────────────────┘ │
│ │
│ Tokens: 1,234 in / 89 out ● Connected [Dev Panel] │
└─────────────────────────────────────────────────────────────────┘
| File | Purpose |
|---|---|
| `index.html` | Page structure, loads everything |
| `css/styles.css` | Dark theme styling |
| `js/app.js` | Main orchestrator — wires everything together |
| `js/state.js` | Simple state management (current conversation, messages, etc.) |
| `js/api.js` | HTTP REST client (create chats, list conversations, etc.) |
| `js/websocket.js` | WebSocket connection with auto-reconnect |
| `js/renderer.js` | Builds DOM elements (messages, tool blocks, file cards) |
| `js/markdown.js` | Renders markdown text with syntax highlighting |
| `js/devpanel.js` | Developer panel showing raw WebSocket traffic |
The User UI is a separate frontend designed for end users who want to interact with the agent without seeing developer-level detail. It connects to the same backend as the Developer UI.
┌─────────────────────────────────────────────────────────────────┐
│ Developer UI (agent-ui) │ User UI (agent-user-ui) │
│ │ │
│ ● Dark theme │ ● Forest Canopy light theme │
│ ● Expandable tool blocks │ ● Activity indicators │
│ with raw input/output │ ("Thinking...", "Writing") │
│ ● Dev panel (WebSocket │ ● Simplified approval │
│ frame inspector) │ dialogs │
│ ● Token usage display │ ● Clean, minimal layout │
│ ● Full technical detail │ ● User-friendly messaging │
│ │ │
│ For: developers, debugging │ For: end users, demos │
└─────────────────────────────────┴───────────────────────────────┘
| File | Purpose |
|---|---|
| `index.html` | Page structure, loads everything |
| `css/styles.css` | Forest Canopy light theme styling |
| `js/app.js` | Main orchestrator — wires everything together |
| `js/state.js` | Simple state management |
| `js/api.js` | HTTP REST client |
| `js/websocket.js` | WebSocket connection with auto-reconnect |
| `js/renderer.js` | Builds DOM elements (messages, activity indicators) |
| `js/markdown.js` | Renders markdown text with syntax highlighting |
| `js/config.js` | Configuration (API endpoint, ports) |
| `js/filepanel.js` | File browser panel |
Not everyone wants to open a browser. The Terminal CLI is a rich command-line interface that gives you the full agent experience right in your terminal. Same backend, same agent loop, same tools — just a different way to interact.
The binary is called openagent. You install it, type openagent, and you're in a conversation.
┌─────────────────────────────────────────────────────────┐
│ $ openagent │
│ │
│ OpenAgent v0.1.0 │
│ │
│ You: Create a hello world script │
│ │
│ ● Thinking... │
│ │
│ I'll create a simple Python script for you. │
│ │
│ ┌─ write_file: hello.py ────────────────────────┐ │
│ │ print("Hello, World!") │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ ┌─ bash ────────────────────────────────────────┐ │
│ │ $ python hello.py │ │
│ │ Hello, World! │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ Done! 1,234 in · 89 out · $0.02 · 3% ctx │
│ │
│ You: _ │
└─────────────────────────────────────────────────────────┘
Interactive REPL with prompt-toolkit. The input line uses Python's prompt_toolkit library, which gives you persistent history (your previous messages are saved across sessions), fish-style auto-suggestions (it grays out completions from your history as you type), and Esc+Enter for multiline input. There's even a vi mode toggle if you're that kind of person.
Slash commands. Type / to see what's available:
| Command | What it does |
|---|---|
| `/help` | Show available commands |
| `/clear` | Clear the screen |
| `/compact` | Manually trigger context compaction |
| `/model` | Switch the LLM model mid-conversation |
| `/history` | List saved sessions |
| `/resume` | Resume a previous session |
| `/cost` | Show cost breakdown for the current session |
| `/plan` | Enter plan mode (think before acting) |
| `/execute` | Exit plan mode and execute |
| `/teams` | Toggle multi-agent teams on/off |
| `/approval` | Toggle tool approval on/off |
| `/quit` | Exit the CLI |
Session persistence. Every conversation is automatically saved. You can come back later and pick up where you left off with openagent --resume (shows a list of recent sessions) or openagent --resume <ID> (jumps straight to a specific one).
Cost tracking. After every agent turn, the CLI shows a summary line with input tokens, output tokens, dollar cost for that turn, and a context window percentage (how full the context is). You always know what you're spending.
Done! 1,234 in · 89 out · $0.02 · 3% ctx
^ ^ ^ ^
| | | +-- context window usage
| | +-- cost for this turn
| +-- tokens the model generated
+-- tokens sent to the model
Pipe mode. For scripting and automation, you can pipe input directly:
echo "Explain this error: $(cat error.log)" | openagent --no-approval

In pipe mode, the CLI reads from stdin, runs the agent without interactive prompts, prints the result, and exits. The --no-approval flag skips tool approval prompts so it can run unattended.
Interrupt with feedback. Press Ctrl+C while the agent is working and you get a choice: type new instructions to redirect the agent (like "stop, focus on the tests instead") or press Enter to just cancel. This is the same interrupt mechanism as the web UI, just adapted for the terminal.
Code block rendering. Code in the agent's responses is syntax-highlighted using rich.Syntax with the Monokai theme. It looks good even in a plain terminal.
Config file. Defaults live in ~/.openagent/config.toml — you can set your preferred model, approval mode, and other options so you don't have to pass flags every time.
Here's how every piece fits together:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ YOUR BROWSER │
│ │
│ agent-ui/ (Developer) agent-user-ui/ (User) │
│ ┌─────────┐ ┌───────────┐ ┌─────────┐ ┌───────────┐ │
│ │ app.js │ │renderer.js│ │ app.js │ │renderer.js│ │
│ │(wiring) │ │(DOM) │ │(wiring) │ │(activity │ │
│ └────┬────┘ └─────┬─────┘ └────┬────┘ │indicators)│ │
│ │ │ │ └─────┬─────┘ │
│ ┌────────────┐ ┌──────────┐ ┌────────────┐ ┌──────────┐ │
│ │websocket.js│ │ api.js │ │websocket.js│ │ api.js │ │
│ │(streaming) │ │(REST) │ │(streaming) │ │(REST) │ │
│ └─────┬──────┘ └────┬─────┘ └─────┬──────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └──────┬──────┘ └──────┬───────┘ │
│ └──────────────┬───────────────┘ │
│ │ │
└──────────────────────────────┼──────────────────────────────────┘
│
┌───────────────┼───────────────┐
│ WebSocket │ REST/HTTP │
│ (streaming) │ (CRUD) │
│ │ │
┌─────────────┼───────────────┼───────────────┼───────────────────┐
│ ▼ ▼ │ │
│ ┌───────────────────────────────────────┐ │ BACKEND │
│ │ main.py │ │ (FastAPI) │
│ │ Creates everything at startup: │ │ │
│ │ - LLM adapter │ │ │
│ │ - Skill loader │ │ │
│ │ - Prompt loader │ │ │
│ │ - Database connection │ │ │
│ └───────────┬───────────────────────────┘ │ │
│ │ │ │
│ ┌─────────┴─────────┐ │ │
│ ▼ ▼ │ │
│ ┌───────────┐ ┌─────────────┐ │ │
│ │websocket │ │ routes │ │ │
│ │.py │ │ .py │ │ │
│ │ │ │ │ │ │
│ │ Manages │ │ REST API: │ │ │
│ │ real-time │ │ - CRUD │ │ │
│ │ streaming │ │ - tools │ │ │
│ │ sessions │ │ - skills │ │ │
│ └─────┬─────┘ │ - files │ │ │
│ │ └─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────┐ │ │
│ │ agent_loop() │ │ │
│ │ (loop.py) │ │ │
│ │ │ │ │
│ │ while not done: │ │ │
│ │ micro_compact() │ │ │
│ │ drain background notifs │ │ │
│ │ drain team inbox │ │ │
│ │ response = llm.stream(...) │───┐ │ │
│ │ if done: break │ │ │ │
│ │ execute tools in parallel │───┼──►│ ToolRegistry │
│ │ append results │ │ │ ┌──────────────┐ │
│ │ ── sentinel handling ── │ │ │ │ bash │ │
│ │ compact? → compress context │ │ │ │ file tools │ │
│ │ enter_plan? → switch mode │ │ │ │ think │ │
│ │ exit_plan? → emit plan_ready │ │ │ │ todo_* │ │
│ │ check budget & auto-compact │ │ │ │ task_* │ │
│ │ │ │ │ │ task │ │
│ └──────────────────────────────────┘ │ │ │ compact │ │
│ │ │ │ skills │ │
│ ┌──────────────────────────────────┐ │ │ │ background_* │ │
│ │ LLMClient (Protocol) │◄──┘ │ │ team tools │ │
│ │ │ │ │ plan_mode * │ │
│ │ ┌────────────────────────────┐ │ │ └──────────────┘ │
│ │ │ TracingLLMClient │ │ │ │
│ │ │ (optional per-session) │ │ │ │
│ │ ├────────────────────────────┤ │ │ └──────────────┘ │
│ │ │ RetryingLLMClient │ │ │ │
│ │ │ (auto-retry on errors) │ │ │ │
│ │ ├────────────────────────────┤ │ │ │
│ │ │ ProviderAdapter │ │ │ │
│ │ │ (translates to/from │ │ │ │
│ │ │ provider SDK format) │ │ │ │
│ │ └─────────────┬──────────────┘ │ │ │
│ └────────────────┼─────────────────┘ │ │
│ │ │ │
└───────────────────┼─────────────────────────┴───────────────────┘
│
▼
┌────────────────┐
│ Provider API │
│ (configured) │
│ │
│ The actual │
│ AI model │
└────────────────┘
Also in the backend:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ SQLite DB │ │ SkillLoader │ │MemoryManager │
│ │ │ │ │ │
│ conversations│ │ Reads skill │ │ Reads/writes │
│ messages │ │ files from │ │ .agent/ │
│ token usage │ │ skills/ │ │ memory.md │
└──────────────┘ └──────────────┘ └──────────────┘
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ TaskManager │ │ Background │ │ Teammate │
│ │ │ Manager │ │ Manager │
│ Persistent │ │ │ │ │
│ tasks with │ │ Fire-and- │ │ Spawns named │
│ dependencies │ │ forget async │ │ teammates │
│ (.tasks/) │ │ subprocesses │ │ with inbox │
└──────────────┘ └──────────────┘ └──────────────┘
| Term | Meaning |
|---|---|
| Agent | An AI that can take actions (not just chat) by using tools in a loop |
| Agent loop | The while-loop that repeatedly calls the LLM and executes tools until done |
| LLM | Large Language Model — the AI brain behind the agent |
| Context window | The maximum amount of text an LLM can "see" at once before older context must be compacted |
| Token | A unit of text (~4 characters or ~¾ of a word). LLMs count everything in tokens. |
| Tool | A function the AI can call (bash, file read/write, etc.) |
| Think tool | A no-op tool that gives the agent a space to reason step-by-step before acting and to self-verify work before finishing |
| Protocol | A Python interface — defines what methods a class must have, without specifying how |
| Adapter | A wrapper that translates between two different interfaces |
| Wrapper chain | Multiple transparent layers stacked on top of each other (retry → tracing → adapter) |
| WebSocket | A persistent connection between browser and server for real-time, two-way communication |
| REST API | Standard HTTP endpoints for request/response operations (create, read, update, delete) |
| Streaming | Sending data piece by piece as it's generated, instead of all at once |
| Sub-agent | A child agent spawned for a focused subtask, with its own isolated conversation |
| Teammate | A named agent running its own loop in parallel, communicating via an async message bus |
| Message bus | Per-agent async inbox (queue) for sending and receiving messages between agents |
| Compaction | Summarizing old messages to free up space in the context window |
| Background task | A long-running command (build, tests) run as a subprocess without blocking the agent |
| Task board | Persistent file-backed tasks with dependencies that survive context compaction |
| Preset | A system prompt template that defines the agent's personality and behavior |
| Skill | A knowledge document the agent can load on demand for specialized tasks |
| System prompt | Hidden instructions that define how the AI behaves (the user doesn't see this) |
| Plan mode | A read-only exploration phase where the agent designs a plan before making changes |
| Sentinel | A magic string returned by a tool that the loop detects and acts on (e.g. __ENTER_PLAN_MODE__) |
| Feature flags | Per-conversation toggles (enable_teams, enable_tracing, enable_plan_mode) set at chat creation time |
| FastAPI | A Python web framework for building APIs |
| SQLite | A lightweight database stored as a single file |
| MCP | Model Context Protocol — a standard for AI tools served by external processes, like plugins |
| Truncation auto-continuation | When the model hits its output limit mid-response, the loop automatically prompts it to keep going |
| REPL | Read-Eval-Print Loop — an interactive prompt that reads input, processes it, and prints the result |
| Pipe mode | Running the CLI non-interactively by piping input from another command (e.g. echo "..." | openagent) |