Multi-LLM orchestration plugin for Claude Code — coordinates Codex, Gemini, Perplexity, OpenRouter, Copilot, Qwen, and Ollama with consensus gates. Eight tentacles, zero blind spots.
🐙 Research, build, review, and ship — with eight AI providers checking each other's work. Say what you need, and the right workflow runs. A 75% consensus gate catches disagreements before they reach production. No single model's blind spots slip through.
🧠 Remembers across sessions. Integrates with claude-mem for persistent memory — past decisions, research, and context survive session boundaries.
⚡ Spec in, software out. Dark Factory mode takes a spec and autonomously runs the full pipeline — research, define, develop, deliver. You review the output, not every step.
🔄 Four-phase methodology, not just tools. Every task moves through Discover → Define → Develop → Deliver, with quality gates between phases. Other orchestrators give you infrastructure. Octopus gives you the workflows.
🐙 32 specialized personas (role-specific AI agents like security-auditor, backend-architect), 47 commands (slash commands you type), 50 skills (reusable workflow modules). Say "audit my API" and the right expert activates. Don't know the command? The smart router figures it out.
🐙 Works with just Claude. Scales to eight. Zero providers needed to start. Add them one at a time — each activates automatically when detected.
💰 Five providers cost nothing extra. Codex and Gemini use OAuth (included with subscriptions). Qwen has 1,000-2,000 free requests/day. Copilot uses your GitHub subscription. Ollama runs locally for free.
| Version | Best Features |
|---|---|
| v9 (current) | 8 providers (Codex, Gemini, Copilot, Qwen, Ollama, Perplexity, OpenRouter, Claude). Four-way AI debates. Smart router — just say what you need. 92% faster execution. Context-aware warnings. Session handoff across conversations. Adversarial review in every workflow. |
| v8 | Multi-LLM code review with inline PR comments. Parallel workstreams in isolated git worktrees. Reaction engine — auto-responds to CI failures. 32 specialized personas. Dark Factory autonomous pipeline. |
| v7 | Double Diamond workflow. Multi-provider dispatch. Quality gates and consensus scoring. Configurable sandbox modes. |
```bash
# Terminal (not inside a Claude Code session):
claude plugin marketplace add https://github.com/nyldn/claude-octopus.git
claude plugin install octo@nyldn-plugins

# Then inside Claude Code:
/octo:setup
```

That's it. Setup detects installed providers, shows what's missing, and walks you through configuration. You need zero external providers to start — Claude is built in.
Alternative install methods
From the Claude Code UI: Type /plugin in a session → Marketplace tab → install octo.
Factory AI (Droid):

```bash
droid plugin marketplace add https://github.com/nyldn/claude-octopus
droid plugin install octo@claude-octopus
```

Update / Troubleshooting
```bash
# Update
claude plugin update octo

# Clean reinstall (if update fails)
claude plugin uninstall claude-octopus 2>/dev/null
claude plugin uninstall octo 2>/dev/null
rm -rf ~/.claude/plugins/cache/nyldn-plugins/claude-octopus
claude plugin marketplace add https://github.com/nyldn/claude-octopus.git
claude plugin install octo@nyldn-plugins
```

🐙 Eight commands — one per arm. A real octopus has eight arms, each with its own neurons that can act independently. These eight tentacles work the same way: each orchestrates up to three AI providers, applies quality gates, and produces a deliverable.
```bash
/octo:embrace build stripe integration                 # Full lifecycle: research → define → develop → deliver
/octo:factory "build a CLI that converts CSV to JSON"  # Autonomous pipeline — spec in, software out
/octo:debate monorepo vs microservices                 # Structured four-way AI debate with consensus
/octo:research htmx vs react in 2026                   # Multi-source synthesis from three AI providers
/octo:design mobile checkout redesign                  # UI/UX design with BM25 style intelligence
/octo:tdd create user auth                             # Red-green-refactor with test discipline
/octo:security                                         # OWASP vulnerability scan + remediation
/octo:prd mobile checkout redesign                     # AI-optimized PRD with 100-point scoring
```

Plus 30 more: review, debug, extract, deck, docs, schedule, parallel, sentinel, brainstorm, claw, doctor, and the full set.
Don't remember the command name? Just describe what you need:
```bash
/octo:auto research microservices patterns   # -> routes to discover phase
/octo:auto build user authentication         # -> routes to develop phase
/octo:auto compare Redis vs DynamoDB         # -> routes to debate
```
The smart router parses your intent and selects the right workflow.
Not sure which command to use? Pick by goal:
| I want to... | Use |
|---|---|
| Research a topic thoroughly | /octo:research or /octo:discover |
| Debate two approaches | /octo:debate |
| Build a feature end-to-end | /octo:embrace |
| Design a UI or style system | /octo:design |
| Review existing code | /octo:review |
| Write tests first, then code | /octo:tdd |
| Scan for vulnerabilities | /octo:security |
| Write a product spec | /octo:prd |
| Go from spec to shipping code | /octo:factory |
| Debug a tricky issue | /octo:debug |
| Just run something quick | /octo:quick |
Or skip the table — type /octo:auto <what you want> or just say octo <what you want>, and the smart router picks for you. 🔍
Claude Octopus coordinates up to eight AI providers — one per tentacle:
| Provider | Role |
|---|---|
| 🔴 Codex (OpenAI) | Implementation depth — code patterns, technical analysis, architecture |
| 🟡 Gemini (Google) | Ecosystem breadth — alternatives, security review, research synthesis |
| 🟣 Perplexity | Live web search — CVE lookups, dependency research, current docs |
| 🌐 OpenRouter | Alternative model routing — access 100+ models via single API |
| 🟢 Copilot (GitHub) | Zero-cost research — uses existing GitHub Copilot subscription |
| 🟤 Qwen (Alibaba) | Free-tier research — 1,000-2,000 requests/day via Qwen OAuth |
| ⚫ Ollama (Local) | Zero-cost local LLM — offline, privacy-sensitive, fallback |
| 🔵 Claude (Anthropic) | Orchestration — quality gates, consensus building, final synthesis |
Providers run in parallel for research, sequentially for problem scoping, and adversarially for review. A 75% consensus quality gate prevents questionable work from shipping. Only Claude is required — all others are optional and auto-detected.
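The 75% gate itself is easy to reason about. As an illustrative sketch only (not the plugin's actual scoring logic), a gate over per-provider approval votes could look like this:

```shell
# Illustrative sketch — NOT the plugin's real implementation.
# Each provider casts an "approve" or "reject" vote; the gate passes
# when approvals reach 75% of all votes cast.
consensus_gate() {
  approved=0
  total=0
  for vote in "$@"; do
    total=$((total + 1))
    [ "$vote" = "approve" ] && approved=$((approved + 1))
  done
  # integer math: approved/total >= 0.75  <=>  approved*100 >= total*75
  if [ $((approved * 100)) -ge $((total * 75)) ]; then
    echo PASS
  else
    echo FAIL
  fi
}

consensus_gate approve approve approve reject   # 3/4 votes = 75% -> PASS
```

With three of four providers approving, the gate passes exactly at the threshold; one approval out of four fails it.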
Four structured phases adapted from the UK Design Council's methodology:
| Phase | Command | What happens |
|---|---|---|
| Discover | /octo:discover | Multi-AI research and broad exploration |
| Define | /octo:define | Requirements clarification with consensus |
| Develop | /octo:develop | Implementation with quality gates |
| Deliver | /octo:deliver | Adversarial review and go/no-go scoring |
Run phases individually or all four with /octo:embrace. Configure autonomy: supervised (approve each phase), semi-autonomous (intervene on failures), or autonomous (run all four).
Specialized agents that activate automatically based on your request. When you say "audit my API for vulnerabilities," security-auditor activates. When you say "design a dashboard," ui-ux-designer takes over.
Categories: Software Engineering (11), Specialized Development (6), Documentation & Communication (5), Research & Strategy (3), Business & Compliance (3), Creative & Design (4).
Full persona reference | All 50 skills
When agents create PRs, the reaction engine monitors what happens next — CI failures, review comments, stale agents — and responds automatically. No new commands to learn. It fires transparently inside workflows you already use:
| Integration Point | When It Fires |
|---|---|
| /octo:parallel | Between poll cycles while monitoring work packages |
| /octo:sentinel | After triage scan completes |
| agent-registry.sh health --react | On-demand health check |
What it auto-handles:
| Event | Reaction | Limits |
|---|---|---|
| CI failure | Collects failure logs into agent inbox | 3 retries, escalates after 30m |
| Changes requested | Collects review comments into agent inbox | 2 retries, escalates after 60m |
| Agent stuck | Escalates to human | After 15m with no progress |
| PR approved + CI green | Notifies you it's ready to merge | — |
| PR merged | Marks agent complete | — |
Override defaults per project by creating .octo/reactions.conf:
```
# EVENT|ACTION|MAX_RETRIES|ESCALATE_AFTER_MIN|ENABLED
ci_failed|forward_logs|5|45|true
changes_requested|forward_comments|3|90|true
stuck|escalate|0|10|true
```
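The pipe-delimited format can be read with plain shell. This is a hypothetical reader sketch — the field order comes from the header comment above, but the plugin's own parser may differ:

```shell
# Hypothetical sketch of reading .octo/reactions.conf lines.
# Field order follows the documented header:
# EVENT|ACTION|MAX_RETRIES|ESCALATE_AFTER_MIN|ENABLED
parse_reactions() {
  while IFS='|' read -r event action max_retries escalate_min enabled; do
    case "$event" in \#*|'') continue ;; esac   # skip comments and blank lines
    printf '%s -> %s (retry %s, escalate after %sm, enabled=%s)\n' \
      "$event" "$action" "$max_retries" "$escalate_min" "$enabled"
  done
}

parse_reactions <<'EOF'
# EVENT|ACTION|MAX_RETRIES|ESCALATE_AFTER_MIN|ENABLED
ci_failed|forward_logs|5|45|true
stuck|escalate|0|10|true
EOF
```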
Reactions track 13 agent lifecycle states: running → pr_open → ci_pending → ci_failed / review_pending → changes_requested / approved → mergeable → merged → done.
| Method | Codex | Gemini | Claude |
|---|---|---|---|
| OAuth (recommended) | codex login — included in ChatGPT subscription | Google account — included in AI subscription | Built into Claude Code |
| API key | OPENAI_API_KEY — per-token billing | GEMINI_API_KEY — per-token billing | Built into Claude Code |
OAuth users pay nothing beyond their existing subscriptions.
Everything except multi-AI features. You get all 32 personas, structured workflows, smart routing, context detection, and every skill. Multi-AI orchestration (parallel analysis, debate, consensus) activates when external providers are configured.
Namespace isolation — Only /octo:* commands and octo natural language prefix activate the plugin. Your existing Claude Code setup is untouched.
Data locations — Results in ~/.claude-octopus/results/, logs in ~/.claude-octopus/logs/, project state in .octo/. Nothing hidden.
Telemetry — Anonymous usage analytics help us improve the plugin. We collect session counts, workflow types used, and error rates — never prompts, file paths, or personal data. Disable anytime with OCTOPUS_TELEMETRY_OPT_OUT=1.
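For example, to opt out (the variable name is the one documented above; the profile file depends on your shell):

```shell
# Disable Claude Octopus telemetry for the current shell session.
# Add this line to your shell profile (~/.bashrc, ~/.zshrc) to persist it.
export OCTOPUS_TELEMETRY_OPT_OUT=1
```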
Provider transparency — Every command shows a 🐙 activation indicator on launch. Colored dots (🔴 🟡 🟣 🔵) show exactly which providers are running and when external APIs are called. You always know what's happening.
Clean uninstall — Run claude plugin uninstall octo from your terminal. If you see a scope error, add --scope project. No residual config changes.
Claude Octopus ships with a compatibility layer for OpenClaw, the open-source AI assistant framework. This lets you expose Octopus workflows to messaging platforms (Telegram, Discord, Signal, WhatsApp) without modifying the Claude Code plugin.
```
Claude Code Plugin (unchanged)
  └── .mcp.json ─── MCP Server ─── orchestrate.sh
                         ↑
OpenClaw Extension ──────┘
```
Three components, zero changes to the core plugin:
| Component | Location | Purpose |
|---|---|---|
| MCP Server | mcp-server/ | Exposes 10 Octopus tools via Model Context Protocol |
| OpenClaw Extension | openclaw/ | Wraps workflows for OpenClaw's extension API |
| Skill Schema | mcp-server/src/schema/skill-schema.json | Universal skill metadata format |
The MCP server auto-starts when the plugin is enabled (via .mcp.json). It exposes:
- `octopus_discover`, `octopus_define`, `octopus_develop`, `octopus_deliver` — Individual phases
- `octopus_embrace` — Full Double Diamond workflow
- `octopus_debate`, `octopus_review`, `octopus_security` — Specialized workflows
- `octopus_list_skills`, `octopus_status` — Introspection
Any MCP-compatible client can connect to the server.
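A typical stdio-based MCP client registration might look like the entry below. The `command` and path are illustrative assumptions — check mcp-server/ for the actual entry point:

```json
{
  "mcpServers": {
    "octopus": {
      "command": "node",
      "args": ["mcp-server/dist/index.js"]
    }
  }
}
```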
Install in an OpenClaw instance from git:

```bash
npm install github:nyldn/claude-octopus#main --prefix openclaw
```

Or clone and link locally:

```bash
cd openclaw && npm install && npm run build
```

The extension registers as an OpenClaw plugin with configurable workflows, autonomy modes, and Claude Code path resolution.
```bash
./scripts/build-openclaw.sh          # Regenerate skill registry from frontmatter
./scripts/build-openclaw.sh --check  # CI mode — exits non-zero if out of sync
./tests/validate-openclaw.sh         # 13-check validation suite
```

Do I need all the external AI providers? No. One external provider plus Claude gives you multi-AI features. No external providers still gives you personas, workflows, and skills.
Will this break my existing Claude Code setup?
No. Activates only with the octo prefix. Results stored separately. Uninstalls cleanly.
What happens if a provider times out? The workflow continues with available providers. You'll see the status in the visual indicators.
Why "octopus"? 🐙 Fun fact: a real octopus has three hearts, blue blood, and 500 million neurons — two-thirds of which live in its eight arms. Each arm can taste, touch, and act independently. Claude Octopus works the same way: each tentacle (command) operates autonomously with its own squeeze of logic, then ink flows back as the final deliverable. The crossfire review? That's the squeeze — adversarial pressure that untangles everything before it ships.
- Documentation Guide — Start here
- Command Reference — Commands, triggers, and provider indicators
- Feature Gap Analysis — Claude Code feature adoption tracker
- Architecture — Provider flow and execution model
- Plugin Architecture — Internal plugin structure
- Agents & Personas — All 32 personas
- CLI Reference — Direct CLI usage, debug mode, async, and tmux
- Changelog
- wolverin0/claude-skills — AI Debate Hub. MIT License.
- obra/superpowers — Discipline skills patterns. MIT License.
- nextlevelbuilder/ui-ux-pro-max-skill — BM25 design intelligence databases. MIT License.
- UK Design Council — Double Diamond methodology.
Join r/ClaudeOctopus for help, workflow tips, showcases, and updates.
- Report issues
- Submit PRs following existing code style
```bash
git clone https://github.com/nyldn/claude-octopus.git && make test
```
See CONTRIBUTING.md for details.
MIT — see LICENSE
nyldn | MIT License | r/ClaudeOctopus | Report Issues
