By Eric Hartford, Lazarus AI
Inspired by Anthropic's Glasswing.
The challenge: Produce similar results as Glasswing - using models everyone has access to.
Autonomous vulnerability scanner and source-code hunter built on LangGraph.
Clearwing is a dual-mode offensive-security tool:
- Network-pentest agent — a ReAct-loop agent with 63 bind-tools that scans live targets, detects services and vulnerabilities, runs sandboxed Kali tools, attempts exploits (gated through a human-approval guardrail), and writes reports to a persistent knowledge graph.
- Source-code hunter — a file-parallel agent-driven
pipeline that ranks source files, fans out per-file hunter agents
(full-shell or constrained), uses ASan/UBSan crashes as ground
truth, verifies findings with a 4-axis validator (REAL /
TRIGGERABLE / IMPACTFUL / GENERAL), runs PoC stability checks
across fresh containers, optionally generates validated patches,
and emits SARIF/markdown/JSON reports with explicit evidence levels
(
suspicion → static_corroboration → crash_reproduced → root_cause_explained → exploit_demonstrated → patch_validated). Features three-band budget promotion, entry-point sharding for large files, cross-subsystem hunting, a shared findings pool with root-cause deduplication, multi-turn agentic exploit development, and human-in-the-loop exploit elaboration. - N-day exploit pipeline — given CVE IDs, builds the vulnerable version, develops working exploits, and validates against the patched version to confirm the fix.
- Reverse engineering pipeline — decompiles closed-source ELF binaries via Ghidra, reconstructs plausible source with an LLM, then hunts vulnerabilities using a hybrid source + binary validation approach.
- Campaign orchestration — runs sourcehunt across dozens or hundreds of repositories from a single YAML config with shared budget, checkpoint/resume, and aggregate reporting.
- Responsible disclosure — human-in-the-loop validation workflow with MITRE/HackerOne template generation, SHA-3 cryptographic commitments for provable priority, timeline tracking, and batched disclosure.
- Benchmarking & evaluation — OSS-Fuzz crash severity ladder for model comparison, and an A/B testing framework for measuring whether preprocessing helps or hurts finding quality.
Authorized use only. Clearwing is a dual-use offensive-security
tool. Run it only against targets you own or have explicit written
authorization to test. Operators are responsible for scope, legal
authorization, and disclosure. See SECURITY.md.
End users — install the tagged release straight from GitHub:
git clone https://github.com/Lazarus-AI/clearwing.git
cd clearwing
# uv sync is recommended because Clearwing pins genai-pyo3 through
# tool.uv.sources in pyproject.toml.
uv sync --all-extras
source .venv/bin/activate # fish: source .venv/bin/activate.fish
# Interactive setup wizard — menu-driven provider selection,
# credential entry, optional live test, persists to ~/.clearwing/config.yaml
clearwing setup
# Environment check — verifies Python, credentials, Docker daemon,
# external tools, optional extras, and network reachability
clearwing doctor
clearwing --version # 1.0.0
clearwing --helpOr skip the wizard and configure directly:
# Anthropic direct
export ANTHROPIC_API_KEY=sk-ant-...
# Or any OpenAI-compatible endpoint — OpenRouter, Ollama, LM Studio,
# vLLM, Together, Groq, DeepSeek, OpenAI:
export CLEARWING_BASE_URL=https://openrouter.ai/api/v1
export CLEARWING_API_KEY=sk-or-...
export CLEARWING_MODEL=anthropic/claude-opus-4See docs/providers.md for provider-specific
recipes and per-task routing.
Developers — clone and install the locked development environment:
git clone https://github.com/Lazarus-AI/clearwing.git
cd clearwing
uv sync --all-extras
source .venv/bin/activate # fish: source .venv/bin/activate.fish
clearwing --helpRequirements: Python 3.10+, a recent Rust toolchain for the native
genai-pyo3 bridge, and optionally Docker for the Kali container and
sanitizer-image sandbox features. If the install fails with a Rust version
error, run rustup update stable.
# Network scan a single target
clearwing scan 192.168.1.10 -p 22,80,443 --detect-services
# Source-code hunt a repo (standard depth — sandboxed LLM hunters,
# adversarial verifier, mechanism memory, variant loop)
clearwing sourcehunt https://github.com/example/project \
--depth standard
# N-day exploit pipeline — build and exploit known CVEs
clearwing sourcehunt https://github.com/example/project \
--nday --cve-list CVE-2024-1234,CVE-2024-5678
# Reverse engineering — hunt vulnerabilities in closed-source binaries
clearwing sourcehunt /path/to/binary --reveng --arch x86_64
# Campaign-scale orchestration across multiple projects
clearwing campaign run campaign.yaml
# Responsible disclosure workflow
clearwing disclose queue ./sourcehunt-results/sh-*/
clearwing disclose review
# OSS-Fuzz crash severity benchmark
clearwing bench ossfuzz --corpus-dir ./oss-fuzz-projects --mode standard
# A/B test whether preprocessing helps or hurts
clearwing eval preprocessing --project https://github.com/example/project \
--configs glasswing_minimal,sourcehunt_full --runs 3
# Interactive ReAct chat with the full tool set
clearwing interactive
# Non-interactive CI mode with SARIF output for GitHub Code Scanning
clearwing ci --config .clearwing.ci.yaml --sarif results.sarifSee docs/quickstart.md for a fuller walkthrough
including credentials, session resume, and mission-mode operation.
The clearwing sourcehunt <url> CLI clones a remote URL. To hunt an
already-cloned tree (e.g. FFmpeg) with the native-async pipeline and a
self-hosted OpenAI-compatible backend, drive SourceHuntRunner directly:
# 1. Clone the target once
git clone https://github.com/FFmpeg/FFmpeg.git
# 2. Run sourcehunt against the local checkout
uv run python -u - <<'PY'
import logging
from clearwing.llm.native import AsyncLLMClient
from clearwing.sourcehunt.runner import SourceHuntRunner
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(name)s: %(message)s')
REPO = './FFmpeg'
RUN_DIR = './sourcehunt-results-ffmpeg'
COMMON = dict(
provider_name='openai_resp', # or 'openai' for /v1/chat/completions
api_key='YOUR_KEY',
base_url='http://localhost:8183/v1', # any OpenAI-compatible endpoint
max_concurrency=15,
)
# One client per stage — routes each stage to a different model
ranker_llm = AsyncLLMClient(model_name='gpt-5.4-mini', **COMMON)
hunter_llm = AsyncLLMClient(model_name='gpt-5.4', **COMMON)
verifier_llm = AsyncLLMClient(model_name='gpt-5.4-mini', **COMMON)
exploiter_llm = AsyncLLMClient(model_name='gpt-5.3-codex', **COMMON)
runner = SourceHuntRunner(
repo_url=REPO, local_path=REPO,
depth='standard',
budget_usd=1000.0,
max_parallel=15,
output_dir=RUN_DIR,
output_formats=['json', 'markdown'],
ranker_llm=ranker_llm,
hunter_llm=hunter_llm,
verifier_llm=verifier_llm,
exploiter_llm=exploiter_llm,
enable_patch_oracle=True,
)
print(runner.run()) # sync wrapper; internally drives SourceHuntRunner.arun()
PYFindings land in ./sourcehunt-results-ffmpeg/sh-<session-id>/ as JSON +
markdown once the run completes. FFmpeg is ~10k source files, so expect the
large-repo ranker to preselect candidates and the tier-A hunter pool to run for hours.
Redirect stdout/stderr to a file if you plan to detach the process — the
runner's own artifacts are only written at the end.
AsyncLLMClient accepts provider_name values openai_resp (the streaming
/v1/responses shape) or openai (standard /v1/chat/completions); point
base_url at any OpenAI-compatible server. See
docs/providers.md for the managed-provider paths.
┌──────────────────────┐ ┌────────────────────────────────┐
│ Network-pentest agent│ │ Source-code hunter │
│ clearwing.agent.graph│ │ clearwing.sourcehunt.runner │
│ (63 tools, ReAct) │ │ │
│ │ │ preprocess → rank → pool → │
│ │ │ hunter → verify → exploit → │
│ │ │ variant loop → auto-patch → │
│ │ │ report │
└─────────┬────────────┘ └────────┬───────────────────────┘
│ │
└───────────┬─────────────────┘
▼
┌───────────────────────────────────────────────────────────────┐
│ N-day pipeline │ Reveng pipeline │ Campaign orchestrator │
│ Disclosure workflow + SHA-3 commitments │
├───────────────────────────────────────────────────────────────┤
│ Shared substrate │
│ Finding dataclass │ capabilities probe │ sandbox layer │
│ knowledge graph │ episodic memory │ event bus │
│ telemetry │ guardrails + audit │ CVSS scoring │
│ artifact store │ behavior monitor │ seccomp │
├───────────────────────────────────────────────────────────────┤
│ Bench: OSS-Fuzz severity ladder │ Eval: preprocessing A/B │
└───────────────────────────────────────────────────────────────┘
Deep dives live in docs/:
| Doc | What it covers |
|---|---|
docs/index.md |
Landing page + table of contents |
docs/quickstart.md |
Full install + first run walkthrough |
docs/providers.md |
OpenRouter / Ollama / LM Studio / vLLM / Together / Groq recipes, per-task routing, env-var precedence |
docs/architecture.md |
Both pipelines, substrate, capability gating, tool layout |
docs/cli.md |
Every subcommand flag, grouped by workflow |
docs/api.md |
API reference (mkdocstrings autogen) |
Once the GitHub Pages workflow ships, docs will be hosted at https://lazarus-ai.github.io/clearwing/.
uv sync --all-extras
source .venv/bin/activate # fish: source .venv/bin/activate.fish
pytest -q
ruff check clearwing/ tests/
ruff format --check clearwing/ tests/
mypy --follow-imports=silent \
clearwing/findings \
clearwing/sourcehunt \
clearwing/capabilities.py \
clearwing/agent/tools \
clearwing/core
python -m mkdocs serve --dev-addr 127.0.0.1:8000See CONTRIBUTING.md for the full dev-setup guide and PR
checklist.
There are two lanes, and they go to different places:
- Vulnerabilities in Clearwing → GitHub Security Advisories
(https://github.com/Lazarus-AI/clearwing/security/advisories/new).
See
SECURITY.mdfor scope, SLA, and safe-harbor. - Vulnerabilities Clearwing finds in someone else's software →
that vendor's disclosure channel.
clearwing sourcehunt --export-disclosuresgenerates pre-filled MITRE CVE-request and HackerOne templates for every finding atevidence_level >= root_cause_explained.
MIT. See LICENSE.