Summary
AgentThreatBench is the first benchmark that operationalizes the OWASP Top 10 for Agentic Applications (2026) into executable evaluation tasks. It was merged into UKGovernmentBEIS/inspect_evals, the UK AI Safety Institute's official eval suite.
Why it's relevant to NeMo Guardrails
NeMo Guardrails is designed to prevent the same classes of attack that AgentThreatBench measures:
| AgentThreatBench Task | Attack | NeMo Guardrails Relevance |
| --- | --- | --- |
| Memory Poison (ASI06) | Adversarial entries in RAG/memory | fact_checking, output rails |
| Autonomy Hijack (ASI01) | Indirect injection in tool output | input rails, dialog rails |
| Data Exfiltration (ASI01) | PII leak via tool call | output rails, sensitive data |
Proposal
Reference AgentThreatBench in NeMo Guardrails documentation as a benchmark for measuring how well guardrail configurations defend against OWASP agentic threats. This would help users validate their guardrail setups against a standardized, OWASP-aligned test suite.
Benchmark docs: https://ukgovernmentbeis.github.io/inspect_evals/evals/safeguards/agent_threat_bench/
Source: https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/agent_threat_bench
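
To make the proposal concrete, here is a minimal sketch of how a docs page could pair each benchmark task with the rails it exercises and generate the matching `inspect eval` command lines. The task identifiers (`agent_threat_bench_memory_poison` and so on) and the model string are assumptions for illustration; check the benchmark docs linked above for the exact names. The sketch only builds the commands. It does not run them, and it does not show how a guardrailed model would be exposed to Inspect.

```python
import shlex

# Assumed task identifiers, paired with the rail categories from the
# relevance table in this issue. Verify the names against the benchmark docs.
TASK_RAILS = {
    "agent_threat_bench_memory_poison": ["fact_checking", "output rails"],
    "agent_threat_bench_autonomy_hijack": ["input rails", "dialog rails"],
    "agent_threat_bench_data_exfil": ["output rails", "sensitive data"],
}


def build_eval_command(task: str, model: str) -> str:
    """Return a shell-safe `inspect eval` command line for one task."""
    if task not in TASK_RAILS:
        raise KeyError(f"unknown task: {task}")
    args = ["inspect", "eval", f"inspect_evals/{task}", "--model", model]
    return shlex.join(args)


def build_plan(model: str) -> list[tuple[str, list[str], str]]:
    """List (task, rails under test, command) for every task in the mapping."""
    return [
        (task, rails, build_eval_command(task, model))
        for task, rails in TASK_RAILS.items()
    ]


if __name__ == "__main__":
    # "openai/gpt-4o" is a placeholder model string, not a recommendation.
    for task, rails, command in build_plan("openai/gpt-4o"):
        print(f"{task}: exercises {', '.join(rails)}")
        print(f"  {command}")
```

Running the same plan once against an unguarded model and once against the guardrailed one would give a per-task comparison for each rail category.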