Feature request: Reference AgentThreatBench as a benchmark for validating guardrail effectiveness

## Summary

[AgentThreatBench](https://ukgovernmentbeis.github.io/inspect_evals/evals/safeguards/agent_threat_bench/) is the first benchmark that operationalizes the **OWASP Top 10 for Agentic Applications (2026)** into executable evaluation tasks. It was merged into [UKGovernmentBEIS/inspect_evals](https://github.com/UKGovernmentBEIS/inspect_evals/pull/1037) — the UK AI Safety Institute's official eval suite.

## Why it's relevant to NeMo Guardrails

NeMo Guardrails is designed to prevent exactly the attacks that AgentThreatBench measures:

| AgentThreatBench Task | Attack | NeMo Guardrails Relevance |
|---|---|---|
| Memory Poison (ASI06) | Adversarial entries in RAG/memory | `fact_checking`, `output rails` |
| Autonomy Hijack (ASI01) | Indirect injection in tool output | `input rails`, `dialog rails` |
| Data Exfiltration (ASI01) | PII leak via tool call | `output rails`, `sensitive data` |

## Proposal

Reference AgentThreatBench in NeMo Guardrails documentation as a benchmark for measuring how well guardrail configurations defend against OWASP agentic threats. This would help users validate their guardrail setups against a standardized, OWASP-aligned test suite.

**Benchmark docs:** https://ukgovernmentbeis.github.io/inspect_evals/evals/safeguards/agent_threat_bench/
**Source:** https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/agent_threat_bench


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: Reference AgentThreatBench as a benchmark for validating guardrail effectiveness #1907

Summary

Why it's relevant to NeMo Guardrails

Proposal

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

AgentThreatBench Task	Attack	NeMo Guardrails Relevance
Memory Poison (ASI06)	Adversarial entries in RAG/memory	`fact_checking`, `output rails`
Autonomy Hijack (ASI01)	Indirect injection in tool output	`input rails`, `dialog rails`
Data Exfiltration (ASI01)	PII leak via tool call	`output rails`, `sensitive data`

Feature request: Reference AgentThreatBench as a benchmark for validating guardrail effectiveness #1907

Description

Summary

Why it's relevant to NeMo Guardrails

Proposal

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions