
AgenticRed: Optimizing Agentic Systems for Automated Red-teaming


AgenticRed is an automated pipeline that leverages LLMs’ in-context learning to iteratively design and refine red-teaming systems without human intervention, guided by each system’s performance metrics.


Repository Structure

.
├── server/           # Scripts to start attacker, classifier, and defender model servers
├── search.sh         # Main script to run the search process
├── eval.sh           # Main script to run the evaluation process
└── README.md

Prerequisites

  • If you want to host the models locally, you need at least 3 GPUs (e.g., 3× L40) or equivalent capacity:
    • One for the attacker model server
    • One for the classifier / judge model server
    • One for the defender / target model server
  • If you want to call API endpoints instead, OpenRouter/OpenAI API clients are provided.

1. Start Model Servers

You can run local servers (recommended for experiments) or point to external APIs.

1.1 Local servers

Inside server/, you should have scripts or configs such as:

cd server/

# Example (adapt to actual script names)
bash start_attacker.sh          # Starts attacker model server
bash start_classifier.sh        # Starts classifier / guardrail model server
bash start_defender.sh          # Starts defender / target model server

Document the actual ports and endpoints so the rest of the pipeline can reference them, e.g. http://127.0.0.1:8000/v1
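Before launching the pipeline, it helps to verify that each server actually answers. A minimal sketch, assuming the servers expose an OpenAI-compatible `GET /models` route; the three URLs below are placeholders, so substitute whatever ports your server scripts use:

```python
import urllib.request
import urllib.error

def endpoint_alive(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an OpenAI-compatible server answers at base_url/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Placeholder endpoints -- adapt to the ports your server/ scripts actually bind.
for name, url in [("attacker", "http://127.0.0.1:8000/v1"),
                  ("classifier", "http://127.0.0.1:8001/v1"),
                  ("defender", "http://127.0.0.1:8002/v1")]:
    print(name, "up" if endpoint_alive(url) else "down")
```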

1.2 Using external APIs (optional)

Instead of local servers, you can configure the system to call:

  • OpenAI APIs
  • OpenRouter APIs

Update your API keys:

export OPENAI_API_KEY=''
export GEMINI_API_KEY=''
export DEEPSEEK_API_KEY=''
export OPENROUTER_API_KEY=''
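How the pipeline chooses between these backends is implementation-specific; as an illustration only (the helper below is hypothetical, not part of the repo), a client can fall back from OpenRouter to OpenAI to a local server depending on which key is set:

```python
import os

def resolve_backend():
    """Hypothetical helper: pick (base_url, api_key) from the environment.

    Falls back to a local OpenAI-compatible server when no key is set;
    the local URL is a placeholder for whatever server/ exposes.
    """
    if os.environ.get("OPENROUTER_API_KEY"):
        return "https://openrouter.ai/api/v1", os.environ["OPENROUTER_API_KEY"]
    if os.environ.get("OPENAI_API_KEY"):
        return "https://api.openai.com/v1", os.environ["OPENAI_API_KEY"]
    return "http://127.0.0.1:8000/v1", "EMPTY"  # local server needs no real key
```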

2. Run the Search Process

The search step iteratively designs new red-teaming systems.

Typical usage:

cd _redteam
bash scripts/search.sh \
    --expr EXP_INDEX \
    --seed SEED

e.g. bash scripts/search.sh --expr 1 --seed 42

Key arguments (adapt to your implementation):

  • --expr: Experiment index selecting one of the predefined configurations.
  • --seed: Random seed.
  • --config: Path to a config file specifying:
    • Attacker, classifier, defender endpoints
    • Search hyperparameters (iterations, beam width, budgets, etc.)
    • Output directory for JSON logs
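At its core the search step is a propose-evaluate-select loop: the LLM drafts candidate red-teaming systems, each candidate is scored by its performance metric, and the best candidates seed the next iteration. A stubbed sketch of that loop, where `propose` and `evaluate` are illustrative placeholders rather than the repo's actual implementation:

```python
import random

def propose(parent: str, rng: random.Random) -> str:
    """Stub for the LLM designer: derive a new system from a parent."""
    return f"{parent}+v{rng.randint(0, 99)}"

def evaluate(system: str, rng: random.Random) -> float:
    """Stub for the eval step: would run attacks and return a metric in [0, 1]."""
    return rng.random()

def search(iterations: int = 5, beam_width: int = 2, seed: int = 42):
    """Metric-guided beam search over system designs (toy version)."""
    rng = random.Random(seed)
    beam = [("seed-system", evaluate("seed-system", rng))]
    for _ in range(iterations):
        # propose one child per beam member, then keep the top scorers
        children = [(c, evaluate(c, rng))
                    for parent, _ in beam
                    for c in (propose(parent, rng),)]
        beam = sorted(beam + children, key=lambda x: x[1], reverse=True)[:beam_width]
    return beam[0]

best_system, best_score = search()
print(best_system, round(best_score, 3))
```

Fixing `--seed` makes a run reproducible: the same seed yields the same sequence of proposals and scores.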

3. Run the Evaluation Process

After search completes, evaluate the discovered attacks on a separate evaluation dataset and target model / benchmark.

cd _redteam
bash scripts/eval.sh
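Evaluation ultimately reduces the classifier/judge verdicts on the eval set to a single score; assuming binary verdicts, the natural metric is the attack success rate:

```python
def attack_success_rate(verdicts) -> float:
    """Fraction of attacks the classifier/judge marked as successful."""
    verdicts = list(verdicts)
    return sum(verdicts) / len(verdicts) if verdicts else 0.0

print(attack_success_rate([True, False, True, True]))  # → 0.75
```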

Example Workflow

# 1. Start servers if you are hosting local servers
bash server/start_attacker.sh
bash server/start_classifier.sh
bash server/start_defender.sh

# 2. Run search
bash _redteam/scripts/search.sh --expr 1 --seed 42

# 3. Run evaluation
bash _redteam/scripts/eval.sh

Citation

@misc{yuan2026agenticredoptimizingagenticsystems,
      title={AgenticRed: Optimizing Agentic Systems for Automated Red-teaming}, 
      author={Jiayi Yuan and Jonathan Nöther and Natasha Jaques and Goran Radanović},
      year={2026},
      eprint={2601.13518},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.13518}, 
}
