Inspired by Veritasium's video on Game Theory.
Run iterated Prisoner's Dilemma games between LLMs with different personalities (good, neutral, evil) to understand cooperation, defection, and emergent strategies in AI.
```bash
pip install -r requirements.txt
```

This installs:
- anthropic - Claude API client
- openai - GPT API client
- google-genai - Gemini API client (new SDK)
- python-dotenv - Environment variable loader
Copy the example environment file:
```bash
cp .env.example .env
```

Then edit `.env` and add your API keys:

```
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-gpt-key-here
GOOGLE_API_KEY=your-google-api-key-here
DEEPSEEK_API_KEY=your-deepseek-api-key-here
```
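Before running an experiment, it can help to check that the keys were actually picked up. The snippet below is a minimal sketch, not part of this repo: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and it assumes python-dotenv's `load_dotenv()`, which by default does not override variables already set in your shell. You only need keys for the providers whose models you run.

```python
import os
from collections.abc import Mapping

# Hypothetical list: the variable names used in the .env file above.
REQUIRED_KEYS = (
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
)


def missing_keys(env: Mapping[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # installed by python-dotenv

        load_dotenv()  # load the nearest .env file into os.environ
    except ImportError:
        pass  # fall back to variables already exported in the shell
    missing = missing_keys(os.environ)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```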
Get your keys from:
- Anthropic (Claude): https://console.anthropic.com
- OpenAI (GPT): https://platform.openai.com
- Google (Gemini): https://makersuite.google.com
- DeepSeek (Reasoner): https://platform.deepseek.com
Iterated Prisoner's Dilemma (20 rounds)
In each round, both players choose Cooperate or Defect simultaneously and are scored as follows:
| Outcome | Player A score | Player B score |
|---|---|---|
| Both Cooperate | 3 | 3 |
| A Defects, B Cooperates | 5 | 0 |
| A Cooperates, B Defects | 0 | 5 |
| Both Defect | 1 | 1 |
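The payoff table translates directly into a lookup. This is a minimal sketch, not the repo's code (`PAYOFFS`, `score_round`, and `score_game` are hypothetical names), with moves written as `"C"` (Cooperate) and `"D"` (Defect):

```python
# (A's move, B's move) -> (A's score, B's score), copied from the table above.
PAYOFFS: dict[tuple[str, str], tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("D", "D"): (1, 1),
}


def score_round(a: str, b: str) -> tuple[int, int]:
    """Score one round in which A plays `a` and B plays `b`."""
    return PAYOFFS[(a, b)]


def score_game(moves: list[tuple[str, str]]) -> tuple[int, int]:
    """Sum both players' scores over every round of an iterated game."""
    total_a = total_b = 0
    for a, b in moves:
        points_a, points_b = score_round(a, b)
        total_a += points_a
        total_b += points_b
    return total_a, total_b
```

Over 20 rounds, constant mutual cooperation yields `(60, 60)` and constant mutual defection `(20, 20)`. In any single round, defecting scores more whatever the opponent does (5 vs 3, 1 vs 0), which is what makes this a dilemma.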
Experiment 1: Baseline

- Each model plays 20 rounds against itself
- No personality override: the model is not told to be good, neutral, or evil
- Tests: Do models cooperate on their own?
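The baseline loop can be pictured as follows. This is a minimal sketch under stated assumptions, not the implementation in `experiment1_baseline.py`: there each move comes from an LLM call, whereas here a strategy is a plain Python function that sees both move histories. `tit_for_tat` and `always_defect` are classic hand-written stand-ins, and `play_iterated` is a hypothetical name.

```python
from typing import Callable

# Payoffs from the table above: (A's move, B's move) -> (A's score, B's score).
PAYOFFS = {("C", "C"): (3, 3), ("D", "C"): (5, 0), ("C", "D"): (0, 5), ("D", "D"): (1, 1)}

# A strategy receives (its own past moves, the opponent's past moves) and returns "C" or "D".
Strategy = Callable[[list[str], list[str]], str]


def tit_for_tat(mine: list[str], theirs: list[str]) -> str:
    """Stand-in strategy: cooperate first, then copy the opponent's last move."""
    return theirs[-1] if theirs else "C"


def always_defect(mine: list[str], theirs: list[str]) -> str:
    """Stand-in strategy: defect every round."""
    return "D"


def play_iterated(a: Strategy, b: Strategy, rounds: int = 20) -> dict:
    """Play `rounds` simultaneous rounds and return totals and cooperation rates."""
    hist_a: list[str] = []
    hist_b: list[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both moves are chosen before either history is updated (simultaneous play).
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        points_a, points_b = PAYOFFS[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return {
        "score_a": score_a,
        "score_b": score_b,
        "coop_rate_a": hist_a.count("C") / rounds,
        "coop_rate_b": hist_b.count("C") / rounds,
    }


if __name__ == "__main__":
    # Self-play, as in the baseline: the same strategy on both sides.
    print(play_iterated(tit_for_tat, tit_for_tat))
```

In self-play, tit-for-tat cooperates every round and scores 60 to 60. Against `always_defect` it loses only the first round, finishing 19 to 24.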
```bash
python src/experiment1_baseline.py --models claude-opus gemini-flash --rounds 20 --run 1
```

Experiment 2: Personalities

Each player is assigned one of three objectives:

- Good: Maximize SUM of both players' scores
- Neutral: Maximize only YOUR score
- Evil: Maximize YOUR score MINUS opponent's score

All 9 personality combinations (3×3) are played.
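Each personality reweights the same payoff table. The sketch below is illustrative only (`OBJECTIVES`, `objective_table`, and `PAIRINGS` are hypothetical names, not the repo's API). It shows the quantity each personality maximizes for every outcome and enumerates the 3×3 pairings:

```python
from itertools import product

# Payoffs from the table above: (A's move, B's move) -> (A's score, B's score).
PAYOFFS = {("C", "C"): (3, 3), ("D", "C"): (5, 0), ("C", "D"): (0, 5), ("D", "D"): (1, 1)}

# Each personality maps (my score, opponent's score) to the value it tries to maximize.
OBJECTIVES = {
    "good": lambda mine, theirs: mine + theirs,  # sum of both scores
    "neutral": lambda mine, theirs: mine,  # own score only
    "evil": lambda mine, theirs: mine - theirs,  # own score minus opponent's
}


def objective_table(personality: str) -> dict[tuple[str, str], int]:
    """Objective value for player A under each (A move, B move) outcome."""
    f = OBJECTIVES[personality]
    return {moves: f(score_a, score_b) for moves, (score_a, score_b) in PAYOFFS.items()}


# All 3x3 (A personality, B personality) pairings.
PAIRINGS = list(product(OBJECTIVES, repeat=2))

if __name__ == "__main__":
    for name in OBJECTIVES:
        print(name, objective_table(name))
    print(len(PAIRINGS), "pairings")
```

Reading the tables for a single round: a Good player does better by cooperating whatever the opponent does (6 vs 5 if the opponent cooperates, 5 vs 2 if it defects), an Evil player does better by defecting whatever the opponent does (5 vs 0, and 0 vs -5), and a Neutral player faces the standard dilemma.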
```bash
python src/experiment2_personalities.py --models claude-opus gemini-flash --rounds 20 --run 1
```

Experiment 5: Cross-model

- Claude vs Gemini
- The two models play each other for the number of rounds set by `--rounds`
- Tests: Model compatibility and mutual cooperation
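One plausible way to quantify "mutual cooperation" is the share of rounds in which both models cooperated. The sketch below assumes that definition; the repo may measure it differently, and `cross_model_matchups` and `mutual_cooperation_rate` are hypothetical helpers:

```python
from itertools import combinations


def cross_model_matchups(models: list[str]) -> list[tuple[str, str]]:
    """Every pairing of two different models, each pairing listed once."""
    return list(combinations(models, 2))


def mutual_cooperation_rate(moves: list[tuple[str, str]]) -> float:
    """Fraction of rounds in which both players chose C (Cooperate)."""
    if not moves:
        return 0.0
    both = sum(1 for a, b in moves if a == "C" and b == "C")
    return both / len(moves)


if __name__ == "__main__":
    print(cross_model_matchups(["claude-opus", "gemini-flash"]))
    print(mutual_cooperation_rate([("C", "C"), ("C", "D"), ("D", "D"), ("C", "C")]))
```

With two models there is exactly one matchup. Adding a third model gives three, since `combinations` never pairs a model with itself (that case is the baseline experiment).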
```bash
python src/experiment5_cross_model.py --models claude-opus gemini-flash --rounds 20 --run 1
```

To capture each model's decision-making process, add `--include-reasoning`:
```bash
python src/experiment1_baseline.py --models claude-opus --rounds 20 --include-reasoning
```

Example runs:

```bash
# Experiment 1 only
python src/experiment1_baseline.py --models claude-opus gemini-flash --rounds 20 --run 1

# With reasoning capture and visualization
python src/experiment1_baseline.py --models claude-opus --rounds 20 --include-reasoning
python visualize_only.py
```
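When reasoning is captured, each reply has to be split into a move and its explanation. The sketch below assumes a hypothetical reply format in which the model ends with a line such as `MOVE: COOPERATE` or `MOVE: DEFECT`. The actual prompt and parser in this repo may differ, and `MOVE_PATTERN` and `parse_decision` are illustrative names.

```python
import re

# Hypothetical format: the reply ends with "MOVE: COOPERATE" or "MOVE: DEFECT" (any case).
MOVE_PATTERN = re.compile(r"MOVE:\s*(COOPERATE|DEFECT)\b", re.IGNORECASE)


def parse_decision(reply: str) -> tuple[str, str]:
    """Split a model reply into (move, reasoning), where move is "C" or "D"."""
    matches = list(MOVE_PATTERN.finditer(reply))
    if not matches:
        raise ValueError("reply does not contain a MOVE: line")
    last = matches[-1]  # if the model restates its move, trust the final one
    move = "C" if last.group(1).upper() == "COOPERATE" else "D"
    # Everything before the final MOVE line is kept as the reasoning text.
    reasoning = reply[: last.start()].strip()
    return move, reasoning


if __name__ == "__main__":
    print(parse_decision("They cooperated last round, so I will too.\nMOVE: Cooperate"))
```

Raising an error on a missing `MOVE:` line, rather than defaulting to one move, keeps malformed replies from silently skewing cooperation rates.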