Game Theory Experiments on LLMs: Prisoner's Dilemma

Inspired by Veritasium's video on Game Theory.

Run iterated Prisoner's Dilemma games between LLMs with different personalities (good, neutral, evil) to understand cooperation, defection, and emergent strategies in AI.

Setup

1. Install Dependencies

pip install -r requirements.txt

This installs:

anthropic - Claude API client
openai - GPT API client
google-genai - Gemini API client (new SDK)
python-dotenv - Environment variable loader

2. Configure API Keys

Copy the example environment file:

cp .env.example .env

Then edit .env and add your API keys:

ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-gpt-key-here
GOOGLE_API_KEY=your-google-api-key-here
DEEPSEEK_API_KEY=your-deepseek-api-key-here

Get your keys from:

Anthropic (Claude): https://console.anthropic.com
OpenAI (GPT): https://platform.openai.com
Google (Gemini): https://makersuite.google.com
DeepSeek (Reasoner): https://platform.deepseek.com
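
Before spending API credits, it can help to check which keys are actually set. The sketch below is not part of the repository: `EXPECTED_KEYS` and `missing_keys` are hypothetical names, and the key list is simply copied from the .env example above. You only need keys for the providers you plan to run.

```python
import os

# Key names copied from the .env example above (assumed to be the full set).
EXPECTED_KEYS = (
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
)


def missing_keys(env=None):
    """Return the expected key names that are unset or empty in `env`.

    `env` defaults to the process environment. To pick up values from a
    .env file first, call python-dotenv's load_dotenv() before this.
    """
    env = os.environ if env is None else env
    return [name for name in EXPECTED_KEYS if not env.get(name)]
```

Calling `missing_keys()` after loading .env returns an empty list when all four keys are present.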

Methodology

Game Setup

Iterated Prisoner's Dilemma (20 rounds)

In each round, both players choose Cooperate or Defect simultaneously. Per-round payoffs:

Outcome	A Score	B Score
Both Cooperate	3	3
A Defects, B Cooperates	5	0
A Cooperates, B Defects	0	5
Both Defect	1	1
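
The payoff table can be written as a small lookup. This is a minimal sketch, not the repository's implementation; the names `PAYOFFS`, `score_round`, and `defect_dominates` are illustrative, and the "C"/"D" move encoding is an assumption.

```python
# Payoffs from the table above, keyed by (A's move, B's move).
# "C" = Cooperate, "D" = Defect; values are (A score, B score).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("D", "D"): (1, 1),
}


def score_round(move_a, move_b):
    """Return (A score, B score) for a single round."""
    return PAYOFFS[(move_a, move_b)]


def defect_dominates():
    """Check that defecting pays A strictly more than cooperating,
    whatever B plays. This is what makes the game a dilemma."""
    return all(
        PAYOFFS[("D", b)][0] > PAYOFFS[("C", b)][0] for b in ("C", "D")
    )
```

Defecting is the better reply to either move in a single round (5 > 3 and 1 > 0), yet mutual cooperation (3, 3) beats mutual defection (1, 1) for both players.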

Experiments

Experiment 1: Baseline (Self-Play)

Each model plays 20 rounds against itself
No personality override
Tests: Do models cooperate naturally?

python src/experiment1_baseline.py --models claude-opus gemini-flash --rounds 20 --run 1
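
To make the round structure concrete, here is a rough sketch of an iterated match loop. It is not taken from the repository: simple rule-based strategies stand in for the LLM calls, and `play_match`, `tit_for_tat`, and `always_defect` are hypothetical names. Each strategy sees only the history of past rounds, so the two moves in a round are effectively simultaneous.

```python
# Payoffs keyed by (A's move, B's move); values are (A score, B score).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def always_defect(own_history, opp_history):
    """Defect every round."""
    return "D"


def play_match(strategy_a, strategy_b, rounds=20):
    """Play `rounds` rounds and return (total A, total B, move list)."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        # Both moves are chosen from past history only, before either is revealed.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a, score_b = PAYOFFS[(move_a, move_b)]
        total_a += score_a
        total_b += score_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b, list(zip(hist_a, hist_b))
```

In self-play, `play_match(tit_for_tat, tit_for_tat)` cooperates in all 20 rounds and both sides score 60, the ceiling for sustained mutual cooperation under these payoffs.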

Experiment 2: Personality Testing (Good/Neutral/Evil)

Good: Maximize SUM of both players' scores
Neutral: Maximize only YOUR score
Evil: Maximize YOUR score MINUS opponent's score
All 9 personality combinations (3×3)

python src/experiment2_personalities.py --models claude-opus gemini-flash --rounds 20 --run 1
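
The three objectives above can be expressed as utility functions over the two scores. This is only an illustrative sketch: in the experiment the objectives are given to the models as instructions, and the names `OBJECTIVES`, `utility`, and `personality_pairings` are assumptions, not identifiers from the repository.

```python
from itertools import product

# Each personality maps (own score, opponent score) to the quantity it maximizes.
OBJECTIVES = {
    "good": lambda own, opp: own + opp,     # sum of both players' scores
    "neutral": lambda own, opp: own,        # own score only
    "evil": lambda own, opp: own - opp,     # own score minus opponent's score
}


def utility(personality, own_score, opp_score):
    """Value of an outcome from the point of view of one personality."""
    return OBJECTIVES[personality](own_score, opp_score)


def personality_pairings():
    """All ordered (player A, player B) personality pairs: 3 x 3 = 9."""
    return list(product(OBJECTIVES, repeat=2))
```

Under these definitions, mutual cooperation (3, 3) is worth 6 to a good player, 3 to a neutral player, and 0 to an evil player, while mutual defection (1, 1) is worth 2, 1, and 0. An evil player is therefore indifferent between the two symmetric outcomes and gains only by out-scoring the opponent.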

Experiment 5: Cross-Model Matchups

Claude vs Gemini
The two models play each other over multiple rounds (set with --rounds)
Tests: Model compatibility and mutual cooperation

python src/experiment5_cross_model.py --models claude-opus gemini-flash --rounds 20 --run 1
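
One simple way to quantify mutual cooperation in a matchup is the fraction of rounds in which both players cooperate. The README does not specify how compatibility is measured, so the metric, the function name `cooperation_stats`, and the `(move_a, move_b)` input format below are all assumptions.

```python
def cooperation_stats(rounds):
    """Summarize a match given a list of (move_a, move_b) pairs.

    Moves are "C" (Cooperate) or "D" (Defect). Returns per-player
    cooperation rates and the mutual cooperation rate, each in [0, 1].
    """
    n = len(rounds)
    if n == 0:
        raise ValueError("need at least one round")
    return {
        "a_cooperation": sum(a == "C" for a, _ in rounds) / n,
        "b_cooperation": sum(b == "C" for _, b in rounds) / n,
        "mutual_cooperation": sum(a == "C" and b == "C" for a, b in rounds) / n,
    }
```

For example, a four-round match `[("C", "C"), ("C", "D"), ("D", "D"), ("C", "C")]` has a mutual cooperation rate of 0.5.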

Include Reasoning

Add --include-reasoning to capture each model's decision-making process alongside its moves:

python src/experiment1_baseline.py --models claude-opus --rounds 20 --include-reasoning

Running Experiments

Quick Start

# Experiment 1 only
python src/experiment1_baseline.py --models claude-opus gemini-flash --rounds 20 --run 1

# With reasoning capture and visualization
python src/experiment1_baseline.py --models claude-opus --rounds 20 --include-reasoning
python visualize_only.py
