AI-Powered Real-Time Voice Mediation for Co-Founder Disputes
Voice Referee is an AI mediator that joins voice calls between startup co-founders to help de-escalate conflicts and facilitate productive conversations. Built on professional mediation principles from the Harvard Negotiation Project's "Getting to Yes" framework, it listens to conversations in real-time, detects tension, and intervenes with contextually appropriate guidance.
Research by Noam Wasserman (The Founder's Dilemmas) is often cited to the effect that 65% of high-potential startups fail because of co-founder conflict, arguably the largest risk after product-market fit. Yet most founders lack access to professional mediation when disputes arise.
Voice Referee provides:
- Real-time tension detection using speech analysis
- Neutral, facilitative interventions that follow professional mediation frameworks
- Speaker diarization to track who said what
- Contextual responses generated by Claude that validate emotions and reframe accusations into needs
Voice Referee is designed for:
- Startup accelerators wanting to support portfolio companies through co-founder conflicts
- Co-working spaces and founder communities offering mediation resources
- Founders themselves who need help having difficult conversations
- Coaches and advisors looking for AI-augmented mediation tools
┌─────────────────────────────────────────────────────────────┐
│ Daily.co Room │
│ ┌─────────┐ ┌─────────┐ ┌─────────────────────────────┐ │
│ │Founder A│ │Founder B│ │ AI Mediator Agent │ │
│ └────┬────┘ └────┬────┘ └──────────────┬──────────────┘ │
│ └────────────┴─────────────────────┬┘ │
│ Audio Streams │ │
└──────────────────────────────────────────┼──────────────────┘
│
┌──────────────────────────────────────────▼──────────────────┐
│ Pipecat Pipeline │
│ ┌───────────────┐ ┌────────────────┐ ┌───────────────┐ │
│ │ Silero VAD │→ │ Deepgram STT │→ │ Referee │ │
│ │ (activity) │ │ (diarization) │ │ Monitor │ │
│ └───────────────┘ └────────────────┘ └───────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Claude (Anthropic) - AI Mediator │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ ElevenLabs TTS - Natural Speech Output │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Daily.co Transport - WebRTC audio streaming from participants
- Silero VAD - Voice Activity Detection to identify speech
- Deepgram STT - Speech-to-text with speaker diarization (identifies who is speaking)
- Referee Monitor - Analyzes conversation state, detects tension, decides when to intervene
- Claude LLM - Generates contextual mediation responses
- ElevenLabs TTS - Converts responses to natural speech
- Daily.co Output - Streams audio back to participants
The system monitors for:
- High tension scores (> 0.7) based on sentiment analysis, interruption rate, and speaker imbalance
- Circular arguments (same points repeated 3+ times)
- Speaker dominance (one person talking > 80% of the time for > 5 minutes)
When triggered, it generates interventions like:
- "Let's pause a second."
- "What did you hear [other name] say?"
- "What's the smallest step you'd both agree on?"
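The trigger rules and canned interventions above can be sketched as a single decision function. This is an illustrative approximation, not the actual source: the `WindowStats` fields and `pick_intervention` name are assumptions, though the thresholds mirror the documented defaults.

```python
# Hypothetical sketch of the documented intervention triggers; field and
# function names are illustrative, thresholds match the README defaults.
from dataclasses import dataclass
from typing import Optional

@dataclass
class WindowStats:
    tension_score: float          # 0.0-1.0, from sentiment/interruptions/imbalance
    repeated_points: int          # times the same point has resurfaced
    dominant_share: float         # talk-time share of the most active speaker
    dominance_duration_s: float   # how long that imbalance has persisted

def pick_intervention(stats: WindowStats, other_name: str) -> Optional[str]:
    """Return an intervention line, or None if the conversation is healthy."""
    if stats.tension_score > 0.7:
        return "Let's pause a second."
    if stats.repeated_points >= 3:
        return "What's the smallest step you'd both agree on?"
    if stats.dominant_share > 0.8 and stats.dominance_duration_s > 300:
        return f"What did you hear {other_name} say?"
    return None
```

In the real pipeline the chosen line would be a prompt hint for Claude rather than a fixed string, so interventions stay contextual.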
You will need:
- Python 3.10+
- API keys for:
- Daily.co - WebRTC infrastructure
- Deepgram - Speech-to-text with diarization
- Anthropic - Claude for AI responses
- ElevenLabs - Text-to-speech
# Clone the repository
git clone https://github.com/your-org/voice-referee.git
cd voice-referee
# Create and activate virtual environment
cd voice_referee
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Copy the example environment file
cp .env.example .env

Edit .env with your API keys:
# Daily.co Configuration
DAILY_ROOM_URL=https://your-domain.daily.co/your-room
DAILY_TOKEN=your_daily_token_here
# Deepgram Configuration
DEEPGRAM_API_KEY=your_deepgram_api_key_here
DEEPGRAM_MODEL=nova-2
DEEPGRAM_DIARIZE=true
# LLM Configuration (Claude)
ANTHROPIC_API_KEY=your_anthropic_api_key_here
LLM_MODEL=claude-3-5-sonnet-20241022
# TTS Configuration (ElevenLabs)
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
TTS_VOICE_ID=your_voice_id_here
TTS_MODEL=eleven_flash_v2_5
# Processor Configuration
TENSION_THRESHOLD=0.7
COOLDOWN_SECONDS=30
BUFFER_SIZE=50

# From the voice_referee directory
python -m src.pipeline.main

The referee will:
- Connect to the Daily.co room
- Wait for participants to join
- Introduce itself
- Begin monitoring the conversation
voice_referee/
├── src/
│ ├── config/
│ │ ├── settings.py # Configuration with pydantic
│ │ └── daily_config.py # Daily.co specific config
│ ├── processors/
│ │ ├── referee_monitor.py # Main orchestration processor
│ │ ├── speaker_mapper.py # Maps speaker IDs to names
│ │ ├── conversation_state.py # Tracks conversation history
│ │ ├── analyzer.py # Tension analysis
│ │ └── decider.py # Intervention decisions
│ ├── services/
│ │ ├── daily_transport.py # Daily.co integration
│ │ ├── deepgram_stt.py # Speech-to-text
│ │ ├── llm_service.py # Claude integration
│ │ └── tts_service.py # ElevenLabs TTS
│ ├── analysis/
│ │ └── conversation_analyzer.py
│ ├── decision/
│ │ └── intervention_decider.py
│ └── pipeline/
│ └── main.py # Pipeline assembly
├── tests/
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests
├── requirements.txt
└── .env.example
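Per the tree above, src/config/settings.py loads configuration with pydantic. As a rough stdlib-only approximation (the field names and load_settings function are assumptions), the tuning knobs from .env can be read like this:

```python
# Illustrative only: the repo's settings.py uses pydantic, but the same
# environment variables can be sketched with the stdlib.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class RefereeSettings:
    tension_threshold: float
    cooldown_seconds: int
    buffer_size: int
    llm_model: str

def load_settings() -> RefereeSettings:
    """Read tuning knobs from the environment, falling back to documented defaults."""
    return RefereeSettings(
        tension_threshold=float(os.getenv("TENSION_THRESHOLD", "0.7")),
        cooldown_seconds=int(os.getenv("COOLDOWN_SECONDS", "30")),
        buffer_size=int(os.getenv("BUFFER_SIZE", "50")),
        llm_model=os.getenv("LLM_MODEL", "claude-3-5-sonnet-20241022"),
    )
```

A pydantic BaseSettings class would add type validation and .env file loading on top of the same idea.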
# Run all tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/unit/test_analyzer.py -v

RefereeMonitorProcessor (src/processors/referee_monitor.py)
- Main orchestrator that receives transcription frames
- Coordinates speaker mapping, state tracking, analysis, and intervention decisions
- Builds context-aware prompts for Claude
ConversationState (src/processors/conversation_state.py)
- Maintains rolling buffer of utterances (default 50)
- Tracks per-speaker statistics
- Calculates speaker balance
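A minimal sketch of the rolling-buffer idea, assuming a deque-backed window and word-count statistics; the actual conversation_state.py may be structured differently:

```python
# Hedged sketch of ConversationState: a rolling utterance window plus
# per-speaker word counts used to compute speaker balance.
from collections import Counter, deque

class ConversationState:
    def __init__(self, buffer_size: int = 50):
        self.utterances = deque(maxlen=buffer_size)  # oldest entries drop off
        self.word_counts = Counter()                 # cumulative per-speaker totals

    def add_utterance(self, speaker: str, text: str) -> None:
        self.utterances.append((speaker, text))
        self.word_counts[speaker] += len(text.split())

    def speaker_balance(self) -> float:
        """Share of words spoken by the most talkative speaker (0.5 = even for two)."""
        total = sum(self.word_counts.values())
        if total == 0:
            return 0.0
        return max(self.word_counts.values()) / total
```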
ConversationAnalyzer (src/analysis/conversation_analyzer.py)
- Calculates tension score from multiple signals
- Detects patterns like argument repetition
- Weights: sentiment (0.3), interruption rate (0.3), imbalance (0.2), repetition (0.2)
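The weighted combination above can be written as a plain function. The signal names are assumptions; each input is presumed normalized to [0, 1], and the weights are the ones documented:

```python
# Tension score as the documented weighted blend (0.3/0.3/0.2/0.2).
# Inputs are assumed to be normalized to the [0, 1] range.
def tension_score(sentiment: float, interruption_rate: float,
                  imbalance: float, repetition: float) -> float:
    """Combine the four signals into a single 0-1 tension score."""
    return (0.3 * sentiment
            + 0.3 * interruption_rate
            + 0.2 * imbalance
            + 0.2 * repetition)
```

Because the weights sum to 1.0, the score stays in [0, 1] and compares directly against TENSION_THRESHOLD.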
InterventionDecider (src/decision/intervention_decider.py)
- Applies decision rules to analysis results
- Enforces 30-second cooldown between interventions
- Generates contextual prompts for the LLM
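The cooldown rule can be isolated into a small gate. The class name and interface are illustrative, not the actual intervention_decider.py API; the clock is injectable so the behavior is testable without sleeping:

```python
# Cooldown enforcement sketch: at most one intervention per cooldown window.
import time

class CooldownGate:
    def __init__(self, cooldown_seconds: float = 30.0, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock                # injectable for deterministic tests
        self.last_fired = float("-inf")   # so the very first intervention passes

    def allow(self) -> bool:
        """True if enough time has passed since the last intervention."""
        now = self.clock()
        if now - self.last_fired >= self.cooldown:
            self.last_fired = now
            return True
        return False
```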
- Server with Python 3.10+
- Network allowing WebRTC traffic
- Firewall configured for Daily.co
- SSL/TLS for secure connections
Create a Dockerfile:
FROM python:3.10-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
libportaudio2 \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install
COPY voice_referee/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY voice_referee/src ./src
# Set environment variables
ENV PYTHONUNBUFFERED=1
# Run the application
CMD ["python", "-m", "src.pipeline.main"]

Build and run:

docker build -t voice-referee .
docker run --env-file .env voice-referee

Store secrets in a secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.):
# Required
DAILY_ROOM_URL=https://your-domain.daily.co/production-room
DAILY_TOKEN=<from_secrets_manager>
DEEPGRAM_API_KEY=<from_secrets_manager>
ANTHROPIC_API_KEY=<from_secrets_manager>
ELEVENLABS_API_KEY=<from_secrets_manager>
# Tuning
TENSION_THRESHOLD=0.7 # Lower = more interventions
COOLDOWN_SECONDS=30 # Minimum time between interventions
BUFFER_SIZE=50 # Number of utterances to track
# Logging
LOG_LEVEL=INFO

The system logs key events:
- Participant joins/leaves
- Utterances (with speaker attribution)
- Analysis results (tension scores, patterns)
- Intervention decisions
- Pipeline stage latencies
Set up log aggregation (CloudWatch, Datadog, etc.) and monitor:
- Error rate (target: < 5%)
- End-to-end latency (target: < 800ms)
- Intervention frequency
- Session duration
Each Voice Referee instance handles one mediation session. For multiple concurrent sessions:
- Container orchestration (Kubernetes, ECS) to spawn instances on demand
- Room management API to create Daily.co rooms and spawn referee instances
- Session routing to connect founders to available referee instances
| Metric | Target | Notes |
|---|---|---|
| End-to-End Latency | < 800ms | Speech in → Speech out |
| STT Latency | < 300ms | Deepgram Nova-2 |
| LLM Response | < 200ms | Claude with streaming |
| TTS Generation | < 300ms | ElevenLabs Flash v2.5 |
| Memory Usage | < 500MB | Steady state |
The AI mediator follows facilitative mediation principles:
- Remain neutral - Never take sides or assign blame
- Focus on interests, not positions - What do they need vs. what they demand
- Validate emotions - Acknowledge feelings without agreeing with positions
- Reframe accusations - "He never listens" → "Being heard matters to you"
- Generate options - Help parties brainstorm solutions together
The system will pause mediation and recommend professional help for:
- Legal matters (IP disputes, contract interpretation)
- Safety concerns or threats
- Allegations of fraud or fiduciary breach
- Persistent impasse after multiple interventions
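As a deliberately naive illustration of such an escalation screen (in practice the LLM's judgment would drive this, and the term list here is invented):

```python
# Toy keyword screen mapping utterances to escalation categories. A real
# system would not rely on keyword matching for legal or safety detection.
from typing import Optional

ESCALATION_TERMS = {
    "lawsuit": "legal",
    "lawyer": "legal",
    "fraud": "fraud",
    "threat": "safety",
}

def escalation_category(utterance: str) -> Optional[str]:
    """Return the escalation category an utterance triggers, if any."""
    lowered = utterance.lower()
    for term, category in ESCALATION_TERMS.items():
        if term in lowered:
            return category
    return None
```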
- Product Requirements Document - Full specification including mediation frameworks
- Goal-Oriented Action Plan - Implementation plan with milestones
MIT License - See LICENSE for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Write tests for new functionality
- Ensure all tests pass (pytest)
- Submit a pull request
- Pipecat - Voice AI pipeline framework
- Daily.co - WebRTC infrastructure
- Deepgram - Speech recognition with diarization
- Anthropic Claude - AI language model
- ElevenLabs - Natural text-to-speech
- Harvard Negotiation Project's "Getting to Yes" framework