AI-Powered Real-Time Voice Mediation for Co-Founder Disputes
Voice Referee is an AI mediator that joins voice calls between startup co-founders to help de-escalate conflicts and facilitate productive conversations. Built on professional mediation principles from the Harvard Negotiation Project's "Getting to Yes" framework, it listens to conversations in real-time, detects tension, and intervenes with contextually appropriate guidance.
Research by Noam Wasserman (The Founder's Dilemmas) is often cited to the effect that 65% of high-potential startups fail because of co-founder conflict, arguably the largest risk after product-market fit. Yet most founders lack access to professional mediation when disputes arise.
Voice Referee provides:
- Real-time tension detection using speech analysis
- Neutral, facilitative interventions that follow professional mediation frameworks
- Speaker diarization to track who said what
- Contextual responses generated by Claude that validate emotions and reframe accusations into needs
Voice Referee is designed for:
- Startup accelerators wanting to support portfolio companies through co-founder conflicts
- Co-working spaces and founder communities offering mediation resources
- Founders themselves who need help having difficult conversations
- Coaches and advisors looking for AI-augmented mediation tools
┌─────────────────────────────────────────────────────────────┐
│ Daily.co Room │
│ ┌─────────┐ ┌─────────┐ ┌─────────────────────────────┐ │
│ │Founder A│ │Founder B│ │ AI Mediator Agent │ │
│ └────┬────┘ └────┬────┘ └──────────────┬──────────────┘ │
│ └────────────┴─────────────────────┬┘ │
│ Audio Streams │ │
└──────────────────────────────────────────┼──────────────────┘
│
┌──────────────────────────────────────────▼──────────────────┐
│ Pipecat Pipeline │
│ ┌───────────────┐ ┌────────────────┐ ┌───────────────┐ │
│ │ Silero VAD │→ │ Deepgram STT │→ │ Referee │ │
│ │ (activity) │ │ (diarization) │ │ Monitor │ │
│ └───────────────┘ └────────────────┘ └───────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Claude (Anthropic) - AI Mediator │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ ElevenLabs TTS - Natural Speech Output │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Daily.co Transport - WebRTC audio streaming from participants
- Silero VAD - Voice Activity Detection to identify speech
- Deepgram STT - Speech-to-text with speaker diarization (identifies who is speaking)
- Referee Monitor - Analyzes conversation state, detects tension, decides when to intervene
- Claude LLM - Generates contextual mediation responses
- ElevenLabs TTS - Converts responses to natural speech
- Daily.co Output - Streams audio back to participants
The system monitors for:
- High tension scores (> 0.7) based on sentiment analysis, interruption rate, and speaker imbalance
- Circular arguments (same points repeated 3+ times)
- Speaker dominance (one person talking > 80% of the time for > 5 minutes)
When triggered, it generates interventions like:
- "Let's pause a second."
- "What did you hear [other name] say?"
- "What's the smallest step you'd both agree on?"
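The trigger rules and canned interventions above can be sketched as a single decision function. This is an illustrative approximation, not the actual source: the `WindowStats` fields and `pick_intervention` name are assumptions, though the thresholds mirror the documented defaults.

```python
# Hypothetical sketch of the documented intervention triggers; field and
# function names are illustrative, thresholds match the README defaults.
from dataclasses import dataclass
from typing import Optional

@dataclass
class WindowStats:
    tension_score: float          # 0.0-1.0, from sentiment/interruptions/imbalance
    repeated_points: int          # times the same point has resurfaced
    dominant_share: float         # talk-time share of the most active speaker
    dominance_duration_s: float   # how long that imbalance has persisted

def pick_intervention(stats: WindowStats, other_name: str) -> Optional[str]:
    """Return an intervention line, or None if the conversation is healthy."""
    if stats.tension_score > 0.7:
        return "Let's pause a second."
    if stats.repeated_points >= 3:
        return "What's the smallest step you'd both agree on?"
    if stats.dominant_share > 0.8 and stats.dominance_duration_s > 300:
        return f"What did you hear {other_name} say?"
    return None
```

In the real pipeline the chosen line would be a prompt hint for Claude rather than a fixed string, so interventions stay contextual.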
You will need:
- Python 3.10+
- API keys for:
- Daily.co - WebRTC infrastructure
- Deepgram - Speech-to-text with diarization
- Anthropic - Claude for AI responses
- ElevenLabs - Text-to-speech
# Clone the repository
git clone https://github.com/your-org/voice-referee.git
cd voice-referee
# Create and activate virtual environment
cd voice_referee
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Copy the example environment file
cp .env.example .env

Edit .env with your API keys:
# Daily.co Configuration
DAILY_ROOM_URL=https://your-domain.daily.co/your-room
DAILY_TOKEN=your_daily_token_here
# Deepgram Configuration
DEEPGRAM_API_KEY=your_deepgram_api_key_here
DEEPGRAM_MODEL=nova-2
DEEPGRAM_DIARIZE=true
# LLM Configuration (Claude)
ANTHROPIC_API_KEY=your_anthropic_api_key_here
LLM_MODEL=claude-3-5-sonnet-20241022
# TTS Configuration (ElevenLabs)
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
TTS_VOICE_ID=your_voice_id_here
TTS_MODEL=eleven_flash_v2_5
# Processor Configuration
TENSION_THRESHOLD=0.7
COOLDOWN_SECONDS=30
BUFFER_SIZE=50

# From the voice_referee directory
python -m src.pipeline.main

The referee will:
- Connect to the Daily.co room
- Wait for participants to join
- Introduce itself
- Begin monitoring the conversation
voice_referee/
├── src/
│ ├── config/
│ │ ├── settings.py # Configuration with pydantic
│ │ └── daily_config.py # Daily.co specific config
│ ├── processors/
│ │ ├── referee_monitor.py # Main orchestration processor
│ │ ├── speaker_mapper.py # Maps speaker IDs to names
│ │ ├── conversation_state.py # Tracks conversation history
│ │ ├── analyzer.py # Tension analysis
│ │ └── decider.py # Intervention decisions
│ ├── services/
│ │ ├── daily_transport.py # Daily.co integration
│ │ ├── deepgram_stt.py # Speech-to-text
│ │ ├── llm_service.py # Claude integration
│ │ └── tts_service.py # ElevenLabs TTS
│ ├── analysis/
│ │ └── conversation_analyzer.py
│ ├── decision/
│ │ └── intervention_decider.py
│ └── pipeline/
│ └── main.py # Pipeline assembly
├── tests/
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests
├── requirements.txt
└── .env.example
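Per the tree above, src/config/settings.py loads configuration with pydantic. As a rough stdlib-only approximation (the field names and load_settings function are assumptions), the tuning knobs from .env can be read like this:

```python
# Illustrative only: the repo's settings.py uses pydantic, but the same
# environment variables can be sketched with the stdlib.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class RefereeSettings:
    tension_threshold: float
    cooldown_seconds: int
    buffer_size: int
    llm_model: str

def load_settings() -> RefereeSettings:
    """Read tuning knobs from the environment, falling back to documented defaults."""
    return RefereeSettings(
        tension_threshold=float(os.getenv("TENSION_THRESHOLD", "0.7")),
        cooldown_seconds=int(os.getenv("COOLDOWN_SECONDS", "30")),
        buffer_size=int(os.getenv("BUFFER_SIZE", "50")),
        llm_model=os.getenv("LLM_MODEL", "claude-3-5-sonnet-20241022"),
    )
```

A pydantic BaseSettings class would add type validation and .env file loading on top of the same idea.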
# Run all tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/unit/test_analyzer.py -v

RefereeMonitorProcessor (src/processors/referee_monitor.py)
- Main orchestrator that receives transcription frames
- Coordinates speaker mapping, state tracking, analysis, and intervention decisions
- Builds context-aware prompts for Claude
ConversationState (src/processors/conversation_state.py)
- Maintains rolling buffer of utterances (default 50)
- Tracks per-speaker statistics
- Calculates speaker balance
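A minimal sketch of the rolling-buffer idea, assuming a deque-backed window and word-count statistics; the actual conversation_state.py may be structured differently:

```python
# Hedged sketch of ConversationState: a rolling utterance window plus
# per-speaker word counts used to compute speaker balance.
from collections import Counter, deque

class ConversationState:
    def __init__(self, buffer_size: int = 50):
        self.utterances = deque(maxlen=buffer_size)  # oldest entries drop off
        self.word_counts = Counter()                 # cumulative per-speaker totals

    def add_utterance(self, speaker: str, text: str) -> None:
        self.utterances.append((speaker, text))
        self.word_counts[speaker] += len(text.split())

    def speaker_balance(self) -> float:
        """Share of words spoken by the most talkative speaker (0.5 = even for two)."""
        total = sum(self.word_counts.values())
        if total == 0:
            return 0.0
        return max(self.word_counts.values()) / total
```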
ConversationAnalyzer (src/analysis/conversation_analyzer.py)
- Calculates tension score from multiple signals
- Detects patterns like argument repetition
- Weights: sentiment (0.3), interruption rate (0.3), imbalance (0.2), repetition (0.2)
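The weighted combination above can be written as a plain function. The signal names are assumptions; each input is presumed normalized to [0, 1], and the weights are the ones documented:

```python
# Tension score as the documented weighted blend (0.3/0.3/0.2/0.2).
# Inputs are assumed to be normalized to the [0, 1] range.
def tension_score(sentiment: float, interruption_rate: float,
                  imbalance: float, repetition: float) -> float:
    """Combine the four signals into a single 0-1 tension score."""
    return (0.3 * sentiment
            + 0.3 * interruption_rate
            + 0.2 * imbalance
            + 0.2 * repetition)
```

Because the weights sum to 1.0, the score stays in [0, 1] and compares directly against TENSION_THRESHOLD.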
InterventionDecider (src/decision/intervention_decider.py)
- Applies decision rules to analysis results
- Enforces 30-second cooldown between interventions
- Generates contextual prompts for the LLM
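The cooldown rule can be isolated into a small gate. The class name and interface are illustrative, not the actual intervention_decider.py API; the clock is injectable so the behavior is testable without sleeping:

```python
# Cooldown enforcement sketch: at most one intervention per cooldown window.
import time

class CooldownGate:
    def __init__(self, cooldown_seconds: float = 30.0, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock                # injectable for deterministic tests
        self.last_fired = float("-inf")   # so the very first intervention passes

    def allow(self) -> bool:
        """True if enough time has passed since the last intervention."""
        now = self.clock()
        if now - self.last_fired >= self.cooldown:
            self.last_fired = now
            return True
        return False
```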
- Server with Python 3.10+
- Network allowing WebRTC traffic
- Firewall configured for Daily.co
- SSL/TLS for secure connections
Create a Dockerfile:
FROM python:3.10-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
libportaudio2 \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install
COPY voice_referee/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY voice_referee/src ./src
# Set environment variables
ENV PYTHONUNBUFFERED=1
# Run the application
CMD ["python", "-m", "src.pipeline.main"]

Build and run:

docker build -t voice-referee .
docker run --env-file .env voice-referee

Store secrets in a secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.):
# Required
DAILY_ROOM_URL=https://your-domain.daily.co/production-room
DAILY_TOKEN=<from_secrets_manager>
DEEPGRAM_API_KEY=<from_secrets_manager>
ANTHROPIC_API_KEY=<from_secrets_manager>
ELEVENLABS_API_KEY=<from_secrets_manager>
# Tuning
TENSION_THRESHOLD=0.7 # Lower = more interventions
COOLDOWN_SECONDS=30 # Minimum time between interventions
BUFFER_SIZE=50 # Number of utterances to track
# Logging
LOG_LEVEL=INFO

The system logs key events:
- Participant joins/leaves
- Utterances (with speaker attribution)
- Analysis results (tension scores, patterns)
- Intervention decisions
- Pipeline stage latencies
Set up log aggregation (CloudWatch, Datadog, etc.) and monitor:
- Error rate (target: < 5%)
- End-to-end latency (target: < 800ms)
- Intervention frequency
- Session duration
Each Voice Referee instance handles one mediation session. For multiple concurrent sessions:
- Container orchestration (Kubernetes, ECS) to spawn instances on demand
- Room management API to create Daily.co rooms and spawn referee instances
- Session routing to connect founders to available referee instances
| Metric | Target | Notes |
|---|---|---|
| End-to-End Latency | < 800ms | Speech in → Speech out |
| STT Latency | < 300ms | Deepgram Nova-2 |
| LLM Response | < 200ms | Claude with streaming |
| TTS Generation | < 300ms | ElevenLabs Flash v2.5 |
| Memory Usage | < 500MB | Steady state |
The AI mediator follows facilitative mediation principles:
- Remain neutral - Never take sides or assign blame
- Focus on interests, not positions - What do they need vs. what they demand
- Validate emotions - Acknowledge feelings without agreeing with positions
- Reframe accusations - "He never listens" → "Being heard matters to you"
- Generate options - Help parties brainstorm solutions together
The system will pause mediation and recommend professional help for:
- Legal matters (IP disputes, contract interpretation)
- Safety concerns or threats
- Allegations of fraud or fiduciary breach
- Persistent impasse after multiple interventions
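As a deliberately naive illustration of such an escalation screen (in practice the LLM's judgment would drive this, and the term list here is invented):

```python
# Toy keyword screen mapping utterances to escalation categories. A real
# system would not rely on keyword matching for legal or safety detection.
from typing import Optional

ESCALATION_TERMS = {
    "lawsuit": "legal",
    "lawyer": "legal",
    "fraud": "fraud",
    "threat": "safety",
}

def escalation_category(utterance: str) -> Optional[str]:
    """Return the escalation category an utterance triggers, if any."""
    lowered = utterance.lower()
    for term, category in ESCALATION_TERMS.items():
        if term in lowered:
            return category
    return None
```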
- Product Requirements Document - Full specification including mediation frameworks
- Goal-Oriented Action Plan - Implementation plan with milestones
MIT License - See LICENSE for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Write tests for new functionality
- Ensure all tests pass (pytest)
- Submit a pull request
- Pipecat - Voice AI pipeline framework
- Daily.co - WebRTC infrastructure
- Deepgram - Speech recognition with diarization
- Anthropic Claude - AI language model
- ElevenLabs - Natural text-to-speech
- Harvard Negotiation Project's "Getting to Yes" framework