Voice Referee

AI-Powered Real-Time Voice Mediation for Co-Founder Disputes

Voice Referee is an AI mediator that joins voice calls between startup co-founders to help de-escalate conflicts and facilitate productive conversations. Built on professional mediation principles from the Harvard Negotiation Project's "Getting to Yes" framework, it listens to conversations in real-time, detects tension, and intervenes with contextually appropriate guidance.


Why This Exists

By one widely cited estimate, 65% of startup failures stem from co-founder conflict, a risk on par with failing to find product-market fit. Yet most founders lack access to professional mediation when disputes arise.

Voice Referee provides:

  • Real-time tension detection using speech analysis
  • Neutral, facilitative interventions that follow professional mediation frameworks
  • Speaker diarization to track who said what
  • Contextual responses generated by Claude that validate emotions and reframe accusations into needs

Who It's For

  • Startup accelerators wanting to support portfolio companies through co-founder conflicts
  • Co-working spaces and founder communities offering mediation resources
  • Founders themselves who need help having difficult conversations
  • Coaches and advisors looking for AI-augmented mediation tools

How It Works

┌─────────────────────────────────────────────────────────────┐
│                    Daily.co Room                            │
│  ┌─────────┐  ┌─────────┐  ┌─────────────────────────────┐  │
│  │Founder A│  │Founder B│  │    AI Mediator Agent        │  │
│  └────┬────┘  └────┬────┘  └──────────────┬──────────────┘  │
│       └────────────┴─────────────────────┬┘                 │
│                  Audio Streams           │                  │
└──────────────────────────────────────────┼──────────────────┘
                                           │
┌──────────────────────────────────────────▼──────────────────┐
│                 Pipecat Pipeline                            │
│  ┌───────────────┐  ┌────────────────┐  ┌────────────────┐  │
│  │ Silero VAD    │→ │ Deepgram STT   │→ │ Referee        │  │
│  │ (activity)    │  │ (diarization)  │  │ Monitor        │  │
│  └───────────────┘  └────────────────┘  └────────────────┘  │
│                              │                              │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │        Claude (Anthropic) - AI Mediator               │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │        ElevenLabs TTS - Natural Speech Output         │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Pipeline Flow

  1. Daily.co Transport - WebRTC audio streaming from participants
  2. Silero VAD - Voice Activity Detection to identify speech
  3. Deepgram STT - Speech-to-text with speaker diarization (identifies who is speaking)
  4. Referee Monitor - Analyzes conversation state, detects tension, decides when to intervene
  5. Claude LLM - Generates contextual mediation responses
  6. ElevenLabs TTS - Converts responses to natural speech
  7. Daily.co Output - Streams audio back to participants
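
A rough sketch of how these stages are wired together in src/pipeline/main.py is shown below. Import paths and constructor options shift between Pipecat releases, so treat it as an illustrative outline rather than the repo's exact code; RefereeMonitorProcessor is this project's custom processor.

# Illustrative Pipecat assembly; imports and options vary by version.
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer

from src.processors.referee_monitor import RefereeMonitorProcessor

async def main():
    transport = DailyTransport(
        os.environ["DAILY_ROOM_URL"],
        os.environ["DAILY_TOKEN"],
        "Voice Referee",
        DailyParams(
            audio_out_enabled=True,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),  # stage 2: voice activity
        ),
    )

    pipeline = Pipeline([
        transport.input(),                                   # 1. audio in
        DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"]),    # 3. STT
        RefereeMonitorProcessor(),                           # 4. monitoring
        AnthropicLLMService(api_key=os.environ["ANTHROPIC_API_KEY"]),  # 5. LLM
        ElevenLabsTTSService(
            api_key=os.environ["ELEVENLABS_API_KEY"],
            voice_id=os.environ["TTS_VOICE_ID"],
        ),                                                   # 6. TTS
        transport.output(),                                  # 7. audio out
    ])

    await PipelineRunner().run(PipelineTask(pipeline))

if __name__ == "__main__":
    asyncio.run(main())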

Intervention Triggers

The system monitors for:

  • High tension scores (> 0.7) based on sentiment analysis, interruption rate, and speaker imbalance
  • Circular arguments (same points repeated 3+ times)
  • Speaker dominance (one person talking > 80% of the time for > 5 minutes)

When triggered, it generates interventions like:

  • "Let's pause a second."
  • "What did you hear [other name] say?"
  • "What's the smallest step you'd both agree on?"

Quick Start

Prerequisites

  • Python 3.10+
  • API keys for Daily.co, Deepgram, Anthropic, and ElevenLabs (see Configuration below)

Installation

# Clone the repository
git clone https://github.com/your-org/voice-referee.git
cd voice-referee

# Create and activate virtual environment
cd voice_referee
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Configuration

# Copy the example environment file
cp .env.example .env

Edit .env with your API keys:

# Daily.co Configuration
DAILY_ROOM_URL=https://your-domain.daily.co/your-room
DAILY_TOKEN=your_daily_token_here

# Deepgram Configuration
DEEPGRAM_API_KEY=your_deepgram_api_key_here
DEEPGRAM_MODEL=nova-2
DEEPGRAM_DIARIZE=true

# LLM Configuration (Claude)
ANTHROPIC_API_KEY=your_anthropic_api_key_here
LLM_MODEL=claude-3-5-sonnet-20241022

# TTS Configuration (ElevenLabs)
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
TTS_VOICE_ID=your_voice_id_here
TTS_MODEL=eleven_flash_v2_5

# Processor Configuration
TENSION_THRESHOLD=0.7
COOLDOWN_SECONDS=30
BUFFER_SIZE=50
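
These keys are loaded by src/config/settings.py. A minimal sketch of what that file might look like with pydantic-settings, assuming field names that mirror the .env keys (the real module likely adds more fields and validation):

# Hypothetical shape of src/config/settings.py; field names assumed
# from the .env keys above.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    daily_room_url: str
    daily_token: str
    deepgram_api_key: str
    anthropic_api_key: str
    elevenlabs_api_key: str
    tts_voice_id: str
    tension_threshold: float = 0.7
    cooldown_seconds: int = 30
    buffer_size: int = 50

settings = Settings()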

Running

# From the voice_referee directory
python -m src.pipeline.main

The referee will:

  1. Connect to the Daily.co room
  2. Wait for participants to join
  3. Introduce itself
  4. Begin monitoring the conversation

Development

Project Structure

voice_referee/
├── src/
│   ├── config/
│   │   ├── settings.py        # Configuration with pydantic
│   │   └── daily_config.py    # Daily.co specific config
│   ├── processors/
│   │   ├── referee_monitor.py # Main orchestration processor
│   │   ├── speaker_mapper.py  # Maps speaker IDs to names
│   │   ├── conversation_state.py # Tracks conversation history
│   │   ├── analyzer.py        # Tension analysis
│   │   └── decider.py         # Intervention decisions
│   ├── services/
│   │   ├── daily_transport.py # Daily.co integration
│   │   ├── deepgram_stt.py    # Speech-to-text
│   │   ├── llm_service.py     # Claude integration
│   │   └── tts_service.py     # ElevenLabs TTS
│   ├── analysis/
│   │   └── conversation_analyzer.py
│   ├── decision/
│   │   └── intervention_decider.py
│   └── pipeline/
│       └── main.py            # Pipeline assembly
├── tests/
│   ├── unit/                  # Unit tests
│   └── integration/           # Integration tests
├── requirements.txt
└── .env.example

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run specific test file
pytest tests/unit/test_analyzer.py -v

Key Components

RefereeMonitorProcessor (src/processors/referee_monitor.py)

  • Main orchestrator that receives transcription frames
  • Coordinates speaker mapping, state tracking, analysis, and intervention decisions
  • Builds context-aware prompts for Claude

ConversationState (src/processors/conversation_state.py)

  • Maintains rolling buffer of utterances (default 50)
  • Tracks per-speaker statistics
  • Calculates speaker balance
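
The core of such a rolling buffer is a bounded deque; a sketch under assumed field and method names:

# Sketch of the rolling utterance buffer; Utterance fields and method
# names are assumptions, the default size comes from BUFFER_SIZE.
from collections import deque
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str
    text: str
    timestamp: float

class ConversationState:
    def __init__(self, buffer_size: int = 50):
        self.utterances: deque[Utterance] = deque(maxlen=buffer_size)

    def add(self, utterance: Utterance) -> None:
        self.utterances.append(utterance)  # oldest entry drops automatically

    def speaker_balance(self) -> dict[str, float]:
        """Fraction of buffered utterances attributed to each speaker."""
        total = len(self.utterances) or 1
        counts: dict[str, int] = {}
        for u in self.utterances:
            counts[u.speaker] = counts.get(u.speaker, 0) + 1
        return {speaker: n / total for speaker, n in counts.items()}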

ConversationAnalyzer (src/analysis/conversation_analyzer.py)

  • Calculates tension score from multiple signals
  • Detects patterns like argument repetition
  • Weights: sentiment (0.3), interruption rate (0.3), imbalance (0.2), repetition (0.2)
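
In formula form, with each signal normalized to [0, 1] (signal names are assumptions; the weights are the ones listed above):

# Weighted tension score as described above.
def tension_score(sentiment: float, interruption_rate: float,
                  imbalance: float, repetition: float) -> float:
    return (0.3 * sentiment
            + 0.3 * interruption_rate
            + 0.2 * imbalance
            + 0.2 * repetition)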

InterventionDecider (src/decision/intervention_decider.py)

  • Applies decision rules to analysis results
  • Enforces 30-second cooldown between interventions
  • Generates contextual prompts for the LLM
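
The cooldown gate might look like this sketch (method and attribute names are assumptions):

# Illustrative cooldown rule: at most one intervention per cooldown window.
import time

class InterventionDecider:
    def __init__(self, threshold: float = 0.7, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self._last_intervention = float("-inf")  # none yet

    def should_intervene(self, tension: float) -> bool:
        now = time.monotonic()
        if tension >= self.threshold and now - self._last_intervention >= self.cooldown:
            self._last_intervention = now
            return True
        return False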

Production Deployment

Requirements

  • Server with Python 3.10+
  • Network allowing WebRTC traffic
  • Firewall configured for Daily.co
  • SSL/TLS for secure connections

Docker Deployment

Create a Dockerfile:

FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    libportaudio2 \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install
COPY voice_referee/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY voice_referee/src ./src

# Set environment variables
ENV PYTHONUNBUFFERED=1

# Run the application
CMD ["python", "-m", "src.pipeline.main"]

Build and run:

docker build -t voice-referee .
docker run --env-file .env voice-referee

Environment Variables for Production

Store secrets in a secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.):

# Required
DAILY_ROOM_URL=https://your-domain.daily.co/production-room
DAILY_TOKEN=<from_secrets_manager>
DEEPGRAM_API_KEY=<from_secrets_manager>
ANTHROPIC_API_KEY=<from_secrets_manager>
ELEVENLABS_API_KEY=<from_secrets_manager>

# Tuning
TENSION_THRESHOLD=0.7    # Lower = more interventions
COOLDOWN_SECONDS=30      # Minimum time between interventions
BUFFER_SIZE=50           # Number of utterances to track

# Logging
LOG_LEVEL=INFO

Monitoring

The system logs key events:

  • Participant joins/leaves
  • Utterances (with speaker attribution)
  • Analysis results (tension scores, patterns)
  • Intervention decisions
  • Pipeline stage latencies

Set up log aggregation (CloudWatch, Datadog, etc.) and monitor:

  • Error rate (target: < 5%)
  • End-to-end latency (target: < 800ms)
  • Intervention frequency
  • Session duration

Scaling

Each Voice Referee instance handles one mediation session. For multiple concurrent sessions:

  1. Container orchestration (Kubernetes, ECS) to spawn instances on demand
  2. Room management API to create Daily.co rooms and spawn referee instances
  3. Session routing to connect founders to available referee instances
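
As a sketch of step 2, a room-management service could create a Daily.co room over Daily's REST API before launching a referee container; DAILY_API_KEY here is an assumed environment variable holding a key with room-creation scope.

# Hedged sketch: create a Daily room for a new mediation session.
import os
import requests

def create_session_room(session_name: str) -> str:
    resp = requests.post(
        "https://api.daily.co/v1/rooms",
        headers={"Authorization": f"Bearer {os.environ['DAILY_API_KEY']}"},
        json={"name": session_name, "privacy": "private"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["url"]  # becomes DAILY_ROOM_URL for the new instance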

Performance Targets

Metric               Target    Notes
End-to-End Latency   < 800ms   Speech in → Speech out
STT Latency          < 300ms   Deepgram Nova-2
LLM Response         < 200ms   Claude with streaming
TTS Generation       < 300ms   ElevenLabs Flash v2.5
Memory Usage         < 500MB   Steady state

Mediation Principles

The AI mediator follows facilitative mediation principles:

  1. Remain neutral - Never take sides or assign blame
  2. Focus on interests, not positions - What do they need vs. what they demand
  3. Validate emotions - Acknowledge feelings without agreeing with positions
  4. Reframe accusations - "He never listens" → "Being heard matters to you"
  5. Generate options - Help parties brainstorm solutions together
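
In practice these principles live in the system prompt the referee builds for Claude; a hypothetical condensed version (the repo's actual prompt is longer and context-aware):

# Hypothetical condensed mediator system prompt.
MEDIATOR_SYSTEM_PROMPT = """You are a neutral mediator on a live call
between two startup co-founders. Never take sides or assign blame.
Focus on underlying interests, not stated positions. Acknowledge
emotions without endorsing either side's position. Reframe accusations
as needs (e.g. "He never listens" becomes "Being heard matters to you").
Propose small, concrete next steps both parties can accept. Keep every
response under three sentences."""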

Escalation Boundaries

The system will pause mediation and recommend professional help for:

  • Legal matters (IP disputes, contract interpretation)
  • Safety concerns or threats
  • Allegations of fraud or fiduciary breach
  • Persistent impasse after multiple interventions

License

MIT License - See LICENSE for details.


Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/improvement)
  3. Write tests for new functionality
  4. Ensure all tests pass (pytest)
  5. Submit a pull request

Acknowledgments

  • Pipecat - Voice AI pipeline framework
  • Daily.co - WebRTC infrastructure
  • Deepgram - Speech recognition with diarization
  • Anthropic Claude - AI language model
  • ElevenLabs - Natural text-to-speech
  • Harvard Negotiation Project's "Getting to Yes" framework
