🦄 Professional AI Speech Processing Platform - High-quality Speech-to-Text and Text-to-Speech services

mastercda/Unicorn-Orator

🦄 Unicorn Orator - Lightweight TTS with Hardware Acceleration

Efficient text-to-speech that runs on Intel iGPU, freeing your GPU for AI inference

Web Interface | Docker Hub | API Docs


🎯 Why Unicorn Orator?

The Problem: Running TTS alongside LLMs fights for GPU resources, slowing down inference and increasing latency.

Our Solution: Unicorn Orator offloads TTS to Intel integrated graphics or AMD NPUs, leaving your discrete GPU free for what it does best - running large language models.

Key Benefits

  • 🚀 Free Your GPU: TTS runs on iGPU/NPU, preserving discrete GPU for LLM inference
  • ⚡ Resource Efficient: Uses ~15W on iGPU vs 100W+ on discrete GPU
  • 🎭 50+ Quality Voices: Kokoro v0.19 with diverse accents and styles
  • 🔌 OpenAI Compatible: Drop-in replacement, no code changes needed
  • 🐳 Production Ready: Docker image available, battle-tested deployment

🖼️ Web Interface

[Screenshot: Unicorn Orator web interface]
Clean, intuitive interface with 50+ voices and advanced settings

🚀 Quick Start

Using Docker (Recommended)

# Pull and run the pre-built image
# (host port 8885 maps to container port 8880; /dev/dri exposes the iGPU)
docker run -d --name unicorn-orator \
  -p 8885:8880 \
  -v "$(pwd)/kokoro-tts/models:/app/models:ro" \
  --device /dev/dri:/dev/dri \
  --group-add video \
  magicunicorn/unicorn-orator:intel-igpu-v1.0

# Visit http://localhost:8885/web for the interface

From Source

git clone https://github.com/Unicorn-Commander/Unicorn-Orator.git
cd Unicorn-Orator
docker-compose up -d

💡 Technical Innovation

Intel iGPU Optimization

We've optimized Kokoro TTS to run efficiently on Intel integrated graphics via OpenVINO:

  • Hardware Detection: Automatically detects and uses Intel Xe/Arc iGPUs
  • FP16 Inference: Maintains quality while doubling throughput
  • Minimal Memory: ~300MB VRAM usage, leaving room for other tasks
  • Power Efficient: 10-15W TDP vs 75-350W for discrete GPUs
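
The hardware detection step can be pictured with a short sketch. This is not the project's actual detection code, which is not shown here; `pick_tts_device` and `detect_device` are hypothetical names. It assumes OpenVINO's usual device strings ("CPU", "GPU", "GPU.0", "NPU") and prefers the lowest-numbered GPU, which is typically the integrated one on systems that have an iGPU, though that ordering should be verified on your machine.

```python
from typing import Sequence


def pick_tts_device(available: Sequence[str]) -> str:
    """Prefer a GPU device for TTS, falling back to CPU.

    `available` holds OpenVINO-style device names such as "CPU", "GPU.0", "NPU".
    """
    gpus = sorted(d for d in available if d == "GPU" or d.startswith("GPU."))
    if gpus:
        return gpus[0]  # lowest-numbered GPU; assumed (not guaranteed) to be the iGPU
    return "CPU"


def detect_device() -> str:
    """Ask OpenVINO which devices exist; use CPU if OpenVINO is not installed."""
    try:
        from openvino import Core  # older releases expose this as openvino.runtime.Core
    except ImportError:
        return "CPU"
    return pick_tts_device(Core().available_devices)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```

Keeping the selection logic in a pure function makes it easy to test without any Intel hardware present.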

AMD NPU Support (Experimental)

For Ryzen AI laptops (7040/8040 series), we're developing custom NPU support:

  • Custom Runtime: Direct NPU access bypassing standard frameworks
  • INT8 Quantization: Optimized models for NPU architecture
  • Ultra Low Power: <10W for continuous synthesis

Performance Comparison

| Hardware | Power Usage | VRAM | Speed | Purpose |
|----------|-------------|------|-------|---------|
| Intel iGPU | 15W | 300MB | 5x realtime | TTS (This Project) |
| AMD NPU | 10W | 256MB | 4x realtime | TTS (Experimental) |
| NVIDIA 4090 | 350W | 2GB | 20x realtime | Better used for LLMs |
| CPU (i7) | 45W | N/A | 2x realtime | Fallback option |
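
One way to read the comparison is energy per second of synthesized audio. The sketch below is plain arithmetic on the figures in the comparison, under the assumption that each device draws its listed power for the whole synthesis. Treat the results as rough estimates, not measurements. `joules_per_audio_second` is a hypothetical helper name.

```python
# Figures copied from the comparison above: (power in watts, realtime factor).
HARDWARE = {
    "Intel iGPU": (15.0, 5.0),
    "AMD NPU": (10.0, 4.0),
    "NVIDIA 4090": (350.0, 20.0),
    "CPU (i7)": (45.0, 2.0),
}


def joules_per_audio_second(power_watts: float, realtime_factor: float) -> float:
    """Energy needed to synthesize one second of audio.

    One second of audio takes 1 / realtime_factor seconds to generate,
    so energy = power * (1 / realtime_factor). Assumes constant power draw.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return power_watts / realtime_factor


energy = {name: joules_per_audio_second(p, r) for name, (p, r) in HARDWARE.items()}

# Print from most to least energy-efficient.
for name, joules in sorted(energy.items(), key=lambda item: item[1]):
    print(f"{name}: {joules:.1f} J per second of audio")
```

Under these assumptions the discrete GPU is the fastest option but spends several times more energy per second of speech than the iGPU or NPU.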

📡 API Usage

OpenAI-Compatible Endpoint

import requests

# OpenAI-compatible speech endpoint
response = requests.post(
    'http://localhost:8885/v1/audio/speech',
    json={
        'text': 'Hello from Unicorn Orator!',
        'voice': 'af_heart',  # 50+ voices available
        'speed': 1.0,
    },
    timeout=60,
)
response.raise_for_status()  # Stop on HTTP errors instead of saving an error body

# Save the returned audio to disk
with open('output.wav', 'wb') as f:
    f.write(response.content)

Available Voices (Selection)

| Voice ID | Description | Best For |
|----------|-------------|----------|
| af_heart | Warm, friendly female | General narration |
| am_michael | Professional male | News/corporate |
| bf_emma | British female | Audiobooks |
| af_bella | Young American female | Social media |
| bm_george | British male | Documentation |

[Full voice list available at /voices endpoint]
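
To switch voices without repeating the request boilerplate, a small wrapper can help. This is a minimal sketch using only the standard library. It reuses the request fields from the example above (`text`, `voice`, `speed`). `build_speech_payload` and `synthesize` are hypothetical helper names, and the validation rules are assumptions, since the server's accepted ranges are not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8885"  # host port from the Quick Start command


def build_speech_payload(text: str, voice: str = "af_heart", speed: float = 1.0) -> dict:
    """Build the JSON body for /v1/audio/speech with basic sanity checks."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if speed <= 0:
        raise ValueError("speed must be positive")  # assumed rule, not a server limit
    return {"text": text, "voice": voice, "speed": speed}


def synthesize(text: str, voice: str = "af_heart", speed: float = 1.0,
               timeout: float = 30.0) -> bytes:
    """POST the payload and return the raw audio bytes (raises on HTTP errors)."""
    body = json.dumps(build_speech_payload(text, voice, speed)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the container running, `synthesize("Release notes for version one.", voice="bm_george")` would return audio bytes that can be written to a `.wav` file.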

🏗️ Architecture

Your System:
┌──────────────────┬──────────────────┐
│   Discrete GPU   │   Intel iGPU     │
│   (RTX/Arc/RX)   │   (Xe Graphics)  │
│                  │                  │
│   Running:       │   Running:       │
│   - LLMs         │   - Unicorn TTS  │
│   - Stable Diff  │   - Video decode │
│   - ML Training  │   - Display      │
└──────────────────┴──────────────────┘
         │                  │
         └──────┬───────────┘
                │
        [High Performance AI]
         Without Competition

🔮 Roadmap

Current Release (v1.0)

  • ✅ Intel iGPU support via OpenVINO
  • ✅ 50+ Kokoro voices
  • ✅ OpenAI API compatibility
  • ✅ Docker deployment
  • ✅ Web interface

Planned Features

  • Real-time streaming
  • AMD NPU production support
  • Voice cloning (ethical use only)
  • SSML support
  • Batch processing API
  • Kubernetes operator

Future Exploration

  • Apple Neural Engine support
  • Qualcomm Hexagon DSP
  • Edge deployment (Jetson, Pi 5)
  • WebGPU browser runtime

🛠️ Building From Source

Prerequisites

  • Docker & Docker Compose
  • Intel CPU with Xe/Arc graphics (or AMD Ryzen AI)
  • 8GB RAM minimum
  • Ubuntu 22.04+ or Windows 11 WSL2

Build Steps

# Clone repository
git clone https://github.com/Unicorn-Commander/Unicorn-Orator.git
cd Unicorn-Orator

# Download models (one-time, ~350MB)
./download_models.sh

# Build with hardware detection
./build.sh

# Run
docker-compose up -d

📊 Benchmarks

Testing setup: Intel Core i7-13700K with Intel UHD 770 iGPU

| Text Length | Generation Time | Realtime Factor |
|-------------|-----------------|-----------------|
| 1 sentence | 180ms | 5.5x |
| 1 paragraph | 950ms | 5.2x |
| 1 page | 4.2s | 5.0x |

Realtime factor = audio duration / generation time
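
To reproduce this metric on your own hardware, you can time a request and read the clip length from the returned file. The sketch below assumes the service returns PCM WAV data, as the `output.wav` example suggests. `make_silent_wav` is a hypothetical stand-in for a server response so the example runs offline, and its 24 kHz sample rate is an assumption.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the audio duration from the header of in-memory PCM WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def realtime_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Realtime factor = audio duration / generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


def make_silent_wav(seconds: float, sample_rate: int = 24000) -> bytes:
    """Stand-in for a server response: mono 16-bit silence of the given length."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


# Offline demo: a 2-second clip that (hypothetically) took 0.5 s to generate.
clip = make_silent_wav(2.0)
factor = realtime_factor(wav_duration_seconds(clip), 0.5)
print(f"{factor:.1f}x realtime")
```

In a real measurement, wrap the HTTP request in `time.perf_counter()` calls and pass the elapsed time as `generation_seconds`. Note that this includes network and encoding overhead, so it may read slightly lower than the figures above.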

🤝 Contributing

We especially welcome contributions for:

  • Hardware optimization (OpenVINO, XDNA, CoreML)
  • Additional TTS models beyond Kokoro
  • Voice training and fine-tuning
  • Performance improvements

See CONTRIBUTING.md for guidelines.

🙏 Acknowledgments

  • Kokoro TTS - The excellent TTS model we build upon
  • OpenVINO Toolkit - Intel's inference optimization framework
  • Hugging Face - Model hosting and community

📜 License

MIT License - See LICENSE for details

🏢 UC-1 Pro Ecosystem

Unicorn Orator is part of the UC-1 Pro AI infrastructure suite:

| Service | Purpose | Port |
|---------|---------|------|
| Unicorn Orator | Text-to-speech | 8885 |
| Unicorn Amanuensis | Speech-to-text | 8886 |
| Unicorn vLLM | LLM inference | 8000 |
| Open-WebUI | Chat interface | 3000 |

Free your GPU. Enhance your AI.

🐳 Docker Hub • 🐛 Issues • 💬 Discussions



Built by Magic Unicorn Unconventional Technology & Stuff Inc.
