The Problem: Running TTS alongside LLMs on the same GPU creates resource contention, slowing inference and increasing latency.
Our Solution: Unicorn Orator offloads TTS to Intel integrated graphics or AMD NPUs, leaving your discrete GPU free for what it does best: running large language models.
- 🚀 Free Your GPU: TTS runs on iGPU/NPU, preserving discrete GPU for LLM inference
- ⚡ Resource Efficient: Uses ~15W on iGPU vs 100W+ on discrete GPU
- 🎭 50+ Quality Voices: Kokoro v0.19 with diverse accents and styles
- 🔌 OpenAI Compatible: Drop-in replacement, no code changes needed
- 🐳 Production Ready: Docker image available, battle-tested deployment
```bash
# Pull and run the pre-built image
docker run -d --name unicorn-orator \
  -p 8885:8880 \
  -v $(pwd)/kokoro-tts/models:/app/models:ro \
  --device /dev/dri:/dev/dri \
  --group-add video \
  magicunicorn/unicorn-orator:intel-igpu-v1.0

# Visit http://localhost:8885/web for the interface
```

```bash
git clone https://github.com/Unicorn-Commander/Unicorn-Orator.git
cd Unicorn-Orator
docker-compose up -d
```

We've optimized Kokoro TTS to run efficiently on Intel integrated graphics via OpenVINO:
- Hardware Detection: Automatically detects and uses Intel Xe/Arc iGPUs
- FP16 Inference: Maintains quality while doubling throughput
- Minimal Memory: ~300MB VRAM usage, leaving room for other tasks
- Power Efficient: 10-15W TDP vs 75-350W for discrete GPUs
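The hardware-detection step above can be sketched as a simple device-preference order. This is a minimal illustration, not the project's actual detection code (which would query OpenVINO's runtime for available devices):

```python
def pick_device(available_devices):
    """Prefer the iGPU, then an NPU, then fall back to CPU.

    `available_devices` mimics the list reported by an inference runtime,
    e.g. ["CPU", "GPU"] on a machine with Intel Xe graphics.
    """
    for preferred in ("GPU", "NPU", "CPU"):
        if preferred in available_devices:
            return preferred
    return "CPU"


print(pick_device(["CPU", "GPU"]))  # selects the iGPU when one is present
```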
For Ryzen AI laptops (7040/8040 series), we're developing custom NPU support:
- Custom Runtime: Direct NPU access bypassing standard frameworks
- INT8 Quantization: Optimized models for NPU architecture
- Ultra Low Power: <10W for continuous synthesis
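INT8 quantization maps float weights to 8-bit integers plus a scale factor, which is what makes the NPU models smaller and faster. A minimal symmetric-quantization sketch for illustration only (the project's actual NPU pipeline is not shown here):

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: floats -> int8 values + scale."""
    scale = max(abs(v) for v in values) / 127
    return [round(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the quantized representation."""
    return [q * scale for q in quantized]


q, s = quantize_int8([0.0, 1.27])
print(q)  # the largest magnitude maps to 127
```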
| Hardware | Power Usage | VRAM | Speed | Purpose |
|---|---|---|---|---|
| Intel iGPU | 15W | 300MB | 5x realtime | TTS (This Project) |
| AMD NPU | 10W | 256MB | 4x realtime | TTS (Experimental) |
| NVIDIA 4090 | 350W | 2GB | 20x realtime | Better used for LLMs |
| CPU (i7) | 45W | N/A | 2x realtime | Fallback option |
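The efficiency gap in the table becomes concrete when expressed as energy per second of synthesized audio; the figures below simply combine the table's power and realtime numbers:

```python
def joules_per_audio_second(power_watts, realtime_factor):
    """Energy to synthesize one second of audio: power * (1 / realtime factor)."""
    return power_watts / realtime_factor


print(joules_per_audio_second(15, 5))    # Intel iGPU: 3.0 J per audio-second
print(joules_per_audio_second(350, 20))  # NVIDIA 4090: 17.5 J per audio-second
```

At these numbers the iGPU is roughly 6x more energy-efficient per second of audio, even though the discrete GPU is 4x faster.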
```python
import requests

# Works exactly like OpenAI's API
response = requests.post(
    'http://localhost:8885/v1/audio/speech',
    json={
        'text': 'Hello from Unicorn Orator!',
        'voice': 'af_heart',  # 50+ voices available
        'speed': 1.0,
    },
)

with open('output.wav', 'wb') as f:
    f.write(response.content)
```

| Voice ID | Description | Best For |
|---|---|---|
| af_heart | Warm, friendly female | General narration |
| am_michael | Professional male | News/corporate |
| bf_emma | British female | Audiobooks |
| af_bella | Young American female | Social media |
| bm_george | British male | Documentation |
[Full voice list available at /voices endpoint]
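Judging from the table, Kokoro voice IDs appear to encode accent and gender in their two-letter prefix (a/b = American/British, f/m = female/male). A small helper based on that assumption; the /voices endpoint remains the authoritative list:

```python
def parse_voice_id(voice_id):
    """Decode a Kokoro-style voice ID like 'af_heart'.

    Assumes the prefix convention a/b = American/British, f/m = female/male,
    inferred from the voice table above.
    """
    accents = {"a": "American", "b": "British"}
    genders = {"f": "female", "m": "male"}
    prefix, name = voice_id.split("_", 1)
    return accents[prefix[0]], genders[prefix[1]], name


print(parse_voice_id("bf_emma"))  # ('British', 'female', 'emma')
```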
```
Your System:
┌──────────────────┬──────────────────┐
│   Discrete GPU   │   Intel iGPU     │
│   (RTX/Arc/RX)   │   (Xe Graphics)  │
│                  │                  │
│   Running:       │   Running:       │
│   - LLMs         │   - Unicorn TTS  │
│   - Stable Diff  │   - Video decode │
│   - ML Training  │   - Display      │
└──────────────────┴──────────────────┘
         │                  │
         └────────┬─────────┘
                  │
       [High Performance AI]
        Without Competition
```
- ✅ Intel iGPU support via OpenVINO
- ✅ 50+ Kokoro voices
- ✅ OpenAI API compatibility
- ✅ Docker deployment
- ✅ Web interface
- Real-time streaming
- AMD NPU production support
- Voice cloning (ethical use only)
- SSML support
- Batch processing API
- Kubernetes operator
- Apple Neural Engine support
- Qualcomm Hexagon DSP
- Edge deployment (Jetson, Pi 5)
- WebGPU browser runtime
- Docker & Docker Compose
- Intel CPU with Xe/Arc graphics (or AMD Ryzen AI)
- 8GB RAM minimum
- Ubuntu 22.04+ or Windows 11 WSL2
```bash
# Clone repository
git clone https://github.com/Unicorn-Commander/Unicorn-Orator.git
cd Unicorn-Orator

# Download models (one-time, ~350MB)
./download_models.sh

# Build with hardware detection
./build.sh

# Run
docker-compose up -d
```

Testing setup: Intel Core i7-13700K with Intel UHD 770 iGPU
| Text Length | Generation Time | Realtime Factor |
|---|---|---|
| 1 sentence | 180ms | 5.5x |
| 1 paragraph | 950ms | 5.2x |
| 1 page | 4.2s | 5.0x |
Realtime factor = audio duration / generation time
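The formula above can be checked against the first table row: a 180 ms generation at 5.5x corresponds to roughly one second of audio.

```python
def realtime_factor(audio_seconds, generation_seconds):
    """Realtime factor = audio duration / generation time (values above 1 are faster than playback)."""
    return audio_seconds / generation_seconds


# 0.99 s of audio generated in 0.18 s ≈ 5.5x, matching the first table row
print(realtime_factor(0.99, 0.18))
```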
We especially welcome contributions for:
- Hardware optimization (OpenVINO, XDNA, CoreML)
- Additional TTS models beyond Kokoro
- Voice training and fine-tuning
- Performance improvements
See CONTRIBUTING.md for guidelines.
- Kokoro TTS - The excellent TTS model we build upon
- OpenVINO Toolkit - Intel's inference optimization framework
- Hugging Face - Model hosting and community
MIT License - See LICENSE for details
Unicorn Orator is part of the UC-1 Pro AI infrastructure suite:
| Service | Purpose | Port |
|---|---|---|
| Unicorn Orator | Text-to-speech | 8885 |
| Unicorn Amanuensis | Speech-to-text | 8886 |
| Unicorn vLLM | LLM inference | 8000 |
| Open-WebUI | Chat interface | 3000 |
🐳 Docker Hub • 🐛 Issues • 💬 Discussions
Built by Magic Unicorn Unconventional Technology & Stuff Inc.