NGHI-TTS

A browser-based Vietnamese Text-to-Speech application powered by Piper TTS models and ONNX Runtime Web. Generate high-quality speech directly in your browser without requiring a server for inference. Live demo: https://text2speech.work

Features

🌐 Browser-Based TTS: Fully client-side text-to-speech processing using Web Workers
🇻🇳 Vietnamese Language Support: Advanced Vietnamese text processing with automatic conversion of:
- Numbers to words (0 to billions)
- Dates and date ranges
- Time expressions
- Currency (VND, USD)
- Percentages and decimals
- Phone numbers
- Ordinals
🎤 Multi-Speaker Models: Support for models with multiple voices
⚡ Real-Time Streaming: Stream audio chunks as they're generated
🎚️ Speed Control: Adjustable speech speed
📥 Audio Download: Export generated audio as WAV files
🌙 Dark Mode: Built-in theme toggle
📊 Text Statistics: Character and word count display
🔄 Dynamic Model Loading: Load models on-demand from Cloudflare R2 storage

🧠 Model Training Details

This project is built on Piper TTS and fine-tuned on a custom dataset to produce realistic voices. A video walkthrough of the training process is available at https://www.youtube.com/watch?v=WgvBOljtNvE

🔹 Base Model

Based on Piper (English checkpoint)
Lightweight, fast, and optimized for local inference
Designed for real-time speech generation

🔹 Fine-Tuning Process

Dataset:

Dataset size: ~1,000 audio samples
Voices: Multiple celebrity voices
Training method: Fine-tuning from an existing Piper English checkpoint
Epochs: ~2,000
Download training datasets: View Datasets on Google Drive

Available datasets include:

Vietnamese celebrity voices (Mỹ Tâm, Ngọc Ngân, Trấn Thành, Việt Thảo)
Multi-speaker datasets
Various dataset sizes (200, 1000+ samples)
English voice datasets

Audio Preparation:

Cleaned and normalized audio
Matched text–audio pairs
Consistent sample rate
Noise removed

What the Model Learns:

Voice tone
Accent
Speech rhythm
Natural pronunciation

⚡ Inference Method

Web-based inference: No server required
Runs fully locally: All processing happens in your browser
Fast inference: about 5× faster than real time (actual speed depends on the device)
User-friendly: Simply enter text, select a voice, and generate speech instantly

✅ Key Benefits

✔ Based on Piper TTS
✔ Fine-tuned with 1,000+ audio samples
✔ Trained for ~2,000 epochs
✔ No server required
✔ Web-based & lightweight
✔ Fast inference (≈5× real-time)
✔ Free & open-source
✔ Allowed for commercial use
✔ Easy to deploy or modify

📦 Available Models

Pre-trained Vietnamese TTS models are available for download:

Download from Google Drive: View Available Models

Model List

calmwoman3688.onnx (~60.6 MB)
- Configuration: calmwoman3688.onnx.json
deepman3909.onnx (~60.6 MB)
- Configuration: deepman3909.onnx.json
ngocngan3701.onnx (~60.6 MB)
- Configuration: ngocngan3701.onnx.json
vietthao3886.onnx (~60.6 MB)
- Configuration: vietthao3886.onnx.json
New voices: Mỹ Tâm, Trấn Thành, Ngọc Huyền (movie reviews), Oryx (ultra-deep male voice)

Each model includes both the .onnx model file and its corresponding .onnx.json configuration file. Download both files for each model to use it in the application.

Tech Stack

Frontend: Vue 3 + Vite
TTS Engine: Piper TTS (ONNX format)
Runtime: ONNX Runtime Web (WASM)
Hosting: Cloudflare Pages
Storage: Cloudflare R2 (for model files)
Styling: Tailwind CSS
Icons: Lucide Vue Next

Project Structure

nghitts/
├── src/
│   ├── App.vue                 # Main application component
│   ├── components/             # Vue components
│   │   ├── AudioChunk.vue     # Audio playback component
│   │   ├── ModelSelector.vue  # Model selection dropdown
│   │   ├── SpeedControl.vue   # Speech speed slider
│   │   ├── TextStatistics.vue # Text stats display
│   │   ├── ThemeToggle.vue    # Dark/light mode toggle
│   │   └── VoiceSelector.vue  # Voice selection component
│   ├── lib/
│   │   └── piper-tts.js       # Piper TTS implementation
│   ├── utils/
│   │   ├── model-cache.js     # Model file caching
│   │   ├── model-detector.js  # Model discovery from API
│   │   ├── text-cleaner.js    # Text cleaning and chunking
│   │   └── vietnamese-processor.js  # Vietnamese text processing
│   └── workers/
│       └── tts-worker.js      # Web Worker for TTS processing
├── functions/
│   └── api/
│       ├── models.ts          # List available models
│       └── model/[name].ts    # Serve model files from R2
└── public/
    └── non-vietnamese-words.csv  # Word replacement dictionary

How It Works

1. Model Loading: Models are stored in Cloudflare R2 and served via Cloudflare Pages Functions
2. Text Processing: Vietnamese text is processed to convert numbers, dates, times, etc. to spoken words
3. Text Chunking: Input text is split at sentence and word boundaries into chunks sized for processing
4. Phoneme Conversion: Text is converted to phonemes using the phonemizer library
5. Audio Generation: ONNX Runtime Web runs the Piper TTS model in a Web Worker
6. Streaming: Audio chunks are streamed back to the main thread and played as they're generated
7. Audio Merging: Chunks are merged, normalized, and trimmed for final output
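
The merged output from the last step is what gets exported as a WAV file. The following is a minimal sketch of how a mono Float32 sample buffer could be encoded as 16-bit PCM WAV; the function name encodeWav and the default sample rate of 22050 Hz are assumptions for illustration (in practice the rate would come from the model's .onnx.json configuration), not the project's actual code.

```javascript
// Sketch: encode mono Float32 samples in [-1, 1] as a 16-bit PCM WAV file.
// `encodeWav` and the 22050 Hz default are illustrative assumptions.
function encodeWav(samples, sampleRate = 22050) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical header
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return buffer;
}
```

In a browser, the returned ArrayBuffer can be wrapped as new Blob([buffer], { type: "audio/wav" }) and offered as a download.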

Vietnamese Text Processing

The application includes comprehensive Vietnamese text processing that handles:

Numbers: Automatic conversion to Vietnamese words (e.g., "123" → "một trăm hai mươi ba")
Dates: Multiple formats (DD/MM/YYYY, DD-MM-YYYY, date ranges)
Times: Time expressions (HH:MM, HH:MM:SS, "X giờ Y phút")
Currency: VND (đồng) and USD amounts read out as words
Percentages: Automatic conversion (e.g., "50%" → "năm mươi phần trăm")
Decimals: Vietnamese decimal format (comma as decimal separator)
Phone Numbers: Digit-by-digit reading
Ordinals: Conversion of ordinal numbers (e.g., "thứ 2" → "thứ hai")
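
To illustrate the number and percentage rules above, here is a minimal sketch that reads integers from 0 to 999 and expands whole-number percentages. The function names readBelowThousand and expandPercentages are hypothetical and do not come from vietnamese-processor.js; the real processor covers a much wider range (up to billions, decimals, dates, and so on). The sketch uses the "lẻ" form for a zero tens digit and keeps "bốn" rather than the optional "tư".

```javascript
// Sketch only: Vietnamese reading of integers 0..999 and "N%" expressions.
// Names are hypothetical; the project's own processor handles far more cases.
const DIGITS = ["không", "một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín"];

function readBelowThousand(n) {
  if (!Number.isInteger(n) || n < 0 || n > 999) {
    throw new RangeError("expected an integer between 0 and 999");
  }
  if (n === 0) return DIGITS[0];

  const hundreds = Math.floor(n / 100);
  const tens = Math.floor((n % 100) / 10);
  const units = n % 10;
  const words = [];

  if (hundreds > 0) words.push(DIGITS[hundreds], "trăm");

  if (tens === 0) {
    if (units > 0) {
      // "lẻ" bridges a zero tens digit after hundreds: 105 -> "một trăm lẻ năm"
      if (hundreds > 0) words.push("lẻ");
      words.push(DIGITS[units]);
    }
  } else {
    words.push(tens === 1 ? "mười" : `${DIGITS[tens]} mươi`);
    if (units === 1 && tens > 1) words.push("mốt"); // 21 -> "hai mươi mốt"
    else if (units === 5) words.push("lăm"); // 15 -> "mười lăm"
    else if (units > 0) words.push(DIGITS[units]);
  }
  return words.join(" ");
}

function expandPercentages(text) {
  // Only whole numbers not preceded by a digit or decimal separator are
  // handled; decimals such as "12,5%" are left untouched by this sketch.
  return text.replace(/(?<![\d.,])(\d+)%/g, (match, digits) => {
    const value = Number(digits);
    return value <= 999 ? `${readBelowThousand(value)} phần trăm` : match;
  });
}
```

For example, readBelowThousand(123) yields "một trăm hai mươi ba" and expandPercentages("50%") yields "năm mươi phần trăm", matching the examples in the list above.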

Development

Prerequisites

Node.js 18+
npm or yarn

Installation

npm install

Development Server

npm run dev

Build

npm run build

Preview Production Build

npm run preview

Deployment

The project is configured for Cloudflare Pages deployment:

Store models in a Cloudflare R2 bucket named tts-bucket
Place model files under the piper/ prefix
Each model requires two files:
- {model-name}.onnx - The ONNX model file
- {model-name}.onnx.json - The model configuration file

The Cloudflare Pages Function at /api/models will automatically discover available models from the R2 bucket.
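
As a rough illustration of the discovery step, the sketch below pairs .onnx files with their .onnx.json configs from a list of object keys and returns only the complete models. The function name discoverModels is hypothetical, and the sketch takes a plain array of keys; in a Pages Function the keys would presumably come from listing the R2 bucket under the piper/ prefix, which is not shown here.

```javascript
// Sketch: given R2 object keys, return model names that have BOTH files.
// `discoverModels` is a hypothetical helper, not the project's actual code.
function discoverModels(keys, prefix = "piper/") {
  const modelFiles = new Set();
  const configFiles = new Set();

  for (const key of keys) {
    if (!key.startsWith(prefix)) continue; // ignore objects outside the prefix
    const name = key.slice(prefix.length);

    // Check the longer suffix first, since ".onnx.json" also contains ".onnx".
    if (name.endsWith(".onnx.json")) {
      configFiles.add(name.slice(0, -".onnx.json".length));
    } else if (name.endsWith(".onnx")) {
      modelFiles.add(name.slice(0, -".onnx".length));
    }
  }

  // A model is usable only when its configuration file is present too.
  return [...modelFiles].filter((name) => configFiles.has(name)).sort();
}
```

With this approach, a model uploaded without its configuration file is simply not listed, which avoids offering a voice that cannot be loaded.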

Configuration

Wrangler Configuration

The wrangler.toml file configures:

Pages build output directory
R2 bucket binding (piper → tts-bucket)

Model Format

Models must be in Piper TTS ONNX format with:

.onnx file containing the ONNX model
.onnx.json file containing voice configuration (phoneme_id_map, audio settings, etc.)
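
The phoneme_id_map in the configuration file maps each phoneme symbol to a list of integer IDs that the model takes as input. The sketch below follows the convention used by Piper voices, where "^" marks the start of an utterance, "$" marks the end, and the pad symbol "_" is inserted after every phoneme. The function name phonemesToIds and the toy map in the usage note are assumptions for illustration; the project's piper-tts.js may differ in detail.

```javascript
// Sketch: convert a phoneme sequence to model input IDs using a
// Piper-style phoneme_id_map. `phonemesToIds` is a hypothetical name.
function phonemesToIds(phonemes, phonemeIdMap) {
  const has = (symbol) => Object.prototype.hasOwnProperty.call(phonemeIdMap, symbol);
  const ids = [...phonemeIdMap["^"]]; // beginning-of-sequence marker

  for (const phoneme of phonemes) {
    if (!has(phoneme)) continue; // skip symbols the voice does not know
    ids.push(...phonemeIdMap[phoneme]);
    ids.push(...phonemeIdMap["_"]); // pad after every phoneme
  }

  ids.push(...phonemeIdMap["$"]); // end-of-sequence marker
  return ids;
}
```

For a toy map { "_": [0], "^": [1], "$": [2], "a": [3], "b": [4] }, the phonemes ["a", "b"] become [1, 3, 0, 4, 0, 2].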

Features in Detail

Text Cleaning

Removes emojis and special characters
Normalizes Unicode (NFC)
Handles Vietnamese-specific punctuation
Cleans whitespace
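
A minimal sketch of three of the steps above (NFC normalization, emoji removal, whitespace cleanup) is shown below. The function name cleanText is hypothetical, and the sketch does not attempt the Vietnamese-specific punctuation handling or the broader special-character filtering that text-cleaner.js performs.

```javascript
// Sketch: basic text cleaning. `cleanText` is a hypothetical name and covers
// only part of what the project's text cleaner is described as doing.
function cleanText(text) {
  return text
    // Compose base letters and combining marks so "e" + U+0301 becomes "é".
    .normalize("NFC")
    // Drop pictographic emoji plus the variation selector and zero-width
    // joiner that often accompany them.
    .replace(/[\p{Extended_Pictographic}\uFE0F\u200D]/gu, "")
    // Collapse runs of whitespace (including newlines) to single spaces.
    .replace(/\s+/g, " ")
    .trim();
}
```

NFC normalization matters for Vietnamese in particular, because the same accented letter can arrive either precomposed or as a base letter followed by combining marks, and the two forms would otherwise be treated as different strings.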

Text Chunking

Intelligently splits text into optimal chunks
Respects sentence boundaries
Handles long sentences by splitting at word boundaries
Maintains minimum and maximum chunk sizes for optimal processing
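
The chunking strategy can be sketched as follows: split on sentence boundaries, pack sentences into chunks up to a maximum length, and fall back to word boundaries for sentences that are too long on their own. The names chunkText and splitAtWords and the default limit of 200 characters are assumptions for illustration. This sketch enforces only a maximum size, not the minimum size mentioned above, and a single word longer than the limit is kept whole.

```javascript
// Sketch: sentence-aware chunking with a word-boundary fallback.
// Function names and the 200-character default are hypothetical.
function splitAtWords(sentence, maxLen) {
  const parts = [];
  let line = "";
  for (const word of sentence.split(/\s+/)) {
    if (line && line.length + 1 + word.length > maxLen) {
      parts.push(line);
      line = "";
    }
    line = line ? `${line} ${word}` : word;
  }
  if (line) parts.push(line);
  return parts;
}

function chunkText(text, maxLen = 200) {
  // Split after sentence-ending punctuation that is followed by whitespace,
  // so decimals such as "3.5" are not treated as sentence ends.
  const sentences = text
    .split(/(?<=[.!?…])\s+/u)
    .map((s) => s.trim())
    .filter(Boolean);

  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const pieces = sentence.length <= maxLen ? [sentence] : splitAtWords(sentence, maxLen);
    for (const piece of pieces) {
      // Start a new chunk when adding this piece would exceed the limit.
      if (current && current.length + 1 + piece.length > maxLen) {
        chunks.push(current);
        current = "";
      }
      current = current ? `${current} ${piece}` : piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Keeping chunks aligned with sentence boundaries lets each chunk be synthesized independently without cutting across natural pauses.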

Audio Processing

Real-time streaming of audio chunks
Automatic normalization and peak limiting
Silence trimming
Sample rate preservation
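
The merge, trim, and normalize steps can be sketched on raw Float32Array sample buffers as below. The function names, the silence threshold of 0.01, and the target peak of 0.95 are illustrative assumptions rather than the project's actual values, and the simple peak scaling shown here stands in for whatever limiter the application really uses.

```javascript
// Sketch: merge audio chunks, trim edge silence, and normalize the peak.
// Names, threshold, and target peak are hypothetical defaults.
function mergeChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.length;
  }
  return merged;
}

function trimSilence(samples, threshold = 0.01) {
  let start = 0;
  let end = samples.length;
  // Advance past leading samples whose magnitude is below the threshold.
  while (start < end && Math.abs(samples[start]) < threshold) start++;
  // Walk back over trailing samples below the threshold.
  while (end > start && Math.abs(samples[end - 1]) < threshold) end--;
  return samples.slice(start, end);
}

function normalizePeak(samples, targetPeak = 0.95) {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return Float32Array.from(samples); // all silence: nothing to scale
  const gain = targetPeak / peak;
  return Float32Array.from(samples, (s) => s * gain);
}
```

None of these steps resample the audio, so the output keeps the sample rate the model produced.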

Browser Compatibility

Modern browsers with WebAssembly support
Web Workers support required
ES Modules support required

📜 License & Usage

This project is:

✅ Free to use
✅ Open source
✅ Allowed for commercial use
✅ Customizable and deployable

⚠️ Important: Users are responsible for complying with voice and content laws when using generated audio.

Acknowledgments

Built on Piper TTS (GPL) by OHF-Voice
Inspired by piper-tts-web-demo by clowerweb
Uses ONNX Runtime Web for browser-based inference
