clackken-vni/nghitts

NGHI-TTS

A browser-based Vietnamese Text-to-Speech application powered by Piper TTS models and ONNX Runtime Web. Generate high-quality speech directly in your browser without requiring a server for inference. Live demo: https://text2speech.work.

Features

  • 🌐 Browser-Based TTS: Fully client-side text-to-speech processing using Web Workers
  • 🇻🇳 Vietnamese Language Support: Advanced Vietnamese text processing with automatic conversion of:
    • Numbers to words (0 to billions)
    • Dates and date ranges
    • Time expressions
    • Currency (VND, USD)
    • Percentages and decimals
    • Phone numbers
    • Ordinals
  • 🎤 Multi-Speaker Models: Support for models with multiple voices
  • ⚡ Real-Time Streaming: Stream audio chunks as they're generated
  • 🎚️ Speed Control: Adjustable speech speed
  • 📥 Audio Download: Export generated audio as WAV files
  • 🌙 Dark Mode: Built-in theme toggle
  • 📊 Text Statistics: Character and word count display
  • 🔄 Dynamic Model Loading: Load models on-demand from Cloudflare R2 storage

🧠 Model Training Details

This project builds on Piper TTS, fine-tuned on a custom dataset to produce realistic voices. A video walkthrough of the training process is available at https://www.youtube.com/watch?v=WgvBOljtNvE

🔹 Base Model

  • Based on Piper (English checkpoint)
  • Lightweight, fast, and optimized for local inference
  • Designed for real-time speech generation

🔹 Fine-Tuning Process

Dataset:

  • Dataset size: ~1,000 audio samples
  • Voices: Multiple famous celebrity voices
  • Training method: Fine-tuning from an existing Piper English checkpoint
  • Epochs: ~2,000
  • Download training datasets: View Datasets on Google Drive

Available datasets include:

  • Vietnamese celebrity voices (Mỹ Tâm, Ngọc Ngân, Trấn Thành, Việt Thảo)
  • Multi-speaker datasets
  • Various dataset sizes (200, 1000+ samples)
  • English voice datasets

Audio Preparation:

  • Cleaned and normalized audio
  • Matched text–audio pairs
  • Consistent sample rate
  • Noise removed

What the Model Learns:

  • Voice tone
  • Accent
  • Speech rhythm
  • Natural pronunciation

⚑ Inference Method

  • Web-based inference: No server required
  • Runs fully locally: All processing happens in your browser
  • Very fast inference: ~5× real-time speed
  • User-friendly: Simply enter text, select a voice, and generate speech instantly
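
To make the "~5× real-time" figure concrete, here is a small sketch of the arithmetic. It assumes "real-time factor" means seconds of audio produced per second of compute; the function names are illustrative and not part of the project.

```javascript
// Real-time factor: seconds of audio produced per second of compute.
function realTimeFactor(audioSeconds, computeSeconds) {
  if (computeSeconds <= 0) {
    throw new RangeError('computeSeconds must be positive');
  }
  return audioSeconds / computeSeconds;
}

// Rough wall-clock estimate for synthesizing a clip at a given factor.
// The default of 5 is taken from the README's approximate figure.
function estimateComputeSeconds(audioSeconds, factor = 5) {
  return audioSeconds / factor;
}

console.log(realTimeFactor(10, 2));      // 5
console.log(estimateComputeSeconds(60)); // 12
```

Under that reading, a one-minute clip would take roughly 12 seconds to generate, though actual speed depends on the device and browser.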

✅ Key Benefits

  • ✔ Based on Piper TTS
  • ✔ Fine-tuned with 1,000+ audio samples
  • ✔ Trained for ~2,000 epochs
  • ✔ No server required
  • ✔ Web-based & lightweight
  • ✔ Fast inference (≈5× real-time)
  • ✔ Free & open-source
  • ✔ Allowed for commercial use
  • ✔ Easy to deploy or modify

📦 Available Models

Pre-trained Vietnamese TTS models are available for download:

Download from Google Drive: View Available Models

Model List

  1. calmwoman3688.onnx (~60.6 MB)

    • Configuration: calmwoman3688.onnx.json
  2. deepman3909.onnx (~60.6 MB)

    • Configuration: deepman3909.onnx.json
  3. ngocngan3701.onnx (~60.6 MB)

    • Configuration: ngocngan3701.onnx.json
  4. vietthao3886.onnx (~60.6 MB)

    • Configuration: vietthao3886.onnx.json
  5. New voices: Mỹ Tâm, Trấn Thành, Ngọc Huyền (movie reviews), Oryx (ultra-deep male voice)

Each model includes both the .onnx model file and its corresponding .onnx.json configuration file. Download both files for each model to use it in the application.

Tech Stack

  • Frontend: Vue 3 + Vite
  • TTS Engine: Piper TTS (ONNX format)
  • Runtime: ONNX Runtime Web (WASM)
  • Hosting: Cloudflare Pages
  • Storage: Cloudflare R2 (for model files)
  • Styling: Tailwind CSS
  • Icons: Lucide Vue Next

Project Structure

nghitts/
├── src/
│   ├── App.vue                 # Main application component
│   ├── components/             # Vue components
│   │   ├── AudioChunk.vue     # Audio playback component
│   │   ├── ModelSelector.vue  # Model selection dropdown
│   │   ├── SpeedControl.vue   # Speech speed slider
│   │   ├── TextStatistics.vue # Text stats display
│   │   ├── ThemeToggle.vue    # Dark/light mode toggle
│   │   └── VoiceSelector.vue  # Voice selection component
│   ├── lib/
│   │   └── piper-tts.js       # Piper TTS implementation
│   ├── utils/
│   │   ├── model-cache.js     # Model file caching
│   │   ├── model-detector.js  # Model discovery from API
│   │   ├── text-cleaner.js    # Text cleaning and chunking
│   │   └── vietnamese-processor.js  # Vietnamese text processing
│   └── workers/
│       └── tts-worker.js      # Web Worker for TTS processing
├── functions/
│   └── api/
│       ├── models.ts          # List available models
│       └── model/[name].ts    # Serve model files from R2
└── public/
    └── non-vietnamese-words.csv  # Word replacement dictionary

How It Works

  1. Model Loading: Models are stored in Cloudflare R2 and served via Cloudflare Pages Functions
  2. Text Processing: Vietnamese text is processed to convert numbers, dates, times, etc. to spoken words
  3. Text Chunking: Input text is intelligently split into chunks for optimal processing
  4. Phoneme Conversion: Text is converted to phonemes using the phonemizer library
  5. Audio Generation: ONNX Runtime Web runs the Piper TTS model in a Web Worker
  6. Streaming: Audio chunks are streamed back to the main thread and played as they're generated
  7. Audio Merging: Chunks are merged, normalized, and trimmed for final output
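
As an illustration of the final merging step, the sketch below concatenates per-chunk audio into one buffer. It assumes each chunk arrives as a `Float32Array` of samples at the same sample rate; `mergeChunks` is a hypothetical name, not necessarily what the project uses.

```javascript
// Concatenate per-chunk audio (Float32Array samples) into a single buffer.
// Assumes all chunks share one sample rate, so no resampling is needed.
function mergeChunks(chunks) {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const merged = new Float32Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.length;
  }
  return merged;
}

const merged = mergeChunks([
  new Float32Array([0.5, 0.25]),
  new Float32Array([0.125]),
]);
console.log(merged.length); // 3
```

Preallocating the output and copying with `set` avoids repeated reallocation as chunks accumulate.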

Vietnamese Text Processing

The application includes comprehensive Vietnamese text processing that handles:

  • Numbers: Automatic conversion to Vietnamese words (e.g., "123" → "một trăm hai mươi ba")
  • Dates: Multiple formats (DD/MM/YYYY, DD-MM-YYYY, date ranges)
  • Times: Time expressions (HH:MM, HH:MM:SS, "X giờ Y phút")
  • Currency: VND (đồng) and USD conversion
  • Percentages: Automatic conversion (e.g., "50%" → "năm mươi phần trăm")
  • Decimals: Vietnamese decimal format (comma as decimal separator)
  • Phone Numbers: Digit-by-digit reading
  • Ordinals: Conversion of ordinal numbers (thứ 2 → thứ hai)
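
The number conversion can be sketched roughly as follows. This is a minimal illustration, not the code in `vietnamese-processor.js`: it assumes non-negative integers below one trillion and picks the "linh" and "nghìn" variants (other regions say "lẻ" and "ngàn"); all function names are hypothetical.

```javascript
const DIGITS = ['không', 'một', 'hai', 'ba', 'bốn', 'năm', 'sáu', 'bảy', 'tám', 'chín'];
const GROUPS = ['', 'nghìn', 'triệu', 'tỷ'];

// Read a value in 0..999. `padded` is true for non-leading groups,
// where a zero hundreds digit is still spoken ("không trăm").
function readTriple(n, padded) {
  const h = Math.floor(n / 100);
  const t = Math.floor((n % 100) / 10);
  const u = n % 10;
  const words = [];
  if (h > 0 || padded) words.push(DIGITS[h], 'trăm');
  if (t === 0) {
    if (u > 0) {
      if (words.length > 0) words.push('linh'); // e.g. 105 -> "một trăm linh năm"
      words.push(DIGITS[u]);
    }
  } else {
    words.push(t === 1 ? 'mười' : `${DIGITS[t]} mươi`);
    if (u === 1) words.push(t === 1 ? 'một' : 'mốt'); // 21 -> "hai mươi mốt"
    else if (u === 5) words.push('lăm');               // 15 -> "mười lăm"
    else if (u > 0) words.push(DIGITS[u]);
  }
  return words.join(' ');
}

function numberToVietnamese(n) {
  if (!Number.isInteger(n) || n < 0 || n >= 1e12) {
    throw new RangeError('expected an integer in [0, 1e12)');
  }
  if (n === 0) return DIGITS[0];
  const parts = [];
  let rest = n;
  for (let group = 0; rest > 0; group++) {
    const triple = rest % 1000;
    rest = Math.floor(rest / 1000);
    if (triple === 0) continue; // silent group, e.g. the zeros in 2,000,000
    const text = readTriple(triple, rest > 0);
    parts.unshift(GROUPS[group] ? `${text} ${GROUPS[group]}` : text);
  }
  return parts.join(' ');
}

console.log(numberToVietnamese(123)); // một trăm hai mươi ba
console.log(`${numberToVietnamese(50)} phần trăm`); // năm mươi phần trăm
```

Dates, times, and currency can then be handled by matching their patterns first and delegating the numeric parts to a helper like this one.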

Development

Prerequisites

  • Node.js 18+
  • npm or yarn

Installation

npm install

Development Server

npm run dev

Build

npm run build

Preview Production Build

npm run preview

Deployment

The project is configured for Cloudflare Pages deployment:

  1. Models should be stored in a Cloudflare R2 bucket named tts-bucket
  2. Models should be placed in the piper/ prefix
  3. Each model requires two files:
    • {model-name}.onnx - The ONNX model file
    • {model-name}.onnx.json - The model configuration file

The Cloudflare Pages Function at /api/models will automatically discover available models from the R2 bucket.
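
A sketch of the discovery logic follows. It assumes the function receives the object keys from an R2 listing (for example via the bucket binding's `list({ prefix })` call) and keeps only models for which both files exist; `findCompleteModels` is a hypothetical helper and the real `/api/models` handler may work differently.

```javascript
// Given object keys from a bucket listing, return the names of models
// that have both a .onnx file and a matching .onnx.json file.
function findCompleteModels(keys, prefix = 'piper/') {
  const names = keys
    .filter((key) => key.startsWith(prefix))
    .map((key) => key.slice(prefix.length));
  const present = new Set(names);
  return names
    .filter((name) => name.endsWith('.onnx') && present.has(`${name}.json`))
    .map((name) => name.slice(0, -'.onnx'.length))
    .sort();
}

const keys = [
  'piper/calmwoman3688.onnx',
  'piper/calmwoman3688.onnx.json',
  'piper/deepman3909.onnx', // config missing, so this model is skipped
];
console.log(findCompleteModels(keys)); // [ 'calmwoman3688' ]
```

Filtering out incomplete pairs at listing time keeps the model dropdown from offering a voice that would fail to load.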

Configuration

Wrangler Configuration

The wrangler.toml file configures:

  • Pages build output directory
  • R2 bucket binding (piper → tts-bucket)

Model Format

Models must be in Piper TTS ONNX format with:

  • .onnx file containing the ONNX model
  • .onnx.json file containing voice configuration (phoneme_id_map, audio settings, etc.)
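
To show how `phoneme_id_map` is typically used, the sketch below turns a phoneme sequence into model input IDs. It follows one common Piper arrangement (start marker `^`, a pad `_` after each phoneme, end marker `$`); the map values here are made up for illustration, and this project's `piper-tts.js` may differ in detail.

```javascript
// Hypothetical excerpt of a voice config; real files contain many more entries.
const exampleIdMap = { _: [0], '^': [1], $: [2], a: [14], b: [15] };

// Convert phonemes to IDs: BOS, then each phoneme followed by a pad, then EOS.
function phonemesToIds(phonemes, idMap) {
  const ids = [...idMap['^']];
  for (const phoneme of phonemes) {
    const mapped = idMap[phoneme];
    if (!mapped) continue; // skip phonemes the voice was not trained on
    ids.push(...mapped, ...idMap['_']);
  }
  ids.push(...idMap['$']);
  return ids;
}

console.log(phonemesToIds(['a', 'b'], exampleIdMap)); // [ 1, 14, 0, 15, 0, 2 ]
```

The resulting ID array is what gets passed to the ONNX model as its input sequence.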

Features in Detail

Text Cleaning

  • Removes emojis and special characters
  • Normalizes Unicode (NFC)
  • Handles Vietnamese-specific punctuation
  • Cleans whitespace
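
A minimal sketch of these cleaning steps, assuming emoji removal is based on Unicode properties and that "Vietnamese-specific punctuation" includes curly quotes (an assumption; the actual rules in `text-cleaner.js` are not documented here):

```javascript
// Clean raw input before it is chunked and phonemized.
function cleanText(input) {
  return input
    .normalize('NFC') // compose base letters with their diacritics
    // Drop emoji, flag letters, variation selectors, and zero-width joiners.
    .replace(/[\p{Extended_Pictographic}\u{1F1E6}-\u{1F1FF}\u{FE0F}\u{200D}]/gu, '')
    .replace(/[\u201C\u201D]/g, '"') // curly double quotes to straight
    .replace(/[\u2018\u2019]/g, "'") // curly single quotes to straight
    .replace(/\s+/g, ' ')            // collapse runs of whitespace
    .trim();
}

console.log(cleanText('Xin  chào 🌐\n thế giới ')); // Xin chào thế giới
```

NFC normalization matters for Vietnamese because the same accented letter can be encoded as one code point or as a base letter plus combining marks, and later lookups expect a single form.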

Text Chunking

  • Intelligently splits text into optimal chunks
  • Respects sentence boundaries
  • Handles long sentences by splitting at word boundaries
  • Maintains minimum and maximum chunk sizes for optimal processing
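
One way to implement this kind of chunking is sketched below. It is an assumption-laden illustration: sentences are detected with a simple punctuation regex, short sentences are packed together up to `maxLen`, and the 200-character default is arbitrary rather than the project's actual limit.

```javascript
// Split text into chunks of at most maxLen characters where possible,
// preferring sentence boundaries and falling back to word boundaries.
function chunkText(text, maxLen = 200) {
  const sentences = text.match(/[^.!?…]+[.!?…]*\s*/g) || [];
  const chunks = [];
  let current = '';
  const flush = () => {
    if (current.trim()) chunks.push(current.trim());
    current = '';
  };
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    if (sentence.length > maxLen) {
      // Sentence is too long on its own: split it at word boundaries.
      flush();
      for (const word of sentence.split(/\s+/)) {
        if (current && current.length + 1 + word.length > maxLen) flush();
        current = current ? `${current} ${word}` : word;
      }
      flush();
    } else if (current && current.length + 1 + sentence.length > maxLen) {
      flush();
      current = sentence;
    } else {
      // Pack short sentences together so chunks are not too small.
      current = current ? `${current} ${sentence}` : sentence;
    }
  }
  flush();
  return chunks;
}

console.log(chunkText('One. Two. Six.', 9)); // [ 'One. Two.', 'Six.' ]
```

A single word longer than `maxLen` is kept whole in this sketch, so a chunk can exceed the limit in that edge case.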

Audio Processing

  • Real-time streaming of audio chunks
  • Automatic normalization and peak limiting
  • Silence trimming
  • Sample rate preservation
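
The normalization and trimming steps might look like the sketch below. It assumes audio is held as `Float32Array` samples in the range [-1, 1]; the target peak and silence threshold are illustrative defaults, not values taken from the project.

```javascript
// Scale samples so the loudest one reaches targetPeak, clamping to [-1, 1].
function normalizePeak(samples, targetPeak = 0.9) {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return Float32Array.from(samples); // all silence: nothing to scale
  const gain = targetPeak / peak;
  return Float32Array.from(samples, (s) => Math.max(-1, Math.min(1, s * gain)));
}

// Remove leading and trailing samples quieter than the threshold.
function trimSilence(samples, threshold = 0.01) {
  let start = 0;
  let end = samples.length;
  while (start < end && Math.abs(samples[start]) < threshold) start++;
  while (end > start && Math.abs(samples[end - 1]) < threshold) end--;
  return samples.slice(start, end);
}

const trimmed = trimSilence(new Float32Array([0, 0, 0.5, -0.25, 0]));
console.log(trimmed.length); // 2
```

Neither function touches the sample rate, which is consistent with preserving the rate declared in the model's configuration.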

Browser Compatibility

  • Modern browsers with WebAssembly support
  • Web Workers support required
  • ES Modules support required
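
These requirements can be checked at startup with simple feature detection. The sketch below is one possible approach, meant for the main thread; `checkSupport` is a hypothetical helper, and the `noModule` test is a common heuristic for ES module support rather than anything the project is known to use.

```javascript
// Report which required browser features are missing from the given scope.
function checkSupport(scope = globalThis) {
  const missing = [];
  if (typeof scope.WebAssembly !== 'object') missing.push('WebAssembly');
  if (typeof scope.Worker !== 'function') missing.push('Web Workers');
  // Browsers that understand <script type="module"> expose `noModule`.
  const script = scope.HTMLScriptElement;
  if (!script || !('noModule' in script.prototype)) missing.push('ES Modules');
  return { supported: missing.length === 0, missing };
}

const { supported, missing } = checkSupport();
if (!supported) {
  console.warn(`Missing browser features: ${missing.join(', ')}`);
}
```

Running the check before loading a model lets the page show a clear message instead of failing partway through synthesis.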

📜 License & Usage

This project is:

  • ✅ Free to use
  • ✅ Open source
  • ✅ Allowed for commercial use
  • ✅ Customizable and deployable

⚠️ Important: Users are responsible for complying with voice and content laws when using generated audio.

Acknowledgments
