
Conversation

mmwillet (Owner) commented Jun 1, 2025

I will upload quantized models to the existing Hugging Face repository.

A cursory performance test across varying prompt lengths showed a modest end-to-end speed improvement of about 18.8% with Q4 quantization. I spot-checked the produced speech and it sounds fine.

danielzgtg (Collaborator)

@mmwillet Please reupload Kokoro_no_espeak.gguf

./quantize --quantized-type 1 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_F16.danielzgtg.gguf
./quantize --quantized-type 2 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q4.danielzgtg.gguf
./quantize --quantized-type 6 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q5.danielzgtg.gguf
./quantize --quantized-type 8 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q8.danielzgtg.gguf
./quantize --quantized-type 1 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_F16.danielzgtg.gguf
./quantize --quantized-type 2 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q4.danielzgtg.gguf
./quantize --quantized-type 6 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q5.danielzgtg.gguf
./quantize --quantized-type 8 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q8.danielzgtg.gguf
home@daniel-desktop3:~/CLionProjects/TTS-cpp/TTS.cpp/cmake-build-release/bin$ sha256sum *.gguf
65ff6a252f1ea6053d7ee28d4bf0b5b9382ca2a40250ff6a8024d2e5859eced6  Kokoro_espeak_F16.danielzgtg.gguf
ea9ea03fd8f794df36c776036e2689e872b00319c474d5e2656b9ede432ed9a5  Kokoro_espeak_Q4.danielzgtg.gguf
88649a63f50c4021cb8ffc7e9506125359f6205685f480539f70f167a7d2625b  Kokoro_espeak_Q5.danielzgtg.gguf
628bfe44fa6b411263607161c6b1178e07a8db65ec81c4e316488f6826d49133  Kokoro_espeak_Q8.danielzgtg.gguf
7529446263e0f3fd1ce95cd0d9bb9c50db7e27077423787ecb65fb20d98f2419  Kokoro_no_espeak_F16.danielzgtg.gguf
70c1c2afa5f2ca60008b180ac942ddaab011371372745e6f45265da3d2a7a852  Kokoro_no_espeak_Q4.danielzgtg.gguf
02c6a917c857724f757e7ec14b0432b99472a131685544fff2d77d8ac5a53cb7  Kokoro_no_espeak_Q5.danielzgtg.gguf
b5f16905757e494f802cffa06f5726b738158e336ceabdc7d9101e7045df65fb  Kokoro_no_espeak_Q8.danielzgtg.gguf
home@daniel-desktop3:~/CLionProjects/llmscripts/Kokoro_GGUF$ sha256sum *.gguf
65ff6a252f1ea6053d7ee28d4bf0b5b9382ca2a40250ff6a8024d2e5859eced6  Kokoro_espeak_F16.gguf
73e3d657c52d6d8359a323da906c1a1dd5ae8f155a37cb25bcb7e9353e38d230  Kokoro_espeak.gguf
ea9ea03fd8f794df36c776036e2689e872b00319c474d5e2656b9ede432ed9a5  Kokoro_espeak_Q4.gguf
88649a63f50c4021cb8ffc7e9506125359f6205685f480539f70f167a7d2625b  Kokoro_espeak_Q5.gguf
628bfe44fa6b411263607161c6b1178e07a8db65ec81c4e316488f6826d49133  Kokoro_espeak_Q8.gguf
e8904fe000b9a24412967ff3a83929bb84c5c5a8b9c98a5c1fae3b37729be714  Kokoro_no_espeak_F16.gguf
c3fa1ae88a3e78d4fae523657879e072603f721dfe1274e20ed8671e2baa1793  Kokoro_no_espeak.gguf
0d3e4182cbe280adc0c2e3beace357092ca285a960ef8543ca64fa4a2c52a61e  Kokoro_no_espeak_Q4.gguf
c6cd6cb7a391e366e8528874ba7708e1992fea4e9994d84e3104fbd7e8afbf8f  Kokoro_no_espeak_Q5.gguf
cfe4d612c6979239e383ca23d95af836ff796f77cf29a6bbc241ce42b867ce05  Kokoro_no_espeak_Q8.gguf
--- gguf-dump-no-espeak-f16.txt 2025-06-01 20:38:44.853047808 -0400
+++ gguf-dump-no-espeak-f16.danielzgtg.txt      2025-06-01 20:40:10.541320648 -0400
@@ -1,13 +1,13 @@
 * File is LITTLE endian, script is running on a LITTLE endian host.
 * Dumping 94 key/value pair(s)
       1: UINT32     |        1 | GGUF.version = 3
-      2: UINT64     |        1 | GGUF.tensor_count = 797
+      2: UINT64     |        1 | GGUF.tensor_count = 775
       3: UINT64     |        1 | GGUF.kv_count = 91
       4: STRING     |        1 | general.architecture = 'kokoro'
-      5: [STRING]   |       50 | kokoro.voices = ['af_alloy', 'af_aoede', 'af_bella', 'af_heart', 'af_jessica', 'af_kore', ...]
+      5: [STRING]   |       28 | kokoro.voices = ['af_alloy', 'af_aoede', 'af_bella', 'af_heart', 'af_jessica', 'af_kore', ...]
       6: STRING     |        1 | general.type = 'model'
       7: STRING     |        1 | general.name = 'kokoro'
-      8: STRING     |        1 | general.size_label = '88M'
+      8: STRING     |        1 | general.size_label = '85M'
       9: UINT32     |        1 | tokenizer.ggml.padding_token_id = 0
      10: UINT32     |        1 | kokoro.decoder_start_token_id = 0
      11: UINT32     |        1 | kokoro.duration_predictor.albert.context_length = 512
@@ -83,7 +83,7 @@
      81: UINT32     |        1 | kokoro.decoder.generator.up_convs.1.padding = 3
      82: UINT32     |        1 | kokoro.decoder.generator.up_convs.1.stride = 6
      83: UINT32     |        1 | phonemizer.type = 0
-     84: UINT32     |        1 | phonemizer.phoneme_type = 1
+     84: UINT32     |        1 | phonemizer.phoneme_type = 0
      85: [STRING]   |     4422 | phonemizer.graphemes = ['oil', 'ise', 'val', 'dia', 'nii', 'miner', ...]
      86: [STRING]   |   995928 | phonemizer.rules.keys = ['a', 'a.^', 'a.^.$', 'a.^.aro', 'a.^.fa', 'a.^.fe', ...]
      87: [STRING]   |   995928 | phonemizer.rules.phonemes = ['ə', 'ɐ', 'ˈeɪ', 'ˈ', 'ɐ', 'ɐ', ...]
@@ -94,7 +94,7 @@
      92: UINT32     |        1 | tokenizer.ggml.eos_token_id = 0
      93: UINT32     |        1 | general.quantization_version = 2
      94: UINT32     |        1 | general.quantization_type = 1
-* Dumping 797 tensor(s)
+* Dumping 775 tensor(s)
       1:      22784 |   128,   178,     1,     1 | F32     | kokoro.albert.token_embd
       2:      65536 |   128,   512,     1,     1 | F32     | kokoro.albert.position_embd
       3:        128 |   128,     1,     1,     1 | F32     | kokoro.albert.token_type_embd
@@ -870,25 +870,3 @@
     773:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.bm_fable
     774:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.bm_george
     775:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.bm_lewis
-    776:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.ef_dora
-    777:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.em_alex
-    778:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.em_santa
-    779:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.ff_siwis
-    780:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.hf_alpha
-    781:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.hf_beta
-    782:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.hm_omega
-    783:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.hm_psi
-    784:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.if_sara
-    785:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.im_nicola
-    786:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jf_alpha
-    787:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jf_gongitsune
-    788:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jf_nezumi
-    789:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jf_tebukuro
-    790:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jm_kumo
-    791:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.pf_dora
-    792:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.pm_alex
-    793:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.pm_santa
-    794:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.zf_xiaobei
-    795:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.zf_xiaoni
-    796:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.zf_xiaoxiao
-    797:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.zf_xiaoyi

I got a sha256sum match on the espeak versions, but our results for no-espeak are different.
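
For reference, the --quantized-type values in the commands above appear to follow ggml's ggml_type numbering; that is an inference from the F16/Q4/Q5/Q8 output names, not something taken from quantize.cpp. A minimal sketch of the assumed mapping:

// Assumed mapping of --quantized-type values to ggml_type, inferred from the
// F16/Q4/Q5/Q8 suffixes of the output files above (hypothetical; verify
// against quantize.cpp before relying on it).
#include <cstdio>

int main() {
    const struct { int value; const char * type_name; } mapping[] = {
        {1, "GGML_TYPE_F16"},
        {2, "GGML_TYPE_Q4_0"},
        {6, "GGML_TYPE_Q5_0"},
        {8, "GGML_TYPE_Q8_0"},
    };
    for (const auto & m : mapping) {
        std::printf("--quantized-type %d -> %s\n", m.value, m.type_name);
    }
    return 0;
}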

danielzgtg (Collaborator) left a comment

All uploaded .gguf models sound indistinguishable from a human for "Hi, how are you?"

ecyht2 (Collaborator) commented Jun 2, 2025

@mmwillet Consider joining the waitlist for Xet. I heard it is supposed to be faster and to save space compared to Git LFS: https://huggingface.co/xet-team.

ecyht2 linked an issue (Jun 2, 2025) that may be closed by this pull request
mmwillet and others added 2 commits June 2, 2025 13:36
mmwillet (Owner, Author) commented Jun 2, 2025

@ecyht2 signed up for the waitlist. Thanks for letting me know!

mmwillet (Owner, Author) commented Jun 2, 2025

@danielzgtg thanks for the callout. I think I used an older version of the no_espeak model that was still in my models directory.

danielzgtg (Collaborator)

The code itself LGTM. Merging now to unblock the start of my args.cpp work; I will refresh mmwillet2/Kokoro_GGUF tomorrow.

danielzgtg merged commit 439174e into main on Jun 2, 2025
2 checks passed
Comment on lines +154 to +168
bool kokoro_is_quantizable(std::string name, struct quantization_params * params) {
    if (kokoro_is_f16_compatible(name)) {
        if (has_prefix(name, "kokoro.albert") || has_prefix(name, "kokoro.text_encoder.lstm")) {
            return true;
        } else if (has_prefix(name, "kokoro.duration_predictor.")) {
            std::vector<std::string> parts = split(name, ".");
            for (std::string part : DURATION_PREDICTOR_QUANTIZATION_COMPATIBLE_PARTS) {
                if (part == parts[2]) {
                    return true;
                }
            }
        }
    }
    return false;
}
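
As a self-contained illustration of what this whitelist admits, the sketch below applies a simplified version of the predicate to tensor names taken from the dump earlier in the thread. The helper implementations and the duration-predictor part list are stand-ins (assumptions), and the kokoro_is_f16_compatible guard from the real function is omitted.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in helpers; TTS.cpp's has_prefix/split may differ.
static bool has_prefix(const std::string & s, const std::string & prefix) {
    return s.rfind(prefix, 0) == 0;
}

static std::vector<std::string> split(const std::string & s, char delim) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string part;
    while (std::getline(ss, part, delim)) parts.push_back(part);
    return parts;
}

// Placeholder; the real DURATION_PREDICTOR_QUANTIZATION_COMPATIBLE_PARTS list
// lives in the Kokoro model code and may contain different entries.
static const std::vector<std::string> DP_PARTS = {"albert"};

// Simplified predicate with the same prefix/part structure as
// kokoro_is_quantizable above (no f16-compatibility guard, no params argument).
static bool is_whitelisted(const std::string & name) {
    if (has_prefix(name, "kokoro.albert") || has_prefix(name, "kokoro.text_encoder.lstm")) return true;
    if (has_prefix(name, "kokoro.duration_predictor.")) {
        std::vector<std::string> parts = split(name, '.');
        for (const std::string & part : DP_PARTS) {
            if (parts.size() > 2 && part == parts[2]) return true;
        }
    }
    return false;
}

int main() {
    const std::string names[] = {
        "kokoro.albert.token_embd",       // matches the kokoro.albert prefix
        "kokoro.voice_tensors.bm_lewis",  // voice tensors are never whitelisted
    };
    for (const std::string & name : names) {
        std::cout << name << " -> " << (is_whitelisted(name) ? "whitelisted" : "left unquantized") << "\n";
    }
    return 0;
}
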
mmwillet (Owner, Author)

I'd like to move this responsibility to the discrete model files in the long run.

danielzgtg (Collaborator)

What a great idea! It was supposed to be the second follow-up to #58, "Folder encapsulation and conditional compilation of each model".
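
A rough sketch of what that per-model encapsulation could look like; every name below is hypothetical and nothing here exists in TTS.cpp today.

#include <string>

struct quantization_params;  // stand-in forward declaration

// Hypothetical per-model policy interface: each model's source files would own
// their tensor-quantization rules instead of the quantize code branching on the
// architecture name.
struct tts_quantization_policy {
    virtual ~tts_quantization_policy() = default;
    virtual bool is_quantizable(const std::string & tensor_name,
                                quantization_params * params) const = 0;
};

// The Kokoro implementation would wrap the existing kokoro_is_quantizable();
// returning false here is only a placeholder so this sketch compiles on its own.
struct kokoro_quantization_policy : tts_quantization_policy {
    bool is_quantizable(const std::string & tensor_name,
                        quantization_params * params) const override {
        (void) tensor_name;
        (void) params;
        return false;
    }
};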

mmwillet deleted the support-quantization-for-kokoro branch on June 2, 2025 21:32
Successfully merging this pull request may close these issues:

Add and test quantization from Kokoro

3 participants