
Conversation

mmwillet (Owner) commented Jun 1, 2025

I will upload quantized models to the existing Hugging Face repository.

A cursory performance test across varying prompt lengths showed a modest end-to-end speed improvement of about 18.8% with Q4 quantization. I spot-checked the produced speech and it sounds fine.

danielzgtg (Collaborator)

@mmwillet Please reupload Kokoro_no_espeak.gguf

./quantize --quantized-type 1 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_F16.danielzgtg.gguf
./quantize --quantized-type 2 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q4.danielzgtg.gguf
./quantize --quantized-type 6 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q5.danielzgtg.gguf
./quantize --quantized-type 8 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q8.danielzgtg.gguf
./quantize --quantized-type 1 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_F16.danielzgtg.gguf
./quantize --quantized-type 2 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q4.danielzgtg.gguf
./quantize --quantized-type 6 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q5.danielzgtg.gguf
./quantize --quantized-type 8 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q8.danielzgtg.gguf
home@daniel-desktop3:~/CLionProjects/TTS-cpp/TTS.cpp/cmake-build-release/bin$ sha256sum *.gguf
65ff6a252f1ea6053d7ee28d4bf0b5b9382ca2a40250ff6a8024d2e5859eced6  Kokoro_espeak_F16.danielzgtg.gguf
ea9ea03fd8f794df36c776036e2689e872b00319c474d5e2656b9ede432ed9a5  Kokoro_espeak_Q4.danielzgtg.gguf
88649a63f50c4021cb8ffc7e9506125359f6205685f480539f70f167a7d2625b  Kokoro_espeak_Q5.danielzgtg.gguf
628bfe44fa6b411263607161c6b1178e07a8db65ec81c4e316488f6826d49133  Kokoro_espeak_Q8.danielzgtg.gguf
7529446263e0f3fd1ce95cd0d9bb9c50db7e27077423787ecb65fb20d98f2419  Kokoro_no_espeak_F16.danielzgtg.gguf
70c1c2afa5f2ca60008b180ac942ddaab011371372745e6f45265da3d2a7a852  Kokoro_no_espeak_Q4.danielzgtg.gguf
02c6a917c857724f757e7ec14b0432b99472a131685544fff2d77d8ac5a53cb7  Kokoro_no_espeak_Q5.danielzgtg.gguf
b5f16905757e494f802cffa06f5726b738158e336ceabdc7d9101e7045df65fb  Kokoro_no_espeak_Q8.danielzgtg.gguf
home@daniel-desktop3:~/CLionProjects/llmscripts/Kokoro_GGUF$ sha256sum *.gguf
65ff6a252f1ea6053d7ee28d4bf0b5b9382ca2a40250ff6a8024d2e5859eced6  Kokoro_espeak_F16.gguf
73e3d657c52d6d8359a323da906c1a1dd5ae8f155a37cb25bcb7e9353e38d230  Kokoro_espeak.gguf
ea9ea03fd8f794df36c776036e2689e872b00319c474d5e2656b9ede432ed9a5  Kokoro_espeak_Q4.gguf
88649a63f50c4021cb8ffc7e9506125359f6205685f480539f70f167a7d2625b  Kokoro_espeak_Q5.gguf
628bfe44fa6b411263607161c6b1178e07a8db65ec81c4e316488f6826d49133  Kokoro_espeak_Q8.gguf
e8904fe000b9a24412967ff3a83929bb84c5c5a8b9c98a5c1fae3b37729be714  Kokoro_no_espeak_F16.gguf
c3fa1ae88a3e78d4fae523657879e072603f721dfe1274e20ed8671e2baa1793  Kokoro_no_espeak.gguf
0d3e4182cbe280adc0c2e3beace357092ca285a960ef8543ca64fa4a2c52a61e  Kokoro_no_espeak_Q4.gguf
c6cd6cb7a391e366e8528874ba7708e1992fea4e9994d84e3104fbd7e8afbf8f  Kokoro_no_espeak_Q5.gguf
cfe4d612c6979239e383ca23d95af836ff796f77cf29a6bbc241ce42b867ce05  Kokoro_no_espeak_Q8.gguf
--- gguf-dump-no-espeak-f16.txt 2025-06-01 20:38:44.853047808 -0400
+++ gguf-dump-no-espeak-f16.danielzgtg.txt      2025-06-01 20:40:10.541320648 -0400
@@ -1,13 +1,13 @@
 * File is LITTLE endian, script is running on a LITTLE endian host.
 * Dumping 94 key/value pair(s)
       1: UINT32     |        1 | GGUF.version = 3
-      2: UINT64     |        1 | GGUF.tensor_count = 797
+      2: UINT64     |        1 | GGUF.tensor_count = 775
       3: UINT64     |        1 | GGUF.kv_count = 91
       4: STRING     |        1 | general.architecture = 'kokoro'
-      5: [STRING]   |       50 | kokoro.voices = ['af_alloy', 'af_aoede', 'af_bella', 'af_heart', 'af_jessica', 'af_kore', ...]
+      5: [STRING]   |       28 | kokoro.voices = ['af_alloy', 'af_aoede', 'af_bella', 'af_heart', 'af_jessica', 'af_kore', ...]
       6: STRING     |        1 | general.type = 'model'
       7: STRING     |        1 | general.name = 'kokoro'
-      8: STRING     |        1 | general.size_label = '88M'
+      8: STRING     |        1 | general.size_label = '85M'
       9: UINT32     |        1 | tokenizer.ggml.padding_token_id = 0
      10: UINT32     |        1 | kokoro.decoder_start_token_id = 0
      11: UINT32     |        1 | kokoro.duration_predictor.albert.context_length = 512
@@ -83,7 +83,7 @@
      81: UINT32     |        1 | kokoro.decoder.generator.up_convs.1.padding = 3
      82: UINT32     |        1 | kokoro.decoder.generator.up_convs.1.stride = 6
      83: UINT32     |        1 | phonemizer.type = 0
-     84: UINT32     |        1 | phonemizer.phoneme_type = 1
+     84: UINT32     |        1 | phonemizer.phoneme_type = 0
      85: [STRING]   |     4422 | phonemizer.graphemes = ['oil', 'ise', 'val', 'dia', 'nii', 'miner', ...]
      86: [STRING]   |   995928 | phonemizer.rules.keys = ['a', 'a.^', 'a.^.$', 'a.^.aro', 'a.^.fa', 'a.^.fe', ...]
      87: [STRING]   |   995928 | phonemizer.rules.phonemes = ['ə', 'ɐ', 'ˈeɪ', 'ˈ', 'ɐ', 'ɐ', ...]
@@ -94,7 +94,7 @@
      92: UINT32     |        1 | tokenizer.ggml.eos_token_id = 0
      93: UINT32     |        1 | general.quantization_version = 2
      94: UINT32     |        1 | general.quantization_type = 1
-* Dumping 797 tensor(s)
+* Dumping 775 tensor(s)
       1:      22784 |   128,   178,     1,     1 | F32     | kokoro.albert.token_embd
       2:      65536 |   128,   512,     1,     1 | F32     | kokoro.albert.position_embd
       3:        128 |   128,     1,     1,     1 | F32     | kokoro.albert.token_type_embd
@@ -870,25 +870,3 @@
     773:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.bm_fable
     774:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.bm_george
     775:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.bm_lewis
-    776:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.ef_dora
-    777:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.em_alex
-    778:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.em_santa
-    779:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.ff_siwis
-    780:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.hf_alpha
-    781:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.hf_beta
-    782:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.hm_omega
-    783:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.hm_psi
-    784:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.if_sara
-    785:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.im_nicola
-    786:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jf_alpha
-    787:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jf_gongitsune
-    788:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jf_nezumi
-    789:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jf_tebukuro
-    790:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.jm_kumo
-    791:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.pf_dora
-    792:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.pm_alex
-    793:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.pm_santa
-    794:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.zf_xiaobei
-    795:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.zf_xiaoni
-    796:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.zf_xiaoxiao
-    797:     130560 |   256,   510,     1,     1 | F32     | kokoro.voice_tensors.zf_xiaoyi

I got a sha256sum match on the espeak versions, but our results for no-espeak are different.
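
For reference, the --quantized-type values in the commands above appear to follow ggml's ggml_type numbering; that is an inference from the F16/Q4/Q5/Q8 output names, not something taken from quantize.cpp. A minimal sketch of the assumed mapping:

// Assumed mapping of --quantized-type values to ggml_type, inferred from the
// F16/Q4/Q5/Q8 suffixes of the output files above (hypothetical; verify
// against quantize.cpp before relying on it).
#include <cstdio>

int main() {
    const struct { int value; const char * type_name; } mapping[] = {
        {1, "GGML_TYPE_F16"},
        {2, "GGML_TYPE_Q4_0"},
        {6, "GGML_TYPE_Q5_0"},
        {8, "GGML_TYPE_Q8_0"},
    };
    for (const auto & m : mapping) {
        std::printf("--quantized-type %d -> %s\n", m.value, m.type_name);
    }
    return 0;
}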

danielzgtg (Collaborator) left a comment

All uploaded .gguf models sound indistinguishable from a human for "Hi, how are you?"

ecyht2 (Collaborator) commented Jun 2, 2025

@mmwillet Consider joining the waitlist for Xet. I heard it is supposed to be faster and to save space compared to Git LFS: https://huggingface.co/xet-team.

ecyht2 linked an issue (Jun 2, 2025) that may be closed by this pull request
mmwillet and others added 2 commits June 2, 2025 13:36
mmwillet (Owner, Author) commented Jun 2, 2025

@ecyht2 signed up for the waitlist. Thanks for letting me know!

mmwillet (Owner, Author) commented Jun 2, 2025

@danielzgtg thanks for the callout. I think I used an older version of the no_espeak model that was still in my models directory.

danielzgtg (Collaborator)

The code itself LGTM. Merging now to unblock the start of my args.cpp work; I will refresh mmwillet2/Kokoro_GGUF tomorrow.

danielzgtg merged commit 439174e into main on Jun 2, 2025
2 checks passed
Comment on lines +154 to +168
bool kokoro_is_quantizable(std::string name, struct quantization_params * params) {
    if (kokoro_is_f16_compatible(name)) {
        if (has_prefix(name, "kokoro.albert") || has_prefix(name, "kokoro.text_encoder.lstm")) {
            return true;
        } else if (has_prefix(name, "kokoro.duration_predictor.")) {
            std::vector<std::string> parts = split(name, ".");
            for (std::string part : DURATION_PREDICTOR_QUANTIZATION_COMPATIBLE_PARTS) {
                if (part == parts[2]) {
                    return true;
                }
            }
        }
    }
    return false;
}
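
As a self-contained illustration of what this whitelist admits, the sketch below applies a simplified version of the predicate to tensor names taken from the dump earlier in the thread. The helper implementations and the duration-predictor part list are stand-ins (assumptions), and the kokoro_is_f16_compatible guard from the real function is omitted.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in helpers; TTS.cpp's has_prefix/split may differ.
static bool has_prefix(const std::string & s, const std::string & prefix) {
    return s.rfind(prefix, 0) == 0;
}

static std::vector<std::string> split(const std::string & s, char delim) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string part;
    while (std::getline(ss, part, delim)) parts.push_back(part);
    return parts;
}

// Placeholder; the real DURATION_PREDICTOR_QUANTIZATION_COMPATIBLE_PARTS list
// lives in the Kokoro model code and may contain different entries.
static const std::vector<std::string> DP_PARTS = {"albert"};

// Simplified predicate with the same prefix/part structure as
// kokoro_is_quantizable above (no f16-compatibility guard, no params argument).
static bool is_whitelisted(const std::string & name) {
    if (has_prefix(name, "kokoro.albert") || has_prefix(name, "kokoro.text_encoder.lstm")) return true;
    if (has_prefix(name, "kokoro.duration_predictor.")) {
        std::vector<std::string> parts = split(name, '.');
        for (const std::string & part : DP_PARTS) {
            if (parts.size() > 2 && part == parts[2]) return true;
        }
    }
    return false;
}

int main() {
    const std::string names[] = {
        "kokoro.albert.token_embd",       // matches the kokoro.albert prefix
        "kokoro.voice_tensors.bm_lewis",  // voice tensors are never whitelisted
    };
    for (const std::string & name : names) {
        std::cout << name << " -> " << (is_whitelisted(name) ? "whitelisted" : "left unquantized") << "\n";
    }
    return 0;
}
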
mmwillet (Owner, Author)

I'd like to move this responsibility to the discrete model files in the long run.

danielzgtg (Collaborator)

What a great idea! It was supposed to be the second follow-up to #58, "Folder encapsulation and conditional compilation of each model".
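
A rough sketch of what that per-model encapsulation could look like; every name below is hypothetical and nothing here exists in TTS.cpp today.

#include <string>

struct quantization_params;  // stand-in forward declaration

// Hypothetical per-model policy interface: each model's source files would own
// their tensor-quantization rules instead of the quantize code branching on the
// architecture name.
struct tts_quantization_policy {
    virtual ~tts_quantization_policy() = default;
    virtual bool is_quantizable(const std::string & tensor_name,
                                quantization_params * params) const = 0;
};

// The Kokoro implementation would wrap the existing kokoro_is_quantizable();
// returning false here is only a placeholder so this sketch compiles on its own.
struct kokoro_quantization_policy : tts_quantization_policy {
    bool is_quantizable(const std::string & tensor_name,
                        quantization_params * params) const override {
        (void) tensor_name;
        (void) params;
        return false;
    }
};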

mmwillet deleted the support-quantization-for-kokoro branch on June 2, 2025 21:32
Successfully merging this pull request may close these issues:

Add and test quantization from Kokoro

3 participants