Support quantization for kokoro #77
Conversation
@mmwillet Please reupload:

./quantize --quantized-type 1 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_F16.danielzgtg.gguf
./quantize --quantized-type 2 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q4.danielzgtg.gguf
./quantize --quantized-type 6 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q5.danielzgtg.gguf
./quantize --quantized-type 8 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_no_espeak.gguf --quantized-model-path ./Kokoro_no_espeak_Q8.danielzgtg.gguf
./quantize --quantized-type 1 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_F16.danielzgtg.gguf
./quantize --quantized-type 2 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q4.danielzgtg.gguf
./quantize --quantized-type 6 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q5.danielzgtg.gguf
./quantize --quantized-type 8 --convert-non-quantized-to-f16 --model-path ~/CLionProjects/llmscripts/Kokoro_GGUF/Kokoro_espeak.gguf --quantized-model-path ./Kokoro_espeak_Q8.danielzgtg.gguf

home@daniel-desktop3:~/CLionProjects/TTS-cpp/TTS.cpp/cmake-build-release/bin$ sha256sum *.gguf
65ff6a252f1ea6053d7ee28d4bf0b5b9382ca2a40250ff6a8024d2e5859eced6 Kokoro_espeak_F16.danielzgtg.gguf
ea9ea03fd8f794df36c776036e2689e872b00319c474d5e2656b9ede432ed9a5 Kokoro_espeak_Q4.danielzgtg.gguf
88649a63f50c4021cb8ffc7e9506125359f6205685f480539f70f167a7d2625b Kokoro_espeak_Q5.danielzgtg.gguf
628bfe44fa6b411263607161c6b1178e07a8db65ec81c4e316488f6826d49133 Kokoro_espeak_Q8.danielzgtg.gguf
7529446263e0f3fd1ce95cd0d9bb9c50db7e27077423787ecb65fb20d98f2419 Kokoro_no_espeak_F16.danielzgtg.gguf
70c1c2afa5f2ca60008b180ac942ddaab011371372745e6f45265da3d2a7a852 Kokoro_no_espeak_Q4.danielzgtg.gguf
02c6a917c857724f757e7ec14b0432b99472a131685544fff2d77d8ac5a53cb7 Kokoro_no_espeak_Q5.danielzgtg.gguf
b5f16905757e494f802cffa06f5726b738158e336ceabdc7d9101e7045df65fb Kokoro_no_espeak_Q8.danielzgtg.gguf
home@daniel-desktop3:~/CLionProjects/llmscripts/Kokoro_GGUF$ sha256sum *.gguf
65ff6a252f1ea6053d7ee28d4bf0b5b9382ca2a40250ff6a8024d2e5859eced6 Kokoro_espeak_F16.gguf
73e3d657c52d6d8359a323da906c1a1dd5ae8f155a37cb25bcb7e9353e38d230 Kokoro_espeak.gguf
ea9ea03fd8f794df36c776036e2689e872b00319c474d5e2656b9ede432ed9a5 Kokoro_espeak_Q4.gguf
88649a63f50c4021cb8ffc7e9506125359f6205685f480539f70f167a7d2625b Kokoro_espeak_Q5.gguf
628bfe44fa6b411263607161c6b1178e07a8db65ec81c4e316488f6826d49133 Kokoro_espeak_Q8.gguf
e8904fe000b9a24412967ff3a83929bb84c5c5a8b9c98a5c1fae3b37729be714 Kokoro_no_espeak_F16.gguf
c3fa1ae88a3e78d4fae523657879e072603f721dfe1274e20ed8671e2baa1793 Kokoro_no_espeak.gguf
0d3e4182cbe280adc0c2e3beace357092ca285a960ef8543ca64fa4a2c52a61e Kokoro_no_espeak_Q4.gguf
c6cd6cb7a391e366e8528874ba7708e1992fea4e9994d84e3104fbd7e8afbf8f Kokoro_no_espeak_Q5.gguf
cfe4d612c6979239e383ca23d95af836ff796f77cf29a6bbc241ce42b867ce05 Kokoro_no_espeak_Q8.gguf

--- gguf-dump-no-espeak-f16.txt 2025-06-01 20:38:44.853047808 -0400
+++ gguf-dump-no-espeak-f16.danielzgtg.txt 2025-06-01 20:40:10.541320648 -0400
@@ -1,13 +1,13 @@
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 94 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
- 2: UINT64 | 1 | GGUF.tensor_count = 797
+ 2: UINT64 | 1 | GGUF.tensor_count = 775
3: UINT64 | 1 | GGUF.kv_count = 91
4: STRING | 1 | general.architecture = 'kokoro'
- 5: [STRING] | 50 | kokoro.voices = ['af_alloy', 'af_aoede', 'af_bella', 'af_heart', 'af_jessica', 'af_kore', ...]
+ 5: [STRING] | 28 | kokoro.voices = ['af_alloy', 'af_aoede', 'af_bella', 'af_heart', 'af_jessica', 'af_kore', ...]
6: STRING | 1 | general.type = 'model'
7: STRING | 1 | general.name = 'kokoro'
- 8: STRING | 1 | general.size_label = '88M'
+ 8: STRING | 1 | general.size_label = '85M'
9: UINT32 | 1 | tokenizer.ggml.padding_token_id = 0
10: UINT32 | 1 | kokoro.decoder_start_token_id = 0
11: UINT32 | 1 | kokoro.duration_predictor.albert.context_length = 512
@@ -83,7 +83,7 @@
81: UINT32 | 1 | kokoro.decoder.generator.up_convs.1.padding = 3
82: UINT32 | 1 | kokoro.decoder.generator.up_convs.1.stride = 6
83: UINT32 | 1 | phonemizer.type = 0
- 84: UINT32 | 1 | phonemizer.phoneme_type = 1
+ 84: UINT32 | 1 | phonemizer.phoneme_type = 0
85: [STRING] | 4422 | phonemizer.graphemes = ['oil', 'ise', 'val', 'dia', 'nii', 'miner', ...]
86: [STRING] | 995928 | phonemizer.rules.keys = ['a', 'a.^', 'a.^.$', 'a.^.aro', 'a.^.fa', 'a.^.fe', ...]
87: [STRING] | 995928 | phonemizer.rules.phonemes = ['ə', 'ɐ', 'ˈeɪ', 'ˈ', 'ɐ', 'ɐ', ...]
@@ -94,7 +94,7 @@
92: UINT32 | 1 | tokenizer.ggml.eos_token_id = 0
93: UINT32 | 1 | general.quantization_version = 2
94: UINT32 | 1 | general.quantization_type = 1
-* Dumping 797 tensor(s)
+* Dumping 775 tensor(s)
1: 22784 | 128, 178, 1, 1 | F32 | kokoro.albert.token_embd
2: 65536 | 128, 512, 1, 1 | F32 | kokoro.albert.position_embd
3: 128 | 128, 1, 1, 1 | F32 | kokoro.albert.token_type_embd
@@ -870,25 +870,3 @@
773: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.bm_fable
774: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.bm_george
775: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.bm_lewis
- 776: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.ef_dora
- 777: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.em_alex
- 778: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.em_santa
- 779: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.ff_siwis
- 780: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.hf_alpha
- 781: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.hf_beta
- 782: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.hm_omega
- 783: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.hm_psi
- 784: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.if_sara
- 785: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.im_nicola
- 786: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.jf_alpha
- 787: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.jf_gongitsune
- 788: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.jf_nezumi
- 789: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.jf_tebukuro
- 790: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.jm_kumo
- 791: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.pf_dora
- 792: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.pm_alex
- 793: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.pm_santa
- 794: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.zf_xiaobei
- 795: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.zf_xiaoni
- 796: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.zf_xiaoxiao
- 797: 130560 | 256, 510, 1, 1 | F32 | kokoro.voice_tensors.zf_xiaoyi

I got a sha256sum match on the espeak versions, but our results for no-espeak are different.
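A mismatch like the no-espeak one can be surfaced mechanically instead of by eyeballing hash lists. Below is a hypothetical POSIX-shell helper, not part of TTS.cpp; it assumes identically named files in the two directories (the uploaded copies in this thread carry a .danielzgtg infix, so names would need normalizing first):

```shell
# Compare the SHA-256 of identically named .gguf files in two directories
# and report MATCH or DIFFER per file. Hypothetical helper for illustration.
compare_gguf() {
  a_dir=$1; b_dir=$2
  for f in "$a_dir"/*.gguf; do
    name=$(basename "$f")
    [ -f "$b_dir/$name" ] || continue          # skip files missing on one side
    a=$(sha256sum "$f" | cut -d' ' -f1)
    b=$(sha256sum "$b_dir/$name" | cut -d' ' -f1)
    if [ "$a" = "$b" ]; then echo "MATCH $name"; else echo "DIFFER $name"; fi
  done
}
```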
All uploaded .gguf models sound human-indistinguishable for "Hi, how are you?"

@mmwillet Consider joining the waitlist for
Co-authored-by: Daniel Tang <[email protected]>
@ecyht2 signed up for the waitlist. Thanks for letting me know!
@danielzgtg thanks for the callout. I think I used an older version of the no_espeak model that was still in my models directory.
The code itself LGTM. Merging now to unblock the start of my args.cpp work; I will refresh mmwillet2/Kokoro_GGUF tomorrow.
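For the refresh, the eight ./quantize invocations earlier in this thread follow one pattern, so they can be generated rather than retyped. A sketch that prints them (MODEL_DIR is the machine-specific path quoted above; the type codes 1/2/6/8 map to the F16/Q4/Q5/Q8 outputs exactly as used there):

```shell
# Emit the eight quantize commands from this thread. Hypothetical helper;
# type codes and suffixes mirror the invocations above (1=F16, 2=Q4, 6=Q5, 8=Q8).
MODEL_DIR="$HOME/CLionProjects/llmscripts/Kokoro_GGUF"
gen_quantize_cmds() {
  for variant in no_espeak espeak; do
    for pair in "1 F16" "2 Q4" "6 Q5" "8 Q8"; do
      set -- $pair   # word-split into type code ($1) and filename suffix ($2)
      printf './quantize --quantized-type %s --convert-non-quantized-to-f16 --model-path %s --quantized-model-path ./Kokoro_%s_%s.danielzgtg.gguf\n' \
        "$1" "$MODEL_DIR/Kokoro_${variant}.gguf" "$variant" "$2"
    done
  done
}
gen_quantize_cmds
```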
bool kokoro_is_quantizable(std::string name, struct quantization_params * params) {
    if (kokoro_is_f16_compatible(name)) {
        if (has_prefix(name, "kokoro.albert") || has_prefix(name, "kokoro.text_encoder.lstm")) {
            return true;
        } else if (has_prefix(name, "kokoro.duration_predictor.")) {
            std::vector<std::string> parts = split(name, ".");
            for (std::string part : DURATION_PREDICTOR_QUANTIZATION_COMPATIBLE_PARTS) {
                if (part == parts[2]) {
                    return true;
                }
            }
        }
    }
    return false;
}
I'd like to move this responsibility to the discrete model files in the long run.
What a great idea! It was supposed to be the second follow-up "Folder encapsulation and conditional compilation of each model" to #58
I will upload quantized models to the existing hugging face repository.
A cursory performance test on varying prompt lengths demonstrated a minor end-to-end speed improvement of about 18.8% with Q4 quantization. I spot-checked the produced speech and it sounds fine.
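For context on how such a figure is derived, the percentage speedup from two wall-clock timings is just (a - b) / a * 100. A trivial sketch; the 10 s and 8.12 s inputs are made-up numbers that happen to yield 18.8%, not measurements from this PR:

```shell
# Relative end-to-end speedup, in percent, of run B over baseline run A,
# given wall-clock seconds. Illustrative arithmetic only, not a benchmark.
speedup_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}
speedup_pct 10 8.12   # hypothetical timings -> prints 18.8
```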