Tags: edude03/llama.cpp

b3190

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
common: fix warning (ggml-org#8036)

* common: fix warning

* Update common/common.cpp

Co-authored-by: slaren <[email protected]>

b3189

[SYCL] Fix windows build and inference (ggml-org#8003)

* add sycl preset

* fix debug link error. fix windows crash

* update README

b3188

CUDA: stream-k decomposition for MMQ (ggml-org#8018)

* CUDA: stream-k decomposition for MMQ

* fix undefined memory reads for small matrices

b3187

metal : fix `ggml_metal_supports_op` for BF16 (ggml-org#8021)

Currently the Metal backend does not support BF16. `ggml_metal_supports_op` was returning true in these cases, leading to a crash with models converted with `--leave-output-tensor`. This commit checks whether the first few source types are BF16 and returns false if so.
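
The check described above can be illustrated with a minimal, language-neutral sketch. It is not the Metal backend code: `Tensor`, `TensorType`, `backend_supports_op`, and the cutoff `MAX_SRC_CHECKED` are hypothetical names chosen for illustration, and the number of sources inspected is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class TensorType(Enum):
    F32 = auto()
    F16 = auto()
    BF16 = auto()


@dataclass
class Tensor:
    # Hypothetical stand-in for a graph node: its own type plus its inputs.
    type: TensorType
    src: List[Optional["Tensor"]] = field(default_factory=list)


# Assumption: how many leading sources are inspected ("the first few").
MAX_SRC_CHECKED = 3


def backend_supports_op(op: Tensor) -> bool:
    """Report the op as unsupported if any inspected source is BF16."""
    for s in op.src[:MAX_SRC_CHECKED]:
        if s is not None and s.type is TensorType.BF16:
            return False
    # A real backend would continue with its per-operation checks here.
    return True
```

Returning false early lets a scheduler route the operation to a backend that can handle BF16 instead of crashing inside one that cannot.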

b3186

server : fix smart slot selection (ggml-org#8020)

b3184

ggml : synchronize threads using barriers (ggml-org#7993)

b3183

codecov : remove (ggml-org#8004)

b3182

[SYCL] refactor (ggml-org#6408)

* separate lower precision GEMM from the main files

* fix workgroup size hardcode

b3181

tokenizer : BPE fixes (ggml-org#7530)

* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to vocab structure.
* Default values for special_add_bos/eos
* Build vocab.special_tokens_cache using vocab token types
* Generalize 'jina-v2' per token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t

b3180

Only use FIM middle token if it exists (ggml-org#7648)

* Only use FIM middle if it exists