Tags: fish23/llama.cpp

b2116

metal : use autoreleasepool to avoid memory leaks (ggml-org#5437)

There appears to be a known memory leak when using
`MTLCommandBuffer`. It is suggested in [1, 2] to use
`@autoreleasepool`.

[1] https://developer.apple.com/forums/thread/662721
[2] https://forums.developer.apple.com/forums/thread/120931

This change-set wraps `ggml_metal_graph_compute` in an
`@autoreleasepool` block.

This commit addresses ggml-org#5436
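For illustration only (this is not the actual patch), a minimal standalone Objective-C sketch of the pattern the commit applies: wrapping each per-iteration compute call in `@autoreleasepool` so autoreleased `MTLCommandBuffer` objects are drained every iteration instead of accumulating.

```objc
// Minimal sketch, not llama.cpp code: without the @autoreleasepool,
// each autoreleased MTLCommandBuffer would pile up until the outermost
// pool drains, which looks like a leak in a long-running compute loop.
#import <Metal/Metal.h>

int main(void) {
    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    id<MTLCommandQueue> queue = [device newCommandQueue];

    for (int i = 0; i < 100000; ++i) {
        @autoreleasepool {
            // Stands in for one ggml_metal_graph_compute call.
            id<MTLCommandBuffer> cb = [queue commandBuffer];
            [cb commit];
            [cb waitUntilCompleted];
        } // pool drains here, releasing the command buffer's references
    }
    return 0;
}
```

On macOS this builds with `clang -fobjc-arc -framework Foundation -framework Metal sketch.m`.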

b2114

sync : ggml

b2110

server : fix prompt caching for repeated prompts (ggml-org#5420)

b2109

llama : do not cap thread count when MoE on CPU (ggml-org#5419)

* Not capping thread count when MoE inference is running on CPU

* Whitespace

b2107

ggml : fix `error C2078: too many initializers` for MSVC ARM64 (ggml-org#5404)

b2106

Fix Vulkan crash on APUs with very little device memory (ggml-org#5424)

* Fix Vulkan crash on APUs with very little device memory

* Fix debug output function names

b2105

CUDA: more warps for mmvq on NVIDIA (ggml-org#5394)

b2104

llama : do not print "offloading layers" message in CPU-only builds (ggml-org#5416)

b2103

Fix f16_sycl cpy call from Arc (ggml-org#5411)

* fix f16_sycl cpy call

* rm old logic

* add fp16 build CI

* use macro

* format fix

b2101

fix trailing whitespace (ggml-org#5407)