Tags: fish23/llama.cpp
metal : use autoreleasepool to avoid memory leaks (ggml-org#5437)

There appears to be a known memory leak when using `MTLCommandBuffer`. It is suggested in [1, 2] to use `@autoreleasepool`.

[1] https://developer.apple.com/forums/thread/662721
[2] https://forums.developer.apple.com/forums/thread/120931

This change-set wraps `ggml_metal_graph_compute` in an `@autoreleasepool` block.

This commit addresses ggml-org#5436
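The commit above describes wrapping the Metal graph compute call in an `@autoreleasepool` block so that autoreleased objects such as `MTLCommandBuffer` instances are drained after each call instead of accumulating in a long-running process. A minimal Objective-C sketch of that pattern follows; the function name, queue parameter, and comments are illustrative, not the actual `ggml-metal.m` code:

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical stand-in for ggml_metal_graph_compute: the whole per-graph
// workload runs inside an @autoreleasepool so that autoreleased Metal
// objects are released when the pool is drained at the end of each call.
static void graph_compute_sketch(id<MTLCommandQueue> queue) {
    @autoreleasepool {
        // commandBuffer returns an autoreleased object; without an
        // enclosing pool being drained, these buffers can accumulate
        // across repeated compute calls.
        id<MTLCommandBuffer> cmd_buf = [queue commandBuffer];

        // ... encode compute work into cmd_buf here ...

        [cmd_buf commit];
        [cmd_buf waitUntilCompleted];
    } // pool drained here; per-call Metal objects are released
}
```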
server : fix prompt caching for repeated prompts (ggml-org#5420)
llama : do not cap thread count when MoE on CPU (ggml-org#5419)
* Not capping thread count when MoE inference is running on CPU
* Whitespace

Fix Vulkan crash on APUs with very little device memory (ggml-org#5424)
* Fix Vulkan crash on APUs with very little device memory
* Fix debug output function names

Fix f16_sycl cpy call from Arc (ggml-org#5411)
* fix f16_sycl cpy call
* rm old logic
* add fp16 build CI
* use macro
* format fix