Pull requests: ggml-org/llama.cpp

Pull requests list

llama: automatically set runtime parameters such as --n-gpu-layers to fit VRAM
Labels: ggml
#14067 opened Jun 8, 2025 by JohannesGaessler Draft
Server: add support for "tool_calls" (MeetKai/functionary model)
Labels: demo
#5695 opened Feb 23, 2024 by ngxson Draft
Fix locale-dependent float printing in GGUF metadata
Labels: examples, server, SYCL
#17331 opened Nov 17, 2025 by ssam18
CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D
Labels: ggml, Nvidia GPU, testing
#17094 opened Nov 8, 2025 by AgainstEntropy
Fuse matrix multiplication + SiLU
Labels: performance, refactoring, Review Complexity : Medium
#5413 opened Feb 8, 2024 by JohannesGaessler Draft
Introduce Q8_0 and Q4_0 with Bf16 delta values
Labels: examples, ggml, python, Review Complexity : High, Tensor Encoding Scheme
#7497 opened May 23, 2024 by Srihari-mcw
vulkan: optimize mxfp4
Labels: ggml, Vulkan
#15363 opened Aug 16, 2025 by lovedheart
SOLVE_TRI extension to more dimensions
Labels: examples, ggml, Nvidia GPU, server, testing
#17793 opened Dec 5, 2025 by pwilkin
support GLM-4.5V and GLM-4.1V vision models
Labels: examples, help wanted, model, python
#16600 opened Oct 15, 2025 by ddh0 Draft
Q4_0 scale selection using RMSE
Labels: enhancement, Less than 4 bits, research 🔬, Review Complexity : High
#835 opened Apr 7, 2023 by sw Draft
ci: add linux binaries to release build
#1505 opened May 17, 2023 by Green-Sky
Penalty threshold: A mechanism for improving repetition penalties
Labels: enhancement, generation quality, Review Complexity : Medium
#5561 opened Feb 18, 2024 by p-e-w
Rebalancing Metal threads workload in dot product kernel kernel_mul_mv_f16_f32_l4
Labels: Apple Metal, Review Complexity : Medium
#7522 opened May 24, 2024 by izard
server: Windows 7 compatibility
Labels: build, examples, Review Complexity : Low, server
#8208 opened Jun 29, 2024 by Zor-X-L
2 of 4 tasks
WIP: ggml-cuda: Add bf16 cuda support to fattn (Flash Attention)
Labels: examples, ggml, Nvidia GPU, python
#15261 opened Aug 12, 2025 by eous
Smooth Sampling / Quadratic Sampling support
Labels: generation quality, performance, Review Complexity : High
#6445 opened Apr 2, 2024 by kalomaze
sycl: flash-attention implementation
Labels: ggml, SYCL, testing
#16969 opened Nov 3, 2025 by ye-NX
Quantize: specify each major tensor quant in CLI for common LLMs
Labels: demo, examples, Review Complexity : Medium
#8917 opened Aug 7, 2024 by Nexesenex Draft
2 of 4 tasks
webgpu : fix build on emscripten
Labels: build, ggml, script, testing
#15826 opened Sep 5, 2025 by ngxson Draft
cmake : set RPATH to $ORIGIN on Linux (#13740)
Labels: build
#13741 opened May 24, 2025 by sunhaitao
(draft) tts: Orpheus support
Labels: examples, ggml, python
#12487 opened Mar 21, 2025 by jsrgb Draft
llama.android : Rewrite Android binding
Labels: android, documentation, examples, ggml
#17152 opened Nov 10, 2025 by hanyin-arm