Pull requests: ggml-org/llama.cpp

tokenization: no double BOS tokens (labels: refactoring, Review Complexity : Medium)
#7107 opened May 6, 2024 by JohannesGaessler
Server: add support for "tool_calls" (MeetKai/functionary model) (labels: demo)
#5695 opened Feb 23, 2024 by ngxson (draft)
Fix locale-dependent float printing in GGUF metadata (labels: examples, server, SYCL)
#17331 opened Nov 17, 2025 by ssam18
Fuse matrix multiplication + SiLU (labels: performance, refactoring, Review Complexity : Medium)
#5413 opened Feb 8, 2024 by JohannesGaessler (draft)
Introduce Q8_0 and Q4_0 with Bf16 delta values (labels: examples, ggml, python, Review Complexity : High, Tensor Encoding Scheme)
#7497 opened May 23, 2024 by Srihari-mcw
vulkan: optimize mxfp4 (labels: ggml, Vulkan)
#15363 opened Aug 16, 2025 by lovedheart
CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D (labels: ggml, Nvidia GPU, testing)
#17094 opened Nov 8, 2025 by AgainstEntropy
support GLM-4.5V and GLM-4.1V vision models (labels: examples, help wanted, model, python)
#16600 opened Oct 15, 2025 by ddh0 (draft)
SOLVE_TRI extension to more dimensions (labels: examples, ggml, Nvidia GPU, server, testing)
#17793 opened Dec 5, 2025 by pwilkin
Penalty threshold: A mechanism for improving repetition penalties (labels: enhancement, generation quality, Review Complexity : Medium)
#5561 opened Feb 18, 2024 by p-e-w
Rebalancing Metal threads workload in dot product kernel kernel_mul_mv_f16_f32_l4 (labels: Apple Metal, Review Complexity : Medium)
#7522 opened May 24, 2024 by izard
ci: add linux binaries to release build
#1505 opened May 17, 2023 by Green-Sky
server: Windows 7 compatibility (labels: build, examples, Review Complexity : Low, server)
#8208 opened Jun 29, 2024 by Zor-X-L, 2 of 4 tasks
Smooth Sampling / Quadratic Sampling support (labels: generation quality, performance, Review Complexity : High)
#6445 opened Apr 2, 2024 by kalomaze
WIP: ggml-cuda: Add bf16 cuda support to fattn (Flash Attention) (labels: examples, ggml, Nvidia GPU, python)
#15261 opened Aug 12, 2025 by eous
sycl: flash-attention implementation (labels: ggml, SYCL, testing)
#16969 opened Nov 3, 2025 by ye-NX
webgpu : fix build on emscripten (labels: build, ggml, script, testing)
#15826 opened Sep 5, 2025 by ngxson (draft)
Quantize: specify each major tensor quant in CLI for common LLMs (labels: demo, examples, Review Complexity : Medium)
#8917 opened Aug 7, 2024 by Nexesenex (draft), 2 of 4 tasks
[Perf] [CPU] eliminate redundant memory access in group query attention (labels: ggml)
#13319 opened May 5, 2025 by ZelinMa557
P-Step Truncation Sampling (labels: generation quality, need feedback, refactoring, Review Complexity : High)
#5675 opened Feb 23, 2024 by p-e-w
cuda : use amd wave sharing intrinsics for warp_reduce functions (labels: performance, Review Complexity : High)
#6522 opened Apr 7, 2024 by Engininja2
ggml : fix race-condition in ggml-rpc (labels: ggml)
#13600 opened May 17, 2025 by gkpln3