Pull requests: ggml-org/llama.cpp

Pull requests list

graph : reuse SSM graphs
#16490 opened Oct 9, 2025 by ggerganov
Add PaliGemma Support
Labels: examples, ggml, Review Complexity : Low
#7553 opened May 27, 2024 by abetlen
server : separate the notion of position and KV tokens, remove prompt truncation
Labels: breaking change, examples, python, server
#13576 opened May 15, 2025 by ngxson
feat: add changes to handle jina v2 chinese code
Labels: python, Review Complexity : Medium
#7795 opened Jun 6, 2024 by JoanFM
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization
Labels: examples, ggml
#16653 opened Oct 18, 2025 by JohannesGaessler
log.h improvements
Labels: obsolete?
#3219 opened Sep 16, 2023 by staviq Draft
llama : adds llama-grammar memoization stacks (#4218)
Labels: examples, testing
#9833 opened Oct 11, 2024 by clarismiranda
2 of 4 tasks
Update server.cpp example with correct startup sequence
Labels: examples, Review Complexity : Medium
#6739 opened Apr 18, 2024 by mann1x Draft
CPUSet support for Windows and Linux
Labels: bugfix, Review Complexity : Medium
#6832 opened Apr 22, 2024 by mann1x
--numa mirror: mirror model weights to every Numa node in the system
Labels: Apple Metal, Ascend NPU, devops, examples, ggml, IBM zDNN, Nvidia GPU, OpenCL, python, SYCL, testing, Vulkan
#16000 opened Sep 15, 2025 by dbsanfte Draft
Implement automatic NGL detection
Labels: enhancement, need feedback, Review Complexity : Medium
#6502 opened Apr 5, 2024 by SleepyYui Draft
server: implement GLM-style MTP
Labels: examples, hot, server
#15225 opened Aug 11, 2025 by F1LM1 Draft
Add complete Megrez-MoE support: GGUF conversion + inference.
Labels: model, python
#17141 opened Nov 10, 2025 by tamarPal
llama : first attempt to implement vision API (WIP)
Labels: examples, python
#9687 opened Sep 29, 2024 by ngxson Draft
4 of 7 tasks
llamafile : improve moe prompt eval speed on cpu
Labels: enhancement, ggml, Review Complexity : Medium
#6840 opened Apr 23, 2024 by jart
Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications
Labels: ggml
#12727 opened Apr 3, 2025 by bartowski1182
ggml-quants : weighted rounding algorithms with cumulative search
Labels: generation quality, ggml, Less than 4 bits, research 🔬, Review Complexity : Medium, Tensor Encoding Scheme
#12557 opened Mar 25, 2025 by compilade Draft
CPU SIMD and pipeline optimizations across vec/mmq/ops/kv-cache/repack
Labels: ggml
#17113 opened Nov 8, 2025 by NoahOksuz
Add basic support for function calls in oai python server
Labels: Review Complexity : Medium, server/api
#3431 opened Oct 1, 2023 by xaedes Draft
convert : write tensors in parallel
Labels: performance, python
#12837 opened Apr 8, 2025 by compilade
3 of 6 tasks
Mamba2 SSD
Labels: Apple Metal, examples, ggml, model, Nvidia GPU, testing
#16982 opened Nov 3, 2025 by gabe-l-hart Draft
PHI3-vision gguf conversion
Labels: examples, ggml, python, Review Complexity : Low
#7705 opened Jun 3, 2024 by farris
Optimize locking behavior
Labels: threading
#813 opened Apr 6, 2023 by janekb04