Pull requests: ggml-org/llama.cpp
Pull requests list
Add PaliGemma Support
examples
ggml
changes relating to the ggml tensor library for machine learning
Review Complexity : Low
Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
#7553
opened May 27, 2024 by
abetlen
server : separate the notion of position and KV tokens, remove prompt truncation
breaking change
Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility.
examples
python
python script changes
server
#13576
opened May 15, 2025 by
ngxson
feat: add changes to handle jina v2 chinese code
python
python script changes
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
#7795
opened Jun 6, 2024 by
JoanFM
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization
examples
ggml
changes relating to the ggml tensor library for machine learning
#16653
opened Oct 18, 2025 by
JohannesGaessler
llama : adds llama-grammar memoization stacks (#4218)
examples
testing
Everything test related
#9833
opened Oct 11, 2024 by
clarismiranda
2 of 4 tasks
Update server.cpp example with correct startup sequence
examples
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
CPUSet support for Windows and Linux
bugfix
fixes an issue or bug
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
#6832
opened Apr 22, 2024 by
mann1x
--numa mirror: mirror model weights to every Numa node in the system
Apple Metal
Implement automatic NGL detection
enhancement
New feature or request
need feedback
Testing and feedback with results are needed
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
Add complete Megrez-MoE support: GGUF conversion + inference.
model
Model specific
python
python script changes
#17141
opened Nov 10, 2025 by
tamarPal
llamafile : improve moe prompt eval speed on cpu
enhancement
New feature or request
ggml
changes relating to the ggml tensor library for machine learning
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
#6840
opened Apr 23, 2024 by
jart
contrastive: PoC for improving reasoning via contrastive decoding
#3984
opened Nov 8, 2023 by
trabbart
Apple NPU acceleration integrated into llama.cpp, using MiniCPM-V 4.0 as an example.
examples
python
python script changes
#15262
opened Aug 12, 2025 by
tc-mb
Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications
ggml
changes relating to the ggml tensor library for machine learning
#12727
opened Apr 3, 2025 by
bartowski1182
ggml-quants : weighted rounding algorithms with cumulative search
generation quality
Quality of model output
ggml
changes relating to the ggml tensor library for machine learning
Less than 4 bits
Efforts related to viable quantized models using <4 bits
research 🔬
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
Tensor Encoding Scheme
https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes
CPU SIMD and pipeline optimizations across vec/mmq/ops/kv-cache/repack
ggml
changes relating to the ggml tensor library for machine learning
#17113
opened Nov 8, 2025 by
NoahOksuz
Add basic support for function calls in oai python server
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
server/api
convert : write tensors in parallel
performance
Speed related topics
python
python script changes
#12837
opened Apr 8, 2025 by
compilade
3 of 6 tasks
Mamba2 SSD
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
examples
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
Nvidia GPU
Issues specific to Nvidia GPUs
testing
Everything test related
#16982
opened Nov 3, 2025 by
gabe-l-hart
Draft
PHI3-vision gguf conversion
examples
ggml
changes relating to the ggml tensor library for machine learning
python
python script changes
Review Complexity : Low
Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
#7705
opened Jun 3, 2024 by
farris
Optimize locking behavior
threading
Parallel processing and thread management
#813
opened Apr 6, 2023 by
janekb04