Pull requests: ggml-org/llama.cpp
tokenization: no double BOS tokens [refactoring, Review Complexity : Medium]
#7107 opened May 6, 2024 by JohannesGaessler
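The title suggests guarding against the BOS token being added twice (for example once by the tokenizer and once by a prompt that already contains it). As a rough illustration only, and not the actual change in #7107, a minimal C++ sketch of such a guard could look like this (the helper name is hypothetical):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: prepend the BOS token only if the sequence does not
// already start with it, so a prompt that already includes BOS is not given
// a second one.
static void prepend_bos_once(std::vector<int32_t> & tokens, int32_t bos_id) {
    if (tokens.empty() || tokens.front() != bos_id) {
        tokens.insert(tokens.begin(), bos_id);
    }
}
```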
Fuse matrix multiplication + SiLU [performance, refactoring, Review Complexity : Medium] (Draft)
#5413 opened Feb 8, 2024 by JohannesGaessler
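For context on what fusing a matrix multiplication with SiLU means in principle, here is a minimal scalar sketch: the activation is applied as each output element is produced, instead of in a second pass over the result. This is only a conceptual illustration, not the kernel proposed in #5413.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Conceptual sketch of a fused matmul + SiLU: y = SiLU(W * x), with the
// activation applied while each output element is computed, avoiding a
// separate pass over y. W is row-major with shape [rows x cols].
static void matmul_silu_fused(const std::vector<float> & W,
                              const std::vector<float> & x,
                              std::vector<float> & y,
                              std::size_t rows, std::size_t cols) {
    y.resize(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            acc += W[r*cols + c] * x[c];
        }
        // SiLU(acc) = acc * sigmoid(acc) = acc / (1 + exp(-acc))
        y[r] = acc / (1.0f + std::exp(-acc));
    }
}
```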
Introduce Q8_0 and Q4_0 with Bf16 delta values [examples, ggml, python, Review Complexity : High, Tensor Encoding Scheme (https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes)]
#7497 opened May 23, 2024 by Srihari-mcw

vulkan: optimize mxfp4 [ggml, Vulkan]
#15363 opened Aug 16, 2025 by lovedheart

server: add support for local image path loading for server [examples, python, server]
#16874 opened Oct 30, 2025 by cchadowitz
CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D [ggml, Nvidia GPU, testing]
#17094 opened Nov 8, 2025 by AgainstEntropy
support GLM-4.5V and GLM-4.1V vision models [examples, help wanted, model, python]

SOLVE_TRI extension to more dimensions [examples, ggml, Nvidia GPU, server, testing]
#17793 opened Dec 5, 2025 by pwilkin
Penalty threshold: A mechanism for improving repetition penalties [enhancement, generation quality, Review Complexity : Medium]
#5561 opened Feb 18, 2024 by p-e-w

Rebalancing Metal threads workload in dot product kernel kernel_mul_mv_f16_f32_l4 [Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)), Review Complexity : Medium]
#7522 opened May 24, 2024 by izard

server: Windows 7 compatibility [build, examples, Review Complexity : Low, server]
#8208 opened Jun 29, 2024 by Zor-X-L (2 of 4 tasks)

model : Fix marker placement for LFM2-VL in single turn llama-mtmd-cli [examples]
#17616 opened Nov 30, 2025 by tdakhran
Smooth Sampling / Quadratic Sampling support [generation quality, performance, Review Complexity : High]
#6445 opened Apr 2, 2024 by kalomaze

WIP: ggml-cuda: Add bf16 cuda support to fattn (Flash Attention) [examples, ggml, Nvidia GPU, python]
#15261 opened Aug 12, 2025 by eous

Quantize: specify each major tensor quant in CLI for common LLMs [demo, examples, Review Complexity : Medium]
[Perf] [CPU] eliminate redundant memory access in group query attention [ggml]
#13319 opened May 5, 2025 by ZelinMa557

P-Step Truncation Sampling [generation quality, need feedback, refactoring, Review Complexity : High]
#5675 opened Feb 23, 2024 by p-e-w

cuda : use amd wave sharing intrinsics for warp_reduce functions [performance, Review Complexity : High]
#6522 opened Apr 7, 2024 by Engininja2

ggml : fix race-condition in ggml-rpc [ggml]
#13600 opened May 17, 2025 by gkpln3