Pull requests: ggml-org/llama.cpp
Pull requests list
llama: automatically set runtime parameters such as --n-gpu-layers to fit VRAM
ggml
changes relating to the ggml tensor library for machine learning
#14067
opened Jun 8, 2025 by
JohannesGaessler
Draft
CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
testing
Everything test related
#17094
opened Nov 8, 2025 by
AgainstEntropy
server: add support for local image path loading for server
examples
python
python script changes
server
#16874
opened Oct 30, 2025 by
cchadowitz
Fuse matrix multiplication + SiLU
performance
Speed related topics
refactoring
Refactoring
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
#5413
opened Feb 8, 2024 by
JohannesGaessler
Draft
Introduce Q8_0 and Q4_0 with Bf16 delta values
examples
ggml
changes relating to the ggml tensor library for machine learning
python
python script changes
Review Complexity : High
Generally require indepth knowledge of LLMs or GPUs
Tensor Encoding Scheme
https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes
#7497
opened May 23, 2024 by
Srihari-mcw
vulkan: optimize mxfp4
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#15363
opened Aug 16, 2025 by
lovedheart
SOLVE_TRI extension to more dimensions
examples
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
server
testing
Everything test related
#17793
opened Dec 5, 2025 by
pwilkin
support GLM-4.5V and GLM-4.1V vision models
examples
help wanted
Needs help from the community
model
Model specific
python
python script changes
model : Fix marker placement for LFM2-VL in single turn llama-mtmd-cli
examples
#17616
opened Nov 30, 2025 by
tdakhran
Q4_0 scale selection using RMSE
enhancement
New feature or request
Less than 4 bits
Efforts related to viable quantized models using <4 bits
research 🔬
Review Complexity : High
Generally require indepth knowledge of LLMs or GPUs
Penalty threshold: A mechanism for improving repetition penalties
enhancement
New feature or request
generation quality
Quality of model output
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
#5561
opened Feb 18, 2024 by
p-e-w
Rebalancing Metal threads workload in dot product kernel kernel_mul_mv_f16_f32_l4
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
#7522
opened May 24, 2024 by
izard
server: Windows 7 compatibility
build
Compilation issues
examples
Review Complexity : Low
Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
server
#8208
opened Jun 29, 2024 by
Zor-X-L
2 of 4 tasks
WIP: ggml-cuda: Add bf16 cuda support to fattn (Flash Attention)
examples
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
python
python script changes
#15261
opened Aug 12, 2025 by
eous
Smooth Sampling / Quadratic Sampling support
generation quality
Quality of model output
performance
Speed related topics
Review Complexity : High
Generally require indepth knowledge of LLMs or GPUs
#6445
opened Apr 2, 2024 by
kalomaze
Quantize: specify each major tensor quant in CLI for common LLMs
demo
Demonstrate some concept or idea, not intended to be merged
examples
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
cmake : set RPATH to $ORIGIN on Linux (#13740)
build
Compilation issues
#13741
opened May 24, 2025 by
sunhaitao
llama.android : Rewrite Android binding
android
Issues specific to Android
documentation
Improvements or additions to documentation
examples
ggml
changes relating to the ggml tensor library for machine learning
#17152
opened Nov 10, 2025 by
hanyin-arm