
Releases: ggml-org/llama.cpp

b6475 (14 Sep 21:46, commit b8e09f0)

model : add grok-2 support (#15539)

* add grok-2 support

* type fix

* type fix

* type fix

* "fix" vocab for invalid sequences

* fix expert tensor mapping and spaces in vocab

* add chat template

* fix norm tensor mapping

* rename layer_out_norm to ffn_post_norm

* ensure ffn_post_norm is mapped

* fix experts merging

* remove erroneous FFN_GATE entry

* concatenate split tensors and add more metadata

* process all expert layers and try cat instead of hstack

* add support for community BPE vocab

* fix expert feed forward length and ffn_down concat

* commit this too

* add ffn_up/gate/down, unsure if sequence is right

* add ffn_gate/down/up to tensor names

* correct residual moe (still not working)

* mess--

* fix embedding scale being applied twice

* add built in chat template

* change beta fast for grok if default value

* remove spm vocab in favor of community bpe vocab

* change attention temp length metadata type to integer

* update attention temp length metadata

* remove comment

* replace M_SQRT2 with std::sqrt(2)

* add yarn metadata, move defaults to hparams

b6474 (14 Sep 20:48, commit 6c019cb)

server : only attempt to enable thinking if using jinja (#15967)

b6473 (14 Sep 20:24, commit 9dcd200)

metal : remove memory pools (#15966)

* metal : remove mem pool usage

ggml-ci

* metal : remove mem pool implementation

ggml-ci

* metal : take into account the actual allocated memory of the tensor

ggml-ci

* cont : use ggml_backend_buft_get_alloc_size

ggml-ci

* cont : improve, comments

ggml-ci

* cont : add functions for the extra tensor sizes

* metal : add comments

ggml-ci

* metal : implement .get_alloc_size for the rest of the buffer types

ggml-ci

* metal : remove ggml_metal_heap

ggml-ci

b6471 (14 Sep 15:55, commit 261e6a2)

Vulkan: Clean up mul_mm shader (#15987)

* vulkan: move mul_mm dequantization steps into a separate file and functions

* improve mul_mm vector load code

* fix debug mode issues and warnings

b6470 (14 Sep 15:54, commit a0e13dc)

build: fix the build failures of Windows HIP release job (#15984)

* build: fix the cache keys for Windows HIP release job

Update the cache keys to include the HIP SDK version, preventing the
use of outdated ROCm installation caches.

* build: sync changes from release.yml to build.yml

- Update HIP SDK version to 25.Q3 and ROCm version to 6.4.2
- Update the cache keys to reflect the new versions

* build: remove Windows HIP release for gfx1151
since the current stable rocWMMA does not support gfx1151.

b6469 (14 Sep 14:40, commit a14bd35)

metal : fix kernel requirements (#15983)

* metal : fix kernel requirements

ggml-ci

* cont : fix supports_op

* cont : fix supports_op for ARGMAX

b6451 (11 Sep 21:38, commit 360d653)

ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)

* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type

ggml-backend : add device id to device props

llama : only use iGPU devices if there are no GPU devices

llama : do not use multiple devices from different backends with the same device id

b6447 (11 Sep 11:15, commit 2b3efea)

kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed (#15614)

* kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed

* removes the Whisper-specific check for GET_ROWS support

b6445 (10 Sep 20:54, commit 00681df)

CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#…

b6444 (10 Sep 19:26, commit 4f65885)

llama : support T5 models with unequal number of encoder-decoder layers