Releases · ggml-org/llama.cpp
b6475
model : add grok-2 support (#15539)
* add grok-2 support
* type fix
* type fix
* type fix
* "fix" vocab for invalid sequences
* fix expert tensor mapping and spaces in vocab
* add chat template
* fix norm tensor mapping
* rename layer_out_norm to ffn_post_norm
* ensure ffn_post_norm is mapped
* fix experts merging
* remove erroneous FFN_GATE entry
* concatenate split tensors and add more metadata
* process all expert layers and try cat instead of hstack
* add support for community BPE vocab
* fix expert feed forward length and ffn_down concat
* commit this too
* add ffn_up/gate/down, unsure if sequence is right
* add ffn_gate/down/up to tensor names
* correct residual moe (still not working)
* mess--
* fix embedding scale being applied twice
* add built in chat template
* change beta fast for grok if default value
* remove spm vocab in favor of community bpe vocab
* change attention temp length metadata type to integer
* update attention temp length metadata
* remove comment
* replace M_SQRT2 with std::sqrt(2)
* add yarn metadata, move defaults to hparams
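One portability-relevant item above is the M_SQRT2 replacement: M_SQRT2 comes from POSIX, not standard C++, so code relying on it breaks on compilers such as MSVC unless _USE_MATH_DEFINES is defined. A minimal sketch of the portable pattern (the constant and helper names are illustrative, not the commit's code):

```cpp
#include <cmath>

// M_SQRT2 is a POSIX extension; standard C++ <cmath> does not guarantee
// it (MSVC only defines it when _USE_MATH_DEFINES is set before the
// include). Computing the constant once is portable:
static const float kSqrt2 = std::sqrt(2.0f);

// illustrative helper, not from the actual commit
static inline float scale_by_sqrt2(float x) {
    return x * kSqrt2;
}
```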
b6474
server : only attempt to enable thinking if using jinja (#15967)
b6473
metal : remove memory pools (#15966)
* metal : remove mem pool usage
* metal : remove mem pool implementation
* metal : take into account the actual allocated memory of the tensor
* cont : use ggml_backend_buft_get_alloc_size
* cont : improve, comments
* cont : add functions for the extra tensor sizes
* metal : add comments
* metal : implement .get_alloc_size for the rest of the buffer types
* metal : remove ggml_metal_heap
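The functionally interesting item above is the switch to ggml_backend_buft_get_alloc_size: a buffer type may need to reserve more bytes for a tensor than ggml_nbytes() reports (padding, extra per-tensor data), so allocation sizing must ask the buffer type instead of assuming the logical size. A minimal sketch of that contract (the helper is illustrative):

```cpp
#include <cstdio>

#include "ggml.h"
#include "ggml-backend.h"

// Illustrative helper: compare a tensor's logical data size with the
// size a given buffer type will actually reserve for it.
static void print_alloc_size(ggml_backend_buffer_type_t buft, struct ggml_tensor * t) {
    const size_t logical = ggml_nbytes(t);                            // bytes of tensor data
    const size_t actual  = ggml_backend_buft_get_alloc_size(buft, t); // bytes actually reserved
    // the reserved allocation is never smaller than the logical size
    GGML_ASSERT(actual >= logical);
    printf("%s: nbytes=%zu alloc=%zu\n", t->name, logical, actual);
}
```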
b6471
Vulkan: Clean up mul_mm shader (#15987)
* vulkan: move mul_mm dequantization steps into a separate file and functions
* improve mul_mm vector load code
* fix debug mode issues and warnings
b6470
build: fix the build failures of Windows HIP release job (#15984)
* build: fix the cache keys for Windows HIP release job
  Update the cache keys to include the HIP SDK version, preventing the use of outdated ROCm installation caches.
* build: sync changes from release.yml to build.yml
  - Update HIP SDK version to 25.Q3 and ROCm version to 6.4.2
  - Update the cache keys to reflect the new versions
* build: remove Windows HIP release for gfx1151, since the current stable rocWMMA does not support gfx1151.
b6469
metal : fix kernel requirements (#15983)
* metal : fix kernel requirements
* cont : fix supports_op
* cont : fix supports_op for ARGMAX
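supports_op is how a backend advertises which operators its kernels can actually run: an over-optimistic answer routes an op to a kernel that cannot handle it, while a correct false lets the scheduler fall back to another backend (typically the CPU). An illustrative sketch of the pattern, not the Metal backend's actual checks:

```cpp
#include "ggml.h"

// Illustrative supports_op-style check (not Metal's real logic): return
// false for op/type combinations the kernels cannot handle so the
// scheduler can fall back to another backend.
static bool example_supports_op(const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_ARGMAX:
            // e.g. require a contiguous F32 input
            return op->src[0]->type == GGML_TYPE_F32 && ggml_is_contiguous(op->src[0]);
        default:
            return true;
    }
}
```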
b6451
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type
* ggml-backend : add device id to device props
* llama : only use iGPU devices if there are no GPU devices
* llama : do not use multiple devices from different backends with the same device id
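The llama-side policy here is easy to state: integrated GPUs become a distinct device type, and they are only used when no discrete GPU is present. A minimal sketch of that selection rule against the ggml-backend device enumeration API (assuming the IGPU enum value introduced by this change):

```cpp
#include <cstdio>

#include "ggml-backend.h"

// Sketch of the selection policy from the commit message: discrete GPUs
// win, and iGPU devices are only considered when no dGPU is registered.
int main() {
    bool have_dgpu = false;
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        if (ggml_backend_dev_type(ggml_backend_dev_get(i)) == GGML_BACKEND_DEVICE_TYPE_GPU) {
            have_dgpu = true;
        }
    }
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        const bool is_igpu = ggml_backend_dev_type(dev) == GGML_BACKEND_DEVICE_TYPE_IGPU;
        printf("%-24s %s\n", ggml_backend_dev_name(dev),
               (is_igpu && have_dgpu) ? "skipped (iGPU, dGPU present)" : "usable");
    }
    return 0;
}
```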
b6447
kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed (#15614)
* kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed
* removes the Whisper-specific check for GET_ROWS support
b6445
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#…
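fastdiv is the classic division-by-invariant-integer trick (Granlund & Montgomery): when every thread divides by the same runtime constant, precompute a magic multiplier and shift once, then replace each hardware divide with a multiply-high, an add, and a shift. A self-contained sketch of the idea follows; the repo's CUDA implementation differs in detail, and on the device the multiply-high maps to __umulhi:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Precomputed parameters for dividing 32-bit values by a fixed d >= 1.
struct fastdiv_t {
    uint32_t magic;
    uint32_t shift;
};

static fastdiv_t fastdiv_init(uint32_t d) {
    // shift = ceil(log2(d))
    uint32_t shift = 0;
    while ((uint64_t(1) << shift) < d) shift++;
    // magic = floor(2^32 * (2^shift - d) / d) + 1
    const uint32_t magic =
        uint32_t(((uint64_t(1) << 32) * ((uint64_t(1) << shift) - d)) / d) + 1;
    return {magic, shift};
}

static uint32_t fastdiv_div(uint32_t n, fastdiv_t fd) {
    // multiply-high replaces the divide; this is __umulhi on CUDA.
    // the add is done in 64 bits because hi + n can exceed 32 bits.
    const uint64_t hi = (uint64_t(n) * fd.magic) >> 32;
    return uint32_t((hi + n) >> fd.shift);
}

int main() {
    // self-test against the hardware divide
    for (uint32_t d : {1u, 3u, 7u, 640u, 4096u, 123457u}) {
        const fastdiv_t fd = fastdiv_init(d);
        for (uint32_t n : {0u, 1u, 2u, 999u, 1u << 20, 0xFFFFFFFFu}) {
            assert(fastdiv_div(n, fd) == n / d);
        }
    }
    printf("fastdiv ok\n");
    return 0;
}
```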
b6444
llama : support T5 models with unequal number of encoder-decoder layers
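Supporting unequal depths means the model can no longer assume a single shared layer count: the encoder and decoder each need their own block count, and every per-layer loop must use the bound for the stack it walks. A hypothetical sketch of the shape of such a change (field names are illustrative, not llama.cpp's actual hparams):

```cpp
#include <cstdint>

// Hypothetical sketch: carry separate encoder/decoder depths instead of
// one shared n_layer (names are illustrative, not llama.cpp's hparams).
struct t5_hparams {
    uint32_t n_layer_enc; // encoder blocks
    uint32_t n_layer_dec; // decoder blocks; may differ from n_layer_enc
};

// any per-layer loop must pick the right bound for the stack it builds
static void build_decoder(const t5_hparams & hp) {
    for (uint32_t il = 0; il < hp.n_layer_dec; il++) {
        // ... build one decoder block ...
    }
}
```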