## Overview

This is a list of changes to the public interface of the `llama` library. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the `master` branch.

If you are building a 3rd party project that relies on `libllama`, it is recommended to follow this issue and check it before upgrading to new versions.

See also:

- `llama-server` REST API

## Recent API changes (most recent at the top)

| version | PR | description |
| --- | --- | --- |
| TBD | #20346 | Update `llama_model_quantize_params` |
| b8049 | #19280 | Update `llama_*_adapter_lora()` API |
| b7672 | #18390 | Memory margin per device for `llama_params_fit` |
| b7668 | #18166 | Update `llama_model_params` - new `use_direct_io` flag |
| b7639 | #18607 | Add `llama_model_n_embd_out` |
| b7628 | #17004 | Add backend sampling API |
| b7551 | #18374 | Return an enum instead of a boolean to indicate the result of `llama_params_fit` |
| b7407 | #16653 | Add flag `no_alloc` to `llama_model_param`, no change with `llama_model_default_params` |
| b6976 | #16928 | Add `llama_model_n_embd_inp` |
| TBD | #15665 | Remove `llama_sampler_init_softmax()` + `dist` sampler no longer implicitly sorts |
| b6239 | #15472 | Remove `llama_kv_self_...` API |
| b6157 | #15293 | Add `llama_state_seq_..._ext` API |
| b5913 | #14363 | Update `llama_context_params` - add `bool kv_unified` |
| b5740 | #13037 | Update `llama_model_quantize_params` |
| b5870 | #14631 | Remove `enum llama_vocab_pre_type` |
| b5435 | #13653 | Remove `llama_kv_cache_view_*` API |
| b5429 | #13194 | Update `llama_context_params` - add `bool swa_full` |
| b5311 | #13284 | Update `llama_context_params` - remove `logits_all` + rearrange flags |
| b5125 | #12511 | Update `llama_model_quantize_params` |
| b5028 | #11397 | Update `llama_model_params` |
| b4882 | #12181 | Change `llama_kv_cache_...` -> `llama_kv_self_...` |
| b4599 | #9639 | Add `llama_sampler_init_grammar_lazy` to support lazy grammars w/ trigger words & tokens |
| b4524 | #11016 | Add `name` parameter to `llama_model_chat_template` (uses default template if NULL) |
| b4501 | #11262 | Remove `rpc_servers` from `llama_model` and `llama_model_params` |
| b4464 | #11110 | Add `llama_vocab` and rename various structs and calls |
| b4424 | #11063 | Update `llama_model` API naming |
| b4357 | #10784 | Remove `llama_model_get_tensor()` |
| b4337 | #10803 | Change `llama_sampler_init_penalties()` |
| b4282 | #10446 | Remove support for `Q4_0_N_M` model files in favor of automatic repacking of `Q4_0` |
| b4167 | #10497 | Add `devices` to `llama_model_params` |
| b3948 | #9897 | Deprecate `softmax` sampler and update `dist` sampler |
| b3988 | #10071 | Remove Tail-Free sampling |
| b3943 | #9745 | Remove `all_pos_0`, `all_pos_1`, `all_seq_id` from `llama_batch` |
| b3908 | #9798 | Update FIM-related API |
| b3841 | #9510 | Add `LLAMA_POOLING_TYPE_RANK` |
| b3774 | #9512 | Add `llama_n_head()` |
| b3750 | #9355 | Add `llama_perf` API + param to disable internal profiling |
| b3749 | #9445 | Add `llama_sampler_chain_remove()` |
| b3681 | #9294 | Major changes to the sampling API (see PR for more info) |
| b3651 | #8980 | Add `LLAMA_VOCAB_TYPE_RWKV` enum value |
| b3644 | #8672 | Add `llama_threadpool` API + change `uint32_t` -> `int32_t` |
| b3614 | #8526 | Add `llama_model_is_recurrent` |

For older changes, use:

(For collaborators) To link between PR number vs Build number:

## Upcoming API changes