## Overview

This is a list of changes to the public interface of the `llama` library. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the `master` branch.

If you are building a 3rd party project that relies on `libllama`, it is recommended to follow this issue and check it before upgrading to new versions.

See also:

- `llama-server` REST API

## Recent API changes (most recent at the top)

| version | PR | description |
| --- | --- | --- |
| TBD | #20346 | Update `llama_model_quantize_params` |
| b8049 | #19280 | Update `llama_*_adapter_lora()` API |
| b7672 | #18390 | Memory margin per device for `llama_params_fit` |
| b7668 | #18166 | Update `llama_model_params` - new `use_direct_io` flag |
| b7639 | #18607 | Add `llama_model_n_embd_out` |
| b7628 | #17004 | Add backend sampling API |
| b7551 | #18374 | Return an enum instead of a boolean to indicate the result of `llama_params_fit` |
| b7407 | #16653 | Add flag `no_alloc` to `llama_model_param`, no change with `llama_model_default_params` |
| b6976 | #16928 | Add `llama_model_n_embd_inp` |
| TBD | #15665 | Remove `llama_sampler_init_softmax()` + `dist` sampler no longer implicitly sorts |
| b6239 | #15472 | Remove `llama_kv_self_...` API |
| b6157 | #15293 | Add `llama_state_seq_..._ext` API |
| b5913 | #14363 | Update `llama_context_params` - add `bool kv_unified` |
| b5740 | #13037 | Update `llama_model_quantize_params` |
| b5870 | #14631 | Remove `enum llama_vocab_pre_type` |
| b5435 | #13653 | Remove `llama_kv_cache_view_*` API |
| b5429 | #13194 | Update `llama_context_params` - add `bool swa_full` |
| b5311 | #13284 | Update `llama_context_params` - remove `logits_all` + rearrange flags |
| b5125 | #12511 | Update `llama_model_quantize_params` |
| b5028 | #11397 | Update `llama_model_params` |
| b4882 | #12181 | Change `llama_kv_cache_...` -> `llama_kv_self_...` |
| b4599 | #9639 | Add `llama_sampler_init_grammar_lazy` to support lazy grammars w/ trigger words & tokens |
| b4524 | #11016 | Add `name` parameter to `llama_model_chat_template` (uses default template if NULL) |
| b4501 | #11262 | Remove `rpc_servers` from `llama_model` and `llama_model_params` |
| b4464 | #11110 | Add `llama_vocab` and rename various structs and calls |
| b4424 | #11063 | Update `llama_model` API naming |
| b4357 | #10784 | Remove `llama_model_get_tensor()` |
| b4337 | #10803 | Change `llama_sampler_init_penalties()` |
| b4282 | #10446 | Remove support for `Q4_0_N_M` model files in favor of automatic repacking of `Q4_0` |
| b4167 | #10497 | Add `devices` to `llama_model_params` |
| b3948 | #9897 | Deprecate `softmax` sampler and update `dist` sampler |
| b3988 | #10071 | Remove Tail-Free sampling |
| b3943 | #9745 | Remove `all_pos_0`, `all_pos_1`, `all_seq_id` from `llama_batch` |
| b3908 | #9798 | Update FIM-related API |
| b3841 | #9510 | Add `LLAMA_POOLING_TYPE_RANK` |
| b3774 | #9512 | Add `llama_n_head()` |
| b3750 | #9355 | Add `llama_perf` API + param to disable internal profiling |
| b3749 | #9445 | Add `llama_sampler_chain_remove()` |
| b3681 | #9294 | Major changes to the sampling API (see PR for more info) |
| b3651 | #8980 | Add `LLAMA_VOCAB_TYPE_RWKV` enum value |
| b3644 | #8672 | Add `llama_threadpool` API + change `uint32_t` -> `int32_t` |
| b3614 | #8526 | Add `llama_model_is_recurrent` |

For older changes, use:

(For collaborators) To link between PR number vs Build number:

## Upcoming API changes