Tags: zhudy/llama.cpp

b5124

common : Define cache directory on AIX (ggml-org#12915)

b4857

authors : update (ggml-org#12271)

b2382

server: benchmark: chat/completions scenario and other llm servers comparison (ggml-org#5941)

* server: bench: Init a bench scenario with K6
See ggml-org#5827

* server: bench: EOL EOF

* server: bench: PR feedback and improved k6 script configuration

* server: bench: remove llamacpp_completions_tokens_seconds as it includes prompt processing time and is misleading

server: bench: add max_tokens from SERVER_BENCH_MAX_TOKENS

server: bench: increase truncated rate to 80% before failing

* server: bench: fix doc

* server: bench: change gauge custom metrics to trend

* server: bench: change gauge custom metrics to trend
server: bench: add trend custom metrics for total tokens per second average

* server: bench: doc add an option to debug http request

* server: bench: filter dataset too short and too long sequences

* server: bench: allow filtering out conversations in the dataset based on an env variable

* server: bench: fix assistant message sent instead of user message

* server: bench: fix assistant message sent instead of user message

* server : add defrag thold parameter

* server: bench: select prompts based on the current iteration id, not randomly, to make the bench more reproducible

---------

Co-authored-by: Georgi Gerganov <[email protected]>
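
For a sense of what such a k6 scenario looks like in practice, here is a minimal sketch of a chat/completions benchmark script. It is an illustration under assumptions, not the script added by ggml-org#5941: the server address, prompt, metric name, and the SERVER_BENCH_MAX_TOKENS handling below are placeholders.

```ts
// Minimal, illustrative k6 script in the spirit of the benchmark above.
// Assumptions: server at http://localhost:8080 with the OpenAI-style route,
// a hard-coded prompt, and an illustrative custom metric name.
import http from 'k6/http';
import { check } from 'k6';
import { Trend } from 'k6/metrics';

// k6 exposes environment variables through the __ENV global.
declare const __ENV: Record<string, string | undefined>;

// Trend (not gauge) so k6 reports averages/percentiles across requests.
const completionTokens = new Trend('llamacpp_completion_tokens');

// Cap generation length, mirroring the SERVER_BENCH_MAX_TOKENS knob.
const maxTokens = parseInt(__ENV.SERVER_BENCH_MAX_TOKENS ?? '512', 10);

export const options = { vus: 8, duration: '1m' };

export default function () {
  const res = http.post(
    'http://localhost:8080/v1/chat/completions',
    JSON.stringify({
      model: 'default',
      max_tokens: maxTokens,
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Write a short poem about llamas.' },
      ],
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );

  check(res, { 'status is 200': (r) => r.status === 200 });

  // Record completion token count per request as a trend sample.
  const body = JSON.parse(res.body as string);
  if (body?.usage?.completion_tokens !== undefined) {
    completionTokens.add(body.usage.completion_tokens);
  }
}
```

Switching the custom metrics from gauges to trends, as the commits above describe, lets k6 report min/avg/percentiles over per-request token counts instead of only the last observed value.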

b1569

lookahead : support `-n -1` infinite generation

b1567

lookahead : add example for lookahead decoding (ggml-org#4207)

* lookahead : init

* lookahead : generate and store n-grams

* lookahead : use a loop instead of recursion to generate n-grams

* lookahead : initial working implementation

* lookahead : filter repeating n-grams

* lookahead : use deterministic init

* lookahead : add to Makefile

* lookahead : fix a bug in the seq_id of the lookahead tokens

* lookahead : add comments

---------

Co-authored-by: slaren <[email protected]>

b1566

metal : fix yarn (ggml-org#4220)

get the correct n_orig_ctx in metal

b1564

llama : grammar `reserve` space in `decode_utf8` (ggml-org#4210)

* reserve space for codepoints

* improvement for the appended 0

b1563

Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (ggml-org#4189)

b1561

server : OAI API compatibility (ggml-org#4198)

* Add openai-compatible POST /v1/chat/completions API endpoint to server example

* fix code style

* Update server README.md

* Improve server README.md

* Fix server.cpp code style according to review

* server : some style changes

* server : indentation

* server : enable special tokens during tokenization by default

* server : minor code style

* server : change random string generator

* straightforward /v1/models endpoint

---------

Co-authored-by: kir-gadjello <[email protected]>
Co-authored-by: Tobi Lütke <[email protected]>
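
As a quick illustration of the endpoints this change introduces, the sketch below lists the available models and sends a chat completion request. The route names come from the commit list; the host, port, model name, and response handling are assumptions for the example.

```ts
// Minimal sketch of calling the OpenAI-compatible endpoints described above.
// Assumptions: server at http://localhost:8080; model name and response shape
// handling are illustrative, not taken from the commit.
const BASE_URL = 'http://localhost:8080';

async function main(): Promise<void> {
  // The "straightforward /v1/models endpoint" from the commit list.
  const models = await fetch(`${BASE_URL}/v1/models`).then((r) => r.json());
  console.log('models:', models);

  // OpenAI-style POST /v1/chat/completions request.
  const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'default',
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Hello!' },
      ],
    }),
  });

  const data = await res.json();
  console.log(data.choices?.[0]?.message?.content);
}

main().catch(console.error);
```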

b1560

llama : set metal log callback correctly (ggml-org#4204)