Tags: psugihara/llama.cpp

b2360

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama-bench : add embeddings option (ggml-org#5924)

* llama-bench : add embeddings option

* llama-bench : do not hard code embd default value

---------

Co-authored-by: slaren

b2167

Verified

cmake : fix VULKAN and ROCm builds (ggml-org#5525)

* cmake : fix VULKAN and ROCm builds

* cmake : fix (cont)

* vulkan : fix compile warnings

ggml-ci

* cmake : fix

ggml-ci

* cmake : minor

ggml-ci

b2042

Verified

add --no-mmap in llama-bench (ggml-org#5257)

* add --no-mmap, show sycl backend

* fix conflict

* fix code format, change print for --no-mmap

* ren no_mmap to mmap, show mmap when not default value in printer

* update guide for mmap

* mv position to reduce model reload

b1963

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
nix-shell: use addToSearchPath

thx to @SomeoneSerge for the suggestion!

b1794

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

b1767

Verified

server : send token probs for "stream == false" (ggml-org#4714)

b1761

Verified

train : fix typo in overlapping-samples help msg (ggml-org#4758)

This commit fixes a typo in the help message for the
--overlapping-samples option.

Signed-off-by: Daniel Bevenius

b1759

cuda : simplify expression

Co-authored-by: slaren

b1709

Verified

gpt2 : Add gpt2 architecture integration (ggml-org#4555)

b1708

Verified

llama : add AWQ for llama, llama2, mpt, and mistral models (ggml-org#4593)

* update: awq support llama-7b model

* update: change order

* update: benchmark results for llama2-7b

* update: mistral 7b v1 benchmark

* update: support 4 models

* fix: Readme

* update: ready for PR

* update: readme

* fix: readme

* update: change order import

* black

* format code

* update: work for both mpt and awqmpt

* update: readme

* Rename to llm_build_ffn_mpt_awq

* Formatted other files

* Fixed params count

* fix: remove code

* update: more detail for mpt

* fix: readme

* fix: readme

* update: change folder architecture

* fix: common.cpp

* fix: readme

* fix: remove ggml_repeat

* update: cicd

* update: cicd

* update: remove use_awq arg

* update: readme

* llama : adapt plamo to new ffn

ggml-ci

---------

Co-authored-by: Trần Đức Nam
Co-authored-by: Le Hoang Anh
Co-authored-by: Georgi Gerganov