Tags: psugihara/llama.cpp
llama-bench : add embeddings option (ggml-org#5924)

* llama-bench : add embeddings option
* llama-bench : do not hard code embd default value

---------

Co-authored-by: slaren
add --no-mmap in llama-bench (ggml-org#5257)

* add --no-mmap, show sycl backend
* fix conflict
* fix code format, change print for --no-mmap
* ren no_mmap to mmap, show mmap when not default value in printer
* update guide for mmap
* mv position to reduce model reload
common : fix the short form of `--grp-attn-w`, not `-gat` (ggml-org#4825)

See https://github.com/ggerganov/llama.cpp/blob/master/common/common.cpp#L230C53-L230C57
train : fix typo in overlapping-samples help msg (ggml-org#4758)

This commit fixes a typo in the help message for the --overlapping-samples option.

Signed-off-by: Daniel Bevenius
llama : add AWQ for llama, llama2, mpt, and mistral models (ggml-org#4593)

* update: awq support llama-7b model
* update: change order
* update: benchmark results for llama2-7b
* update: mistral 7b v1 benchmark
* update: support 4 models
* fix: Readme
* update: ready for PR
* update: readme
* fix: readme
* update: change order import
* black
* format code
* update: work for bot mpt and awqmpt
* update: readme
* Rename to llm_build_ffn_mpt_awq
* Formatted other files
* Fixed params count
* fix: remove code
* update: more detail for mpt
* fix: readme
* fix: readme
* update: change folder architecture
* fix: common.cpp
* fix: readme
* fix: remove ggml_repeat
* update: cicd
* update: cicd
* uppdate: remove use_awq arg
* update: readme
* llama : adapt plamo to new ffn

ggml-ci

---------

Co-authored-by: Trần Đức Nam
Co-authored-by: Le Hoang Anh
Co-authored-by: Georgi Gerganov