GGUF #2398
Merged
253 commits
6873148 gguf : first API pass (ggerganov)
8d6acfe gguf : read header + meta data (ggerganov)
d91b985 gguf : read tensor info (ggerganov)
78b226a gguf : initial model loading - not tested (ggerganov)
860c9c6 gguf : add gguf_get_tensor_name() (ggerganov)
cb871fa gguf : do not support passing existing ggml_context to gguf_init (ggerganov)
d313c0f gguf : simplify gguf_get_val (ggerganov)
e46870f gguf : gguf.c is now part of ggml.c (ggerganov)
5628ec7 gguf : read / write sample models (ggerganov)
d8491fc gguf : add comments (ggerganov)
c85d317 refactor : reduce code duplication and better API (#2415) (monatis)
d89533d gguf : expose the gguf_type enum through the API for now (ggerganov)
d2b6ca1 gguf : add array support (ggerganov)
158be8f gguf.py : some code style changes (ggerganov)
68f5348 convert.py : start a new simplified implementation by removing old stuff (ggerganov)
d2bb3ac convert.py : remove GGML vocab + other obsolete stuff (ggerganov)
11ef380 GGUF : write tensor (#2426) (monatis)
3492f84 gguf : add gguf_find_key (#2438) (klosax)
1495735 gguf : fix writing tensors (monatis)
9475cdb Merge branch 'gguf-write-tokenization' into gguf (monatis)
08dc8fd gguf : do not hardcode tensor names to read (monatis)
06f423a gguf : write sample tensors to read (monatis)
d54f53c gguf : add tokenization constants (monatis)
999431c quick and dirty conversion example (klosax)
ea5f9ad gguf : fix writing gguf arrays (monatis)
aa99562 Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf (monatis)
93f7f7a gguf : write tensors one by one and code reuse (monatis)
0c219fb gguf : fix writing gguf arrays (monatis)
c861e23 gguf : write tensors one by one (monatis)
8a76dd8 gguf : write tensors one by one (monatis)
cc3dd7f gguf : write tokenizer data (monatis)
0317c41 gguf : upd gguf conversion script (monatis)
8ad7cd4 Update convert-llama-h5-to-gguf.py (klosax)
0f5e57f gguf : handle already encoded string (monatis)
34469b9 ggml.h : get array str and f32 (klosax)
2c22e3b ggml.c : get arr str and f32 (klosax)
9577821 gguf.py : support any type (klosax)
06c3e4a Update convert-llama-h5-to-gguf.py (klosax)
32e037f gguf : fix set is not subscriptable (monatis)
87c34e4 gguf : update convert-llama-h5-to-gguf.py (monatis)
0790c12 constants.py : add layer norm eps (klosax)
ccd81a7 gguf.py : add layer norm eps and merges (klosax)
b4676ee ggml.h : increase GGML_MAX_NAME to 64 (klosax)
b19c117 ggml.c : add gguf_get_arr_n (klosax)
4ed98bf Update convert-llama-h5-to-gguf.py (klosax)
e9192b0 add gptneox gguf example (klosax)
f175b05 Makefile : add gptneox gguf example (klosax)
2fabc17 Update convert-llama-h5-to-gguf.py (klosax)
30c4ea4 add gptneox gguf example (klosax)
068a8e0 Update convert-llama-h5-to-gguf.py (klosax)
2a09146 Update convert-gptneox-h5-to-gguf.py (klosax)
4f5b622 Update convert-gptneox-h5-to-gguf.py (klosax)
6b3a7b9 Update convert-llama-h5-to-gguf.py (klosax)
7aa0a0e gguf : support custom alignment value (monatis)
b26f5b2 gguf : fix typo in function call (monatis)
bb42aef gguf : mmap tensor data example (monatis)
f3de876 fix : update convert-llama-h5-to-gguf.py (monatis)
da4900e Update convert-llama-h5-to-gguf.py (klosax)
e7a7416 convert-gptneox-h5-to-gguf.py : Special tokens (klosax)
c77fabb gptneox-main.cpp : special tokens (klosax)
36a36c3 Update gptneox-main.cpp (klosax)
ff1cb02 constants.py : special tokens (klosax)
49380a2 gguf.py : accumulate kv and tensor info data + special tokens (klosax)
1b4f9c8 convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens (klosax)
cf365fb gguf : gguf counterpart of llama-util.h (monatis)
c3a65c4 gguf-util.h : update note (monatis)
e1e9b28 convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens (klosax)
c5ba5ef convert-llama-h5-to-gguf.py : special tokens (klosax)
23abbe8 Delete gptneox-common.cpp (klosax)
6691aa8 Delete gptneox-common.h (klosax)
2922280 convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer (klosax)
e6f19ba gptneox-main.cpp : gpt2 bpe tokenizer (klosax)
5d98989 gpt2 bpe tokenizer (handles merges and unicode) (klosax)
fb0b243 Makefile : remove gptneox-common (klosax)
278ada9 gguf.py : bytesarray for gpt2bpe tokenizer (klosax)
db5618a cmpnct_gpt2bpe.hpp : comments (klosax)
4357e69 gguf.py : use custom alignment if present (klosax)
1da82c5 Merge branch 'master' into gguf (ggerganov)
8083ae3 gguf : minor stuff (ggerganov)
65559a2 Update gptneox-main.cpp (klosax)
ece4fc1 map tensor names (klosax)
f4d137d convert-gptneox-h5-to-gguf.py : map tensor names (klosax)
7d5f452 convert-llama-h5-to-gguf.py : map tensor names (klosax)
0246d0d gptneox-main.cpp : map tensor names (klosax)
1c4d8bf gguf : start implementing libllama in GGUF (WIP) (monatis)
4f86518 gguf : start implementing libllama in GGUF (WIP) (monatis)
4c0f64e rm binary commited by mistake (monatis)
22de6c5 upd .gitignore (monatis)
42cc04d gguf : calculate n_mult (monatis)
cfb8e35 gguf : inference with 7B model working (WIP) (monatis)
f316b94 gguf : rm deprecated function (monatis)
e7d346c gguf : start implementing gguf_file_saver (WIP) (monatis)
a356b0e gguf : start implementing gguf_file_saver (WIP) (monatis)
b2440f1 gguf : start implementing gguf_file_saver (WIP) (monatis)
eb8ca69 gguf : add gguf_get_kv_type (monatis)
e3a4960 gguf : add gguf_get_kv_type (monatis)
28abfc9 gguf : write metadata in gguf_file_saver (WIP) (monatis)
781b9ec gguf : write metadata in gguf_file_saver (WIP) (monatis)
d09fd10 gguf : write metadata in gguf_file_saver (monatis)
61919c1 gguf : rm references to old file formats (monatis)
7009cf5 gguf : shorter name for member variable (monatis)
f44bbd3 gguf : rm redundant method (monatis)
e732423 gguf : get rid of n_mult, read n_ff from file (monatis)
2a5ac7a Update gguf_tensor_map.py (klosax)
e76c59d Update gptneox-main.cpp (klosax)
2f52008 gguf : rm references to old file magics (monatis)
186c496 Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf (monatis)
4fa017a gguf : start implementing quantization (WIP) (monatis)
0e1a3c7 gguf : start implementing quantization (WIP) (monatis)
c4f02b4 gguf : start implementing quantization (WIP) (monatis)
b2571af gguf : start implementing quantization (WIP) (monatis)
fa7c395 gguf : start implementing quantization (WIP) (monatis)
1fc3d30 gguf : start implementing quantization (WIP) (monatis)
202eab0 gguf : quantization is working (monatis)
60d5408 gguf : roper closing of file (monatis)
5d81a71 gguf.py : no need to convert tensors twice (klosax)
8f09157 convert-gptneox-h5-to-gguf.py : no need to convert tensors twice (klosax)
4cef57c convert-llama-h5-to-gguf.py : no need to convert tensors twice (klosax)
f821847 convert-gptneox-h5-to-gguf.py : simplify nbytes (klosax)
e606ffe convert-llama-h5-to-gguf.py : simplify nbytes (klosax)
5e58ffa gptneox-main.cpp : n_layer --> n_block (klosax)
8b5f0c5 constants.py : n_layer --> n_block (klosax)
d2ce9cf gguf.py : n_layer --> n_block (klosax)
489616e convert-gptneox-h5-to-gguf.py : n_layer --> n_block (klosax)
e91a222 convert-llama-h5-to-gguf.py : n_layer --> n_block (klosax)
c7bd8c1 gptneox-main.cpp : n_layer --> n_block (klosax)
9bf5a7e Update gguf_tensor_map.py (klosax)
e3d1f07 convert-gptneox-h5-to-gguf.py : load model in parts to save memory (klosax)
17800cd convert-llama-h5-to-gguf.py : load model in parts to save memory (klosax)
91d4bfd convert : write more metadata for LLaMA (monatis)
1d60468 fix conflicts (monatis)
bf2dad3 convert : rm quantization version (monatis)
2827b84 convert-gptneox-h5-to-gguf.py : add file_type key (klosax)
6beebf3 gptneox-main.cpp : add file_type key (klosax)
24f4883 fix conflicts (monatis)
196b50f gguf : add todos and comments (monatis)
56a1f32 Merge branch 'master' into gguf (ggerganov)
5d22a9d convert-gptneox-h5-to-gguf.py : tensor name map changes (klosax)
51939d7 Create gguf_namemap.py : tensor name map changes (klosax)
806a157 Delete gguf_tensor_map.py (klosax)
d753dfb gptneox-main.cpp : tensor name map changes (klosax)
a7d226f convert-llama-h5-to-gguf.py : fixes (klosax)
5c5a95b gguf.py : dont add empty strings (klosax)
0c19ae7 simple : minor style changes (ggerganov)
62490f1 gguf : use UNIX line ending (ggerganov)
6f64b6c Create convert-llama-7b-pth-to-gguf.py (klosax)
f00780b llama : sync gguf-llama.cpp with latest llama.cpp (#2608) (ggerganov)
6f14854 gitignore : add gptneox-main (ggerganov)
8af3a99 Merge branch 'master' into gguf (ggerganov)
ec1b100 llama : tokenizer fixes (#2549) (goerch)
afc4ca2 convert : update convert-new.py with tokenizer fixes (#2614) (goerch)
7494c78 llama : sync gguf-llama with llama (#2613) (ggerganov)
6c63550 llama : update tokenizer style (ggerganov)
7ec125b convert-llama-h5-to-gguf.py : add token types (klosax)
5d518d4 constants.py : add token types (klosax)
cedb487 gguf.py : add token types (klosax)
ab2cbd0 convert-llama-7b-pth-to-gguf.py : add token types (klosax)
ca47582 gguf-llama.cpp : fix n_head_kv (klosax)
2dd5d2c convert-llama-h5-to-gguf.py : add 70b gqa support (klosax)
b6056c3 gguf.py : add tensor data layout (klosax)
66756c8 convert-llama-h5-to-gguf.py : add tensor data layout (klosax)
2ae0e98 convert-llama-7b-pth-to-gguf.py : add tensor data layout (klosax)
4a1741a gptneox-main.cpp : add tensor data layout (klosax)
ea5615a convert-llama-h5-to-gguf.py : clarify the reverse permute (klosax)
758ff1b llama : refactor model loading code (#2620) (ggerganov)
88b5769 gguf : deduplicate (#2629) (ggerganov)
c8ee87f gguf.py : merge all files in gguf.py (ggerganov)
5ec1893 convert-new.py : pick #2427 for HF 70B support (ggerganov)
42f8fe1 examples/gguf : no need to keep q option for quantization any more (monatis)
5a0a2c5 llama.cpp : print actual model size (klosax)
d6fd53a llama.cpp : use ggml_elements() (klosax)
e0429d3 convert-new.py : output gguf (#2635) (ggerganov)
2ddd968 convert.py : update to support GGUF output (ggerganov)
dd016cc Revert "ci : disable CI temporary to not waste energy" (ggerganov)
d646c4e convert.py : n_head_kv optional and .gguf file extension (klosax)
8ace03a convert.py : better always have n_head_kv and default it to n_head (ggerganov)
11bf436 llama : sync with recent PRs on master (ggerganov)
6d66ef9 Merge branch 'master' into gguf (ggerganov)
c3b7393 editorconfig : ignore models folder (ggerganov)
dd9e2fc ci : update ".bin" to ".gguf" extension (ggerganov)
81a2c2a llama : fix llama_model_loader memory leak (ggerganov)
93f285b gptneox : move as a WIP example (ggerganov)
899f9a5 llama : fix lambda capture (ggerganov)
e72c8c2 ggml : fix bug in gguf_set_kv (ggerganov)
fb11dd3 common.h : .bin --> .gguf (klosax)
78e1e57 quantize-stats.cpp : .bin --> .gguf (klosax)
acaa982 convert.py : fix HF tensor permuting / unpacking (ggerganov)
b3cc182 llama.cpp : typo (klosax)
57eaadb llama : throw error if gguf fails to init from file (ggerganov)
5484737 llama : fix tensor name grepping during quantization (ggerganov)
fc3a523 gguf.py : write tensors in a single pass (#2644) (monatis)
b668cd3 convert-gptneox-hf-to-gguf.py : fixes (klosax)
640ddc4 gguf.py : gptneox mapping (klosax)
9e2d4dd convert-llama-hf-to-gguf.py : fixes (klosax)
3c1b721 convert-llama-7b-pth-to-gguf.py : fixes (klosax)
c20ae49 ggml.h : reverse GGUF_MAGIC (klosax)
147a99b gguf.py : reverse GGUF_MAGIC (klosax)
d9e6890 test-tokenizer-0.cpp : fix warning (klosax)
306070c llama.cpp : print kv general.name (klosax)
b275de7 llama.cpp : get special token kv and linefeed token id (klosax)
aa3efe8 llama : print number of tensors per type + print arch + style (ggerganov)
856afff Merge branch 'master' into gguf (ggerganov)
e35f8c7 tests : update vocab file with new magic (ggerganov)
dea5be6 editorconfig : fix whitespaces (ggerganov)
660ca9b llama : re-order functions (ggerganov)
38016ed Merge branch 'master' into gguf (ggerganov)
2d6c2c7 llama : remove C++ API + reorganize common source in /common dir (ggerganov)
035d511 llama : minor API updates (ggerganov)
5d2656d llama : avoid hardcoded special tokens (ggerganov)
a4ad2bf llama : fix MPI build (ggerganov)
25b8a89 llama : introduce enum llama_vocab_type + remove hardcoded string con… (ggerganov)
fb7c883 convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested (klosax)
d5e976c falcon-main.cpp : falcon inference example (klosax)
16ab9ba convert-falcon-hf-to-gguf.py : remove extra kv (klosax)
c0e4ca6 convert-gptneox-hf-to-gguf.py : remove extra kv (klosax)
593b04f convert-llama-7b-pth-to-gguf.py : remove extra kv (klosax)
281d6d1 convert-llama-hf-to-gguf.py : remove extra kv (klosax)
bd5a579 gguf.py : fix for falcon 40b (klosax)
1d80eea falcon-main.cpp : fix for falcon 40b (klosax)
2c8055b convert-falcon-hf-to-gguf.py : update ref (klosax)
b3a7a2b convert-falcon-hf-to-gguf.py : add tensor data layout (klosax)
dadf098 cmpnct_gpt2bpe.hpp : fixes (klosax)
781bf24 falcon-main.cpp : fixes (klosax)
8945d47 gptneox-main.cpp : fixes (klosax)
6a2e520 cmpnct_gpt2bpe.hpp : remove non-general stuff (klosax)
c0a1269 Update examples/server/README.md (klosax)
28b8c26 cmpnct_gpt2bpe.hpp : cleanup (klosax)
76b4662 convert-llama-hf-to-gguf.py : special tokens (klosax)
f838faa convert-llama-7b-pth-to-gguf.py : special tokens (klosax)
5a02b96 convert-permute-debug.py : permute debug print (klosax)
4f92488 convert-permute-debug-master.py : permute debug for master (klosax)
7de7cb4 convert-permute-debug.py : change permute type of attn_q (klosax)
d5c8fcf convert.py : 70b model working (change attn_q permute) (klosax)
287db51 Delete convert-permute-debug-master.py (klosax)
58bde5c Delete convert-permute-debug.py (klosax)
c818c40 convert-llama-hf-to-gguf.py : fix attn_q permute (klosax)
6a69a69 gguf.py : fix rope scale kv (klosax)
5f6ff38 convert-llama-hf-to-gguf.py : rope scale and added tokens (klosax)
dc1f051 convert-llama-7b-pth-to-gguf.py : rope scale and added tokens (klosax)
c082b9f llama.cpp : use rope scale kv (klosax)
9070e33 convert-llama-7b-pth-to-gguf.py : rope scale fix (klosax)
7a7d1ba convert-llama-hf-to-gguf.py : rope scale fix (klosax)
1e7a009 Merge branch 'master' into gguf (ggerganov)
6490ff7 py : fix whitespace (ggerganov)
e06cbce gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682) (KerfuffleV2)
8d177ed llama : improve token type support (#2668) (goerch)
0b53b8b llama : add API for token type (ggerganov)
49c25cc tests : use new tokenizer type API (#2692) (goerch)
811f653 py : cosmetics (ggerganov)
66a66a0 readme : add notice about new file format (ggerganov)
Changes shown are from a single commit, 1fc3d30b71a707187eb1f995c4776db7aaa6265a: gguf : start implementing quantization (WIP)
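For reference, the commits above build up a small C API in ggml for reading GGUF files (gguf_init_from_file, gguf_get_tensor_name, gguf_find_key, gguf_get_kv_type, and so on). Below is a minimal sketch of how that reader API can be used to dump a file's metadata; it is assembled from the function names mentioned in the commit messages, and exact signatures and struct fields (e.g. the no_alloc field of gguf_init_params) changed while the PR evolved, so treat it as an illustration rather than the final API.

```c
// Minimal sketch: enumerate GGUF key-value metadata and tensor names
// using the reader API introduced in this PR. Function names come from
// the commit messages above; signatures are assumptions and may differ
// from the merged version.
#include <stdio.h>
#include <stdbool.h>

#include "ggml.h" // gguf.c was folded into ggml.c in commit e46870f

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    // no_alloc = true: read only the header and metadata,
    // do not allocate or load tensor data
    struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ NULL };

    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        fprintf(stderr, "failed to init gguf from %s\n", argv[1]);
        return 1;
    }

    // key-value metadata
    const int n_kv = gguf_get_n_kv(ctx);
    for (int i = 0; i < n_kv; ++i) {
        printf("kv %3d: %s\n", i, gguf_get_key(ctx, i));
    }

    // tensor info
    const int n_tensors = gguf_get_n_tensors(ctx);
    for (int i = 0; i < n_tensors; ++i) {
        printf("tensor %3d: %s\n", i, gguf_get_tensor_name(ctx, i));
    }

    // look up a specific key, e.g. general.name
    // (printed by llama.cpp as of commit 306070c)
    const int idx = gguf_find_key(ctx, "general.name");
    if (idx >= 0) {
        printf("found general.name at kv index %d\n", idx);
    }

    gguf_free(ctx);
    return 0;
}
```

Since gguf.c became part of ggml.c (commit e46870f), a sketch like this would be compiled against ggml itself, e.g. `cc dump.c ggml.c -lm -lpthread -o dump`.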