Releases · NVIDIA/kvpress
v0.2.2
v0.2.1
v0.2.0
Transformers v4.48 introduced breaking changes that are handled in this release. The release also features `AdaKVPress`, the first press to allow head-wise compression, implemented by patching the attention functions registered in `ALL_ATTENTION_FUNCTIONS` since v4.48. Combined with `ExpectedAttentionPress`, `AdaKVPress` achieved the best results observed yet on the RULER benchmark (see this post).
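Below is a minimal usage sketch of that combination, assuming the `kv-press-text-generation` pipeline that kvpress registers on import and that `AdaKVPress` wraps a scorer press as the note above suggests; the model name and compression ratio are placeholders:

```python
from transformers import pipeline
from kvpress import AdaKVPress, ExpectedAttentionPress

# kvpress registers a custom "kv-press-text-generation" pipeline on import.
# Model name and device are placeholders, not recommendations.
pipe = pipeline(
    "kv-press-text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device="cuda",
)

context = "..."  # long context whose KV cache is compressed during pre-filling
question = "..."  # question answered against the compressed cache

# AdaKVPress wraps a scorer press and reallocates the pruning budget head-wise.
press = AdaKVPress(ExpectedAttentionPress(compression_ratio=0.5))
answer = pipe(context, question=question, press=press)["answer"]
```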
v0.1.1
What's Changed
- #33 by @SimJeg fixes a small bug in the pipeline
- #36 by @maxjeblick pins the `transformers` dependency to `<4.48`
Full Changelog: v0.1.0...v0.1.1
v0.1.0
#24 by @maxjeblick and #29 by @SimJeg introduce a non-breaking refactoring:
- a press does not require the `compression_ratio` input argument anymore, as some presses do not explicitly require it (e.g. `ThinKPress`, `SimLayerKVPress`). However, every press must have a `compression_ratio` attribute after any forward pass (an assertion was added in tests) to allow measuring the average compression ratio on a benchmark
- the core compression logic has been moved from `BasePress.forward_hook` to `BasePress.compress`. `BasePress.forward_hook` now only checks whether `compress` must be called (pre-filling vs. decoding), de-quantizes the cache before `compress`, and re-quantizes it afterwards
- `BasePress` does not implement a `score` method anymore; this has been moved to `ScorerPress` along with the associated `ScorerPress.compress` method (see the sketch after this list)
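As a concrete illustration of the new split, here is a minimal sketch of a custom press: it only implements `score`, while `ScorerPress.compress` performs the pruning and `BasePress.forward_hook` decides when compression runs. The class is hypothetical, and the `score` signature is assumed to match the scorer presses shipped in the repository:

```python
import torch
from dataclasses import dataclass
from kvpress import ScorerPress

@dataclass
class RandomScorePress(ScorerPress):
    """Hypothetical press that scores KV pairs at random, so pruning keeps a random subset."""

    compression_ratio: float = 0.0

    def score(self, module, hidden_states, keys, values, attentions, kwargs):
        # One score per KV pair, shape (batch, num_kv_heads, seq_len);
        # ScorerPress.compress drops the lowest-scoring pairs.
        return torch.rand(*keys.shape[:-1], device=keys.device)
```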
Other features:
- Add `SimLayerKVPress`, #28 by @SimJeg and @dame-cell
- Add `ComposedPress`, #29 by @SimJeg (usage sketch below)
- Add `KeyReRotationPress`, #31 by @maxjeblick and @giulio98
- Fix `QuantizedCache`, #30 by @maxjeblick
- Add new tests, including an integration test on a sample from RULER
v0.0.4
v0.0.3
v0.0.2
Initial release
v0.0.1
install poetry in workflows (#1)