Releases · NVIDIA/kvpress

05 Dec 08:54

maxjeblick

v0.4.0

8306602

v0.4.0 Latest

🚀 Release v0.4.0

✨ New Features

CURPress - Value-Guided KV Compression for LLMs via Approximated CUR Decomposition (#150)
CompactorPress - Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores (#143)
Decoding Press Functionality - Support for KV cache compression during the decoding phase (#139)
AIME25 & Math500 Benchmarks - New evaluation datasets for mathematical reasoning tasks (#142)
post_init_from_model Hook - Add model-specific initialization support in BasePress (#163)
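
As a rough illustration of what compressing the cache during decoding can look like, the toy sketch below prunes a cache every few generated tokens. It is not kvpress's implementation: `ToyDecodingCache`, `target_size`, `compression_interval`, and the per-token scores are hypothetical names chosen for this example, and a real press would score actual key/value tensors rather than hand-written floats.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDecodingCache:
    """Toy stand-in for a KV cache that is pruned while tokens are generated."""

    target_size: int            # entries kept after each compression (hypothetical)
    compression_interval: int   # compress every N appended tokens (hypothetical)
    entries: list = field(default_factory=list)  # (token, score) pairs
    steps: int = 0

    def append(self, token: str, score: float) -> None:
        """Add one decoded token, then compress if the interval has elapsed."""
        self.entries.append((token, score))
        self.steps += 1
        if self.steps % self.compression_interval == 0:
            self.compress()

    def compress(self) -> None:
        """Keep the target_size highest-scoring entries, preserving their order."""
        if len(self.entries) <= self.target_size:
            return
        ranked = sorted(range(len(self.entries)),
                        key=lambda i: self.entries[i][1], reverse=True)
        keep = sorted(ranked[: self.target_size])
        self.entries = [self.entries[i] for i in keep]


cache = ToyDecodingCache(target_size=2, compression_interval=4)
for token, score in [("a", 0.1), ("b", 0.9), ("c", 0.3), ("d", 0.7), ("e", 0.5)]:
    cache.append(token, score)
kept_tokens = [token for token, _ in cache.entries]
print(kept_tokens)
```

The point of the interval is that pruning runs only occasionally, so the cache can grow slightly past its budget between compressions instead of being re-scored on every generated token.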

📈 Improvements

Moved tests to GPU for faster CI execution (#132)
Improved needle-in-haystack test coverage (#133)
Updated README and documentation for clarity (#162)
Enhanced docstrings throughout the codebase (#159)
Updated decoding notebook with latest examples (#156)
Code cleanup: moved utilities, cleaned imports (#160)

🐛 Bug Fixes

Fixed LongBench-v2 benchmark evaluation (#161)
Fixed KVzipPress access to past_key_values
Fixed ComposedPress behavior (#148)
Fixed import issues (#144)

📦 Installation

pip install kvpress==0.4.0
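
To confirm that the pinned version is the one actually present in the environment, a small check along these lines may help. It is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example and is not part of kvpress.

```python
from importlib import metadata


def installed_version(package: str = "kvpress"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


expected = "0.4.0"
found = installed_version()
if found is None:
    print("kvpress is not installed")
elif found == expected:
    print(f"kvpress {expected} is installed")
else:
    print(f"kvpress {found} is installed, expected {expected}")
```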

📚 Full Changelog

v0.3.0...v0.4.0

04 Sep 12:47

alessiodevoto

v0.3.0

7dbd3f0

What's Changed

refactor: optimized covariance transform in ExpectedAttentionPress by @neuralsorcerer in #111
Fix RULER integration tests by @maxjeblick in #113
Fix typo by @neuralsorcerer in #116
Add needle in haystack test by @alessiodevoto in #121
fix masked_key_indices by @maxjeblick in #122
Add copy-pr-bot settings by @maxjeblick in #123
Add GitHub runner by @maxjeblick in #124
Fix evaluation README.md command error and logging error (#127) by @wzp-0815 in #128
Add GPU runner by @maxjeblick in #125
Upgrade expected attention with support for more models by @alessiodevoto in #126
Add Expected Attention with Stats by @alessiodevoto in #120
⚠️ Transformers compatibility by @maxjeblick in #115 (breaking change: the KV caching machinery changed in HF transformers, and KVPress was adjusted accordingly)

New Contributors

@neuralsorcerer made their first contribution in #111
@wzp-0815 made their first contribution in #128

Full Changelog: v0.2.10...v0.3.0

Contributors

maxjeblick, neuralsorcerer, and 2 other contributors

06 Aug 16:10

alessiodevoto

v0.2.10

3eb3f92

What's Changed

Migration to uv by @alessiodevoto in #108

Full Changelog: v0.2.9...v0.2.10

Contributors

alessiodevoto

28 Jul 12:39

alessiodevoto

v0.2.9

52c761c

What's Changed

Refactor evaluation by @alessiodevoto in #96
Fix QFilters and DuoAttention when used with wrapper presses by @alessiodevoto in #97
Add HuggingFace leaderboard by @alessiodevoto in #98
Fix links in benchmarks directory by @alessiodevoto in #101
Add KVzipPress by @Janghyun1230 in #93
Test head-wise compression by @alessiodevoto in #103
Run backbone model only for prefill by @giulio98 in #100
Transformers compatibility + evaluation by @alessiodevoto in #105

Full Changelog: v0.2.8...v0.2.9

Contributors

Janghyun1230, alessiodevoto, and giulio98

08 Jul 10:21

maxjeblick

v0.2.8

d3fb898

What's Changed

🐛 Bug Fixes

Fix failing tests by @maxjeblick in #94
Reverts changes to CriticalKVPress performed in #90 that caused the press to initialize incorrectly. The PR also fixes some test logic.

Full Changelog: v0.2.7...v0.2.8

Contributors

maxjeblick

07 Jul 16:52

maxjeblick

v0.2.7

2bc4e2e

What's Changed

🐛 Bug Fixes

Fix FinchPress for Qwen models family by @alessiodevoto in #82
Resolved compatibility issues with Qwen model architecture in FinchPress compression

✨ New Features

Add KeyDiffPress and BlockPress by @figuremout in #86
Introduces new compression methods based on key difference analysis
Fix for Qwen with YaRN by @giulio98 in #85
Enables YaRN scaling in FinchPress and KeyRerotationPress

📚 Documentation & Maintenance

Improve documentation by @maxjeblick in #90
Add docstrings to all presses, with their corresponding parameters and paper reference.
Add @alessiodevoto to authors by @maxjeblick in #92 🚀

Full Changelog: v0.2.6...v0.2.7