Releases: NVIDIA/kvpress
v0.4.0
🚀 Release v0.4.0
✨ New Features
- CURPress - Value-Guided KV Compression for LLMs via Approximated CUR Decomposition (#150)
- CompactorPress - Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores (#143)
- Decoding Press Functionality - Support for KV cache compression during the decoding phase (#139)
- AIME25 & Math500 Benchmarks - New evaluation datasets for mathematical reasoning tasks (#142)
- post_init_from_model Hook - Add model-specific initialization support in BasePress (#163)
📈 Improvements
- Moved tests to GPU for faster CI execution (#132)
- Improved needle-in-haystack test coverage (#133)
- Updated README and documentation for clarity (#162)
- Enhanced docstrings throughout the codebase (#159)
- Updated decoding notebook with latest examples (#156)
- Code cleanup: moved utilities, cleaned imports (#160)
🐛 Bug Fixes
- Fixed LongBench-v2 benchmark evaluation (#161)
- Fixed kvzip press access to past_key_values
- Fixed ComposedPress behavior (#148)
- Fixed import issues (#144)
📦 Installation
pip install kvpress==0.4.0
📚 Full Changelog
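To confirm that the pinned install from the Installation step took effect, the installed version can be read back from package metadata. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `matches_pin` are hypothetical and are not part of kvpress.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly `expected`."""
    return installed_version(package) == expected


if __name__ == "__main__":
    # "kvpress" and "0.4.0" come from the pip command in these release notes.
    found = installed_version("kvpress")
    if found is None:
        print("kvpress is not installed in this environment")
    elif matches_pin("kvpress", "0.4.0"):
        print("kvpress 0.4.0 is installed")
    else:
        print(f"kvpress {found} is installed, expected 0.4.0")
```

The check reads the package metadata instead of importing kvpress, so it does not load the library or its dependencies.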
v0.3.0
What's Changed
- refactor: optimized covariance transform in ExpectedAttentionPress by @neuralsorcerer in #111
- fix ruler integration tests by @maxjeblick in #113
- fix typo by @neuralsorcerer in #116
- Add needle in haystack test by @alessiodevoto in #121
- fix masked_key_indices by @maxjeblick in #122
- Add copy-pr-bot settings by @maxjeblick in #123
- Add Github runner by @maxjeblick in #124
- Fix evaluation README.md command error and logging error (#127) by @wzp-0815 in #128
- add gpu runner by @maxjeblick in #125
- Upgrade expected attention with support for more models by @alessiodevoto in #126
- Add Expected Attention with Stats by @alessiodevoto in #120
- ⚠️ Transformers compatibility by @maxjeblick in #115. This is a breaking change: the KV caching machinery changed in HF transformers, and KVPress was adjusted accordingly.
New Contributors
- @neuralsorcerer made their first contribution in #111
- @wzp-0815 made their first contribution in #128
Full Changelog: v0.2.10...v0.3.0
v0.2.10
v0.2.9
What's Changed
- Refactor evaluation by @alessiodevoto in #96
- Fix QFilters and DuoAttention when used with wrapper presses by @alessiodevoto in #97
- Add HuggingFace leaderboard by @alessiodevoto in #98
- Fix links in benchmarks directory by @alessiodevoto in #101
- Add KVzipPress by @Janghyun1230 in #93
- Test head-wise compression by @alessiodevoto in #103
- run backbone model only for prefill by @giulio98 in #100
- Transformers compatibility + evaluation by @alessiodevoto in #105
Full Changelog: v0.2.8...v0.2.9
v0.2.8
What's Changed
🐛 Bug Fixes
- Fix failing tests by @maxjeblick in #94
Reverts changes to CriticalKVPress performed in #90 that caused the press to initialize incorrectly. The PR also fixes some test logic.
Full Changelog: v0.2.7...v0.2.8
v0.2.7
What's Changed
🐛 Bug Fixes
- Fix FinchPress for Qwen models family by @alessiodevoto in #82
Resolved compatibility issues with Qwen model architecture in FinchPress compression
✨ New Features
- Add KeyDiffPress and BlockPress by @figuremout in #86
Introduces new compression methods based on key difference analysis
- Fix for Qwen with Yarn by @giulio98 in #85
Enable Yarn scaling in FinchPress and KeyRerotationPress
📚 Documentation & Maintenance
- Improve documentation by @maxjeblick in #90
Add docstrings to all presses, with their corresponding parameters and paper reference.
- Add @alessiodevoto to authors by @maxjeblick in #92 🚀
Full Changelog: v0.2.6...v0.2.7
v0.2.6
- Improve packaging, #71 by @emmanuel-ferdman, #77 by @fanqiNO1, SPDX headers by @maxjeblick
- Add LagKVPress, #77 by @JoelSeniorLiang
- Support Qwen3 and Gemma3, #81 by @alessiodevoto
v0.2.5
- Add PyramidKVPress, #65 by @figuremout
- Fix style errors, #68 by @maxjeblick
- Add FinchPress, #64 and #69, by @giulio98, @miriam-16, @FaureElia and @SimJeg
v0.2.4
- Add QFilterPress, #54 by @NathanGodey
- Update copyright dates and add citation file, #60 by @SimJeg
- Add ChunkKVPress, #51 by @Dominic789654