Releases: NVIDIA/kvpress

v0.4.0

05 Dec 08:54
8306602

🚀 Release v0.4.0

✨ New Features

  • CURPress - Value-Guided KV Compression for LLMs via Approximated CUR Decomposition (#150)
  • CompactorPress - Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores (#143)
  • Decoding Press Functionality - Support for KV cache compression during the decoding phase (#139)
  • AIME25 & Math500 Benchmarks - New evaluation datasets for mathematical reasoning tasks (#142)
  • post_init_from_model Hook - Add model-specific initialization support in BasePress (#163)
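A model-specific initialization hook like the one in the last item is usually a method that does nothing by default and that subclasses override once the model is known. The sketch below shows that general pattern only. It does not import kvpress: `ToyBasePress`, `ToyPerLayerPress`, the toy model object, and the one-argument hook signature are all assumptions made for illustration, not the library's actual code.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class ToyBasePress:
    """Stand-in base class (hypothetical, not the kvpress BasePress)."""

    compression_ratio: float = 0.0

    def post_init_from_model(self, model) -> None:
        """Hook run once the model is known; a no-op by default."""


@dataclass
class ToyPerLayerPress(ToyBasePress):
    """Hypothetical press that keeps one setting per model layer."""

    layer_ratios: list = field(default_factory=list)

    def post_init_from_model(self, model) -> None:
        # Size the per-layer settings from the model config
        # instead of hard-coding a layer count in the constructor.
        self.layer_ratios = [self.compression_ratio] * model.config.num_hidden_layers


# A toy object that mimics only the `config.num_hidden_layers` attribute.
toy_model = SimpleNamespace(config=SimpleNamespace(num_hidden_layers=4))

press = ToyPerLayerPress(compression_ratio=0.5)
press.post_init_from_model(toy_model)
print(press.layer_ratios)
```

The point of the design is that a press can be constructed without a model in hand, and anything that depends on the architecture is filled in later by the hook.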

📈 Improvements

  • Moved tests to GPU for faster CI execution (#132)
  • Improved needle-in-haystack test coverage (#133)
  • Updated README and documentation for clarity (#162)
  • Enhanced docstrings throughout the codebase (#159)
  • Updated decoding notebook with latest examples (#156)
  • Code cleanup: moved utilities, cleaned imports (#160)

🐛 Bug Fixes

  • Fixed LongBench-v2 benchmark evaluation (#161)
  • Fixed kvzip press access to past_key_values
  • Fixed ComposedPress behavior (#148)
  • Fixed import issues (#144)

📦 Installation

pip install kvpress==0.4.0
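
After installing, one way to confirm that the pinned version is the one actually in the environment is to query the package metadata from the standard library. This is a minimal sketch; the helper name `installed_version` is made up for this example, and it only reads metadata without importing kvpress itself.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


expected = "0.4.0"
found = installed_version("kvpress")

if found is None:
    print("kvpress is not installed")
elif found != expected:
    print(f"kvpress {found} is installed, expected {expected}")
else:
    print(f"kvpress {expected} is installed")
```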

📚 Full Changelog

v0.3.0...v0.4.0

v0.3.0

04 Sep 12:47
7dbd3f0

Full Changelog: v0.2.10...v0.3.0

v0.2.10

06 Aug 16:10
3eb3f92

Full Changelog: v0.2.9...v0.2.10

v0.2.9

28 Jul 12:39
52c761c

Full Changelog: v0.2.8...v0.2.9

v0.2.8

08 Jul 10:21
d3fb898

What's Changed

🐛 Bug Fixes

  • Fix failing tests by @maxjeblick in #94
    Reverts changes to CriticalKVPress performed in #90 that caused the press to initialize incorrectly. The PR also fixes some test logic.

Full Changelog: v0.2.7...v0.2.8

v0.2.7

07 Jul 16:52
2bc4e2e

What's Changed

🐛 Bug Fixes

  • Fix FinchPress for the Qwen model family by @alessiodevoto in #82
    Resolves compatibility issues between FinchPress compression and the Qwen model architecture.

✨ New Features

  • Add KeyDiffPress and BlockPress by @figuremout in #86
    Introduces new compression methods based on key difference analysis.
  • Fix for Qwen with Yarn by @giulio98 in #85
    Enables YaRN scaling in FinchPress and KeyRerotationPress.

Full Changelog: v0.2.6...v0.2.7

v0.2.6

16 Jun 10:37
f7d77d3

v0.2.5

17 Apr 14:03
ef5179d

v0.2.4

17 Mar 12:41
4100647

v0.2.3

18 Feb 16:51
a94a78d

  • Fix distributed inference for the ExpectedAttentionPress, #49 by @SimJeg
  • Add DuoAttentionPress, #50 by @SimJeg