-
Notifications
You must be signed in to change notification settings - Fork 2k
[TRTLLM-6371][feat] Restructure C++ KVCacheManager to better handle limited attention layers #7510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
thorjohnsen
wants to merge
63
commits into
NVIDIA:main
Choose a base branch
from
thorjohnsen:user/tjohnsen/restructure_kvcachemanager_for_vswa
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+1,615
−726
Open
Changes from all commits
Commits
Show all changes
63 commits
Select commit
Hold shift + click to select a range
9cc92f6
Draft
thorjohnsen 28d6dd1
Code simplification
thorjohnsen 8a06a28
Simplify eviction policy into true priority based eviction
thorjohnsen f30fa42
Remove personal info
thorjohnsen 9d8c420
Remove superfluous method, add better comments
thorjohnsen 6ceed6b
Fixes
thorjohnsen 64f4504
Fixes
thorjohnsen 5eaf9d5
Implement some lookup node methods
thorjohnsen 800b05d
Move some method(s) to make PR diff easier to read
thorjohnsen 21643e3
More fixes
thorjohnsen e0499cf
Faster and more flexible findNewContextBlock method
thorjohnsen 4840a98
Fix last compile issue
thorjohnsen 8647828
Bug fixes
thorjohnsen 91445c5
Bug fixes
thorjohnsen 76f317e
Turn debug printf into TLLM_LOG_DEBUG
thorjohnsen ca48af9
Bug fixes
thorjohnsen 3399798
Add multi-node scanning for partial match
thorjohnsen 78bcf4d
Bug fixes
thorjohnsen 8836822
Add lookupBlocks method
thorjohnsen 86ed0f8
Fix priority eviction, add lots of debug printouts
thorjohnsen 42fe938
Fix unit test that relied on leaf block only eviction
thorjohnsen cef3334
Bug fixes and unit test adjustments
thorjohnsen ad2aa8f
Fix last unit test
thorjohnsen cfe0609
Remove superfluous arguments
thorjohnsen 02abe39
Remove superfluous member variable
thorjohnsen f025cfb
Split setLookupNode into two methods to improve readability of code
thorjohnsen 735fe3c
Move stuff around to simplify diff
thorjohnsen a678b91
Resolve merge conflicts
thorjohnsen 787b681
Resolve build issue
thorjohnsen 50a6f3c
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen a487b55
Fix remaining merge issues
thorjohnsen 78c6253
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen bb421b9
precommit run
thorjohnsen f3e5c13
Manual precommit fix
thorjohnsen d427948
Address issue identified by coderabbit
thorjohnsen d57c3b4
Bug fixes
thorjohnsen 3f696f6
Bug fixes, more debug printouts
thorjohnsen c8177d3
Fix last unit test failures after merge with main
thorjohnsen 3802864
precommit run
thorjohnsen e3a1921
Fix spelling errors
thorjohnsen 24dfa89
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen 61f9702
Fix race condition that has been in code for months
thorjohnsen 984a4d2
precommit run
thorjohnsen 23c28fb
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen 1014a3e
Update unit test to reflect last bug fix
thorjohnsen a8ad970
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen 39e81e1
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen 64ef5f3
Bug fix
thorjohnsen fd364d9
Bug fix
thorjohnsen 3f22026
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen 9367bbe
Fix merge conflicts
thorjohnsen 98c0d29
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen a63d0b6
Add refresh blocks
Tabrizian 12da9ae
Merge remote-tracking branch 'upstream/main' into user/tjohnsen/restr…
thorjohnsen f2c1d9a
Fix transfer manager synchronization issues
thorjohnsen 0ee0004
Fix some comments
thorjohnsen b98a2fd
precommit run
thorjohnsen 0e62604
Fix merge conflicts
thorjohnsen d171f40
Remove commented out code
thorjohnsen e3422c9
Apply suggestions from code review
thorjohnsen 6308843
Fix merge issues
thorjohnsen 8fcaf67
Merge branch 'main' into user/tjohnsen/restructure_kvcachemanager_for…
thorjohnsen 5cb27b7
Resolve merge conflicts
thorjohnsen File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
402 changes: 271 additions & 131 deletions
402
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -553,6 +553,32 @@ struct RetentionPriorityAndDuration | |
|
|
||
| std::optional<RetentionPriority> retentionPriority; | ||
| std::optional<std::chrono::milliseconds> durationMs; | ||
|
|
||
| std::string print() | ||
| { | ||
| std::stringstream out; | ||
| out << "RetentionPriorityAndDuration{"; | ||
| bool needComma = false; | ||
| if (retentionPriority.has_value()) | ||
| { | ||
| out << "retentionPriority=" << retentionPriority.value(); | ||
| needComma = true; | ||
| } | ||
| if (durationMs.has_value()) | ||
| { | ||
| if (needComma) | ||
| { | ||
| out << ","; | ||
| } | ||
| else | ||
| { | ||
| needComma = true; | ||
| } | ||
| out << "durationMs=" << durationMs.value().count(); | ||
| } | ||
| out << "}"; | ||
| return out.str(); | ||
| } | ||
| }; | ||
|
|
||
| /// @brief Configuration for the request's retention in the KV Cache | ||
|
|
@@ -586,6 +612,24 @@ class KvCacheRetentionConfig | |
| /// have no expiration time, and keep the block at the given priority level until it gets reclaimed. After the | ||
| /// duration has passed, the block will be moved back to the `kDefaultRetentionPriority` level. | ||
| std::optional<std::chrono::milliseconds> durationMs; | ||
|
|
||
| std::string print() const | ||
| { | ||
| std::stringstream out; | ||
| out << "TokenRangeRetentionConfig={"; | ||
| out << "tokenStart=" << tokenStart; | ||
| if (tokenEnd.has_value()) | ||
| { | ||
| out << ",tokenEnd=" << tokenEnd.value(); | ||
| } | ||
| out << ",priority=" << priority; | ||
| if (durationMs.has_value()) | ||
| { | ||
| out << ",durationMs=" << durationMs.value().count(); | ||
| } | ||
| out << "}"; | ||
| return out.str(); | ||
| } | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here. |
||
| }; | ||
|
|
||
| explicit KvCacheRetentionConfig() | ||
|
|
@@ -617,6 +661,27 @@ class KvCacheRetentionConfig | |
| && mDirectory == other.mDirectory; | ||
| } | ||
|
|
||
| std::string print() const | ||
| { | ||
| std::stringstream out; | ||
| out << "KvCacheRetentionConfig={"; | ||
| bool firstIteration = true; | ||
| for (auto trrc : mTokenRangeRetentionConfigs) | ||
| { | ||
| if (firstIteration) | ||
| { | ||
| firstIteration = false; | ||
| } | ||
| else | ||
| { | ||
| out << ","; | ||
| } | ||
| out << trrc.print(); | ||
| } | ||
| out << "}"; | ||
| return out.str(); | ||
| } | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here. |
||
|
|
||
| private: | ||
| /// @brief The token ranges and priority levels to update. Ranges must be non-overlapping. For example [(0, 64), | ||
| /// (100, 128), (70, 80)] is valid, whereas | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest renaming this to
to_stringand moving the implementation tocpp/tensorrt_llm/executor/kvCacheRetentionConfig.cpp.