[None][chore] expose tokens_per_block into KvCacheConfig #5911

Superjomn · 2025-07-10T09:17:36Z

PR title

Please write the PR title using the following template:

[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>

For example, a PR that adds a new cache manager feature tracked by Jira ticket TRTLLM-1000 would be titled:

[TRTLLM-1000][feat] Support a new feature about cache manager

Description

Please explain the issue and the solution in short.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Summary by CodeRabbit

New Features
- Added a tokens_per_block option in KV cache settings (default: 64) to control tokens per block; value is applied at runtime when provided.
Refactor
- Streamlined executor configuration: tokens_per_block is no longer exposed via executor configuration and is now managed through KV cache settings.
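
To make the summary above concrete, here is a minimal sketch of the described layout. It is not the actual TensorRT-LLM code: `KvCacheSettings`, `ExecutorSettings`, `max_batch_size`, and `blocks_needed` are hypothetical names invented for illustration, and the positive-value check is an assumption. Only the `tokens_per_block` name and its default of 64 come from the summary.

```python
from dataclasses import dataclass, field


@dataclass
class KvCacheSettings:
    """Hypothetical stand-in for the KV cache settings object."""

    # Default of 64 is taken from the CodeRabbit summary above.
    tokens_per_block: int = 64

    def __post_init__(self) -> None:
        # Assumption: a block must hold at least one token.
        if self.tokens_per_block <= 0:
            raise ValueError("tokens_per_block must be a positive integer")


@dataclass
class ExecutorSettings:
    """Hypothetical executor configuration.

    It has no tokens_per_block field of its own; the value lives
    only in the nested KV cache settings.
    """

    max_batch_size: int = 8  # illustrative field, not from the PR
    kv_cache: KvCacheSettings = field(default_factory=KvCacheSettings)


def blocks_needed(num_tokens: int, settings: ExecutorSettings) -> int:
    """Return how many KV cache blocks hold num_tokens (ceiling division)."""
    tpb = settings.kv_cache.tokens_per_block
    return (num_tokens + tpb - 1) // tpb


default_cfg = ExecutorSettings()
custom_cfg = ExecutorSettings(kv_cache=KvCacheSettings(tokens_per_block=32))

print(blocks_needed(100, default_cfg))  # 2 blocks of 64 tokens
print(blocks_needed(100, custom_cfg))   # 4 blocks of 32 tokens
```

Keeping the value in one place means the runtime reads a single source of truth, so the executor configuration and the KV cache settings cannot disagree.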

Superjomn · 2025-07-10T09:20:06Z

/bot run

tensorrt-cicd · 2025-07-10T09:25:15Z

PR_Github #11538 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-10T12:39:47Z

PR_Github #11538 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8541 completed with status: 'FAILURE'

QiJune

LGTM

Superjomn · 2025-07-11T05:27:25Z

/bot run

tensorrt-cicd · 2025-07-11T05:32:37Z

PR_Github #11605 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-11T07:11:52Z

PR_Github #11605 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8599 completed with status: 'FAILURE'

Superjomn · 2025-07-12T15:10:59Z

/bot run

tensorrt-cicd · 2025-07-12T15:17:23Z

PR_Github #11710 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-12T16:06:06Z

PR_Github #11710 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8666 completed with status: 'FAILURE'

Superjomn · 2025-07-14T04:36:21Z

/bot run

tensorrt-cicd · 2025-07-14T04:41:45Z

PR_Github #11760 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-14T05:53:21Z

PR_Github #11760 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8711 completed with status: 'FAILURE'

Superjomn · 2025-07-14T07:43:13Z

/bot run

Superjomn · 2025-09-05T07:41:29Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-09-05T07:46:23Z

PR_Github #17760 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-05T07:46:24Z

PR_Github #17760 [ run ] completed with state DISABLED
L0 testing is limited to prioritized users. User Superjomn is not in the prioritized list. L0 testing cannot be triggered.

richardhuo-nv · 2025-09-05T15:19:57Z

/bot run --disable-fail-fast

pcastonguay · 2025-09-05T15:22:26Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-09-05T15:27:52Z

PR_Github #17788 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-05T21:13:16Z

PR_Github #17788 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13318 completed with status: 'FAILURE'

Superjomn · 2025-09-05T23:35:04Z

/bot run

tensorrt-cicd · 2025-09-05T23:41:18Z

PR_Github #17830 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-06T02:39:31Z

PR_Github #17830 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13347 completed with status: 'FAILURE'

pcastonguay · 2025-09-06T16:51:06Z

/bot run

tensorrt-cicd · 2025-09-06T16:56:10Z

PR_Github #17890 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-06T19:13:31Z

PR_Github #17890 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13399 completed with status: 'FAILURE'

Signed-off-by: Superjomn <[email protected]>

Signed-off-by: Yan Chunwei <[email protected]>

pcastonguay · 2025-09-06T19:24:14Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-09-06T19:29:20Z

PR_Github #17893 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-07T07:21:02Z

PR_Github #17893 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13402 completed with status: 'FAILURE'

pcastonguay · 2025-09-07T12:44:27Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-09-07T12:49:55Z

PR_Github #17932 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-07T15:32:22Z

PR_Github #17932 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13439 completed with status: 'FAILURE'

pcastonguay · 2025-09-07T18:23:02Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-09-07T18:28:34Z

PR_Github #17951 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-07T23:41:40Z

PR_Github #17951 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13454 completed with status: 'SUCCESS'

Superjomn requested a review from QiJune July 10, 2025 09:17

Superjomn changed the title ~~api: expose tokens_per_block into KvCacheConfig~~ chore: expose tokens_per_block into KvCacheConfig Jul 10, 2025

Superjomn force-pushed the add-tokens-per-block branch from 70c6157 to 398edfc Compare July 10, 2025 09:19

QiJune approved these changes Jul 11, 2025

Superjomn requested a review from lucaslie July 11, 2025 02:08

Superjomn force-pushed the add-tokens-per-block branch from 398edfc to 6abb3e8 Compare July 11, 2025 05:24

Superjomn requested a review from a team as a code owner July 11, 2025 05:24

Superjomn requested a review from yuxianq July 11, 2025 05:24

Superjomn force-pushed the add-tokens-per-block branch from 6abb3e8 to c0b8a26 Compare July 11, 2025 05:27

Superjomn force-pushed the add-tokens-per-block branch 2 times, most recently from 13208c1 to 14bb591 Compare July 12, 2025 15:10

Superjomn requested a review from a team as a code owner July 12, 2025 15:10

Superjomn requested a review from juney-nvidia July 12, 2025 15:10

Superjomn removed request for juney-nvidia and yuxianq July 13, 2025 05:47

Superjomn force-pushed the add-tokens-per-block branch from 14bb591 to 1012f27 Compare July 14, 2025 04:36

Superjomn force-pushed the add-tokens-per-block branch from 1012f27 to f4c5993 Compare July 14, 2025 07:42

Superjomn force-pushed the add-tokens-per-block branch 2 times, most recently from 0d17677 to 1ab900d Compare September 5, 2025 23:34

Superjomn added 2 commits September 6, 2025 15:23

init

32d945a

Signed-off-by: Superjomn <[email protected]>

change default value to 64

78aee16

Signed-off-by: Yan Chunwei <[email protected]>

pcastonguay force-pushed the add-tokens-per-block branch from 1ab900d to 78aee16 Compare September 6, 2025 19:24

pcastonguay merged commit 205c3a1 into NVIDIA:main Sep 8, 2025
5 checks passed

Superjomn deleted the add-tokens-per-block branch September 8, 2025 01:21

Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025

[None][chore] expose tokens_per_block into KvCacheConfig (NVIDIA#5911)

cd1c109

Signed-off-by: Superjomn <[email protected]> Signed-off-by: Yan Chunwei <[email protected]>
