
Conversation

@mikeiovine
Collaborator

Description

The issue is as follows:

The root of this issue is the mutation of context_current_position. I think we should refactor this so that the LLM request is agnostic to its KV cache manager:

  1. Remove the number of cached tokens from context_current_position.
  2. Add a method get_num_cached_tokens to the KV cache manager.
  3. Have the runtime coordinate setting position_id = req.context_current_position + kv_cache_manager.get_num_cached_tokens(req).

I'm not sure how feasible the above refactor is. For now, I've just disabled KV cache reuse and logged a warning when a drafter is required. This is consistent with what we do when the attention backend doesn't support block reuse.
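The proposed split of responsibilities can be sketched roughly as below. This is a hypothetical illustration, not the actual TensorRT-LLM code: the class names, the `register` helper, and the config shape are all assumptions made up for the example; only the `position_id` expression comes from the proposal above.

```python
import logging


class LlmRequest:
    """Hypothetical request object: context position tracks only the
    request's own progress, with no cached-token offset baked in."""

    def __init__(self, prompt_len: int):
        self.prompt_len = prompt_len
        self.context_current_position = 0


class KVCacheManager:
    """Hypothetical manager: sole owner of block-reuse bookkeeping,
    so requests never mutate their position to account for it."""

    def __init__(self):
        self._num_cached = {}

    def register(self, req: LlmRequest, num_cached_tokens: int) -> None:
        # Record how many prompt tokens were matched in reused blocks.
        self._num_cached[id(req)] = num_cached_tokens

    def get_num_cached_tokens(self, req: LlmRequest) -> int:
        return self._num_cached.get(id(req), 0)


def compute_position_id(req: LlmRequest, kv_cache_manager: KVCacheManager) -> int:
    # The runtime, not the request, combines the two sources of truth.
    return req.context_current_position + kv_cache_manager.get_num_cached_tokens(req)


def maybe_disable_block_reuse(config, drafter_required: bool) -> None:
    """Sketch of the interim fix described above: when a drafter is
    required, turn off KV cache reuse and warn, mirroring the existing
    behavior for attention backends without block-reuse support."""
    if drafter_required and config.enable_block_reuse:
        logging.warning(
            "KV cache block reuse is disabled because a drafter is required.")
        config.enable_block_reuse = False
```

The point of the sketch is that `context_current_position` stays a pure function of the request's progress, while the cached-token offset lives entirely behind `get_num_cached_tokens`, so no mutation is needed.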

Test Coverage

Existing tests.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build, package, and sanity-check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

@mikeiovine mikeiovine requested a review from lfr-0531 June 24, 2025 20:29
@mikeiovine mikeiovine requested review from a team as code owners June 24, 2025 20:29
@mikeiovine mikeiovine requested a review from yuxianq June 24, 2025 20:29
@mikeiovine mikeiovine force-pushed the prevent-crash branch 2 times, most recently from 0db19e5 to 7ec4aa0 Compare June 24, 2025 20:38
@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9753 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9753 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7187 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #9889 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9889 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7298 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@mikeiovine
Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #9909 [ reuse-pipeline ] triggered by Bot

@mikeiovine
Collaborator Author

/bot reuse-pipeline

@mikeiovine mikeiovine enabled auto-merge (squash) June 25, 2025 19:37
@tensorrt-cicd
Collaborator

PR_Github #9910 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9909 [ reuse-pipeline ] completed with state ABORTED
Can't reuse PR_Github #9889 with status: SUCCESS

@tensorrt-cicd
Collaborator

PR_Github #9910 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #9889 for commit 13da78b

@mikeiovine mikeiovine merged commit 5bc8c89 into NVIDIA:main Jun 25, 2025
3 checks passed
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
@mikeiovine mikeiovine deleted the prevent-crash branch July 23, 2025 18:00