[refactor] Simplification of Speculative decoding configs - Part 2 #5936
Force-pushed from b773dd3 to 5f205d9.
/bot run --disable-fail-fast
PR_Github #11653 [ run ] triggered by Bot
PR_Github #11653 [ run ] completed with state
Force-pushed from 57ec521 to fb30ad5.
## Walkthrough
This update refactors how speculative decoding utilities are accessed across several modules, replacing direct attribute access and method calls on configuration objects with new standalone utility functions. It also removes the corresponding methods and fields from the configuration classes, updates the public exports, and modifies a test prompt. These changes centralize speculative decoding logic and validation.
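A minimal sketch of the method-to-function pattern described above. Only the name `get_num_extra_kv_tokens` comes from this PR; `SpecDecodingMode`, `SpecConfig`, and the per-mode token rule are hypothetical stand-ins chosen for illustration, not TensorRT-LLM's actual classes or values.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class SpecDecodingMode(Enum):
    # Hypothetical subset of modes, for illustration only.
    NONE = auto()
    DRAFT_TARGET = auto()
    MTP = auto()


@dataclass
class SpecConfig:
    # Hypothetical data-only config: after the refactor it holds fields,
    # while behavior lives in standalone functions.
    mode: SpecDecodingMode
    max_draft_len: int = 0


def get_num_extra_kv_tokens(spec_config: Optional[SpecConfig]) -> int:
    """Standalone replacement for a config method (sketch).

    The per-mode rule below is an illustrative assumption, not the
    rule used by TensorRT-LLM.
    """
    if spec_config is None:
        # No speculative decoding: no extra KV-cache slots.
        return 0
    if spec_config.mode is SpecDecodingMode.MTP:
        # Illustrative only: reserve max_draft_len - 1 extra slots.
        return max(spec_config.max_draft_len - 1, 0)
    return 0


# Before (hypothetical): a method or property on the config object.
# After: call sites pass the config to the helper.
print(get_num_extra_kv_tokens(SpecConfig(SpecDecodingMode.MTP, max_draft_len=3)))  # prints 2
```

One benefit of this shape is that call sites depend on a single helper instead of on methods spread across config subclasses, which matches the walkthrough's stated goal of centralizing the logic.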
## Changes
| File(s) | Change Summary |
|---------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
| tensorrt_llm/_torch/pyexecutor/_util.py<br>tensorrt_llm/_torch/pyexecutor/model_engine.py<br>tensorrt_llm/_torch/pyexecutor/py_executor_creator.py<br>tensorrt_llm/_torch/pyexecutor/resource_manager.py | Refactored to use new utility functions (`get_num_extra_kv_tokens`, `update_spec_config_from_model_config`) instead of direct config attribute/method access. |
| tensorrt_llm/_torch/speculative/__init__.py | Added new utility functions to public exports (`__all__`). |
| tensorrt_llm/_torch/speculative/model_drafter.py | Replaced method call for draft prompt with new utility function. |
| tensorrt_llm/_torch/speculative/utils.py | Introduced new utility functions for speculative decoding modes and config updates. |
| tensorrt_llm/llmapi/llm_args.py | Removed obsolete fields and methods from decoding config classes; added stricter validation for draft model presence. |
| tests/unittest/_torch/speculative/test_draft_target.py | Switched test prompt from Germany to France by uncommenting and removing lines. |
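The `llm_args.py` row mentions stricter validation for draft model presence. The sketch below shows one way such a check could fail fast when the config is constructed; `DraftTargetConfigSketch` and its `speculative_model` field are hypothetical names, and the real classes in `llm_args.py` may implement the check differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftTargetConfigSketch:
    # Hypothetical field: path or name of the draft model.
    speculative_model: Optional[str] = None
    max_draft_len: int = 4

    def __post_init__(self) -> None:
        # Reject a missing draft model when the config is built, rather
        # than failing later inside the executor.
        if not self.speculative_model:
            raise ValueError(
                "draft-target decoding requires a draft model; "
                "set speculative_model")


try:
    DraftTargetConfigSketch()
except ValueError as err:
    print(f"rejected: {err}")
```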
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant PyExecutor
participant SpecConfig
participant Utils
User->>PyExecutor: create_py_executor()
PyExecutor->>Utils: get_num_extra_kv_tokens(spec_config)
Utils-->>PyExecutor: num_extra_kv_tokens
PyExecutor->>Utils: update_spec_config_from_model_config()
Utils-->>PyExecutor: updated spec_config
PyExecutor-->>User: Executor ready
```
## Estimated code review effort
3 (90–240 minutes)
Force-pushed from 072d36a to 01ef697.
/bot run --disable-fail-fast
PR_Github #12440 [ run ] completed with state
Signed-off-by: wili-65535 <[email protected]>
Force-pushed from d1df81a to ae34c89.
/bot run --disable-fail-fast
PR_Github #12527 [ run ] triggered by Bot
PR_Github #12527 [ run ] completed with state
…VIDIA#5936) Signed-off-by: wili-65535 <[email protected]> Co-authored-by: wili-65535 <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>
…VIDIA#5936) Signed-off-by: wili-65535 <[email protected]> Co-authored-by: wili-65535 <[email protected]> Signed-off-by: Ransiki Zhang <[email protected]>
Description
Previous PR: #5639.
Here we replace class methods with standalone utility functions, including `update_from_model_config`, `get_draft_model_prompt`, and `get_num_extra_kv_tokens`.
Test Coverage
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`
Provide a user-friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message.
See details below for each supported subcommand.
Details
`run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force-run the multi-GPU tests. Also runs the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: `--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"`.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md`.

kill

`kill`

Kill all running builds associated with the pull request.

skip

`skip --comment COMMENT`

Skip testing for the latest commit on the pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
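To make the subcommand and flag structure concrete, here is a hypothetical parser that mirrors the documented `/bot` grammar using Python's `argparse`. It is not the bot's actual implementation, and it models only a subset of the `run` flags; the comma-splitting helper `_csv` is an illustrative assumption about how list-valued flags such as `--stage-list` might be interpreted.

```python
import argparse
import shlex


def _csv(value: str) -> list:
    # Hypothetical handling of list-valued flags: "A10-1, xxx" -> ["A10-1", "xxx"]
    return [item.strip() for item in value.split(",") if item.strip()]


def build_bot_parser() -> argparse.ArgumentParser:
    # Hypothetical model of the documented /bot grammar; only a subset
    # of the `run` flags is included.
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="Launch build/test pipelines.")
    run.add_argument("--disable-fail-fast", action="store_true")
    run.add_argument("--skip-test", action="store_true")
    run.add_argument("--stage-list", type=_csv, default=[])
    run.add_argument("--gpu-type", type=_csv, default=[])
    run.add_argument("--post-merge", action="store_true")

    sub.add_parser("kill", help="Kill all running builds for the pull request.")

    skip = sub.add_parser("skip", help="Skip testing for the latest commit.")
    # The help text says the comment is required, so argparse enforces it.
    skip.add_argument("--comment", required=True)

    sub.add_parser("reuse-pipeline", help="Reuse a previous pipeline.")
    return parser


args = build_bot_parser().parse_args(
    shlex.split('run --disable-fail-fast --stage-list "A10-1, xxx"'))
print(args.command, args.disable_fail_fast, args.stage_list)  # prints: run True ['A10-1', 'xxx']
```

Because `--comment` is declared `required=True`, a bare `/bot skip` is rejected with a usage error, matching the help text's requirement.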
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests