feat: support trtllm in sla-planner #2980
Merged
Changes from 1 commit
36 commits

| Commit | Author | Message |
|---|---|---|
| 1aa8c32 | biswapanda | chore: update CODEOWNERS for multimodal examples (#2878) |
| ece24b1 | tedzhouhk | first attempt |
| c5633e1 | tedzhouhk | add aiofiles |
| 0846ae7 | tedzhouhk | fix trtllm docker file |
| 90b5ff1 | tedzhouhk | Revert "fix trtllm docker file" |
| f2a66cb | tedzhouhk | install requirement.txt |
| 4001324 | tedzhouhk | add trtllm to sla planner |
| ebc7611 | ayushag-nv | fix: fix hermes tool call config (#2915) |
| e63ec2e | dillon-cullinan | ci: OPS-724: Move to ARC runners (#2904) |
| 37f2778 | saturley-hall | fix: CI is broken with a deprecated dependency on pynvml (#2926) |
| cd1115a | julienmancuso | fix: fix typo in multinode example (#2931) |
| 327f3fe | pvijayakrish | ci: Add concurrency check to auto cancel running actions. (#2438) |
| 8b1b24c | ayushag-nv | chore: added utility to detect possible tool call start for a chunk (… |
| d34cfdd | ayushag-nv | chore: add preference logic for using tool-call and reasoning parsers… |
| dad62a5 | harryskim | Update README.md (#2938) |
| 64ba7f3 | nv-tusharma | build: OPS-597: restructure sglang to follow container strategy struc… |
| 766d5b2 | alec-flowers | refactor: standardize e2e tests across 3 frameworks (#2827) |
| e41c5bb | julienmancuso | feat: automatically setup and inject prometheus configuration (#2912) |
| 1803db8 | GuanLuo | fix: WAR DeepGemm JIT compilation errors (#2937) |
| a76fd70 | alec-flowers | ci: sglang functional tests (#2943) |
| 8f1f965 | hhzhang16 | feat: update benchmarking and deploy utils (#2933) |
| 4db7fcf | grahamking | feat: Add a checksum to ModelDeploymentCard fields (#2934) |
| 351464b | dillon-cullinan | ci: Fix Dockerfile mount secrets (#2960) |
| 51c75e1 | ayushag-nv | chore: added tool call schema validation in oai formatter (#2935) |
| f7090a3 | nv-anants | test: remove nighlty marker in kvbm tests (#2958) |
| f5644ef | nv-anants | ci: remove pre-merge ignore in github workflow (#2940) |
| 1a412eb | alec-flowers | ci: longer timeout, change model for l40 (#2951) |
| b19deaf | messiaen | fix: aggregate logprobs (#2928) |
| 7148426 | nealvaidya | fix: no reasoning parser by default (#2939) |
| a2e3b52 | nv-nmailhot | docs: fix broken links (#2965) |
| 37213b6 | tedzhouhk | feat: add a virtual connector for 3rd party deployments (#2913) |
| 3af3425 | biswapanda | fix: dyn namespace scoping for trtllm |
| 436307c | tedzhouhk | pc |
| e62f664 | tedzhouhk | remove duplicate |
| a4a3e66 | tedzhouhk | Merge branch 'main' of https://github.com/ai-dynamo/dynamo into hzhou… |
| e6ac2a7 | tedzhouhk | address pr comment |
add trtllm to sla planner
Signed-off-by: hongkuanz <[email protected]>
commit 40013243a5922ace30e92a5fae3129644990f798
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,205 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| apiVersion: nvidia.com/v1alpha1 | ||
| kind: DynamoGraphDeployment | ||
| metadata: | ||
|   name: trtllm-disagg-planner | ||
| spec: | ||
|   envs: | ||
|     - name: DYNAMO_SERVICE_CONFIG | ||
|       value: '{"Prometheus":{"global":{"scrape_interval":"5s"},"scrape_configs":[{"job_name":"prometheus","static_configs":[{"targets":["localhost:8000"]}]},{"job_name":"frontend","static_configs":[{"targets":["trtllm-disagg-planner-frontend:8000"]}]}]}}' | ||
|     - name: DYNAMO_NAMESPACE | ||
|       value: "trtllm-disagg-planner" | ||
|   services: | ||
|     Frontend: | ||
|       dynamoNamespace: trtllm-disagg-planner | ||
|       componentType: frontend | ||
|       replicas: 1 | ||
|       extraPodSpec: | ||
|         mainContainer: | ||
|           image: nvcr.io/nvidian/dynamo-dev/dynamo-trtllm-runtime:hzhou-0909-03 | ||
|           workingDir: /workspace/components/backends/trtllm | ||
|           command: | ||
|             - python3 | ||
|           args: | ||
|             - -m | ||
|             - dynamo.frontend | ||
|             - --http-port | ||
|             - "8000" | ||
|             - --kv-cache-block-size | ||
|             - "128" | ||
|             - --router-mode | ||
|             - kv | ||
|             - --kv-overlap-score-weight | ||
|             - "0.0" | ||
|             - --router-temperature | ||
|             - "0.0" | ||
|             - --no-kv-events | ||
|     Planner: | ||
|       dynamoNamespace: trtllm-disagg-planner | ||
|       envFromSecret: hf-token-secret | ||
|       componentType: planner | ||
|       replicas: 1 | ||
|       envs: | ||
|         - name: PROMETHEUS_PORT | ||
|           value: "8000" | ||
|       livenessProbe: | ||
|         exec: | ||
|           command: | ||
|             - /bin/sh | ||
|             - -c | ||
|             - "exit 0" | ||
|         periodSeconds: 60 | ||
|         timeoutSeconds: 30 | ||
|         failureThreshold: 10 | ||
|       readinessProbe: | ||
|         exec: | ||
|           command: | ||
|             - /bin/sh | ||
|             - -c | ||
|             - "exit 0" | ||
|         initialDelaySeconds: 60 | ||
|         periodSeconds: 60 | ||
|         timeoutSeconds: 30 | ||
|         failureThreshold: 10 | ||
|       pvc: | ||
|         create: false | ||
|         name: dynamo-pvc # Must be pre-created before deployment and SLA profiler must have been run | ||
|         mountPoint: /workspace/profiling_results | ||
|       extraPodSpec: | ||
|         mainContainer: | ||
|           image: nvcr.io/nvidian/dynamo-dev/dynamo-trtllm-runtime:hzhou-0909-03 | ||
|           workingDir: /workspace/components/planner/src/dynamo/planner | ||
|           ports: | ||
|             - name: metrics | ||
|               containerPort: 9085 | ||
|           command: | ||
|             - python3 | ||
|           args: | ||
|             - -m | ||
|             - planner_sla | ||
|             - --environment=kubernetes | ||
|             - --backend=trtllm | ||
|             - --adjustment-interval=60 | ||
|             - --profile-results-dir=/workspace/profiling_results | ||
|             - --prometheus-port=9085 | ||
|     Prometheus: # NOTE: this is set on Prometheus to ensure a service is created for the Prometheus component. This is a workaround and should be managed differently. | ||
|       dynamoNamespace: trtllm-disagg-planner | ||
|       componentType: frontend | ||
|       replicas: 1 | ||
|       envs: | ||
|         - name: PYTHONPATH | ||
|           value: "/workspace/components/planner/src" | ||
|         - name: PROMETHEUS_PORT | ||
|           value: "8000" | ||
|       livenessProbe: | ||
|         exec: | ||
|           command: | ||
|             - /bin/sh | ||
|             - -c | ||
|             - "exit 0" | ||
|         periodSeconds: 60 | ||
|         timeoutSeconds: 30 | ||
|         failureThreshold: 10 | ||
|       readinessProbe: | ||
|         exec: | ||
|           command: | ||
|             - /bin/sh | ||
|             - -c | ||
|             - "exit 0" | ||
|         initialDelaySeconds: 30 | ||
|         periodSeconds: 60 | ||
|         timeoutSeconds: 30 | ||
|         failureThreshold: 10 | ||
|       extraPodSpec: | ||
|         mainContainer: | ||
|           image: nvcr.io/nvidian/dynamo-dev/dynamo-trtllm-runtime:hzhou-0909-03 | ||
|           workingDir: /workspace/components/backends/trtllm | ||
|           command: | ||
|             - python3 | ||
|           args: | ||
|             - -m | ||
|             - dynamo.planner.prometheus | ||
|     TRTLLMDecodeWorker: | ||
|       dynamoNamespace: trtllm-disagg-planner | ||
|       envFromSecret: hf-token-secret | ||
|       componentType: worker | ||
|       replicas: 1 | ||
|       livenessProbe: | ||
|         httpGet: | ||
|           path: /live | ||
|           port: 9090 | ||
|         periodSeconds: 5 | ||
|         timeoutSeconds: 30 | ||
|         failureThreshold: 1 | ||
|       readinessProbe: | ||
|         httpGet: | ||
|           path: /health | ||
|           port: 9090 | ||
|         periodSeconds: 10 | ||
|         timeoutSeconds: 30 | ||
|         failureThreshold: 60 | ||
|       resources: | ||
|         limits: | ||
|           gpu: "1" | ||
|       extraPodSpec: | ||
|         terminationGracePeriodSeconds: 600 | ||
|         mainContainer: | ||
|           startupProbe: | ||
|             httpGet: | ||
|               path: /health | ||
|               port: 9090 | ||
|             periodSeconds: 10 | ||
|             failureThreshold: 60 | ||
|           image: nvcr.io/nvidian/dynamo-dev/dynamo-trtllm-runtime:hzhou-0909-03 | ||
|           workingDir: /workspace/components/backends/trtllm | ||
|           command: | ||
|             - python3 | ||
|           args: | ||
|             - -m | ||
|             - dynamo.trtllm | ||
|             - --model-path | ||
|             - Qwen/Qwen3-0.6B | ||
|             - --served-model-name | ||
|             - Qwen/Qwen3-0.6B | ||
|             - --extra-engine-args | ||
|             - engine_configs/decode.yaml | ||
|             - --disaggregation-mode | ||
|             - decode | ||
|             - --disaggregation-strategy | ||
|             - decode_first | ||
|     TRTLLMPrefillWorker: | ||
|       dynamoNamespace: trtllm-disagg-planner | ||
|       envFromSecret: hf-token-secret | ||
|       componentType: worker | ||
|       replicas: 1 | ||
|       resources: | ||
|         limits: | ||
|           gpu: "1" | ||
|       extraPodSpec: | ||
|         terminationGracePeriodSeconds: 600 | ||
|         mainContainer: | ||
|           startupProbe: | ||
|             httpGet: | ||
|               path: /health | ||
|               port: 9090 | ||
|             periodSeconds: 10 | ||
|             failureThreshold: 60 | ||
|           image: nvcr.io/nvidian/dynamo-dev/dynamo-trtllm-runtime:hzhou-0909-03 | ||
|           workingDir: /workspace/components/backends/trtllm | ||
|           command: | ||
|             - python3 | ||
|           args: | ||
|             - -m | ||
|             - dynamo.trtllm | ||
|             - --model-path | ||
|             - Qwen/Qwen3-0.6B | ||
|             - --served-model-name | ||
|             - Qwen/Qwen3-0.6B | ||
|             - --extra-engine-args | ||
|             - engine_configs/prefill.yaml | ||
|             - --disaggregation-mode | ||
|             - prefill | ||
|             - --disaggregation-strategy | ||
|             - decode_first | ||
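
Two values in the manifest above are easy to misread in diff form: the `DYNAMO_SERVICE_CONFIG` JSON string and the worker `startupProbe` settings. The following is a minimal Python sketch for sanity-checking them outside the cluster. The helper names `scrape_targets` and `startup_budget_seconds` are hypothetical and not part of Dynamo. The probe arithmetic assumes the standard Kubernetes behaviour where a container gets roughly `failureThreshold * periodSeconds` seconds to pass its startup probe, with no `initialDelaySeconds` set.

```python
import json

# Value of DYNAMO_SERVICE_CONFIG copied verbatim from the manifest above.
DYNAMO_SERVICE_CONFIG = (
    '{"Prometheus":{"global":{"scrape_interval":"5s"},'
    '"scrape_configs":[{"job_name":"prometheus","static_configs":'
    '[{"targets":["localhost:8000"]}]},{"job_name":"frontend",'
    '"static_configs":[{"targets":["trtllm-disagg-planner-frontend:8000"]}]}]}}'
)


def scrape_targets(raw_config: str) -> dict[str, list[str]]:
    """Map each Prometheus job name to its static targets (hypothetical helper)."""
    prometheus = json.loads(raw_config)["Prometheus"]
    targets: dict[str, list[str]] = {}
    for job in prometheus["scrape_configs"]:
        hosts: list[str] = []
        for static in job.get("static_configs", []):
            hosts.extend(static.get("targets", []))
        targets[job["job_name"]] = hosts
    return targets


def startup_budget_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time a container has to pass its startup probe (hypothetical helper)."""
    return period_seconds * failure_threshold


if __name__ == "__main__":
    for job_name, hosts in scrape_targets(DYNAMO_SERVICE_CONFIG).items():
        print(f"{job_name}: {', '.join(hosts)}")
    # Worker startupProbe in the manifest: periodSeconds=10, failureThreshold=60.
    print("worker startup budget (s):", startup_budget_seconds(10, 60))
```

Under these assumptions the frontend scrape target resolves to `trtllm-disagg-planner-frontend:8000`, and each TRT-LLM worker has about 600 seconds to report healthy on `/health` before Kubernetes restarts it.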
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -44,7 +44,7 @@ Key features include: | ||
| - ✅ | ||
| - vLLM | ||
| * - | ||
| - ❌ | ||
| - ✅ | ||
| - TensorRT-LLM | ||
| * - | ||
| - ❌ | ||