feat: Trtllm canary health check #3082

tzulingk · 2025-09-17T03:01:03Z

Overview:

Trtllm canary health check

Details:

Add canary health check for trtllm. Main logic in PR#2903

Where should the reviewer start?

components/backends/trtllm/src/dynamo/trtllm/health_check.py: default payload
components/backends/trtllm/src/dynamo/trtllm/main.py: add payload to endpoint

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-635

Summary by CodeRabbit

Refactor
- Standardized the TensorRT-LLM backend health check to a token-based payload for consistency with backend expectations.
Chores
- Integrated the new health-check payload into the service startup and monitoring paths to ensure consistent endpoint checks.

Impact: Improves reliability and compatibility of health checks for the TensorRT-LLM backend.

Signed-off-by: [email protected] <[email protected]>

…annel to discover registered endpoints. Signed-off-by: [email protected] <[email protected]>

…INT_HEALTH_STATUS Signed-off-by: [email protected] <[email protected]>

Signed-off-by: [email protected] <[email protected]>

…caches are consistent Signed-off-by: [email protected] <[email protected]>

Signed-off-by: [email protected] <[email protected]>

Signed-off-by: Tzu-Ling Kan <[email protected]>

coderabbitai · 2025-09-17T03:36:39Z

Walkthrough

Updates the TensorRT-LLM health-check to a token-based payload structure and wires this payload into the main server startup flow. The main routine now constructs TrtllmHealthCheckPayload and passes its dict to endpoint.serve_endpoint in both execution branches.

Changes

Cohort / File(s)	Summary of changes
Health-check payload definition `components/backends/trtllm/src/dynamo/trtllm/health_check.py`	Replaces messages-based default payload with token-centric fields: token_ids, stop_conditions, and sampling_options. Minor comment tweak; no public API/signature changes.
Main workflow integration `components/backends/trtllm/src/dynamo/trtllm/main.py`	Imports TrtllmHealthCheckPayload, builds health_check_payload via to_dict(), and passes it to endpoint.serve_endpoint across both code paths. No other control/logic changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Main as Main (trtllm/main.py)
  participant HC as TrtllmHealthCheckPayload
  participant EP as Endpoint

  Main->>HC: Construct()
  HC-->>Main: to_dict() -> health_check_payload

  alt Publish disabled
    Note over Main,EP: New: pass payload to serve_endpoint
    Main->>EP: serve_endpoint(health_check_payload)
  else Publish enabled
    Note over Main,EP: New: pass payload to serve_endpoint
    Main->>EP: serve_endpoint(health_check_payload)
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I thump my paws—new tokens hop in line,
Stop rules nibble bytes, sampling neat and fine.
I pass a payload, crisp as morning dew,
To endpoints humming, waiting for the cue.
Burrow secure—health checks squeak “all clear!”
Carrots for code; release draws near. 🥕

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "feat: Trtllm canary health check" is concise, directly describes the primary change (adding a canary health check for Trtllm), and matches the diffs and PR objectives referencing health_check.py and main.py, so it is suitable for quick history scanning.
Description Check	✅ Passed	The PR description follows the repository template by providing an Overview, Details, Where should the reviewer start, and Related Issues, and it points reviewers to the changed files and references the related work (PR#2903), so it is mostly complete. However, the Related Issues line uses "DIS-635" without an action keyword (Closes/Fixes/Relates to) and could be formatted more explicitly for GitHub linking.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

components/backends/trtllm/src/dynamo/trtllm/health_check.py (3)
27-27: Avoid hard‑coding token_id=1; prefer model BOS at runtime.

ID 1 isn’t universal across tokenizers. Derive BOS (or PAD fallback) from the active tokenizer to reduce cross‑model fragility. I’ve proposed a main.py patch to do this post‑construction.

Would you like me to wire a sentinel (e.g., "MODEL_BOS") here and replace it in main.py?

29-35: Option: set ignore_eos=True for guaranteed one decode step.

If some models instantly emit EOS, setting ignore_eos=True ensures a decode step still occurs. Not mandatory; current settings are acceptable for fast liveness.

Please confirm whether a zero-token early stop is considered “healthy” in your SLOs.

36-45: Trim redundant sampling knobs for greedy decode.

With temperature=0.0, top_k=1, top_p=1.0, and beam_width=1 are largely redundant. If handler defaults cover these, consider simplifying:

Apply if optional fields are truly optional in request parsing.
-            "sampling_options": {
-                "temperature": 0.0,
-                "top_p": 1.0,
-                "top_k": 1,
-                "beam_width": 1,
-                "repetition_penalty": 1.0,
-                "presence_penalty": 0.0,
-                "frequency_penalty": 0.0,
-                "seed": None,
-            },
+            "sampling_options": {
+                "temperature": 0.0
+            },
components/backends/trtllm/src/dynamo/trtllm/main.py (1)
320-322: Align canary token with the active tokenizer (BOS/PAD fallback)

Verified request handlers reference token_ids, stop_conditions, and sampling_options in components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py — apply the best-effort BOS→PAD alignment to the health-check payload in components/backends/trtllm/src/dynamo/trtllm/main.py (around lines 320–322):
-        # Get health check payload (checks env var and falls back to TensorRT-LLM default)
-        health_check_payload = TrtllmHealthCheckPayload().to_dict()
+        # Get health check payload (env override or default), then align token_id with model if possible
+        health_check_payload = TrtllmHealthCheckPayload().to_dict()
+        try:
+            if isinstance(health_check_payload, dict) and "token_ids" in health_check_payload:
+                ids = health_check_payload.get("token_ids") or []
+                if ids == [1] or not ids:
+                    bos = getattr(tokenizer, "bos_token_id", None)
+                    if isinstance(bos, int) and bos >= 0:
+                        health_check_payload["token_ids"] = [bos]
+                    else:
+                        pad = getattr(tokenizer, "pad_token_id", None)
+                        if isinstance(pad, int) and pad >= 0:
+                            health_check_payload["token_ids"] = [pad]
+        except Exception:
+            # Best-effort only; keep payload as-is if anything goes wrong
+            pass

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08cb08c and 212c3ee.

📒 Files selected for processing (2)

components/backends/trtllm/src/dynamo/trtllm/health_check.py (1 hunks)
components/backends/trtllm/src/dynamo/trtllm/main.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/backends/trtllm/src/dynamo/trtllm/main.py (4)

components/backends/trtllm/src/dynamo/trtllm/health_check.py (1)

TrtllmHealthCheckPayload (13-47)

lib/bindings/python/src/dynamo/health_check.py (1)

to_dict (86-96)

lib/bindings/python/src/dynamo/_core.pyi (2)

endpoint (207-211)

serve_endpoint (220-232)

lib/bindings/python/rust/lib.rs (2)

endpoint (549-555)

serve_endpoint (569-622)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Build and Test - vllm
GitHub Check: Mirror Repository to GitLab
GitHub Check: Build and Test - sglang
GitHub Check: Build and Test - dynamo

🔇 Additional comments (3)

components/backends/trtllm/src/dynamo/trtllm/main.py (3)

30-30: Import looks good.

Clean, minimal integration point.

341-344: Passing health_check_payload in metrics branch: LGTM.

Matches serve_endpoint signature and keeps graceful_shutdown default.

347-349: Passing health_check_payload in non‑metrics branch: LGTM.

Consistent behavior across both paths.

rmccorm4

LGTM

gitlab failure for test_chat_only_aggregated_with_test_logits_processor appears unrelated to this change and happening on other branches as well:

E               AssertionError: Expected content not found in response. Missing: ['AI']

Signed-off-by: [email protected] <[email protected]> Signed-off-by: Tzu-Ling Kan <[email protected]> Signed-off-by: Kristen Kelleher <[email protected]>

tzulingk added 15 commits September 16, 2025 15:29

Canary Health Check.

33558b2

Signed-off-by: [email protected] <[email protected]>

Uncommit unintended files

2b898c2

Signed-off-by: [email protected] <[email protected]>

Remove unused functions and data members.

e20b261

Signed-off-by: [email protected] <[email protected]>

Remove health_check_notifier from RequestMetricsGuard

2659e99

Signed-off-by: [email protected] <[email protected]>

Send health check request using PushRouter.

31923b4

Signed-off-by: [email protected] <[email protected]>

HealthCheckPayload for managing backend health check payloads. Use ch…

6176d98

…annel to discover registered endpoints. Signed-off-by: [email protected] <[email protected]>

Combine this with health check readiness. Remove DYN_SYSTEM_USE_ENDPO…

0cdea5b

…INT_HEALTH_STATUS Signed-off-by: [email protected] <[email protected]>

Avoid dup registration race. Check payload is json. Add SPDX header

a8e4883

Signed-off-by: [email protected] <[email protected]>

Pass endpoint_subject as cache_key, so that all health check related …

9a69e82

…caches are consistent Signed-off-by: [email protected] <[email protected]>

Move SystemHealth to its own crate.

6c73ec7

Signed-off-by: [email protected] <[email protected]>

Add default_payload type hint for mypy

5fe0d00

Signed-off-by: [email protected] <[email protected]>

Add DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS back

37d6826

Signed-off-by: [email protected] <[email protected]>

Add a new line to satisfy format rule.

d5c80ec

Signed-off-by: [email protected] <[email protected]>

Canary health check for trtllm. Main logic in PR2903

de6dc46

Signed-off-by: [email protected] <[email protected]>

Update topk value, has to be >= 0.

8459dfc

Signed-off-by: [email protected] <[email protected]>

pull-request-size bot added the size/XXL label Sep 17, 2025

github-actions bot added the feat label Sep 17, 2025

tzulingk marked this pull request as ready for review September 17, 2025 03:33

tzulingk requested review from Aphoh, PeaBrane, alec-flowers, biswapanda, hhzhang16, hutm, ishandhanani, jasonqinzhou, julienmancuso, michaelshin, nnshah1 and tedzhouhk as code owners September 17, 2025 03:33

tzulingk requested review from a team, atchernych and mohammedabdulwahhab as code owners September 17, 2025 03:33

Merge branch 'main' into tzulingk/health_check_trtllm_cherry

212c3ee

Signed-off-by: Tzu-Ling Kan <[email protected]>

pull-request-size bot added size/M and removed size/XXL labels Sep 17, 2025

copy-pr-bot bot temporarily deployed to GITLAB September 17, 2025 03:35 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 17, 2025 03:39 Inactive

coderabbitai bot reviewed Sep 17, 2025

View reviewed changes

tzulingk enabled auto-merge (squash) September 17, 2025 04:10

tzulingk requested review from rmccorm4 and ryanolson September 17, 2025 16:38

rmccorm4 approved these changes Sep 17, 2025

View reviewed changes

tzulingk merged commit 67ff181 into main Sep 17, 2025
13 of 14 checks passed

tzulingk deleted the tzulingk/health_check_trtllm_cherry branch September 17, 2025 17:04

kmkelle-nv pushed a commit that referenced this pull request Sep 17, 2025

feat: Trtllm canary health check (#3082)

2dd381f

Signed-off-by: [email protected] <[email protected]> Signed-off-by: Tzu-Ling Kan <[email protected]> Signed-off-by: Kristen Kelleher <[email protected]>

This was referenced Sep 18, 2025

feat: Sglang canary health check #3103

Merged

feat: Trtllm health check payload use bos_token_id #3145

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Trtllm canary health check #3082

feat: Trtllm canary health check #3082

Uh oh!

tzulingk commented Sep 17, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 17, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

rmccorm4 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: Trtllm canary health check #3082

feat: Trtllm canary health check #3082

Uh oh!

Conversation

tzulingk commented Sep 17, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

rmccorm4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tzulingk commented Sep 17, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 17, 2025 •

edited

Loading