Skip to content

Conversation

@tzulingk
Copy link
Contributor

@tzulingk tzulingk commented Sep 17, 2025

Overview:

Trtllm canary health check

Details:

Add canary health check for trtllm. Main logic in PR#2903

Where should the reviewer start?

components/backends/trtllm/src/dynamo/trtllm/health_check.py: default payload
components/backends/trtllm/src/dynamo/trtllm/main.py: add payload to endpoint

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-635

Summary by CodeRabbit

  • Refactor
    • Standardized the TensorRT-LLM backend health check to a token-based payload for consistency with backend expectations.
  • Chores
    • Integrated the new health-check payload into the service startup and monitoring paths to ensure consistent endpoint checks.

Impact: Improves reliability and compatibility of health checks for the TensorRT-LLM backend.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 17, 2025

Walkthrough

Updates the TensorRT-LLM health-check to a token-based payload structure and wires this payload into the main server startup flow. The main routine now constructs TrtllmHealthCheckPayload and passes its dict to endpoint.serve_endpoint in both execution branches.

Changes

Cohort / File(s) Summary of changes
Health-check payload definition
components/backends/trtllm/src/dynamo/trtllm/health_check.py
Replaces messages-based default payload with token-centric fields: token_ids, stop_conditions, and sampling_options. Minor comment tweak; no public API/signature changes.
Main workflow integration
components/backends/trtllm/src/dynamo/trtllm/main.py
Imports TrtllmHealthCheckPayload, builds health_check_payload via to_dict(), and passes it to endpoint.serve_endpoint across both code paths. No other control/logic changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Main as Main (trtllm/main.py)
  participant HC as TrtllmHealthCheckPayload
  participant EP as Endpoint

  Main->>HC: Construct()
  HC-->>Main: to_dict() -> health_check_payload

  alt Publish disabled
    Note over Main,EP: New: pass payload to serve_endpoint
    Main->>EP: serve_endpoint(health_check_payload)
  else Publish enabled
    Note over Main,EP: New: pass payload to serve_endpoint
    Main->>EP: serve_endpoint(health_check_payload)
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I thump my paws—new tokens hop in line,
Stop rules nibble bytes, sampling neat and fine.
I pass a payload, crisp as morning dew,
To endpoints humming, waiting for the cue.
Burrow secure—health checks squeak “all clear!”
Carrots for code; release draws near. 🥕

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat: Trtllm canary health check" is concise, directly describes the primary change (adding a canary health check for Trtllm), and matches the diffs and PR objectives referencing health_check.py and main.py, so it is suitable for quick history scanning.
Description Check ✅ Passed The PR description follows the repository template by providing an Overview, Details, Where should the reviewer start, and Related Issues, and it points reviewers to the changed files and references the related work (PR#2903), so it is mostly complete. However, the Related Issues line uses "DIS-635" without an action keyword (Closes/Fixes/Relates to) and could be formatted more explicitly for GitHub linking.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (3)

27-27: Avoid hard‑coding token_id=1; prefer model BOS at runtime.

ID 1 isn’t universal across tokenizers. Derive BOS (or PAD fallback) from the active tokenizer to reduce cross‑model fragility. I’ve proposed a main.py patch to do this post‑construction.

Would you like me to wire a sentinel (e.g., "MODEL_BOS") here and replace it in main.py?


29-35: Option: set ignore_eos=True for guaranteed one decode step.

If some models instantly emit EOS, setting ignore_eos=True ensures a decode step still occurs. Not mandatory; current settings are acceptable for fast liveness.

Please confirm whether a zero-token early stop is considered “healthy” in your SLOs.


36-45: Trim redundant sampling knobs for greedy decode.

With temperature=0.0, top_k=1, top_p=1.0, and beam_width=1 are largely redundant. If handler defaults cover these, consider simplifying:

Apply if optional fields are truly optional in request parsing.

-            "sampling_options": {
-                "temperature": 0.0,
-                "top_p": 1.0,
-                "top_k": 1,
-                "beam_width": 1,
-                "repetition_penalty": 1.0,
-                "presence_penalty": 0.0,
-                "frequency_penalty": 0.0,
-                "seed": None,
-            },
+            "sampling_options": {
+                "temperature": 0.0
+            },
components/backends/trtllm/src/dynamo/trtllm/main.py (1)

320-322: Align canary token with the active tokenizer (BOS/PAD fallback)

Verified request handlers reference token_ids, stop_conditions, and sampling_options in components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py — apply the best-effort BOS→PAD alignment to the health-check payload in components/backends/trtllm/src/dynamo/trtllm/main.py (around lines 320–322):

-        # Get health check payload (checks env var and falls back to TensorRT-LLM default)
-        health_check_payload = TrtllmHealthCheckPayload().to_dict()
+        # Get health check payload (env override or default), then align token_id with model if possible
+        health_check_payload = TrtllmHealthCheckPayload().to_dict()
+        try:
+            if isinstance(health_check_payload, dict) and "token_ids" in health_check_payload:
+                ids = health_check_payload.get("token_ids") or []
+                if ids == [1] or not ids:
+                    bos = getattr(tokenizer, "bos_token_id", None)
+                    if isinstance(bos, int) and bos >= 0:
+                        health_check_payload["token_ids"] = [bos]
+                    else:
+                        pad = getattr(tokenizer, "pad_token_id", None)
+                        if isinstance(pad, int) and pad >= 0:
+                            health_check_payload["token_ids"] = [pad]
+        except Exception:
+            # Best-effort only; keep payload as-is if anything goes wrong
+            pass
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08cb08c and 212c3ee.

📒 Files selected for processing (2)
  • components/backends/trtllm/src/dynamo/trtllm/health_check.py (1 hunks)
  • components/backends/trtllm/src/dynamo/trtllm/main.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/backends/trtllm/src/dynamo/trtllm/main.py (4)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (1)
  • TrtllmHealthCheckPayload (13-47)
lib/bindings/python/src/dynamo/health_check.py (1)
  • to_dict (86-96)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • endpoint (207-211)
  • serve_endpoint (220-232)
lib/bindings/python/rust/lib.rs (2)
  • endpoint (549-555)
  • serve_endpoint (569-622)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/backends/trtllm/src/dynamo/trtllm/main.py (3)

30-30: Import looks good.

Clean, minimal integration point.


341-344: Passing health_check_payload in metrics branch: LGTM.

Matches serve_endpoint signature and keeps graceful_shutdown default.


347-349: Passing health_check_payload in non‑metrics branch: LGTM.

Consistent behavior across both paths.

@tzulingk tzulingk enabled auto-merge (squash) September 17, 2025 04:10
Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

gitlab failure for test_chat_only_aggregated_with_test_logits_processor appears unrelated to this change and happening on other branches as well:

E               AssertionError: Expected content not found in response. Missing: ['AI']

@tzulingk tzulingk merged commit 67ff181 into main Sep 17, 2025
13 of 14 checks passed
@tzulingk tzulingk deleted the tzulingk/health_check_trtllm_cherry branch September 17, 2025 17:04
kmkelle-nv pushed a commit that referenced this pull request Sep 17, 2025
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Tzu-Ling Kan <[email protected]>
Signed-off-by: Kristen Kelleher <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants