docs: Add health check section to GPT OSS guide #2556
Conversation
Walkthrough

Revises the TRT-LLM GPT OSS deployment guide to add a readiness polling step against /health (checking prefill and decode workers/endpoints) and replaces the test step with an OpenAI-compatible /v1/responses curl example. The same changes are applied in two places within the document. No code/public APIs changed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Server as Inference Server
    participant Prefill as Prefill Worker
    participant Decode as Decode Worker
    User->>Server: GET /health (poll)
    Server->>Prefill: Check prefill status
    Server->>Decode: Check decode status
    Prefill-->>Server: status=healthy|starting
    Decode-->>Server: status=healthy|starting
    Server-->>User: { endpoints: [...], statuses }
    alt Both healthy
        User->>Server: POST /v1/responses {model,input,...}
        Server-->>User: 200 OK, response payload
    else Any starting
        User-->>User: Wait and continue polling /health
        Note over User,Server: Monitor logs until both endpoints are healthy
    end
```
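The gating logic in the diagram above can be sketched as a small readiness predicate. This is an illustrative sketch, not code from the guide; the endpoint names are taken from the sample /health payload discussed later in this review, and the function name is a hypothetical helper.

```python
# Endpoint names from the sample /health payload in this review.
REQUIRED_ENDPOINTS = {
    "dyn://dynamo.tensorrt_llm.generate",       # prefill worker
    "dyn://dynamo.tensorrt_llm_next.generate",  # decode worker
}


def both_workers_ready(health: dict) -> bool:
    """Return True when overall status is healthy and both the
    prefill and decode endpoints are registered."""
    return (
        health.get("status") == "healthy"
        and REQUIRED_ENDPOINTS.issubset(set(health.get("endpoints", [])))
    )


# Simulated /health payloads for the two branches in the diagram.
ready = {
    "endpoints": [
        "dyn://dynamo.tensorrt_llm.generate",
        "dyn://dynamo.tensorrt_llm_next.generate",
    ],
    "status": "healthy",
}
starting = {"endpoints": ["dyn://dynamo.tensorrt_llm.generate"], "status": "starting"}

print(both_workers_ready(ready))     # → True
print(both_workers_ready(starting))  # → False
```

Only when the predicate holds would the client proceed to the POST /v1/responses step; otherwise it keeps polling, as in the `else` branch of the diagram.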
Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
🧹 Nitpick comments (4)
components/backends/trtllm/gpt-oss.md (4)
217-219: Specify code fence language for curl example (markdownlint MD040).

Add the language to the fenced block for better syntax highlighting and to satisfy markdownlint.
````diff
-```
+```bash
 curl http://localhost:8000/health
-```
+```
````
221-221: Clarify success criteria (status must be healthy).

Explicitly state that overall status should be healthy before sending requests.
```diff
-Make sure that both of the endpoints are available before sending an inference request:
+Make sure both endpoints are listed and the overall status is "healthy" before sending any inference requests:
```
223-230: Specify code fence language for JSON example (markdownlint MD040).

Add the language to the fenced block.
````diff
-```
+```json
 {
   "endpoints": [
     "dyn://dynamo.tensorrt_llm.generate",
     "dyn://dynamo.tensorrt_llm_next.generate"
   ],
   "status": "healthy"
 }
-```
+```
````
217-221: Optional: include a one-liner to poll /health until ready.

This helps prevent premature requests during startup. Note: requires jq.
````diff
 curl http://localhost:8000/health
+
+Optional (requires jq): wait until the deployment is ready before proceeding:
+
+```bash
+until curl -fsS http://localhost:8000/health | jq -e '.status=="healthy" and (.endpoints|index("dyn://dynamo.tensorrt_llm.generate")) and (.endpoints|index("dyn://dynamo.tensorrt_llm_next.generate"))' >/dev/null; do
+  echo "Waiting for prefill and decode workers to become healthy..."
+  sleep 2
+done
+```
````

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

Commits: reviewing files that changed from the base of the PR and between f73d35d50c389926a934de131751796ce5050766 and d13febe87b5671196b3286419e435b7c6402b16b.

Files selected for processing (1):
- components/backends/trtllm/gpt-oss.md (1 hunks)

Additional context used:
- LanguageTool (components/backends/trtllm/gpt-oss.md, ~232): possible grammar issue near "...still be starting up. You can watch the worker logs to see the progress of worke..."
- markdownlint-cli2 0.17.2 (components/backends/trtllm/gpt-oss.md, lines 217 and 222): fenced code blocks should have a language specified (MD040, fenced-code-language).

Checks skipped due to timeout of 90000ms (4):
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
Additional comments (1)

components/backends/trtllm/gpt-oss.md, 214-216: Nice addition: readiness check before inference. Adding an explicit health check step to ensure both prefill and decode are up before sending traffic improves reliability and user experience.
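The test step this PR replaces now uses an OpenAI-compatible POST to /v1/responses with a {model, input, ...} body, per the review's sequence diagram. A minimal sketch of constructing such a request body — the function name and the model string are placeholders, not taken from the guide:

```python
import json


def build_responses_request(model: str, user_input: str) -> str:
    """Serialize a minimal /v1/responses request body.

    Field names follow the {model, input, ...} shape shown in the
    review's sequence diagram; any other fields are omitted here.
    """
    return json.dumps({"model": model, "input": user_input})


# "example-model" is a placeholder, not a model name from the guide.
body = build_responses_request("example-model", "Hello, world!")
print(body)  # → {"model": "example-model", "input": "Hello, world!"}
```

This body would be sent only after the /health readiness check described above reports both workers healthy.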
Signed-off-by: Hannah Zhang <[email protected]>
Overview:
Clarify that both the prefill and decode workers should be successfully started before sending any inference requests, by using the health endpoint.
Future Work
There will be a dedicated doc on the health endpoint and how to use it in the near future.