Conversation

Contributor

@rmccorm4 rmccorm4 commented Aug 20, 2025

Overview:

Clarify that both the prefill and decode workers should be successfully started before sending any inference requests, by using the health endpoint.
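
For reference, the readiness check added to the guide polls the frontend's health endpoint:

```bash
curl http://localhost:8000/health
```

The deployment is ready once the response reports a healthy status and lists both worker endpoints, as shown in the review below:

```json
{
  "endpoints": [
    "dyn://dynamo.tensorrt_llm.generate",
    "dyn://dynamo.tensorrt_llm_next.generate"
  ],
  "status": "healthy"
}
```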

Future Work

There will be a dedicated doc on the health endpoint and how to use it in the near future.

Summary by CodeRabbit

  • Documentation
    • Updated deployment verification steps to include a preflight readiness check that polls health until both prefill and decode workers are up, with guidance to monitor logs if one is still starting.
    • Revised testing instructions to use a real OpenAI-compatible API call to the responses endpoint with configurable parameters.
    • Applied these updates in both relevant sections of the guide for consistency.

Contributor

coderabbitai bot commented Aug 20, 2025

Walkthrough

Revises the TRT-LLM GPT OSS deployment guide to add a readiness polling step against /health (checking prefill and decode workers/endpoints) and replaces the test step with an OpenAI-compatible /v1/responses curl example. The same changes are applied in two places within the document. No code/public APIs changed.
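
For illustration, the new test step amounts to a request like the following sketch (the model name and input below are placeholders, not values taken from the doc):

```bash
# Hypothetical example; substitute the model your deployment actually serves.
curl http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "input": "Hello, world!"
  }'
```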

Changes

| Cohort / File(s) | Change summary |
| --- | --- |
| **Docs: TRT-LLM GPT OSS deployment verification**<br>`components/backends/trtllm/gpt-oss.md` | Replaced “Test the Deployment” with “Verify the Deployment is Ready,” adding /health polling for prefill/decode worker readiness and listing expected dyn endpoints. Updated the subsequent test to an OpenAI-compatible POST /v1/responses curl example. Applied in two duplicated sections within the doc. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor User
  participant Server as Inference Server
  participant Prefill as Prefill Worker
  participant Decode as Decode Worker

  User->>Server: GET /health (poll)
  Server->>Prefill: Check prefill status
  Server->>Decode: Check decode status
  Prefill-->>Server: status=healthy|starting
  Decode-->>Server: status=healthy|starting
  Server-->>User: { endpoints: [...], statuses }

  alt Both healthy
    User->>Server: POST /v1/responses {model,input,...}
    Server-->>User: 200 OK, response payload
  else Any starting
    User-->>User: Wait and continue polling /health
    Note over User,Server: Monitor logs until both endpoints are healthy
  end
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A whisk of ears, I watch the lights,
Prefill hums, decode ignites.
I ping /health—two greens in view,
Then /v1/responses sings anew.
Logs like clover guide my hop—
Ready, steady—carrots pop! 🥕✨

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
components/backends/trtllm/gpt-oss.md (4)

217-219: Specify code fence language for curl example (markdownlint MD040).

Add the language to the fenced block for better syntax highlighting and to satisfy markdownlint.

-```
+```bash
 curl http://localhost:8000/health
-```
+```

221-221: Clarify success criteria (status must be healthy).

Explicitly state that overall status should be healthy before sending requests.

-Make sure that both of the endpoints are available before sending an inference request:
+Make sure both endpoints are listed and the overall status is "healthy" before sending any inference requests:

223-230: Specify code fence language for JSON example (markdownlint MD040).

Add the language to the fenced block.

-```
+```json
 {
   "endpoints": [
     "dyn://dynamo.tensorrt_llm.generate",
     "dyn://dynamo.tensorrt_llm_next.generate"
   ],
   "status": "healthy"
 }
-```
+```

217-221: Optional: include a one-liner to poll /health until ready.

This helps prevent premature requests during startup. Note: requires jq.

 curl http://localhost:8000/health

+Optional (requires jq): wait until the deployment is ready before proceeding:
+
+```bash
+until curl -fsS http://localhost:8000/health | jq -e '.status=="healthy" and (.endpoints|index("dyn://dynamo.tensorrt_llm.generate")) and (.endpoints|index("dyn://dynamo.tensorrt_llm_next.generate"))' >/dev/null; do
+  echo "Waiting for prefill and decode workers to become healthy..."
+  sleep 2
+done
+```


📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f73d35d50c389926a934de131751796ce5050766 and d13febe87b5671196b3286419e435b7c6402b16b.

📒 Files selected for processing (1)

  • components/backends/trtllm/gpt-oss.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool
components/backends/trtllm/gpt-oss.md

[grammar] ~232-~232: There might be a mistake here.
Context: ... still be starting up. You can watch the worker logs to see the progress of worke...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
components/backends/trtllm/gpt-oss.md

217-217: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

222-222: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)

🔇 Additional comments (1)
components/backends/trtllm/gpt-oss.md (1)

214-216: Nice addition: readiness check before inference.

Adding an explicit health check step to ensure both prefill and decode are up before sending traffic improves reliability and user experience.


@rmccorm4 rmccorm4 merged commit 8380f1b into main Aug 20, 2025
12 of 13 checks passed
@rmccorm4 rmccorm4 deleted the rmccormick/nvbugs5468257 branch August 20, 2025 17:12
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025