fix: prevent crash looping hello world #2625 #2670
Co-authored-by: Dmitry Tokarev <[email protected]>
@@ -5,7 +5,7 @@ SPDX-License-Identifier: Apache-2.0
 # Running DeepSeek-R1 Disaggregated with WideEP on H100s
-Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://www.nvidia.com/en-us/technologies/ai/deepseek-r1-large-scale-p-d-with-wide-expert-parallelism/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
+Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
Contributor
Inconsistent Dockerfile name: “sglang-deepep” vs build command using “sglang-wideep”. The text references
-... We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` ...
+... We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` ...
 ## Instructions
@@ -18,8 +18,7 @@ Currently, this setup is only kgateway based Inference Gateway.
 1. **Install Dynamo Platform**
-[See Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
+[See Quickstart Guide](../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
Contributor
Verify the Quickstart link after the docs restructure. Please confirm ../../docs/guides/dynamo_deploy/quickstart.md exists in this PR and on the target branch post-merge.
Script executed:
#!/bin/bash
# Verify that the quickstart doc exists (run from repo root)
fd -a quickstart.md | rg -n "^docs/guides/dynamo_deploy/quickstart\.md$" -n || {
  echo "Missing quickstart at docs/guides/dynamo_deploy/quickstart.md"
  exit 1
}
Length of output: 219
Broken Quickstart Link: The referenced Quickstart guide at
• Confirm that
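The check requested in this comment amounts to resolving each relative link against the directory of the Markdown file that contains it. A minimal Python sketch of that idea follows. The function name `broken_relative_links` and the simple inline-link regex are assumptions made for illustration; this is not the script the review bot ran, and it ignores reference-style links and links inside code blocks.

```python
import re
from pathlib import Path
from typing import List

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(markdown_file: Path) -> List[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in LINK_RE.findall(text):
        # Skip absolute URLs and same-page anchors; only file paths are checked.
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        # Drop any "#anchor" suffix, then resolve relative to the file's folder.
        path_part = target.split("#", 1)[0]
        resolved = (markdown_file.parent / path_part).resolve()
        if not resolved.exists():
            broken.append(target)
    return broken
```

Resolving from the file's own directory, rather than from the repository root, is what makes the difference between `../../docs/...` and `../../../docs/...` visible after a file has moved.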
 2. **Deploy Inference Gateway**
This file was deleted.
This file was deleted.
@@ -211,7 +211,7 @@ The KV-aware routing arguments:
 ### Request Migration
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the HTTP Server and the Worker Engine.
+In a Distributed System, a request may fail due to connectivity issues between the HTTP Server and the Worker Engine.
 The HTTP Server will automatically track which Worker Engines are having connectivity issues with it and avoid routing new requests to the Engines with known connectivity issues.
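The Request Migration text in this hunk describes the HTTP Server tracking which Worker Engines have connectivity issues and routing new requests away from them. The Python sketch below illustrates that behavior in isolation. The class `HealthAwareRouter`, its method names, and the round-robin policy are all hypothetical; nothing here is taken from Dynamo's implementation.

```python
class HealthAwareRouter:
    """Round-robin router that skips workers marked as unreachable (illustrative only)."""

    def __init__(self, workers):
        self._workers = list(workers)
        self._unhealthy = set()
        self._next = 0  # index of the next worker to try

    def mark_failed(self, worker):
        # Record a connectivity failure so new requests avoid this worker.
        self._unhealthy.add(worker)

    def mark_recovered(self, worker):
        # Make the worker eligible for routing again.
        self._unhealthy.discard(worker)

    def pick(self):
        # Scan at most one full cycle, starting from the round-robin cursor.
        count = len(self._workers)
        for offset in range(count):
            worker = self._workers[(self._next + offset) % count]
            if worker not in self._unhealthy:
                self._next = (self._next + offset + 1) % count
                return worker
        raise RuntimeError("no healthy workers available")
```

A caller would invoke `mark_failed` when a request to a worker fails on a connection error, then call `pick` again to choose a different worker for the retry.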
@@ -482,11 +482,11 @@ The trtllm engine requires [etcd](https://etcd.io/) and [nats](https://nats.io/)
 ##### Step 1: Build the environment
-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#build-docker) to build the dynamo container with TensorRT-LLM.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#build-container) to build the dynamo container with TensorRT-LLM.
 ##### Step 2: Run the environment
-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#run-container) to run the built environment.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#run-container) to run the built environment.
Comment on lines +485 to 490
Contributor
Updated TRT-LLM links: verify anchors exist in README. Ensure
Script executed:
#!/bin/bash
set -euo pipefail
file="components/backends/trtllm/README.md"
echo "Checking anchors in $file"
rg -nP '^\s*#{2,}\s*Build the environment\b' "$file" || echo "Missing 'Build the environment' heading"
rg -nP '^\s*#{2,}\s*Run the environment\b' "$file" || echo "Missing 'Run the environment' heading"
Length of output: 538
Add missing anchors in components/backends/trtllm/README.md
The links in
• In components/backends/trtllm/README.md, add:
### Build Container
<!-- or: <a name="build-container"></a> -->
Steps to build the Dynamo container with TensorRT-LLM…
• And:
### Run Container
<!-- or: <a name="run-container"></a> -->
Instructions to run the built environment…
Alternatively, adjust the links in
 ##### Step 3: Execute `dynamo-run` command
@@ -679,10 +679,6 @@ Here are some example engines:
 - Chat:
 * [sglang](https://github.com/ai-dynamo/dynamo/blob/main/lib/bindings/python/examples/hello_world/server_sglang_tok.py)
-More fully-featured Backend engines (used by `dynamo-run`):
-- [vllm](https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/subprocess/vllm_inc.py)
-- [sglang](https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/subprocess/sglang_inc.py)
 ### Debugging
 `dynamo-run` and `dynamo-runtime` support [tokio-console](https://github.com/tokio-rs/console). Build with the feature to enable:
Fix wording: extra “the” in Quick Start intro.
LanguageTool:
[grammar] ~55-~55: There might be a mistake here.
Context: ...on deployment patterns on a single node. ### Start NATS and ETCD in the background S...
(QB_NEW_EN)