ai-dynamo · rmccorm4 · Aug 25, 2025 · Aug 19, 2025
diff --git a/components/backends/sglang/deploy/README.md b/components/backends/sglang/deploy/README.md
@@ -145,7 +145,7 @@ All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. But you
 ## Further Reading
 
 - **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/guides/dynamo_deploy/create_deployment.md)
-- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/quickstart.md)
+- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/README.md)
 - **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
 - **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
 - **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
@@ -159,4 +159,4 @@ Common issues and solutions:
 3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds`
 4. **Out of memory**: Increase memory limits or reduce model batch size
 
-For additional support, refer to the [deployment guide](../../../../docs/guides/dynamo_deploy/quickstart.md).
+For additional support, refer to the [deployment guide](../../../../docs/guides/dynamo_deploy/README.md).
diff --git a/components/backends/trtllm/deploy/README.md b/components/backends/trtllm/deploy/README.md
@@ -81,7 +81,7 @@ extraPodSpec:
 
 Before using these templates, ensure you have:
 
-1. **Dynamo Cloud Platform installed** - See [Quickstart Guide](../../../../docs/guides/dynamo_deploy/quickstart.md)
+1. **Dynamo Cloud Platform installed** - See [Quickstart Guide](../../../../docs/guides/dynamo_deploy/README.md)
 2. **Kubernetes cluster with GPU support**
 3. **Container registry access** for TensorRT-LLM runtime images
 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -257,7 +257,7 @@ Configure the `model` name and `host` based on your deployment.
 ## Further Reading
 
 - **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/guides/dynamo_deploy/create_deployment.md)
-- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/quickstart.md)
+- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/README.md)
 - **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
 - **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
 - **Architecture Docs**: [Disaggregated Serving](../../../../docs/architecture/disagg_serving.md), [KV-Aware Routing](../../../../docs/architecture/kv_cache_routing.md)
@@ -277,4 +277,4 @@ Common issues and solutions:
 6. **Git LFS issues**: Ensure git-lfs is installed before building containers
 7. **ARM deployment**: Use `--platform linux/arm64` when building on ARM machines
 
-For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
+For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/README.md).
diff --git a/components/backends/vllm/deploy/README.md b/components/backends/vllm/deploy/README.md
@@ -82,7 +82,7 @@ extraPodSpec:
 
 Before using these templates, ensure you have:
 
-1. **Dynamo Cloud Platform installed** - See [Quickstart Guide](../../../../docs/guides/dynamo_deploy/quickstart.md)
+1. **Dynamo Cloud Platform installed** - See [Quickstart Guide](../../../../docs/guides/dynamo_deploy/README.md)
 2. **Kubernetes cluster with GPU support**
 3. **Container registry access** for vLLM runtime images
 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -236,7 +236,7 @@ args:
 ## Further Reading
 
 - **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/guides/dynamo_deploy/create_deployment.md)
-- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/quickstart.md)
+- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/README.md)
 - **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
 - **SLA Planner**: [SLA Planner Deployment Guide](../../../../docs/guides/dynamo_deploy/sla_planner_deployment.md)
 - **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
@@ -252,4 +252,4 @@ Common issues and solutions:
 4. **Out of memory**: Increase memory limits or reduce model batch size
 5. **Port forwarding issues**: Ensure correct pod UUID in port-forward command
 
-For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
+For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/README.md).
@@ -20,7 +20,7 @@ Currently, these setups are only supported with the kGateway based Inference Gat
 
 1. **Install Dynamo Platform**
 
-[See Quickstart Guide](../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
+[See Quickstart Guide](../../docs/guides/dynamo_deploy/README.md) to install Dynamo Cloud.
 
 
 2. **Deploy Inference Gateway**
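
The link changes in the hunks above all follow one pattern: a Markdown link whose target ends in `dynamo_deploy/quickstart.md`, optionally with a `#fragment`, is pointed at `dynamo_deploy/README.md`, and the fragment is dropped. A minimal Python sketch of that rewrite, which could help catch stale links left elsewhere in the tree, is shown below. The function name `retarget_links` and the regular expression are illustrative assumptions and are not part of this PR.

```python
import re

# Old and new link targets, taken from the hunks in this diff.
OLD_TARGET = "dynamo_deploy/quickstart.md"
NEW_TARGET = "dynamo_deploy/README.md"

# Match the "](...)" part of a Markdown link whose path ends in OLD_TARGET,
# with an optional "#fragment" that is discarded by the rewrite.
_LINK = re.compile(r"(\]\([^)\s]*?)" + re.escape(OLD_TARGET) + r"(?:#[^)\s]*)?\)")


def retarget_links(markdown: str) -> str:
    """Return the text with quickstart.md links pointed at README.md (hypothetical helper)."""
    return _LINK.sub(lambda m: m.group(1) + NEW_TARGET + ")", markdown)
```

Running `retarget_links` over each README before committing leaves unrelated links, such as those to `create_deployment.md`, untouched.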

diff --git a/docs/_includes/dive_in_examples.rst b/docs/_includes/dive_in_examples.rst
@@ -0,0 +1,32 @@
+The examples below assume you build the latest image yourself from source. If you use a prebuilt image, follow the examples from the corresponding branch.
+
+.. grid:: 1 2 2 2
+    :gutter: 3
+    :margin: 0
+    :padding: 3 4 0 0
+
+    .. grid-item-card:: :doc:`Hello World <../examples/runtime/hello_world/README>`
+        :link: ../examples/runtime/hello_world/README
+        :link-type: doc
+
+        Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph.
+
+    .. grid-item-card:: :doc:`vLLM <../components/backends/vllm/README>`
+        :link: ../components/backends/vllm/README
+        :link-type: doc
+
+        Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with vLLM.
+
+    .. grid-item-card:: :doc:`SGLang <../components/backends/sglang/README>`
+        :link: ../components/backends/sglang/README
+        :link-type: doc
+
+        Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with SGLang.
+
+    .. grid-item-card:: :doc:`TensorRT-LLM <../components/backends/trtllm/README>`
+        :link: ../components/backends/trtllm/README
+        :link-type: doc
+
+        Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with TensorRT-LLM.
+
+
diff --git a/docs/_includes/install.rst b/docs/_includes/install.rst
@@ -0,0 +1,44 @@
+Pip (PyPI)
+----------
+
+Install a pre-built wheel from PyPI.
+
+.. code-block:: bash
+
+   # Create a virtual environment and activate it
+   uv venv venv
+   source venv/bin/activate
+
+   # Install Dynamo from PyPI (choose one backend extra)
+   uv pip install "ai-dynamo[sglang]==0.4.1"  # or [vllm], [trtllm]
+
+
+Pip from source
+---------------
+
+Install directly from a local checkout for development.
+
+.. code-block:: bash
+
+   # Clone the repository
+   git clone https://github.com/ai-dynamo/dynamo.git
+   cd dynamo
+
+   # Create a virtual environment and activate it
+   uv venv venv
+   source venv/bin/activate
+   uv pip install ".[sglang]"  # or [vllm], [trtllm]
+
+
+Docker
+------
+
+Pull and run prebuilt images from NVIDIA NGC (``nvcr.io``).
+
+.. code-block:: bash
+
+   # Run a container (mount your workspace if needed)
+   docker run --rm -it \
+     --gpus all \
+     --network host \
+     nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.4.1  # or vllm, tensorrtllm
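
The install snippets above pin the PyPI package to `0.4.1`. As a hedged sketch, the Python below shows one way to confirm which version ended up in the active virtual environment, using only the standard library. The helper names `installed_version` and `matches_pin` are hypothetical, and the distribution name `ai-dynamo` is taken from the `uv pip install` line above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "ai-dynamo") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist_name: str, expected: str) -> bool:
    """True only when the distribution is installed at exactly the expected version."""
    return installed_version(dist_name) == expected


# Example usage (run inside the virtual environment created above):
#   print(installed_version())            # version string, or None if not installed
#   print(matches_pin("ai-dynamo", "0.4.1"))
```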
diff --git a/docs/_includes/quick_start_local.rst b/docs/_includes/quick_start_local.rst
@@ -0,0 +1,43 @@
+Get started with Dynamo locally in just a few commands:
+
+**1. Install Dynamo**
+
+.. code-block:: bash
+
+   # Install uv (recommended Python package manager)
+   curl -LsSf https://astral.sh/uv/install.sh | sh
+
+   # Create virtual environment and install Dynamo
+   uv venv venv
+   source venv/bin/activate
+   uv pip install "ai-dynamo[sglang]==0.4.1"  # or [vllm], [trtllm]
+
+**2. Start etcd/NATS**
+
+.. code-block:: bash
+
+   # Fetch and start etcd and NATS using Docker Compose
+   curl -fsSL -o docker-compose.yml https://raw.githubusercontent.com/ai-dynamo/dynamo/release/0.4.1/deploy/docker-compose.yml
+   docker compose -f docker-compose.yml up -d
+
+**3. Run Dynamo**
+
+.. code-block:: bash
+
+   # Start the OpenAI-compatible frontend (default port is 8080)
+   python -m dynamo.frontend
+
+   # In another terminal, start an SGLang worker
+   python -m dynamo.sglang --model-path Qwen/Qwen3-0.6B
+
+**4. Test your deployment**
+
+.. code-block:: bash
+
+   curl localhost:8080/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{"model": "Qwen/Qwen3-0.6B",
+          "messages": [{"role": "user", "content": "Hello!"}],
+          "max_tokens": 50}'
+
+
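
The `curl` call in step 4 can also be issued from Python. The sketch below builds the same JSON body and headers with the standard library only. The function names `build_chat_request` and `send_chat_request` are illustrative assumptions, and the base URL assumes the frontend is listening on its default port 8080 as stated in step 3.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 50) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response; needs a running frontend."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage, once the frontend and a worker are running:
#   req = build_chat_request("http://localhost:8080", "Qwen/Qwen3-0.6B", "Hello!")
#   print(send_chat_request(req))
```

Keeping request construction separate from sending makes the payload easy to inspect without a live server.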
diff --git a/docs/_sections/architecture.rst b/docs/_sections/architecture.rst
@@ -0,0 +1,11 @@
+Overview
+============
+
+.. include:: ../architecture/architecture.md
+   :parser: myst_parser.sphinx_
+
+.. toctree::
+   :hidden:
+
+   Overview <self>
+   Disaggregated Serving <../architecture/disagg_serving>
diff --git a/docs/_sections/backends.rst b/docs/_sections/backends.rst
@@ -0,0 +1,42 @@
+..
+    SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+    SPDX-License-Identifier: Apache-2.0
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+
+Backends
+========
+
+NVIDIA Dynamo supports multiple inference backends to provide flexibility and performance optimization for different use cases and model architectures. Backends are the underlying engines that execute AI model inference, each optimized for specific scenarios, hardware configurations, and performance requirements.
+
+Overview
+--------
+
+Dynamo's multi-backend architecture allows you to:
+
+* **Choose the optimal engine** for your specific workload and hardware
+* **Switch between backends** without changing your application code
+* **Leverage specialized optimizations** from each backend
+* **Scale flexibly** across different deployment scenarios
+
+Supported Backends
+------------------
+
+Dynamo currently supports the following high-performance inference backends:
+
+.. toctree::
+   :maxdepth: 1
+
+   vLLM <../components/backends/vllm/README>
+   SGLang <../components/backends/sglang/README>
+   TensorRT-LLM <../components/backends/trtllm/README>
diff --git a/docs/_sections/examples.rst b/docs/_sections/examples.rst
@@ -0,0 +1,8 @@
+..
+    Examples Page (left sidebar target)
+..
+
+Examples
+========
+
+.. include:: ../_includes/dive_in_examples.rst
diff --git a/docs/_sections/installation.rst b/docs/_sections/installation.rst
@@ -0,0 +1,10 @@
+..
+    Installation Page (left sidebar target)
+..
+
+Installation
+============
+
+.. include:: ../_includes/install.rst
+
+
diff --git a/docs/architecture/kvbm_intro.rst b/docs/architecture/kvbm_intro.rst
@@ -48,9 +48,6 @@ The Dynamo KV Block Manager serves as a reference implementation that emphasizes
    * -
      - ❌
      - SGLang
-   * -
-     - ❌
-     - llama.cpp
    * - **Serving Type**
      - ✅
      - Aggregated
@@ -61,7 +58,9 @@ The Dynamo KV Block Manager serves as a reference implementation that emphasizes
 .. toctree::
    :hidden:
 
+   Overview <self>
    Motivation <kvbm_motivation.md>
    KVBM Architecture <kvbm_architecture.md>
    Understanding KVBM components <kvbm_components.md>
    KVBM Further Reading <kvbm_reading>
+   LMCache Integration <../components/backends/vllm/LMCache_Integration.md>
diff --git a/docs/architecture/planner_intro.rst b/docs/architecture/planner_intro.rst
@@ -49,9 +49,6 @@ Key features include:
    * -
      - ❌
      - SGLang
-   * -
-     - ❌
-     - llama.cpp
    * - **Serving Type**
      - ✅
      - Aggregated
@@ -73,6 +70,7 @@ Key features include:
 .. toctree::
    :hidden:
 
+   Overview <self>
    Pre-Deployment Profiling <pre_deployment_profiling.md>
-   Load-based Planner <load_planner.md>
-   SLA-based Planner <sla_planner.md>
+   SLA-based Planner <sla_planner.md>
+   Planner Benchmark <../guides/planner_benchmark/README.md>
diff --git a/docs/architecture/pre_deployment_profiling.md b/docs/architecture/pre_deployment_profiling.md
@@ -96,7 +96,7 @@ Use the default pre-built image and inject custom configurations via PVC:
 
 1. **Set the container image:**
    ```bash
-   export DOCKER_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.0 # or any existing image tag
+   export DOCKER_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.1 # or any existing image tag
    ```
 
 2. **Inject your custom disagg configuration:**

diff --git a/docs/components/backends/llm/README.md b/docs/components/backends/llm/README.md
diff --git a/docs/components/backends/sglang/README.md b/docs/components/backends/sglang/README.md
@@ -0,0 +1 @@
+../../../../components/backends/sglang/README.md
diff --git a/docs/components/backends/trtllm/multinode/multinode-examples.md b/docs/components/backends/trtllm/multinode/multinode-examples.md
@@ -0,0 +1 @@
+../../../../../components/backends/trtllm/multinode/multinode-examples.md
diff --git a/docs/components/backends/vllm/LMCache_Integration.md b/docs/components/backends/vllm/LMCache_Integration.md
@@ -0,0 +1 @@
+../../../../components/backends/vllm/LMCache_Integration.md
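
The three one-line files above each contain only a relative path, which suggests they are symbolic links from `docs/` back to the component READMEs. The diff does not show the file mode, so treat that as an assumption. Under that assumption, the sketch below checks that each relative target resolves to the intended file at the repository root. The helper `resolve_link` is hypothetical.

```python
import posixpath


def resolve_link(link_path: str, target: str) -> str:
    """Resolve a relative link target against the directory that holds the link."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(link_path), target))


# Link path -> relative target, copied from the diff above.
LINKS = {
    "docs/components/backends/sglang/README.md":
        "../../../../components/backends/sglang/README.md",
    "docs/components/backends/trtllm/multinode/multinode-examples.md":
        "../../../../../components/backends/trtllm/multinode/multinode-examples.md",
    "docs/components/backends/vllm/LMCache_Integration.md":
        "../../../../components/backends/vllm/LMCache_Integration.md",
}

# Each link under docs/ should resolve to the same path with the "docs/" prefix removed.
RESOLVED = {link: resolve_link(link, target) for link, target in LINKS.items()}
```

The `multinode` link needs five `..` segments rather than four because it sits one directory deeper.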