docs: Consolidate documentation and fix redundant headings
This commit consolidates and improves the documentation structure based on tech writer feedback:

**New Documentation:**
- Added Grove advanced Kubernetes scheduling guide
- Added comprehensive K8s metrics setup guide with Prometheus/Grafana

**Heading Fixes:**
- Fixed headings that would appear redundant in Sphinx breadcrumbs
- Changed "Architecture" → "Design" in SLA planner docs
- Changed "Core Components" → "Core Services" to avoid repetition
- Removed duplicate H1 headings in component docs

**Quick Start Disambiguation:**
- "Quick Start" → "SGLang Quick Start" in SGLang README
- "Quick Start" → "TensorRT-LLM Quick Start" in TensorRT-LLM README
- "Quick Start" → "vLLM Quick Start" in vLLM README
- "Quick Start" → "KV Router Quick Start" in router docs
- "Quick Start" → "Pre-deployment Steps" in Fluid caching guide

**Platform Naming:**
- Updated references to use consistent "Dynamo Kubernetes Platform" naming

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
athreesh and claude committed Aug 19, 2025
commit 359b4aeeb5b8ba7eaf4d8a032be59ed5a3327857
2 changes: 1 addition & 1 deletion components/README.md
@@ -29,7 +29,7 @@ Dynamo supports multiple inference engines (with a focus on SGLang, vLLM, and Te

Each engine provides launch scripts for different deployment patterns in its `/launch` & `/deploy` directories.

-## Core Components
+## Core Services

### [Backends](backends/)

2 changes: 1 addition & 1 deletion components/backends/sglang/README.md
@@ -50,7 +50,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
| **GB200 Support** | ✅ | |


-## Quick Start
+## SGLang Quick Start

Below we provide a guide that lets you run all of our common deployment patterns on a single node.

2 changes: 1 addition & 1 deletion components/backends/trtllm/README.md
@@ -66,7 +66,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
| **DP Rank Routing**| ✅ | |
| **GB200 Support** | ✅ | |

-## Quick Start
+## TensorRT-LLM Quick Start

Below we provide a guide that lets you run all of our common deployment patterns on a single node.

2 changes: 1 addition & 1 deletion components/backends/vllm/README.md
@@ -51,7 +51,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
| **DP Rank Routing**| ✅ | Supported via external control of DP ranks |
| **GB200 Support** | 🚧 | Container functional on main |

-## Quick Start
+## vLLM Quick Start

Below we provide a guide that lets you run all of our common deployment patterns on a single node.

2 changes: 1 addition & 1 deletion docs/architecture/architecture.md
@@ -48,7 +48,7 @@ There are multi-faceted challenges:

To address the growing demands of distributed inference serving, NVIDIA introduces Dynamo. This innovative product tackles key challenges in scheduling, memory management, and data transfer. Dynamo employs KV-aware routing for optimized decoding, leveraging existing KV caches. For efficient global memory management at scale, it strategically stores and evicts KV caches across multiple memory tiers—GPU, CPU, SSD, and object storage—enhancing both time-to-first-token and overall throughput. Dynamo features NIXL (NVIDIA Inference tranXfer Library), a new data transfer engine designed for dynamic scaling and low-latency storage access.

-## High level architecture and key benefits
+## Key benefits

The following diagram outlines Dynamo's high-level architecture. To enable large-scale distributed and disaggregated inference serving, Dynamo includes five key features:

4 changes: 2 additions & 2 deletions docs/architecture/sla_planner.md
@@ -17,7 +17,7 @@ The SLA (Service Level Agreement)-based planner is an intelligent autoscaling sy
* **Performance interpolation**: Leverages results from pre-deployment profiling to make accurate scaling decisions
* **Correction factors**: Adapts to real-world performance deviations from profiled data

-## Architecture
+## Design

The SLA planner consists of several key components:

@@ -108,7 +108,7 @@ Finally, SLA planner applies the change by scaling up/down the number of prefill

For detailed deployment instructions including setup, configuration, troubleshooting, and architecture overview, see the [SLA Planner Deployment Guide](../guides/dynamo_deploy/sla_planner_deployment.md).

-**Quick Start:**
+**To deploy SLA Planner:**
```bash
cd components/backends/vllm/deploy
kubectl apply -f disagg_planner.yaml -n ${NAMESPACE}
2 changes: 1 addition & 1 deletion docs/components/router/README.md
@@ -9,7 +9,7 @@ SPDX-License-Identifier: Apache-2.0

The Dynamo KV Router intelligently routes requests by evaluating their computational costs across different workers. It considers both decoding costs (from active blocks) and prefill costs (from newly computed blocks). Optimizing the KV Router is critical for achieving maximum throughput and minimum latency in distributed inference setups.

-## Quick Start
+## KV Router Quick Start

To launch the Dynamo frontend with the KV Router:

171 changes: 171 additions & 0 deletions docs/guides/dynamo_deploy/grove.md
@@ -0,0 +1,171 @@
# Grove: Advanced Kubernetes Scheduling

Grove is an advanced Kubernetes scheduler and batch workload manager built on top of the Dynamo Kubernetes Platform. It enables sophisticated scheduling policies for multi-node GPU workloads, with special support for large-scale LLM inference deployments.

## Overview

Grove extends Kubernetes' default scheduling capabilities with:
- **Gang scheduling**: Ensures all pods in a workload start together or not at all
- **Topology-aware placement**: Optimizes pod placement based on network topology
- **Resource-aware scheduling**: Makes intelligent decisions based on GPU memory, compute capacity, and network bandwidth
- **Priority-based queueing**: Manages workload priorities and preemption policies

## Key Features

### PodGangSet
PodGangSet is Grove's primary scheduling primitive: it groups related pods that must be scheduled together.

```yaml
apiVersion: grove.dynamo.ai/v1
kind: PodGangSet
metadata:
  name: llm-inference-gang
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: worker
        image: dynamo/worker:latest
        resources:
          requests:
            nvidia.com/gpu: 1
  replicas: 8
  minAvailable: 8  # All pods must be schedulable
  scheduling:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type
            operator: In
            values: ["gpu-compute"]
```

### PodClique
PodClique provides fine-grained control over pod co-location and anti-affinity rules within a gang.

```yaml
apiVersion: grove.dynamo.ai/v1
kind: PodClique
metadata:
  name: prefill-decode-clique
spec:
  selector:
    matchLabels:
      app: dynamo-worker
  topology:
    # Prefer pods to be co-located on the same rack
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            component: prefill
        topologyKey: topology.kubernetes.io/rack
```

## Deployment

### Prerequisites

- Kubernetes cluster with GPU nodes
- NVIDIA GPU Operator installed
- Node topology labels configured

### Install Grove Scheduler

```bash
# Install Grove CRDs and scheduler
kubectl apply -f https://github.com/ai-dynamo/grove/releases/latest/download/grove-crds.yaml
kubectl apply -f https://github.com/ai-dynamo/grove/releases/latest/download/grove-scheduler.yaml
```
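
Before moving on, you may want to confirm that the scheduler components are running. The sketch below assumes the `grove-system` namespace and `grove-scheduler` deployment name used in the troubleshooting section of this guide:

```bash
# Check that the Grove CRDs registered and the scheduler deployment is ready
kubectl get crds | grep grove
kubectl -n grove-system get deployment grove-scheduler
kubectl -n grove-system logs deployment/grove-scheduler --tail=20
```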

### Configure Node Topology

Label your nodes with topology information:

```bash
# Label nodes with rack information
kubectl label node gpu-node-01 topology.kubernetes.io/rack=rack-1
kubectl label node gpu-node-02 topology.kubernetes.io/rack=rack-1
kubectl label node gpu-node-03 topology.kubernetes.io/rack=rack-2

# Label nodes with GPU types
kubectl label node gpu-node-01 accelerator=h100
kubectl label node gpu-node-02 accelerator=h100
kubectl label node gpu-node-03 accelerator=a100
```
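
To confirm the labels before scheduling workloads, standard `kubectl` label columns are enough (no Grove-specific tooling required):

```bash
# Show the rack and accelerator labels for every node
kubectl get nodes -L topology.kubernetes.io/rack -L accelerator
```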

## Integration with Dynamo

Grove integrates seamlessly with Dynamo's disaggregated serving architecture:

### Multi-Node Prefill/Decode Scheduling

```yaml
apiVersion: grove.dynamo.ai/v1
kind: PodGangSet
metadata:
  name: dynamo-multinode-serving
spec:
  template:
    metadata:
      labels:
        app: dynamo-worker
    spec:
      schedulerName: grove-scheduler
      containers:
      - name: dynamo-worker
        image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:latest
        env:
        - name: WORKER_TYPE
          value: "prefill"  # or "decode"
  replicas: 16
  minAvailable: 16
  scheduling:
    # Ensure all workers can communicate efficiently
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: network-tier
            operator: In
            values: ["high-bandwidth"]
```

## Best Practices

### Resource Planning

- Use `minAvailable: replicas` for strict gang scheduling (see the sketch after this list)
- Set appropriate resource requests and limits
- Consider network bandwidth requirements for multi-node workloads
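
As a minimal sketch of the first point, reusing the PodGangSet fields from the examples earlier in this guide (resource values are placeholders):

```yaml
spec:
  replicas: 4
  minAvailable: 4        # equal to replicas, so the gang schedules all-or-nothing
  template:
    spec:
      containers:
      - name: worker
        image: dynamo/worker:latest   # placeholder image from the example above
        resources:
          requests:
            nvidia.com/gpu: 1
            cpu: "8"
            memory: 64Gi
          limits:
            nvidia.com/gpu: 1
```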

### Topology Awareness
- Label nodes with rack, zone, and network topology information
- Use PodClique for fine-grained placement control
- Test different affinity rules to optimize for your workload

### Monitoring
Grove provides metrics for scheduling decisions:

```bash
# View Grove scheduler metrics
kubectl port-forward -n grove-system svc/grove-scheduler-metrics 8080:8080
curl localhost:8080/metrics | grep grove_
```

## Troubleshooting

### Common Issues

**Pods stuck in Pending state:**
- Check if sufficient resources are available across required nodes
- Verify node labels match gang affinity requirements
- Review Grove scheduler logs: `kubectl logs -n grove-system deployment/grove-scheduler`

**Gang scheduling not working:**
- Ensure `schedulerName: grove-scheduler` is set in pod specs
- Verify that the PodGangSet controller is running (see the inspection commands below)
- Check for resource conflicts with other scheduled workloads
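
For either case, a few generic inspection commands can help narrow things down. The resource names below are assumptions: the plural `podgangsets` is inferred from the `PodGangSet` kind, and `llm-inference-gang` is the example name used earlier in this guide:

```bash
# Inspect the gang object and recent scheduling events
kubectl get podgangsets -A
kubectl describe podgangset llm-inference-gang
kubectl get events --sort-by=.lastTimestamp | grep -iE 'grove|pending'

# Scheduler-side view
kubectl -n grove-system logs deployment/grove-scheduler --tail=50
```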

For more detailed troubleshooting, see the [Grove Documentation](https://grove.dynamo.ai/docs).