diff --git a/README.md b/README.md index bca0086e21..9817a78dcf 100644 --- a/README.md +++ b/README.md @@ -148,7 +148,7 @@ Rerun with `curl -N` and change `stream` in the request to `true` to get the res ### Deploying Dynamo -- Follow the [Quickstart Guide](docs/guides/dynamo_deploy/README.md) to deploy on Kubernetes. +- Follow the [Quickstart Guide](docs/kubernetes/README.md) to deploy on Kubernetes. - Check out [Backends](components/backends) to deploy various workflow configurations (e.g. SGLang with router, vLLM with disaggregated serving, etc.) - Run some [Examples](examples) to learn about building components in Dynamo and exploring various integrations. diff --git a/components/backends/sglang/deploy/README.md b/components/backends/sglang/deploy/README.md index 0b25049edd..6afe558292 100644 --- a/components/backends/sglang/deploy/README.md +++ b/components/backends/sglang/deploy/README.md @@ -74,7 +74,7 @@ extraPodSpec: Before using these templates, ensure you have: -1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../../../docs/guides/dynamo_deploy/installation_guide.md) +1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../../../docs/kubernetes/installation_guide.md) 2. **Kubernetes cluster with GPU support** 3. **Container registry access** for SGLang runtime images 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`) @@ -144,9 +144,9 @@ All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. 
But you ## Further Reading -- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/guides/dynamo_deploy/create_deployment.md) -- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/README.md) -- **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/guides/dynamo_deploy/installation_guide.md) +- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/create_deployment.md) +- **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md) +- **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md) - **Examples**: [Deployment Examples](../../../../docs/examples/README.md) - **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) @@ -159,4 +159,4 @@ Common issues and solutions: 3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds` 4. **Out of memory**: Increase memory limits or reduce model batch size -For additional support, refer to the [deployment guide](../../../../docs/guides/dynamo_deploy/README.md). +For additional support, refer to the [deployment guide](../../../../docs/kubernetes/README.md). diff --git a/components/backends/trtllm/deploy/README.md b/components/backends/trtllm/deploy/README.md index 9ca3c16cbc..f25771a732 100644 --- a/components/backends/trtllm/deploy/README.md +++ b/components/backends/trtllm/deploy/README.md @@ -102,7 +102,7 @@ extraPodSpec: Before using these templates, ensure you have: -1. **Dynamo Cloud Platform installed** - See [Quickstart Guide](../../../../docs/guides/dynamo_deploy/README.md) +1. **Dynamo Cloud Platform installed** - See [Quickstart Guide](../../../../docs/kubernetes/README.md) 2. **Kubernetes cluster with GPU support** 3. **Container registry access** for TensorRT-LLM runtime images 4. 
**HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`) @@ -153,7 +153,7 @@ args: ### 3. Deploy -See the [Create Deployment Guide](../../../../docs/guides/dynamo_deploy/create_deployment.md) to learn how to deploy the deployment file. +See the [Create Deployment Guide](../../../../docs/kubernetes/create_deployment.md) to learn how to deploy the deployment file. First, create a secret for the HuggingFace token. ```bash @@ -277,9 +277,9 @@ Configure the `model` name and `host` based on your deployment. ## Further Reading -- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/guides/dynamo_deploy/create_deployment.md) -- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/README.md) -- **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/guides/dynamo_deploy/installation_guide.md) +- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/create_deployment.md) +- **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md) +- **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md) - **Examples**: [Deployment Examples](../../../../docs/examples/README.md) - **Architecture Docs**: [Disaggregated Serving](../../../../docs/architecture/disagg_serving.md), [KV-Aware Routing](../../../../docs/architecture/kv_cache_routing.md) - **Multinode Deployment**: [Multinode Examples](../multinode/multinode-examples.md) @@ -298,4 +298,4 @@ Common issues and solutions: 6. **Git LFS issues**: Ensure git-lfs is installed before building containers 7. **ARM deployment**: Use `--platform linux/arm64` when building on ARM machines -For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/README.md). +For additional support, refer to the [deployment troubleshooting guide](../../../../docs/kubernetes/README.md). 
diff --git a/components/backends/vllm/deploy/README.md b/components/backends/vllm/deploy/README.md index e75339abe4..d8bc7027c7 100644 --- a/components/backends/vllm/deploy/README.md +++ b/components/backends/vllm/deploy/README.md @@ -82,7 +82,7 @@ extraPodSpec: Before using these templates, ensure you have: -1. **Dynamo Cloud Platform installed** - See [Quickstart Guide](../../../../docs/guides/dynamo_deploy/README.md) +1. **Dynamo Cloud Platform installed** - See [Quickstart Guide](../../../../docs/kubernetes/README.md) 2. **Kubernetes cluster with GPU support** 3. **Container registry access** for vLLM runtime images 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`) @@ -234,10 +234,10 @@ args: ## Further Reading -- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/guides/dynamo_deploy/create_deployment.md) -- **Quickstart**: [Deployment Quickstart](../../../../docs/guides/dynamo_deploy/README.md) -- **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/guides/dynamo_deploy/installation_guide.md) -- **SLA Planner**: [SLA Planner Deployment Guide](../../../../docs/guides/dynamo_deploy/sla_planner_deployment.md) +- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/create_deployment.md) +- **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md) +- **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md) +- **SLA Planner**: [SLA Planner Deployment Guide](../../../../docs/kubernetes/sla_planner_deployment.md) - **Examples**: [Deployment Examples](../../../../docs/examples/README.md) - **Architecture Docs**: [Disaggregated Serving](../../../../docs/architecture/disagg_serving.md), [KV-Aware Routing](../../../../docs/architecture/kv_cache_routing.md) @@ -251,4 +251,4 @@ Common issues and solutions: 4. **Out of memory**: Increase memory limits or reduce model batch size 5. 
**Port forwarding issues**: Ensure correct pod UUID in port-forward command -For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/README.md). +For additional support, refer to the [deployment troubleshooting guide](../../../../docs/kubernetes/README.md). diff --git a/deploy/cloud/helm/crds/README.md b/deploy/cloud/helm/crds/README.md index d030e573fa..5c19c5c702 100644 --- a/deploy/cloud/helm/crds/README.md +++ b/deploy/cloud/helm/crds/README.md @@ -17,4 +17,4 @@ limitations under the License. # Dynamo Kubernetes Platform CRDs Helm Chart -This chart installs the [CRDs](../../../../docs/guides/dynamo_deploy/api_reference.md) for the Dynamo Kubernetes Platform. \ No newline at end of file +This chart installs the [CRDs](../../../../docs/kubernetes/api_reference.md) for the Dynamo Kubernetes Platform. \ No newline at end of file diff --git a/deploy/cloud/helm/platform/README.md b/deploy/cloud/helm/platform/README.md index 83fd2c8976..9289c35246 100644 --- a/deploy/cloud/helm/platform/README.md +++ b/deploy/cloud/helm/platform/README.md @@ -103,7 +103,7 @@ For detailed etcd configuration options beyond `etcd.enabled`, please refer to t ## 📚 Additional Resources -- [Dynamo Cloud Deployment Installation Guide](../../../../docs/guides/dynamo_deploy/installation_guide.md) +- [Dynamo Cloud Deployment Installation Guide](../../../../docs/kubernetes/installation_guide.md) - [NATS Documentation](https://docs.nats.io/) - [etcd Documentation](https://etcd.io/docs/) - [Kubernetes Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) diff --git a/deploy/cloud/helm/platform/README.md.gotmpl b/deploy/cloud/helm/platform/README.md.gotmpl index 252c01ab46..93e69facf3 100644 --- a/deploy/cloud/helm/platform/README.md.gotmpl +++ b/deploy/cloud/helm/platform/README.md.gotmpl @@ -57,7 +57,7 @@ For detailed etcd configuration options beyond `etcd.enabled`, please refer to t ## 📚 Additional Resources -- 
[Dynamo Cloud Deployment Installation Guide](../../../../docs/guides/dynamo_deploy/installation_guide.md) +- [Dynamo Cloud Deployment Installation Guide](../../../../docs/kubernetes/installation_guide.md) - [NATS Documentation](https://docs.nats.io/) - [etcd Documentation](https://etcd.io/docs/) - [Kubernetes Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) diff --git a/deploy/cloud/operator/Makefile b/deploy/cloud/operator/Makefile index 759a1b4a8b..20f9f28b56 100644 --- a/deploy/cloud/operator/Makefile +++ b/deploy/cloud/operator/Makefile @@ -288,7 +288,7 @@ generate-api-docs: crd-ref-docs ## Generate API reference documentation from CRD --output-path=./docs/api_reference.md @echo "✅ Generated API reference at ./docs/api_reference.md" # concatenate header.md and api_reference.md - cat docs/header.md ./docs/api_reference.md > ../../../docs/guides/dynamo_deploy/api_reference.md + cat docs/header.md ./docs/api_reference.md > ../../../docs/kubernetes/api_reference.md rm ./docs/api_reference.md @echo "✅ Concatenated header.md and api_reference.md" diff --git a/deploy/cloud/operator/README.md b/deploy/cloud/operator/README.md index 62c2959ee6..ea2517dd03 100644 --- a/deploy/cloud/operator/README.md +++ b/deploy/cloud/operator/README.md @@ -24,4 +24,4 @@ make ### Install -See [Dynamo Kubernetes Platform Installation Guide](/docs/guides/dynamo_deploy/installation_guide.md) for installation instructions. +See [Dynamo Kubernetes Platform Installation Guide](/docs/kubernetes/installation_guide.md) for installation instructions. diff --git a/deploy/inference-gateway/README.md b/deploy/inference-gateway/README.md index f63d974a85..3c2685021a 100644 --- a/deploy/inference-gateway/README.md +++ b/deploy/inference-gateway/README.md @@ -24,7 +24,7 @@ Currently, these setups are only supported with the kGateway based Inference Gat ### 1. 
Install Dynamo Platform ### -[See Quickstart Guide](../../docs/guides/dynamo_deploy/README.md) to install Dynamo Cloud. +[See Quickstart Guide](../../docs/kubernetes/README.md) to install Dynamo Cloud. ### 2. Deploy Inference Gateway ### diff --git a/deploy/logging/README.md b/deploy/logging/README.md index 99ce31717c..6bed61da48 100644 --- a/deploy/logging/README.md +++ b/deploy/logging/README.md @@ -1,3 +1,3 @@ # Dynamo Logging on Kubernetes -For detailed documentation on collecting and visualizing logs on Kubernetes, see [docs/guides/dynamo_deploy/logging.md](../../docs/guides/dynamo_deploy/logging.md). +For detailed documentation on collecting and visualizing logs on Kubernetes, see [docs/kubernetes/logging.md](../../docs/kubernetes/logging.md). diff --git a/deploy/metrics/k8s/README.md b/deploy/metrics/k8s/README.md index ae13722eb5..d4ac85c0b6 100644 --- a/deploy/metrics/k8s/README.md +++ b/deploy/metrics/k8s/README.md @@ -1,3 +1,3 @@ # Dynamo Metrics Collection on Kubernetes -For detailed documentation on collecting and visualizing metrics on Kubernetes, see [docs/guides/dynamo_deploy/metrics.md](../../../docs/guides/dynamo_deploy/metrics.md). +For detailed documentation on collecting and visualizing metrics on Kubernetes, see [docs/kubernetes/metrics.md](../../../docs/kubernetes/metrics.md). diff --git a/deploy/utils/README.md b/deploy/utils/README.md index 2d8097ff56..26b832c694 100644 --- a/deploy/utils/README.md +++ b/deploy/utils/README.md @@ -6,7 +6,7 @@ This directory contains utilities and manifests for Dynamo benchmarking and prof **Before using these utilities, you must first set up Dynamo Cloud following the main installation guide:** -👉 **[Follow the Dynamo Cloud installation guide](/docs/guides/dynamo_deploy/installation_guide.md) to install the Dynamo Kubernetes Platform first.** +👉 **[Follow the Dynamo Cloud installation guide](/docs/kubernetes/installation_guide.md) to install the Dynamo Kubernetes Platform first.** This includes: 1. 
Installing the Dynamo CRDs diff --git a/deploy/utils/setup_benchmarking_resources.sh b/deploy/utils/setup_benchmarking_resources.sh index 5e07ff8d13..0d89629ae0 100755 --- a/deploy/utils/setup_benchmarking_resources.sh +++ b/deploy/utils/setup_benchmarking_resources.sh @@ -56,7 +56,7 @@ fi if ! kubectl get pods -n "$NAMESPACE" | grep -q "dynamo-platform"; then warn "Dynamo platform pods not found in namespace $NAMESPACE" warn "Please ensure Dynamo Cloud platform is installed first:" - warn " See: docs/guides/dynamo_deploy/installation_guide.md" + warn " See: docs/kubernetes/installation_guide.md" if [[ -z "${FORCE:-}" && -z "${YES:-}" ]]; then read -p "Continue anyway? [y/N]: " -r ans [[ "$ans" =~ ^[Yy]$ ]] || exit 1 diff --git a/docs/architecture/sla_planner.md b/docs/architecture/sla_planner.md index 68d48a742e..e5f4be94ce 100644 --- a/docs/architecture/sla_planner.md +++ b/docs/architecture/sla_planner.md @@ -110,7 +110,7 @@ Finally, SLA planner applies the change by scaling up/down the number of prefill ### K8s Deployment -For detailed deployment instructions including setup, configuration, troubleshooting, and architecture overview, see the [SLA Planner Deployment Guide](../guides/dynamo_deploy/sla_planner_deployment.md). +For detailed deployment instructions including setup, configuration, troubleshooting, and architecture overview, see the [SLA Planner Deployment Guide](../kubernetes/sla_planner_deployment.md). **To deploy SLA Planner:** ```bash diff --git a/docs/benchmarks/benchmarking.md b/docs/benchmarks/benchmarking.md index 3f92ff5ddf..1107905beb 100644 --- a/docs/benchmarks/benchmarking.md +++ b/docs/benchmarks/benchmarking.md @@ -56,7 +56,7 @@ The framework is a Python-based wrapper around `genai-perf` that: Follow these steps to benchmark Dynamo deployments: ### Step 1: Establish Kubernetes Cluster and Install Dynamo -Set up your Kubernetes cluster with NVIDIA GPUs and install the Dynamo Cloud platform. 
First follow the [installation guide](/docs/guides/dynamo_deploy/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources. +Set up your Kubernetes cluster with NVIDIA GPUs and install the Dynamo Cloud platform. First follow the [installation guide](/docs/kubernetes/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources. ### Step 2: Deploy DynamoGraphDeployments Deploy your DynamoGraphDeployments separately using the [deployment documentation](../../components/backends/). Each deployment should have a frontend service exposed. diff --git a/docs/benchmarks/pre_deployment_profiling.md b/docs/benchmarks/pre_deployment_profiling.md index f2378ebf9b..4f1c6de18d 100644 --- a/docs/benchmarks/pre_deployment_profiling.md +++ b/docs/benchmarks/pre_deployment_profiling.md @@ -89,7 +89,7 @@ SLA planner can work with any interpolation data that follows the above format. ## Running the Profiling Script in Kubernetes -Set up your Kubernetes namespace for profiling (one-time per namespace). First ensure Dynamo Cloud platform is installed by following the [main installation guide](../../deploy/README.md), then set up profiling resources using [deploy/utils/README](../../deploy/utils/README.md). If your namespace is already set up, skip this step. +Set up your Kubernetes namespace for profiling (one-time per namespace). First ensure Dynamo Cloud platform is installed by following the [main installation guide](/docs/kubernetes/installation_guide.md), then set up profiling resources using [deploy/utils/README](/deploy/utils/README.md). If your namespace is already set up, skip this step. **Prerequisites**: Ensure all dependencies are installed. If you ran the setup script above, dependencies are already installed. 
Otherwise, install them manually: ```bash diff --git a/docs/guides/logging.md b/docs/guides/logging.md index 9fc8d97268..b903937d31 100644 --- a/docs/guides/logging.md +++ b/docs/guides/logging.md @@ -146,4 +146,4 @@ curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages": - [Distributed Runtime Architecture](../architecture/distributed_runtime.md) - [Dynamo Architecture Overview](../architecture/architecture.md) - [Backend Guide](backend.md) -- [Log Aggregation in Kubernetes](dynamo_deploy/logging.md) +- [Log Aggregation in Kubernetes](../kubernetes/logging.md) diff --git a/docs/guides/metrics.md b/docs/guides/metrics.md index c0499b0bf6..c2bc00b874 100644 --- a/docs/guides/metrics.md +++ b/docs/guides/metrics.md @@ -31,7 +31,7 @@ Dynamo automatically exposes metrics with the `dynamo_` name prefixes. It also a **Specialized Component Metrics**: Components can also expose additional metrics specific to their functionality. For example, a `preprocessor` component exposes metrics with the `dynamo_preprocessor_*` prefix. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for details on specialized component metrics. -**Kubernetes Integration**: For comprehensive Kubernetes deployment and monitoring setup, see the [Kubernetes Metrics Guide](dynamo_deploy/metrics.md). This includes Prometheus Operator setup, metrics collection configuration, and visualization in Grafana. +**Kubernetes Integration**: For comprehensive Kubernetes deployment and monitoring setup, see the [Kubernetes Metrics Guide](../kubernetes/metrics.md). This includes Prometheus Operator setup, metrics collection configuration, and visualization in Grafana. 
## Metrics Hierarchy diff --git a/docs/hidden_toctree.rst b/docs/hidden_toctree.rst index 41ac9f85ab..5ca984148c 100644 --- a/docs/hidden_toctree.rst +++ b/docs/hidden_toctree.rst @@ -24,16 +24,16 @@ API/nixl_connect/write_operation.md API/nixl_connect/README.md - guides/dynamo_deploy/api_reference.md - guides/dynamo_deploy/create_deployment.md - - guides/dynamo_deploy/fluxcd.md - guides/dynamo_deploy/gke_setup.md - guides/dynamo_deploy/grove.md - guides/dynamo_deploy/model_caching_with_fluid.md - guides/dynamo_deploy/README.md + kubernetes/api_reference.md + kubernetes/create_deployment.md + + kubernetes/fluxcd.md + kubernetes/gke_setup.md + kubernetes/grove.md + kubernetes/model_caching_with_fluid.md + kubernetes/README.md guides/dynamo_run.md - guides/dynamo_deploy/sla_planner_deployment.md + kubernetes/sla_planner_deployment.md guides/metrics.md guides/run_kvbm_in_vllm.md guides/run_kvbm_in_trtllm.md diff --git a/docs/index.rst b/docs/index.rst index 8fae79b2aa..efeeda07ac 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -50,13 +50,13 @@ Quickstart :hidden: :caption: Kubernetes Deployment - Quickstart (K8s) <../guides/dynamo_deploy/README.md> - Detailed Installation Guide <../guides/dynamo_deploy/installation_guide.md> - Dynamo Operator <../guides/dynamo_deploy/dynamo_operator.md> - Metrics <../guides/dynamo_deploy/metrics.md> - Logging <../guides/dynamo_deploy/logging.md> - Multinode <../guides/dynamo_deploy/multinode-deployment.md> - Minikube Setup <../guides/dynamo_deploy/minikube.md> + Quickstart (K8s) <../kubernetes/README.md> + Detailed Installation Guide <../kubernetes/installation_guide.md> + Dynamo Operator <../kubernetes/dynamo_operator.md> + Metrics <../kubernetes/metrics.md> + Logging <../kubernetes/logging.md> + Multinode <../kubernetes/multinode-deployment.md> + Minikube Setup <../kubernetes/minikube.md> .. 
toctree:: :hidden: diff --git a/docs/guides/dynamo_deploy/README.md b/docs/kubernetes/README.md similarity index 78% rename from docs/guides/dynamo_deploy/README.md rename to docs/kubernetes/README.md index c016e5207b..22ff95675c 100644 --- a/docs/guides/dynamo_deploy/README.md +++ b/docs/kubernetes/README.md @@ -31,12 +31,11 @@ helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${REL helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default # 3. Install Platform -kubectl create namespace ${NAMESPACE} helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz -helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} +helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace ``` -For more details or customization options, see **[Installation Guide for Dynamo Kubernetes Platform](/docs/guides/dynamo_deploy/installation_guide.md)**. +For more details or customization options (including multinode deployments), see **[Installation Guide for Dynamo Kubernetes Platform](/docs/kubernetes/installation_guide.md)**. ## 2. 
Choose Your Backend @@ -44,9 +43,9 @@ Each backend has deployment examples and configuration options: | Backend | Available Configurations | |---------|--------------------------| -| **[vLLM](/components/backends/vllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router, Disaggregated + Planner | +| **[vLLM](/components/backends/vllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router, Disaggregated + Planner, Disaggregated Multi-node | | **[SGLang](/components/backends/sglang/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Planner, Disaggregated Multi-node | -| **[TensorRT-LLM](/components/backends/trtllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router | +| **[TensorRT-LLM](/components/backends/trtllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router, Disaggregated Multi-node | ## 3. Deploy Your First Model @@ -73,15 +72,15 @@ It's a Kubernetes Custom Resource that defines your inference pipeline: - Scaling policies - Frontend/backend connections -The scripts in the `components//launch` folder like `agg.sh` demonstrate how you can serve your models locally. The corresponding YAML files like `agg.yaml` show you how you could create a kubernetes deployment for your inference graph. +Refer to the [API Reference and Documentation](/docs/kubernetes/api_reference.md) for more details. 
## 📖 API Reference & Documentation For detailed technical specifications of Dynamo's Kubernetes resources: -- **[API Reference](/docs/guides/dynamo_deploy/api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment` -- **[Operator Guide](/docs/guides/dynamo_deploy/dynamo_operator.md)** - Dynamo operator configuration and management -- **[Create Deployment](/docs/guides/dynamo_deploy/create_deployment.md)** - Step-by-step deployment creation examples +- **[API Reference](/docs/kubernetes/api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment` +- **[Operator Guide](/docs/kubernetes/dynamo_operator.md)** - Dynamo operator configuration and management +- **[Create Deployment](/docs/kubernetes/create_deployment.md)** - Step-by-step deployment creation examples ### Choosing Your Architecture Pattern @@ -165,7 +164,12 @@ Key customization points include: ## Additional Resources - **[Examples](/examples/README.md)** - Complete working examples -- **[Create Custom Deployments](/docs/guides/dynamo_deploy/create_deployment.md)** - Build your own CRDs -- **[Operator Documentation](/docs/guides/dynamo_deploy/dynamo_operator.md)** - How the platform works +- **[Create Custom Deployments](/docs/kubernetes/create_deployment.md)** - Build your own CRDs +- **[Operator Documentation](/docs/kubernetes/dynamo_operator.md)** - How the platform works - **[Helm Charts](/deploy/helm/README.md)** - For advanced users -- **[GitOps Deployment with FluxCD](/docs/guides/dynamo_deploy/fluxcd.md)** - For advanced users \ No newline at end of file +- **[GitOps Deployment with FluxCD](/docs/kubernetes/fluxcd.md)** - For advanced users +- **[Logging](/docs/kubernetes/logging.md)** - For logging setup +- **[Multinode Deployment](/docs/kubernetes/multinode-deployment.md)** - For multinode deployment +- **[Grove](/docs/kubernetes/grove.md)** - For grove details and custom installation +- 
**[Monitoring](/docs/kubernetes/metrics.md)** - For monitoring setup +- **[Model Caching with Fluid](/docs/kubernetes/model_caching_with_fluid.md)** - For model caching with Fluid \ No newline at end of file diff --git a/docs/guides/dynamo_deploy/api_reference.md b/docs/kubernetes/api_reference.md similarity index 100% rename from docs/guides/dynamo_deploy/api_reference.md rename to docs/kubernetes/api_reference.md diff --git a/docs/guides/dynamo_deploy/create_deployment.md b/docs/kubernetes/create_deployment.md similarity index 90% rename from docs/guides/dynamo_deploy/create_deployment.md rename to docs/kubernetes/create_deployment.md index 99a446df78..c49ec9eabc 100644 --- a/docs/guides/dynamo_deploy/create_deployment.md +++ b/docs/kubernetes/create_deployment.md @@ -13,13 +13,13 @@ Select the architecture pattern as your template that best fits your use case. For example, when using the `VLLM` inference backend: - **Development / Testing** - Use [`agg.yaml`](../../../components/backends/vllm/deploy/agg.yaml) as the base configuration. + Use [`agg.yaml`](/components/backends/vllm/deploy/agg.yaml) as the base configuration. - **Production with Load Balancing** - Use [`agg_router.yaml`](../../../components/backends/vllm/deploy/agg_router.yaml) to enable scalable, load-balanced inference. + Use [`agg_router.yaml`](/components/backends/vllm/deploy/agg_router.yaml) to enable scalable, load-balanced inference. - **High Performance / Disaggregated Deployment** - Use [`disagg_router.yaml`](../../../components/backends/vllm/deploy/disagg_router.yaml) for maximum throughput and modular scalability. + Use [`disagg_router.yaml`](/components/backends/vllm/deploy/disagg_router.yaml) for maximum throughput and modular scalability. ## Step 2: Customize the Template @@ -90,7 +90,7 @@ Consult the corresponding sh file. 
Each of the python commands to launch a compo The front end is launched with "python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]" Each worker will launch `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags `command. -If you are a Dynamo contributor the [dynamo run guide](../dynamo_run.md) for details on how to run this command. +If you are a Dynamo contributor, see the [dynamo run guide](/docs/guides/dynamo_run.md) for details on how to run this command. ## Step 3: Key Customization Points diff --git a/docs/guides/dynamo_deploy/dynamo_operator.md b/docs/kubernetes/dynamo_operator.md similarity index 91% rename from docs/guides/dynamo_deploy/dynamo_operator.md rename to docs/kubernetes/dynamo_operator.md index fd3044aab8..be5c9e5de7 100644 --- a/docs/guides/dynamo_deploy/dynamo_operator.md +++ b/docs/kubernetes/dynamo_operator.md @@ -23,11 +23,11 @@ Dynamo operator is a Kubernetes operator that simplifies the deployment, configu For the complete technical API reference for Dynamo Custom Resource Definitions, see: -**📖 [Dynamo CRD API Reference](/docs/guides/dynamo_deploy/api_reference.md)** +**📖 [Dynamo CRD API Reference](/docs/kubernetes/api_reference.md)** ## Installation -[See installation steps](/docs/guides/dynamo_deploy/installation_guide.md#overview) +[See installation steps](/docs/kubernetes/installation_guide.md#overview) ## Development diff --git a/docs/guides/dynamo_deploy/fluxcd.md b/docs/kubernetes/fluxcd.md similarity index 86% rename from docs/guides/dynamo_deploy/fluxcd.md rename to docs/kubernetes/fluxcd.md index 99edf2df47..e8a04426d4 100644 --- a/docs/guides/dynamo_deploy/fluxcd.md +++ b/docs/kubernetes/fluxcd.md @@ -1,10 +1,10 @@ # GitOps Deployment with FluxCD -This section describes how to use FluxCD for GitOps-based deployment of Dynamo inference graphs. GitOps enables you to manage your Dynamo deployments declaratively using Git as the source of truth.
We'll use the [aggregated vLLM example](../../../components/backends/vllm/README.md) to demonstrate the workflow. +This section describes how to use FluxCD for GitOps-based deployment of Dynamo inference graphs. GitOps enables you to manage your Dynamo deployments declaratively using Git as the source of truth. We'll use the [aggregated vLLM example](/components/backends/vllm/README.md) to demonstrate the workflow. ## Prerequisites -- A Kubernetes cluster with [Dynamo Cloud](/docs/guides/dynamo_deploy/installation_guide.md) installed +- A Kubernetes cluster with [Dynamo Cloud](/docs/kubernetes/installation_guide.md) installed - [FluxCD](https://fluxcd.io/flux/installation/) installed in your cluster - A Git repository to store your deployment configurations @@ -18,7 +18,7 @@ The GitOps workflow for Dynamo deployments consists of three main steps: ## Step 1: Build and Push Dynamo Cloud Operator -First, follow to [See Install Dynamo Cloud](/docs/guides/dynamo_deploy/installation_guide.md). +First, follow the [Install Dynamo Cloud](/docs/kubernetes/installation_guide.md) guide. ## Step 2: Create Initial Deployment diff --git a/docs/guides/dynamo_deploy/gke_setup.md b/docs/kubernetes/gke_setup.md similarity index 100% rename from docs/guides/dynamo_deploy/gke_setup.md rename to docs/kubernetes/gke_setup.md diff --git a/docs/guides/dynamo_deploy/grove.md b/docs/kubernetes/grove.md similarity index 95% rename from docs/guides/dynamo_deploy/grove.md rename to docs/kubernetes/grove.md index bccf50387a..177f19c0b6 100644 --- a/docs/guides/dynamo_deploy/grove.md +++ b/docs/kubernetes/grove.md @@ -19,7 +19,7 @@ Grove enables disaggregated serving by breaking down large language model infere Grove implements disaggregated serving through several custom Kubernetes resources that provide declarative composition of role-based pod groups: -### PodGangSet +### PodCliqueSet The top-level Grove object that defines a group of components managed and colocated together.
Key features include: - Support for autoscaling - Topology-aware spread of replicas for availability @@ -39,10 +39,10 @@ A set of PodCliques that scale and are scheduled together, ideal for tightly cou Grove provides several specialized features that make it particularly well-suited for disaggregated serving: ### Flexible Gang Scheduling -PodCliques and PodCliqueScalingGroups allow users to specify flexible gang-scheduling requirements at multiple levels within a PodGangSet to prevent resource deadlocks and ensure all components of a disaggregated system start together. +PodCliques and PodCliqueScalingGroups allow users to specify flexible gang-scheduling requirements at multiple levels within a PodCliqueSet to prevent resource deadlocks and ensure all components of a disaggregated system start together. ### Multi-level Horizontal Auto-Scaling -Supports pluggable horizontal auto-scaling solutions to scale PodGangSet, PodClique, and PodCliqueScalingGroup custom resources independently based on their specific metrics and requirements. +Supports pluggable horizontal auto-scaling solutions to scale PodCliqueSet, PodClique, and PodCliqueScalingGroup custom resources independently based on their specific metrics and requirements. ### Network Topology-Aware Scheduling Allows specifying network topology pack and spread constraints to optimize for both network performance and service availability, crucial for disaggregated systems where components need efficient inter-node communication. 
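Because this change moves every page under `docs/guides/dynamo_deploy/` to `docs/kubernetes/`, a recursive grep is a quick way to confirm that no Markdown, reStructuredText, Helm template, shell or Makefile source still points at the old location. The sketch below is a hypothetical helper: the `find_stale_links` name and the set of file globs are assumptions, not part of this change. It prints each file that still mentions the old prefix and prints nothing when the tree is clean.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Location of the Kubernetes docs before this change.
OLD_PREFIX="docs/guides/dynamo_deploy"

# Hypothetical helper: print every file under a root directory that still
# mentions OLD_PREFIX. Only Markdown, reStructuredText, Helm .gotmpl
# templates, shell scripts and Makefiles are searched (an assumption about
# where doc links live in this repository).
find_stale_links() {
  local root="${1:-.}"
  # -r: recurse, -l: print matching file names only, -F: fixed-string match.
  # grep exits non-zero when nothing matches (or on errors), so "|| true"
  # keeps set -e from aborting the caller.
  grep -rlF \
    --include='*.md' --include='*.rst' --include='*.gotmpl' \
    --include='*.sh' --include='Makefile' \
    "$OLD_PREFIX" "$root" || true
}
```

For example, running `find_stale_links .` from the repository root prints nothing if every reference was updated, and lists the offending files otherwise.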
diff --git a/docs/guides/dynamo_deploy/installation_guide.md b/docs/kubernetes/installation_guide.md similarity index 86% rename from docs/guides/dynamo_deploy/installation_guide.md rename to docs/kubernetes/installation_guide.md index 497edaaa33..c581d76b49 100644 --- a/docs/guides/dynamo_deploy/installation_guide.md +++ b/docs/kubernetes/installation_guide.md @@ -21,7 +21,7 @@ Deploy and manage Dynamo inference graphs on Kubernetes with automated orchestra ## Quick Start Paths -Platform is installed using Dynamo Kubernetes Platform [helm chart](../../../deploy/cloud/helm/platform/README.md). +Platform is installed using Dynamo Kubernetes Platform [helm chart](/deploy/cloud/helm/platform/README.md). **Path A: Production Install** Install from published artifacts on your existing cluster → [Jump to Path A](#path-a-production-install) @@ -32,6 +32,20 @@ Set up Minikube first → [Minikube Setup](minikube.md) → Then follow Path A **Path C: Custom Development** Build from source for customization → [Jump to Path C](#path-c-custom-development) +All helm install values can be overridden either by editing the chart's values.yaml file or by passing in your own values file with `-f`: + +```bash +helm install ... \ + -f your-values.yaml +``` + +and/or by setting individual values as flags on the helm install command: + +```bash +helm install ... \ + --set "your-value=your-value" +``` + ## Prerequisites ```bash @@ -68,7 +82,9 @@ helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ``` > [!TIP] -> By default, Grove and Kai Scheduler are NOT installed. You can enable them by setting the following flags in the helm install command: +> For multinode deployments, you need to enable Grove and Kai Scheduler. +> You might choose to install them manually or through the dynamo-platform helm install command. +> When using the dynamo-platform helm install command, Grove and Kai Scheduler are NOT installed by default.
You can enable their installation by setting the following flags in the helm install command: ```bash --set "grove.enabled=true" @@ -111,7 +127,7 @@ docker build -t $DOCKER_SERVER/dynamo-operator:$IMAGE_TAG . && docker push $DOCK cd - -# 3. Create namespace and secrets to be able to pull the operator image +# 3. Create namespace and secrets to be able to pull the operator image (only needed if you pushed the operator image to a private registry) kubectl create namespace ${NAMESPACE} kubectl create secret docker-registry docker-imagepullsecret \ --docker-server=${DOCKER_SERVER} \ @@ -123,9 +139,8 @@ kubectl create secret docker-registry docker-imagepullsecret \ helm upgrade --install dynamo-crds ./crds/ --namespace default # 5. Install Platform -helm repo add bitnami https://charts.bitnami.com/bitnami helm dep build ./platform/ -helm upgrade --install dynamo-platform ./platform/ \ +helm install dynamo-platform ./platform/ \ --namespace ${NAMESPACE} \ --set dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/dynamo-operator \ --set dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG} \ @@ -158,9 +173,9 @@ kubectl get pods -n ${NAMESPACE} ``` 2. **Explore Backend Guides** - - [vLLM Deployments](../../../components/backends/vllm/deploy/README.md) - - [SGLang Deployments](../../../components/backends/sglang/deploy/README.md) - - [TensorRT-LLM Deployments](../../../components/backends/trtllm/deploy/README.md) + - [vLLM Deployments](/components/backends/vllm/deploy/README.md) + - [SGLang Deployments](/components/backends/sglang/deploy/README.md) + - [TensorRT-LLM Deployments](/components/backends/trtllm/deploy/README.md) 3. 
**Optional:** - [Set up Prometheus & Grafana](metrics.md) @@ -200,7 +215,7 @@ just add the following to the helm install command: ## Advanced Options -- [Helm Chart Configuration](../../../deploy/cloud/helm/platform/README.md) +- [Helm Chart Configuration](/deploy/cloud/helm/platform/README.md) - [GKE-specific setup](gke_setup.md) - [Create custom deployments](create_deployment.md) - [Dynamo Operator details](dynamo_operator.md) diff --git a/docs/guides/dynamo_deploy/logging.md b/docs/kubernetes/logging.md similarity index 100% rename from docs/guides/dynamo_deploy/logging.md rename to docs/kubernetes/logging.md diff --git a/docs/guides/dynamo_deploy/metrics.md b/docs/kubernetes/metrics.md similarity index 97% rename from docs/guides/dynamo_deploy/metrics.md rename to docs/kubernetes/metrics.md index ea1d717118..ef46f17f8a 100644 --- a/docs/guides/dynamo_deploy/metrics.md +++ b/docs/kubernetes/metrics.md @@ -28,7 +28,7 @@ helm install prometheus -n monitoring --create-namespace prometheus-community/ku > The commands enumerated below assume you have installed the kube-prometheus-stack with the installation method listed above. Depending on your installation configuration of the monitoring stack, you may need to modify the `kubectl` commands that follow in this document accordingly (e.g modifying Namespace or Service names accordingly). ### Install Dynamo Operator -Before setting up metrics collection, you'll need to have the Dynamo operator installed in your cluster. Follow our [Installation Guide](../dynamo_deploy/installation_guide.md) for detailed instructions on deploying the Dynamo operator. +Before setting up metrics collection, you'll need to have the Dynamo operator installed in your cluster. Follow our [Installation Guide](/docs/kubernetes/installation_guide.md) for detailed instructions on deploying the Dynamo operator. Make sure to set the `prometheusEndpoint` to the Prometheus endpoint you installed in the previous step. 
```bash @@ -64,8 +64,8 @@ This will create two components: - A Worker component exposing metrics on its system port Both components expose a `/metrics` endpoint following the OpenMetrics format, but with different metrics appropriate to their roles. For details about: -- Deployment configuration: See the [vLLM README](../../components/backends/vllm/README.md) -- Available metrics: See the [metrics guide](../metrics.md) +- Deployment configuration: See the [vLLM README](/components/backends/vllm/README.md) +- Available metrics: See the [metrics guide](/docs/guides/metrics.md) ### Validate the Deployment diff --git a/docs/guides/dynamo_deploy/minikube.md b/docs/kubernetes/minikube.md similarity index 100% rename from docs/guides/dynamo_deploy/minikube.md rename to docs/kubernetes/minikube.md diff --git a/docs/guides/dynamo_deploy/model_caching_with_fluid.md b/docs/kubernetes/model_caching_with_fluid.md similarity index 100% rename from docs/guides/dynamo_deploy/model_caching_with_fluid.md rename to docs/kubernetes/model_caching_with_fluid.md diff --git a/docs/guides/dynamo_deploy/multinode-deployment.md b/docs/kubernetes/multinode-deployment.md similarity index 100% rename from docs/guides/dynamo_deploy/multinode-deployment.md rename to docs/kubernetes/multinode-deployment.md diff --git a/docs/guides/dynamo_deploy/sla_planner_deployment.md b/docs/kubernetes/sla_planner_deployment.md similarity index 100% rename from docs/guides/dynamo_deploy/sla_planner_deployment.md rename to docs/kubernetes/sla_planner_deployment.md diff --git a/examples/custom_backend/hello_world/README.md b/examples/custom_backend/hello_world/README.md index 7bcb54b18b..2b5b5f67e0 100644 --- a/examples/custom_backend/hello_world/README.md +++ b/examples/custom_backend/hello_world/README.md @@ -106,7 +106,7 @@ Hello star! Note that this a very simple degenerate example which does not demonstrate the standard Dynamo FrontEnd-Backend deployment. 
The hello-world client is not a web server, it is a one-off function which sends the predefined text "world,sun,moon,star" to the backend. The example is meant to show the HelloWorldWorker. As such you will only see the HelloWorldWorker pod in deployment. The client will run and exit and the pod will not be operational. -Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/README.md) to install Dynamo Kubernetes Platform. +Follow the [Quickstart Guide](../../../docs/kubernetes/README.md) to install Dynamo Kubernetes Platform. Then deploy to kubernetes using ```bash diff --git a/examples/deployments/AKS/AKS-deployment.md b/examples/deployments/AKS/AKS-deployment.md index dff829e64c..0bfa24ac94 100644 --- a/examples/deployments/AKS/AKS-deployment.md +++ b/examples/deployments/AKS/AKS-deployment.md @@ -90,7 +90,7 @@ git clone https://github.com/ai-dynamo/dynamo.git cd dynamo ``` -2. Install Dynamo from Published Artifacts on NGC (see the [Dynamo Cloud guide](../../../docs/guides/dynamo_deploy/installation_guide.md)): +2. Install Dynamo from Published Artifacts on NGC (see the [Dynamo Cloud guide](../../../docs/kubernetes/installation_guide.md)): ```bash export NAMESPACE=dynamo-cloud export RELEASE_VERSION=0.3.2 @@ -124,7 +124,7 @@ dynamo-platform-nats-0 2/2 Runnin dynamo-platform-nats-box-5dbf45c748-kln82 1/1 Running 0 2m51s ``` -There are other ways to install Dynamo, you can find them [here](../../../docs/guides/dynamo_deploy/installation_guide.md). +There are other ways to install Dynamo, you can find them [here](../../../docs/kubernetes/installation_guide.md). ### Task 4. Deploy a model diff --git a/recipes/README.md b/recipes/README.md index 7fd1b00123..81125d07c6 100644 --- a/recipes/README.md +++ b/recipes/README.md @@ -19,7 +19,7 @@ export NAMESPACE=your-namespace kubectl create namespace ${NAMESPACE} ``` -2. **Dynamo Cloud Platform installed** - Follow [Quickstart Guide](../docs/guides/dynamo_deploy/README.md) +2. 
**Dynamo Cloud Platform installed** - Follow [Quickstart Guide](../docs/kubernetes/README.md) 3. **Kubernetes cluster with GPU support**