diff --git a/docs/components/backends/llm/README.md b/docs/components/backends/llm/README.md
new file mode 120000
index 0000000000..615da9417b
--- /dev/null
+++ b/docs/components/backends/llm/README.md
@@ -0,0 +1 @@
+../../../../components/backends/llm/README.md
\ No newline at end of file
diff --git a/docs/components/backends/sglang/docs/multinode-examples.md b/docs/components/backends/sglang/docs/multinode-examples.md
new file mode 120000
index 0000000000..9929f08b4a
--- /dev/null
+++ b/docs/components/backends/sglang/docs/multinode-examples.md
@@ -0,0 +1 @@
+../../../../../components/backends/sglang/docs/multinode-examples.md
\ No newline at end of file
diff --git a/docs/components/backends/trtllm/README.md b/docs/components/backends/trtllm/README.md
new file mode 120000
index 0000000000..15969304d0
--- /dev/null
+++ b/docs/components/backends/trtllm/README.md
@@ -0,0 +1 @@
+../../../../components/backends/trtllm/README.md
\ No newline at end of file
diff --git a/docs/components/backends/vllm/README.md b/docs/components/backends/vllm/README.md
new file mode 120000
index 0000000000..ec40eb5e49
--- /dev/null
+++ b/docs/components/backends/vllm/README.md
@@ -0,0 +1 @@
+../../../../components/backends/vllm/README.md
\ No newline at end of file
diff --git a/docs/deploy/metrics/docker-compose.yml b/docs/deploy/metrics/docker-compose.yml
new file mode 120000
index 0000000000..f7c658ffff
--- /dev/null
+++ b/docs/deploy/metrics/docker-compose.yml
@@ -0,0 +1 @@
+../../../deploy/metrics/docker-compose.yml
\ No newline at end of file
diff --git a/docs/examples/runtime/hello_world/README.md b/docs/examples/runtime/hello_world/README.md
new file mode 120000
index 0000000000..aa7e284f34
--- /dev/null
+++ b/docs/examples/runtime/hello_world/README.md
@@ -0,0 +1 @@
+../../../../examples/runtime/hello_world/README.md
\ No newline at end of file
diff --git a/docs/guides/dynamo_deploy/operator_deployment.md b/docs/guides/dynamo_deploy/operator_deployment.md
new file mode 120000
index 0000000000..80ca4341ee
--- /dev/null
+++ b/docs/guides/dynamo_deploy/operator_deployment.md
@@ -0,0 +1 @@
+../../../guides/dynamo_deploy/operator_deployment.md
\ No newline at end of file
diff --git a/docs/guides/dynamo_deploy/quickstart.md b/docs/guides/dynamo_deploy/quickstart.md
index ebf2f57058..5639b92f87 100644
--- a/docs/guides/dynamo_deploy/quickstart.md
+++ b/docs/guides/dynamo_deploy/quickstart.md
@@ -67,7 +67,7 @@ Ensure you have the source code checked out and are in the `dynamo` directory:
 
 ### Set Environment Variables
 
-Our examples use the [`nvcr.io`](nvcr.io/nvidia/ai-dynamo/) but you can setup your own values if you use another docker registry.
+Our examples use the [`nvcr.io`](https://nvcr.io/nvidia/ai-dynamo/) but you can setup your own values if you use another docker registry.
 
 ```bash
 export NAMESPACE=dynamo-cloud # or whatever you prefer.
diff --git a/docs/index.rst b/docs/index.rst
index 7000e786e5..c751f0d819 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -45,26 +45,26 @@ The examples below assume you build the latest image yourself from source. If us
    :margin: 0
    :padding: 3 4 0 0
 
-   .. grid-item-card:: :doc:`Hello World `
-      :link: /examples/hello_world
+   .. grid-item-card:: :doc:`Hello World `
+      :link: examples/runtime/hello_world/README
       :link-type: doc
 
-      Demonstrates the basic concepts of Dynamo by creating a simple multi-service pipeline.
+      Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph
 
-   .. grid-item-card:: :doc:`LLM Deployment `
-      :link: /examples/llm_deployment
+   .. grid-item-card:: :doc:`LLM Serving with VLLM `
+      :link: components/backends/vllm/README
       :link-type: doc
 
-      Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.
+      Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with VLLM.
 
-   .. grid-item-card:: :doc:`Multinode `
-      :link: /examples/multinode
+   .. grid-item-card:: :doc:`Multinode with SGLang `
+      :link: components/backends/sglang/docs/multinode-examples
       :link-type: doc
 
-      Demonstrates deployment for disaggregated serving on 3 nodes using `nvidia/Llama-3.1-405B-Instruct-FP8`.
+      Demonstrates disaggregated serving on several nodes.
 
-   .. grid-item-card:: :doc:`TensorRT-LLM `
-      :link: /examples/trtllm
+   .. grid-item-card:: :doc:`TensorRT-LLM `
+      :link: components/backends/trtllm/README
       :link-type: doc
 
       Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.
@@ -110,7 +110,7 @@ The examples below assume you build the latest image yourself from source. If us
    Dynamo Deploy Quickstart
    Dynamo Cloud Kubernetes Platform
-   Manual Helm Deployment
+   Manual Helm Deployment
    GKE Setup Guide
    Minikube Setup Guide
    Model Caching with Fluid
@@ -126,22 +126,22 @@ The examples below assume you build the latest image yourself from source. If us
    :hidden:
    :caption: API
 
-   Python API
    NIXL Connect API
 
 .. toctree::
    :hidden:
    :caption: Examples
 
-   Aggregated and Disaggregated Deployment
-   LLM Deployment Examples
-   Multinode Examples
-   LLM Deployment Examples using TensorRT-LLM
+   Hello World
+   LLM Deployment Examples using VLLM
+   Multinode Examples using SGLang
+   LLM Deployment Examples using TensorRT-LLM
 
 .. toctree::
    :hidden:
    :caption: Reference
 
+   Glossary
    KVBM Reading