docs: Refactor README.md and add components/README.md #2141
Merged
Commits (13):
- f46827d: Update to README.md (athreesh)
- 33be4f1: added NATS/etcd to ReadME + created ReadME for components (athreesh)
- 5160978: changes to root ReadME + added in ReadME for components (athreesh)
- c4df262: Update components README.md (athreesh)
- 254e711: fix docker compose NATS command (athreesh)
- 087f55e: Merge branch 'refactor-readme' of https://github.com/ai-dynamo/dynamo… (athreesh)
- ce18586: address feedback from Itay, smaller images + updates to progress (athreesh)
- 5bf38d0: neal adjustments (athreesh)
- 2f63f1f: removing metrics highlight per ryan suggestion (athreesh)
- 54957c8: ryan suggestion on header (athreesh)
- f828878: fix framework matrix hyperlinks (athreesh)
- 4b4ea9c: fix: Remove trailing whitespace (pre-commit hook) (athreesh)
- 36c8c17: Merge branch 'main' into refactor-readme (athreesh)
components/README.md (new file, 89 lines):
<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Dynamo Components

This directory contains the core components that make up the Dynamo inference framework. Each component serves a specific role in the distributed LLM serving architecture, enabling high-throughput, low-latency inference across multiple nodes and GPUs.

## Supported Inference Engines

Dynamo supports multiple inference engines (with a focus on SGLang, vLLM, and TensorRT-LLM), each with its own deployment configurations and capabilities:

- **[vLLM](backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
- **[SGLang](backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
- **[TensorRT-LLM](backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration

Each engine provides launch scripts for different deployment patterns in its respective `/launch` & `/deploy` directories.

## Core Components

### [Backends](backends/)

The backends directory contains inference engine integrations and implementations, with a key focus on:

- **vLLM** - Full-featured vLLM integration with disaggregated serving, KV-aware routing, and SLA-based planning
- **SGLang** - SGLang engine integration supporting disaggregated serving and KV-aware routing
- **TensorRT-LLM** - TensorRT-LLM integration with disaggregated serving capabilities

### [Frontend](frontend/)

The frontend component provides the HTTP API layer and request processing:

- **OpenAI-compatible HTTP server** - RESTful API endpoint for LLM inference requests (see the example request below)
- **Pre-processor** - Handles request preprocessing and validation
- **Router** - Routes requests to appropriate workers based on load and KV cache state
- **Auto-discovery** - Automatically discovers and registers available workers
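Because the server speaks the OpenAI API, any OpenAI-style client can exercise it once a worker is up. Below is a minimal sketch using plain `requests`; the `localhost:8000` address and the model name are placeholders for whatever your deployment actually exposes, not fixed defaults.

```python
# Minimal sketch of a request to the OpenAI-compatible frontend.
# Assumptions: the frontend listens on localhost:8000 and a worker has
# registered a model; replace both with your deployment's values.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "your-served-model",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello from Dynamo!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

The official `openai` Python client works the same way if its base URL is pointed at the frontend.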
### [Router](router/)

A high-performance request router written in Rust that:

- Routes incoming requests to optimal workers based on KV cache state
- Implements KV-aware routing to minimize cache misses (a simplified sketch of the idea follows this list)
- Provides load balancing across multiple worker instances
- Supports both aggregated and disaggregated serving patterns
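The router itself is Rust code; the Python below is only a toy illustration of what "KV-aware" means in practice: prefer the worker whose cached token prefix overlaps most with the incoming request, and fall back to load for tie-breaking. It is a sketch of the general technique, not Dynamo's actual scoring logic.

```python
# Toy model of KV-aware worker selection (illustrative only, not the
# router's real Rust implementation or scoring function).
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    cached_prefixes: list[list[int]] = field(default_factory=list)  # recently cached token-ID prefixes
    active_requests: int = 0


def prefix_overlap(prefix: list[int], tokens: list[int]) -> int:
    """Length of the shared leading run of token IDs."""
    n = 0
    for a, b in zip(prefix, tokens):
        if a != b:
            break
        n += 1
    return n


def pick_worker(workers: list[Worker], request_tokens: list[int]) -> Worker:
    # Higher cache overlap wins; fewer in-flight requests breaks ties.
    return max(
        workers,
        key=lambda w: (
            max((prefix_overlap(p, request_tokens) for p in w.cached_prefixes), default=0),
            -w.active_requests,
        ),
    )


workers = [Worker("w0", [[1, 2, 3, 4]]), Worker("w1", [[1, 2]], active_requests=2)]
print(pick_worker(workers, [1, 2, 3, 9]).name)  # -> w0 (longest shared prefix)
```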
### [Planner](planner/)

The planner component monitors system state and dynamically adjusts worker allocation (a simplified decision-loop sketch follows this list):

- **Dynamic scaling** - Scales prefill/decode workers up and down based on metrics
- **Multiple backends** - Supports local (circus-based) and Kubernetes scaling
- **SLA-based planning** - Ensures performance targets are met
- **Load-based planning** - Optimizes resource utilization based on demand
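As a hedged illustration of load/SLA-based planning, the sketch below scales a decode pool up when observed inter-token latency drifts past a target and back down when there is plenty of headroom. The function, thresholds, and metric names are hypothetical stand-ins; the real planner's policies and configuration live in this directory.

```python
# Illustrative scaling decision only; thresholds and metric names are
# made up for the example and are not the planner's real configuration.
def plan_decode_replicas(current_replicas: int,
                         observed_itl_ms: float,
                         target_itl_ms: float,
                         min_replicas: int = 1,
                         max_replicas: int = 8) -> int:
    """Return the desired decode worker count for the next planning interval."""
    if observed_itl_ms > 1.2 * target_itl_ms:
        desired = current_replicas + 1      # missing the latency target: scale up
    elif observed_itl_ms < 0.5 * target_itl_ms:
        desired = current_replicas - 1      # ample headroom: scale down
    else:
        desired = current_replicas          # within the comfort band: hold
    return max(min_replicas, min(max_replicas, desired))


print(plan_decode_replicas(current_replicas=2, observed_itl_ms=45.0, target_itl_ms=30.0))  # -> 3
```

A real deployment would apply the resulting count through the local (circus-based) or Kubernetes backend rather than printing it.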
### [Metrics](metrics/)

The metrics component collects, aggregates, and exposes system metrics:

- **Prometheus-compatible endpoint** - Exposes metrics in standard Prometheus format (see the scrape sketch below)
- **Real-time monitoring** - Collects statistics from workers and components
- **Visualization support** - Integrates with Grafana for dashboard creation
- **Push/Pull modes** - Supports both push and pull-based metric collection
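Since the endpoint speaks the standard Prometheus text format, it can be inspected with nothing more than an HTTP GET. The URL below is a placeholder assumption; use whatever address the metrics component (or a worker) exposes in your deployment.

```python
# Minimal sketch of pulling Prometheus-format metrics.
# METRICS_URL is a hypothetical address; substitute your deployment's
# actual /metrics endpoint.
import requests

METRICS_URL = "http://localhost:8000/metrics"

text = requests.get(METRICS_URL, timeout=10).text
for line in text.splitlines():
    # Exposition format: "# HELP" / "# TYPE" comments, then
    # "metric_name{labels} value" sample lines.
    if line and not line.startswith("#"):
        print(line)
```

In a full setup, Prometheus scrapes this endpoint on a schedule and Grafana dashboards are built on the stored series.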
## Getting Started

To get started with Dynamo components:

1. **Choose an inference engine** from the supported backends
2. **Set up required services** (etcd and NATS) using Docker Compose (a quick connectivity check is sketched after this list)
3. **Configure** your chosen engine using Python wheels or building an image
4. **Run deployment scripts** from the engine's launch directory
5. **Monitor performance** using the metrics component
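Before running launch scripts, it can save time to confirm that etcd and NATS are actually reachable. The check below assumes the services' upstream default ports (etcd client API on 2379, NATS clients on 4222); adjust them if your Docker Compose file maps something else.

```python
# Quick connectivity check for the supporting services.
# Ports are the upstream defaults (etcd: 2379, NATS: 4222), which may
# differ from your compose file's mappings.
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in [("etcd", 2379), ("NATS", 4222)]:
    status = "ok" if can_connect("localhost", port) else "NOT reachable"
    print(f"{name} on localhost:{port}: {status}")
```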
For detailed instructions, see the README files in each component directory and the main [Dynamo documentation](../../docs/).