Dynamo Components

This directory contains the core components that make up the Dynamo inference framework. Each component serves a specific role in the distributed LLM serving architecture, enabling high-throughput, low-latency inference across multiple nodes and GPUs.

Supported Inference Engines

Dynamo supports multiple inference engines, with a focus on SGLang, vLLM, and TensorRT-LLM. Each engine has its own deployment configurations and capabilities:

vLLM - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
SGLang - Structured generation language framework with ZMQ-based communication
TensorRT-LLM - NVIDIA's optimized LLM inference engine with TensorRT acceleration

Each engine provides launch scripts for different deployment patterns in its own /launch and /deploy directories.

Core Components

Backends

The backends directory contains inference engine integrations and implementations, with a key focus on:

vLLM - Full-featured vLLM integration with disaggregated serving, KV-aware routing, and SLA-based planning
SGLang - SGLang engine integration supporting disaggregated serving and KV-aware routing
TensorRT-LLM - TensorRT-LLM integration with disaggregated serving capabilities

Frontend

The frontend component provides the HTTP API layer and request processing:

OpenAI-compatible HTTP server - RESTful API endpoint for LLM inference requests
Pre-processor - Handles request preprocessing and validation
Router - Routes requests to appropriate workers based on load and KV cache state
Auto-discovery - Automatically discovers and registers available workers
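
To make the routing idea concrete, here is a toy sketch of choosing a worker from load and KV cache state. It is an illustration only, not Dynamo's actual router: the `Worker` fields, the `pick_worker` function, and the cost formula are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical view of a worker as a router might see it.
    name: str
    active_requests: int       # current load on the worker
    cached_prefix_blocks: int  # KV cache blocks already matching the request prefix


def pick_worker(workers, request_blocks, load_weight=1.0):
    """Return the worker with the lowest estimated cost for this request.

    Cost (an assumed formula) = blocks that still need prefill
                                + load_weight * requests already in flight.
    """
    if not workers:
        raise ValueError("no workers available")

    def cost(worker):
        blocks_to_compute = max(request_blocks - worker.cached_prefix_blocks, 0)
        return blocks_to_compute + load_weight * worker.active_requests

    return min(workers, key=cost)


workers = [
    Worker("worker-a", active_requests=2, cached_prefix_blocks=0),
    Worker("worker-b", active_requests=5, cached_prefix_blocks=8),
]
# worker-b is busier, but most of the prompt is already in its KV cache.
chosen = pick_worker(workers, request_blocks=10)
```

Raising `load_weight` makes the sketch favor idle workers over cache hits, which is the kind of trade-off a KV-aware router has to balance.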

Planner

The planner component monitors system state and dynamically adjusts worker allocation:

Dynamic scaling - Scales prefill/decode workers up and down based on metrics
SLA-based planning - Ensures inference performance targets are met
Load-based planning - Optimizes resource utilization based on demand
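
As a rough illustration of SLA-driven scaling, the sketch below resizes a worker pool in proportion to how far an observed latency is from its target. This is a generic proportional rule chosen for the example, not the planner's real algorithm; `desired_replicas`, its parameters, and the replica bounds are hypothetical.

```python
import math


def desired_replicas(current, observed_latency_ms, target_latency_ms,
                     min_replicas=1, max_replicas=8):
    """Scale the replica count by the ratio of observed to target latency.

    All names and default bounds here are assumptions for illustration.
    """
    if current < 1 or target_latency_ms <= 0 or observed_latency_ms < 0:
        raise ValueError("invalid scaling inputs")
    ratio = observed_latency_ms / target_latency_ms
    wanted = math.ceil(current * ratio)
    # Clamp so the pool never drops below the minimum or exceeds the maximum.
    return max(min_replicas, min(max_replicas, wanted))


# Latency is 1.5x the target, so the pool grows from 2 to 3 workers.
scale_up = desired_replicas(current=2, observed_latency_ms=300, target_latency_ms=200)
# Latency is half the target, so the pool shrinks from 4 to 2 workers.
scale_down = desired_replicas(current=4, observed_latency_ms=100, target_latency_ms=200)
```

The same function could be applied separately to prefill and decode pools, each with its own latency metric and target.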

Getting Started

To get started with Dynamo components:

1. Choose an inference engine from the supported backends.
2. Set up the required services (etcd and NATS) using Docker Compose.
3. Set up your chosen engine, either from Python wheels or by building an image.
4. Run the deployment scripts from the engine's launch directory.
5. Monitor performance using the metrics component.
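
Before running launch scripts, it can help to confirm that etcd and NATS are reachable. The following is a minimal sketch that assumes both services run on `localhost` with their default client ports (2379 for etcd, 4222 for NATS); the helper names are hypothetical, and your Docker Compose setup may map different hosts or ports.

```python
import socket

# Assumed endpoints: default client ports on a local Docker Compose setup.
REQUIRED_SERVICES = {
    "etcd": ("localhost", 2379),
    "nats": ("localhost", 4222),
}


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(services=REQUIRED_SERVICES, probe=is_reachable):
    """Return the sorted names of services that the probe cannot reach."""
    return sorted(
        name for name, (host, port) in services.items() if not probe(host, port)
    )


if __name__ == "__main__":
    missing = missing_services()
    if missing:
        print("Not reachable:", ", ".join(missing))
    else:
        print("etcd and NATS are reachable")
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the service is healthy.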

For detailed instructions, see the README files in each component directory and the main Dynamo documentation.
