feat: Add Dynamo TUI MVP for monitoring deployments #4296

AsadShahid04 · 2025-11-13T11:11:39Z

Description

Implements the MVP Terminal User Interface (TUI) for Dynamo as requested in #1453. This provides a K9s-like interface for monitoring Dynamo deployments in real-time.

The TUI watches ETCD to discover namespaces, components, and endpoints, monitors NATS connection statistics, and optionally displays Prometheus metrics for performance monitoring.

The screenshot shows the TUI successfully discovering and displaying:

Namespaces: dynamo namespace with 1 component
Components: backend component in Ready state with 2/2 endpoints and 2 instances
Endpoints: clear_kv_blocks and generate endpoints, both Ready with 1 instance each
NATS: Connected with message statistics
Real-time Updates: Live discovery and status updates

Features Implemented

Core TUI (#1453)

✅ ETCD Discovery: Real-time watching of ETCD to detect namespaces, components, and endpoints
✅ Hierarchical View: Three-column layout showing Namespaces → Components → Endpoints
✅ Keyboard Navigation: Vim-style navigation (hjkl, arrow keys) with focus indicators
✅ Real-time Updates: Live updates as components register/deregister

Health Monitoring (#1454)

✅ Status Indicators: Visual health states (Ready, Provisioning, Offline)
✅ Component Status: Real-time status updates as components initialize
✅ Instance Tracking: Display instance counts and last activity timestamps
✅ Deployment Spin-up: Watch components appear as they start

NATS Monitoring (#1455)

✅ Connection Status: Display Connected/Disconnected state
✅ Message Statistics: Byte and message counters (in/out)
✅ Connection Tracking: Monitor active connections
✅ Auto-refresh: Configurable polling interval (default 2s)

Metrics Display (#1456)

✅ Prometheus Integration: Optional metrics endpoint scraping
✅ TTFT/TPOT: Time to First Token and Time per Output Token
✅ Throughput Metrics: Request rate and token throughput
✅ Request Gauges: Inflight and queued request counts
✅ Configurable Intervals: Customizable scrape intervals (default 3s)

Changes

This PR includes the complete TUI implementation:

New Files:

launch/dynamo-tui/ - Complete TUI package
- src/app.rs - Core state management and event handling
- src/ui.rs - Ratatui-based UI rendering
- src/sources.rs - ETCD, NATS, and metrics data sources
- src/input.rs - Keyboard input handling
- src/metrics.rs - Prometheus metrics parsing
- src/main.rs - Application entry point
- README.md - User documentation
- Cargo.toml - Package configuration
- Dockerfile.dev - Docker-based development setup
- docker-build.sh, docker-run.sh, docker-test.sh - Docker helper scripts
- DOCKER.md - Docker development documentation

Bug Fixes:

Fixed test assertions in discovery_snapshot_populates_tree
- Issue: The test was incorrectly expecting the "frontend" component to have 2 endpoints and 3 instances
- Root Cause: The test data creates:
  - ns-a/frontend/http with instances 1 and 2 (same endpoint, 2 instances)
  - ns-a/router/grpc with instance 3 (different component)
  - ns-b/worker/engine with instance 4 (different namespace)
- Fix: Corrected expectations to match actual data structure:
  - Endpoint count: 2 → 1 (the "frontend" component has 1 endpoint "http" with 2 instances)
  - Instance count: 3 → 2 (the "frontend" component has 2 instances total, not 3)
- The test now correctly validates that the "frontend" component has 1 endpoint with 2 instances, while instance 3 belongs to the "router" component (a different component in the same namespace)
Added readme field to workspace.package in Cargo.toml to fix workspace build errors

Backward Compatibility:

Added backward compatibility for DiscoveryInstance deserialization (lib/runtime/src/discovery/mod.rs)
- Problem: The TUI (built from source) expects ETCD data with a "type": "Endpoint" field, but workers using pre-built wheels register without this field, causing parsing errors: "Failed to parse discovery instance from watch event missing field 'type'"
- Solution: Implemented custom Deserialize for DiscoveryInstance that handles both formats:
  - New format (with "type" field): Deserializes as DiscoveryInstance::Endpoint or DiscoveryInstance::Model
  - Old format (without "type" field): Automatically wraps raw Instance data as DiscoveryInstance::Endpoint
- Impact: The TUI can now discover workers regardless of whether they use pre-built wheels (old format) or source builds (new format)
- Benefits:
  - No breaking changes for existing deployments
  - Smooth transition period during runtime version updates
  - TUI works out-of-the-box with any worker version

Docker Development Setup:

Added Docker-based development workflow for cross-platform compatibility
- Dockerfile.dev: Linux-based build environment (fixes macOS inotify issues)
- Helper scripts for building, running, and testing in Docker
- Enables consistent development experience across macOS and Linux
- Includes protobuf-compiler for etcd-client compilation

Usage

Native Build

# Basic usage (ETCD + NATS monitoring)
cargo run -p dynamo-tui

# With metrics endpoint
cargo run -p dynamo-tui -- \
  --metrics-url http://localhost:9100/metrics \
  --metrics-interval 3s \
  --nats-interval 2s

Docker (Recommended for macOS)

# Build the Docker image
./launch/dynamo-tui/docker-build.sh

# Run the TUI
./launch/dynamo-tui/docker-run.sh --metrics-url http://localhost:9090/metrics

# Run tests
./launch/dynamo-tui/docker-test.sh

Keyboard Shortcuts

↑/k, ↓/j - Navigate within focused column
←/h, →/l - Move focus between columns
r - Manual refresh
q, Esc, Ctrl+C - Exit

Testing

✅ All unit tests passing
✅ TUI launches and connects successfully
✅ Real-time updates verified
✅ Keyboard navigation tested
✅ Error handling verified
✅ Backward compatibility tested with pre-built wheels
✅ Docker build and run verified

Related Issues

Closes #1453 (TUI for Dynamo MVP)
Implements features from:

[FEATURE]: TUI - Monitoring Health #1454 (Health Monitoring)
[FEATURE]: TUI NATS Monitoring #1455 (NATS Monitoring)
[FEATURE]: TUI Metrics #1456 (Metrics Display)

Reference

Based on the initial prototype in PR #689 by @ishandhanani

Notes

The TUI uses the same environment variables as other Dynamo components for ETCD/NATS configuration
Metrics scraping expects Dynamo frontend metric names (e.g., dynamo_frontend_*)
Requires Rust 1.90.0+ (as specified in rust-toolchain.toml)
Docker setup enables development on macOS despite Linux-specific dependencies (e.g., inotify)
Backward compatibility ensures the TUI works with workers using any runtime version

Summary by CodeRabbit

Release Notes

New Features
- New terminal user interface (TUI) for monitoring Dynamo environments with real-time namespace, component, and endpoint status visualization, integrated metrics and NATS statistics, and keyboard-based navigation.
- Docker support for building and running the TUI with simplified setup workflows.
Documentation
- Added comprehensive usage guides and Docker deployment documentation.

This commit introduces a new terminal UI for Dynamo, providing real-time insights into namespaces, components, endpoints, NATS statistics, and Prometheus metrics. Co-authored-by: asadblue2015 <[email protected]>

This commit introduces async-nats integration and refactors the application's state management for better organization and efficiency. Co-authored-by: asadblue2015 <[email protected]>

- Fix endpoint count expectation (1 instead of 2) - Fix instance count expectation (2 instead of 3) - Add readme field to workspace.package in Cargo.toml - Tests now correctly reflect the actual data structure

copy-pr-bot · 2025-11-13T11:11:43Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2025-11-13T11:11:49Z

👋 Hi AsadShahid04! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

- Add Dockerfile.dev for Linux-based builds (fixes macOS inotify issues) - Add docker-build.sh, docker-run.sh, and docker-test.sh scripts - Add DOCKER.md documentation for Docker workflow - Enables cross-platform development and testing

- Support both old format (without 'type' field) and new format (with 'type' field) - Allows TUI to parse ETCD data from workers using older runtime versions - Fixes 'missing field type' parsing errors in TUI

coderabbitai · 2025-11-19T06:18:52Z

Walkthrough

This pull request introduces a new TUI project for Dynamo as a workspace member that monitors namespaces, components, and endpoints via ETCD/NATS discovery with optional Prometheus metrics collection. It includes CLI configuration, Docker support, comprehensive documentation, and core Rust modules for state management, input handling, metrics processing, and UI rendering, alongside a library change enabling flexible JSON deserialization for discovery instances.

Changes

Cohort / File(s)	Summary
Workspace Configuration `.gitignore`, `Cargo.toml`	Added scripts/ directory to gitignore and registered dynamo-tui as new workspace member with readme field.
Project Manifest & Documentation `launch/dynamo-tui/Cargo.toml`, `launch/dynamo-tui/README.md`	Added Cargo manifest for dynamo-tui crate with workspace-scoped dependencies and comprehensive README documenting usage, CLI options, key bindings, UI panels, and limitations.
Docker Setup `launch/dynamo-tui/Dockerfile.dev`, `launch/dynamo-tui/DOCKER.md`, `launch/dynamo-tui/docker-*.sh`	Introduced Docker development environment with Dockerfile.dev (rust:1.90.0-slim base), build/test/run shell scripts, and detailed DOCKER.md guide covering workflows and troubleshooting.
Core Application State & Logic `launch/dynamo-tui/src/app.rs`	Defined comprehensive UI state machine with AppEvent enum, hierarchical data structures (namespaces, components, endpoints, instances), EndpointStatus enum with health aggregation, selection management, and input handling for navigation and refresh.
Input & Event Handling `launch/dynamo-tui/src/input.rs`	Introduced InputEvent enum (Quit, MoveUp, MoveDown, MoveLeft, MoveRight, Refresh) and spawn_input_listener function using crossterm for async keyboard event polling with cancellation support.
Metrics Processing `launch/dynamo-tui/src/metrics.rs`	Added MetricsSnapshot and PrometheusSample structs with process_metrics function for parsing Prometheus text format, computing averages and rate metrics from current/previous samples.
Data Pipelines `launch/dynamo-tui/src/sources.rs`	Implemented spawn_discovery_pipeline, spawn_nats_pipeline, and spawn_metrics_pipeline functions that emit AppEvent updates from ETCD discovery, NATS client state, and HTTP metrics endpoints with cancellation support.
UI Rendering `launch/dynamo-tui/src/ui.rs`	Added public render function orchestrating three-column layout (Namespaces/Components/Endpoints) with endpoint details panel, NATS/metrics stats bar, and status bar using ratatui with dynamic highlighting.
Binary Entry Point `launch/dynamo-tui/src/main.rs`	Created main binary wiring CLI argument parsing, runtime initialization, pipeline orchestration, terminal setup, event loop with input/app event handling, and graceful shutdown with terminal restoration.
Discovery Deserialization `lib/runtime/src/discovery/mod.rs`	Added custom Deserialize implementation for DiscoveryInstance supporting both new "type"-based format (Endpoint/Model variants) and legacy format via helper ModelVariant struct.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Core logic concentration: src/app.rs contains dense state management with comprehensive data structures and tree manipulation; src/sources.rs orchestrates three concurrent pipelines with error handling; src/ui.rs implements sophisticated layout rendering.
API surface breadth: Multiple interdependent public types and functions across app, input, metrics, and sources modules requiring coherent integration validation.
Discovery deserializer: Custom implementation with multi-format support and backward compatibility adds non-trivial logic density.
Additional focus areas:
- Cross-module integration between app state, event channels, and UI rendering
- Error propagation and cancellation token coordination across async pipelines
- Metrics rate calculation and edge case handling (negative deltas, zero durations)
- Selection clamping and hierarchical tree navigation correctness

Poem

🐰 A TUI hops into place,
Three columns sync at a swift pace,
NATS whispers, metrics gleam,
ETCD dreams in Rust's fast stream,
Dynamo's realm now shines so bright!

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 7.58% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly summarizes the main addition: a new Dynamo TUI MVP for deployment monitoring, accurately reflecting the substantial feature implementation throughout the changeset.
Description check	✅ Passed	The PR description comprehensively covers the feature with overview, implementation details, file changes, bug fixes, backward compatibility measures, usage instructions, and test results, following the repository template structure.
Linked Issues check	✅ Passed	The PR successfully implements the TUI MVP requested in #1453 with ETCD discovery watching, hierarchical namespace/component/endpoint display, and keyboard navigation, fully meeting the linked issue's primary coding requirements.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to TUI implementation (#1453) or supporting infrastructure (backward compatibility, workspace config, Docker setup). No extraneous modifications detected outside linked issue objectives.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

Provide your own instructions using the high_level_summary_instructions setting.
Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

📝 Description — Summarize the main change in 50–60 words, explaining what was done.

📓 References — List relevant issues, discussions, documentation, or related PRs.

📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.

📊 Contributor Summary — Include a Markdown table showing contributions:
| Contributor | Lines Added | Lines Removed | Files Changed |

✔️ Additional Notes — Add any extra reviewer context.
Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/runtime/src/discovery/mod.rs (1)

140-154: Add tests for multi-format DiscoveryInstance deserialization and enhance implicit-default documentation

The deserializer implementation is correct as assessed. However, recommended test coverage remains absent:

New Endpoint format ("type": "Endpoint"): no direct deserialization test

New Model format ("type": "Model"): no direct deserialization test

Legacy format (no "type" field): no direct deserialization test

Error cases (unknown or non-string "type"): no tests

Additionally, the implicit Endpoint default when "type" is missing (line 208 of mod.rs) is documented only in a minimal comment at line 203. Per prior guidance on keeping specifications explicit, add a clear docstring or expanded comment above the fallback branch explicitly stating: "When type is absent, default to Endpoint for backward compatibility with pre-enum stored data."

🧹 Nitpick comments (6)

launch/dynamo-tui/Dockerfile.dev (1)
1-39: Good Docker practices for a development image.

The Dockerfile follows best practices with proper layer caching and cleanup. Since this is a development image, the single-stage build is appropriate.

For production use, consider a multi-stage build to reduce the final image size by excluding build tools:
# Stage 1: Builder
FROM rust:1.90.0-slim AS builder
RUN apt-get update && apt-get install -y \
    pkg-config libssl-dev curl git build-essential protobuf-compiler \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /workspace
COPY Cargo.toml Cargo.lock rust-toolchain.toml README.md pyproject.toml ./
COPY lib/ ./lib/
COPY launch/ ./launch/
RUN cargo build -p dynamo-tui --release

# Stage 2: Runtime
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y libssl3 ca-certificates \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /workspace/target/release/dynamo-tui /usr/local/bin/
CMD ["dynamo-tui"]
launch/dynamo-tui/docker-run.sh (1)

14-43: Docker run wiring looks solid; only minor optional nits

The flow to ensure dynamo-tui:dev exists, pick sensible ETCD/NATS defaults per-OS, and set --network host on Linux is coherent and should work well for the intended environments.

If you want to polish further (optional):

Consider a brief comment that the --network host behavior is Linux-specific and that non‑Linux platforms are expected to go through the darwin branch.

Optionally quote NETWORK_MODE ("$NETWORK_MODE") or gate it with ${NETWORK_MODE:+$NETWORK_MODE} to avoid passing a spurious empty arg, though this is mostly cosmetic.

launch/dynamo-tui/src/ui.rs (1)

16-430: UI rendering is solid; only very small cleanup opportunities

The overall layout (3 columns + endpoint detail + stats + status bar) and use of App state look correct and robust, including handling of empty/invalid selections via filter_map and endpoint_at.

If you want to trim a few small nits (purely optional):

In render_metrics, the last entry uses format!(...).to_string(), which is redundant since format! already returns a String.

Several helpers repeatedly call app.selection() inside loops; caching let sel = app.selection(); once per render function would slightly reduce noise and repeated lookups.

instance_lines currently only matches TransportType::NatsTcp; that’s fine given today’s enum, but if new variants are added later, consider including them in the UI (you’ll get a compile error anyway, so this is more of a future reminder).

Functionally though, nothing here looks blocking.

launch/dynamo-tui/src/input.rs (1)

11-85: Input listener and key mapping look correct for the intended UX

The async input loop, spawn_blocking around crossterm polling, and the vim‑style key mappings all look good and align with the app’s handle_input semantics.

A small, optional enhancement would be to log once when the listener exits (e.g., due to cancellation or channel close) to aid debugging, but behavior is otherwise sound.

launch/dynamo-tui/src/sources.rs (2)

20-47: Discovery snapshot + initial seeding look correct, but consider handling send failures explicitly

The initial DiscoverySnapshot construction and filtering to DiscoveryInstance::Endpoint look good and align with the App’s expectations. One minor improvement: if tx.send(AppEvent::DiscoverySnapshot(...)).await fails (receiver dropped), you could bail out early instead of proceeding to set up the watch stream, since the TUI is unlikely to be alive anymore and the background task work becomes wasted.

This is non-blocking and mostly about avoiding unnecessary work during shutdown.

156-235: Metrics pipeline: error handling is working correctly

Verification confirms the gzip feature is properly enabled in launch/dynamo-tui/Cargo.toml with reqwest = { workspace = true, features = ["gzip"] }, so the .gzip(true) call will work as intended.

The overall structure remains sound:

Single reqwest::Client reused for all requests

Initial and periodic scraping with proper interval configuration

Network/body errors logged and surfaced to UI

The error handling semantics are as described: collect_metrics_once always returns Ok(()) for HTTP/body errors and only returns Err when the metrics channel closes. The is_err() checks don't change runtime behavior—they only trigger when the channel is closed (at which point errors can't be sent anyway). This works correctly in practice for UI-only telemetry.

For clarity, consider either removing the is_err() checks if they don't serve a purpose, or restructuring collect_metrics_once to return Err for network failures—but the current implementation is functionally sound and acceptable as-is.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 263f99d and be19622.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (16)

.gitignore (1 hunks)
Cargo.toml (2 hunks)
launch/dynamo-tui/Cargo.toml (1 hunks)
launch/dynamo-tui/DOCKER.md (1 hunks)
launch/dynamo-tui/Dockerfile.dev (1 hunks)
launch/dynamo-tui/README.md (1 hunks)
launch/dynamo-tui/docker-build.sh (1 hunks)
launch/dynamo-tui/docker-run.sh (1 hunks)
launch/dynamo-tui/docker-test.sh (1 hunks)
launch/dynamo-tui/src/app.rs (1 hunks)
launch/dynamo-tui/src/input.rs (1 hunks)
launch/dynamo-tui/src/main.rs (1 hunks)
launch/dynamo-tui/src/metrics.rs (1 hunks)
launch/dynamo-tui/src/sources.rs (1 hunks)
launch/dynamo-tui/src/ui.rs (1 hunks)
lib/runtime/src/discovery/mod.rs (3 hunks)

🧰 Additional context used

🧠 Learnings (13)

📚 Learning: 2025-08-05T22:51:59.230Z

Learnt from: dmitry-tokarev-nv
Repo: ai-dynamo/dynamo PR: 2300
File: pyproject.toml:64-66
Timestamp: 2025-08-05T22:51:59.230Z
Learning: The ai-dynamo/dynamo project does not ship ARM64 wheels, so platform markers to restrict dependencies to x86_64 are not needed in pyproject.toml dependencies.

Applied to files:

Cargo.toml

📚 Learning: 2025-10-17T22:57:12.526Z

Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager.rs:19-19
Timestamp: 2025-10-17T22:57:12.526Z
Learning: The dynamo library (ai-dynamo/dynamo repository) is currently developed to run on Linux only, so Linux-specific code and syscalls do not require platform guards.

Applied to files:

Cargo.toml

📚 Learning: 2025-08-30T20:43:10.091Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

launch/dynamo-tui/docker-test.sh
launch/dynamo-tui/Dockerfile.dev
launch/dynamo-tui/docker-build.sh
launch/dynamo-tui/docker-run.sh

📚 Learning: 2025-08-15T02:01:01.238Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2453
File: deploy/dynamo_check.py:739-741
Timestamp: 2025-08-15T02:01:01.238Z
Learning: In the dynamo project, missing cargo should be treated as a fatal error in dynamo_check.py because developers need cargo to build the Rust components. The tool is designed to validate the complete development environment, not just import functionality.

Applied to files:

launch/dynamo-tui/Cargo.toml

📚 Learning: 2025-08-30T20:43:49.632Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

launch/dynamo-tui/Dockerfile.dev
launch/dynamo-tui/docker-build.sh
launch/dynamo-tui/docker-run.sh

📚 Learning: 2025-09-16T17:16:07.820Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3051
File: container/templates/Dockerfile.vllm.j2:221-233
Timestamp: 2025-09-16T17:16:07.820Z
Learning: In the dynamo project, when converting Dockerfiles to Jinja2 templates, the primary goal is backward compatibility - generated Dockerfiles must be identical to the originals. Security improvements or other enhancements that would change the generated output are out of scope for templating PRs and should be addressed separately.

Applied to files:

launch/dynamo-tui/Dockerfile.dev
launch/dynamo-tui/docker-build.sh
launch/dynamo-tui/docker-run.sh

📚 Learning: 2025-09-03T01:10:12.599Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2822
File: container/Dockerfile.vllm:343-352
Timestamp: 2025-09-03T01:10:12.599Z
Learning: In the dynamo project's local-dev Docker targets, USER_UID and USER_GID build args are intentionally left without default values to force explicit UID/GID mapping during build time, preventing file permission issues in local development environments where container users need to match host user permissions for mounted volumes.

Applied to files:

launch/dynamo-tui/Dockerfile.dev

📚 Learning: 2025-08-30T20:43:10.091Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

launch/dynamo-tui/Dockerfile.dev
launch/dynamo-tui/docker-build.sh
launch/dynamo-tui/docker-run.sh

📚 Learning: 2025-08-02T06:09:41.078Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2255
File: .devcontainer/build-dev-image.sh:173-177
Timestamp: 2025-08-02T06:09:41.078Z
Learning: In bash scripts, SCRIPT_DIR="$(dirname "${BASH_SOURCE[0]}")" preserves the relative/absolute nature of how the script was invoked. When a script is run with a relative path like .devcontainer/build-dev-image.sh from the repo root, SCRIPT_DIR becomes .devcontainer (relative), making it valid for Docker COPY commands within the build context.

Applied to files:

launch/dynamo-tui/docker-build.sh

📚 Learning: 2025-08-02T06:09:41.078Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2255
File: .devcontainer/build-dev-image.sh:173-177
Timestamp: 2025-08-02T06:09:41.078Z
Learning: In bash scripts, SCRIPT_DIR="$(dirname "${BASH_SOURCE[0]}")" preserves the relative/absolute nature of how the script was invoked. When a script is run with a relative path like ./build-dev-image.sh or .devcontainer/build-dev-image.sh, SCRIPT_DIR becomes a relative path (.devcontainer), making it valid for Docker COPY commands within the build context.

Applied to files:

launch/dynamo-tui/docker-build.sh

📚 Learning: 2025-09-02T16:46:54.015Z

Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

lib/runtime/src/discovery/mod.rs

📚 Learning: 2025-09-16T00:26:43.641Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/examples/system_metrics/README.md:65-65
Timestamp: 2025-09-16T00:26:43.641Z
Learning: The team at ai-dynamo/dynamo prefers to use consistent metric naming patterns with _total suffixes across all metric types (including gauges) for internal consistency, even when this differs from strict Prometheus conventions that reserve _total for counters only. This design decision was confirmed by keivenchang in PR 3035, referencing examples in prometheus_names.rs and input from team members.

Applied to files:

launch/dynamo-tui/src/metrics.rs

📚 Learning: 2025-06-02T19:37:27.666Z

Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.

Applied to files:

launch/dynamo-tui/src/sources.rs

🧬 Code graph analysis (6)

launch/dynamo-tui/src/ui.rs (1)

launch/dynamo-tui/src/app.rs (6)

default (47-57)

default (74-76)

default (130-132)

selection (216-218)

display_name (470-477)

as_color (479-487)

launch/dynamo-tui/src/input.rs (1)

launch/dynamo-tui/src/main.rs (2)

mpsc (57-57)

mpsc (58-58)

launch/dynamo-tui/src/main.rs (5)

launch/dynamo-tui/src/sources.rs (3)

spawn_discovery_pipeline (20-94)

spawn_metrics_pipeline (156-202)

spawn_nats_pipeline (96-136)

launch/dynamo-tui/src/ui.rs (1)

render (17-56)

launch/dynamo-tui/src/input.rs (1)

spawn_input_listener (21-53)

launch/dynamo-tui/src/app.rs (5)

default (47-57)

default (74-76)

default (130-132)

apply_event (136-173)

handle_input (175-187)

launch/dynamo-tui/src/metrics.rs (1)

default (33-45)

launch/dynamo-tui/src/metrics.rs (1)

lib/runtime/examples/system_metrics/tests/integration_test.rs (1)

line (164-167)

launch/dynamo-tui/src/sources.rs (1)

launch/dynamo-tui/src/metrics.rs (1)

process_metrics (48-88)

launch/dynamo-tui/src/app.rs (2)

launch/dynamo-tui/src/metrics.rs (1)

default (33-45)

lib/runtime/src/component/client.rs (1)

instances (73-75)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4296/merge) by AsadShahid04.

lib/runtime/src/discovery/mod.rs

[error] 160-160: Trailing whitespace detected by pre-commit hook and modified in this file.

[error] 1-1: Command failed with exit code 1: pre-commit run --show-diff-on-failure --color=always --all-files

🪛 GitHub Actions: Rust pre-merge checks

lib/runtime/src/discovery/mod.rs

[error] 160-202: cargo fmt -- --check failed (exit code 1) due to formatting differences detected in lib/runtime/src/discovery/mod.rs. Run 'cargo fmt' to fix formatting.

🪛 Shellcheck (0.11.0)

launch/dynamo-tui/docker-test.sh

[error] 29-29: Argument mixes string and array. Use * or separate argument.

(SC2145)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

🔇 Additional comments (20)

lib/runtime/src/discovery/mod.rs (1)

8-8: serde_json import aligns with expanded usage

Bringing serde_json into scope is consistent with its broader use in this module (manual Deserialize impl and card_json handling); no issues here.

Cargo.toml (1)

7-7: LGTM! Clean workspace configuration updates.

The addition of the new dynamo-tui workspace member and the readme field properly integrate the new TUI crate into the project structure.

Also applies to: 45-45

launch/dynamo-tui/Cargo.toml (1)

1-32: LGTM! Well-structured crate manifest.

The dependencies are appropriate for a TUI monitoring application, and the workspace inheritance promotes consistency across the project.

launch/dynamo-tui/README.md (1)

1-51: LGTM! Comprehensive and clear documentation.

The README provides excellent guidance for users, including usage examples, configuration options, key bindings, and honest discussion of current limitations.

launch/dynamo-tui/DOCKER.md (1)

1-123: LGTM! Thorough Docker workflow documentation.

The guide provides excellent coverage of Docker-based development workflows, including macOS-specific networking considerations and helpful troubleshooting steps.

launch/dynamo-tui/src/metrics.rs (3)

6-46: LGTM! Well-designed data structures.

The MetricsSnapshot and PrometheusSample structs are appropriately structured with Option types for metrics that may be unavailable. The Default implementation provides sensible zero values.

48-88: LGTM! Solid metrics processing logic.

The process_metrics function correctly extracts Prometheus metrics, computes averages, and calculates rates. The hardcoded metric names (dynamo_frontend_*) are acknowledged as a current limitation in the README.

90-146: LGTM! Well-implemented helper functions.

The parsing and calculation helpers are correctly implemented:

sum_metric and parse_sample_value properly extract values from Prometheus format

average_ms guards against division by near-zero counts

rate_metrics handles counter resets gracefully by returning None for negative deltas

value_or_none filters out near-zero gauge values

The defensive max(0.0) in rate_metrics (lines 129, 134) guards against floating-point edge cases after the negativity check, which is good defensive programming.

.gitignore (1)

113-114: LGTM! Appropriate gitignore entry.

Ignoring the scripts/ directory for local development tools is a sensible addition.

launch/dynamo-tui/docker-build.sh (1)

1-26: LGTM! Well-structured build script.

The script follows bash best practices with strict error handling (set -e), proper directory resolution, and helpful user guidance. The approach aligns with established patterns in the project.

launch/dynamo-tui/src/app.rs (7)

10-58: Event model and NATS snapshot defaults are coherent with the pipelines

AppEvent variants map cleanly onto what sources.rs emits (discovery snapshot / up / down, NATS, metrics, info, error), which keeps the event-driven design clear. NatsSnapshot’s fields and Default impl (Disconnected with zeroed counters and Instant::now()) give the UI a safe baseline before any real data arrives.

No changes needed here.

79-188: Core App state and apply_event/handle_input logic are solid

A few points that look good:

App keeps all mutable UI state encapsulated (namespaces tree, selection, last_* markers, NATS/metrics), simplifying the rest of the TUI.

apply_event:

Rebuilds the tree on DiscoverySnapshot and resets selection indices once.

Uses insert_instance/remove_instance to respond to InstanceUp/InstanceDown, so the tests exercise the same paths as the pipelines.

Stores latest NATS/metrics snapshots and info/error messages.

Always calls clamp_selection() at the end, maintaining navigation invariants after any mutation.

handle_input cleanly separates “quit” (returns true) from all other inputs, which only mutate state and then return false.

This section looks correct and maintainable as-is.

261-325: Instance insertion/removal semantics match the intended endpoint lifecycle

insert_instance and remove_instance correctly maintain the namespaces/components/endpoints hierarchy:

Lazily creates namespace/component/endpoint entries as needed with reasonable defaults (new endpoints start in Provisioning but immediately become Ready upon first instance insertion).

Updates endpoint.updated_at and per-instance last_seen on each InstanceUp, so repeated “up” events for the same instance naturally refresh its timestamp.

On InstanceDown, removes only the specific instance; if that was the last one, marks the endpoint Offline but keeps it in the tree—this matches the test and is helpful for operators to see offline endpoints instead of having them disappear.

Given the TUI’s goals, this behavior is exactly what you want.

327-407: Navigation (move_selection, move_focus, shift_index, clamp_selection) is well thought out

The navigation helpers implement a consistent, K9s-like behavior:

shift_index wraps around and gracefully handles empty lists.

move_selection adjusts the appropriate index based on the focused column, resetting deeper indices when moving up a level (e.g., changing namespace resets component/endpoint indices).

move_focus only allows moving right if there is at least one component/endpoint to move into, and moves left unconditionally where appropriate.

clamp_selection:

Handles the “no namespaces” case early.

Clamps all indices to valid ranges.

Downgrades focus from Components/Endpoints to Namespaces when there’s nothing in those columns, preventing invalid focus states.

This should produce predictable, robust navigation even as the discovery tree changes in real time.

409-452: Health summaries and aggregate_status encode a clear precedence model

endpoint_health_summary and component_health_summary:

For endpoints, surface status, instance count, and the most recent last_seen, which is exactly what the UI needs.

For components, start from Offline and fold over endpoints with aggregate_status, tracking total/active endpoints and instance count.

aggregate_status’s ordering:

Prefers Ready whenever any endpoint is Ready.

Otherwise prefers Provisioning.

Treats Unknown as neutral, deferring to the other status.

Falls back to Offline when everything is Offline.

This precedence is reasonable and deterministic. If you ever need a different ordering (e.g., treating Unknown as “worse” than Offline), it will be easy to adjust here centrally.

490-547: Tests cover the key discovery and offline-path behaviors

The tests validate the most critical flows:

discovery_snapshot_populates_tree ensures that:

Multiple namespaces/components/endpoints are created from a snapshot.

Component and endpoint instance counts are correct.

Endpoint status is Ready after snapshot seeding.

instance_down_marks_endpoint_offline verifies that:

Applying InstanceDown does not remove the endpoint node.

The endpoint is marked Offline and its instances map is emptied.

These directly exercise the same public methods the pipelines use (apply_event, component_at, endpoint_at, summaries), which is ideal. As you add more UI features (e.g., metrics- or NATS-driven views), similar focused tests around App will keep things robust.

No changes needed.

469-487: Color mapping is semantically sound and properly centralized

The as_color method uses standard terminal color conventions (Gray for unknown, Yellow for provisioning, Green for ready, Red for offline) that read well across dark and light terminal themes. The method is called from exactly three locations in ui.rs, keeping styling logic in one place.

launch/dynamo-tui/src/sources.rs (3)

96-136: The review comment is incorrect and should be disregarded.

The documented behavior of tokio::time::interval is that the first tick() completes immediately, not "typically waits for interval" as the review claims. The comment "Fire first tick immediately" is therefore accurate and reflects the actual behavior of the code. The code is correct as written—it provides an immediate snapshot via the explicit build_nats_snapshot call before the loop, and then the interval setup reinforces this with correct semantics.

Likely an incorrect or invalid review comment.

138-154: NATS snapshot construction is correct and aligns with async-nats API

The Client::statistics() method returns an Arc<Statistics> with AtomicU64 fields, and the Client::connection_state() method returns the connection::State enum. Your code correctly uses .load(Ordering::Relaxed) on the atomic counters and pattern-matches the connection state. Using Ordering::Relaxed for these read-only metrics is appropriate for UI display. The implementation is sound.

50-90: Discovery stream loop behavior confirmed correct; no additional event variants in use

The code appropriately handles the discovery API. The loop:

Correctly terminates on cancel.cancelled() or stream end

Tracks Added(Endpoint) instances and sends InstanceUp

Sends fully-populated InstanceDown on Removed(instance_id) lookups

Safely ignores non-endpoint events with Ok(_), which is future-proof since only Added and Removed variants are used in practice for endpoint discovery

coderabbitai · 2025-11-19T06:18:56Z

launch/dynamo-tui/docker-test.sh

+# Check if image exists
+if ! docker image inspect dynamo-tui:dev >/dev/null 2>&1; then
+    echo "Image not found. Building first..."
+    ./launch/dynamo-tui/docker-build.sh
+fi
+
+echo "Running dynamo-tui tests in Docker..."
+echo ""
+
+# Run tests (build if needed, then test)
+docker run --rm \
+    -v "$PROJECT_ROOT:/workspace" \
+    -w /workspace \
+    rust:1.90.0-slim bash -c "
+        apt-get update -qq && apt-get install -y -qq pkg-config libssl-dev build-essential protobuf-compiler >/dev/null 2>&1
+        cargo test -p dynamo-tui -- '$@'
+    "


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Fix argument forwarding into the container and avoid building an unused image

Two issues here:

CLI args are not forwarded to cargo test inside the container

The inner command runs:

cargo test -p dynamo-tui -- '$@'

Because $@ is inside single quotes in the inner bash -c "..." script, it is never expanded; cargo test always receives the literal string $@, and outer-script arguments are effectively ignored. This also matches the ShellCheck SC2145 hint.

You can forward the outer script’s arguments safely using the standard bash -c ... bash "$@" pattern:

-# Run tests (build if needed, then test) -docker run --rm \ - -v "$PROJECT_ROOT:/workspace" \ - -w /workspace \ - rust:1.90.0-slim bash -c " - apt-get update -qq && apt-get install -y -qq pkg-config libssl-dev build-essential protobuf-compiler >/dev/null 2>&1 - cargo test -p dynamo-tui -- '$@' - " +# Run tests (build if needed, then test) +docker run --rm \ + -v "$PROJECT_ROOT:/workspace" \ + -w /workspace \ + rust:1.90.0-slim bash -lc ' + apt-get update -qq && apt-get install -y -qq pkg-config libssl-dev build-essential protobuf-compiler >/dev/null 2>&1 + cargo test -p dynamo-tui -- "$@" + ' bash "$@"

This way, $@ in the inner script expands to the arguments you pass to docker-test.sh.

Image existence check builds dynamo-tui:dev but tests use rust:1.90.0-slim

The script ensures dynamo-tui:dev exists, but then never uses it (tests run in rust:1.90.0-slim). That adds an unnecessary build step and may confuse users.

Either:

Switch the docker run to use dynamo-tui:dev if that image is intended to be the dev/test environment, or

Drop the image-existence check entirely if tests are meant to run in the stock Rust image.

🏁 Script executed:

#!/bin/bash # Verify the file exists and show its content echo "=== Checking docker-test.sh ===" if [ -f "launch/dynamo-tui/docker-test.sh" ]; then cat "launch/dynamo-tui/docker-test.sh" else echo "File not found" fi

Length of output: 918

🏁 Script executed:

#!/bin/bash # Check docker-build.sh to understand what dynamo-tui:dev image is echo "=== Checking docker-build.sh ===" if [ -f "launch/dynamo-tui/docker-build.sh" ]; then cat "launch/dynamo-tui/docker-build.sh" else echo "File not found" fi

Length of output: 672

🏁 Script executed:

#!/bin/bash # Search for other references to dynamo-tui:dev or docker-test.sh in the codebase echo "=== Searching for dynamo-tui:dev references ===" rg -n "dynamo-tui:dev" --type=sh --type=bash echo "" echo "=== Searching for docker-test.sh usage ===" rg -n "docker-test\.sh" -C2

Length of output: 815

Fix argument forwarding into the container and avoid building an unused image

Two issues verified:

CLI args are not forwarded to cargo test inside the container

Line 29 runs cargo test -p dynamo-tui -- '$@'. Because $@ is inside single quotes within the bash -c "..." script, it never expands; the DOCKER.md documents usage like ./launch/dynamo-tui/docker-test.sh discovery_snapshot_populates_tree, but those arguments are currently ignored.

Use the standard pattern to forward arguments:

-# Run tests (build if needed, then test) -docker run --rm \ - -v "$PROJECT_ROOT:/workspace" \ - -w /workspace \ - rust:1.90.0-slim bash -c " - apt-get update -qq && apt-get install -y -qq pkg-config libssl-dev build-essential protobuf-compiler >/dev/null 2>&1 - cargo test -p dynamo-tui -- '$@' - " +# Run tests (build if needed, then test) +docker run --rm \ + -v "$PROJECT_ROOT:/workspace" \ + -w /workspace \ + rust:1.90.0-slim bash -lc ' + apt-get update -qq && apt-get install -y -qq pkg-config libssl-dev build-essential protobuf-compiler >/dev/null 2>&1 + cargo test -p dynamo-tui -- "$@" + ' bash "$@"

Image existence check builds dynamo-tui:dev but tests use rust:1.90.0-slim

Lines 15–19 ensure dynamo-tui:dev exists, but line 27 never uses it; tests always run in the stock Rust image. This adds an unnecessary build step.

Either switch docker run to use dynamo-tui:dev or remove the image-existence check if tests are meant to run in the stock image.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Check if image exists

if ! docker image inspect dynamo-tui:dev >/dev/null 2>&1; then

echo "Image not found. Building first..."

./launch/dynamo-tui/docker-build.sh

fi

echo "Running dynamo-tui tests in Docker..."

echo ""

# Run tests (build if needed, then test)

docker run --rm \

-v "$PROJECT_ROOT:/workspace" \

-w /workspace \

rust:1.90.0-slim bash -c "

apt-get update -qq && apt-get install -y -qq pkg-config libssl-dev build-essential protobuf-compiler >/dev/null 2>&1

cargo test -p dynamo-tui -- '$@'

"

# Check if image exists

if ! docker image inspect dynamo-tui:dev >/dev/null 2>&1; then

echo "Image not found. Building first..."

./launch/dynamo-tui/docker-build.sh

fi

echo "Running dynamo-tui tests in Docker..."

echo ""

# Run tests (build if needed, then test)

docker run --rm \

-v "$PROJECT_ROOT:/workspace" \

-w /workspace \

rust:1.90.0-slim bash -lc '

apt-get update -qq && apt-get install -y -qq pkg-config libssl-dev build-essential protobuf-compiler >/dev/null 2>&1

cargo test -p dynamo-tui -- "$@"

' bash "$@"

🧰 Tools

🪛 Shellcheck (0.11.0)

[error] 29-29: Argument mixes string and array. Use * or separate argument.

(SC2145)

🤖 Prompt for AI Agents

In launch/dynamo-tui/docker-test.sh around lines 14 to 30, there are two fixes: the script currently checks/bails to build dynamo-tui:dev but then runs tests in rust:1.90.0-slim (so either remove the unused image check or change the docker run to use dynamo-tui:dev), and CLI args are not forwarded into the container because $@ is inside single quotes within the bash -c string; change the invocation to pass arguments correctly (use the common pattern: bash -lc "cargo test -p dynamo-tui -- \"\$@\"" -- "$@" or otherwise ensure $@ is expanded by the outer shell into the container command) so that arguments supplied to docker-test.sh reach cargo test.

coderabbitai · 2025-11-19T06:18:56Z

launch/dynamo-tui/src/main.rs

+#[tokio::main]
+async fn main() -> anyhow::Result<()> {
+    dynamo_runtime::logging::init();
+
+    let args = Args::parse();
+
+    let runtime = Runtime::from_settings().context("failed to create runtime from settings")?;
+    let distributed_runtime = Arc::new(DistributedRuntime::from_settings(runtime.clone()).await?);
+
+    let cancel_token = CancellationToken::new();
+
+    let (app_event_tx, app_event_rx) = mpsc::channel::<AppEvent>(256);
+    let (input_tx, input_rx) = mpsc::channel::<InputEvent>(32);
+
+    let mut tasks = Vec::new();
+
+    tasks.push(
+        spawn_discovery_pipeline(
+            distributed_runtime.clone(),
+            app_event_tx.clone(),
+            cancel_token.clone(),
+        )
+        .await?,
+    );
+
+    if let Some(nats_handle) = spawn_nats_pipeline(
+        distributed_runtime.clone(),
+        app_event_tx.clone(),
+        cancel_token.clone(),
+        args.nats_interval,
+    )
+    .await?
+    {
+        tasks.push(nats_handle);
+    }
+
+    if let Some(url) = args.metrics_url.clone() {
+        if let Some(metrics_handle) = spawn_metrics_pipeline(
+            url,
+            args.metrics_interval,
+            app_event_tx.clone(),
+            cancel_token.clone(),
+        )
+        .await?
+        {
+            tasks.push(metrics_handle);
+        }
+    }
+
+    tasks.push(input::spawn_input_listener(input_tx, cancel_token.clone()));
+
+    let backend = CrosstermBackend::new(std::io::stdout());
+    let terminal = Terminal::new(backend)?;
+
+    execute!(std::io::stdout(), EnterAlternateScreen, EnableMouseCapture)?;
+    crossterm::terminal::enable_raw_mode()?;
+
+    let res = run_app(terminal, app_event_rx, input_rx, cancel_token.clone()).await;
+
+    cancel_token.cancel();
+
+    // Ensure background tasks exit
+    for handle in tasks {
+        handle.abort();
+    }
+
+    restore_terminal()?;
+
+    if let Err(err) = res {
+        tracing::error!(error = %err, "Dynamo TUI exited with error");
+        return Err(err);
+    }
+
+    // Shutdown the runtime gracefully
+    distributed_runtime.shutdown();
+    runtime.shutdown();
+
+    Ok(())
+}
+
+/// Returns the terminal to its previous state.
+fn restore_terminal() -> anyhow::Result<()> {
+    crossterm::terminal::disable_raw_mode()?;
+    execute!(std::io::stdout(), LeaveAlternateScreen)?;
+    Ok(())
+}
+
+async fn run_app(
+    mut terminal: Terminal<CrosstermBackend<std::io::Stdout>>,
+    mut app_events: mpsc::Receiver<AppEvent>,
+    mut input_events: mpsc::Receiver<InputEvent>,
+    cancel_token: CancellationToken,
+) -> anyhow::Result<()> {
+    let mut app = App::default();
+    let mut tick = tokio::time::interval(std::time::Duration::from_millis(100));
+
+    loop {
+        tokio::select! {
+            _ = cancel_token.cancelled() => break,
+            _ = tick.tick() => {
+                terminal.draw(|frame| render(frame, &app))?;
+            }
+            Some(event) = app_events.recv() => {
+                app.apply_event(event);
+            }
+            Some(input) = input_events.recv() => {
+                if app.handle_input(input) {
+                    break;
+                }
+            }
+            else => break,
+        }
+    }
+
+    Ok(())
+}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

App wiring and shutdown look good; consider symmetric mouse capture teardown

The orchestration from Args parsing through runtime creation, pipeline spawning, and the run_app loop is clean and the cancellation story (CancellationToken + aborting task handles) is reasonable for a TUI.

One minor UX nit: you enable mouse capture when entering the alternate screen:

execute!(std::io::stdout(), EnterAlternateScreen, EnableMouseCapture)?; crossterm::terminal::enable_raw_mode()?;

but restore_terminal only disables raw mode and leaves the alternate screen:

fn restore_terminal() -> anyhow::Result<()> { crossterm::terminal::disable_raw_mode()?; execute!(std::io::stdout(), LeaveAlternateScreen)?; Ok(()) }

For full symmetry with crossterm’s API, you may want to also call DisableMouseCapture in restore_terminal, so the terminal is always returned to its previous state even if some terminals don’t reset mouse mode automatically.

🌐 Web query:

What is the recommended pattern in crossterm for enabling and disabling mouse capture and alternate screen in a TUI application?

💡 Result:

Recommended pattern

On startup: enable raw mode, enter alternate screen, enable mouse capture (and hide cursor if desired).

On shutdown (or in a Drop/guard): disable mouse capture, leave alternate screen, disable raw mode (and show cursor).

Wrap setup/teardown in an RAII guard (implement Drop) or use a scope/panic hook so teardown always runs even on panic.

Minimal example (conceptual):
use std::io::stdout;
use crossterm::{
execute,
terminal::{enable_raw_mode, disable_raw_mode, EnterAlternateScreen, LeaveAlternateScreen},
event::{EnableMouseCapture, DisableMouseCapture},
cursor::{Hide, Show},
};

// setup enable_raw_mode()?; execute!(stdout(), EnterAlternateScreen, EnableMouseCapture, Hide)?; // ... run your TUI ... // teardown (ensure this always runs—Drop/panic-hook/guard) execute!(stdout(), DisableMouseCapture, LeaveAlternateScreen, Show)?; disable_raw_mode()?;

References/examples: crossterm API (enter/leave alt screen, enable/disable mouse, raw mode). [1] and a common TUI pattern using Drop for cleanup (ratatui example). [2]

Sources
[1] crossterm docs (terminal & event commands).
[2] ratatui / tui setup & Drop-based teardown example.

Fix asymmetric terminal restoration and missing mouse capture disable

The startup/shutdown order matters for proper terminal state recovery. Currently, restore_terminal() disables raw mode before leaving the alternate screen and is missing DisableMouseCapture entirely.

Based on the recommended crossterm pattern, teardown should disable mouse capture, leave alternate screen, then disable raw mode. Update restore_terminal() to:

fn restore_terminal() -> anyhow::Result<()> { execute!(std::io::stdout(), DisableMouseCapture, LeaveAlternateScreen)?; crossterm::terminal::disable_raw_mode()?; Ok(()) }

Additionally, wrap setup/teardown in an RAII guard (implement Drop) or use a panic hook so teardown always runs even on panic, since a panic in run_app would skip restore_terminal() and leave the terminal in a broken state.

🤖 Prompt for AI Agents

In launch/dynamo-tui/src/main.rs around lines 46 to 161, restore_terminal currently disables raw mode before leaving the alternate screen and omits DisableMouseCapture, and teardown is not guaranteed on panic; update restore_terminal to first disable mouse capture and leave the alternate screen, then disable raw mode (use execute! with DisableMouseCapture and LeaveAlternateScreen before crossterm::terminal::disable_raw_mode()), and ensure teardown always runs by introducing an RAII guard type (struct TerminalGuard) that performs the teardown in Drop (or install a panic hook that calls restore_terminal) and create an instance at startup so the terminal is restored even if run_app panics.

coderabbitai · 2025-11-19T06:18:56Z

lib/runtime/src/discovery/mod.rs

+impl<'de> serde::Deserialize<'de> for DiscoveryInstance {
+    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
+    where
+        D: serde::Deserializer<'de>,
+    {
+        // First, try to deserialize as a JSON value to inspect it
+        let value = serde_json::Value::deserialize(deserializer)?;
+
+        // Check if it has a "type" field (new format)
+        if let Some(type_value) = value.get("type") {
+            // New format with "type" field
+            match type_value.as_str() {
+                Some("Endpoint") => {
+                    // Remove the "type" field before deserializing as Instance
+                    let mut instance_value = value.clone();
+                    if let Some(obj) = instance_value.as_object_mut() {
+                        obj.remove("type");
+                    }
+                    let instance: crate::component::Instance = serde_json::from_value(instance_value)
+                        .map_err(|e| serde::de::Error::custom(format!(
+                            "Failed to deserialize Endpoint: {}", e
+                        )))?;
+                    Ok(DiscoveryInstance::Endpoint(instance))
+                }
+                Some("Model") => {
+                    // Deserialize as Model variant
+                    let model: ModelVariant = serde_json::from_value(value)
+                        .map_err(|e| serde::de::Error::custom(format!(
+                            "Failed to deserialize Model: {}", e
+                        )))?;
+                    Ok(DiscoveryInstance::Model {
+                        namespace: model.namespace,
+                        component: model.component,
+                        endpoint: model.endpoint,
+                        instance_id: model.instance_id,
+                        card_json: model.card_json,
+                    })
+                }
+                Some(unknown) => Err(serde::de::Error::custom(format!(
+                    "Unknown DiscoveryInstance type: {}", unknown
+                ))),
+                None => Err(serde::de::Error::custom(
+                    "DiscoveryInstance type field is not a string"
+                )),
+            }
+        } else {
+            // Old format: try to deserialize as an Endpoint (Instance)
+            // This handles backward compatibility with data stored before the DiscoveryInstance enum
+            let instance: crate::component::Instance = serde_json::from_value(value)
+                .map_err(|e| serde::de::Error::custom(format!(
+                    "Failed to deserialize as old format Instance: {}", e
+                )))?;
+            Ok(DiscoveryInstance::Endpoint(instance))
+        }
+    }
+}


⚠️ Potential issue | 🟡 Minor

Fix rustfmt and trailing whitespace to unblock CI

The impl<'de> Deserialize<'de> for DiscoveryInstance block currently fails cargo fmt -- --check, and there is trailing whitespace reported by pre-commit (around Line 160–163 in this file). Please run cargo fmt and clean up trailing whitespace so the Rust and pre-commit checks pass.

🧰 Tools

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4296/merge) by AsadShahid04.

[error] 160-160: Trailing whitespace detected by pre-commit hook and modified in this file.

🪛 GitHub Actions: Rust pre-merge checks

[error] 160-202: cargo fmt -- --check failed (exit code 1) due to formatting differences detected in lib/runtime/src/discovery/mod.rs. Run 'cargo fmt' to fix formatting.

🤖 Prompt for AI Agents

In lib/runtime/src/discovery/mod.rs around lines 156 to 211, the Deserialize impl is misformatted and contains trailing whitespace (notably around lines 160–163); run `cargo fmt` (or `rustfmt`) to fix formatting, remove any trailing spaces on the reported lines, and re-run pre-commit/cargo fmt -- --check to ensure the file is clean before committing.

dagil-nvidia · 2025-12-05T20:21:30Z

@AsadShahid04 - are you planning to address the comments from CodeRabbit? If not, we will close PR in 7 days

cursoragent and others added 3 commits November 12, 2025 10:04

feat: Add Dynamo TUI for monitoring deployments

cf55c31

This commit introduces a new terminal UI for Dynamo, providing real-time insights into namespaces, components, endpoints, NATS statistics, and Prometheus metrics. Co-authored-by: asadblue2015 <[email protected]>

feat: Add async-nats and refactor app state management

23210a1

This commit introduces async-nats integration and refactors the application's state management for better organization and efficiency. Co-authored-by: asadblue2015 <[email protected]>

fix: correct test assertions in discovery_snapshot_populates_tree

6ad2133

- Fix endpoint count expectation (1 instead of 2) - Fix instance count expectation (2 instead of 3) - Add readme field to workspace.package in Cargo.toml - Tests now correctly reflect the actual data structure

pull-request-size bot added the size/XXL label Nov 13, 2025

github-actions bot added the fix label Nov 13, 2025

github-actions bot added the external-contribution Pull request is from an external contributor label Nov 13, 2025

AsadShahid04 changed the title ~~fix: correct test assertions in discovery_snapshot_populates_tree~~ feat: Add Dynamo TUI MVP for monitoring deployments Nov 13, 2025

github-actions bot added feat and removed fix labels Nov 13, 2025

feat: add Docker-based development setup for TUI

c044d2a

- Add Dockerfile.dev for Linux-based builds (fixes macOS inotify issues) - Add docker-build.sh, docker-run.sh, and docker-test.sh scripts - Add DOCKER.md documentation for Docker workflow - Enables cross-platform development and testing

AsadShahid04 force-pushed the feature/dynamo-tui-mvp branch from e431461 to c044d2a Compare November 19, 2025 03:39

AsadShahid04 and others added 3 commits November 18, 2025 19:39

chore: add scripts/ to .gitignore

2674ecb

Add backward compatibility for DiscoveryInstance deserialization

5e97873

- Support both old format (without 'type' field) and new format (with 'type' field) - Allows TUI to parse ETCD data from workers using older runtime versions - Fixes 'missing field type' parsing errors in TUI

Merge branch 'main' into feature/dynamo-tui-mvp

be19622

AsadShahid04 marked this pull request as ready for review November 19, 2025 06:13

AsadShahid04 requested a review from a team as a code owner November 19, 2025 06:13

AsadShahid04 mentioned this pull request Nov 19, 2025

[FEATURE]: TUI for Dynamo #1453

Open

coderabbitai bot reviewed Nov 19, 2025

View reviewed changes

Merge branch 'main' into feature/dynamo-tui-mvp

0cf6171

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add Dynamo TUI MVP for monitoring deployments #4296

feat: Add Dynamo TUI MVP for monitoring deployments #4296

Uh oh!

AsadShahid04 commented Nov 13, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Nov 13, 2025

Uh oh!

github-actions bot commented Nov 13, 2025

Uh oh!

coderabbitai bot commented Nov 19, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 19, 2025

Uh oh!

coderabbitai bot Nov 19, 2025

Uh oh!

coderabbitai bot Nov 19, 2025

Uh oh!

dagil-nvidia commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: Add Dynamo TUI MVP for monitoring deployments #4296

Are you sure you want to change the base?

feat: Add Dynamo TUI MVP for monitoring deployments #4296

Uh oh!

Conversation

AsadShahid04 commented Nov 13, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Features Implemented

Core TUI (#1453)

Health Monitoring (#1454)

NATS Monitoring (#1455)

Metrics Display (#1456)

Changes

Usage

Native Build

Docker (Recommended for macOS)

Keyboard Shortcuts

Testing

Related Issues

Reference

Notes

Summary by CodeRabbit

Release Notes

Uh oh!

copy-pr-bot bot commented Nov 13, 2025

Uh oh!

github-actions bot commented Nov 13, 2025

Uh oh!

coderabbitai bot commented Nov 19, 2025

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

dagil-nvidia commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AsadShahid04 commented Nov 13, 2025 •

edited by coderabbitai bot

Loading