fix: Interactive inputs actually stops, does not ignore stop token #3057

grahamking · 2025-09-16T15:40:40Z

Removes the echo_core engine which relied on ignoring stop tokens, so that we only have a single echo engine. That's easier to explain.

echo_core would have been useful for debugging template issues but in practice a tracing::debug! statement is just as useful and simpler to use.

Summary by CodeRabbit

Refactor
- Consolidated echo engines into a single “echo” engine for simpler selection. Existing “echo_full” inputs remain compatible as an alias. Engine listings now show only “echo.”
Documentation
- Updated guides and examples to use “echo” throughout.
- Clarified Echo configuration, including DYN_TOKEN_ECHO_DELAY_MS (default ~10ms/token ≈100 tokens/s).
- Removed obsolete examples and options related to the previous split (e.g., ignore_eos).
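
As a rough illustration of the delay arithmetic in the summary above, the sketch below parses a raw `DYN_TOKEN_ECHO_DELAY_MS` value and derives the implied throughput. The helper names `echo_delay` and `tokens_per_second` are hypothetical, and falling back to the default on an unparsable value is an assumption of this sketch, not something the PR states.

```rust
use std::time::Duration;

/// Default per-token delay in milliseconds (~10 ms/token, per the summary).
const DEFAULT_DELAY_MS: u64 = 10;

/// Hypothetical helper: turn the raw env value into a delay.
/// Assumption: unset or unparsable values fall back to the default.
fn echo_delay(raw: Option<&str>) -> Duration {
    let ms = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_DELAY_MS);
    Duration::from_millis(ms)
}

/// Approximate throughput implied by a per-token delay.
fn tokens_per_second(delay: Duration) -> f64 {
    1.0 / delay.as_secs_f64()
}

fn main() {
    let raw = std::env::var("DYN_TOKEN_ECHO_DELAY_MS").ok();
    let delay = echo_delay(raw.as_deref());
    println!("delay = {:?}, ~{:.0} tokens/s", delay, tokens_per_second(delay));
}
```

With the default of 10 ms this works out to roughly 100 tokens per second, and a value of 1 gives roughly 1000.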

Closes #2918

Removes the `echo_core` engine which relied on ignoring stop tokens, so that we only have a single `echo` engine. That's easier to explain.

`echo_core` would have been useful for debugging template issues but in practice a `tracing::debug!` statement is just as useful and simpler to use.

Signed-off-by: Graham King <[email protected]>

copy-pr-bot · 2025-09-16T15:40:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

grahamking · 2025-09-16T15:42:54Z

/ok to test 9c30fad

coderabbitai · 2025-09-16T15:49:10Z

Walkthrough

Consolidates two echo engines into a single echo engine across docs, CLI, Rust core, and Python bindings. Removes NvExt usage in text input. Updates engine construction to use make_echo_engine. Adjusts enums and parsing/printing to a single Echo variant. Minor doc/comment updates.

Changes

Cohort / File(s)	Summary
Docs: unify echo engine `docs/guides/dynamo_run.md`	Replace echo_full/echo_core with a single echo engine; update examples, env var references, and descriptions.
CLI options and validation `launch/dynamo-run/src/opt.rs`, `launch/dynamo-run/src/flags.rs`, `launch/dynamo-run/src/lib.rs`	Remove `EchoFull`/`EchoCore` variants; add `Echo` variant. Update parsing, display, available engines, validation, and engine selection to route Echo to `make_echo_engine()` with `EngineConfig::StaticFull`.
LLM engine consolidation `lib/llm/src/engines.rs`	Merge EchoEngineCore/EchoEngineFull into a single `EchoEngine`. Remove `make_engine_full()`/delta_core; add `make_echo_engine()` returning `Arc<dyn StreamingEngine>`. Adapt chat/completion streaming to per-character tokens with delay; embedding remains unimplemented.
Python bindings update `lib/bindings/python/rust/llm/entrypoint.rs`	Switch Echo engine constructor to `make_echo_engine()` in the StaticFull path.
Request input cleanup `lib/llm/src/entrypoint/input/text.rs`	Remove NvExt import and usage; stop setting `ignore_eos`; build requests without NvExt.
Comment tidy `lib/llm/src/local_model.rs`	Update comment to reference unified echo engine.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant DR as dynamo-run (CLI)
  participant EP as LLM Entrypoint
  participant ENG as EchoEngine (unified)
  participant STR as Stream to Client

  U->>DR: run with --out=echo
  DR->>EP: EngineConfig::StaticFull(Echo)
  EP->>ENG: make_echo_engine()
  Note over ENG: Unified echo engine<br/>per-char streaming with delay
  U->>DR: Prompt input
  DR->>ENG: Chat/Completion request
  loop for each character
    ENG-->>STR: stream token (char)
  end
  ENG-->>STR: finish (Stop)
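
The streaming loop in the diagram can be sketched as a plain iterator: one token per character, each preceded by a delay, followed by a final finish marker. This is a simplified, synchronous stand-in; the `Chunk` and `FinishReason` types and the `echo_stream` function are hypothetical and do not mirror the real `EchoEngine` types.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical finish reason; the diagram only shows `Stop`.
#[derive(Debug, PartialEq)]
enum FinishReason {
    Stop,
}

/// Hypothetical stream item: either one echoed character or the end marker.
#[derive(Debug, PartialEq)]
enum Chunk {
    Token(char),
    Finish(FinishReason),
}

/// Echo the prompt back one character at a time, sleeping before each token,
/// then emit a single `Finish(Stop)` so the consumer knows to stop reading.
fn echo_stream(prompt: &str, delay: Duration) -> impl Iterator<Item = Chunk> + '_ {
    prompt
        .chars()
        .map(move |c| {
            sleep(delay);
            Chunk::Token(c)
        })
        .chain(std::iter::once(Chunk::Finish(FinishReason::Stop)))
}

fn main() {
    for chunk in echo_stream("hello", Duration::from_millis(10)) {
        println!("{:?}", chunk);
    }
}
```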

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat(python): Python bindings for the Dynamo CLI tools #1799 — Adjusts Python/Rust entrypoint to expose/use EngineType::Echo; overlaps with echo engine selection changes here.
refactor: refactored using Choice and CompletionFinishReason #1635 — Modifies echo engine implementation and finish-reason handling; related to the unified echo engine updates.
feat: align OpenAI response IDs with distributed trace IDs #2496 — Passes context/request ID into response generators; adjacent to the engine streaming flow touched in this PR.

Poem

A rabbit taps return with glee,
One echo now—unified, free.
Streams of chars hop, hop, hop,
Start to finish, cleanly stop.
Docs in tune, flags align—
Thump goes code in perfect time. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Out of Scope Changes Check	⚠️ Warning	Most edits are in-scope for fixing stop-token behavior, but I detected a potentially out-of-scope, breaking public API change: lib/llm/src/engines.rs adds pub fn make_echo_engine() while removing pub fn make_engine_full(), which may break external callers and is not documented in the PR description. Documentation and the PR body do not call out this API-level removal or provide migration guidance. Because public API removals impact downstream consumers, this should be considered out-of-scope unless explicitly intended and communicated.	Document the API change in the PR and changelog and either provide a compatibility shim (reintroduce make_engine_full that forwards to make_echo_engine) or clearly describe migration steps for downstream users; also add a CI/smoke test that exercises the public engine constructor to detect breakage. Once the compatibility or documentation is added, the out-of-scope/backwards-compatibility concern will be mitigated.
Description Check	⚠️ Warning	The PR description states the intent and references the linked issue (Closes #2918) and the removal of echo_core, but it does not follow the repository's required template: it lacks explicit 'Overview', 'Details', and 'Where should the reviewer start?' sections and does not enumerate key changed files or any migration notes. Because those required template sections are missing, the description is incomplete for a thorough review. The existing short paragraph is helpful but insufficient against the repository's template expectations.	Please update the PR description to follow the repository template by adding an 'Overview' summary, a 'Details' section listing important code changes and files, and a 'Where should the reviewer start?' section that highlights the main files to review (for example lib/llm/src/engines.rs, launch/dynamo-run/src/opt.rs, and lib/llm/src/entrypoint/input/text.rs). Also explicitly call out any breaking API changes or migration steps (e.g., removal/renaming of engine constructors) and include short testing/verification instructions. This will speed review and ensure downstream consumers are informed.
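
The compatibility shim suggested in the first warning could look roughly like the sketch below. The `StreamingEngine` trait here is a one-method stand-in invented for illustration, and the real `make_echo_engine()` also wraps the engine in a dispatcher, which this sketch leaves out.

```rust
use std::sync::Arc;

/// Stand-in for the real `StreamingEngine` trait (hypothetical, simplified).
trait StreamingEngine {
    fn generate(&self, prompt: &str) -> String;
}

/// Stand-in for the unified echo engine.
struct EchoEngine;

impl StreamingEngine for EchoEngine {
    fn generate(&self, prompt: &str) -> String {
        prompt.to_string()
    }
}

/// The new constructor name from the PR, returning a trait object.
fn make_echo_engine() -> Arc<dyn StreamingEngine> {
    Arc::new(EchoEngine)
}

/// Possible shim: keep the old name compiling, but steer callers to the new one.
#[deprecated(note = "use make_echo_engine() instead")]
fn make_engine_full() -> Arc<dyn StreamingEngine> {
    make_echo_engine()
}

fn main() {
    // Old call sites still build, with a deprecation warning unless allowed.
    #[allow(deprecated)]
    let engine = make_engine_full();
    println!("{}", engine.generate("hello"));
}
```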

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title directly references the bug being fixed—interactive inputs stopping and stop-token handling—and maps to the changes that remove the echo_core path and unify the echo engine. It is concise and highlights the primary user-facing behavior corrected. The wording has a minor grammatical issue (“inputs actually stops”) but that does not make it unrelated to the changeset.
Linked Issues Check	✅ Passed	The changes align with the objectives of linked issue [#2918]: the PR removes the echo_core path that ignored stop tokens, removes NvExt ignore_eos usage, and consolidates to a single echo engine that streams output and emits a Stop finish reason. File-level summaries (engines.rs, entrypoint/input/text.rs, opt.rs, and launch/dynamo-run updates) show coordinated changes across the stack to ensure stop-token handling is respected. Therefore the code changes appear to directly address and resolve the interactive frontend stop behavior reported in the issue.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

launch/dynamo-run/src/lib.rs (2)
198-225: Build breaker: references to removed Output::EchoFull

gguf_default() and safetensors_default() still return Output::EchoFull, which no longer exists. Replace with Output::Echo.
 fn gguf_default() -> Output {
@@
-    #[cfg(not(any(feature = "mistralrs", feature = "llamacpp")))]
-    {
-        Output::EchoFull
-    }
+    #[cfg(not(any(feature = "mistralrs", feature = "llamacpp")))]
+    {
+        Output::Echo
+    }
 }
 
 fn safetensors_default() -> Output {
@@
-    #[cfg(not(feature = "mistralrs"))]
-    {
-        Output::EchoFull
-    }
+    #[cfg(not(feature = "mistralrs"))]
+    {
+        Output::Echo
+    }

71-76: Remove/update stale echo_* references found by repo scan

Occurrences in launch/dynamo-run/src/lib.rs:

lines 41–46: "echo" | "echo_full" => Ok(Output::Echo)

lines 208–214: Output::EchoFull (cfg:not(any(feature = "mistralrs", feature = "llamacpp")))

lines 220–225: Output::EchoFull (cfg:not(feature = "mistralrs"))

Replace or consolidate these to the canonical API (or adjust feature cfgs/docs/tests) to avoid build/runtime surprises.

🧹 Nitpick comments (4)

lib/llm/src/entrypoint/input/text.rs (1)

142-145: Avoid double-check + unwrap on finish_reason

Minor tidy: match the finish reason instead of is_some() then unwrap().

-                    if chat_comp.finish_reason.is_some() {
-                        tracing::trace!("finish reason: {:?}", chat_comp.finish_reason.unwrap());
-                        break;
-                    }
+                    if let Some(reason) = chat_comp.finish_reason {
+                        tracing::trace!("finish reason: {:?}", reason);
+                        break;
+                    }
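
For context, here is a self-contained sketch of the same `if let` pattern in a consumer loop that accumulates text until a finish reason arrives. The `Delta` struct and `drain` function are hypothetical simplifications, not the types used in `text.rs`.

```rust
/// Hypothetical finish reasons for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FinishReason {
    Stop,
    Length,
}

/// Hypothetical, simplified stream item.
struct Delta {
    content: Option<String>,
    finish_reason: Option<FinishReason>,
}

/// Collect streamed text and stop at the first delta carrying a finish reason.
fn drain(deltas: Vec<Delta>) -> (String, Option<FinishReason>) {
    let mut text = String::new();
    for delta in deltas {
        if let Some(content) = &delta.content {
            text.push_str(content);
        }
        // Bind the reason directly instead of `is_some()` followed by `unwrap()`.
        if let Some(reason) = delta.finish_reason {
            return (text, Some(reason));
        }
    }
    (text, None)
}

fn main() {
    let deltas = vec![
        Delta { content: Some("he".into()), finish_reason: None },
        Delta { content: Some("llo".into()), finish_reason: Some(FinishReason::Stop) },
    ];
    println!("{:?}", drain(deltas));
    let cut = vec![Delta { content: Some("he".into()), finish_reason: Some(FinishReason::Length) }];
    println!("{:?}", drain(cut));
}
```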

lib/llm/src/engines.rs (1)

141-154: Guard against empty message list to prevent panic

next_back().unwrap() will panic if messages is empty. Return a clean error instead.

-        let req = request.inner.messages.into_iter().next_back().unwrap();
+        let Some(req) = request.inner.messages.into_iter().next_back() else {
+            return Err(anyhow::anyhow!("Empty chat messages in request").into());
+        };
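
The `let ... else` guard can be tried in isolation with the sketch below. It uses a plain `Vec<String>` and a `String` error in place of the real request and error types, so `last_message` is a hypothetical helper for illustration only.

```rust
/// Return the last message, or a clean error instead of panicking on an empty list.
fn last_message(messages: Vec<String>) -> Result<String, String> {
    let Some(last) = messages.into_iter().next_back() else {
        return Err("Empty chat messages in request".to_string());
    };
    Ok(last)
}

fn main() {
    println!("{:?}", last_message(vec!["hi".to_string(), "there".to_string()]));
    println!("{:?}", last_message(Vec::new()));
}
```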

launch/dynamo-run/src/opt.rs (1)

43-45: Consider aliasing “echo_core” to preserve CLI compatibility

Mapping "echo_core" → Output::Echo avoids unnecessary breakage for existing scripts.
-            "echo" | "echo_full" => Ok(Output::Echo),
+            "echo" | "echo_full" | "echo_core" => Ok(Output::Echo),
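
A minimal sketch of how such aliasing behaves, assuming a trimmed-down `Output` enum with only the `Echo` variant and a `FromStr` implementation. The real enum has more variants and may use a different conversion trait.

```rust
use std::fmt;
use std::str::FromStr;

/// Trimmed-down stand-in for the CLI's `Output` enum (only the echo variant).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Output {
    Echo,
}

impl FromStr for Output {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            // "echo_full" is the alias kept by the PR; "echo_core" is the
            // extra alias proposed in this review comment.
            "echo" | "echo_full" | "echo_core" => Ok(Output::Echo),
            other => Err(format!("unknown engine: {}", other)),
        }
    }
}

impl fmt::Display for Output {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Every alias prints back as the canonical name.
            Output::Echo => write!(f, "echo"),
        }
    }
}

fn main() {
    for name in ["echo", "echo_full", "echo_core", "bogus"] {
        println!("{} -> {:?}", name, name.parse::<Output>());
    }
}
```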

docs/guides/dynamo_run.md (1)

321-334: Fix markdownlint: specify language for fenced code blocks

Add bash to the two code fences to satisfy MD040.
-```
+```bash
 dynamo-run in=http out=echo --model-name my_model
-```
+```

-```
+```bash
 # Set token echo delay to 1ms (1000 tokens per second)
 DYN_TOKEN_ECHO_DELAY_MS=1 dynamo-run in=http out=echo
-```
+```

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa80ac4 and 9c30fad.

⛔ Files ignored due to path filters (1)

lib/bindings/python/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (8)

docs/guides/dynamo_run.md (1 hunks)
launch/dynamo-run/src/flags.rs (1 hunks)
launch/dynamo-run/src/lib.rs (1 hunks)
launch/dynamo-run/src/opt.rs (4 hunks)
lib/bindings/python/rust/llm/entrypoint.rs (1 hunks)
lib/llm/src/engines.rs (5 hunks)
lib/llm/src/entrypoint/input/text.rs (1 hunks)
lib/llm/src/local_model.rs (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-25T22:04:45.205Z

Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.

Applied to files:

lib/llm/src/entrypoint/input/text.rs

🧬 Code graph analysis (2)

lib/bindings/python/rust/llm/entrypoint.rs (1)

lib/llm/src/engines.rs (1)

make_echo_engine (120-124)

launch/dynamo-run/src/lib.rs (2)

lib/llm/src/engines.rs (5)

new (75-77)

new (87-89)

new (300-302)

new (324-326)

make_echo_engine (120-124)

lib/llm/src/entrypoint.rs (1)

local_model (66-74)

🪛 markdownlint-cli2 (0.17.2)

docs/guides/dynamo_run.md

325-325: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

331-331: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Build and Test - dynamo
GitHub Check: pre-merge-rust (.)
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: pre-merge-rust (lib/runtime/examples)

🔇 Additional comments (6)

lib/llm/src/local_model.rs (1)

213-213: Comment rename aligns with engine consolidation

Comment correctly reflects the unified echo engine.

lib/llm/src/entrypoint/input/text.rs (1)

110-114: Removing NvExt fixes “ignore_eos” and enables proper stop handling

Setting nvext: None removes the prior ignore_eos behavior. This should allow the interactive loop to see a finish_reason and stop.

Please verify with both a real backend (e.g., vLLM) and the echo engine:

Run python -m dynamo.frontend --interactive, enter a short prompt, confirm it stops after the model’s stop token.

Pipe a single prompt (non-interactive) and confirm process exits promptly.
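
To make the effect of `ignore_eos` concrete, the toy sketch below simulates a generation loop over a fixed token list. Everything here (`generate`, the `</s>` marker, the token list) is hypothetical and only illustrates the general idea: with the flag set, the stop token is passed through as ordinary output and generation runs until the token limit.

```rust
/// Hypothetical finish reasons for this sketch.
#[derive(Debug, PartialEq)]
enum FinishReason {
    Stop,
    Length,
}

/// Walk the model's raw output and decide where generation ends.
/// Returns the emitted tokens and the finish reason, if any was reached.
fn generate<'a>(
    model_output: &[&'a str],
    eos: &str,
    ignore_eos: bool,
    max_tokens: usize,
) -> (Vec<&'a str>, Option<FinishReason>) {
    let mut out = Vec::new();
    for &tok in model_output {
        if out.len() >= max_tokens {
            return (out, Some(FinishReason::Length));
        }
        // Honoring the stop token ends the response right here.
        if tok == eos && !ignore_eos {
            return (out, Some(FinishReason::Stop));
        }
        out.push(tok);
    }
    (out, None)
}

fn main() {
    let raw = ["Hi", "!", "</s>", "junk", "junk"];
    println!("{:?}", generate(&raw, "</s>", false, 16));
    println!("{:?}", generate(&raw, "</s>", true, 4));
}
```

In the first call generation ends at the stop token, while the second keeps going past it until the token limit is hit.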

lib/llm/src/engines.rs (1)

120-124: Unified echo engine factory looks good

make_echo_engine() cleanly wraps EchoEngine with the dispatcher.

launch/dynamo-run/src/flags.rs (1)

207-207: Echo validation path OK

No additional validation needed for the unified echo engine.

lib/bindings/python/rust/llm/entrypoint.rs (1)

221-224: Python binding correctly switches to make_echo_engine()

Matches the Rust-side consolidation.

launch/dynamo-run/src/lib.rs (1)

94-117: Engine selection for Echo path is consistent

Echo routes to StaticFull with make_echo_engine(). Looks correct.

Signed-off-by: Graham King <[email protected]>

grahamking · 2025-09-16T16:00:20Z

/ok to test 1c647ae

Signed-off-by: Graham King <[email protected]>

grahamking · 2025-09-16T16:39:22Z

/ok to test f1947d7

lib/runtime/src/system_status_server.rs

grahamking requested a review from a team as a code owner September 16, 2025 15:40

pull-request-size bot added the size/L label Sep 16, 2025

github-actions bot added the fix label Sep 16, 2025

coderabbitai bot reviewed Sep 16, 2025

grahamking added 2 commits September 16, 2025 11:53

Fix clippy

76d1ec4

Signed-off-by: Graham King <[email protected]>

Thank you Code Rabbit.

1c647ae

Signed-off-by: Graham King <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB September 16, 2025 16:00 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 16, 2025 16:05 Inactive

Fix tests

f1947d7

Signed-off-by: Graham King <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB September 16, 2025 16:39 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 16, 2025 16:40 Inactive

rmccorm4 reviewed Sep 16, 2025

lib/runtime/src/system_status_server.rs (resolved)

rmccorm4 approved these changes Sep 16, 2025

grahamking merged commit 87e6e05 into main Sep 16, 2025
17 of 18 checks passed

grahamking deleted the gk-fix-interactive-stop branch September 16, 2025 18:39

kmkelle-nv pushed a commit that referenced this pull request Sep 17, 2025

fix: Interactive inputs actually stops, does not ignore stop token (#…

919538d

…3057) Signed-off-by: Graham King <[email protected]> Signed-off-by: Kristen Kelleher <[email protected]>
