@grahamking grahamking commented Sep 16, 2025

Closes #2918

Removes the `echo_core` engine, which relied on ignoring stop tokens, so that we only have a single `echo` engine. That's easier to explain.

`echo_core` would have been useful for debugging template issues, but in practice a `tracing::debug!` statement is just as useful and simpler to use.

Summary by CodeRabbit

  • Refactor
    • Consolidated echo engines into a single “echo” engine for simpler selection. Existing “echo_full” inputs remain compatible as an alias. Engine listings now show only “echo.”
  • Documentation
    • Updated guides and examples to use “echo” throughout.
    • Clarified Echo configuration, including DYN_TOKEN_ECHO_DELAY_MS (default ~10ms/token ≈100 tokens/s).
    • Removed obsolete examples and options related to the previous split (e.g., ignore_eos).
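
The delay setting above can be illustrated with a small sketch. It assumes the variable holds an integer number of milliseconds and that an unset or unparsable value falls back to the 10 ms default. The helper names `echo_delay` and `tokens_per_second` are hypothetical and are not taken from the PR.

```rust
use std::time::Duration;

/// Default per-token delay in milliseconds (~10 ms/token, per the summary above).
const DEFAULT_DELAY_MS: u64 = 10;

/// Turn an optional raw value (as it might be read from `DYN_TOKEN_ECHO_DELAY_MS`)
/// into a delay. Falling back to the default on a parse error is an assumption
/// of this sketch, not confirmed behaviour of the real engine.
fn echo_delay(raw: Option<&str>) -> Duration {
    let ms = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_DELAY_MS);
    Duration::from_millis(ms)
}

/// Approximate throughput implied by a per-token delay.
/// Returns `None` for a zero delay, where the rate is effectively unbounded.
fn tokens_per_second(delay: Duration) -> Option<f64> {
    if delay.is_zero() {
        None
    } else {
        Some(1.0 / delay.as_secs_f64())
    }
}
```

A caller could feed it the environment directly, for example `echo_delay(std::env::var("DYN_TOKEN_ECHO_DELAY_MS").ok().as_deref())`. With the default this works out to roughly 100 tokens per second, and with a value of `1` to roughly 1000.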


Signed-off-by: Graham King <[email protected]>
@grahamking grahamking requested a review from a team as a code owner September 16, 2025 15:40
@copy-pr-bot

copy-pr-bot bot commented Sep 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@grahamking
Contributor Author

/ok to test 9c30fad

@coderabbitai
Contributor

coderabbitai bot commented Sep 16, 2025

Walkthrough

Consolidates two echo engines into a single echo engine across docs, CLI, Rust core, and Python bindings. Removes NvExt usage in text input. Updates engine construction to use make_echo_engine. Adjusts enums and parsing/printing to a single Echo variant. Minor doc/comment updates.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Docs: unify echo engine<br/>docs/guides/dynamo_run.md | Replace echo_full/echo_core with a single echo engine; update examples, env var references, and descriptions. |
| CLI options and validation<br/>launch/dynamo-run/src/opt.rs, launch/dynamo-run/src/flags.rs, launch/dynamo-run/src/lib.rs | Remove EchoFull/EchoCore variants; add Echo variant. Update parsing, display, available engines, validation, and engine selection to route Echo to make_echo_engine() with EngineConfig::StaticFull. |
| LLM engine consolidation<br/>lib/llm/src/engines.rs | Merge EchoEngineCore/EchoEngineFull into a single EchoEngine. Remove make_engine_full()/delta_core; add make_echo_engine() returning Arc<dyn StreamingEngine>. Adapt chat/completion streaming to per-character tokens with delay; embedding remains unimplemented. |
| Python bindings update<br/>lib/bindings/python/rust/llm/entrypoint.rs | Switch Echo engine constructor to make_echo_engine() in the StaticFull path. |
| Request input cleanup<br/>lib/llm/src/entrypoint/input/text.rs | Remove NvExt import and usage; stop setting ignore_eos; build requests without NvExt. |
| Comment tidy<br/>lib/llm/src/local_model.rs | Update comment to reference unified echo engine. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant DR as dynamo-run (CLI)
  participant EP as LLM Entrypoint
  participant ENG as EchoEngine (unified)
  participant STR as Stream to Client

  U->>DR: run with --out=echo
  DR->>EP: EngineConfig::StaticFull(Echo)
  EP->>ENG: make_echo_engine()
  Note over ENG: Unified echo engine<br/>per-char streaming with delay
  U->>DR: Prompt input
  DR->>ENG: Chat/Completion request
  loop for each character
    ENG-->>STR: stream token (char)
  end
  ENG-->>STR: finish (Stop)
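
The flow in the diagram (one token per character, a delay between tokens, then a final Stop) can be sketched with the standard library alone. This is a synchronous illustration under stated assumptions: the real engine is described as a streaming engine, and the `Chunk`, `FinishReason`, and `echo_stream` names here are hypothetical stand-ins rather than the types in lib/llm/src/engines.rs.

```rust
use std::time::Duration;

/// Why a stream ended. Only `Stop` is needed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FinishReason {
    Stop,
}

/// One streamed chunk: either a piece of text, or a final marker that
/// carries the finish reason. (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq, Eq)]
struct Chunk {
    text: Option<String>,
    finish_reason: Option<FinishReason>,
}

/// Echo the prompt back one character at a time, sleeping `delay` before each
/// character, then emit a final chunk with `FinishReason::Stop`.
fn echo_stream(prompt: &str, delay: Duration) -> impl Iterator<Item = Chunk> + '_ {
    let tokens = prompt.chars().map(move |c| {
        // Blocking sleep keeps the sketch dependency-free; an async engine
        // would await a timer here instead.
        std::thread::sleep(delay);
        Chunk {
            text: Some(c.to_string()),
            finish_reason: None,
        }
    });
    let finish = std::iter::once(Chunk {
        text: None,
        finish_reason: Some(FinishReason::Stop),
    });
    tokens.chain(finish)
}
```

Because the last item always carries a finish reason, a consumer that stops on the first `Some(finish_reason)` terminates without needing any special-casing of stop tokens.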

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit taps return with glee,
One echo now—unified, free.
Streams of chars hop, hop, hop,
Start to finish, cleanly stop.
Docs in tune, flags align—
Thump goes code in perfect time. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Out of Scope Changes Check | ⚠️ Warning | Most edits are in-scope for fixing stop-token behavior, but I detected a potentially out-of-scope, breaking public API change: lib/llm/src/engines.rs adds pub fn make_echo_engine() while removing pub fn make_engine_full(), which may break external callers and is not documented in the PR description. Documentation and the PR body do not call out this API-level removal or provide migration guidance. Because public API removals impact downstream consumers, this should be considered out-of-scope unless explicitly intended and communicated. | Document the API change in the PR and changelog and either provide a compatibility shim (reintroduce make_engine_full that forwards to make_echo_engine) or clearly describe migration steps for downstream users; also add a CI/smoke test that exercises the public engine constructor to detect breakage. Once the compatibility or documentation is added, the out-of-scope/backwards-compatibility concern will be mitigated. |
| Description Check | ⚠️ Warning | The PR description states the intent and references the linked issue (Closes #2918) and the removal of echo_core, but it does not follow the repository's required template: it lacks explicit 'Overview', 'Details', and 'Where should the reviewer start?' sections and does not enumerate key changed files or any migration notes. Because those required template sections are missing, the description is incomplete for a thorough review. The existing short paragraph is helpful but insufficient against the repository's template expectations. | Please update the PR description to follow the repository template by adding an 'Overview' summary, a 'Details' section listing important code changes and files, and a 'Where should the reviewer start?' section that highlights the main files to review (for example lib/llm/src/engines.rs, launch/dynamo-run/src/opt.rs, and lib/llm/src/entrypoint/input/text.rs). Also explicitly call out any breaking API changes or migration steps (e.g., removal/renaming of engine constructors) and include short testing/verification instructions. This will speed review and ensure downstream consumers are informed. |

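
The compatibility shim suggested for the out-of-scope warning could look roughly like the following. This is only a sketch: the `StreamingEngine` trait and `EchoEngine` struct below are simplified, hypothetical stand-ins, and nothing in this conversation confirms that such a shim was added.

```rust
use std::sync::Arc;

/// Simplified stand-in for the real streaming-engine trait (hypothetical).
pub trait StreamingEngine {
    fn name(&self) -> &'static str;
}

/// Simplified stand-in for the unified echo engine (hypothetical).
pub struct EchoEngine;

impl StreamingEngine for EchoEngine {
    fn name(&self) -> &'static str {
        "echo"
    }
}

/// The new, canonical constructor named in the review.
pub fn make_echo_engine() -> Arc<dyn StreamingEngine> {
    Arc::new(EchoEngine)
}

/// Compatibility shim: keeps the old name compiling for downstream callers
/// while steering them to the new constructor via a deprecation warning.
#[deprecated(note = "use `make_echo_engine` instead")]
pub fn make_engine_full() -> Arc<dyn StreamingEngine> {
    make_echo_engine()
}
```

The `#[deprecated]` attribute makes existing callers emit a compiler warning rather than fail to build, which gives downstream users a migration window before the old name is dropped.
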
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title directly references the bug being fixed (interactive inputs stopping and stop-token handling) and maps to the changes that remove the echo_core path and unify the echo engine. It is concise and highlights the primary user-facing behavior corrected. The wording has a minor grammatical issue (“inputs actually stops”) but that does not make it unrelated to the changeset. |
| Linked Issues Check | ✅ Passed | The changes align with the objectives of linked issue [#2918]: the PR removes the echo_core path that ignored stop tokens, removes NvExt ignore_eos usage, and consolidates to a single echo engine that streams output and emits a Stop finish reason. File-level summaries (engines.rs, entrypoint/input/text.rs, opt.rs, and launch/dynamo-run updates) show coordinated changes across the stack to ensure stop-token handling is respected. Therefore the code changes appear to directly address and resolve the interactive frontend stop behavior reported in the issue. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
launch/dynamo-run/src/lib.rs (2)

198-225: Build breaker: references to removed Output::EchoFull

gguf_default() and safetensors_default() still return Output::EchoFull, which no longer exists. Replace with Output::Echo.

 fn gguf_default() -> Output {
@@
-    #[cfg(not(any(feature = "mistralrs", feature = "llamacpp")))]
-    {
-        Output::EchoFull
-    }
+    #[cfg(not(any(feature = "mistralrs", feature = "llamacpp")))]
+    {
+        Output::Echo
+    }
 }
 
 fn safetensors_default() -> Output {
@@
-    #[cfg(not(feature = "mistralrs"))]
-    {
-        Output::EchoFull
-    }
+    #[cfg(not(feature = "mistralrs"))]
+    {
+        Output::Echo
+    }

71-76: Remove/update stale echo_* references found by repo scan

Occurrences in launch/dynamo-run/src/lib.rs:

  • lines 41–46: "echo" | "echo_full" => Ok(Output::Echo)
  • lines 208–214: Output::EchoFull (cfg:not(any(feature = "mistralrs", feature = "llamacpp")))
  • lines 220–225: Output::EchoFull (cfg:not(feature = "mistralrs"))

Replace or consolidate these to the canonical API (or adjust feature cfgs/docs/tests) to avoid build/runtime surprises.

🧹 Nitpick comments (4)
lib/llm/src/entrypoint/input/text.rs (1)

142-145: Avoid double-check + unwrap on finish_reason

Minor tidy: match the finish reason instead of is_some() then unwrap().

-                    if chat_comp.finish_reason.is_some() {
-                        tracing::trace!("finish reason: {:?}", chat_comp.finish_reason.unwrap());
-                        break;
-                    }
+                    if let Some(reason) = chat_comp.finish_reason {
+                        tracing::trace!("finish reason: {:?}", reason);
+                        break;
+                    }
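
The same `if let` pattern can be shown in a self-contained form. The sketch below assumes simplified, hypothetical `Choice` and `FinishReason` types rather than the real chat-completion structs. It drains a sequence of chunks and stops at the first one that carries a finish reason, with no `unwrap()` involved.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FinishReason {
    Stop,
}

/// Simplified stand-in for one streamed chat-completion choice (hypothetical).
struct Choice {
    content: Option<String>,
    finish_reason: Option<FinishReason>,
}

/// Concatenate streamed content until the first chunk that carries a finish
/// reason, and report that reason. Chunks after it are never consumed.
fn drain_until_finished(
    chunks: impl IntoIterator<Item = Choice>,
) -> (String, Option<FinishReason>) {
    let mut out = String::new();
    for chunk in chunks {
        if let Some(text) = &chunk.content {
            out.push_str(text);
        }
        // Bind the reason directly instead of `is_some()` followed by `unwrap()`.
        if let Some(reason) = chunk.finish_reason {
            return (out, Some(reason));
        }
    }
    // The stream ended without ever reporting a finish reason.
    (out, None)
}
```

Binding the value in the condition keeps the check and the use in one place, so a later refactor cannot separate them and reintroduce a panic path.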
lib/llm/src/engines.rs (1)

141-154: Guard against empty message list to prevent panic

next_back().unwrap() will panic if messages is empty. Return a clean error instead.

-        let req = request.inner.messages.into_iter().next_back().unwrap();
+        let Some(req) = request.inner.messages.into_iter().next_back() else {
+            return Err(anyhow::anyhow!("Empty chat messages in request").into());
+        };
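
A dependency-free version of this guard is sketched below. It assumes a hypothetical `Message` struct and a local `EchoError` enum in place of the `anyhow` error used in the suggestion, so it illustrates the `let ... else` shape rather than the exact code in engines.rs.

```rust
/// Simplified stand-in for a chat message (hypothetical).
#[derive(Debug, PartialEq, Eq)]
struct Message {
    content: String,
}

/// Local error type standing in for the `anyhow` error in the suggestion.
#[derive(Debug, PartialEq, Eq)]
enum EchoError {
    EmptyMessages,
}

/// Return the last message of a request, or a clean error when the list is
/// empty, instead of panicking on `unwrap()`.
fn last_message(messages: Vec<Message>) -> Result<Message, EchoError> {
    let Some(last) = messages.into_iter().next_back() else {
        return Err(EchoError::EmptyMessages);
    };
    Ok(last)
}
```

With this shape an empty request surfaces as an ordinary error value that the caller can map to a client-facing response, rather than taking down the task that handles the request.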
launch/dynamo-run/src/opt.rs (1)

43-45: Consider aliasing “echo_core” to preserve CLI compatibility

Mapping "echo_core" → Output::Echo avoids unnecessary breakage for existing scripts.

-            "echo" | "echo_full" => Ok(Output::Echo),
+            "echo" | "echo_full" | "echo_core" => Ok(Output::Echo),
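
For context, the aliasing idea can be exercised in isolation with a `FromStr` implementation. The sketch assumes a cut-down `Output` enum with a single variant and a `String` error type. The real enum in opt.rs has more variants, and whether the "echo_core" alias was actually adopted is not confirmed in this conversation.

```rust
use std::str::FromStr;

/// Cut-down stand-in for the CLI output selector (hypothetical; the real
/// enum has additional engine variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Output {
    Echo,
}

impl FromStr for Output {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            // Legacy names are accepted as aliases of the unified engine so
            // that existing scripts keep working.
            "echo" | "echo_full" | "echo_core" => Ok(Output::Echo),
            other => Err(format!("unknown output engine: {other}")),
        }
    }
}
```

Accepting the old spellings at the parsing boundary keeps the rest of the code on a single `Echo` variant, so the aliases add no cost beyond one match arm.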
docs/guides/dynamo_run.md (1)

321-334: Fix markdownlint: specify language for fenced code blocks

Add bash to the two code fences to satisfy MD040.

-```
+```bash
 dynamo-run in=http out=echo --model-name my_model
-```
+```

-```
+```bash
 # Set token echo delay to 1ms (1000 tokens per second)
 DYN_TOKEN_ECHO_DELAY_MS=1 dynamo-run in=http out=echo
-```
+```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa80ac4 and 9c30fad.

⛔ Files ignored due to path filters (1)
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • docs/guides/dynamo_run.md (1 hunks)
  • launch/dynamo-run/src/flags.rs (1 hunks)
  • launch/dynamo-run/src/lib.rs (1 hunks)
  • launch/dynamo-run/src/opt.rs (4 hunks)
  • lib/bindings/python/rust/llm/entrypoint.rs (1 hunks)
  • lib/llm/src/engines.rs (5 hunks)
  • lib/llm/src/entrypoint/input/text.rs (1 hunks)
  • lib/llm/src/local_model.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.

Applied to files:

  • lib/llm/src/entrypoint/input/text.rs
🧬 Code graph analysis (2)
lib/bindings/python/rust/llm/entrypoint.rs (1)
lib/llm/src/engines.rs (1)
  • make_echo_engine (120-124)
launch/dynamo-run/src/lib.rs (2)
lib/llm/src/engines.rs (5)
  • new (75-77)
  • new (87-89)
  • new (300-302)
  • new (324-326)
  • make_echo_engine (120-124)
lib/llm/src/entrypoint.rs (1)
  • local_model (66-74)
🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_run.md

325-325: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


331-331: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (6)
lib/llm/src/local_model.rs (1)

213-213: Comment rename aligns with engine consolidation

Comment correctly reflects the unified echo engine.

lib/llm/src/entrypoint/input/text.rs (1)

110-114: Removing NvExt fixes “ignore_eos” and enables proper stop handling

Setting nvext: None removes the prior ignore_eos behavior. This should allow the interactive loop to see a finish_reason and stop.

Please verify with both a real backend (e.g., vLLM) and the echo engine:

  1. Run python -m dynamo.frontend --interactive, enter a short prompt, confirm it stops after the model’s stop token.
  2. Pipe a single prompt (non-interactive) and confirm process exits promptly.
lib/llm/src/engines.rs (1)

120-124: Unified echo engine factory looks good

make_echo_engine() cleanly wraps EchoEngine with the dispatcher.

launch/dynamo-run/src/flags.rs (1)

207-207: Echo validation path OK

No additional validation needed for the unified echo engine.

lib/bindings/python/rust/llm/entrypoint.rs (1)

221-224: Python binding correctly switches to make_echo_engine()

Matches the Rust-side consolidation.

launch/dynamo-run/src/lib.rs (1)

94-117: Engine selection for Echo path is consistent

Echo routes to StaticFull with make_echo_engine(). Looks correct.

Signed-off-by: Graham King <[email protected]>
Signed-off-by: Graham King <[email protected]>
@grahamking
Contributor Author

/ok to test 1c647ae

Signed-off-by: Graham King <[email protected]>
@grahamking
Contributor Author

/ok to test f1947d7

@grahamking grahamking merged commit 87e6e05 into main Sep 16, 2025
17 of 18 checks passed
@grahamking grahamking deleted the gk-fix-interactive-stop branch September 16, 2025 18:39
kmkelle-nv pushed a commit that referenced this pull request Sep 17, 2025
Development

Successfully merging this pull request may close these issues.

[BUG]: Interactive frontend cannot stop the output

3 participants