-
Notifications
You must be signed in to change notification settings - Fork 751
fix: Interactive inputs actually stops, does not ignore stop token #3057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Closes #2918 Removes the `echo_core` engine which relied on ignoring stop tokens, so that we only have a single `echo` engine. That's easier to explain. `echo_core` would have been useful for debugging template issues but in practice a `tracing::debug!` statement is just as useful and simpler to use. Signed-off-by: Graham King <[email protected]>
|
/ok to test 9c30fad |
WalkthroughConsolidates two echo engines into a single echo engine across docs, CLI, Rust core, and Python bindings. Removes NvExt usage in text input. Updates engine construction to use make_echo_engine. Adjusts enums and parsing/printing to a single Echo variant. Minor doc/comment updates. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as User
participant DR as dynamo-run (CLI)
participant EP as LLM Entrypoint
participant ENG as EchoEngine (unified)
participant STR as Stream to Client
U->>DR: run with --out=echo
DR->>EP: EngineConfig::StaticFull(Echo)
EP->>ENG: make_echo_engine()
Note over ENG: Unified echo engine<br/>per-char streaming with delay
U->>DR: Prompt input
DR->>ENG: Chat/Completion request
loop for each character
ENG-->>STR: stream token (char)
end
ENG-->>STR: finish (Stop)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
launch/dynamo-run/src/lib.rs (2)
198-225: Build breaker: references to removed Output::EchoFull
gguf_default()andsafetensors_default()still returnOutput::EchoFull, which no longer exists. Replace withOutput::Echo.fn gguf_default() -> Output { @@ - #[cfg(not(any(feature = "mistralrs", feature = "llamacpp")))] - { - Output::EchoFull - } + #[cfg(not(any(feature = "mistralrs", feature = "llamacpp")))] + { + Output::Echo + } } fn safetensors_default() -> Output { @@ - #[cfg(not(feature = "mistralrs"))] - { - Output::EchoFull - } + #[cfg(not(feature = "mistralrs"))] + { + Output::Echo + }
71-76: Remove/update stale echo_ references found by repo scan*Occurrences in launch/dynamo-run/src/lib.rs:
- lines 41–46:
"echo" | "echo_full" => Ok(Output::Echo)- lines 208–214:
Output::EchoFull(cfg:not(any(feature = "mistralrs", feature = "llamacpp")))- lines 220–225:
Output::EchoFull(cfg:not(feature = "mistralrs"))Replace or consolidate these to the canonical API (or adjust feature cfgs/docs/tests) to avoid build/runtime surprises.
🧹 Nitpick comments (4)
lib/llm/src/entrypoint/input/text.rs (1)
142-145: Avoid double-check + unwrap on finish_reasonMinor tidy: match the finish reason instead of
is_some()thenunwrap().- if chat_comp.finish_reason.is_some() { - tracing::trace!("finish reason: {:?}", chat_comp.finish_reason.unwrap()); - break; - } + if let Some(reason) = chat_comp.finish_reason { + tracing::trace!("finish reason: {:?}", reason); + break; + }lib/llm/src/engines.rs (1)
141-154: Guard against empty message list to prevent panic
next_back().unwrap()will panic ifmessagesis empty. Return a clean error instead.- let req = request.inner.messages.into_iter().next_back().unwrap(); + let Some(req) = request.inner.messages.into_iter().next_back() else { + return Err(anyhow::anyhow!("Empty chat messages in request").into()); + };launch/dynamo-run/src/opt.rs (1)
43-45: Consider aliasing “echo_core” to preserve CLI compatibilityMapping
"echo_core"→Output::Echoavoids unnecessary breakage for existing scripts.- "echo" | "echo_full" => Ok(Output::Echo), + "echo" | "echo_full" | "echo_core" => Ok(Output::Echo),docs/guides/dynamo_run.md (1)
321-334: Fix markdownlint: specify language for fenced code blocksAdd
bashto the two code fences to satisfy MD040.-``` +```bash dynamo-run in=http out=echo --model-name my_model -``` +``` -``` +```bash # Set token echo delay to 1ms (1000 tokens per second) DYN_TOKEN_ECHO_DELAY_MS=1 dynamo-run in=http out=echo -``` +```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
lib/bindings/python/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
docs/guides/dynamo_run.md(1 hunks)launch/dynamo-run/src/flags.rs(1 hunks)launch/dynamo-run/src/lib.rs(1 hunks)launch/dynamo-run/src/opt.rs(4 hunks)lib/bindings/python/rust/llm/entrypoint.rs(1 hunks)lib/llm/src/engines.rs(5 hunks)lib/llm/src/entrypoint/input/text.rs(1 hunks)lib/llm/src/local_model.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.
Applied to files:
lib/llm/src/entrypoint/input/text.rs
🧬 Code graph analysis (2)
lib/bindings/python/rust/llm/entrypoint.rs (1)
lib/llm/src/engines.rs (1)
make_echo_engine(120-124)
launch/dynamo-run/src/lib.rs (2)
lib/llm/src/engines.rs (5)
new(75-77)new(87-89)new(300-302)new(324-326)make_echo_engine(120-124)lib/llm/src/entrypoint.rs (1)
local_model(66-74)
🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_run.md
325-325: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
331-331: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (6)
lib/llm/src/local_model.rs (1)
213-213: Comment rename aligns with engine consolidationComment correctly reflects the unified echo engine.
lib/llm/src/entrypoint/input/text.rs (1)
110-114: Removing NvExt fixes “ignore_eos” and enables proper stop handlingSetting
nvext: Noneremoves the priorignore_eosbehavior. This should allow the interactive loop to see afinish_reasonand stop.Please verify with both a real backend (e.g., vLLM) and the echo engine:
- Run
python -m dynamo.frontend --interactive, enter a short prompt, confirm it stops after the model’s stop token.- Pipe a single prompt (non-interactive) and confirm process exits promptly.
lib/llm/src/engines.rs (1)
120-124: Unified echo engine factory looks good
make_echo_engine()cleanly wrapsEchoEnginewith the dispatcher.launch/dynamo-run/src/flags.rs (1)
207-207: Echo validation path OKNo additional validation needed for the unified echo engine.
lib/bindings/python/rust/llm/entrypoint.rs (1)
221-224: Python binding correctly switches to make_echo_engine()Matches the Rust-side consolidation.
launch/dynamo-run/src/lib.rs (1)
94-117: Engine selection for Echo path is consistentEcho routes to
StaticFullwithmake_echo_engine(). Looks correct.
Signed-off-by: Graham King <[email protected]>
Signed-off-by: Graham King <[email protected]>
|
/ok to test 1c647ae |
Signed-off-by: Graham King <[email protected]>
|
/ok to test f1947d7 |
…3057) Signed-off-by: Graham King <[email protected]> Signed-off-by: Kristen Kelleher <[email protected]>
Closes #2918
Removes the
echo_coreengine which relied on ignoring stop tokens, so that we only have a singleechoengine. That's easier to explain.echo_corewould have been useful for debugging template issues but in practice atracing::debug!statement is just as useful and simpler to use.Summary by CodeRabbit