Conversation

@nvrohanv nvrohanv commented Dec 10, 2025

Overview:

Details: Some models, such as mistralai/Mistral-Small-3.2-24B-Instruct-2506, use a different tokenizer format (tekken) that is not currently supported by Dynamo's Rust tokenizer. vLLM does support it, so while we should add native support in Rust long term, in the short term we can let users work around the limitation by using the vLLM tokenizer. This PR adds that option.
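
For illustration, a minimal sketch of the kind of Python wiring such a flag needs is shown below. Only the --use-vllm-tokenizer flag name and the use_vllm_tokenizer config field come from this PR; the parser setup, helper names, and help-text wording here are assumptions, not the actual contents of components/src/dynamo/vllm/args.py.

    # Illustrative sketch only; parser and helper names are assumptions.
    import argparse
    from dataclasses import dataclass

    @dataclass
    class Config:
        use_vllm_tokenizer: bool = False  # default keeps Dynamo's Rust tokenizer path

    def add_tokenizer_flag(parser: argparse.ArgumentParser) -> None:
        parser.add_argument(
            "--use-vllm-tokenizer",
            action="store_true",
            help=(
                "Use vLLM's built-in tokenizer instead of Dynamo's Rust tokenizer. "
                "Needed for models with non-standard tokenizers such as Mistral's tekken."
            ),
        )

    def parse_config(argv=None) -> Config:
        parser = argparse.ArgumentParser()
        add_tokenizer_flag(parser)
        args = parser.parse_args(argv)
        return Config(use_vllm_tokenizer=args.use_vllm_tokenizer)

    # e.g. parse_config(["--use-vllm-tokenizer"]).use_vllm_tokenizer -> True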

Summary by CodeRabbit

  • New Features
    • Added --use-vllm-tokenizer CLI flag to optionally use vLLM's tokenizer.


@nvrohanv nvrohanv requested review from a team as code owners December 10, 2025 10:15
@github-actions github-actions bot added the feat label Dec 10, 2025

coderabbitai bot commented Dec 10, 2025

Walkthrough

A new vLLM tokenizer mode is introduced via a boolean configuration flag and CLI argument. The main module conditionally switches between Text and Tokens ModelInput types based on this flag during registration.

Changes

  • Configuration & CLI Setup (components/src/dynamo/vllm/args.py): Added a use_vllm_tokenizer boolean field to Config (default: False) and registered the corresponding CLI flag --use-vllm-tokenizer with help text.
  • Tokenizer Mode Logic (components/src/dynamo/vllm/main.py): Conditionally selects the ModelInput type (Text when the vLLM tokenizer is enabled, Tokens otherwise) during prefill/chat registration and logs the active tokenizer mode.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify CLI flag and config field are properly wired together
  • Confirm ModelInput type selection logic aligns with intended tokenizer mode behavior in both prefill and chat paths
  • Check that logging statements accurately reflect the selected tokenizer configuration

Poem

🐰 A tokenizer flag hops into view,
Text or Tokens—the choice now is true!
vLLM's way or the classic dear song,
Configuration whispers which path to belong. ✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly and concisely summarizes the main change: adding an option to use the vllm tokenizer.
  • Description check ✅ Passed: The description includes Overview and Details sections with context about the problem and solution, but is missing the 'Where should the reviewer start?' and 'Related Issues' sections from the template.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
components/src/dynamo/vllm/args.py (1)

206-214: Clarify help text scope.

The help text states "only /v1/chat/completions endpoint will be available," but this limitation only applies to decode workers, not prefill workers (which serve different endpoints). Consider making this more precise.

Apply this diff to clarify:

         help=(
             "Use vLLM's built-in tokenizer instead of Dynamo's Rust tokenizer. "
             "This is required for models that use non-standard tokenizers (e.g., Mistral's tekken tokenizer). "
-            "When enabled, only /v1/chat/completions endpoint will be available."
+            "When enabled for decode workers, only /v1/chat/completions endpoint will be available."
         ),
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe10dbf and de37a4a.

📒 Files selected for processing (2)
  • components/src/dynamo/vllm/args.py (3 hunks)
  • components/src/dynamo/vllm/main.py (2 hunks)
🔇 Additional comments (4)
components/src/dynamo/vllm/args.py (2)

71-72: LGTM - Config field addition is clean.

The new boolean field with a safe default (False) follows the existing pattern in the Config class.


317-317: LGTM - Propagation follows the standard pattern.

The flag is correctly propagated from parsed arguments to the config object.

components/src/dynamo/vllm/main.py (2)

421-431: LGTM - Conditional tokenizer selection is implemented correctly for prefill.

The logic clearly switches between ModelInput.Text and ModelInput.Tokens based on the configuration flag, with appropriate logging to inform operators of the choice.


422-426: vLLM supports tekken tokenizer for this model.

Verification confirms vLLM includes built-in support for Mistral's Tekken tokenizer and explicitly supports mistralai/Mistral-Small-3.2-24B-Instruct-2506. The model's tokenizer files are included in the Hugging Face repository, and vLLM can load them with the appropriate tokenizer mode configuration. This workaround will function as intended.
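
For reference, a minimal sketch of loading this model's tekken tokenizer directly through vLLM follows. The tokenizer_mode / config_format / load_format options follow vLLM's documented Mistral loading flags; treat the exact combination required for this particular model as an assumption.

    # Sketch: exercising vLLM's built-in Mistral (tekken) tokenizer support.
    # The argument combination is an assumption based on vLLM's Mistral docs.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
        tokenizer_mode="mistral",   # vLLM's tekken-aware Mistral tokenizer
        config_format="mistral",
        load_format="mistral",
    )
    outputs = llm.chat(
        [{"role": "user", "content": "Say hello."}],
        SamplingParams(max_tokens=16),
    )
    print(outputs[0].outputs[0].text)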

Comment on lines +555 to +566
# Determine model input type based on tokenizer configuration
if config.use_vllm_tokenizer:
    model_input = ModelInput.Text
    # When using vLLM's tokenizer, only Chat endpoint is supported
    model_type = ModelType.Chat
    logger.info(
        "Using vLLM's built-in tokenizer (--use-vllm-tokenizer). "
        "Only /v1/chat/completions endpoint will be available."
    )
else:
    model_input = ModelInput.Tokens


⚠️ Potential issue | 🟠 Major

Flag potential conflict with --dyn-endpoint-types flag.

When config.use_vllm_tokenizer is True, line 559 unconditionally sets model_type = ModelType.Chat, overriding the user's --dyn-endpoint-types choice parsed on line 543. If a user specifies --dyn-endpoint-types completions together with --use-vllm-tokenizer, their endpoint preference will be silently ignored.

Consider adding validation to detect this conflict and either:

  1. Raise an error if incompatible flags are combined, or
  2. Log a warning that --dyn-endpoint-types is being overridden

Example validation:

# Determine model input type based on tokenizer configuration
if config.use_vllm_tokenizer:
    model_input = ModelInput.Text
    # When using vLLM's tokenizer, only Chat endpoint is supported
    if config.dyn_endpoint_types != "chat":  # any other value would be silently narrowed to chat
        logger.warning(
            f"--use-vllm-tokenizer requires chat endpoint. "
            f"Overriding --dyn-endpoint-types={config.dyn_endpoint_types} to 'chat'."
        )
    model_type = ModelType.Chat
    logger.info(
        "Using vLLM's built-in tokenizer (--use-vllm-tokenizer). "
        "Only /v1/chat/completions endpoint will be available."
    )
else:
    model_input = ModelInput.Tokens
🤖 Prompt for AI Agents
In components/src/dynamo/vllm/main.py around lines 555 to 566, the code
unconditionally sets model_type = ModelType.Chat when config.use_vllm_tokenizer
is True, silently overriding a user-provided --dyn-endpoint-types value. Add
validation to detect this conflict and either log a clear warning (stating that
--dyn-endpoint-types is being overridden to chat) or raise an error for the
incompatible combination, then set model_type = ModelType.Chat only after
handling the warning/error so the user is informed and the behavior is explicit.
