feat: add option to use vllm tokenizer #4850
base: main
Conversation
Walkthrough
A new vLLM tokenizer mode is introduced via a boolean configuration flag and CLI argument. The main module conditionally switches between Text and Tokens ModelInput types based on this flag during registration.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
components/src/dynamo/vllm/args.py (1)
206-214: Clarify help text scope.
The help text states "only /v1/chat/completions endpoint will be available," but this limitation only applies to decode workers, not prefill workers (which serve different endpoints). Consider making this more precise.
Apply this diff to clarify:
help=( "Use vLLM's built-in tokenizer instead of Dynamo's Rust tokenizer. " "This is required for models that use non-standard tokenizers (e.g., Mistral's tekken tokenizer). " - "When enabled, only /v1/chat/completions endpoint will be available." + "When enabled for decode workers, only /v1/chat/completions endpoint will be available." ),
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- components/src/dynamo/vllm/args.py (3 hunks)
- components/src/dynamo/vllm/main.py (2 hunks)
🔇 Additional comments (4)
components/src/dynamo/vllm/args.py (2)
71-72: LGTM - Config field addition is clean.
The new boolean field with a safe default (False) follows the existing pattern in the Config class.
317-317: LGTM - Propagation follows the standard pattern.
The flag is correctly propagated from parsed arguments to the config object.
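As a rough illustration of the pattern these two comments refer to (a field with a safe default plus propagation from parsed arguments), here is a dataclass-style sketch; the real Config class has many more fields and this helper name is hypothetical:

```python
from argparse import Namespace
from dataclasses import dataclass


@dataclass
class Config:
    # Safe default keeps the existing Rust-tokenizer behavior.
    use_vllm_tokenizer: bool = False


def build_config(args: Namespace) -> Config:
    config = Config()
    # Propagate the parsed CLI flag onto the config object.
    config.use_vllm_tokenizer = args.use_vllm_tokenizer
    return config


config = build_config(Namespace(use_vllm_tokenizer=True))
assert config.use_vllm_tokenizer is True
```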
components/src/dynamo/vllm/main.py (2)
421-431: LGTM - Conditional tokenizer selection is implemented correctly for prefill.
The logic clearly switches between ModelInput.Text and ModelInput.Tokens based on the configuration flag, with appropriate logging to inform operators of the choice.
422-426: vLLM supports tekken tokenizer for this model.
Verification confirms vLLM includes built-in support for Mistral's Tekken tokenizer and explicitly supports mistralai/Mistral-Small-3.2-24B-Instruct-2506. The model's tokenizer files are included in the Hugging Face repository, and vLLM can load them with the appropriate tokenizer mode configuration. This workaround will function as intended.
```python
# Determine model input type based on tokenizer configuration
if config.use_vllm_tokenizer:
    model_input = ModelInput.Text
    # When using vLLM's tokenizer, only Chat endpoint is supported
    model_type = ModelType.Chat
    logger.info(
        "Using vLLM's built-in tokenizer (--use-vllm-tokenizer). "
        "Only /v1/chat/completions endpoint will be available."
    )
else:
    model_input = ModelInput.Tokens
```
Flag potential conflict with --dyn-endpoint-types flag.
When config.use_vllm_tokenizer is True, line 559 unconditionally sets model_type = ModelType.Chat, overriding the user's --dyn-endpoint-types choice parsed on line 543. If a user specifies --dyn-endpoint-types completions together with --use-vllm-tokenizer, their endpoint preference will be silently ignored.
Consider adding validation to detect this conflict and either:
- Raise an error if incompatible flags are combined, or
- Log a warning that --dyn-endpoint-types is being overridden
Example validation:
```python
# Determine model input type based on tokenizer configuration
if config.use_vllm_tokenizer:
    model_input = ModelInput.Text
    # When using vLLM's tokenizer, only Chat endpoint is supported
    if config.dyn_endpoint_types != "chat" and config.dyn_endpoint_types != "chat,completions":
        logger.warning(
            f"--use-vllm-tokenizer requires chat endpoint. "
            f"Overriding --dyn-endpoint-types={config.dyn_endpoint_types} to 'chat'."
        )
    model_type = ModelType.Chat
    logger.info(
        "Using vLLM's built-in tokenizer (--use-vllm-tokenizer). "
        "Only /v1/chat/completions endpoint will be available."
    )
else:
    model_input = ModelInput.Tokens
```

🤖 Prompt for AI Agents
In components/src/dynamo/vllm/main.py around lines 555 to 566, the code
unconditionally sets model_type = ModelType.Chat when config.use_vllm_tokenizer
is True, silently overriding a user-provided --dyn-endpoint-types value; add
validation to detect this conflict and either log a clear warning (stating that
--dyn-endpoint-types is being overridden to chat) or raise an error for
incompatible combination, then set model_type = ModelType.Chat only after
handling the warning/error so the user is informed and behavior is explicit.
Overview:
Details: Some models, such as mistralai/Mistral-Small-3.2-24B-Instruct-2506, use a different tokenizer format (tekken) which is currently not supported. However, vLLM does support it, so while we should add support in Rust long term, in the short term we should allow users to work around this by using the vLLM tokenizer directly. This PR adds that option.

Summary by CodeRabbit
- --use-vllm-tokenizer CLI flag to optionally use vLLM's tokenizer.
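To make the described behavior concrete, here is a small self-contained sketch of the selection logic; the ModelInput/ModelType enums below are stand-ins for Dynamo's actual runtime types, and ModelType.Backend is a hypothetical placeholder for the default (chat + completions) registration type:

```python
from enum import Enum, auto


# Stand-in enums; the real ModelInput/ModelType come from Dynamo's runtime bindings.
class ModelInput(Enum):
    Text = auto()
    Tokens = auto()


class ModelType(Enum):
    Chat = auto()
    Backend = auto()  # hypothetical placeholder for the default registration type


def select_registration(use_vllm_tokenizer: bool) -> tuple[ModelInput, ModelType]:
    """Mirror the conditional added in main.py: vLLM tokenizer => Text input, Chat only."""
    if use_vllm_tokenizer:
        return ModelInput.Text, ModelType.Chat
    return ModelInput.Tokens, ModelType.Backend


print(select_registration(True))   # Text input, /v1/chat/completions only
print(select_registration(False))  # Token input, default registration
```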