feat: add option to use vllm tokenizer #4850
base: main
Conversation
Walkthrough
A new vLLM tokenizer mode is introduced via a boolean configuration flag and CLI argument. The main module conditionally switches between Text and Tokens ModelInput types based on this flag during registration.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
components/src/dynamo/vllm/args.py (1)
206-214: Clarify help text scope.
The help text states "only /v1/chat/completions endpoint will be available," but this limitation only applies to decode workers, not prefill workers (which serve different endpoints). Consider making this more precise.
Apply this diff to clarify:
help=( "Use vLLM's built-in tokenizer instead of Dynamo's Rust tokenizer. " "This is required for models that use non-standard tokenizers (e.g., Mistral's tekken tokenizer). " - "When enabled, only /v1/chat/completions endpoint will be available." + "When enabled for decode workers, only /v1/chat/completions endpoint will be available." ),
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- components/src/dynamo/vllm/args.py (3 hunks)
- components/src/dynamo/vllm/main.py (2 hunks)
🔇 Additional comments (4)
components/src/dynamo/vllm/args.py (2)
71-72: LGTM - Config field addition is clean.
The new boolean field with a safe default (False) follows the existing pattern in the Config class.
317-317: LGTM - Propagation follows the standard pattern.
The flag is correctly propagated from parsed arguments to the config object.
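As a rough illustration of the pattern these two comments refer to (a field with a safe default plus propagation from parsed arguments), here is a dataclass-style sketch; the real Config class has many more fields and this helper name is hypothetical:

```python
from argparse import Namespace
from dataclasses import dataclass


@dataclass
class Config:
    # Safe default keeps the existing Rust-tokenizer behavior.
    use_vllm_tokenizer: bool = False


def build_config(args: Namespace) -> Config:
    config = Config()
    # Propagate the parsed CLI flag onto the config object.
    config.use_vllm_tokenizer = args.use_vllm_tokenizer
    return config


config = build_config(Namespace(use_vllm_tokenizer=True))
assert config.use_vllm_tokenizer is True
```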
components/src/dynamo/vllm/main.py (2)
421-431: LGTM - Conditional tokenizer selection is implemented correctly for prefill.
The logic clearly switches between ModelInput.Text and ModelInput.Tokens based on the configuration flag, with appropriate logging to inform operators of the choice.
422-426: vLLM supports tekken tokenizer for this model.
Verification confirms vLLM includes built-in support for Mistral's Tekken tokenizer and explicitly supports mistralai/Mistral-Small-3.2-24B-Instruct-2506. The model's tokenizer files are included in the Hugging Face repository, and vLLM can load them with the appropriate tokenizer mode configuration. This workaround will function as intended.
```python
# Determine model input type based on tokenizer configuration
if config.use_vllm_tokenizer:
    model_input = ModelInput.Text
    # When using vLLM's tokenizer, only Chat endpoint is supported
    model_type = ModelType.Chat
    logger.info(
        "Using vLLM's built-in tokenizer (--use-vllm-tokenizer). "
        "Only /v1/chat/completions endpoint will be available."
    )
else:
    model_input = ModelInput.Tokens
```
Flag potential conflict with --dyn-endpoint-types flag.
When config.use_vllm_tokenizer is True, line 559 unconditionally sets model_type = ModelType.Chat, overriding the user's --dyn-endpoint-types choice parsed on line 543. If a user specifies --dyn-endpoint-types completions together with --use-vllm-tokenizer, their endpoint preference will be silently ignored.
Consider adding validation to detect this conflict and either:
- Raise an error if incompatible flags are combined, or
- Log a warning that --dyn-endpoint-types is being overridden
Example validation:
```python
# Determine model input type based on tokenizer configuration
if config.use_vllm_tokenizer:
    model_input = ModelInput.Text
    # When using vLLM's tokenizer, only Chat endpoint is supported
    if config.dyn_endpoint_types != "chat" and config.dyn_endpoint_types != "chat,completions":
        logger.warning(
            f"--use-vllm-tokenizer requires chat endpoint. "
            f"Overriding --dyn-endpoint-types={config.dyn_endpoint_types} to 'chat'."
        )
    model_type = ModelType.Chat
    logger.info(
        "Using vLLM's built-in tokenizer (--use-vllm-tokenizer). "
        "Only /v1/chat/completions endpoint will be available."
    )
else:
    model_input = ModelInput.Tokens
```

🤖 Prompt for AI Agents
In components/src/dynamo/vllm/main.py around lines 555 to 566, the code
unconditionally sets model_type = ModelType.Chat when config.use_vllm_tokenizer
is True, silently overriding a user-provided --dyn-endpoint-types value; add
validation to detect this conflict and either log a clear warning (stating that
--dyn-endpoint-types is being overridden to chat) or raise an error for
incompatible combination, then set model_type = ModelType.Chat only after
handling the warning/error so the user is informed and behavior is explicit.
Overview:
Details: Some models, such as mistralai/Mistral-Small-3.2-24B-Instruct-2506, use a different tokenizer format (tekken) which is currently not supported. However, vLLM does support it, so while we should add support in Rust long term, in the short term we should allow users to work around this by using the vLLM tokenizer directly. This PR adds that option.

Summary by CodeRabbit
- --use-vllm-tokenizer CLI flag to optionally use vLLM's tokenizer.
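To make the described behavior concrete, here is a small self-contained sketch of the selection logic; the ModelInput/ModelType enums below are stand-ins for Dynamo's actual runtime types, and ModelType.Backend is a hypothetical placeholder for the default (chat + completions) registration type:

```python
from enum import Enum, auto


# Stand-in enums; the real ModelInput/ModelType come from Dynamo's runtime bindings.
class ModelInput(Enum):
    Text = auto()
    Tokens = auto()


class ModelType(Enum):
    Chat = auto()
    Backend = auto()  # hypothetical placeholder for the default registration type


def select_registration(use_vllm_tokenizer: bool) -> tuple[ModelInput, ModelType]:
    """Mirror the conditional added in main.py: vLLM tokenizer => Text input, Chat only."""
    if use_vllm_tokenizer:
        return ModelInput.Text, ModelType.Chat
    return ModelInput.Tokens, ModelType.Backend


print(select_registration(True))   # Text input, /v1/chat/completions only
print(select_registration(False))  # Token input, default registration
```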