UPSTREAM PR #17707: common: Deepseek V3.2 tool call parser #405
Removed TODO comment about untested tool call feature.
Pull Request #405 - Performance Analysis Summary

PR Title: UPSTREAM PR #17707: common: Deepseek V3.2 tool call parser

Analysis Classification: Condition 1 (No Performance Impact)

This PR introduces a new chat template parser for the DeepSeek V3.2 model with an XML-based tool call format. The changes add new parsing logic and test coverage without modifying existing inference paths.

Performance Impact Assessment: The performance metrics show variations in STL container operations (vector::end(), map::begin()) with timing changes in the range of 60-195 ns. However, these functions are not on the inference critical path.

Inference Impact: None. The modified functions (chat parsers, template renderers) execute before model inference begins. The observed STL performance variations are compiler optimization artifacts unrelated to the functional changes, which purely extend chat template parsing capabilities for a new model format.
Mirrored from ggml-org/llama.cpp#17707
DeepSeek V3.2 uses a new XML-based tool-call format.
This PR introduces the tool-call parser for the new DeepSeek V3.2 model.
Since the official release does not provide a chat template, a provisional template has been added and tested only with llama.cpp. Compatibility with other inference engines is not guaranteed and may require further adjustments.
In addition, Minja polyfill detection has been slightly updated to accommodate the new template structure.
Needs PR #17376 to be merged first.