
Conversation


loci-dev commented on Dec 3, 2025

Mirrored from ggml-org/llama.cpp#17707

DeepSeek V3.2 uses a new tool-call format like this:

<|DSML|function_calls>
<|DSML|invoke name="get_datetime">
<|DSML|parameter name="timezone" string="true">Asia/Shanghai</|DSML|parameter>
</|DSML|invoke>
</|DSML|function_calls>
<|DSML|function_calls>
<|DSML|invoke name="search">
<|DSML|parameter name="query" string="true">search agent benchmark 2024</|DSML|parameter>
<|DSML|parameter name="topn" string="false">10</|DSML|parameter>
<|DSML|parameter name="source" string="true">web</|DSML|parameter>
</|DSML|invoke>
<|DSML|invoke name="search">
<|DSML|parameter name="query" string="true">搜索智能体 基准测试</|DSML|parameter>
<|DSML|parameter name="topn" string="false">10</|DSML|parameter>
<|DSML|parameter name="source" string="true">web</|DSML|parameter>
</|DSML|invoke>
</|DSML|function_calls>

This PR introduces the tool-call parser for the new DeepSeek V3.2 model.
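
For illustration, here is a minimal sketch of how this block structure could be extracted. It is not the implementation added by this PR (which extends llama.cpp's chat-parser-xml-toolcall.cpp and parses incrementally for streaming); the names tool_param, tool_call, and parse_dsml are hypothetical, and escaping and error handling are omitted:

```cpp
// Minimal, self-contained sketch of extracting DSML tool calls.
// Hypothetical names; not the PR's actual (incremental) parser.
#include <iostream>
#include <regex>
#include <string>
#include <vector>

struct tool_param { std::string name; std::string value; bool is_string; };
struct tool_call  { std::string name; std::vector<tool_param> params; };

static std::vector<tool_call> parse_dsml(const std::string & text) {
    // One <|DSML|invoke ...> ... </|DSML|invoke> block per tool call.
    static const std::regex invoke_re(
        R"re(<\|DSML\|invoke name="([^"]+)">([\s\S]*?)</\|DSML\|invoke>)re");
    // string="true" marks a raw string value; string="false" marks a
    // JSON literal (number, boolean, ...) that should not be quoted.
    static const std::regex param_re(
        R"re(<\|DSML\|parameter name="([^"]+)" string="(true|false)">([\s\S]*?)</\|DSML\|parameter>)re");

    std::vector<tool_call> calls;
    for (std::sregex_iterator it(text.begin(), text.end(), invoke_re), end; it != end; ++it) {
        tool_call call;
        call.name = (*it)[1];
        const std::string body = (*it)[2];
        for (std::sregex_iterator p(body.begin(), body.end(), param_re), pend; p != pend; ++p) {
            call.params.push_back({ (*p)[1], (*p)[3], (*p)[2] == "true" });
        }
        calls.push_back(std::move(call));
    }
    return calls;
}

int main() {
    const std::string msg =
        "<|DSML|function_calls>\n"
        "<|DSML|invoke name=\"get_datetime\">\n"
        "<|DSML|parameter name=\"timezone\" string=\"true\">Asia/Shanghai</|DSML|parameter>\n"
        "</|DSML|invoke>\n"
        "</|DSML|function_calls>\n";
    for (const auto & call : parse_dsml(msg)) {
        std::cout << call.name << "\n";
        for (const auto & p : call.params) {
            std::cout << "  " << p.name << " = " << p.value
                      << (p.is_string ? " (string)" : " (literal)") << "\n";
        }
    }
    return 0;
}
```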

Since the official release does not provide a chat template, a provisional template has been added and tested only with llama.cpp. Compatibility with other inference engines is not guaranteed and may require further adjustments.

In addition, Minja polyfill detection has been slightly updated to accommodate the new template structure.

Needs PR #17376 to be merged first.

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Pull Request #405 - Performance Analysis Summary

PR Title: UPSTREAM PR #17707: common: Deepseek V3.2 tool call parser
Change Scope: 10 files modified (+729 additions, -60 deletions)


Analysis Classification: Condition 1 (No Performance Impact)

This PR introduces a new chat-template parser for the DeepSeek V3.2 model, which uses an XML-based tool-call format. The code changes add new parsing logic and test coverage without modifying existing inference paths.

Performance Impact Assessment:

The performance metrics show variations in STL container operations (vector::end(), map::begin()), with per-call timing changes ranging from 60 to 195 ns. However, these functions are not in the inference critical path. The actual changes are:

  • chat-parser-xml-toolcall.cpp: Added allowed_literal_between_kvsep field support for parsing boolean literals between key-value separators. Modified parse_msg_with_xml_tool_calls() to handle tool calls within thinking blocks when allow_toolcall_in_think is enabled. (A sketch of how such typed parameter values map onto JSON arguments follows this list.)

  • Power consumption: Three binaries show minimal increases: llama-tts (+914 nJ, +0.407%), llama-cvector-generator (+756 nJ, +0.343%), llama-run (+442 nJ, +0.230%). These are chat template utilities, not inference engines.
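
The string="true|false" attribute shown in the format above is what makes typed parameters possible: string parameters keep their raw text, while non-string parameters (such as topn, or the boolean literals mentioned in the first bullet) pass through as JSON literals. A hypothetical illustration of that mapping, using invented names rather than llama.cpp's internal API:

```cpp
// Hypothetical sketch of mapping typed DSML parameters onto a JSON
// "arguments" object; not llama.cpp's internal API.
#include <iostream>
#include <string>
#include <vector>

struct tool_param { std::string name; std::string value; bool is_string; };

// string="true"  -> emit as a quoted JSON string
// string="false" -> emit verbatim as a JSON literal (number, boolean, ...)
// NOTE: real code must JSON-escape string values and validate literals.
static std::string params_to_json(const std::vector<tool_param> & params) {
    std::string json = "{";
    for (size_t i = 0; i < params.size(); ++i) {
        if (i > 0) json += ",";
        json += "\"" + params[i].name + "\":";
        json += params[i].is_string ? "\"" + params[i].value + "\"" : params[i].value;
    }
    return json + "}";
}

int main() {
    // Mirrors the "search" invoke from the PR description above.
    std::vector<tool_param> params = {
        { "query",  "search agent benchmark 2024", true  },
        { "topn",   "10",                          false },
        { "source", "web",                         true  },
    };
    std::cout << params_to_json(params) << "\n";
    // -> {"query":"search agent benchmark 2024","topn":10,"source":"web"}
    return 0;
}
```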

Inference Impact: None. The modified functions (chat parsers, template renderers) execute before model inference begins. Functions like llama_decode, llama_encode, and llama_tokenize are unchanged. Tokens per second remains unaffected.

The observed STL performance variations are compiler optimization artifacts unrelated to the functional changes, which purely extend chat template parsing capabilities for a new model format.

loci-dev force-pushed the main branch 13 times, most recently from 738bfbf to f01b714 on December 4, 2025 at 09:11
loci-dev force-pushed the main branch 3 times, most recently from f72076f to 3f5e1ff on December 8, 2025 at 21:08
@loci-agentic-ai

Explore the complete analysis inside the Version Insights

loci-dev force-pushed the main branch 23 times, most recently from 78ff3d3 to 117bfc3 on December 11, 2025 at 18:11