
ADR-010: Function Calling / Tool Use in RuvLLM

Status: Proposed
Date: 2026-01-20
Decision Makers: Ruvector Architecture Team
Technical Area: LLM Capabilities / Agent Framework Integration


Context and Problem Statement

RuvLLM currently provides text generation capabilities but lacks structured function calling (tool use) support, which is essential for integration with modern agent frameworks like LangChain, LlamaIndex, CrewAI, and AutoGPT. Function calling enables models to interact with external tools, APIs, and databases in a structured, type-safe manner.

Current State

RuvLLM's generation API currently supports only text-in, text-out generation, with:

  • No structured output parsing
  • No tool/function definition support
  • Manual prompt engineering required for tool interactions
  • No support for multi-turn tool conversations

Key Challenges

  1. Agent Framework Integration: Popular frameworks expect OpenAI-compatible function calling APIs
  2. Structured Outputs: Models need to generate valid JSON function calls, not freeform text
  3. Multi-Turn Conversations: Tool results must be fed back to the model for reasoning
  4. Parallel Tool Calls: Efficient agents need to call multiple tools simultaneously
  5. Model Format Compatibility: Different models (Llama, Mistral, Qwen) use different tool calling formats

Decision Drivers

Functional Requirements

  • Tool Definitions: JSON Schema-based function signatures
  • Tool Choice Control: Auto, none, required, or specific function selection
  • Parallel Calls: Multiple function calls in a single response
  • Result Integration: Feeding tool outputs back to the model
  • Type Safety: Validate function arguments against schemas

Compatibility Requirements

  • OpenAI API Compatible: Drop-in replacement for OpenAI function calling
  • Anthropic Tool Use: Map to Anthropic's tool_use format
  • Framework Integration: Direct support for LangChain, LlamaIndex, CrewAI
  • Model Agnostic: Work across Llama 3.1+, Mistral, Qwen, custom models

Performance Requirements

  • Constrained Generation: Force valid JSON output via logit biasing
  • Low Latency: <10ms overhead for tool call parsing
  • Streaming Support: Stream tool calls as they're generated
  • Batching: Process multiple tool calls efficiently

Considered Options

Option A: Prompt Engineering Only

Use structured prompts to request tool calls in JSON format, parse with regex/JSON parsers.

Pros:

  • No core changes to generation logic
  • Works with any model
  • Simple implementation

Cons:

  • Unreliable: models may generate invalid JSON
  • No type safety guarantees
  • Poor support for parallel tool calls
  • Requires extensive prompt tuning per model

Option B: Constrained Generation with Grammar

Implement constrained decoding using formal grammars (GBNF, JSON Schema) to force valid tool calls.

Pros:

  • Guarantees valid JSON output
  • Type-safe by construction
  • Works across model architectures
  • Best reliability for production

Cons:

  • Complex implementation (logit masking)
  • Requires grammar compiler
  • Potential performance overhead

Option C: Model-Specific Chat Templates

Leverage each model family's native tool calling format via chat templates.

Pros:

  • Optimal for models with native tool support (Llama 3.1+, Mistral)
  • Minimal overhead
  • Leverages model training

Cons:

  • Fragmented implementation across models
  • No support for models without native tool calling
  • Template maintenance burden

Decision Outcome

Chosen Option: Hybrid Approach - Option B (Constrained Generation) + Option C (Chat Templates)

Implement constrained generation with grammar-based validation as the foundation, with chat template optimizations for models with native tool calling support.

Rationale

  1. Reliability First: Constrained generation guarantees valid outputs for critical production use cases
  2. Performance Optimization: Chat templates optimize for models with native support (Llama 3.1+, Mistral)
  3. Universal Compatibility: Fallback to constrained generation for any model
  4. Future-Proof: New models can be added via chat templates without core changes

Technical Specifications

Tool Definition Schema

use serde::{Deserialize, Serialize};
// Note: JsonSchema below refers to the local struct defined in this ADR.
// The schemars crate (planned for validation) exports a derive with the
// same name, so importing it here would conflict.

/// Tool/function definition for function calling
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolDefinition {
    /// Function name (must be valid identifier)
    pub name: String,

    /// Human-readable description for the model
    pub description: String,

    /// JSON Schema for function parameters
    pub parameters: JsonSchema,

    /// Required parameter names
    #[serde(default)]
    pub required: Vec<String>,
}

/// JSON Schema representation
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JsonSchema {
    #[serde(rename = "type")]
    pub schema_type: String,

    #[serde(skip_serializing_if = "Option::is_none")]
    pub properties: Option<std::collections::HashMap<String, JsonSchema>>,

    #[serde(skip_serializing_if = "Option::is_none")]
    pub items: Option<Box<JsonSchema>>,

    #[serde(skip_serializing_if = "Option::is_none")]
    pub description: Option<String>,

    /// Allowed values (serialized as the JSON Schema "enum" keyword)
    #[serde(rename = "enum", skip_serializing_if = "Option::is_none")]
    pub enum_values: Option<Vec<String>>,
}

/// Tool choice mode for generation
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ToolChoice {
    /// Model decides whether to call tools (the default,
    /// required for #[serde(default)] on ToolCallRequest)
    #[default]
    Auto,

    /// Model must not call any tools
    None,

    /// Model must call at least one tool
    Required,

    /// Model must call this specific function
    Specific(String),
}
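
A quick sanity check on the wire format these derives produce (a sketch; the mapping of Specific to OpenAI's {"type": "function", ...} form, noted in the comments, is an assumption about the API layer rather than something implemented above):

use serde_json::json;

fn main() {
    // Unit variants serialize as bare strings, matching OpenAI's
    // "auto" / "none" / "required" tool_choice values
    assert_eq!(serde_json::to_value(ToolChoice::Auto).unwrap(), json!("auto"));
    assert_eq!(serde_json::to_value(ToolChoice::Required).unwrap(), json!("required"));

    // The Specific variant is externally tagged; an OpenAI-compatible API
    // layer would translate this into
    // {"type": "function", "function": {"name": "get_weather"}}
    assert_eq!(
        serde_json::to_value(ToolChoice::Specific("get_weather".into())).unwrap(),
        json!({"specific": "get_weather"})
    );
}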

Tool Call Request and Response

/// Request with tool calling support
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolCallRequest {
    /// User message/prompt
    pub messages: Vec<ChatMessage>,

    /// Available tools/functions
    #[serde(default)]
    pub tools: Vec<ToolDefinition>,

    /// Tool choice mode
    #[serde(default)]
    pub tool_choice: ToolChoice,

    /// Enable parallel tool calls (default: true)
    #[serde(default = "default_true")]
    pub parallel_tool_calls: bool,

    /// Standard generation parameters
    #[serde(flatten)]
    pub params: GenerateParams,
}

/// Tool call in model response
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolCall {
    /// Unique identifier for this tool call
    pub id: String,

    /// Type (always "function" for now)
    #[serde(rename = "type")]
    pub call_type: String,

    /// Function call details
    pub function: FunctionCall,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FunctionCall {
    /// Function name (must match a tool definition)
    pub name: String,

    /// JSON-encoded function arguments
    pub arguments: serde_json::Value,
}

/// Chat message with tool call support
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ChatMessage {
    /// Role: system, user, assistant, tool
    pub role: String,

    /// Text content
    #[serde(skip_serializing_if = "Option::is_none")]
    pub content: Option<String>,

    /// Tool calls (for assistant messages)
    #[serde(skip_serializing_if = "Option::is_none")]
    pub tool_calls: Option<Vec<ToolCall>>,

    /// Tool call ID (for tool result messages)
    #[serde(skip_serializing_if = "Option::is_none")]
    pub tool_call_id: Option<String>,
}

fn default_true() -> bool { true }
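
The Type Safety driver above calls for validating arguments against the declared schema before a tool is executed. A minimal sketch over the types defined here (only required keys and top-level types are checked; it returns Result<(), String> to stay self-contained, since this ADR does not pin down an error type):

/// Sketch: check tool call arguments against a ToolDefinition.
/// Returns Err with a human-readable message on the first mismatch.
pub fn validate_arguments(tool: &ToolDefinition, args: &serde_json::Value) -> Result<(), String> {
    let obj = args.as_object()
        .ok_or_else(|| "arguments must be a JSON object".to_string())?;

    // Every required parameter must be present
    for name in &tool.required {
        if !obj.contains_key(name) {
            return Err(format!("missing required parameter: {name}"));
        }
    }

    // Each supplied argument must match its declared top-level type
    if let Some(props) = &tool.parameters.properties {
        for (name, value) in obj {
            if let Some(schema) = props.get(name) {
                let ok = match schema.schema_type.as_str() {
                    "string" => value.is_string(),
                    "number" => value.is_number(),
                    "integer" => value.is_i64() || value.is_u64(),
                    "boolean" => value.is_boolean(),
                    "array" => value.is_array(),
                    "object" => value.is_object(),
                    _ => true, // unknown/extension types pass through
                };
                if !ok {
                    return Err(format!(
                        "parameter {name} does not match declared type {}",
                        schema.schema_type
                    ));
                }
            }
        }
    }

    Ok(())
}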

Chat Template Integration

Different models require different formatting for tool calling:

/// Chat template for tool calling
pub trait ToolCallingTemplate {
    /// Format messages with tool definitions
    fn format_with_tools(
        &self,
        messages: &[ChatMessage],
        tools: &[ToolDefinition],
        tool_choice: &ToolChoice,
    ) -> Result<String>;

    /// Parse tool calls from model output
    fn parse_tool_calls(&self, output: &str) -> Result<Vec<ToolCall>>;

    /// Check if model has native tool calling support
    fn has_native_support(&self) -> bool;
}

/// Llama 3.1+ tool calling format
pub struct Llama31ToolTemplate;

impl ToolCallingTemplate for Llama31ToolTemplate {
    fn format_with_tools(
        &self,
        messages: &[ChatMessage],
        tools: &[ToolDefinition],
        tool_choice: &ToolChoice,
    ) -> Result<String> {
        // Llama 3.1 puts tool definitions in the system message; the model
        // emits <|python_tag|>...<|eom_id|> when it decides to call a tool
        let mut prompt = String::new();

        // Add tool definitions
        prompt.push_str("<|start_header_id|>system<|end_header_id|>\n\n");
        prompt.push_str("Available tools:\n");
        for tool in tools {
            prompt.push_str(&serde_json::to_string_pretty(tool)?);
            prompt.push('\n');
        }
        prompt.push_str("<|eot_id|>\n");

        // Add conversation history (each completed turn ends with <|eot_id|>)
        for msg in messages {
            prompt.push_str(&format!(
                "<|start_header_id|>{}<|end_header_id|>\n\n{}<|eot_id|>\n",
                msg.role,
                msg.content.as_deref().unwrap_or("")
            ));
        }

        // Start assistant response
        prompt.push_str("<|start_header_id|>assistant<|end_header_id|>\n\n");

        Ok(prompt)
    }

    fn parse_tool_calls(&self, output: &str) -> Result<Vec<ToolCall>> {
        // Parse <|python_tag|>{"name": "...", "arguments": {...}}<|eom_id|>
        // (a possible implementation is sketched after this impl)
        todo!("Parse Llama 3.1 tool call format")
    }

    fn has_native_support(&self) -> bool { true }
}
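
// A possible implementation of the parsing elided above, assuming tool calls
// are emitted as <|python_tag|>{"name": ..., "arguments": ...}<|eom_id|> as
// sketched in the parse_tool_calls comment (this shape is an assumption, not
// a verified Llama 3.1 contract):
#[derive(Deserialize)]
struct RawLlamaCall {
    name: String,
    arguments: serde_json::Value,
}

fn parse_llama31_tool_calls(output: &str) -> Result<Vec<ToolCall>> {
    // (?s) lets . span newlines inside pretty-printed JSON
    let re = regex::Regex::new(r"(?s)<\|python_tag\|>(.*?)<\|eom_id\|>")?;

    let mut calls = Vec::new();
    for cap in re.captures_iter(output) {
        let raw: RawLlamaCall = serde_json::from_str(cap[1].trim())?;
        calls.push(ToolCall {
            id: uuid::Uuid::new_v4().to_string(),
            call_type: "function".to_string(),
            function: FunctionCall {
                name: raw.name,
                arguments: raw.arguments,
            },
        });
    }
    Ok(calls)
}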

/// Mistral tool calling format
pub struct MistralToolTemplate;

impl ToolCallingTemplate for MistralToolTemplate {
    fn format_with_tools(
        &self,
        messages: &[ChatMessage],
        tools: &[ToolDefinition],
        tool_choice: &ToolChoice,
    ) -> Result<String> {
        // Mistral uses [AVAILABLE_TOOLS] and [/AVAILABLE_TOOLS] markers
        let mut prompt = String::new();

        prompt.push_str("[AVAILABLE_TOOLS]\n");
        prompt.push_str(&serde_json::to_string(tools)?);
        prompt.push_str("\n[/AVAILABLE_TOOLS]\n\n");

        // Add conversation ([INST] markers wrap user turns only)
        for msg in messages {
            let content = msg.content.as_deref().unwrap_or("");
            if msg.role == "user" {
                prompt.push_str(&format!("[INST] {} [/INST]\n", content));
            } else {
                prompt.push_str(&format!("{}\n", content));
            }
        }

        Ok(prompt)
    }

    fn parse_tool_calls(&self, output: &str) -> Result<Vec<ToolCall>> {
        // Parse [TOOL_CALLS] ... [/TOOL_CALLS]
        todo!("Parse Mistral tool call format")
    }

    fn has_native_support(&self) -> bool { true }
}

/// Qwen tool calling format (implementation analogous to the templates above)
pub struct QwenToolTemplate;

/// Generic XML-based format for models without native support
pub struct GenericXmlToolTemplate;

impl ToolCallingTemplate for GenericXmlToolTemplate {
    fn format_with_tools(
        &self,
        messages: &[ChatMessage],
        tools: &[ToolDefinition],
        tool_choice: &ToolChoice,
    ) -> Result<String> {
        // Generic format using XML tags
        let mut prompt = String::from(
            "You have access to the following tools. To use a tool, respond with:\n\
             <tool_call>\n\
             <name>function_name</name>\n\
             <arguments>{\"arg1\": \"value1\"}</arguments>\n\
             </tool_call>\n\n"
        );

        prompt.push_str("Available tools:\n");
        for tool in tools {
            prompt.push_str(&format!("- {}: {}\n", tool.name, tool.description));
            prompt.push_str(&format!("  Parameters: {}\n",
                serde_json::to_string(&tool.parameters)?));
        }
        prompt.push_str("\n");

        // Add conversation
        for msg in messages {
            prompt.push_str(&format!("{}: {}\n", msg.role, msg.content.as_deref().unwrap_or("")));
        }

        Ok(prompt)
    }

    fn parse_tool_calls(&self, output: &str) -> Result<Vec<ToolCall>> {
        // Parse <tool_call>...</tool_call> blocks
        use regex::Regex;

        let re = Regex::new(
            r"<tool_call>\s*<name>([^<]+)</name>\s*<arguments>([^<]+)</arguments>\s*</tool_call>"
        )?;

        let mut calls = Vec::new();
        for cap in re.captures_iter(output) {
            calls.push(ToolCall {
                id: uuid::Uuid::new_v4().to_string(),
                call_type: "function".to_string(),
                function: FunctionCall {
                    name: cap[1].to_string(),
                    arguments: serde_json::from_str(&cap[2])?,
                },
            });
        }

        Ok(calls)
    }

    fn has_native_support(&self) -> bool { false }
}
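
To tie the templates back to the hybrid decision, a dispatch sketch (matching on the model identifier string is a simplification; production code would inspect model metadata, and Qwen is omitted because its template is not fleshed out above):

/// Pick a tool calling template for a model, falling back to the generic
/// XML format. Models routed to the fallback should also go through the
/// constrained generation path described in the next section.
pub fn select_template(model_id: &str) -> Box<dyn ToolCallingTemplate> {
    let id = model_id.to_lowercase();
    if id.contains("llama-3.1") {
        Box::new(Llama31ToolTemplate)
    } else if id.contains("mistral") {
        Box::new(MistralToolTemplate)
    } else {
        // No known native tool format: generic prompt + grammar constraints
        Box::new(GenericXmlToolTemplate)
    }
}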

Constrained Generation Engine

For guaranteed valid JSON output, implement constrained decoding:

use serde_json::Value as JsonValue;

/// Constrained generation for tool calls
pub struct ConstrainedToolGenerator {
    /// JSON Schema grammar compiler
    grammar_compiler: GrammarCompiler,

    /// Logit processor for constraint enforcement
    logit_processor: LogitProcessor,
}

impl ConstrainedToolGenerator {
    /// Generate tool calls with grammar constraints
    pub fn generate_tool_calls(
        &self,
        model: &dyn LlmBackend,
        prompt: &str,
        tools: &[ToolDefinition],
        params: GenerateParams,
    ) -> Result<Vec<ToolCall>> {
        // Compile JSON Schema to GBNF grammar
        let grammar = self.compile_tool_grammar(tools)?;

        // Generate with logit masking to enforce grammar
        let output = model.generate_constrained(prompt, &grammar, params)?;

        // Parse guaranteed-valid JSON
        let calls: Vec<ToolCall> = serde_json::from_str(&output)?;

        Ok(calls)
    }

    /// Compile JSON Schema into GBNF grammar
    fn compile_tool_grammar(&self, tools: &[ToolDefinition]) -> Result<Grammar> {
        // Build grammar that only allows valid tool calls
        // Example: tool_call ::= "{" ws "\"name\"" ws ":" ws name ws "," ws "\"arguments\"" ws ":" ws arguments ws "}"
        // name ::= "\"tool1\"" | "\"tool2\"" | ...
        // arguments ::= { schema-specific grammar }

        self.grammar_compiler.compile_tool_schema(tools)
    }
}

/// GBNF (GGML BNF) grammar for constrained generation
#[derive(Debug, Clone)]
pub struct Grammar {
    /// Grammar rules in GBNF format
    pub rules: String,
}
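
// Illustrative only: roughly the GBNF that compile_tool_grammar might emit
// for a single get_weather(location: string) tool, following the rule sketch
// in the comments above (rule names and whitespace handling are assumptions):
const EXAMPLE_GET_WEATHER_GRAMMAR: &str = r#"
root      ::= tool-call
tool-call ::= "{" ws "\"name\"" ws ":" ws name ws "," ws "\"arguments\"" ws ":" ws args ws "}"
name      ::= "\"get_weather\""
args      ::= "{" ws "\"location\"" ws ":" ws string ws "}"
string    ::= "\"" [^"]* "\""
ws        ::= [ \t\n]*
"#;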

/// Logit processor for grammar enforcement
pub struct LogitProcessor {
    /// Current parse state
    state: ParseState,
}

impl LogitProcessor {
    /// Mask logits to only allow valid next tokens
    pub fn process_logits(
        &mut self,
        logits: &mut [f32],
        grammar: &Grammar,
        tokenizer: &Tokenizer,
    ) -> Result<()> {
        // Get valid next tokens from grammar state
        let valid_tokens = self.state.get_valid_next_tokens(grammar)?;

        // Mask out invalid tokens (set logit to -inf)
        for (token_id, logit) in logits.iter_mut().enumerate() {
            if !valid_tokens.contains(&(token_id as u32)) {
                *logit = f32::NEG_INFINITY;
            }
        }

        Ok(())
    }
}

#[derive(Debug)]
struct ParseState {
    /// Current position in grammar
    position: usize,

    /// Parse stack for nested structures
    stack: Vec<String>,
}
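
For concreteness, a sketch of where the processor sits in a decoding loop (generate_logits, sample, and is_eos are assumed helper methods, not APIs this ADR defines):

fn decode_constrained(
    model: &dyn LlmBackend,
    processor: &mut LogitProcessor,
    grammar: &Grammar,
    tokenizer: &Tokenizer,
    max_tokens: usize,
) -> Result<Vec<u32>> {
    let mut tokens = Vec::new();

    for _ in 0..max_tokens {
        // Raw next-token distribution from the model (assumed method)
        let mut logits = model.generate_logits(&tokens)?;

        // Mask (set to -inf) everything the grammar disallows at this position
        processor.process_logits(&mut logits, grammar, tokenizer)?;

        // Sample among the surviving tokens (assumed method)
        let next = model.sample(&logits)?;
        tokens.push(next);

        if tokenizer.is_eos(next) {
            break;
        }
    }

    Ok(tokens)
}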

Multi-Turn Tool Conversations

Support iterative tool use:

/// Multi-turn conversation with tool calls
pub struct ToolConversation {
    /// Conversation history
    messages: Vec<ChatMessage>,

    /// Available tools
    tools: Vec<ToolDefinition>,

    /// Backend for generation
    backend: Box<dyn LlmBackend>,
}

impl ToolConversation {
    /// Add user message and generate response (may include tool calls)
    pub fn send_message(&mut self, content: &str) -> Result<ConversationTurn> {
        // Add user message
        self.messages.push(ChatMessage {
            role: "user".to_string(),
            content: Some(content.to_string()),
            tool_calls: None,
            tool_call_id: None,
        });

        self.generate_turn()
    }

    /// Submit tool results and continue conversation
    pub fn submit_tool_results(&mut self, results: Vec<ToolResult>) -> Result<ConversationTurn> {
        // Add tool result messages
        for result in results {
            self.messages.push(ChatMessage {
                role: "tool".to_string(),
                content: Some(result.output),
                tool_calls: None,
                tool_call_id: Some(result.tool_call_id),
            });
        }

        // Generate the next turn directly, without injecting an empty user message
        self.generate_turn()
    }

    /// Run one generation step over the current history
    fn generate_turn(&mut self) -> Result<ConversationTurn> {
        let request = ToolCallRequest {
            messages: self.messages.clone(),
            tools: self.tools.clone(),
            tool_choice: ToolChoice::Auto,
            parallel_tool_calls: true,
            params: GenerateParams::default(),
        };

        let response = self.backend.generate_with_tools(request)?;

        // Add assistant response to history
        self.messages.push(ChatMessage {
            role: "assistant".to_string(),
            content: response.content.clone(),
            tool_calls: response.tool_calls.clone(),
            tool_call_id: None,
        });

        Ok(ConversationTurn {
            content: response.content,
            tool_calls: response.tool_calls,
        })
    }
}

#[derive(Debug, Clone)]
pub struct ConversationTurn {
    /// Text content
    pub content: Option<String>,

    /// Tool calls (if any)
    pub tool_calls: Option<Vec<ToolCall>>,
}

#[derive(Debug, Clone)]
pub struct ToolResult {
    /// Tool call ID this result corresponds to
    pub tool_call_id: String,

    /// Tool output (JSON or text)
    pub output: String,
}
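
Putting the pieces together, a typical driver loop over ToolConversation (the executor map is application code; the expect on unknown tool names is for brevity only, where production code would return an error):

use std::collections::HashMap;

type ToolExecutor = Box<dyn Fn(serde_json::Value) -> Result<String>>;

/// Drive a conversation until the model stops requesting tools.
fn run_agent_loop(
    conv: &mut ToolConversation,
    executors: &HashMap<String, ToolExecutor>,
    user_message: &str,
) -> Result<Option<String>> {
    let mut turn = conv.send_message(user_message)?;

    // Execute requested tools and feed results back until a plain answer
    while let Some(calls) = turn.tool_calls.take() {
        if calls.is_empty() {
            break;
        }
        let mut results = Vec::new();
        for call in calls {
            // A production loop would surface unknown tool names as errors
            let executor = executors
                .get(&call.function.name)
                .expect("unknown tool name");
            results.push(ToolResult {
                tool_call_id: call.id,
                output: executor(call.function.arguments)?,
            });
        }
        turn = conv.submit_tool_results(results)?;
    }

    Ok(turn.content)
}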

Implementation Plan

Phase 1: Core Infrastructure (Week 1-2)

  1. Define Tool Schema Types

    • Implement ToolDefinition, ToolCall, ToolChoice types
    • Add JSON Schema validation
    • Create builder APIs for ergonomic tool definitions
  2. Chat Template Integration

    • Implement ToolCallingTemplate trait
    • Add Llama 3.1, Mistral, Qwen templates
    • Create generic XML fallback template
  3. Request/Response API

    • Extend LlmBackend with generate_with_tools method
    • Add tool call parsing logic
    • Implement OpenAI-compatible API surface

Deliverables:

// User-facing API
let tools = vec![
    ToolDefinition::new("get_weather")
        .description("Get current weather for a location")
        .parameter("location", JsonSchema::string())
        .parameter("units", JsonSchema::enum_values(&["celsius", "fahrenheit"]))
        .required(&["location"])
];

let request = ToolCallRequest {
    messages: vec![
        ChatMessage::user("What's the weather in San Francisco?")
    ],
    tools,
    tool_choice: ToolChoice::Auto,
    parallel_tool_calls: true,
    params: GenerateParams::default(),
};

let response = backend.generate_with_tools(request)?;
for call in response.tool_calls.unwrap_or_default() {
    println!("Tool: {}, Args: {}", call.function.name, call.function.arguments);
}

Phase 2: Constrained Generation (Week 3-4)

  1. Grammar Compiler

    • Implement JSON Schema to GBNF compiler
    • Support nested objects, arrays, enums
    • Add grammar caching for performance
  2. Logit Processor

    • Implement parse state machine
    • Add logit masking for valid tokens
    • Optimize for streaming generation
  3. Integration

    • Wire constrained generation to LlmBackend
    • Add fallback logic (native template → constrained generation)
    • Benchmark performance impact

Deliverables:

// Constrained generation ensures valid JSON
let generator = ConstrainedToolGenerator::new();
let calls = generator.generate_tool_calls(
    &backend,
    &prompt,
    &tools,
    params,
)?;

// Guaranteed to parse successfully
assert!(calls.iter().all(|c| tools.iter().any(|t| t.name == c.function.name)));

Phase 3: Multi-Turn Conversations (Week 5-6)

  1. Conversation Manager

    • Implement ToolConversation for stateful interactions
    • Add automatic tool result integration
    • Support parallel tool call orchestration
  2. Agent Framework Integration

    • LangChain adapter
    • LlamaIndex integration
    • CrewAI support
  3. Examples and Documentation

    • Multi-turn conversation examples
    • Agent framework integration guides
    • Performance tuning documentation

Deliverables:

// Multi-turn conversation with tool use
let mut conv = ToolConversation::new(backend, tools);

let turn1 = conv.send_message("Book a flight to NYC")?;
// Model calls search_flights(destination="NYC")

let calls = turn1.tool_calls.expect("expected a tool call");
let results = vec![ToolResult {
    tool_call_id: calls[0].id.clone(),
    output: r#"{"flights": [{"price": 250, "time": "10am"}]}"#.to_string(),
}];

let turn2 = conv.submit_tool_results(results)?;
// Model responds with flight options

Compatibility Matrix

API Compatibility

| API Style | RuvLLM Support | Notes |
|---|---|---|
| OpenAI Function Calling | ✅ Full | Drop-in replacement for functions and tools parameters |
| Anthropic Tool Use | ✅ Full | Map tool_use blocks to OpenAI format |
| LangChain Tools | ✅ Full | Direct integration via BaseTool adapter |
| LlamaIndex Tools | ✅ Full | Implement BaseToolSpec interface |
| CrewAI Tools | ✅ Full | Compatible with Tool decorator |
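
For reference, the request shape the OpenAI compatibility layer would accept (sketched with serde_json; the model name is a placeholder, and field names follow OpenAI's published tools format):

use serde_json::json;

let openai_style_request = json!({
    "model": "llama-3.1-8b-instruct",  // placeholder model id
    "messages": [
        {"role": "user", "content": "What's the weather in San Francisco?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }],
    "tool_choice": "auto",
    "parallel_tool_calls": true
});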

Model Support

| Model Family | Native Support | Template | Constrained Fallback |
|---|---|---|---|
| Llama 3.1+ | ✅ Yes | Llama31ToolTemplate | ✅ |
| Llama 3.0 and earlier | ❌ No | GenericXmlToolTemplate | ✅ |
| Mistral 7B+ | ✅ Yes | MistralToolTemplate | ✅ |
| Qwen 2.5+ | ✅ Yes | QwenToolTemplate | ✅ |
| CodeLlama | ❌ No | GenericXmlToolTemplate | ✅ |
| Custom Models | ❌ No | GenericXmlToolTemplate | ✅ |

Framework Integration

// LangChain integration example
use langchain_rs::{Tool, ToolInput, ToolOutput};

struct RuvLlmTool {
    definition: ToolDefinition,
    executor: Box<dyn Fn(JsonValue) -> Result<String>>,
}

impl Tool for RuvLlmTool {
    fn name(&self) -> &str {
        &self.definition.name
    }

    fn description(&self) -> &str {
        &self.definition.description
    }

    fn run(&self, input: ToolInput) -> Result<ToolOutput> {
        let args = serde_json::to_value(input)?;
        let output = (self.executor)(args)?;
        Ok(ToolOutput::Text(output))
    }
}

Performance Characteristics

Latency Overhead

| Component | Latency | Notes |
|---|---|---|
| Tool schema compilation | <1ms | Cached after first use |
| Grammar compilation | 5-10ms | Cached per tool set |
| Logit processing (per token) | <0.1ms | Minimal impact on generation |
| JSON parsing | <1ms | Standard serde_json |
| Total overhead | <10ms | Amortized across conversation |

Memory Overhead

| Component | Memory | Notes |
|---|---|---|
| Tool definitions | ~1KB per tool | Scales with number of tools |
| Grammar cache | ~10KB per tool set | One-time cost |
| Parse state | ~1KB per request | Freed after generation |
| Total overhead | ~10KB + 1KB/tool | Negligible for typical use |

Throughput Comparison

| Method | Tools/sec | Reliability | Use Case |
|---|---|---|---|
| Prompt engineering only | 1000+ | 70-80% | Development/testing |
| Chat template (native) | 800-1000 | 90-95% | Production (supported models) |
| Constrained generation | 200-500 | 99.9%+ | Production (all models), critical systems |

Consequences

Positive Consequences

  1. Agent Framework Integration: Direct compatibility with LangChain, LlamaIndex, CrewAI enables rich agent ecosystems
  2. Type Safety: JSON Schema validation prevents invalid tool calls at generation time
  3. Reliability: Constrained generation guarantees valid outputs for production systems
  4. OpenAI Compatibility: Drop-in replacement for OpenAI API reduces migration friction
  5. Multi-Modal Agents: Foundation for RAG, web search, database access, API integration
  6. Parallel Execution: Multiple tool calls enable efficient multi-step reasoning

Negative Consequences

  1. Complexity: Grammar compilation and constrained generation add implementation complexity
  2. Performance Impact: Logit processing adds 5-10% latency for constrained generation
  3. Model Requirements: Best performance requires models with native tool calling support
  4. Testing Burden: Must validate across multiple model families and templates

Neutral Consequences

  1. Template Maintenance: Each new model family may require new chat template
  2. Schema Limitations: Complex schemas (recursive types, unions) may be challenging to constrain
  3. Backward Compatibility: Existing text generation API unchanged, tool calling is additive

Risk Mitigation

| Risk | Mitigation |
|---|---|
| Invalid JSON output | Constrained generation with grammar enforcement |
| Template incompatibility | Generic XML fallback for unsupported models |
| Performance regression | Benchmark suite, caching, optional constrained mode |
| Schema complexity | Comprehensive test suite with edge cases |
| Framework API changes | Version pinning, adapter pattern for isolation |

Alternatives Considered

Text Parsing Only (Rejected)

Use prompt engineering with regex/JSON parsing.

  • Rejected: Unreliable for production; 20-30% failure rate for complex schemas
  • Consideration: Useful for prototyping and development

Python Backend (vLLM, Outlines) (Rejected)

Integrate vLLM or Outlines Python libraries via FFI.

  • Rejected: Cross-language complexity, deployment burden, latency overhead
  • Consideration: Reference implementation for grammar compilation logic

Custom DSL for Tool Definitions (Rejected)

Create a Rust macro-based DSL for tool definitions.

  • Rejected: JSON Schema is industry standard, better tooling support
  • Consideration: Could add as syntactic sugar on top of JSON Schema

Related Decisions

  • ADR-002: RuvLLM Integration with Ruvector (foundation for tool-enhanced RAG)
  • ADR-008: mistral-rs Integration (backend for high-performance tool calling)
  • ADR-009: Streaming Architecture (streaming tool calls in progress)

References

  1. OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling

    • Industry-standard API for tool use
    • functions parameter (deprecated) and tools parameter
    • Parallel tool calls and tool choice modes
  2. Anthropic Tool Use: https://docs.anthropic.com/claude/docs/tool-use

    • Alternative API design with tool_use blocks
    • Computer use (bash, editor) as specialized tools
    • Multi-step tool orchestration patterns
  3. LangChain Tool Documentation: https://python.langchain.com/docs/modules/agents/tools/

    • Agent framework integration patterns
    • BaseTool interface and tool decorators
    • Tool result schemas
  4. LlamaIndex Tools: https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/

    • BaseToolSpec interface
    • Function tools and query engine tools
  5. Constrained Decoding:

  6. Model-Specific Tool Formats:


Implementation Status

| Component | Status | Notes |
|---|---|---|
| Tool schema types | Pending | Define ToolDefinition, ToolCall, ToolChoice |
| JSON Schema validation | Pending | Integrate schemars crate |
| Chat templates | Pending | Llama 3.1, Mistral, Qwen, Generic XML |
| Request/Response API | Pending | generate_with_tools method on LlmBackend |
| Grammar compiler | Pending | JSON Schema → GBNF compiler |
| Logit processor | Pending | Parse state machine and masking logic |
| Constrained generation | Pending | Integration with backend |
| Multi-turn conversations | Pending | ToolConversation manager |
| LangChain integration | Pending | BaseTool adapter |
| LlamaIndex integration | Pending | BaseToolSpec implementation |
| CrewAI support | Pending | Tool decorator compatibility |
| OpenAI API compatibility | Pending | /v1/chat/completions endpoint |
| Anthropic format mapping | Pending | tool_use block conversion |
| Streaming tool calls | Pending | Stream partial JSON as generated |
| Parallel tool execution | Pending | Concurrent tool call orchestration |
| Documentation | Pending | API docs, examples, integration guides |

Revision History

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-20 | Ruvector Architecture Team | Initial proposal |