GitHub - mentat-ai/mentat: open source language model in rust

    __  ___           __        __ 
   /  |/  /__  ____  / /_____ _/ /_
  / /|_/ / _ \/ __ \/ __/ __ `/ __/
 / /  / /  __/ / / / /_/ /_/ / /_  
/_/  /_/\___/_/ /_/\__/\__,_/\__/

A Sovereign, Rust-Native Inference Engine for High-Performance Reasoning Models.

🌟 Vision

Mentat is a completely independent, fast, and secure platform for running AI models locally. Built from the ground up in Rust, it extracts maximum performance from consumer hardware while keeping data on the user's own machine, ensuring data privacy and architectural sovereignty.

🗺️ Architectural Overview

```mermaid
graph TD
    subgraph Input ["Input Layer"]
        Prompt["User Prompt"]
        Weights["Safetensors (.bin / .safetensors)"]
    end

    subgraph Core ["Mentat Inference Engine (Rust)"]
        direction TB
        Loader["Weight Loader (mmap)"]
        Tokenizer["BPE Tokenizer"]
        Parser["Harmony Parser"]

        subgraph Brain ["Transformer Model"]
            direction LR
            Attn["Attention (GQA)"]
            MoE["Mixture of Experts"]
            KV["KV Cache"]
        end

        Math["Tensor Ops (MatMul, Add, Mul)"]
    end

    subgraph Tools ["Agentic Layer (Phase 4)"]
        Python["Python (WASM Sandbox)"]
        Browser["Headless Browser"]
        FS["File Patcher"]
    end

    subgraph Hardware ["Hardware Acceleration (Phase 6)"]
        Metal["Apple Metal"]
        CUDA["NVIDIA CUDA"]
    end

    Prompt --> Tokenizer
    Weights --> Loader
    Loader --> Math
    Tokenizer --> Brain
    Brain --> Math
    Math --> Attn
    Math --> MoE
    Attn --> KV
    Brain --> Parser
    Parser --> Python
    Parser --> Browser
    Parser --> FS
    Math -.-> Hardware
```

🏗️ Technical Architecture

Mentat is designed with a modular, "purity-first" approach, separating the mathematical engine from the agentic capabilities.

1. The Core Engine (`src/tensor`)

The foundation of Mentat is a custom tensor library implemented in pure Rust.

Tensor Ops: Efficient implementation of MatMul, Add, and Mul.
Memory Management: Leverages memmap2 for zero-copy weight loading. Weight files are memory-mapped, so the OS pages tensor data in on demand instead of copying multi-gigabyte models into heap memory.
Precision Support: Native support for F32, F16, and BF16 (bfloat16, "brain floating point"), ensuring compatibility with modern models like Llama 3 and GPT-OSS.
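The tensor operations and BF16 support can be illustrated with a minimal sketch. It assumes a hypothetical `Tensor` type with row-major `f32` storage, and the names are illustrative rather than Mentat's actual `src/tensor` API. In a real loader, the BF16 byte slice would come from a `memmap2` mapping of a `.safetensors` file rather than from an array literal.

```rust
/// Hypothetical minimal 2-D tensor: row-major f32 storage plus a shape.
#[derive(Debug, Clone, PartialEq)]
struct Tensor {
    data: Vec<f32>,
    rows: usize,
    cols: usize,
}

impl Tensor {
    fn new(data: Vec<f32>, rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must match shape");
        Tensor { data, rows, cols }
    }

    /// Decode little-endian BF16 bytes (e.g. a slice of a memory-mapped file)
    /// into f32. BF16 is the upper 16 bits of an f32, so widening is a shift.
    fn from_bf16_le(bytes: &[u8], rows: usize, cols: usize) -> Self {
        assert_eq!(bytes.len(), rows * cols * 2, "expected 2 bytes per element");
        let data = bytes
            .chunks_exact(2)
            .map(|b| f32::from_bits((u16::from_le_bytes([b[0], b[1]]) as u32) << 16))
            .collect();
        Tensor::new(data, rows, cols)
    }

    /// Element-wise binary op shared by `add` and `mul` (shapes must match).
    fn zip_with(&self, other: &Tensor, f: impl Fn(f32, f32) -> f32) -> Tensor {
        assert_eq!((self.rows, self.cols), (other.rows, other.cols), "shape mismatch");
        let data = self.data.iter().zip(&other.data).map(|(&a, &b)| f(a, b)).collect();
        Tensor::new(data, self.rows, self.cols)
    }

    fn add(&self, other: &Tensor) -> Tensor {
        self.zip_with(other, |a, b| a + b)
    }

    fn mul(&self, other: &Tensor) -> Tensor {
        self.zip_with(other, |a, b| a * b)
    }

    /// Naive (m x k) @ (k x n) matrix multiply. The i-p-j loop order keeps the
    /// innermost loop walking contiguous memory in both `other` and `out`.
    fn matmul(&self, other: &Tensor) -> Tensor {
        assert_eq!(self.cols, other.rows, "inner dimensions must match");
        let (m, k, n) = (self.rows, self.cols, other.cols);
        let mut out = vec![0.0f32; m * n];
        for i in 0..m {
            for p in 0..k {
                let a = self.data[i * k + p];
                for j in 0..n {
                    out[i * n + j] += a * other.data[p * n + j];
                }
            }
        }
        Tensor::new(out, m, n)
    }
}

fn main() {
    // 2x2 matrix [[1, 2], [3, 4]] stored as little-endian BF16 (0x3F80, 0x4000, 0x4040, 0x4080).
    let bytes = [0x80u8, 0x3F, 0x00, 0x40, 0x40, 0x40, 0x80, 0x40];
    let a = Tensor::from_bf16_le(&bytes, 2, 2);
    println!("{:?}", a.data); // [1.0, 2.0, 3.0, 4.0]
    println!("{:?}", a.matmul(&a).data); // [7.0, 10.0, 15.0, 22.0]
    println!("{:?}", a.add(&a).data); // [2.0, 4.0, 6.0, 8.0]
    println!("{:?}", a.mul(&a).data); // [1.0, 4.0, 9.0, 16.0]
}
```

Widening BF16 to F32 is exact because a BF16 value is simply the upper half of an F32 bit pattern. The loop order in `matmul` is a common first optimization before tiling or SIMD.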

2. The Neural Brain (`src/model`)

A sovereign implementation of the Transformer architecture:

Transformer Blocks: Modular blocks featuring RMSNorm (Root Mean Square Normalization) for stability.
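Grouped-Query Attention (GQA): Several query heads share each key/value head. Compared with standard multi-head attention, this shrinks the KV cache and the memory traffic per token, speeding up context processing.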
Mixture of Experts (MoE): Implementation of Gated Routing logic, enabling massive models to run efficiently by activating only a subset of parameters (Experts) per token.
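KV Cache: Caches the Key and Value vectors of previously processed tokens, so each new token only computes its own keys and values. This avoids re-processing the whole prefix at every step, although the attention cost per generated token still grows linearly with sequence length.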
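A single decoding step through these components can be sketched in plain Rust. This is an illustration under several assumptions:

- `rms_norm`, `route_top_k` and `KvCache` are hypothetical names.
- Activations are flat `Vec<f32>`s.
- Routing uses the common "top-k, then softmax over the selected logits" scheme, which may differ from Mentat's gating.

```rust
/// RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * w_i (no mean subtraction, no bias).
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// Numerically stable softmax (subtracts the max before exponentiating).
fn softmax(x: &[f32]) -> Vec<f32> {
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// MoE gating (assumed scheme): keep the k experts with the highest router
/// logits and softmax over just those k logits to get mixing weights.
fn route_top_k(router_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut idx: Vec<usize> = (0..router_logits.len()).collect();
    idx.sort_by(|&a, &b| router_logits[b].total_cmp(&router_logits[a]));
    idx.truncate(k);
    let chosen: Vec<f32> = idx.iter().map(|&i| router_logits[i]).collect();
    idx.into_iter().zip(softmax(&chosen)).collect()
}

/// Per-layer KV cache: keys[h] / values[h] hold head_dim floats per cached position.
struct KvCache {
    n_kv_heads: usize,
    head_dim: usize,
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

impl KvCache {
    fn new(n_kv_heads: usize, head_dim: usize) -> Self {
        KvCache { n_kv_heads, head_dim, keys: vec![Vec::new(); n_kv_heads], values: vec![Vec::new(); n_kv_heads] }
    }

    /// Number of cached positions.
    fn len(&self) -> usize {
        self.keys[0].len() / self.head_dim
    }

    /// Store this token's K and V (laid out [kv_head][head_dim]) once; later steps reuse them.
    fn push(&mut self, k: &[f32], v: &[f32]) {
        for h in 0..self.n_kv_heads {
            let s = h * self.head_dim..(h + 1) * self.head_dim;
            self.keys[h].extend_from_slice(&k[s.clone()]);
            self.values[h].extend_from_slice(&v[s]);
        }
    }

    /// GQA attention for one new token: query head h reads KV head h / group.
    fn attend(&self, q: &[f32], n_q_heads: usize) -> Vec<f32> {
        let d = self.head_dim;
        let group = n_q_heads / self.n_kv_heads;
        let scale = 1.0 / (d as f32).sqrt();
        let mut out = vec![0.0f32; n_q_heads * d];
        for h in 0..n_q_heads {
            let kv = h / group;
            let qh = &q[h * d..(h + 1) * d];
            // One score per cached position: cost is linear in cache length.
            let scores: Vec<f32> = self.keys[kv]
                .chunks_exact(d)
                .map(|k| qh.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();
            let probs = softmax(&scores);
            for (p, v) in probs.iter().zip(self.values[kv].chunks_exact(d)) {
                for (o, x) in out[h * d..(h + 1) * d].iter_mut().zip(v) {
                    *o += p * x;
                }
            }
        }
        out
    }
}

fn main() {
    println!("{:?}", rms_norm(&[3.0, 4.0], &[1.0, 1.0], 1e-6));
    println!("{:?}", route_top_k(&[0.1, 2.0, -1.0, 1.0], 2)); // experts 1 and 3
    // 4 query heads share 2 KV heads (group size 2), head_dim = 2.
    let mut cache = KvCache::new(2, 2);
    cache.push(&[1.0, 0.0, 0.0, 1.0], &[1.0, 2.0, 3.0, 4.0]);
    cache.push(&[0.0, 1.0, 1.0, 0.0], &[5.0, 6.0, 7.0, 8.0]);
    println!("cached positions: {}", cache.len());
    println!("{:?}", cache.attend(&[1.0f32; 8], 4));
}
```

`attend` computes one score per cached position, so per-token cost grows with context length. GQA reduces how many K/V vectors are stored, not how many positions are scanned.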

3. The Communication Layer (`src/tokenizer`)

BPE Tokenizer: A high-performance Byte Pair Encoding implementation for text-to-ID conversion.
Harmony Parser: A specialized parser for structured outputs that extracts reasoning chains (<think>) and agentic tool calls (<python>, <browser>) from the model's output stream as it is generated.
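Both components can be sketched in a few dozen lines, with some simplifications:

- `bpe_encode` assumes character-level starting units and a ranked merge list. Production byte-level BPE tokenizers start from raw bytes and then map the merged pieces to vocabulary IDs.
- `parse_tags` parses a complete string. A streaming parser would buffer text until the matching closing tag arrives.
- `bpe_encode`, `parse_tags` and `Segment` are hypothetical names.

```rust
/// Apply ranked BPE merges to one word: repeatedly merge the adjacent pair
/// with the lowest rank (earliest in `merges`) until no listed pair remains.
fn bpe_encode(word: &str, merges: &[(&str, &str)]) -> Vec<String> {
    let mut parts: Vec<String> = word.chars().map(|c| c.to_string()).collect();
    loop {
        // (rank, index) of the best-ranked adjacent pair; a linear scan of
        // `merges` keeps the sketch short (real tokenizers use a hash map).
        let best = parts
            .windows(2)
            .enumerate()
            .filter_map(|(i, w)| {
                merges
                    .iter()
                    .position(|&(a, b)| a == w[0] && b == w[1])
                    .map(|rank| (rank, i))
            })
            .min();
        match best {
            Some((_, i)) => {
                let right = parts.remove(i + 1);
                parts[i].push_str(&right);
            }
            None => return parts,
        }
    }
}

/// A piece of model output: plain text or the body of a recognised tag.
#[derive(Debug, PartialEq)]
enum Segment {
    Text(String),
    Tagged { tag: String, body: String },
}

/// Split `output` into plain text and `<tag>...</tag>` blocks for known tags.
fn parse_tags(output: &str, known: &[&str]) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut rest = output;
    loop {
        // Earliest opening tag among the known tag names.
        let next = known
            .iter()
            .filter_map(|&t| rest.find(&format!("<{t}>")).map(|pos| (pos, t)))
            .min();
        let Some((pos, tag)) = next else { break };
        let (open, close) = (format!("<{tag}>"), format!("</{tag}>"));
        let body_start = pos + open.len();
        // Unterminated tag: stop and keep the remainder as plain text.
        let Some(body_len) = rest[body_start..].find(&close) else { break };
        if !rest[..pos].trim().is_empty() {
            segments.push(Segment::Text(rest[..pos].to_string()));
        }
        let body = &rest[body_start..body_start + body_len];
        segments.push(Segment::Tagged { tag: tag.to_string(), body: body.to_string() });
        rest = &rest[body_start + body_len + close.len()..];
    }
    if !rest.trim().is_empty() {
        segments.push(Segment::Text(rest.to_string()));
    }
    segments
}

fn main() {
    // "lower" -> l o w e r -> lo w e r -> low e r -> low er
    println!("{:?}", bpe_encode("lower", &[("l", "o"), ("lo", "w"), ("e", "r")]));
    let out = "<think>Reasoning...</think> <python>print(1)</python>";
    for seg in parse_tags(out, &["think", "python", "browser"]) {
        println!("{seg:?}");
    }
}
```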

🗺️ Roadmap & Future Implementations

Phase 4: Agentic Tools (Current Focus)

Secure Sandbox: A WASM-based or Docker-isolated environment for executing model-generated Python code.
Sovereign Browser: A headless navigation tool for real-time web research.
Atomic File Patcher: Safe filesystem operations for direct codebase modifications.
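Atomic file updates are commonly implemented by writing to a temporary file next to the target and then renaming it over the original. Readers therefore never observe a half-written file.

Below is a minimal sketch using only the standard library. `atomic_write`, `patch_file` and the `.mentat-tmp` suffix are hypothetical. A production version would also use unique temporary names and sync the parent directory.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `path`'s contents atomically: write a sibling temp file, flush it to
/// disk, then rename it over the target (rename is atomic on POSIX within one filesystem).
fn atomic_write(path: &Path, contents: &[u8]) -> io::Result<()> {
    let dir = path.parent().unwrap_or_else(|| Path::new("."));
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".mentat-tmp");
    let tmp_path = dir.join(tmp_name);
    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(contents)?;
        tmp.sync_all()?; // data must reach disk before the rename publishes it
    }
    fs::rename(&tmp_path, path)
}

/// Hypothetical patch step: replace exactly one occurrence of `old` with `new`,
/// refusing ambiguous or missing matches instead of guessing.
fn patch_file(path: &Path, old: &str, new: &str) -> io::Result<()> {
    let text = fs::read_to_string(path)?;
    if text.matches(old).count() != 1 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "expected exactly one match"));
    }
    atomic_write(path, text.replacen(old, new, 1).as_bytes())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("mentat_patch_demo.rs");
    fs::write(&path, "fn main() { println!(\"old\"); }\n")?;
    patch_file(&path, "\"old\"", "\"new\"")?;
    print!("{}", fs::read_to_string(&path)?);
    fs::remove_file(&path)
}
```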

Phase 5: Distribution & APIs

OpenAI-Compatible API: A local HTTP server that acts as a drop-in replacement for OpenAI endpoints.
Static Binaries: Ensuring Mentat can be distributed as a single, dependency-free executable for Mac, Linux, and Windows.

Phase 6: Hardware Acceleration & Performance

Apple Metal Support: Native GPU acceleration for Apple Silicon via metal-rs.
CUDA Integration: High-performance kernels for NVIDIA hardware via cudarc.
Deep Benchmarking: Built-in performance and memory profiling using criterion and dhat.

Phase 7: Sovereignty & Local Fine-Tuning

Data Collection Pipelines: Local, opt-in privacy-first data recording.
Native LoRA: Implementation of Low-Rank Adaptation to allow users to adapt models to their own data locally without Python.
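LoRA freezes a weight matrix W (d_out x d_in) and learns a low-rank update, giving W' = W + (alpha / r) * B A. Here B has shape d_out x r, A has shape r x d_in, and r is much smaller than either dimension.

The sketch below shows only the adapted forward pass and assumes row-major `f32` storage. `LoraLinear` is a hypothetical name, and training (computing gradients for A and B) is omitted.

```rust
/// Row-major matrix-vector product: `m` is rows x cols, `x` has `cols` entries.
fn matvec(m: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(m.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| m[r * cols..(r + 1) * cols].iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Hypothetical LoRA-adapted linear layer: frozen W (d_out x d_in) plus a
/// trainable low-rank update B (d_out x rank) * A (rank x d_in), scaled by alpha / rank.
struct LoraLinear {
    w: Vec<f32>,
    a: Vec<f32>,
    b: Vec<f32>,
    d_in: usize,
    d_out: usize,
    rank: usize,
    alpha: f32,
}

impl LoraLinear {
    /// B starts at zero, so a fresh adapter leaves the base layer's output unchanged.
    fn new(w: Vec<f32>, a: Vec<f32>, d_in: usize, d_out: usize, rank: usize, alpha: f32) -> Self {
        let b = vec![0.0; d_out * rank];
        LoraLinear { w, a, b, d_in, d_out, rank, alpha }
    }

    /// y = W x + (alpha / rank) * B (A x)
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        let base = matvec(&self.w, self.d_out, self.d_in, x);
        let ax = matvec(&self.a, self.rank, self.d_in, x);
        let bax = matvec(&self.b, self.d_out, self.rank, &ax);
        let scale = self.alpha / self.rank as f32;
        base.iter().zip(&bax).map(|(y, d)| y + scale * d).collect()
    }
}

fn main() {
    // 2x2 identity base weight, rank-1 adapter with alpha = 2.
    let mut layer = LoraLinear::new(vec![1.0, 0.0, 0.0, 1.0], vec![1.0, 1.0], 2, 2, 1, 2.0);
    println!("{:?}", layer.forward(&[3.0, 4.0])); // [3.0, 4.0]: B = 0, output equals W x
    layer.b = vec![0.5, 0.0]; // pretend training has updated B
    println!("{:?}", layer.forward(&[3.0, 4.0])); // [10.0, 4.0]
}
```

Computing A x before multiplying by B keeps the extra cost at O(r (d_in + d_out)) per token, instead of materializing the full d_out x d_in update.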

🚀 Getting Started

Installation

```bash
git clone https://github.com/mentat-ai/mentat
cd mentat
cargo build --release
```

Interactive Commands

Mentat provides a suite of tools for inspecting and testing models:

```bash
# 🔍 Inspect a model's internal architecture and tensors
cargo run --release -- inspect --model ./models/model.safetensors

# 📖 Test the BPE Tokenizer
cargo run --release -- tokenize "Hello, world!"

# 🧩 Test the Harmony Parser
cargo run --release -- parse "<think>Reasoning...</think> <python>print(1)</python>"

# 🛡️ Run with local, opt-in data collection for future fine-tuning
cargo run --release -- --opt-in-data-collection true tokenize "Hello, world!"
```

📜 License

Apache 2.0 - See LICENSE for details.
