AI agents waste tokens reading entire files. mq lets them query structure first, then extract only what they need. The agent's context window becomes the working index.
Embedding-based retrieval is provably limited by dimensionality — a fundamental ceiling, not a training problem. Stuffing full documents into context degrades performance 14–85% even with perfect retrieval. And keyword search + agent reasoning matches 90%+ of RAG without a vector database. Anthropic themselves replaced RAG with agentic search in Claude Code.
mq is built on this: expose structure, let the agent reason. No embeddings, no vector DB, no external APIs.
Results:
- 123 PDFs (365MB) triaged in 2.97s with warm cache — full structural map of every document
- 83% fewer tokens for markdown when scoped correctly
- ~20ms per PDF on warm cache — `.tree` over 60+ PDFs completes in under a second, ~3s for all 123
- 50x more PDFs searchable via the structure-first approach (800 vs 16 in a 200k context)
One query model works across markdown, HTML, PDF, JSON, JSONL, and YAML.
Install | Agent Skill | Usage | Query Language
| Format | Extensions | Structure Extraction |
|---|---|---|
| Markdown | .md | Headings, sections, code blocks, links, tables |
| HTML | .html, .htm | Headings, readable content (Readability algorithm) |
| PDF | .pdf | Headings (font-size inference), page numbers, tables, text |
| JSON | .json | Top-level keys as headings, nested structure |
| JSONL | .jsonl, .ndjson | Line-level search, per-record drill-in |
| YAML | .yaml, .yml | Keys as headings, nested structure |
When browsing directories, mq uses format-aware labels and expands per-file structure when available:
$ mq project/ .tree
project/ (6 files)
├── config.json (12 lines, 3 keys)
│ ├── key name
│ └── key database
├── config.yaml (15 lines, 4 keys)
│ ├── key name
│ └── key database
├── README.md (80 lines, 5 sections)
│ ├── # Overview
│ │ "Complete reference for..."
│ └── ## Install
│ "Run the install script..."
├── report.pdf (24 pages, 8 sections)
│ ├── H1 Introduction (p. 1)
│ │ "This report covers Q4 results..."
│ └── H2 Methodology (p. 5)
│ "We used a mixed-methods approach..."
├── events.jsonl (100 lines, 98 records)
└── index.html (45 lines, 3 sections)
└── H1 Welcome
"Needle in html content."| Format | Count Label | Heading Label |
|---|---|---|
| Markdown | sections | # Heading |
| HTML/PDF | sections | H1 Heading |
| JSON/YAML | keys | key name / subkey field |
| JSONL | records | field name |
Any AI agent or coding assistant that can execute shell commands.
| | mq | qmd | PageIndex |
|---|---|---|---|
| Zero external API calls | Yes | No | No |
| No pre-built index | Yes | No | No |
| Single binary, no deps | Yes | No | No |
| Deterministic output | Yes | No | No |
See full comparison
# Markdown - structure and extraction
mq docs/ .tree
mq docs/auth.md ".section('OAuth Flow') | .text"
# HTML - readable content from web pages
mq page.html '.headings'
mq page.html '.text'
# PDF - extract structure from papers
mq paper.pdf '.headings'
mq paper.pdf '.tables'
# JSON/YAML - query data files
mq config.json '.headings' # Top-level keys
mq data.yaml '.text' # Flattened path:value text
mq data.yaml '.raw' # Original source text
# JSONL - search logs and session files
mq session.jsonl '.search("auth")' # Line-level search with record context
mq session.jsonl '.search("auth") | .text' # Flatten all matched records
mq session.jsonl '.search("auth") | .nth(0)' # Show one raw matched record
mq session.jsonl '.search("auth") | .nth(0) | .raw' # Explicit raw record
mq sessions/ '.search("requires OAuth") | .tree'     # Search whole session directories with structured record output

Traditional retrieval adds external API hops. mq keeps everything in the agent's context:
┌─────────────────────────────────────────────────────────────────────────┐
│ Traditional RAG │
│ │
│ Agent → Embedding API → Vector DB → Reranker API → back to Agent │
│ (hop 1) (hop 2) (hop 3) (hop 4) │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ mq │
│ │
│ Agent ←→ mq (local binary) │
│ ↓ │
│ Agent reasons over structure in its own context │
│ │
│ No external APIs. No round trips. One context. │
└─────────────────────────────────────────────────────────────────────────┘
mq is grep for a caller that already understands meaning.
Traditional semantic search pre-computes embeddings and finds "nearness" — how close a query is to stored documents in vector space. Smart index, dumb query. mq inverts this: dumb index, smart caller.
An LLM already knows that "token refresh" is semantically near "OAuth," "session expiry," "credential rotation." It doesn't need a vector database to tell it that. So instead of pre-computing embeddings, let the model generate the right exact-match search terms itself:
- Read structure (`.tree`) — see what each document contains and how it's organized
- Reason about nearness — which terms would appear close to the target concept in these documents
- Search (`.search("term")`) — fast, exact, deterministic
- Read matched sections, narrow further — iterate until found
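The "dumb index, smart caller" split can be sketched in a few lines of Go (a toy illustration, not mq's implementation): the index only does exact substring matching, and the caller — an LLM — is the one that expands a concept like "token refresh" into several exact terms it expects to find nearby.

```go
package main

import (
	"fmt"
	"strings"
)

// A deliberately "dumb" index: case-insensitive exact substring
// match over section headings. No embeddings, no ranking.
func search(headings []string, term string) []string {
	var hits []string
	for _, h := range headings {
		if strings.Contains(strings.ToLower(h), strings.ToLower(term)) {
			hits = append(hits, h)
		}
	}
	return hits
}

func main() {
	headings := []string{"OAuth Flow", "Session Expiry", "Billing", "Credential Rotation"}

	// The "smart caller" expands one concept into several exact terms.
	// Semantic nearness lives in the caller's choice of terms,
	// not in the index.
	for _, term := range []string{"oauth", "session", "credential"} {
		fmt.Printf("%s -> %v\n", term, search(headings, term))
	}
}
```

Note that searching the index for the original concept ("token refresh") would return nothing — the expansion step is where the model's semantic knowledge does the work that an embedding index would otherwise do.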
The semantic computation moves from a pre-built index to the model's inference pass. The LLM performs the "embedding" and "similarity search" implicitly when it decides what to search for. No pre-processing step, because the model that searches is the model that understands.
Structure is what makes this work. A flat text dump doesn't tell the model what's near what. Section headings, document hierarchy, and content previews give the model context to reason about better queries. mq exposes that structure; the model does the rest.
And unlike static embeddings, the model's sense of nearness is contextual. A vector embedding for "authentication" is the same vector regardless of what you're doing. A model searching for "authentication" while debugging logouts will look for different terms than one adding SSO. The search adapts to the task. Pre-computed embeddings can't.
mq is an interface, not an answer engine. It extracts structure into the agent's context, where the agent can reason over it directly. Agents like Claude Code and Codex are already LLMs with reasoning capability. Adding embedding APIs and rerankers just adds latency and cost. The agent can find what it needs — it just needs to see the structure.
Recent research validates the structure-first, agent-driven approach over traditional embedding pipelines.
Weller et al. (2025) prove mathematically that the number of distinct top-k result sets an embedding model can return is bounded by its dimensionality — a fundamental limit of the single-vector paradigm, not a training problem. State-of-the-art models fail on straightforward retrieval tasks in their LIMIT benchmark, even when embeddings are optimized directly on test data.
"These theoretical limits manifest in realistic settings with simple queries... requiring entirely new approaches rather than incremental improvements." — On the Theoretical Limitations of Embedding-Based Retrieval
Benescu & de Jong (2026) argue that "similarity is a short-sighted interpretation of relevance" and that LLM-based reasoning should theoretically outperform embedding retrieval — but current benchmarks can't measure the difference because human annotations contain the same short-sightedness.
— Why LLMs can Secretly Outperform Embedding Similarity in IR
Longer context doesn't mean better results. Du et al. (EMNLP 2025) show that even when models can perfectly retrieve all relevant information, performance still degrades 13.9–85% as input length increases — sheer token volume hurts reasoning regardless of retrieval quality.
"Even when all relevant evidence is placed immediately before the question, performance degrades substantially." — Context Length Alone Hurts LLM Performance Despite Perfect Retrieval
Chroma Research (2025) tested 18 models (Claude Opus 4, GPT-4.1, Gemini 2.5 Pro) and found performance declined with increasing context across all of them. A single distractor reduces accuracy. Models performed better on randomly shuffled haystacks than coherent ones — meaning how you organize context matters more than having it all.
This is why mq loads ~1KB of structure per document instead of ~50KB of full text. The agent sees more documents and reasons better over less noise.
Subramanian et al. at Amazon (2025) show that tool-based keyword search within an agentic framework achieves over 90% of traditional RAG performance — without a vector database. Simpler to implement, cheaper to run, and no index to maintain.
Wang et al. (2025) propose ELITE, an embedding-less retrieval system using iterative LLM reasoning. It outperforms embedding baselines on long-context QA with over an order of magnitude reduction in storage and runtime:
"Embedding-based retrieval can retrieve content that is semantically similar in form but misaligned with the question's true intent." — ELITE: Embedding-Less Retrieval with Iterative Text Exploration
Anthropic built a full RAG pipeline for Claude Code with embeddings and vector DB, then replaced it with agentic search (grep, glob, file reads). Boris Cherny, creator of Claude Code: "We found pretty quickly that agentic search generally works better. It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability."
Google DeepMind's LOFT benchmark (2024) found that long-context LLMs show "surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks" on tasks requiring up to millions of tokens of context.
— Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Microsoft Research's Code Researcher (2025) validates the Map-Narrow-Extract pattern: agents that explore 10 unique files per trajectory achieve 58% crash resolution vs 37.5% for agents that explore 1.33 files. Depth of structural exploration directly correlates with success.
Li et al. at Google (EMNLP 2024) found that "when resourced sufficiently, long-context consistently outperforms RAG in average performance." Their Self-Route hybrid routes queries to RAG or long-context based on model self-reflection, using only 38–61% of tokens while matching full long-context performance.
The Agentic RAG survey (Singh et al., 2025) establishes the taxonomy: traditional RAG operates through "static workflows and lacks adaptability for multi-step reasoning." Agentic RAG uses "reflection, planning, tool use, and multi-agent collaboration to dynamically manage retrieval strategies."
The research consensus is clear: naive single-shot embedding lookup is being superseded. The future is agents that reason over structure iteratively — which is exactly what mq enables. Expose structure, let the agent reason, extract only what's needed.
We benchmarked agents answering questions about the LangChain monorepo (50+ markdown files):
| Metric | Without mq | With mq | Improvement |
|---|---|---|---|
| Best case (scoped) | 147,070 | 24,000* | 83% fewer |
| Typical case | 412,668 | 108,225 | 74% fewer |
| Naive (tree entire repo) | 147,070 | 166,501 | -13% (worse) |
*When the agent narrows down to a specific file before running .tree
Running .tree on an entire repo is expensive. For 50 files, the tree output alone is ~22,000 characters before extracting any content.
Naive: .tree on /repo → 22K chars just for tree
Scoped: .tree on /repo/docs/auth.md → 500 chars, then extract
The fix: Agents should explore directory structure first, identify the likely subdirectory, then run .tree only on that target.
For repositories with thousands of files, use depth() and limit() to bound traversal:
# Level 0: See top-level structure (max 50 entries per directory)
mq corpus/ ".tree | depth(2) | limit(50)"
# Output shows what's truncated:
# corpus/ (10247 files, 500000 lines total)
# ├── auth/ (234 files, depth limit)
# ├── api/
# │ ├── v1/ (45 files, depth limit)
# │ ├── v2/ (38 files, depth limit)
# │ └── ... (12 more)
# └── ... (103 more)
# Level 1: Narrow to likely area
mq corpus/auth/ ".tree | limit(20)"
# Level 2: Extract what you need
mq corpus/auth/oauth.md ".section('Token Refresh') | .text"

The agent reasons at each level. No 10k-file index needed - this mirrors how humans explore large codebases.
Full benchmark results
| Question | Mode | Chars Read | Savings |
|---|---|---|---|
| Commit standards | without mq | 9,115 | - |
| | with mq (naive) | 12,877 | -41% |
| | with mq (scoped) | 2,144 | 76% |
| Package installation | without mq | 10,407 | - |
| | with mq | 3,200 | 74% |
Run it yourself: ./scripts/bench.sh
Benchmarked on LangChain monorepo (36 markdown files, 1,804 lines). Full logs.
| Metric | mq | qmd | PageIndex |
|---|---|---|---|
| Setup time | 0 | 29s + 3.1GB models | 6s/file (API) |
| Query latency | 3-22ms | 154ms (BM25) / 74s (semantic) | 6.3s |
| Cost per query | $0 | $0 (local) | ~$0.01-0.10 |
| Dependencies | Single binary | Bun, SQLite, node-llama-cpp | Python, OpenAI API |
| Pre-indexing | No | Yes (embed step) | Yes (tree generation) |
| Works offline | Yes | Yes (after model download) | No |
mq: 22ms ████
qmd BM25: 154ms ███████████████████████████
qmd semantic: 74s ████████████████████████████████████████████████████████ (CPU, no GPU)
PageIndex: 6.3s ████████████████████████████████████████████
Core insight: qmd and PageIndex compute results for you. mq doesn't - it exposes structure so the agent reasons to results itself:
- qmd: System computes similarity scores → returns ranked files
- PageIndex: System's LLM reasons over tree → returns relevant nodes
- mq: Exposes structure → agent reasons → agent finds what it needs
When the consumer is an LLM, it already has reasoning capability. mq leverages that instead of adding redundant computation layers.
Markdown structure is explicit. Headings, code blocks, links, tables, and lists can be parsed directly from the AST with stable line ranges.
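As a toy illustration of what "explicit structure with stable line ranges" means — mq's actual parser walks goldmark's AST, and this stdlib-only sketch ignores edge cases such as setext headings and headings inside code fences — headings can be pulled from markdown directly, each with its level, title, and line number:

```go
package main

import (
	"fmt"
	"strings"
)

// Heading is a toy version of the structural record a markdown
// parser can emit: level, title, and a stable line number.
type Heading struct {
	Level int
	Title string
	Line  int
}

// extractHeadings is a line-based sketch, not mq's real parser.
func extractHeadings(src string) []Heading {
	var out []Heading
	for i, line := range strings.Split(src, "\n") {
		trimmed := strings.TrimLeft(line, "#")
		level := len(line) - len(trimmed)
		// ATX headings: 1-6 '#' characters followed by a space.
		if level >= 1 && level <= 6 && strings.HasPrefix(trimmed, " ") {
			out = append(out, Heading{level, strings.TrimSpace(trimmed), i + 1})
		}
	}
	return out
}

func main() {
	doc := "# API\nIntro text.\n## Auth\nDetails.\n"
	for _, h := range extractHeadings(doc) {
		fmt.Printf("H%d %q (line %d)\n", h.Level, h.Title, h.Line)
	}
}
```

Because the line numbers come straight from the source text, a later `.section` query can return the exact span without re-deriving anything — which is what makes markdown the cheapest format here.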
PDFs are supported too, but their structure is inferred from layout cues like font size, boldness, and page position. That makes PDF parsing slower and more heuristic than markdown, even though the query interface stays the same once the Document is built.
This is the tradeoff mq makes: keep one query language, but let each parser extract the strongest deterministic structure it can for that format.
Text PDFs already go through the built-in PDF parser. The remaining frontier is image-heavy inputs: scanned PDFs, screenshots, diagrams, and pages where layout matters more than extracted text.
For those cases, we're exploring a sub-agent architecture:
Main Agent (Opus/Sonnet)
└── spawns Explorer Sub-Agent (Haiku with vision)
└── examines scanned page / image
└── returns structured summary to main context
The insight: vision-capable models can recover structure when text extraction and layout heuristics stop being enough. Instead of pre-processing everything with a separate service, reuse the agent infrastructure only for the hard cases:
- No pre-processing step - explore on demand
- Cheaper models for exploration - Haiku has vision but costs less
- Disposable context - sub-agent's work doesn't pollute main context
- Unified interface - same high-level workflow: structure, search, extract
This extends the mq philosophy: ordinary markdown, HTML, JSON, YAML, JSONL, and text PDFs stay on the fast local path; sub-agents are reserved for inputs that do not expose usable structure directly.
curl -fsSL https://raw.githubusercontent.com/muqsitnawaz/mq/main/install.sh | bash

Or with Go (works on Windows too):
go install github.com/muqsitnawaz/mq@latest

Install the mq skill for Claude Code, Cursor, Codex, and other agents:
npx skills add muqsitnawaz/mq

See skills.sh for more.
Skills aren't always loaded into context. Add this line to your CLAUDE.md for optimal performance:
Use `mq` to query markdown files. Narrow down to a specific file/subdir first, then run `mq <path> .tree` to see structure before reading.

Shell quoting: Examples use double quotes for the outer string (`"..."`), which works on all platforms including Windows. On macOS and Linux, single quotes also work: `mq doc.md '.section("API")'`.
The CLI shape does not change by format: mq <path> [query].
The same three-step pattern works on every format: structure -> search -> extract.
# Any single file
mq README.md .tree
mq paper.pdf .tree
mq page.html .tree
# Directory overview (all formats, with previews)
mq docs/ .tree

# Works the same across formats
mq README.md ".search('OAuth')"
mq paper.pdf ".search('methodology')"
mq docs/ ".search('authentication')"
# JSONL: line-level search with record type + structure
mq session.jsonl ".search('auth')"
# → [line 3] assistant/tool_use: Grep
# ts: 2026-02-01T20:25:34Z
# > ...searching for auth configuration...
# Expand matching records directly
mq session.jsonl ".search('auth') | .text"
# Tree view of matched records
mq sessions/ ".search('requires OAuth') | .tree"
# Expand all matched records across a directory
mq sessions/ ".search('requires OAuth') | .text"
# Pick one matched record only if you need to narrow (0-based), jq-style
mq session.jsonl ".search('auth') | .nth(0)"

# Same selectors, any format
mq doc.md ".section('API') | .text"
mq paper.pdf ".section('Results') | .text"
mq page.html ".section('Features') | .text"
# Format-specific content
mq doc.md ".code('python')" # Code blocks (Markdown, HTML)
mq doc.md ".section('Examples') | .code('go')" # Code within a section
mq doc.md .links # Links
mq doc.md .metadata # YAML frontmatter
# Data formats
mq config.json .tree # Keys as structure
mq data.yaml ".section('database') | .text"     # YAML sections

PDFs show page numbers instead of line numbers:
$ mq paper.pdf .tree
paper.pdf (12 pages)
├── H1 Abstract (p. 1)
│ "We propose a new architecture for..."
├── H1 Introduction (p. 1)
│ "Recent advances in deep learning..."
├── H1 Methodology (p. 3)
│ "Our approach builds on transformer..."
│ ├── H2 Data Collection (p. 3)
│ └── H2 Model Architecture (p. 5)
└── H1 Results (p. 8)
"Table 1 shows the comparison..."
$ mq paper.pdf ".section('Methodology') | .text"
# Returns the full text of that section

Run .tree on a directory of PDFs to get a structural map of every document:
$ mq papers/ .tree
papers/ (9 files, 11143 lines total)
├── ai_2301.00001.pdf (11 pages, 20 sections)
│ └── H2 NFTrig: Using Blockchain Technologies for Math Education
│ "JORDAN THOMPSON, Augustana College, USA"
├── cl_2302.00001.pdf (20 pages, 27 sections)
│ ├── H2 Quantum Computing for Plasma Physics
│ │ "Oscar Amaro and Diogo Cruz"
│ ├── H2 Introduction
│ │ "Quantum Computing (QC) is a branch of computing..."
│ ├── H2 Conclusions
│ └── H2 References
├── govt_nist_ai_risk.pdf (48 pages, 131 sections)
│ ├── H1 Artificial Intelligence Risk Management
│ ├── H1 Framework (AI RMF 1.0)
│ │ "NIST AI 100-1"
│ ├── H2 Executive Summary
│ └── H2 How AI Risks Differ from Traditional Software Risks
├── govt_nist_cybersecurity.pdf (55 pages, 696 sections)
│ ├── H1 Critical Infrastructure Cybersecurity
│ ├── H2 Executive Summary
│ │ "The United States depends on the reliable..."
│ └── H2 Appendix A: Framework Core
└── govt_nist_zero_trust.pdf (59 pages, 100 sections)
├── H1 NIST Special Publication 800-207
└── H1 Zero Trust Architecture

One call. Title, authors, page count, section count, and heading hierarchy for every PDF. With warm cache, this runs in <1s for 60 PDFs and ~3s for 123 PDFs.
mq uses a jq-inspired query syntax with piping and selectors. If you're familiar with jq, see docs/syntax.md for differences and design rationale.
The query language stays the same across formats. What changes is the structure that the parser can populate for a given document.
| Selector | Description |
|---|---|
| `.tree` | Document structure (adapts to file vs directory) |
| `.search("term")` | Find sections containing term (JSONL: line-level) |
| `.nth(N)` | Pick the Nth item from current results (0-based) |
| `.text` | Extract text content / flattened structured text |
| `.raw` | Extract source text / raw matched record |
| `.section("name")` | Section by heading |
| `.sections` | All sections |
| `.headings` | All headings |
| `.headings(2)` | H2 headings only |
| `.code` / `.code("lang")` | Code blocks |
| `.links` / `.images` / `.tables` | Other elements |
| `.metadata` / `.owner` / `.tags` | Frontmatter |
| `.md` / `.html` / `.json` / `.yaml` | Format cast: reparse string as another format |
| Operation | Description |
|---|---|
| `.text` | Extract raw content |
| `.tree` | Pipe to tree view |
| `filter(.level == 2)` | Filter results |
| `depth(N)` | Limit tree traversal to N levels |
| `limit(N)` | Show max N entries per directory |
Cast operators reinterpret a string value as a different document format mid-pipeline. Use when structured content is embedded inside another format (e.g. markdown inside JSONL).
| Cast | Parses as | Example |
|---|---|---|
| `.md` | Markdown | `.text \| .md \| .headings` |
| `.html` | HTML | `.text \| .html \| .links` |
| `.json` | JSON | `.raw \| .json \| .section("key")` |
| `.yaml` | YAML | `.text \| .yaml \| .tree` |
# JSON field containing markdown -> extract headings
mq data.json '.section("readme") | .text | .md | .headings'
# JSONL record -> parse as JSON -> drill to a field -> cast to markdown
mq log.jsonl '.search("report") | .nth(0) | .raw | .json | .section("content") | .text | .md | .section("Summary") | .text'
# Claude session files: search conversations, extract structured content
mq ~/.claude/projects/-Users-you-project/ '.search("auth")'
mq session.jsonl '.search("AUDIT") | .nth(0) | .raw | .json | .section("content") | .text | .md | .headings'

mq doc.md ".headings | filter(.level == 2) | .text"
mq doc.md ".section('Examples') | .code('python')"
mq doc.md ".section('API') | .tree"

mq is built on a Structural AST Pattern: different formats are parsed into a common structural representation.
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Markdown │ │ HTML │ │ PDF │ │JSON/YAML │
│ Parser │ │ Parser │ │ Parser │ │ Parser │
└────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │ │
└─────────────┴──────┬──────┴─────────────┘
▼
┌───────────────────────────────┐
│ Unified Document │
│ - Headings (h1-h6 levels) │
│ - Sections (hierarchical) │
│ - CodeBlocks (with lang) │
│ - Links, Images, Tables │
│ - ReadableText (for LLM) │
└───────────────┬───────────────┘
▼
┌───────────────────────────────┐
│ MQL Query Engine │
│ .headings | .section("API") │
└───────────────────────────────┘
- lib/ - Core document engine and unified types
- mql/ - Query language (lexer, parser, executor)
- html/ - HTML parser with Readability extraction
- pdf/ - PDF parser using PyMuPDF for structure
- data/ - JSON, JSONL, YAML parsers
| Type | Markdown | HTML | PDF | JSON/YAML |
|---|---|---|---|---|
| Heading | `# Title` | `<h1>` | Large/bold text | Top-level keys |
| Section | Under heading | `<section>` | Chapter/page | Nested objects |
| CodeBlock | Triple backticks | `<pre><code>` | Monospace | N/A |
| Table | Pipe syntax | `<table>` | Aligned grid | Uniform arrays |
| ReadableText | Full content | Main content | All text | Pretty-printed |
import mq "github.com/muqsitnawaz/mq/lib"
engine := mq.New()
doc, _ := engine.LoadDocument("README.md")
// Direct API
headings := doc.GetHeadings(1, 2) // H1 and H2 only
section, _ := doc.GetSection("Install") // Get specific section
code := doc.GetCodeBlocks("go")              // Go code blocks

For MQL string queries, use the mql package:
import "github.com/muqsitnawaz/mq/mql"
engine := mql.New()
doc, _ := engine.LoadDocument("README.md")
result, _ := engine.Query(doc, `.section("API") | .code("go")`)

See docs/library.md for the full API reference.
// Load and parse document
engine := mql.New()
doc, err := engine.LoadDocument("doc.md")
// Direct access methods
headings := doc.GetHeadings() // All headings
section, _ := doc.GetSection("Intro") // Specific section
codeBlocks := doc.GetCodeBlocks("go") // Go code blocks
links := doc.GetLinks() // All links
tables := doc.GetTables() // All tables
// Metadata access
if owner, ok := doc.GetOwner(); ok {
fmt.Printf("Owner: %s\n", owner)
}

Benchmarked on Apple M3 Max, Go 1.24. The tables below only include benchmark paths that currently hit the real parser/query implementations.
| Path | Current benchmark result |
|---|---|
| Markdown parse | 100KB: 2.70ms, 1MB: 23.48ms, 10MB: 224.74ms |
| Markdown throughput | ~38-47 MB/s across 100KB-10MB |
| HTML parse | 1KB: 0.98ms, 10KB: 10.63ms, 100KB: 157.77ms |
| HTML throughput | ~0.65-1.09 MB/s |
| YAML parse | 1KB: 0.12ms, 10KB: 0.88ms, 100KB: 12.39ms |
| YAML throughput | ~8.28-11.65 MB/s |
| PDF cold parse | 10.86s-13.42s on 757KB-6.6MB real PDFs |
| PDF warm cache hit | 11.16ms-16.68ms |
| PDF BuildTree | 0.216ms-0.567ms |
| PDF Search | 0.754ms-0.973ms |
| MQL `.section("X") \| .text` | 9.58us after parse |
Measured with:
go test ./pdf/... -bench=BenchmarkPDF -benchmem -count=1

| File | Size | Cold parse | Warm cache hit | BuildTree | Search |
|---|---|---|---|---|---|
| bert.pdf | 757KB | 13.25s | 16.68ms | 0.377ms | 0.973ms |
| attention.pdf | 2.1MB | 10.86s | 11.16ms | 0.567ms | 0.845ms |
| raft.pdf | 6.6MB | 13.42s | 12.00ms | 0.216ms | 0.754ms |
Cold parse covers the full PDF pipeline. Warm cache hit measures Cache.LookupFile, which skips parsing and deserializes the cached Document.
Structure-first approach - load structure, not full text:
| Format | Traditional | mq Structure-First | Improvement |
|---|---|---|---|
| PDF | 16 papers | 800 PDFs | 50x |
| Markdown | 16 docs | 80 docs | 5x |
| HTML | 8 pages | 40 pages | 5x |
| JSON/JSONL | - | 800KB / 8000 lines | - |
The agent loads ~1KB structure per PDF (vs ~50KB full text), reasons over 800 structures, then extracts only the sections it needs.
| Query | Time | Notes |
|---|---|---|
| GetSection | 9.2ns | O(1) exact title lookup |
| GetSectionFuzzy | 10.5ns | O(1) fuzzy title lookup |
| ReadableText | 0.28ns | O(1) cached string access |
| GetHeadings | 0.14us (1KB) to 8.34us (1MB) | Scales with heading count |
| GetCodeBlocks | 28ns (1KB) to 1.86us (1MB) | Scales with code block count |
| MQL `.headings` | 0.55us | Full lex/parse/compile/exec |
| MQL `.section("X") \| .text` | 9.58us | Piped query with extraction |
Tested on Apple M3 Max. Corpus: 123 PDFs, 365MB, 317K lines across arXiv papers, NIST reports, and OpenStax textbooks.
| Query | Files | Cold | Warm (cached) | Speedup |
|---|---|---|---|---|
| `.tree` | 9 | 24.5s | 0.25s | 98x |
| `.tree` | 29 | 2:24 | 0.62s | 233x |
| `.tree` | 58 | 1:40 | 0.96s | 104x |
| `.tree` | 123 | 5:02 | 2.97s | 101x |
| `.search("algorithm")` | 123 | — | 4.0s | — |
| `.search("security")` | 123 | — | 4.4s | 3,311 match lines |
| `.section("risk") \| .text` | 1 (48pg) | — | 0.2s | — |
Cold parse is the one-time cost (PDF text + structure extraction). The cache (56MB bbolt DB for 123 PDFs) persists across sessions. Per-file warm cost: ~20ms.
Parsed documents and directory search results are cached in a content-addressed bbolt database (~/Library/Caches/mq/cache.db on macOS). Subsequent queries on the same file skip parsing, and repeated directory searches can skip the full scan when the tree hash is unchanged.
On the PDF corpus above, repeated loads drop from roughly 10.9-13.4 seconds to roughly 11-17 milliseconds once the cache is warm.
Warm cache hits still validate the file and deserialize the cached Document, so the main user-visible win is latency, not just throughput.
Measured with:
go test ./mql -bench 'BenchmarkDirectorySearch$' -run '^$' -benchtime=1x -count=1

| Corpus | Cold | Warm exact repeat | Partial invalidation |
|---|---|---|---|
| private-manuscript (185 files, 178 Markdown docs, 65,175 Markdown lines) | 2.21s | 11.98ms | 1.62s |
| ~/.rush/sessions (4.2GB) | 51.86s | 440.34ms | - |
Warm exact-repeat is still not free on very large trees because LookupDirSearch first recomputes the current directory hash before reusing cached results.
How it works:
- Parse cache: SHA256 content hash keys the parsed `Document`, so repeated file queries skip reparsing and deserialize the cached structure instead.
- Directory search cache: `(directory hash, query)` keys exact-repeat directory searches, so unchanged trees can return cached `SearchResults` immediately.
- Per-file search cache: `(path, query, mtime, size)` caches file-level matches so partially changed trees only reread the files that actually changed.
- Byte reuse on matched files: directory search reuses bytes already read during the scan instead of rereading matched files before parse.
- Merkle directory tree: each directory stores a hash of its children's metadata, so repeated searches can detect unchanged trees without re-reading file contents first.
- Auto-eviction: entries unused for 5+ days are trimmed on startup.
Clear the cache by deleting the database file: `rm ~/Library/Caches/mq/cache.db`.
See bench/results.md for full benchmarks.
- Markdown: goldmark - extensible markdown parser
- HTML: x/net/html + custom Readability
- PDF: PyMuPDF - structure extraction via Python
- JSON/YAML: Go standard library + yaml.v3
- Cache: bbolt - single-file embedded database
- Serialization: msgpack - fast binary encoding (5x faster than gob)
# Run tests
go test ./...
# Build CLI
go build -o mq .
# Install locally
go install .

MIT