Local RAG (Retrieval-Augmented Generation) System for Claude Code
Hybrid search (Semantic + BM25) with keyword routing for your personal knowledge base
Features | Installation | Usage | API Reference | Architecture
hybrid_alpha=0now skips Ollama entirely - pure BM25 keyword search is instant (no embedding generation)hybrid_alpha=1.0skips BM25 - pure semantic search without keyword overhead- Default changed from 0.5 to 0.3 - keyword-heavy by default for faster responses
- 40+ new keywords for
redteamcategory: GTFOBins, LOLBAS, BYOVD, SQLi, XSS, SSTI, hashcat, Kerberoast, BloodHound, ADCS, etc. - Better routing accuracy for offensive security queries
- Hybrid search (Semantic + BM25) with Reciprocal Rank Fusion
- Word boundary keyword routing (no false positives)
- Parallel embeddings (4x faster indexing)
Knowledge RAG is a 100% local semantic search system that integrates with Claude Code via MCP (Model Context Protocol). It enables Claude to search through your documents (PDFs, Markdown, code, etc.) and retrieve relevant context for answering questions.
- Privacy First: All processing happens locally - no data leaves your machine
- Hybrid Search: Combines semantic understanding with exact keyword matching
- Multi-Format: Supports MD, PDF, TXT, Python, JSON files
- Smart Routing: Keyword-based routing with word boundaries ensures accurate category matching
- Claude Integration: Native MCP tools for seamless Claude Code integration
- Fast: Parallel embedding generation + vector search with ChromaDB
| Feature | Description |
|---|---|
| Hybrid Search | Semantic + BM25 keyword search with RRF fusion |
| Keyword Routing | Word-boundary aware routing for domain-specific queries |
| Multi-Format Parser | PDF, Markdown, TXT, Python, JSON support |
| Chunking with Overlap | Smart text splitting with context preservation |
| Category Organization | Organize docs by security, development, logscale, etc. |
| MCP Integration | Native Claude Code tools |
| Persistent Storage | ChromaDB with DuckDB backend |
| Local Embeddings | Ollama + nomic-embed-text (768 dimensions) |
| Parallel Processing | Multi-threaded embedding generation |
flowchart TB
subgraph MCP["π MCP SERVER (FastMCP)"]
direction TB
TOOLS["MCP Tools<br/>search_knowledge | get_document<br/>reindex_documents | list_categories"]
end
subgraph SEARCH["π HYBRID SEARCH ENGINE"]
direction LR
ROUTER["Keyword Router<br/>(word boundaries)"]
SEMANTIC["Semantic Search<br/>(ChromaDB)"]
BM25["BM25 Keyword<br/>(rank-bm25)"]
RRF["Reciprocal Rank<br/>Fusion (RRF)"]
ROUTER --> SEMANTIC
ROUTER --> BM25
SEMANTIC --> RRF
BM25 --> RRF
end
subgraph STORAGE["πΎ STORAGE LAYER"]
direction LR
CHROMA[("ChromaDB<br/>Vector Database")]
COLLECTIONS["Collections<br/>security | ctf<br/>logscale | development"]
CHROMA --- COLLECTIONS
end
subgraph EMBED["π§ EMBEDDINGS"]
OLLAMA["Ollama<br/>nomic-embed-text<br/>(768 dimensions)"]
PARALLEL["Parallel Processing<br/>(4 workers)"]
OLLAMA --- PARALLEL
end
subgraph INGEST["π DOCUMENT INGESTION"]
PARSERS["Parsers<br/>MD | PDF | TXT | PY | JSON"]
CHUNKER["Chunking<br/>1000 chars + 200 overlap"]
PARSERS --> CHUNKER
end
CLAUDE["βοΈ Claude Code"] --> MCP
MCP --> SEARCH
SEARCH --> STORAGE
STORAGE --> EMBED
INGEST --> EMBED
EMBED --> STORAGE
flowchart LR
subgraph INPUT["π Input"]
FILES["documents/<br/>βββ security/<br/>βββ logscale/<br/>βββ ctf/<br/>βββ development/"]
end
subgraph PARSE["π Parse"]
MD["Markdown<br/>Parser"]
PDF["PDF Parser<br/>(PyMuPDF)"]
TXT["Text<br/>Parser"]
CODE["Code Parser<br/>(PY/JSON)"]
end
subgraph CHUNK["βοΈ Chunk"]
SPLIT["Text Splitter<br/>1000 chars"]
OVERLAP["Overlap<br/>200 chars"]
SPLIT --> OVERLAP
end
subgraph EMBED["π§ Embed"]
PARALLEL["ThreadPoolExecutor<br/>(4 workers)"]
OLLAMA["Ollama API<br/>nomic-embed-text"]
PARALLEL --> OLLAMA
end
subgraph STORE["πΎ Store"]
CHROMADB[("ChromaDB")]
BM25IDX["BM25 Index"]
end
FILES --> MD & PDF & TXT & CODE
MD & PDF & TXT & CODE --> CHUNK
CHUNK --> EMBED
EMBED --> STORE
flowchart TB
QUERY["π User Query<br/>'mimikatz credential dump'"] --> ROUTER
subgraph ROUTING["π Keyword Routing"]
ROUTER["Keyword Router"]
MATCH{"Word Boundary<br/>Match?"}
CATEGORY["Filter: redteam"]
NOFILTER["No Filter"]
ROUTER --> MATCH
MATCH -->|Yes| CATEGORY
MATCH -->|No| NOFILTER
end
subgraph HYBRID["β‘ Hybrid Search"]
direction LR
SEMANTIC["Semantic Search<br/>(ChromaDB)<br/>Conceptual similarity"]
BM25["BM25 Search<br/>(rank-bm25)<br/>Exact term matching"]
end
subgraph FUSION["π Result Fusion"]
RRF["Reciprocal Rank Fusion<br/>score = Ξ£ 1/(k + rank)"]
COMBINE["Combine Rankings<br/>+ Deduplicate"]
SORT["Sort by<br/>Combined Score"]
RRF --> COMBINE --> SORT
end
CATEGORY --> HYBRID
NOFILTER --> HYBRID
SEMANTIC --> RRF
BM25 --> RRF
SORT --> RESULTS["π Results<br/>search_method: hybrid|semantic|keyword<br/>semantic_rank + bm25_rank"]
flowchart LR
subgraph ALPHA["hybrid_alpha values"]
A0["0.0<br/>Pure BM25<br/>β‘ INSTANT"]
A3["0.3 (default)<br/>Keyword-heavy<br/>β‘ Fast"]
A5["0.5<br/>Balanced"]
A7["0.7<br/>Semantic-heavy"]
A10["1.0<br/>Pure Semantic<br/>π’ Slower"]
end
subgraph USE["Best For"]
U0["CVEs, tool names<br/>exact matches<br/>NO Ollama needed"]
U3["Technical queries<br/>specific terms"]
U5["General queries"]
U7["Conceptual queries<br/>related topics"]
U10["'How to...' questions<br/>Requires Ollama"]
end
A0 --- U0
A3 --- U3
A5 --- U5
A7 --- U7
A10 --- U10
- Windows 10/11
- uv (Python package manager, automatically manages Python)
- Ollama (for local embeddings)
- Claude Code CLI
# Clone the repository
git clone https://github.com/lyonzin/knowledge-rag.git
cd knowledge-rag
# Run the installer
.\install.ps1-
Install uv
# Using PowerShell: powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" # Or use winget: winget install astral-sh.uv
-
Install Ollama
# Download from https://ollama.com # Or use winget: winget install Ollama.Ollama
-
Pull the embedding model
ollama pull nomic-embed-text
-
Clone and setup the project
git clone https://github.com/lyonzin/knowledge-rag.git cd knowledge-rag # Install dependencies (uv will automatically create a virtual environment) uv sync
-
Configure MCP for Claude Code
Add to
~/.claude.jsonundermcpServers:{ "mcpServers": { "knowledge-rag": { "type": "stdio", "command": "cmd", "args": ["/c", "cd /d C:\\path\\to\\knowledge-rag && uv run -m mcp_server.server"], "env": {} } } }Note: We use
cmd /cwithcd /dto ensure the working directory is set correctly before starting the Python server. This is required because Claude Code may not respect thecwdproperty in MCP configurations. -
Restart Claude Code
Place your documents in the documents/ directory, organized by category:
documents/
βββ security/ # Pentest, exploit, vulnerability docs
β βββ redteam/ # Red team specific
β βββ blueteam/ # Blue team specific
β βββ RTFM.pdf
βββ logscale/ # LogScale/LQL documentation
β βββ LQL_REFERENCE.md
βββ ctf/ # CTF writeups and methodology
βββ development/ # Code, APIs, frameworks
β βββ api-docs.md
βββ general/ # Everything else
βββ notes.txt
Documents are automatically indexed when Claude Code starts. To manually reindex:
# In Claude Code chat:
Use the reindex_documents tool with force=true to rebuild the index
Simply ask Claude questions! The RAG system automatically provides context:
User: How do I use formatTime in LogScale?
Claude: [Uses search_knowledge internally, retrieves relevant chunks]
Based on your documentation, formatTime in LogScale...
You can control the balance between semantic and keyword search:
// Pure keyword - INSTANT, no Ollama needed (default for exact terms)
search_knowledge("gtfobins suid", hybrid_alpha=0.0)
// Keyword-heavy (default) - fast, slight semantic boost
search_knowledge("lolbas certutil", hybrid_alpha=0.3)
// Balanced hybrid - both engines equally weighted
search_knowledge("SQL injection techniques", hybrid_alpha=0.5)
// Semantic-heavy - better for conceptual queries
search_knowledge("how to escalate privileges", hybrid_alpha=0.7)
// Pure semantic - Ollama only, no keyword matching
search_knowledge("how to bypass authentication", hybrid_alpha=1.0)Hybrid search combining semantic search + BM25 keyword search.
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
query |
string | required | Search query text |
max_results |
int | 5 | Maximum results (1-20) |
category |
string | null | Filter by category |
hybrid_alpha |
float | 0.3 | Balance: 0.0=keyword only, 1.0=semantic only |
Returns: JSON with search results including content, source, relevance score, and search method.
Example:
{
"status": "success",
"query": "mimikatz credential dump",
"hybrid_alpha": 0.5,
"result_count": 3,
"results": [
{
"content": "Mimikatz can extract credentials from memory...",
"source": "C:/docs/security/redteam/credential-attacks.pdf",
"filename": "credential-attacks.pdf",
"category": "redteam",
"score": 0.016393,
"semantic_rank": 2,
"bm25_rank": 1,
"search_method": "hybrid",
"keywords": ["mimikatz", "credential", "lsass"],
"routed_by": "redteam"
}
]
}Search Method Values:
hybrid: Found by both semantic and BM25 search (highest confidence)semantic: Found only by semantic searchkeyword: Found only by BM25 keyword search
Retrieve the full content of a specific document.
Parameters:
| Name | Type | Description |
|---|---|---|
filepath |
string | Path to the document |
Returns: JSON with document content and metadata.
Index or reindex all documents in the knowledge base.
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
force |
bool | false | If true, clears and rebuilds entire index (both ChromaDB and BM25) |
Returns: JSON with indexing statistics.
List all document categories with their document counts.
Returns:
{
"status": "success",
"categories": {
"security": 52,
"Detections_Rules ": 12,
"redteam": 3,
"blueteam": 3,
"ctf": 2,
"general": 1
},
"total_documents": 73
}List all indexed documents, optionally filtered by category.
Parameters:
| Name | Type | Description |
|---|---|---|
category |
string | Optional category filter |
Get statistics about the knowledge base index.
Returns:
{
"status": "success",
"stats": {
"total_documents": 73,
"total_chunks": 9256,
"categories": {"security": 52, "logscale": 12, ...},
"embedding_model": "nomic-embed-text",
"chunk_size": 1000,
"chunk_overlap": 200
}
}The system uses keyword routing with word boundaries to improve search accuracy.
flowchart TB
QUERY["Query: 'CVE-2021-44228 log4j'"] --> EXTRACT["Extract Keywords"]
subgraph ROUTES["π·οΈ Keyword Routes (config.py)"]
SEC["security<br/>anti-bot, waf bypass, cloudflare..."]
RED["redteam<br/>pentest, exploit, payload..."]
BLUE["blueteam<br/>detection, sigma, yara..."]
CTF["ctf<br/>ctf, flag, hackthebox..."]
LOG["logscale<br/>logscale, humio, lql..."]
DEV["development<br/>python, javascript, api..."]
end
EXTRACT --> CHECK{"Word Boundary<br/>Check (\\b)"}
CHECK -->|"'api' in query?"| BOUNDARY["Does NOT match<br/>'RAPID' or 'capital'"]
CHECK -->|"'log4j' matches"| MATCHED["β Matches 'security'<br/>route"]
BOUNDARY --> NOROUTE["No routing applied"]
MATCHED --> WEIGHT["Weighted Scoring<br/>Multiple matches = higher confidence"]
WEIGHT --> FILTER["Filter to 'security' category"]
Configure routes in mcp_server/config.py:
keyword_routes = {
"security": ["anti-bot", "waf bypass", "cloudflare", ...],
"redteam": ["pentest", "exploit", "payload", "reverse shell", ...],
"blueteam": ["detection", "sigma", "yara", "incident response", ...],
"ctf": ["ctf", "flag", "hackthebox", "tryhackme", ...],
"Detections_Rules": ["logscale", "humio", "lql", "formatTime", ...],
"development": ["python", "javascript", "api", "docker", ...]
}Word Boundary Matching: Single-word keywords use regex word boundaries (\b) to prevent false positives. For example, "api" won't match "RAPID".
Weighted Scoring: When multiple keywords match, the category with the most matches wins.
Adjust chunk size and overlap in config.py:
chunk_size = 1000 # Characters per chunk
chunk_overlap = 200 # Overlap between chunksThe default model is nomic-embed-text. To change:
- Pull a different model:
ollama pull <model-name> - Update
config.py:ollama_model = "<model-name>"
The hybrid_alpha parameter controls the balance:
| hybrid_alpha | Behavior | Speed | Best For |
|---|---|---|---|
| 0.0 | Pure BM25 keyword | Instant (no Ollama) | Exact terms, CVEs, tool names |
| 0.3 | Keyword-heavy (default) | Fast | Technical queries with specific terms |
| 0.5 | Balanced | Medium | General queries |
| 0.7 | Semantic-heavy | Slower | Conceptual queries |
| 1.0 | Pure semantic | Slower (Ollama required) | "How to..." questions |
flowchart TB
subgraph ROOT["π knowledge-rag/"]
direction TB
subgraph SERVER["π mcp_server/"]
INIT["__init__.py"]
CONFIG["config.py<br/>(settings, routes)"]
INGEST["ingestion.py<br/>(parsing, chunking)"]
SERV["server.py<br/>(MCP, ChromaDB, BM25)"]
end
subgraph DOCS["π documents/"]
SEC["security/<br/>Pentest, exploits"]
LOG["logscale/<br/>LQL docs"]
DEV["development/<br/>Code, APIs"]
GEN["general/<br/>Everything else"]
end
subgraph DATA["πΎ data/"]
CHROMA["chroma_db/<br/>(Vector storage)"]
META["index_metadata.json"]
end
subgraph FILES["π Root Files"]
INSTALL["install.ps1"]
PYPROJECT["pyproject.toml"]
UVLOCK["uv.lock"]
README["README.md"]
CHANGE["CHANGELOG.md"]
end
end
knowledge-rag/
βββ mcp_server/
β βββ __init__.py
β βββ config.py # Configuration settings
β βββ ingestion.py # Document parsing & chunking
β βββ server.py # MCP server, ChromaDB, BM25
βββ documents/ # Your documents go here
β βββ security/
β βββ Detections_Rules/
β βββ development/
β βββ general/
βββ data/
β βββ chroma_db/ # Vector database storage
β βββ index_metadata.json
βββ .claude/
β βββ mcp.json # Project MCP config
βββ .venv/ # Python virtual environment (created by uv)
βββ install.ps1 # Automated installer
βββ pyproject.toml # Project metadata & dependencies
βββ uv.lock # Locked dependency versions
βββ CHANGELOG.md # Version history
βββ README.md # This file
# Start Ollama
ollama serve
# Or check if running
curl http://localhost:11434/api/tagsChromaDB requires Python 3.12. Python 3.13+ is NOT supported due to onnxruntime compatibility. uv will automatically install and use the correct Python version based on .python-version file.
# Reinstall with correct Python version
uv sync# Check documents directory
ls documents/
# Force reindex
# In Claude Code: use reindex_documents(force=true)- Check
~/.claude.jsonexists and hasmcpServerssection with valid JSON - Verify paths use double backslashes (
\\) on Windows - Restart Claude Code completely
- Run
claude mcp listto check connection status
Install the BM25 dependency:
uv syncThis error occurs when Claude Code doesn't set the working directory correctly. Solution: Use the cmd /c "cd /d ... && python" wrapper in your config:
{
"knowledge-rag": {
"type": "stdio",
"command": "cmd",
"args": ["/c", "cd /d C:\\path\\to\\knowledge-rag && uv run -m mcp_server.server"],
"env": {}
}
}- FIX:
hybrid_alpha=0now completely skips Ollama embedding - pure BM25 is instant - FIX:
hybrid_alpha=1.0now skips BM25 search - no unnecessary keyword computation - CHANGED: Default
hybrid_alphafrom 0.5 to 0.3 (keyword-heavy for faster responses) - NEW: 40+ keyword routes for
redteamcategory (GTFOBins, LOLBAS, BYOVD, SQLi, XSS, SSTI, hashcat, etc.)
- NEW: Interactive Mermaid flowcharts for architecture visualization
- NEW: Query processing flow diagram showing hybrid search internals
- NEW: Document ingestion pipeline diagram
- NEW: Keyword routing visual explanation
- NEW: hybrid_alpha parameter effect diagram
- NEW: Project structure visual diagram
- IMPROVED: Documentation now GitHub-native with dark/light theme support
- NEW: Hybrid search combining semantic + BM25 keyword search
- NEW: Reciprocal Rank Fusion (RRF) for optimal result ranking
- NEW:
hybrid_alphaparameter to control search balance - NEW:
search_methodfield in results (hybrid/semantic/keyword) - IMPROVED: Keyword routing with word boundaries (no more false positives)
- IMPROVED: Weighted scoring for multiple keyword matches
- IMPROVED: Parallel embedding generation (4x faster indexing)
- IMPROVED: Input validation on MCP tools
- FIXED: Chunk loop infinite loop potential
- FIXED: Embedding error handling
- Auto-cleanup orphan UUID folders on reindex
- Removed hardcoded user paths
- Made install.ps1 plug-and-play
- Initial release
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- ChromaDB - Vector database
- Ollama - Local LLM & embeddings
- FastMCP - MCP framework
- PyMuPDF - PDF parsing
- rank-bm25 - BM25 implementation
Ailton Rocha (Lyon)
AI Operator | Security Researcher | Developer