# 05 Performance Optimization
Guide to optimizing MCP Memory Service for maximum performance and scalability.
- Quick Wins
- Database Optimization
- Query Performance
- Memory Management
- Monitoring & Metrics
- Troubleshooting Performance Issues
## Quick Wins

### Choose the Right Storage Backend

```bash
# SQLite-vec (recommended for single-client setups)
export MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
# Average read time: ~5ms

# ChromaDB (for multi-client setups)
export MCP_MEMORY_STORAGE_BACKEND=chroma
# Average read time: ~15ms

# Cloudflare (for distributed/production deployments)
export MCP_MEMORY_STORAGE_BACKEND=cloudflare
# Read time: network dependent
```
### Enable the HTTP/HTTPS Server

```bash
export MCP_HTTP_ENABLED=true
export MCP_HTTPS_ENABLED=true
export MCP_HTTP_PORT=8443
```
### Use Batch Operations

```python
# ❌ Slow: one round-trip per memory
for memory in memories:
    await store_memory(memory)

# ✅ Fast: a single batch operation
await store_memories_batch(memories)
```
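If a batch is large, it can help to keep each call within the configured embedding batch size. A minimal sketch, assuming the `store_memories_batch` call from the example above; the 50-item chunk size mirrors the `MCP_EMBEDDING_BATCH_SIZE` setting shown later on this page:

```python
# Sketch: store memories in bounded chunks so no single batch call
# exceeds the embedding batch size. store_memories_batch is assumed
# from the example above.
async def store_in_chunks(memories: list, chunk_size: int = 50) -> None:
    for start in range(0, len(memories), chunk_size):
        chunk = memories[start:start + chunk_size]
        await store_memories_batch(chunk)  # one round-trip per chunk
```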
## Database Optimization

### SQLite Tuning

```bash
# Optimize SQLite settings
export SQLITE_PRAGMA_CACHE_SIZE=10000
export SQLITE_PRAGMA_SYNCHRONOUS=NORMAL
export SQLITE_PRAGMA_WAL_AUTOCHECKPOINT=1000
```
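If your deployment does not pick up these environment variables, the same pragmas can be applied directly on a connection. A minimal sketch using Python's built-in `sqlite3` module; the database path is illustrative:

```python
# Sketch: applying the equivalent pragmas directly with sqlite3.
# Adjust "memory.db" to your actual database file.
import sqlite3

conn = sqlite3.connect("memory.db")
conn.execute("PRAGMA journal_mode = WAL")         # readers don't block writers
conn.execute("PRAGMA cache_size = 10000")         # pages kept in memory
conn.execute("PRAGMA synchronous = NORMAL")       # fewer fsyncs, still safe with WAL
conn.execute("PRAGMA wal_autocheckpoint = 1000")  # checkpoint every ~1000 WAL pages
conn.close()
```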
### ChromaDB Tuning

```python
# Optimize ChromaDB settings
chroma_settings = {
    "anonymized_telemetry": False,
    "allow_reset": False,
    "is_persistent": True,
    "persist_directory": "/path/to/chroma_db",
}
```
### Routine Maintenance

```bash
# SQLite maintenance (weekly)
sqlite3 memory.db "VACUUM;"
sqlite3 memory.db "REINDEX;"
sqlite3 memory.db "ANALYZE;"

# Check database size
sqlite3 memory.db "SELECT page_count * page_size AS size FROM pragma_page_count(), pragma_page_size();"
```
## Query Performance

### Write Specific Queries

```python
# ❌ Slow: vague search
results = await search("thing")

# ✅ Fast: specific search
results = await search("authentication JWT token")
```

### Match the Limit to the Use Case

```python
# For quick browsing
results = await search(query, limit=10)

# For an existence check
exists = len(await search(query, limit=1)) > 0

# For comprehensive analysis
results = await search(query, limit=100)
```

### Filter by Tag First

```python
# Most efficient: tag search first (indexed)
tagged = await search_by_tag(["python", "error"])

# Then refine with text search
refined = await search("authentication", memories=tagged)
```
### Keep Indexes in Place

```sql
-- Ensure tag indexes exist
CREATE INDEX IF NOT EXISTS idx_memory_tags ON memories(tags);
CREATE INDEX IF NOT EXISTS idx_memory_created_at ON memories(created_at);
CREATE INDEX IF NOT EXISTS idx_memory_content_hash ON memories(content_hash);
```
### Prefer Full-Text Search for Keyword Queries

```python
# Use full-text search when available
results = await search_fts("authentication error python")

# Fall back to semantic search for complex queries
results = await search_semantic("how to fix JWT timeout issues")
```
## Memory Management

### Stream Instead of Loading Everything

```python
# ❌ Memory intensive: loads every memory at once
all_memories = await get_all_memories()
filtered = [m for m in all_memories if condition(m)]

# ✅ Stream processing: constant memory footprint
async def filtered_memories():
    async for memory in stream_memories():
        if condition(memory):
            yield memory
```
### Tune the Caches

```python
# Embedding cache
EMBEDDING_CACHE_SIZE = 1000  # number of embeddings to cache
EMBEDDING_CACHE_TTL = 3600   # cache TTL in seconds

# Query result cache
QUERY_CACHE_SIZE = 100       # number of query results to cache
QUERY_CACHE_TTL = 300        # cache TTL in seconds
```
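What such a cache can look like in application code, sketched with the third-party `cachetools` library (`pip install cachetools`); `compute_embedding` is a hypothetical stand-in for the service's real embedding call:

```python
# Sketch: embedding cache with bounded size and TTL, matching the
# EMBEDDING_CACHE_SIZE / EMBEDDING_CACHE_TTL settings above.
from cachetools import TTLCache

embedding_cache = TTLCache(maxsize=1000, ttl=3600)

def get_embedding(text: str):
    if text not in embedding_cache:
        # compute_embedding is an assumed, expensive call
        embedding_cache[text] = compute_embedding(text)
    return embedding_cache[text]
```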
### Cap Resource Usage

```bash
# Limit memory usage
export MCP_MAX_MEMORY_MB=2048

# Limit concurrent operations
export MCP_MAX_CONCURRENT_OPERATIONS=10

# Limit embedding batch size
export MCP_EMBEDDING_BATCH_SIZE=50
```
## Monitoring & Metrics

### In-Process Measurements

```python
import time
import psutil

# Query performance
start = time.time()
results = await search(query)
duration = time.time() - start
print(f"Query took {duration:.2f}s")

# Memory usage
memory_usage = psutil.Process().memory_info().rss / 1024 / 1024
print(f"Memory usage: {memory_usage:.1f}MB")

# Database stats
stats = await get_database_stats()
print(f"Total memories: {stats.count}")
print(f"Database size: {stats.size_mb}MB")
```
### HTTP Endpoints

```bash
# Health check endpoint
curl https://localhost:8443/api/health

# Stats endpoint
curl https://localhost:8443/api/stats

# Performance metrics
curl https://localhost:8443/api/metrics
```
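For continuous monitoring, a minimal polling sketch using the `requests` library; the URL matches the server configured above, and `verify=False` is only appropriate for self-signed development certificates:

```python
# Sketch: poll the health endpoint and flag slow responses.
import time
import requests

def check_health(url: str = "https://localhost:8443/api/health") -> float:
    start = time.monotonic()
    resp = requests.get(url, timeout=5, verify=False)  # dev certs only
    resp.raise_for_status()
    return time.monotonic() - start

latency = check_health()
if latency > 0.05:  # 50ms health-check target from the list at the end of this page
    print(f"Health check slow: {latency * 1000:.0f}ms")
```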
### Performance Logging

```bash
# Enable performance logging
export MCP_LOG_LEVEL=INFO
export MCP_LOG_PERFORMANCE=true

# Monitor slow queries
export MCP_SLOW_QUERY_THRESHOLD=1000  # log queries slower than 1s
```
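If you want the same slow-query logging in application code, a hypothetical wrapper around the `search` call used throughout this page might look like this; the 1000ms threshold mirrors `MCP_SLOW_QUERY_THRESHOLD`:

```python
# Sketch: log any search slower than the configured threshold.
import logging
import time

logger = logging.getLogger("mcp.performance")
SLOW_QUERY_THRESHOLD_MS = 1000

async def timed_search(query: str, **kwargs):
    start = time.monotonic()
    results = await search(query, **kwargs)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > SLOW_QUERY_THRESHOLD_MS:
        logger.warning("Slow query (%.0fms): %r", elapsed_ms, query)
    return results
```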
## Troubleshooting Performance Issues

### Slow Searches

**Symptoms:** Search takes more than 2 seconds.

**Diagnosis:**
```python
# Check database size
stats = await get_db_stats()
if stats.size_mb > 1000:
    print("Large database detected")

# Check index usage
explain_plan = await explain_query(search_query)
if "SCAN" in explain_plan:
    print("Full table scan detected")
```
**Solutions:**
- Add missing indexes, then confirm the scan is gone (see the sketch below)
- Optimize query patterns
- Consider database partitioning
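To verify an index actually removes the table scan, a sketch using SQLite's `EXPLAIN QUERY PLAN`; the table and column names follow the `CREATE INDEX` statements earlier on this page:

```python
# Sketch: check whether a query uses an index or falls back to a full scan.
import sqlite3

conn = sqlite3.connect("memory.db")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM memories WHERE content_hash = ?",
    ("abc123",),
).fetchall()
for row in plan:
    detail = row[3]  # the plan-description column
    print(detail)    # "SEARCH ... USING INDEX idx_memory_content_hash" is the goal
    if "SCAN" in detail:
        print("Full table scan: add or rebuild the missing index")
```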
### High Memory Usage

**Symptoms:** Process using more than 4GB of RAM.

**Diagnosis:**
```python
# Check embedding cache
cache_stats = await get_embedding_cache_stats()
print(f"Cache size: {cache_stats.size}")

# Check for memory leaks
memory_trend = get_memory_usage_trend(hours=24)
if memory_trend.slope > 0.1:
    print("Potential memory leak")
```
**Solutions:**
- Reduce cache sizes
- Enable garbage collection (see the pressure-relief sketch below)
- Restart the service periodically
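A sketch of an emergency pressure-relief routine; the cache object and limit are assumptions (`embedding_cache` is the TTLCache-style object from the caching section), and `psutil` is the same library used in the monitoring examples:

```python
# Sketch: shrink caches and force a GC pass when RSS crosses a limit.
import gc
import psutil

MAX_RSS_MB = 2048  # mirrors MCP_MAX_MEMORY_MB

def relieve_memory_pressure(embedding_cache) -> None:
    rss_mb = psutil.Process().memory_info().rss / 1024 / 1024
    if rss_mb > MAX_RSS_MB:
        embedding_cache.clear()  # drop cached embeddings first
        gc.collect()             # then reclaim unreachable objects
```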
Symptoms: "Database is locked" errors Diagnosis:
```bash
# Confirm the database is readable (this query fails if the file is locked)
sqlite3 memory.db "SELECT * FROM sqlite_master WHERE type='table';"

# Check WAL file size (a large -wal file suggests checkpointing is lagging)
ls -la *.db-wal
```
**Solutions:**
- Enable WAL mode (see the sketch below)
- Reduce transaction scope
- Add connection pooling
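A minimal sketch of the WAL and timeout settings that avoid most lock errors, using Python's built-in `sqlite3`; the path and timeout values are illustrative:

```python
# Sketch: enable WAL mode and a busy timeout so writers wait for locks
# instead of failing immediately with "database is locked".
import sqlite3

conn = sqlite3.connect("memory.db", timeout=5.0)  # wait up to 5s for locks
conn.execute("PRAGMA journal_mode = WAL")   # readers no longer block writers
conn.execute("PRAGMA busy_timeout = 5000")  # in milliseconds
```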
### Benchmarking Search Performance

```python
import time

# Benchmark search performance across representative queries
async def benchmark_search():
    queries = ["python", "error", "authentication", "database"]
    times = []
    for query in queries:
        start = time.time()
        results = await search(query, limit=10)
        duration = time.time() - start
        times.append(duration)
        print(f"Query '{query}': {duration:.2f}s ({len(results)} results)")
    avg_time = sum(times) / len(times)
    print(f"Average search time: {avg_time:.2f}s")

# Run the benchmark (inside an existing event loop)
await benchmark_search()
```
## Maintenance Checklist

**Daily:**
- Check query response times (<1s average)
- Monitor memory usage (<2GB)
- Verify database health
- Review slow query logs

**Weekly:**
- Run database VACUUM (see the automation sketch below)
- Update query statistics
- Review performance metrics
- Clean up old logs

**Monthly:**
- Analyze performance trends
- Update optimization settings
- Review capacity planning
- Run performance regression tests
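The weekly database tasks can be scripted. A sketch using the standard library, equivalent to the `sqlite3` CLI commands in the Database Optimization section; the database path is illustrative:

```python
# Sketch: weekly SQLite maintenance in one call.
import sqlite3

def weekly_maintenance(db_path: str = "memory.db") -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("VACUUM")   # reclaim free pages, defragment the file
    conn.execute("REINDEX")  # rebuild indexes
    conn.execute("ANALYZE")  # refresh query-planner statistics
    conn.close()

weekly_maintenance()
```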
## Performance Targets

**Response times:**
- Search queries: <500ms average
- Memory storage: <100ms average
- Health checks: <50ms average
- Bulk operations: <5s for 100 items

**Resource usage:**
- Memory usage: <2GB for 100K memories
- Disk space: <1GB for 100K memories
- CPU usage: <10% average load
- Network: <1MB/s average throughput
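These targets can be wired into the benchmark above. A hypothetical check; the metric names are illustrative, and the values mirror the lists above (`measured` would come from the benchmark and psutil examples earlier):

```python
# Sketch: compare measured values against the targets above.
TARGETS = {
    "search_avg_s": 0.5,  # search queries: <500ms average
    "store_avg_s": 0.1,   # memory storage: <100ms average
    "health_s": 0.05,     # health checks: <50ms average
    "rss_mb": 2048,       # memory usage: <2GB
}

def check_targets(measured: dict) -> bool:
    ok = True
    for metric, limit in TARGETS.items():
        value = measured.get(metric)
        if value is None or value > limit:
            print(f"MISS {metric}: {value} (target <= {limit})")
            ok = False
    return ok
```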
Following these optimization guidelines will help your MCP Memory Service perform efficiently as it scales.