Summary
Add a persistent context memory store that accumulates knowledge across sessions, automatically deduplicating, compressing, and expiring stale context. Think of it as a vector DB with built-in context intelligence.
Problem
Today's AI agents are stateless between sessions. Each new conversation starts from scratch - re-fetching the same docs, re-reading the same code, re-discovering the same patterns. Memory solutions like Mem0 store raw conversation history, but they don't understand redundancy. After 100 sessions, you have 100 copies of "this project uses React with TypeScript."
What Distill should do differently
Distill already knows how to deduplicate, compress, and cluster. Apply that to persistent storage:
- Write: Agent pushes context (code snippets, decisions, errors, learnings)
- Deduplicate on write: New context is compared against existing memory. If semantically redundant, it's merged or discarded.
- Read: Agent queries memory. Results are deduplicated, compressed, and ranked by relevance + recency.
- Decay: Old, unreferenced memories get progressively compressed (full text -> summary -> keywords -> evicted)
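The write path above can be sketched in Go. Everything here is an illustrative assumption, not Distill's actual code: the `Memory` struct, the `cosine` helper over pre-computed embeddings, and the 0.9 similarity threshold are invented to show the dedup-on-write decision.

```go
package main

import (
	"fmt"
	"math"
)

// Memory is a stored context entry. The Embedding field and the
// similarity threshold below are illustrative assumptions.
type Memory struct {
	Text      string
	Embedding []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// storeWithDedup appends entry unless it is semantically redundant with an
// existing memory; in that case the existing memory is kept as-is (a real
// implementation might merge the two texts instead of discarding).
func storeWithDedup(store []Memory, entry Memory, threshold float64) ([]Memory, bool) {
	for _, m := range store {
		if cosine(m.Embedding, entry.Embedding) >= threshold {
			return store, false // deduplicated on write: not stored
		}
	}
	return append(store, entry), true
}

func main() {
	store := []Memory{{Text: "project uses React with TypeScript", Embedding: []float64{1, 0}}}
	_, stored := storeWithDedup(store, Memory{Text: "React + TS project", Embedding: []float64{0.99, 0.05}}, 0.9)
	fmt.Println(stored) // near-duplicate is rejected, so false
}
```

This is what prevents the "100 copies after 100 sessions" failure mode: redundancy is caught at write time, before it ever hits storage.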
API Design
POST /v1/memory/store
{
  "session_id": "session_abc",
  "entries": [
    {"text": "The auth service uses JWT with RS256", "source": "code_review", "tags": ["auth"]},
    {"text": "We switched from HS256 to RS256 in PR #142", "source": "git", "tags": ["auth", "security"]}
  ]
}
Response:
{
  "stored": 1,           // 1 new entry (the other was deduplicated against existing)
  "merged": 1,           // 1 entry merged with existing memory
  "total_memories": 847
}
POST /v1/memory/recall
{
  "query": "How does authentication work in this project?",
  "max_tokens": 2000,
  "recency_weight": 0.3
}
Response:
{
  "memories": [
    {"text": "Auth service uses JWT with RS256 (switched from HS256 in PR #142)", "relevance": 0.94, "last_referenced": "2026-02-14T..."},
    ...
  ],
  "stats": {
    "candidates": 23,
    "deduplicated": 8,
    "returned": 5,
    "token_count": 1840
  }
}
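One plausible way `recency_weight` and `max_tokens` could interact at recall time is a blended score plus a greedy budget fill. The `candidate` type, the `(1-w)*relevance + w*recency` formula, and the greedy strategy are assumptions sketched for illustration, not Distill's actual ranking code:

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a recalled memory with a precomputed relevance score (0..1),
// a recency score (0..1), and a token count. All fields are illustrative.
type candidate struct {
	Text      string
	Relevance float64
	Recency   float64
	Tokens    int
}

// recall ranks candidates by (1-w)*relevance + w*recency, then greedily
// fills the token budget with the highest-scoring entries that still fit.
func recall(cands []candidate, maxTokens int, recencyWeight float64) []candidate {
	sort.SliceStable(cands, func(i, j int) bool {
		si := (1-recencyWeight)*cands[i].Relevance + recencyWeight*cands[i].Recency
		sj := (1-recencyWeight)*cands[j].Relevance + recencyWeight*cands[j].Recency
		return si > sj
	})
	var out []candidate
	used := 0
	for _, c := range cands {
		if used+c.Tokens > maxTokens {
			continue // skip entries that would exceed max_tokens
		}
		out = append(out, c)
		used += c.Tokens
	}
	return out
}

func main() {
	got := recall([]candidate{
		{Text: "JWT RS256", Relevance: 0.94, Recency: 0.8, Tokens: 40},
		{Text: "old design doc", Relevance: 0.90, Recency: 0.1, Tokens: 2000},
	}, 100, 0.3)
	fmt.Println(len(got)) // only the entry that fits the budget is returned
}
```

Greedy fill is the simplest budget strategy; an implementation backed by `pkg/contextlab` would presumably also apply MMR so the returned memories are diverse, not just high-scoring.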
DELETE /v1/memory/forget
{
  "tags": ["deprecated"],
  "older_than": "2025-01-01"
}
Storage backends
| Backend | Use case |
|---|---|
| In-memory (default) | Development, single-session |
| SQLite | Local persistent storage |
| Redis | Shared across instances |
| Postgres + pgvector | Production, multi-tenant |
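The four backends in the table suggest a common interface that each one implements. The `MemoryStore` interface and the naive in-memory default below are assumptions sketching how that pluggability could look; a real backend would recall via embeddings and the dedup/compress pipeline, not substring matching.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// MemoryStore is a hypothetical backend interface. SQLite, Redis, or
// Postgres+pgvector support would mean providing another implementation.
type MemoryStore interface {
	Store(id, text string) error
	Recall(query string) ([]string, error)
	Forget(id string) error
}

// inMemStore is the development/single-session default from the table above.
type inMemStore struct {
	mu      sync.RWMutex
	entries map[string]string
}

func newInMemStore() *inMemStore {
	return &inMemStore{entries: map[string]string{}}
}

func (s *inMemStore) Store(id, text string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[id] = text
	return nil
}

// Recall uses naive substring matching purely for the sketch.
func (s *inMemStore) Recall(query string) ([]string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var out []string
	for _, text := range s.entries {
		if strings.Contains(text, query) {
			out = append(out, text)
		}
	}
	return out, nil
}

func (s *inMemStore) Forget(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.entries, id)
	return nil
}

func main() {
	var store MemoryStore = newInMemStore()
	store.Store("m1", "auth uses JWT")
	hits, _ := store.Recall("JWT")
	fmt.Println(len(hits))
}
```

Keeping the interface small (store/recall/forget) means the decay worker and dedup logic can sit above it, shared across all backends.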
Key design decisions
- Dedup on write, not just read - prevents unbounded growth
- Hierarchical decay - memories compress over time (full -> summary -> keywords -> evicted)
- Source tracking - every memory knows where it came from (file, commit, conversation)
- Tag-based organization - enables scoped recall ("only auth-related memories")
- Token-budgeted recall - caller specifies max tokens, Distill fills the budget optimally
How this connects to existing Distill
- Uses pkg/dedup for write-time deduplication
- Uses pkg/compress for hierarchical decay
- Uses pkg/contextlab (clustering + MMR) for read-time retrieval
- Uses pkg/cache for hot-path acceleration
- Exposes Prometheus metrics and OTEL traces
Deliverables
- pkg/memory/store.go - Memory store interface
- pkg/memory/sqlite.go - SQLite backend
- pkg/memory/memory_test.go - Tests
- cmd/memory.go - CLI commands (distill memory store, distill memory recall, distill memory stats)
- API endpoints: /v1/memory/store, /v1/memory/recall, /v1/memory/forget, /v1/memory/stats
- MCP tools: memory_store, memory_recall
- Decay worker (background goroutine that compresses old memories)
Acceptance Criteria
- Store 10K memories, recall in <50ms
- Write-time dedup prevents duplicate storage
- Hierarchical decay reduces storage over time
- Token-budgeted recall fills context window optimally
- Works as MCP tool in Claude Desktop