[Product] Context Memory Store - persistent deduplicated memory across sessions #29

@Siddhant-K-code

Summary

Add a persistent context memory store that accumulates knowledge across sessions, automatically deduplicating, compressing, and expiring stale context. Think of it as a vector DB with built-in context intelligence.

Problem

Today's AI agents are stateless between sessions. Each new conversation starts from scratch - re-fetching the same docs, re-reading the same code, re-discovering the same patterns. Memory solutions like Mem0 store raw conversation history, but they don't understand redundancy. After 100 sessions, you have 100 copies of "this project uses React with TypeScript."

What Distill should do differently

Distill already knows how to deduplicate, compress, and cluster. Apply that to persistent storage:

  • Write: Agent pushes context (code snippets, decisions, errors, learnings)
  • Deduplicate on write: New context is compared against existing memory. If semantically redundant, it's merged or discarded.
  • Read: Agent queries memory. Results are deduplicated, compressed, and ranked by relevance + recency.
  • Decay: Old, unreferenced memories get progressively compressed (full text -> summary -> keywords -> evicted)
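The dedup-on-write step above could be sketched roughly like this. It assumes embeddings are computed elsewhere and uses plain cosine similarity with an arbitrary 0.9 threshold; `Memory`, `Write`, and the threshold are illustrative names and values, not Distill's actual pkg/dedup API.

```go
package main

import (
	"fmt"
	"math"
)

// Memory is a hypothetical stored entry; Distill's real types may differ.
type Memory struct {
	Text      string
	Embedding []float64
	Refs      int // number of writes absorbed by this entry
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Write appends m unless an existing memory is at least `threshold` similar,
// in which case the existing entry absorbs it. It reports whether m was merged.
func Write(store []Memory, m Memory, threshold float64) ([]Memory, bool) {
	for i := range store {
		if cosine(store[i].Embedding, m.Embedding) >= threshold {
			store[i].Refs++
			return store, true
		}
	}
	m.Refs = 1
	return append(store, m), false
}

func main() {
	var store []Memory
	var merged bool
	store, merged = Write(store, Memory{Text: "this project uses React with TypeScript", Embedding: []float64{1, 0}}, 0.9)
	fmt.Println(len(store), merged)
	store, merged = Write(store, Memory{Text: "frontend is React + TS", Embedding: []float64{0.99, 0.05}}, 0.9)
	fmt.Println(len(store), merged)
}
```

A linear scan is only workable for small stores; a real backend would presumably use a vector index for the similarity lookup.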

API Design

POST /v1/memory/store
{
  "session_id": "session_abc",
  "entries": [
    {"text": "The auth service uses JWT with RS256", "source": "code_review", "tags": ["auth"]},
    {"text": "We switched from HS256 to RS256 in PR #142", "source": "git", "tags": ["auth", "security"]}
  ]
}

Response:
{
  "stored": 1,        // 1 new entry (the other was deduplicated against existing)
  "merged": 1,        // 1 entry merged with existing memory
  "total_memories": 847
}

POST /v1/memory/recall
{
  "query": "How does authentication work in this project?",
  "max_tokens": 2000,
  "recency_weight": 0.3
}

Response:
{
  "memories": [
    {"text": "Auth service uses JWT with RS256 (switched from HS256 in PR #142)", "relevance": 0.94, "last_referenced": "2026-02-14T..."},
    ...
  ],
  "stats": {
    "candidates": 23,
    "deduplicated": 8,
    "returned": 5,
    "token_count": 1840
  }
}

DELETE /v1/memory/forget
{
  "tags": ["deprecated"],
  "older_than": "2025-01-01"
}

Storage backends

Backend             | Use case
--------------------|----------------------------
In-memory (default) | Development, single-session
SQLite              | Local persistent storage
Redis               | Shared across instances
Postgres + pgvector | Production, multi-tenant

Key design decisions

  • Dedup on write, not just read - prevents unbounded growth
  • Hierarchical decay - memories compress over time (full -> summary -> keywords -> evicted)
  • Source tracking - every memory knows where it came from (file, commit, conversation)
  • Tag-based organization - enables scoped recall ("only auth-related memories")
  • Token-budgeted recall - caller specifies max tokens, and Distill fills that budget with the highest-ranked memories
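
As a rough illustration of token-budgeted recall, the sketch below blends relevance and recency into one score and fills the budget greedily. The scoring formula `(1-w)*relevance + w*recency`, the `Candidate` type, and the precomputed `Tokens` count are all assumptions; filling a budget truly optimally is a knapsack problem, and greedy selection is only an approximation.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical recall candidate with precomputed scores.
type Candidate struct {
	Text      string
	Relevance float64 // 0..1, similarity to the query
	Recency   float64 // 0..1, where 1 means just referenced
	Tokens    int     // token count of Text, computed elsewhere
}

// Recall ranks candidates by a blended score and greedily adds each one
// that still fits in maxTokens. It returns the picks and the tokens used.
func Recall(cands []Candidate, maxTokens int, recencyWeight float64) ([]Candidate, int) {
	score := func(c Candidate) float64 {
		return (1-recencyWeight)*c.Relevance + recencyWeight*c.Recency
	}
	// Sort a copy so the caller's slice order is left untouched.
	ranked := append([]Candidate(nil), cands...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return score(ranked[i]) > score(ranked[j])
	})

	var out []Candidate
	used := 0
	for _, c := range ranked {
		if used+c.Tokens > maxTokens {
			continue // too large for the remaining budget; try smaller ones
		}
		out = append(out, c)
		used += c.Tokens
	}
	return out, used
}

func main() {
	cands := []Candidate{
		{Text: "A", Relevance: 0.9, Recency: 0.1, Tokens: 1200},
		{Text: "B", Relevance: 0.8, Recency: 0.9, Tokens: 1000},
		{Text: "C", Relevance: 0.5, Recency: 0.5, Tokens: 700},
	}
	picked, used := Recall(cands, 2000, 0.3)
	for _, c := range picked {
		fmt.Println(c.Text, c.Tokens)
	}
	fmt.Println("tokens used:", used)
}
```

In this example B ranks first, A no longer fits after B, and C fills part of the remainder, so 1700 of the 2000 tokens are used.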

How this connects to existing Distill

  • Uses pkg/dedup for write-time deduplication
  • Uses pkg/compress for hierarchical decay
  • Uses pkg/contextlab (clustering + MMR) for read-time retrieval
  • Uses pkg/cache for hot-path acceleration
  • Exposes Prometheus metrics and OTEL traces
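
For readers unfamiliar with MMR (maximal marginal relevance), here is a generic sketch of the selection loop as it is commonly defined. It is not the pkg/contextlab implementation; the `MMR` signature, the precomputed similarity matrix, and the `lambda` parameter are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// MMR picks up to k item indices. Each step selects the unused item that
// maximizes lambda*relevance - (1-lambda)*maxSimilarityToAlreadySelected.
// rel[i] is the relevance of item i; sim[i][j] is the pairwise similarity.
func MMR(rel []float64, sim [][]float64, k int, lambda float64) []int {
	selected := []int{}
	used := make([]bool, len(rel))
	for len(selected) < k && len(selected) < len(rel) {
		best, bestScore := -1, math.Inf(-1)
		for i := range rel {
			if used[i] {
				continue
			}
			maxSim := 0.0
			for _, j := range selected {
				if sim[i][j] > maxSim {
					maxSim = sim[i][j]
				}
			}
			score := lambda*rel[i] - (1-lambda)*maxSim
			if score > bestScore {
				best, bestScore = i, score
			}
		}
		used[best] = true
		selected = append(selected, best)
	}
	return selected
}

func main() {
	// Items 0 and 1 are near-duplicates; item 2 is less relevant but distinct.
	rel := []float64{0.9, 0.85, 0.5}
	sim := [][]float64{
		{1.0, 0.95, 0.1},
		{0.95, 1.0, 0.1},
		{0.1, 0.1, 1.0},
	}
	fmt.Println(MMR(rel, sim, 2, 0.5))
}
```

With `lambda` at 0.5 the second pick skips the near-duplicate in favor of the distinct item; with `lambda` at 1.0 the loop degenerates to plain relevance ranking.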

Deliverables

  • pkg/memory/store.go - Memory store interface
  • pkg/memory/sqlite.go - SQLite backend
  • pkg/memory/memory_test.go - Tests
  • cmd/memory.go - CLI commands (distill memory store, distill memory recall, distill memory stats)
  • API endpoints: /v1/memory/store, /v1/memory/recall, /v1/memory/forget, /v1/memory/stats
  • MCP tools: memory_store, memory_recall
  • Decay worker (background goroutine that compresses old memories)

Acceptance Criteria

  • With 10K memories stored, recall completes in <50ms
  • Write-time dedup prevents duplicate storage
  • Hierarchical decay reduces storage over time
  • Token-budgeted recall fills context window optimally
  • Works as MCP tool in Claude Desktop

Labels: enhancement (New feature or request)