
Commit 784e898

ruvnet authored and committed

docs(adr): ADR-148 brain hypothesis engine — Gemini + DiskANN + auto-experimentation

Proposes four additive capabilities for the pi.ruv.io brain:

1. Hypothesis generation via Gemini 2.5 Flash on cross-domain edges
2. Quality scoring via DiskANN + PageRank (ForwardPush sublinear)
3. Noise filtering (ingestion gate + meta-mincut on knowledge graph)
4. Self-improvement tracking (50-query benchmark suite + auto-rollback)

All feature-gated. No changes to the running brain. Separate Cloud Run service for the hypothesis engine. DiskANN is fallback-only (HNSW stays primary <50K). 5-week phased implementation. ~$0.03/day Gemini cost.

Co-Authored-By: claude-flow <ruv@ruv.net>

1 parent a40e2be · commit 784e898

1 file changed: +238 −0 lines changed
# ADR-148: Brain Hypothesis Engine — Self-Improving Knowledge System with Gemini, DiskANN, and Auto-Experimentation

## Status

Proposed

## Date

2026-04-13

## Context

The pi.ruv.io brain (10,300+ memories, 38M graph edges, LoRA epoch 41) stores and retrieves knowledge but cannot:

1. Generate hypotheses from cross-domain connections
2. Evaluate quality beyond embedding similarity (quality scores mostly 0.0)
3. Filter noise from curated knowledge (random IEEE events alongside real patterns)
4. Measure whether LoRA training actually improves search quality

The brain runs on Google Cloud Run (`ruvbrain` service, us-central1) backed by `crates/mcp-brain-server/` (Rust/Axum). Current embedding: `ruvllm::RlmEmbedder` at 128-dim. Current index: flat HNSW.

## Decision

Add four capabilities as **additive layers** — no changes to the running brain's core path. All new code is behind feature flags or in separate Cloud Run services.

### Architecture: Four New Components

```
┌─────────────────────────────────────────────────────────┐
│  EXISTING (untouched)                                   │
│  mcp-brain-server: store, search, graph, drift, LoRA    │
│  Embedder: ruvllm::RlmEmbedder (128-dim)                │
│  Index: flat HNSW                                       │
└──────────────┬──────────────────────────────────────────┘
               │ (reads from, writes back to)
               v
┌─────────────────────────────────────────────────────────┐
│  NEW: Hypothesis Engine (separate Cloud Run service)    │
│                                                         │
│  1. HYPOTHESIS GENERATOR                                │
│     - Watches for new cross-domain graph edges          │
│     - Templates: "If X works in domain A,               │
│       then X should work in domain B"                   │
│     - Uses Gemini 2.5 Flash for hypothesis formulation  │
│       and experiment design                             │
│     - Stores hypotheses as "untested" memories          │
│                                                         │
│  2. QUALITY SCORER                                      │
│     - DiskANN index over all 10K+ memory embeddings     │
│     - PageRank via ruvector-solver ForwardPush          │
│     - Multi-signal: centrality + citations + verdicts   │
│       + contributor rep + temporal + surprise           │
│     - Updates quality field via brain API               │
│                                                         │
│  3. NOISE FILTER                                        │
│     - Ingestion gate: regex + embedding dedup           │
│     - Weekly cleanup: archive orphan low-quality        │
│     - Meta-mincut: ruvector-mincut on knowledge graph   │
│       to find noise partition                           │
│                                                         │
│  4. BENCHMARK SUITE                                     │
│     - 50 curated test queries with known-good answers   │
│     - Runs before/after each LoRA epoch                 │
│     - Tracks MRR, precision@5, cross-domain recall      │
│     - Auto-rollback if MRR drops > 5%                   │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

### Component Details

#### Gemini 2.5 Flash for Hypothesis Generation

**Why Gemini, not a local LLM:**

- Hypothesis generation is infrequent (triggered by new cross-domain edges, ~10/day)
- It requires reasoning about domain transfer ("if mincut detects seizures, could it detect X?")
- Gemini 2.5 Flash: fast, cheap (~$0.15/1M input tokens), 1M context window
- The local RLM embedder stays for indexing (it's tuned to the corpus) — Gemini is for reasoning only

**API integration:**

```rust
// New module: crates/mcp-brain-server/src/hypothesis.rs
// Feature-gated: #[cfg(feature = "hypothesis")]

use google_generativeai::Client; // or raw REST via reqwest

// Returns Result so the Gemini call's error can propagate with `?`.
async fn generate_hypothesis(
    gemini_client: &Client,
    edge: &CrossDomainEdge,
) -> anyhow::Result<Hypothesis> {
    let prompt = format!(
        "Given this cross-domain connection:\n\
         Domain A: {}\nDomain B: {}\nBridge concept: {}\n\n\
         Generate a testable hypothesis: if the pattern from domain A \
         works, what specific prediction does it make in domain B? \
         Include: hypothesis statement, test method, expected outcome, \
         null hypothesis, required data.",
        edge.domain_a, edge.domain_b, edge.bridge_concept
    );

    // Call Gemini 2.5 Flash
    let response = gemini_client.generate(&prompt).await?;
    Ok(parse_hypothesis(&response))
}
```
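
The prompt requests five labeled fields, so the response can be parsed line by line. A minimal sketch of what `parse_hypothesis` could look like under that assumption — the `Hypothesis` field names and the "Label: value" response format are illustrative, not the brain's actual schema:

```rust
// Hypothetical sketch: assumes Gemini is prompted to answer with
// "Label: value" lines matching the five fields requested above.
#[derive(Debug, Default, PartialEq)]
pub struct Hypothesis {
    pub statement: String,
    pub test_method: String,
    pub expected_outcome: String,
    pub null_hypothesis: String,
    pub required_data: String,
}

pub fn parse_hypothesis(response: &str) -> Hypothesis {
    let mut h = Hypothesis::default();
    for line in response.lines() {
        // Split each line at the first colon into a label and a value.
        if let Some((label, value)) = line.split_once(':') {
            let value = value.trim().to_string();
            match label.trim().to_lowercase().as_str() {
                "hypothesis" => h.statement = value,
                "test method" => h.test_method = value,
                "expected outcome" => h.expected_outcome = value,
                "null hypothesis" => h.null_hypothesis = value,
                "required data" => h.required_data = value,
                _ => {} // ignore lines that don't carry a known label
            }
        }
    }
    h
}
```

A stricter variant could reject responses with missing fields and mark the memory as unparseable rather than storing a partial hypothesis.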

**Cost estimate:** ~10 hypotheses/day × ~500 tokens each = ~5K tokens/day = ~$0.001/day. Negligible.

#### DiskANN for Scalable Quality Scoring

**Why DiskANN, not the current flat HNSW:**

- Current HNSW is in-memory, fine for 10K memories
- At 100K+ memories (projected within months), memory pressure becomes real
- DiskANN stores the graph on SSD, loads only neighbors on demand
- Product Quantization (PQ) compresses vectors 4-8x for candidate filtering
- `ruvector-diskann` already implements Vamana graph + PQ (ADR-146)

**Integration plan:**

```rust
// New module: crates/mcp-brain-server/src/diskann_index.rs
// Feature-gated: #[cfg(feature = "diskann")]

use ruvector_diskann::{DiskAnnConfig, DiskAnnIndex};

pub struct HybridIndex {
    hnsw: HnswIndex,       // Existing, stays as primary for <50K
    diskann: DiskAnnIndex, // New, activates at >50K memories
    threshold: usize,      // Switch point (default: 50_000)
}

impl HybridIndex {
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(usize, f32)> {
        if self.hnsw.len() < self.threshold {
            self.hnsw.search(query, k)
        } else {
            self.diskann.search(query, k)
        }
    }
}
```

**Benchmark plan:** Run both HNSW and DiskANN on the current 10K corpus, measure:

- Recall@10 (should be >95% for both)
- Query latency (HNSW: ~1ms, DiskANN: ~5-10ms expected)
- Memory usage (HNSW: ~50MB, DiskANN: ~5MB + SSD)
- Index build time
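
Recall@k here means the overlap between an index's top-k ids and the brute-force (exact) top-k. A small helper for the comparison — the function name is illustrative, not part of `ruvector-diskann`:

```rust
/// Fraction of the exact top-k neighbors that the approximate index found.
/// `approx` and `exact` are memory ids ranked by distance, both of length >= k.
fn recall_at_k(approx: &[usize], exact: &[usize], k: usize) -> f64 {
    let exact_top = &exact[..k.min(exact.len())];
    let hits = approx
        .iter()
        .take(k)
        .copied()
        .filter(|id| exact_top.contains(id))
        .count();
    hits as f64 / k as f64
}
```

Averaging this over the 50 benchmark queries gives the Recall@10 number tracked above.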

#### Quality Scorer with ForwardPush PageRank

```rust
// crates/mcp-brain-server/src/quality.rs

pub fn compute_quality_scores(brain: &Brain) -> Result<Vec<(MemoryId, f64)>, SolverError> {
    // 1. Build CSR graph from memory edges
    let graph = brain.graph_to_csr();

    // 2. Run ForwardPush PageRank (sublinear, O(1/epsilon))
    let pr = ForwardPushSolver::new(0.85, 0.001);
    let pagerank = pr.solve(&graph)?;

    // 3. Compute multi-signal quality
    let max_citations = brain
        .memories()
        .map(|m| m.inbound_edge_count)
        .max()
        .unwrap_or(1) as f64;

    let scores = brain
        .memories()
        .map(|m| {
            let centrality = pagerank[m.id];
            let citations = m.inbound_edge_count as f64 / max_citations;
            let verdict = match m.verdict {
                Verdict::Confirmed => 1.0,
                Verdict::Refuted => -0.5,
                Verdict::Untested => 0.0,
            };
            let surprise = 1.0 - m.max_similarity_to_existing;
            let temporal = recency_weight(m.created_at);
            let bridge = if m.crosses_domains { 0.3 } else { 0.0 };

            let quality = 0.25 * centrality
                + 0.20 * citations
                + 0.20 * verdict
                + 0.15 * surprise
                + 0.10 * temporal
                + 0.10 * bridge;

            (m.id, quality.clamp(0.0, 1.0))
        })
        .collect();

    Ok(scores)
}
```

### Safety Constraints (don't break the running system)

1. **All new code is feature-gated.** The existing `mcp-brain-server` binary is unchanged unless `--features hypothesis,diskann,benchmark` is explicitly enabled.

2. **The hypothesis engine runs as a SEPARATE Cloud Run service.** It calls the brain's API; it doesn't modify the brain's process. If it crashes, the brain keeps running.

3. **DiskANN is a fallback, not a replacement.** HNSW stays as primary for <50K memories. DiskANN only activates when the memory count exceeds the threshold. Both can be queried in parallel for benchmark comparison.

4. **Quality scores are written to a NEW field (`quality_v2`).** The existing `quality` field is untouched until v2 scores are validated.

5. **Noise filtering is archive-only.** Memories are archived (moved to cold storage), never deleted. Full rollback is possible.

6. **Benchmark auto-rollback.** If LoRA epoch N+1 degrades MRR by >5%, the epoch is discarded and the EWC checkpoint is restored automatically.

7. **The Gemini API key is stored in gcloud secrets.** Already available as `GEMINI_API_KEY`. Rate-limited to 10 calls/hour to avoid cost surprises.
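
The 10-calls/hour cap in constraint 7 could be a simple fixed-window counter inside the hypothesis service. A hedged sketch (not existing code; the type and method names are assumptions):

```rust
use std::time::{Duration, Instant};

/// Fixed-window rate limiter: allows at most `max_calls` per `window`.
/// Illustrative only; a production version might persist state across restarts.
pub struct RateLimiter {
    max_calls: u32,
    window: Duration,
    window_start: Instant,
    calls: u32,
}

impl RateLimiter {
    pub fn new(max_calls: u32, window: Duration) -> Self {
        Self { max_calls, window, window_start: Instant::now(), calls: 0 }
    }

    /// Returns true (and counts the call) if still under the cap
    /// for the current window; resets when the window elapses.
    pub fn try_acquire(&mut self) -> bool {
        if self.window_start.elapsed() >= self.window {
            self.window_start = Instant::now();
            self.calls = 0;
        }
        if self.calls < self.max_calls {
            self.calls += 1;
            true
        } else {
            false
        }
    }
}
```

The hypothesis generator would call `try_acquire()` before each Gemini request and skip (or queue) the edge when the cap is hit.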
### Implementation Phases

| Phase | What | Risk | Timeline |
|-------|------|------|----------|
| **P0: ADR + Branch** | This document + feature branch | None | Done |
| **P1: Benchmark suite** | 50 test queries, MRR tracking | None (read-only) | 3 days |
| **P2: Quality scorer** | PageRank + multi-signal scoring | Low (writes to new field) | 1 week |
| **P3: Noise filter** | Ingestion gate + weekly cleanup | Low (archive-only) | 3 days |
| **P4: DiskANN integration** | Hybrid index behind feature flag | Low (fallback only) | 1 week |
| **P5: Hypothesis engine** | Gemini integration + auto-test | Medium (new service) | 2 weeks |

**Total: ~5 weeks, phased. P1-P3 can run in parallel.**
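
The P1 benchmark gate can be stated directly in code: MRR over the 50 queries, plus the >5% regression rule from the safety constraints. A sketch with assumed inputs (the rank of the first correct answer per query, or `None` if it wasn't retrieved):

```rust
/// Mean reciprocal rank over the benchmark queries. Each entry is the
/// 1-based rank of the first known-good answer, or None if not retrieved.
fn mean_reciprocal_rank(first_hit_ranks: &[Option<usize>]) -> f64 {
    let sum: f64 = first_hit_ranks
        .iter()
        .map(|r| r.map_or(0.0, |rank| 1.0 / rank as f64))
        .sum();
    sum / first_hit_ranks.len() as f64
}

/// Auto-rollback rule: discard the new LoRA epoch if MRR drops by more than 5%.
fn should_rollback(mrr_before: f64, mrr_after: f64) -> bool {
    mrr_after < mrr_before * 0.95
}
```

Running this before and after each LoRA epoch gives the discard/keep decision; restoring the EWC checkpoint on a `true` result is the rollback path described above.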

## Consequences

### Positive

- Brain evolves from "smart database" to "scientific reasoner"
- Quality scores become meaningful (currently all 0.0)
- Noise filtering reduces graph pollution
- LoRA training becomes measurable and rollback-safe
- DiskANN prepares for 100K+ memory scale
- Gemini hypothesis generation is the first step toward autonomous discovery

### Negative

- New dependency: Google Gemini API (adds cost, ~$0.03/day estimated)
- DiskANN adds complexity to the index path
- Hypothesis engine needs curation — false hypotheses could pollute the graph if not filtered
- More Cloud Run services to monitor

### Risks

- Gemini may generate low-quality hypotheses → mitigated by the verdict system (untested until confirmed)
- DiskANN recall may be lower than HNSW at small corpus sizes → mitigated by the hybrid approach with a threshold
- Quality scoring may be gamed by circular citations → mitigated by PageRank damping

## References

- ADR-146: DiskANN Vamana Implementation
- ADR-131: Consciousness Metrics Crate
- ADR-048: Sublinear Graph Attention
- Subramanya et al., "DiskANN: Fast Accurate Billion-point Nearest Neighbor Search" (NeurIPS 2019)
- Google Gemini API: https://ai.google.dev/gemini-api
- ForwardPush PPR: Andersen, Chung, Lang, "Local Graph Partitioning using PageRank Vectors" (FOCS 2006)
