Commit 6e9c558

monotykamary authored and dosco committed
feat(gepa): GEPA/GEPA-Flow Pareto optimizers + docs alignment (ax-llm#341)
* feat(optimizer): introduce GEPA reflective evolution with Pareto sampling
  Adds a sample-efficient optimizer with reflective mutation, Pareto-based candidate selection, optional crossover, and multi-objective support. Includes typed options and examples, and improves progress logging of total rounds.
* feat(optimizer): add GEPA-Flow for multi-module reflective evolution
  Implements a Flow-aware GEPA variant that selects modules round-robin and supports system-aware merge across candidates. Adds AxFlow helpers to expose/set node instructions and exports the optimizer.
* feat(optimizer): make GEPA and GEPA-Flow Pareto-only to align with the paper and simplify the API; refactor examples to remove basic variants and promote Pareto examples
* refactor(optimizer): remove legacy single-objective compile and rename GEPA-Pareto to GEPA in logs/metrics
* docs(optimize): clarify that GEPA/GEPA-Flow use compile for Pareto; MiPRO continues to use compilePareto
* chore(gepa-flow): use flow() factory and add OptimizationStart logging; align labels to GEPA-Flow
* feat(gepa,gepa-flow): adopt per-instance Pareto selection (Alg. 2)
  Implement paper’s Algorithm 2 candidate selection:
  - Track per-instance scalar scores on validation/Pareto set (S matrix)
  - Sample parent from non-dominated set weighted by per-instance wins
  - Compute scalar as mean of multi-objective metrics per instance
  - GEPA-Flow also samples second parent for merge via Algorithm 2
  - Persist S for each accepted candidate to drive subsequent sampling
  Rationale: align behavior with the GEPA paper and improve exploration vs. archive crowding-distance selection; sets the stage for optional scalar acceptance gating in a follow-up.
* refactor(optimizer): centralize Pareto helpers for GEPA/GEPA-Flow
  Extract shared multi-objective utilities to paretoUtils and replace inline duplicates in gepa.ts and gepaFlow.ts. No functional changes; simplifies maintenance and sets up for paper parity.
* feat(gepa,gepa-flow): add Merge strategy and guards; parametrize Pareto size
  GEPA: enable periodic instruction merge with cap and progress reporting. GEPA-Flow: add merge caps, ancestry/desirability guards per Appendix F, and make Pareto set size configurable via args. Keeps acceptance via minibatch Pareto dominance.
* feat(gepa,gepa-flow): align with GEPA paper (splits, μf, σ-accept, guards)
  Introduce explicit D_feedback/D_pareto splits to control rollout budget; plumb evaluator textual feedback μ_f into reflection; default to σ-based minibatch acceptance with configurable epsilon; add scalarizer/metric-key for per-instance S and Pareto selection; implement system-aware merge guards (ancestor/outperforms, desirability, tried merges).
* feat(gepa,gepa-flow): source-parity merges, acceptance, and adapter path
  - Schedule merges via mergesDue/lastIterFoundNewProgram and skip reflective on merge attempts
  - Dominator-based pair + ancestor selection with desirability filter and duplicate-merge guard
  - Targeted subsample for merge acceptance (new_sum ≥ max(parent sums)); full eval on accept
  - Stricter minibatch acceptance; when adapter provided, also require minibatch sum(child) > sum(parent)
  - Parent selection via per-instance fronts; honor maxMetricCalls; preserve fallback behavior without adapter
  This aligns both GEPA and GEPA-Flow with the reference engine while keeping public API stable.
* feat(gepa,gepa-flow): deterministic selection + strict acceptance for source parity
  Seed RNG across selection/minibatching, enforce maxMetricCalls budget, and add Flow merge guards/de-dup for stable improvements; re-export adapter types and update examples.
* chore(cspell,gepa,gepa-flow): add GEPA terms; resolve lint warnings
  Add domain terms (GEPA, Traj, etc.) to cspell and ignore dist to keep spelling checks green. Rename unused vars and drop unused imports in GEPA optimizers to satisfy lint without behavior changes.
* feat(gepa): source-parity single-module merges, guards, and safer defaults
  Align GEPA merges with source: replace LLM merge with parent-pick, add ancestor/desirability guards and de-dup, use seeded sampling, schedule merges after accepted improvements; default merges off and skipPerfectScore on to match reference behavior.
* feat(gepa,gepa-flow,optimizer): epsilon ties, optional budget, aligned defaults
  Bring GEPA in line with source parity by tolerating score ties, removing the hard budget requirement, and defaulting skip-perfect in flow to match single-module; Pareto frontier now respects epsilon.
* feat(gepa,gepa-flow): require maxMetricCalls for strict parity
  Enforce a positive `options.maxMetricCalls` in GEPA/GEPA-Flow compile loops to match the source implementation and avoid unbounded optimization runs.
  BREAKING CHANGE: compile now throws if `options.maxMetricCalls` is absent or non-positive.
* fix(gepa): only skip reflective after an evaluated merge attempt
  Align single-module merge gating with the reference engine so reflective mutation is skipped only when a merge is actually attempted, improving behavioral parity and avoiding lost reflective iterations when no valid merge pair exists.
* docs(optimize): migrate multi-objective docs to GEPA/GEPA-Flow using compile (remove compilePareto)

---------

Co-authored-by: Spacy <832235+dosco@users.noreply.github.com>
1 parent 6cee126 commit 6e9c558
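
The per-instance selection described in the commit message (score matrix S, parents sampled in proportion to per-instance wins) can be illustrated with a minimal sketch. This is not the code from `gepa.ts`: the names `ScoreMatrix`, `perInstanceWins`, and `sampleParent` are hypothetical, the matrix layout `S[candidate][instance]` is an assumption, and the sketch omits the pruning of dominated candidates that the paper's Algorithm 2 performs before sampling.

```typescript
// Assumed layout: S[c][i] is the scalar score of candidate c on validation instance i.
type ScoreMatrix = number[][];

/** For each candidate, count the instances on which it matches the best score (within eps). */
function perInstanceWins(S: ScoreMatrix, eps = 0): number[] {
  const wins: number[] = S.map(() => 0);
  const numInstances = S[0]?.length ?? 0;
  for (let i = 0; i < numInstances; i++) {
    let best = -Infinity;
    for (const row of S) best = Math.max(best, row[i] ?? -Infinity);
    S.forEach((row, c) => {
      if ((row[i] ?? -Infinity) >= best - eps) wins[c] = (wins[c] ?? 0) + 1;
    });
  }
  return wins;
}

/** Sample a candidate index with probability proportional to its win count. */
function sampleParent(S: ScoreMatrix, rand: () => number, eps = 0): number {
  if (S.length === 0) throw new Error('no candidates to sample from');
  const wins = perInstanceWins(S, eps);
  const total = wins.reduce((a, b) => a + b, 0);
  if (total === 0) return 0; // no instance data yet: fall back to the first candidate
  let r = rand() * total;
  for (let c = 0; c < wins.length; c++) {
    r -= wins[c] ?? 0;
    if (r < 0) return c;
  }
  return wins.length - 1;
}
```

A candidate that never attains the best score on any instance gets zero weight and is never chosen as a parent, which is what keeps sampling focused on the per-instance fronts. Passing a seeded `rand` makes the choice reproducible.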

16 files changed

Lines changed: 3051 additions & 170 deletions

.cspell/project-words.txt

Lines changed: 17 additions & 1 deletion
@@ -144,5 +144,21 @@ anns
 annos
 mundo
 fmap
+GEPA
+gepa
+Traj
+vecs
+Scalarizer
+Scalarize
+scalarize
+scalarized
+idxs
+Instrs
+desirables
+subscores
+xorshift
+xorshift32
+Cand
+arrs
 PKCE
-MCPO
+MCPO
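
The `xorshift` and `xorshift32` entries, together with the commit message's "Seed RNG across selection/minibatching", suggest that the optimizers use a seeded xorshift32 generator. The sketch below shows the standard 13/17/5 xorshift32 step and a seeded minibatch draw built on it. The function names and the fallback seed constant are hypothetical, and the actual implementation in the repository may differ.

```typescript
/** Seeded xorshift32 generator returning floats in [0, 1). */
function xorshift32(seed: number): () => number {
  // The state must be a non-zero 32-bit integer; the fallback constant is arbitrary.
  let x = seed >>> 0 || 0x9e3779b9;
  return () => {
    x ^= x << 13;
    x ^= x >>> 17;
    x ^= x << 5;
    x >>>= 0; // reinterpret as unsigned 32-bit
    return x / 0x100000000;
  };
}

/** Draw `size` distinct indices from [0, n) with a partial Fisher-Yates shuffle. */
function sampleMinibatch(n: number, size: number, rand: () => number): number[] {
  const idxs = Array.from({ length: n }, (_, i) => i);
  const k = Math.min(size, n);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (n - i));
    const tmp = idxs[i] as number;
    idxs[i] = idxs[j] as number;
    idxs[j] = tmp;
  }
  return idxs.slice(0, k);
}
```

Two generators created with the same seed yield the same sequence, so a run configured with `seed: 42` would draw the same minibatches every time, assuming all randomness flows through one such generator.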

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -23,4 +23,6 @@ site/dist
 *.ac3
 *.webm
 *.txt
-__pycache__
+__pycache__
+
+gepa/

cspell.json

Lines changed: 2 additions & 1 deletion
@@ -14,6 +14,7 @@
     "./src/examples",
     "./scripts",
     "./build",
-    "./dist"
+    "./dist",
+    "src/**/dist"
   ]
 }

docs/OPTIMIZE.md

Lines changed: 91 additions & 131 deletions
@@ -750,13 +750,16 @@ const optimizer = new AxBootstrapFewShot({
 });
 ```

-### 2. Multi-Objective Optimization with `compilePareto`
+### 2. Multi-Objective Optimization with GEPA and GEPA-Flow

 **The Problem**: Sometimes you care about multiple things at once - accuracy AND
 speed AND cost. Traditional optimization only handles one objective at a time.

-**The Solution**: `compilePareto` finds the optimal trade-offs between multiple
-objectives using Pareto frontier analysis.
+**The Solution**: Use `AxGEPA` (single-module) or `AxGEPAFlow` (multi-module)
+with a multi-objective metric. Both use `compile(...)` and return a Pareto
+frontier of trade-offs plus hypervolume metrics.
+
+> Note: Pass `maxMetricCalls` in `compile` options to bound evaluation cost.

 #### What is Pareto Optimization?

@@ -774,7 +777,7 @@ Solutions A and B are both Pareto optimal (A is more accurate but
 slower/expensive, B is faster/cheaper but less accurate). Solution C is
 dominated by both A and B.

-#### When to Use `compilePareto`
+#### When to Use GEPA / GEPA-Flow

 **Perfect for:**

@@ -785,130 +788,96 @@ dominated by both A and B.

 **Skip for:**

-- Single clear objective (use regular `compile`)
+- Single clear objective (use regular `AxMiPRO.compile`)
 - When one objective is clearly most important
-- Quick prototyping (more complex than single-objective)
+- Quick prototyping (multi-objective adds complexity)

-#### Complete Working Example
+#### Complete Working Example (GEPA)

 ```typescript
-import { ai, ax, AxMiPRO } from "@ax-llm/ax";
+import { ai, ax, AxGEPA } from "@ax-llm/ax";

-// Content moderation with multiple objectives
-const contentModerator = ax(`
-  userPost:string "User-generated content" ->
-  isSafe:class "safe, unsafe" "Content safety",
-  confidence:number "Confidence 0-1",
-  reason:string "Explanation if unsafe"
+// Two-objective demo: accuracy (classification) + brevity (short rationale)
+const moderator = ax(`
+  userPost:string "User content" ->
+  isSafe:class "safe, unsafe" "Safety",
+  rationale:string "One concise sentence"
 `);

-// Training examples
-const examples = [
-  {
-    userPost: "Great weather today!",
-    isSafe: "safe",
-    confidence: 0.95,
-    reason: "",
-  },
-  {
-    userPost: "This product sucks and the company is terrible!",
-    isSafe: "unsafe",
-    confidence: 0.8,
-    reason: "Aggressive language",
-  },
-  // ... more examples
+const train = [
+  { userPost: "Great weather today!", isSafe: "safe" },
+  { userPost: "This product sucks and the company is terrible!", isSafe: "unsafe" },
+  // ...
 ];

-// Multi-objective metric function
-const multiMetric = ({ prediction, example }) => {
-  // Calculate multiple scores
-  const accuracy = prediction.isSafe === example.isSafe ? 1 : 0;
-
-  // Reward high confidence when correct, penalize when wrong
-  const confidenceScore = prediction.isSafe === example.isSafe
-    ? (prediction.confidence || 0)
-    : (1 - (prediction.confidence || 0));
-
-  // Reward explanations for unsafe content
-  const explanationScore = example.isSafe === "unsafe"
-    ? (prediction.reason && prediction.reason.length > 10 ? 1 : 0)
-    : 1; // No penalty for safe content
+const val = [
+  { userPost: "Reminder: submit timesheets", isSafe: "safe" },
+  { userPost: "Data breach follow-up actions required", isSafe: "unsafe" },
+  // ...
+];

-  // Return multiple objectives
-  return {
-    accuracy, // Correctness of safety classification
-    confidence: confidenceScore, // Quality of confidence calibration
-    explanation: explanationScore, // Quality of reasoning
-  };
+// Multi-objective metric
+const multiMetric = ({ prediction, example }: any) => {
+  const accuracy = prediction?.isSafe === example?.isSafe ? 1 : 0;
+  const rationale: string = typeof prediction?.rationale === 'string' ? prediction.rationale : '';
+  const len = rationale.length;
+  const brevity = len <= 30 ? 1 : len <= 60 ? 0.7 : len <= 100 ? 0.4 : 0.1;
+  return { accuracy, brevity } as Record<string, number>;
 };

-// Set up optimizer
-const optimizer = new AxMiPRO({
-  studentAI: ai({
-    name: "openai",
-    apiKey: process.env.OPENAI_APIKEY!,
-    config: { model: "gpt-4o-mini" },
-  }),
-  examples,
-  options: { verbose: true },
-});
+const student = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY!, config: { model: 'gpt-4o-mini' } });
+const optimizer = new AxGEPA({ studentAI: student, numTrials: 16, minibatch: true, minibatchSize: 6, seed: 42, verbose: true });

-// Run multi-objective optimization
-console.log("🔄 Finding optimal trade-offs...");
-const result = await optimizer.compilePareto(
-  contentModerator,
-  examples,
-  multiMetric,
+console.log("🔄 Finding Pareto trade-offs...");
+const result = await optimizer.compile(
+  moderator as any,
+  train,
+  multiMetric as any,
+  {
+    validationExamples: val,
+    // Required to bound evaluation cost
+    maxMetricCalls: 200,
+    // Optional: provide a tie-break scalarizer for selection logic
+    // paretoMetricKey: 'accuracy',
+    // or
+    // paretoScalarize: (s) => 0.7*s.accuracy + 0.3*s.brevity,
+  } as any
 );

-console.log(`✅ Found ${result.paretoFrontSize} optimal solutions!`);
-console.log(`📊 Hypervolume: ${result.hypervolume?.toFixed(4) || "N/A"}`);
+console.log(`✅ Found ${result.paretoFrontSize} Pareto points`);
+console.log(`📊 Hypervolume (2D): ${result.hypervolume ?? 'N/A'}`);

-// Explore the Pareto frontier
-result.paretoFront.forEach((solution, index) => {
-  console.log(`\n🎯 Solution ${index + 1}:`);
-  console.log(` Accuracy: ${(solution.scores.accuracy * 100).toFixed(1)}%`);
-  console.log(
-    ` Confidence: ${(solution.scores.confidence * 100).toFixed(1)}%`,
-  );
-  console.log(
-    ` Explanation: ${(solution.scores.explanation * 100).toFixed(1)}%`,
-  );
-  console.log(` Strategy: ${solution.configuration.strategy}`);
-  console.log(` Dominates: ${solution.dominatedSolutions} other solutions`);
+// Inspect a few points
+for (const [i, p] of [...result.paretoFront].entries()) {
+  if (i >= 3) break;
+  console.log(` #${i+1}: acc=${(p.scores as any).accuracy?.toFixed(3)}, brev=${(p.scores as any).brevity?.toFixed(3)}, config=${JSON.stringify(p.configuration)}`);
+}
+
+// Choose a compromise by weighted sum (example)
+const weights = { accuracy: 0.7, brevity: 0.3 };
+const best = result.paretoFront.reduce((best, cur) => {
+  const s = weights.accuracy * ((cur.scores as any).accuracy ?? 0) + weights.brevity * ((cur.scores as any).brevity ?? 0);
+  const b = weights.accuracy * ((best.scores as any).accuracy ?? 0) + weights.brevity * ((best.scores as any).brevity ?? 0);
+  return s > b ? cur : best;
 });
+console.log(`🎯 Chosen config: ${JSON.stringify(best.configuration)}`);
 ```

-#### Choosing the Best Solution
+#### GEPA-Flow (Multi-Module)

 ```typescript
-// Option 1: Pick the solution that dominates the most others
-const mostDominant = result.paretoFront.reduce((best, current) =>
-  current.dominatedSolutions > best.dominatedSolutions ? current : best
-);
-
-// Option 2: Pick based on your priorities (weighted combination)
-const priorities = { accuracy: 0.6, confidence: 0.3, explanation: 0.1 };
-const bestWeighted = result.paretoFront.reduce((best, current) => {
-  const currentScore = Object.entries(current.scores)
-    .reduce((sum, [obj, score]) => sum + score * (priorities[obj] || 0), 0);
-  const bestScore = Object.entries(best.scores)
-    .reduce((sum, [obj, score]) => sum + score * (priorities[obj] || 0), 0);
-  return currentScore > bestScore ? current : best;
-});
-
-// Option 3: Interactive selection based on business requirements
-const businessOptimal = result.paretoFront.find((solution) =>
-  solution.scores.accuracy >= 0.85 && // Must be at least 85% accurate
-  solution.scores.confidence >= 0.7 && // Must be well-calibrated
-  solution.scores.explanation >= 0.8 // Must explain unsafe content well
-);
-
-// Apply the chosen solution
-if (businessOptimal?.demos) {
-  contentModerator.setDemos(businessOptimal.demos);
-  console.log("🎯 Applied business-optimal solution");
-}
+import { AxGEPAFlow, flow, ai } from "@ax-llm/ax";
+
+const pipeline = flow<{ emailText: string }>()
+  .n('classifier', 'emailText:string -> priority:class "high, normal, low"')
+  .n('rationale', 'emailText:string, priority:string -> rationale:string "One concise sentence"')
+  .e('classifier', (s) => ({ emailText: s.emailText }))
+  .e('rationale', (s) => ({ emailText: s.emailText, priority: s.classifierResult.priority }))
+  .m((s) => ({ priority: s.classifierResult.priority, rationale: s.rationaleResult.rationale }));
+
+const optimizer = new AxGEPAFlow({ studentAI: ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY!, config: { model: 'gpt-4o-mini' } }), numTrials: 16 });
+const result = await optimizer.compile(pipeline as any, train, multiMetric as any, { validationExamples: val, maxMetricCalls: 240 } as any);
+console.log(`Front size: ${result.paretoFrontSize}, Hypervolume: ${result.hypervolume}`);
 ```

 #### Advanced Multi-Objective Patterns
@@ -966,45 +935,36 @@ const multiMetric = ({ prediction, example }) => ({
 #### Understanding the Results

 ```typescript
-const result = await optimizer.compilePareto(program, multiMetric);
+const result = await optimizer.compile(program, examples, multiMetric, { maxMetricCalls: 200 } as any);

 // Key properties of AxParetoResult:
 console.log(`Pareto frontier size: ${result.paretoFrontSize}`);
-console.log(
-  `Total solutions generated: ${result.finalConfiguration?.numSolutions}`,
-);
-console.log(`Best single score: ${result.bestScore}`);
+console.log(`Best scalarized score on frontier: ${result.bestScore}`);
 console.log(`Hypervolume (2D only): ${result.hypervolume}`);
+console.log(`Total candidates evaluated: ${result.finalConfiguration?.candidates}`);

-// Each solution on the frontier contains:
+// Each frontier solution contains:
 result.paretoFront.forEach((solution) => {
-  solution.demos; // Optimized examples for this solution
   solution.scores; // Scores for each objective
-  solution.configuration; // How this solution was generated
-  solution.dominatedSolutions; // How many other solutions this beats
+  solution.configuration; // Candidate identifier for this solution
+  solution.dominatedSolutions; // How many others this point dominates
 });
 ```

 #### Performance Considerations

-- **Runtime**: `compilePareto` runs multiple single-objective optimizations, so
-  it takes 3-10x longer than regular `compile`
-- **Cost**: Uses more API calls due to multiple optimization runs
-- **Complexity**: Only use when you genuinely need multiple objectives
-- **Scalability**: Works best with 2-4 objectives; more objectives =
-  exponentially more solutions
+- **Runtime**: GEPA/GEPA-Flow perform reflective evolution with Pareto sampling; time scales with `numTrials`, validation size, and `maxMetricCalls`.
+- **Cost**: Bound evaluations with `maxMetricCalls`; consider minibatching.
+- **Scalability**: Works best with 2–4 objectives; hypervolume reporting is 2D.
+- **Determinism**: Provide `seed` for reproducibility; `tieEpsilon` resolves near-ties.

 #### Tips for Success

-1. **Start with 2-3 objectives**: More objectives make it harder to choose
-   solutions
-2. **Make objectives independent**: Avoid highly correlated objectives
-3. **Scale objectives similarly**: Ensure all objectives range 0-1 for fair
-   comparison
-4. **Use business constraints**: Filter the Pareto frontier by minimum
-   requirements
-5. **Validate solutions**: Test multiple Pareto-optimal solutions in practice
-
+1. **Start with 2-3 objectives**: More objectives make selection harder.
+2. **Scale objectives similarly (0–1)** for fair comparison.
+3. **Use `paretoMetricKey` or `paretoScalarize`** to guide selection/tie-breaks.
+4. **Validate chosen trade-offs** on a holdout set aligned to business constraints.
+5. **Keep validation small** to control cost; use `validationExamples` and `feedbackExamples` splits.
 ### 3. Chain Multiple Programs

 ```typescript
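
The docs changed above rely on three ideas: Pareto dominance ("Solution C is dominated by both A and B"), the resulting frontier, and a 2D hypervolume. The standalone sketch below illustrates those definitions for score objects shaped like the `{ accuracy, brevity }` metric in the example. It does not use or reproduce the `@ax-llm/ax` implementation: `dominates`, `paretoFront`, and `hypervolume2D` are hypothetical names, and the (0, 0) reference point for the hypervolume is an assumption.

```typescript
type Scores = Record<string, number>;

/** a dominates b if a is at least as good on every objective and strictly better on one. */
function dominates(a: Scores, b: Scores, eps = 0): boolean {
  const keys = Object.keys(a).concat(Object.keys(b));
  let strictlyBetter = false;
  for (const k of keys) {
    const av = a[k] ?? 0;
    const bv = b[k] ?? 0;
    if (av < bv - eps) return false;
    if (av > bv + eps) strictlyBetter = true;
  }
  return strictlyBetter;
}

/** Keep only the points that no other point dominates. */
function paretoFront(points: Scores[], eps = 0): Scores[] {
  return points.filter(
    (p, i) => !points.some((q, j) => j !== i && dominates(q, p, eps)),
  );
}

/** Area dominated by the front in the (xKey, yKey) plane, measured from the origin. */
function hypervolume2D(front: Scores[], xKey: string, yKey: string): number {
  const pts = front
    .map((p): [number, number] => [p[xKey] ?? 0, p[yKey] ?? 0])
    .sort((a, b) => b[0] - a[0]); // sweep from largest x to smallest
  let area = 0;
  let prevY = 0;
  for (const [x, y] of pts) {
    if (y > prevY) {
      area += x * (y - prevY); // add the strip this point contributes
      prevY = y;
    }
  }
  return area;
}
```

For example, with A = `{ accuracy: 0.9, brevity: 0.4 }`, B = `{ accuracy: 0.6, brevity: 1.0 }`, and C = `{ accuracy: 0.5, brevity: 0.3 }`, the frontier is A and B, and C is dropped because both A and B dominate it. A larger hypervolume means the frontier covers more of the objective space, which makes it a convenient single number for comparing runs.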

src/ax/dsp/common_types.ts

Lines changed: 6 additions & 0 deletions
@@ -1,6 +1,7 @@
 import type { AxAIService, AxLoggerFunction } from '../ai/types.js';
 import type { AxOptimizerLoggerData } from './optimizerTypes.js';
 import type { AxFieldValue, AxResultPickerFunction } from './types.js';
+import type { AxGEPAAdapter } from './optimizers/gepaAdapter.js';

 export type AxExample = Record<string, AxFieldValue>;

@@ -173,4 +174,9 @@ export interface AxCompileOptions {
   overrideCheckpointLoad?: AxCheckpointLoadFn;
   overrideCheckpointInterval?: number;
   saveCheckpointOnComplete?: boolean;
+  // GEPA core options (adapter-based)
+  gepaAdapter?: AxGEPAAdapter<any, any, any>;
+  skipPerfectScore?: boolean;
+  perfectScore?: number;
+  maxMetricCalls?: number;
 }
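
The commit message says compile throws when `maxMetricCalls` is absent or non-positive, and that minibatches with a perfect score can be skipped. One way an optimizer loop could consume the three scalar options added here is sketched below. `BudgetOptions`, `requireBudget`, `withBudget`, and `shouldSkipMinibatch` are hypothetical helpers, and the default of 1 for `perfectScore` is an assumption rather than something this diff confirms.

```typescript
// Local stand-in for the three scalar options added to AxCompileOptions.
interface BudgetOptions {
  maxMetricCalls?: number;
  skipPerfectScore?: boolean;
  perfectScore?: number;
}

/** Validate the budget up front, mirroring the "throws if absent or non-positive" rule. */
function requireBudget(options: BudgetOptions): number {
  const n = options.maxMetricCalls;
  if (typeof n !== 'number' || !Number.isFinite(n) || n <= 0) {
    throw new Error('maxMetricCalls must be a positive number');
  }
  return n;
}

/** Wrap a metric so that it stops being evaluated once the budget is spent. */
function withBudget<A>(metric: (arg: A) => number, maxCalls: number) {
  let used = 0;
  return {
    call(arg: A): number | undefined {
      if (used >= maxCalls) return undefined; // budget exhausted
      used += 1;
      return metric(arg);
    },
    get used(): number {
      return used;
    },
    get exhausted(): boolean {
      return used >= maxCalls;
    },
  };
}

/** Skip reflection on a minibatch whose scores are all already perfect. */
function shouldSkipMinibatch(scores: number[], options: BudgetOptions): boolean {
  if (!options.skipPerfectScore || scores.length === 0) return false;
  const perfect = options.perfectScore ?? 1; // assumed default
  return scores.every((s) => s >= perfect);
}
```

Counting calls in one wrapper keeps the budget check in a single place, so selection, minibatch acceptance, and merge evaluation all draw from the same allowance.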

src/ax/dsp/optimizerLogging.ts

Lines changed: 10 additions & 1 deletion
@@ -71,8 +71,17 @@ export const axCreateDefaultOptimizerColorLogger = (
             ? cl.red(` ↓${Math.abs(improvement).toFixed(3)}`)
             : '';

+      const totalRounds =
+        typeof data.value.totalRounds === 'number' &&
+        data.value.totalRounds > 0
+          ? data.value.totalRounds
+          : typeof (config as any).totalRounds === 'number' &&
+              (config as any).totalRounds > 0
+            ? (config as any).totalRounds
+            : 0;
+
       formattedMessage =
-        `${cl.yellow('● ')}${cl.whiteBright(`Round ${data.value.round}/${data.value.totalRounds}`)}` +
+        `${cl.yellow('● ')}${cl.whiteBright(`Round ${data.value.round}/${totalRounds}`)}` +
         (config.trialNumber !== undefined
           ? cl.gray(` [Trial #${config.trialNumber}]`)
           : '') +
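
The nested ternary added in this hunk picks the first positive `totalRounds` from the event value, then from the config, and otherwise falls back to 0. The same precedence can be written as a small standalone function, which is easier to test in isolation. `resolveTotalRounds` is a hypothetical name and is not part of the commit.

```typescript
/** Return the first positive numeric totalRounds: event value, then config, else 0. */
function resolveTotalRounds(
  value: { totalRounds?: unknown },
  config: { totalRounds?: unknown },
): number {
  const isPositive = (n: unknown): n is number => typeof n === 'number' && n > 0;
  if (isPositive(value.totalRounds)) return value.totalRounds;
  if (isPositive(config.totalRounds)) return config.totalRounds;
  return 0;
}
```

With this fallback, the progress line reads `Round 3/16` when only the config carries the total, where the previous code would have interpolated `undefined`.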
