
v0.1.6

@davidberenstein1957 released this 20 Mar 09:26 · 1afc528

Fix: Validation leakage and unfair baseline causing inflated metrics

Four bugs combined to produce artificially inflated optimization scores (e.g. baseline 89% → optimized 100% with no real improvement):

1. Validation data leak in sequential field optimization (critical)

`_optimize_single_field` always set `val_single = train_single` because of an off-by-one guard condition, so DSPy's optimizer trained and validated on identical data, biasing candidate selection toward memorization.
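The failure mode can be illustrated with a hypothetical split helper; the names `split_examples_*`, the 20% validation fraction, and the guard threshold are illustrative, not dspydantic's actual code:

```python
def split_examples_buggy(examples, val_fraction=0.2):
    # Off-by-one guard: fires even when a one-example validation set
    # would be valid, so val silently becomes the full training set.
    n_val = int(len(examples) * val_fraction)
    if n_val <= 1:
        return examples, examples  # val == train -> leakage
    return examples[:-n_val], examples[-n_val:]

def split_examples_fixed(examples, val_fraction=0.2):
    # Keep at least one held-out example whenever the data allows it.
    n_val = max(1, int(len(examples) * val_fraction))
    if len(examples) - n_val < 1:
        return examples, examples  # genuinely too little data
    return examples[:-n_val], examples[-n_val:]
```

With 8 examples the buggy guard trips (`n_val = 1`) and validation runs on the training set; the fixed version holds out a disjoint example.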

2. Silent empty-validation fallback

When too few examples existed for a proper train/val split, the code silently fell back to using the training set as validation. The fallback still happens when unavoidable, but it now emits a `UserWarning`.
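A minimal sketch of the now-audible fallback; the helper name and the `min_val` threshold are assumptions, not dspydantic's API:

```python
import warnings

def make_val_split(examples, min_val=2):
    # Hypothetical fallback mirroring the fix: with too few examples
    # we still validate on the training set, but no longer silently.
    if len(examples) < 2 * min_val:
        warnings.warn(
            "Too few examples for a train/val split; validating on the "
            "training set. Reported scores may be inflated.",
            UserWarning,
        )
        return examples, examples
    cut = len(examples) - min_val
    return examples[:cut], examples[cut:]
```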

3. Prompt metric used wrong field descriptions

Phase 2 prompt optimization evaluated candidate prompts against the original field descriptions instead of the descriptions optimized in Phase 1, so DSPy selected prompts in an evaluation context that did not match the one they would actually run in.
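The fix amounts to building the Phase 2 metric from Phase 1's output rather than the original descriptions. A toy sketch of the closure pattern; the metric and the description dicts are illustrative, not dspydantic's internals:

```python
def make_metric(field_descriptions):
    # The metric closes over the field descriptions the extractor will
    # actually run with, so candidates are scored in that context.
    def metric(expected, predicted):
        fields = list(field_descriptions)
        hits = sum(expected.get(f) == predicted.get(f) for f in fields)
        return hits / max(1, len(fields))
    return metric

original = {"name": "The name", "age": "The age"}
optimized = {"name": "Full name as written", "age": "Age in whole years"}

# Bug: Phase 2 built its metric from `original`.
# Fix: build it from the Phase 1 result instead.
metric = make_metric(optimized)
```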

4. Unfair baseline (no few-shot demos)

The baseline score was computed without few-shot demos, while every subsequent evaluation included up to 8 demos in the extraction prompt, so the apparent improvement came from the demos alone, not from description optimization. The baseline now includes the same demos for a fair comparison.
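The point of the fix is that the baseline and every candidate must be scored over the same prompt shape. A hypothetical prompt builder (the function name and demo format are assumptions, not dspydantic's code):

```python
def build_prompt(instructions, demos, max_demos=8):
    # Both the baseline and every optimized candidate include the same
    # few-shot demos (capped at max_demos), so any score delta reflects
    # the instructions, not the presence of demos.
    parts = [instructions]
    for inp, out in demos[:max_demos]:
        parts.append(f"Input: {inp}\nOutput: {out}")
    return "\n\n".join(parts)

demos = [(f"text {i}", f"fields {i}") for i in range(12)]
baseline_prompt = build_prompt("Extract the fields.", demos)
optimized_prompt = build_prompt("Extract each field precisely.", demos)
```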

Install

```shell
uv pip install dspydantic==0.1.6
```