MiniPdf Self-Evolution Benchmark

This benchmark automatically compares PDFs generated by MiniPdf against LibreOffice output (the reference implementation), driving continuous improvements in rendering quality.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                  run_benchmark.py (orchestrator)            │
│          scripts/Run-Benchmark.ps1 (one-click entry point)  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Step 1: generate_classic_xlsx.py                           │
│          Generate 30 classic Excel test files with openpyxl │
│          → tests/MiniPdf.Scripts/output/*.xlsx               │
│                                                             │
│  Step 2: convert_xlsx_to_pdf.cs                             │
│          Convert xlsx to PDF using MiniPdf                  │
│          → tests/MiniPdf.Scripts/pdf_output/*.pdf            │
│                                                             │
│  Step 3: generate_reference_pdfs.py                         │
│          Convert xlsx to PDF using LibreOffice (reference)  │
│          → tests/MiniPdf.Benchmark/reference_pdfs/*.pdf      │
│                                                             │
│  Step 4: compare_pdfs.py                                    │
│          Compare text content + visual pixel differences    │
│          → tests/MiniPdf.Benchmark/reports/                  │
│            ├── comparison_report.md   (human-readable)      │
│            ├── comparison_report.json (machine-readable)    │
│            └── images/                (per-page renderings) │
│                                                             │
│  Step 5: Analyze report, identify lowest-scoring test cases │
│          and improve accordingly                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
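The orchestration above can be sketched as a simple step runner. This is a hypothetical illustration of the pipeline order, not the actual run_benchmark.py; the commands and working directories are taken from the diagram.

```python
import subprocess

# Hypothetical sketch of the pipeline steps from the diagram above;
# the real run_benchmark.py may differ in structure and options.
STEPS = [
    ("generate xlsx",     ["python", "generate_classic_xlsx.py"],       "tests/MiniPdf.Scripts"),
    ("convert (MiniPdf)", ["dotnet", "run", "convert_xlsx_to_pdf.cs"],  "tests/MiniPdf.Scripts"),
    ("reference PDFs",    ["python", "generate_reference_pdfs.py"],     "tests/MiniPdf.Benchmark"),
    ("compare",           ["python", "compare_pdfs.py"],                "tests/MiniPdf.Benchmark"),
]

def run_pipeline(dry_run=False):
    """Run each step in order; return the (name, cmd, cwd) tuples planned."""
    planned = []
    for name, cmd, cwd in STEPS:
        planned.append((name, cmd, cwd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)  # abort on first failure
    return planned
```

Calling `run_pipeline(dry_run=True)` lists the four commands without executing anything, which is useful for checking paths before a real run.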

Quick Start

Prerequisites

# 1. Python 3.10+ & dependencies
pip install openpyxl pymupdf

# 2. LibreOffice (free, used to generate reference PDFs)
#    Windows: https://www.libreoffice.org/download/
#    or: winget install LibreOffice

# 3. .NET 9 SDK

One-Click Execution

# Windows PowerShell
.\scripts\Run-Benchmark.ps1

# Or run directly with Python
cd tests/MiniPdf.Benchmark
python run_benchmark.py

Step-by-Step Execution

# 1. Generate Excel test files
cd tests/MiniPdf.Scripts
python generate_classic_xlsx.py

# 2. Convert to PDF with MiniPdf
dotnet run convert_xlsx_to_pdf.cs

# 3. Generate reference PDFs with LibreOffice
cd ../MiniPdf.Benchmark
python generate_reference_pdfs.py

# 4. Compare and analyze
python compare_pdfs.py

# 5. Run comparison only (skip generation steps)
python run_benchmark.py --compare-only

Scoring System

Each test case receives a composite score from 0.0 to 1.0:

Dimension          Weight   Description
Text Similarity    40%      Extracts text from both PDFs and compares via SequenceMatcher
Visual Similarity  40%      Uses AI semantic scoring when available; falls back to pixel comparison
Page Count Match   20%      1.0 if page counts match, 0.5 otherwise

Score grades:

  • 🟢 ≥ 0.9 — Excellent
  • 🟡 0.7–0.9 — Good, room for improvement
  • 🔴 < 0.7 — Significant differences, needs attention
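The weighting and grading above can be sketched with Python's difflib, whose SequenceMatcher the text dimension uses. This is a minimal illustration; the actual compare_pdfs.py implementation may differ in detail.

```python
from difflib import SequenceMatcher

WEIGHTS = {"text": 0.4, "visual": 0.4, "pages": 0.2}  # weights from the table above

def text_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching runs between two extracted texts."""
    return SequenceMatcher(None, a, b).ratio()

def composite_score(text_sim: float, visual_sim: float, pages_match: bool) -> float:
    """Weighted sum: page dimension is 1.0 on a match, 0.5 otherwise."""
    page_score = 1.0 if pages_match else 0.5
    return (WEIGHTS["text"] * text_sim
            + WEIGHTS["visual"] * visual_sim
            + WEIGHTS["pages"] * page_score)

def grade(score: float) -> str:
    """Map a composite score to the traffic-light grades listed above."""
    if score >= 0.9:
        return "🟢"
    if score >= 0.7:
        return "🟡"
    return "🔴"
```

For example, a case with 0.8 text similarity, 0.6 visual similarity, and a page-count mismatch scores 0.4·0.8 + 0.4·0.6 + 0.2·0.5 = 0.66, a 🔴.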

AI Visual Comparison (Optional)

Pure pixel comparison is highly sensitive to anti-aliasing and minor font differences, often producing low scores that are hard to interpret. When --ai-compare is enabled, the script sends rendered page images to GPT-4o (or Azure OpenAI), which identifies specific differences and provides actionable code improvement suggestions.

Credential Configuration

Option 1: OpenAI

$env:OPENAI_API_KEY = "sk-..."
$env:OPENAI_MODEL  = "gpt-4o"   # Optional, defaults to gpt-4o

Option 2: Azure OpenAI

$env:AZURE_OPENAI_ENDPOINT   = "https://your-resource.openai.azure.com"
$env:AZURE_OPENAI_KEY        = "your-key"
$env:AZURE_OPENAI_DEPLOYMENT = "gpt-4o"   # Optional, defaults to gpt-4o

Install dependency:

pip install openai

Usage

# Enable AI comparison (analyzes page 1 only, invoked when pixel score < 0.90)
python compare_pdfs.py --ai-compare

# Analyze first 2 pages, lower threshold to 0.85
python compare_pdfs.py --ai-compare --ai-max-pages 2 --ai-threshold 0.85

# Enable AI via the orchestrator script
python run_benchmark.py --compare-only --ai-compare
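The gating described above (call the AI only when the pixel score falls below the threshold) might look roughly like the following. This is a hypothetical helper; `ask_model` stands in for whatever OpenAI/Azure call the script actually makes.

```python
def maybe_ai_compare(pixel_score, page_images, ask_model,
                     threshold=0.90, max_pages=1):
    """Invoke the injected AI comparison only for low pixel scores.

    `ask_model` is a stand-in for the real OpenAI/Azure request; it receives
    the first `max_pages` rendered page images and returns a semantic score.
    Returns None when the pixel score already meets the threshold.
    """
    if pixel_score >= threshold:
        return None  # pixel comparison is good enough; skip the API cost
    return ask_model(page_images[:max_pages])
```

Injecting the model call keeps the threshold logic testable without network access, and the `threshold`/`max_pages` defaults mirror the `--ai-threshold` and `--ai-max-pages` flags.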

AI Report Content

The report comparison_report.md includes three AI-specific sections:

Section                               Description
🤖 AI Visual Analysis Findings        Deduplicated summary of visual differences across all test cases
🤖 AI-Recommended Code Improvements   Specific improvement suggestions for ExcelToPdfConverter.cs
AI Analysis Per Test Case             Detailed per-page diff with severity (low/medium/high) and AI visual score

Scoring Changes

ai_visual_avg present?   Visual dimension value
✅ Yes                   ai_visual_avg (AI semantic score)
❌ No                    visual_avg (pixel comparison)

The composite scoring formula remains unchanged: text×0.4 + visual×0.4 + page_count×0.2
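The fallback in the table above amounts to a one-line selection. Sketched hypothetically, with `result` standing in for one test case's entry in comparison_report.json:

```python
def visual_dimension(result: dict) -> float:
    """Prefer the AI semantic score when present, else the pixel score."""
    ai = result.get("ai_visual_avg")
    return ai if ai is not None else result["visual_avg"]
```

The chosen value then feeds the unchanged 40% visual weight in the composite formula.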


Self-Evolution Iteration Flow

┌───────────────────────────────┐
│  1. Run Benchmark Pipeline     │
│     → Generate comparison      │
│       report                   │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  2. Analyze low-scoring cases  │
│     → Identify specific diffs  │
│       (text/visual)            │
│     → Review diff images       │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  3. Modify ExcelToPdfConverter │
│     → Improve rendering logic  │
│     → Fix bugs                 │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  4. Re-run Benchmark           │
│     → Verify score improvement │
│     → Ensure no regressions    │
└──────────┬────────────────────┘
           │
           ▼
       Back to Step 1
       (continuous iteration)

AI-Driven Self-Evolution Workflow

When using an AI assistant (e.g., GitHub Copilot), follow this workflow:

  1. Run the Benchmark:

    .\scripts\Run-Benchmark.ps1
    
  2. Feed the report to the AI:

    Review tests/MiniPdf.Benchmark/reports/comparison_report.md
    Identify the lowest-scoring test cases, analyze the differences,
    and automatically modify ExcelToPdfConverter.cs to improve them.
    
  3. Re-validate after AI makes changes:

    .\scripts\Run-Benchmark.ps1 -SkipGenerate -SkipReference
    
  4. Iterate until all scores ≥ 0.9.

Extending Tests

Add new test cases in generate_classic_xlsx.py:

def classic31_your_new_case():
    wb = Workbook()
    ws = wb.active
    # ... your new scenario ...
    save(wb, "classic31_your_new_case.xlsx")

Then add classic31_your_new_case to the generators list in main() and re-run the pipeline.
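Assuming main() drives a plain list of generator functions, registration might look like the following. This is a hypothetical sketch (the new case is stubbed so it runs without openpyxl); check the actual script's structure.

```python
# Hypothetical registration pattern; generate_classic_xlsx.py may differ.
def classic31_your_new_case():
    # ... build the workbook and save it, as in the snippet above ...
    return "classic31_your_new_case.xlsx"  # stub: real code writes the file

GENERATORS = [classic31_your_new_case]  # append alongside the existing 30 entries

def main():
    # Each generator writes its .xlsx into output/; collect the filenames.
    return [gen() for gen in GENERATORS]
```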

File Structure

tests/
├── MiniPdf.Scripts/
│   ├── generate_classic_xlsx.py    # Generates 30 test Excel files
│   ├── convert_xlsx_to_pdf.cs      # MiniPdf-to-PDF conversion script
│   ├── output/                     # Generated .xlsx files
│   └── pdf_output/                 # MiniPdf-generated .pdf files
│
├── MiniPdf.Benchmark/
│   ├── run_benchmark.py            # Orchestrator script
│   ├── generate_reference_pdfs.py  # LibreOffice reference conversion
│   ├── compare_pdfs.py             # PDF comparison engine
│   ├── reference_pdfs/             # LibreOffice reference .pdf files
│   ├── reports/                    # Comparison report output
│   │   ├── comparison_report.md
│   │   ├── comparison_report.json
│   │   └── images/                 # Per-page rendering comparisons
│   └── README.md                   # This document
│
scripts/Run-Benchmark.ps1                   # Windows one-click entry point