MiniPdf Self-Evolution Benchmark

This benchmark automatically compares PDFs generated by MiniPdf against LibreOffice output (the reference implementation), driving continuous improvements in rendering quality.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                  run_benchmark.py (orchestrator)            │
│          scripts/Run-Benchmark.ps1 (one-click entry point)  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Step 1: generate_classic_xlsx.py                           │
│          Generate 30 classic Excel test files with openpyxl │
│          → tests/MiniPdf.Scripts/output/*.xlsx               │
│                                                             │
│  Step 2: convert_xlsx_to_pdf.cs                             │
│          Convert xlsx to PDF using MiniPdf                  │
│          → tests/MiniPdf.Scripts/pdf_output/*.pdf            │
│                                                             │
│  Step 3: generate_reference_pdfs.py                         │
│          Convert xlsx to PDF using LibreOffice (reference)  │
│          → tests/MiniPdf.Benchmark/reference_pdfs/*.pdf      │
│                                                             │
│  Step 4: compare_pdfs.py                                    │
│          Compare text content + visual pixel differences    │
│          → tests/MiniPdf.Benchmark/reports/                  │
│            ├── comparison_report.md   (human-readable)      │
│            ├── comparison_report.json (machine-readable)    │
│            └── images/                (per-page renderings) │
│                                                             │
│  Step 5: Analyze report, identify lowest-scoring test cases │
│          and improve accordingly                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
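The orchestration above can be sketched as a simple step runner. This is a hypothetical illustration of the pipeline order, not the actual run_benchmark.py; the commands and working directories are taken from the diagram.

```python
import subprocess

# Hypothetical sketch of the pipeline steps from the diagram above;
# the real run_benchmark.py may differ in structure and options.
STEPS = [
    ("generate xlsx",     ["python", "generate_classic_xlsx.py"],       "tests/MiniPdf.Scripts"),
    ("convert (MiniPdf)", ["dotnet", "run", "convert_xlsx_to_pdf.cs"],  "tests/MiniPdf.Scripts"),
    ("reference PDFs",    ["python", "generate_reference_pdfs.py"],     "tests/MiniPdf.Benchmark"),
    ("compare",           ["python", "compare_pdfs.py"],                "tests/MiniPdf.Benchmark"),
]

def run_pipeline(dry_run=False):
    """Run each step in order; return the (name, cmd, cwd) tuples planned."""
    planned = []
    for name, cmd, cwd in STEPS:
        planned.append((name, cmd, cwd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)  # abort on first failure
    return planned
```

Calling `run_pipeline(dry_run=True)` lists the four commands without executing anything, which is useful for checking paths before a real run.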

Quick Start

Prerequisites

# 1. Python 3.10+ & dependencies
pip install openpyxl pymupdf

# 2. LibreOffice (free, used to generate reference PDFs)
#    Windows: https://www.libreoffice.org/download/
#    or: winget install LibreOffice

# 3. .NET 9 SDK

One-Click Execution

# Windows PowerShell
.\scripts\Run-Benchmark.ps1

# Or run directly with Python
cd tests/MiniPdf.Benchmark
python run_benchmark.py

Step-by-Step Execution

# 1. Generate Excel test files
cd tests/MiniPdf.Scripts
python generate_classic_xlsx.py

# 2. Convert to PDF with MiniPdf
dotnet run convert_xlsx_to_pdf.cs

# 3. Generate reference PDFs with LibreOffice
cd ../MiniPdf.Benchmark
python generate_reference_pdfs.py

# 4. Compare and analyze
python compare_pdfs.py

# 5. Run comparison only (skip generation steps)
python run_benchmark.py --compare-only

Scoring System

Each test case receives a composite score from 0.0 to 1.0:

Dimension          Weight   Description
Text Similarity    40%      Extracts text from both PDFs and compares via SequenceMatcher
Visual Similarity  40%      Uses AI semantic scoring when available; falls back to pixel comparison
Page Count Match   20%      1.0 if page counts match, 0.5 otherwise

Score grades:

  • 🟢 ≥ 0.9 — Excellent
  • 🟡 0.7–0.9 — Good, room for improvement
  • 🔴 < 0.7 — Significant differences, needs attention
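The weighting and grading above can be sketched with Python's difflib, whose SequenceMatcher the text dimension uses. This is a minimal illustration; the actual compare_pdfs.py implementation may differ in detail.

```python
from difflib import SequenceMatcher

WEIGHTS = {"text": 0.4, "visual": 0.4, "pages": 0.2}  # weights from the table above

def text_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching runs between two extracted texts."""
    return SequenceMatcher(None, a, b).ratio()

def composite_score(text_sim: float, visual_sim: float, pages_match: bool) -> float:
    """Weighted sum: page dimension is 1.0 on a match, 0.5 otherwise."""
    page_score = 1.0 if pages_match else 0.5
    return (WEIGHTS["text"] * text_sim
            + WEIGHTS["visual"] * visual_sim
            + WEIGHTS["pages"] * page_score)

def grade(score: float) -> str:
    """Map a composite score to the traffic-light grades listed above."""
    if score >= 0.9:
        return "🟢"
    if score >= 0.7:
        return "🟡"
    return "🔴"
```

For example, a case with 0.8 text similarity, 0.6 visual similarity, and a page-count mismatch scores 0.4·0.8 + 0.4·0.6 + 0.2·0.5 = 0.66, a 🔴.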

AI Visual Comparison (Optional)

Pure pixel comparison is highly sensitive to anti-aliasing and minor font differences, often producing low scores that are hard to interpret. When --ai-compare is enabled, the script sends rendered page images to GPT-4o (or Azure OpenAI), which identifies specific differences and provides actionable code improvement suggestions.

Credential Configuration

Option 1: OpenAI

$env:OPENAI_API_KEY = "sk-..."
$env:OPENAI_MODEL  = "gpt-4o"   # Optional, defaults to gpt-4o

Option 2: Azure OpenAI

$env:AZURE_OPENAI_ENDPOINT   = "https://your-resource.openai.azure.com"
$env:AZURE_OPENAI_KEY        = "your-key"
$env:AZURE_OPENAI_DEPLOYMENT = "gpt-4o"   # Optional, defaults to gpt-4o

Install dependency:

pip install openai

Usage

# Enable AI comparison (analyzes page 1 only, invoked when pixel score < 0.90)
python compare_pdfs.py --ai-compare

# Analyze first 2 pages, lower threshold to 0.85
python compare_pdfs.py --ai-compare --ai-max-pages 2 --ai-threshold 0.85

# Enable AI via the orchestrator script
python run_benchmark.py --compare-only --ai-compare
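The gating described above (call the AI only when the pixel score falls below the threshold) might look roughly like the following. This is a hypothetical helper; `ask_model` stands in for whatever OpenAI/Azure call the script actually makes.

```python
def maybe_ai_compare(pixel_score, page_images, ask_model,
                     threshold=0.90, max_pages=1):
    """Invoke the injected AI comparison only for low pixel scores.

    `ask_model` is a stand-in for the real OpenAI/Azure request; it receives
    the first `max_pages` rendered page images and returns a semantic score.
    Returns None when the pixel score already meets the threshold.
    """
    if pixel_score >= threshold:
        return None  # pixel comparison is good enough; skip the API cost
    return ask_model(page_images[:max_pages])
```

Injecting the model call keeps the threshold logic testable without network access, and the `threshold`/`max_pages` defaults mirror the `--ai-threshold` and `--ai-max-pages` flags.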

AI Report Content

The report comparison_report.md includes three AI-specific sections:

Section                               Description
🤖 AI Visual Analysis Findings        Deduplicated summary of visual differences across all test cases
🤖 AI-Recommended Code Improvements   Specific improvement suggestions for ExcelToPdfConverter.cs
AI Analysis Per Test Case             Detailed per-page diff with severity (low/medium/high) and AI visual score

Scoring Changes

ai_visual_avg present?   Visual dimension value
✅ Yes                   ai_visual_avg (AI semantic score)
❌ No                    visual_avg (pixel comparison)

The composite scoring formula remains unchanged: text×0.4 + visual×0.4 + page_count×0.2
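The fallback in the table above amounts to a one-line selection. Sketched hypothetically, with `result` standing in for one test case's entry in comparison_report.json:

```python
def visual_dimension(result: dict) -> float:
    """Prefer the AI semantic score when present, else the pixel score."""
    ai = result.get("ai_visual_avg")
    return ai if ai is not None else result["visual_avg"]
```

The chosen value then feeds the unchanged 40% visual weight in the composite formula.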


Self-Evolution Iteration Flow

┌───────────────────────────────┐
│  1. Run Benchmark Pipeline     │
│     → Generate comparison      │
│       report                   │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  2. Analyze low-scoring cases  │
│     → Identify specific diffs  │
│       (text/visual)            │
│     → Review diff images       │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  3. Modify ExcelToPdfConverter │
│     → Improve rendering logic  │
│     → Fix bugs                 │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  4. Re-run Benchmark           │
│     → Verify score improvement │
│     → Ensure no regressions    │
└──────────┬────────────────────┘
           │
           ▼
       Back to Step 1
       (continuous iteration)

AI-Driven Self-Evolution Workflow

When using an AI assistant (e.g., GitHub Copilot), follow this workflow:

  1. Run the Benchmark:

    .\scripts\Run-Benchmark.ps1
    
  2. Feed the report to the AI:

    Review tests/MiniPdf.Benchmark/reports/comparison_report.md
    Identify the lowest-scoring test cases, analyze the differences,
    and automatically modify ExcelToPdfConverter.cs to improve them.
    
  3. Re-validate after AI makes changes:

    .\scripts\Run-Benchmark.ps1 -SkipGenerate -SkipReference
    
  4. Iterate until all scores ≥ 0.9.

Extending Tests

Add new test cases in generate_classic_xlsx.py:

def classic31_your_new_case():
    wb = Workbook()
    ws = wb.active
    # ... your new scenario ...
    save(wb, "classic31_your_new_case.xlsx")

Then add classic31_your_new_case to the generators list in main() and re-run the pipeline.
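Assuming main() drives a plain list of generator functions, registration might look like the following. This is a hypothetical sketch (the new case is stubbed so it runs without openpyxl); check the actual script's structure.

```python
# Hypothetical registration pattern; generate_classic_xlsx.py may differ.
def classic31_your_new_case():
    # ... build the workbook and save it, as in the snippet above ...
    return "classic31_your_new_case.xlsx"  # stub: real code writes the file

GENERATORS = [classic31_your_new_case]  # append alongside the existing 30 entries

def main():
    # Each generator writes its .xlsx into output/; collect the filenames.
    return [gen() for gen in GENERATORS]
```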

File Structure

tests/
├── MiniPdf.Scripts/
│   ├── generate_classic_xlsx.py    # Generates 30 test Excel files
│   ├── convert_xlsx_to_pdf.cs      # MiniPdf-to-PDF conversion script
│   ├── output/                     # Generated .xlsx files
│   └── pdf_output/                 # MiniPdf-generated .pdf files
│
├── MiniPdf.Benchmark/
│   ├── run_benchmark.py            # Orchestrator script
│   ├── generate_reference_pdfs.py  # LibreOffice reference conversion
│   ├── compare_pdfs.py             # PDF comparison engine
│   ├── reference_pdfs/             # LibreOffice reference .pdf files
│   ├── reports/                    # Comparison report output
│   │   ├── comparison_report.md
│   │   ├── comparison_report.json
│   │   └── images/                 # Per-page rendering comparisons
│   └── README.md                   # This document
│
scripts/Run-Benchmark.ps1                   # Windows one-click entry point