Automatically compares PDFs generated by MiniPdf against LibreOffice (reference implementation), driving continuous rendering quality improvements.
```
┌─────────────────────────────────────────────────────────────┐
│ run_benchmark.py (orchestrator)                             │
│ scripts/Run-Benchmark.ps1 (one-click entry point)           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Step 1: generate_classic_xlsx.py                            │
│   Generate 30 classic Excel test files with openpyxl        │
│   → tests/MiniPdf.Scripts/output/*.xlsx                     │
│                                                             │
│ Step 2: convert_xlsx_to_pdf.cs                              │
│   Convert xlsx to PDF using MiniPdf                         │
│   → tests/MiniPdf.Scripts/pdf_output/*.pdf                  │
│                                                             │
│ Step 3: generate_reference_pdfs.py                          │
│   Convert xlsx to PDF using LibreOffice (reference)         │
│   → tests/MiniPdf.Benchmark/reference_pdfs/*.pdf            │
│                                                             │
│ Step 4: compare_pdfs.py                                     │
│   Compare text content + visual pixel differences           │
│   → tests/MiniPdf.Benchmark/reports/                        │
│     ├── comparison_report.md (human-readable)               │
│     ├── comparison_report.json (machine-readable)           │
│     └── images/ (per-page renderings)                       │
│                                                             │
│ Step 5: Analyze report, identify lowest-scoring test cases  │
│   and improve accordingly                                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
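The five steps above can be sketched as a simple command pipeline. This is an illustration only; the paths, flags, and internal structure of `run_benchmark.py` are assumptions here, not its actual code:

```python
import subprocess
import sys

def benchmark_steps(compare_only: bool = False) -> list[list[str]]:
    """Commands in pipeline order (a sketch, not run_benchmark.py's real code)."""
    generate = [
        [sys.executable, "../MiniPdf.Scripts/generate_classic_xlsx.py"],
        ["dotnet", "run", "convert_xlsx_to_pdf.cs"],
        [sys.executable, "generate_reference_pdfs.py"],
    ]
    compare = [[sys.executable, "compare_pdfs.py"]]
    return compare if compare_only else generate + compare

def run_pipeline(compare_only: bool = False) -> None:
    for cmd in benchmark_steps(compare_only):
        subprocess.run(cmd, check=True)  # abort on the first failing step
```

A `--compare-only` run simply skips the three generation commands, which is why it is safe to rerun repeatedly while iterating on the comparison step.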
```shell
# 1. Python 3.10+ & dependencies
pip install openpyxl pymupdf

# 2. LibreOffice (free, used to generate reference PDFs)
#    Windows: https://www.libreoffice.org/download/
#    or: winget install LibreOffice

# 3. .NET 9 SDK
```

```powershell
# Windows PowerShell
.\scripts\Run-Benchmark.ps1

# Or run directly with Python
cd tests/MiniPdf.Benchmark
python run_benchmark.py
```

```shell
# 1. Generate Excel test files
cd tests/MiniPdf.Scripts
python generate_classic_xlsx.py

# 2. Convert to PDF with MiniPdf
dotnet run convert_xlsx_to_pdf.cs

# 3. Generate reference PDFs with LibreOffice
cd ../MiniPdf.Benchmark
python generate_reference_pdfs.py

# 4. Compare and analyze
python compare_pdfs.py

# 5. Run comparison only (skip generation steps)
python run_benchmark.py --compare-only
```

Each test case receives a composite score from 0.0 to 1.0:
| Dimension | Weight | Description |
|---|---|---|
| Text Similarity | 40% | Extracts text from both PDFs and compares via SequenceMatcher |
| Visual Similarity | 40% | Uses AI semantic scoring when available; falls back to pixel comparison |
| Page Count Match | 20% | 1.0 if page counts match, 0.5 otherwise |
Score grades:
- 🟢 ≥ 0.9 — Excellent
- 🟡 0.7–0.9 — Good, room for improvement
- 🔴 < 0.7 — Significant differences, needs attention
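The weighting and grading above can be sketched as follows (a minimal illustration; the helper names are ours, not `compare_pdfs.py`'s):

```python
from difflib import SequenceMatcher

def text_similarity(a: str, b: str) -> float:
    # Text dimension: SequenceMatcher ratio over extracted text, in [0, 1].
    return SequenceMatcher(None, a, b).ratio()

def composite_score(text: float, visual: float, pages_match: bool) -> float:
    # text*0.4 + visual*0.4 + page_count*0.2; page count scores 1.0 or 0.5.
    page_score = 1.0 if pages_match else 0.5
    return 0.4 * text + 0.4 * visual + 0.2 * page_score

def grade(score: float) -> str:
    if score >= 0.9:
        return "🟢"
    return "🟡" if score >= 0.7 else "🔴"
```

Note that a mismatched page count alone caps the composite at 0.9, so such a case can never grade 🟢 unless both other dimensions are perfect.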
Pure pixel comparison is highly sensitive to anti-aliasing and minor font differences, often producing low scores that are hard to interpret.
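For intuition, a naive pixel metric over two equally sized renderings might look like this. It is a sketch only: in practice the sample buffers would come from PyMuPDF pixmaps, and the real metric in `compare_pdfs.py` may differ:

```python
def pixel_similarity(a: bytes, b: bytes, tolerance: int = 0) -> float:
    """Fraction of matching samples between two raw page renderings.

    With tolerance=0 even a one-shade anti-aliasing difference counts as
    a mismatch, which is why raw pixel scores are so noisy.
    """
    if len(a) != len(b):
        return 0.0  # differently sized pages cannot be compared sample-wise
    if not a:
        return 1.0
    same = sum(abs(x - y) <= tolerance for x, y in zip(a, b))
    return same / len(a)
```

A metric like this treats a slightly different font hinting pass the same as a missing table border, which is exactly the interpretability problem the AI comparison mode addresses.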
When `--ai-compare` is enabled, the script sends rendered page images to GPT-4o (or an Azure OpenAI deployment), which identifies specific differences and provides actionable code-improvement suggestions.
Option 1: OpenAI

```powershell
$env:OPENAI_API_KEY = "sk-..."
$env:OPENAI_MODEL = "gpt-4o"             # Optional, defaults to gpt-4o
```

Option 2: Azure OpenAI

```powershell
$env:AZURE_OPENAI_ENDPOINT = "https://your-resource.openai.azure.com"
$env:AZURE_OPENAI_KEY = "your-key"
$env:AZURE_OPENAI_DEPLOYMENT = "gpt-4o"  # Optional, defaults to gpt-4o
```

Install dependency:

```shell
pip install openai
```

```shell
# Enable AI comparison (analyzes page 1 only, invoked when pixel score < 0.90)
python compare_pdfs.py --ai-compare

# Analyze first 2 pages, lower threshold to 0.85
python compare_pdfs.py --ai-compare --ai-max-pages 2 --ai-threshold 0.85

# Enable AI via the orchestrator script
python run_benchmark.py --compare-only --ai-compare
```

The report `comparison_report.md` includes three AI-specific sections:
| Section | Description |
|---|---|
| 🤖 AI Visual Analysis Findings | Deduplicated summary of visual differences across all test cases |
| 🤖 AI-Recommended Code Improvements | Specific improvement suggestions for ExcelToPdfConverter.cs |
| AI Analysis Per Test Case | Detailed per-page diff with severity (low/medium/high) and AI visual score |
| `ai_visual_avg` present? | Visual dimension value |
|---|---|
| ✅ Yes | `ai_visual_avg` (AI semantic score) |
| ❌ No | `visual_avg` (pixel comparison) |

The composite scoring formula remains unchanged: `text×0.4 + visual×0.4 + page_count×0.2`.
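The fallback rule can be sketched as below. The `ai_visual_avg` and `visual_avg` names follow the table above; `text_avg` and `page_score` are our placeholder names for the other dimensions, not confirmed report fields:

```python
def visual_dimension(case: dict) -> float:
    # Prefer the AI semantic score when present, else pixel comparison.
    ai = case.get("ai_visual_avg")
    return ai if ai is not None else case["visual_avg"]

def composite(case: dict) -> float:
    # Unchanged formula: text*0.4 + visual*0.4 + page_count*0.2
    return (0.4 * case["text_avg"]
            + 0.4 * visual_dimension(case)
            + 0.2 * case["page_score"])
```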
```
┌───────────────────────────────┐
│ 1. Run Benchmark Pipeline     │
│    → Generate comparison      │
│      report                   │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│ 2. Analyze low-scoring cases  │
│    → Identify specific diffs  │
│      (text/visual)            │
│    → Review diff images       │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│ 3. Modify ExcelToPdfConverter │
│    → Improve rendering logic  │
│    → Fix bugs                 │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│ 4. Re-run Benchmark           │
│    → Verify score improvement │
│    → Ensure no regressions    │
└──────────┬────────────────────┘
           │
           ▼
        Back to Step 1
    (continuous iteration)
```
When using an AI assistant (e.g., GitHub Copilot), follow this workflow:

1. Run the benchmark:

   ```powershell
   .\scripts\Run-Benchmark.ps1
   ```

2. Feed the report to the AI:

   > Review tests/MiniPdf.Benchmark/reports/comparison_report.md. Identify the lowest-scoring test cases, analyze the differences, and automatically modify ExcelToPdfConverter.cs to improve them.

3. Re-validate after the AI makes changes:

   ```powershell
   .\scripts\Run-Benchmark.ps1 -SkipGenerate -SkipReference
   ```

4. Iterate until all scores are ≥ 0.9.
Add new test cases in `generate_classic_xlsx.py`:

```python
def classic31_your_new_case():
    wb = Workbook()
    ws = wb.active
    # ... your new scenario ...
    save(wb, "classic31_your_new_case.xlsx")
```

Then add `classic31_your_new_case` to the generators list in `main()` and re-run the pipeline.
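The registration step might look like the following. This is a self-contained sketch: the real `main()` in `generate_classic_xlsx.py` may build its generator list differently, and the placeholder body below stands in for an actual openpyxl scenario:

```python
def classic31_your_new_case():
    # Placeholder generator body for illustration; the real one would
    # build a Workbook and save it via the script's save() helper.
    return "classic31_your_new_case.xlsx"

# Hypothetical shape of the generators list consumed by main():
generators = [
    # classic01_..., classic02_..., ..., classic30_...,
    classic31_your_new_case,  # newly added generator
]

def main():
    for gen in generators:
        gen()  # each generator writes one .xlsx into output/
```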
```
tests/
├── MiniPdf.Scripts/
│   ├── generate_classic_xlsx.py     # Generates 30 test Excel files
│   ├── convert_xlsx_to_pdf.cs       # MiniPdf-to-PDF conversion script
│   ├── output/                      # Generated .xlsx files
│   └── pdf_output/                  # MiniPdf-generated .pdf files
│
├── MiniPdf.Benchmark/
│   ├── run_benchmark.py             # Orchestrator script
│   ├── generate_reference_pdfs.py   # LibreOffice reference conversion
│   ├── compare_pdfs.py              # PDF comparison engine
│   ├── reference_pdfs/              # LibreOffice reference .pdf files
│   ├── reports/                     # Comparison report output
│   │   ├── comparison_report.md
│   │   ├── comparison_report.json
│   │   └── images/                  # Per-page rendering comparisons
│   └── README.md                    # This document
│
scripts/Run-Benchmark.ps1            # Windows one-click entry point
```