Tags: neoncapy/doc2md

v3.5.0

v3.5.0: marker as default extractor, Step 3b image analysis, QC fixes

- Add convert-paper-marker.py: marker extractor wrapper with page-count timeout
- Marker is now the default for digital PDFs (docling for scanned PDFs)
- Fallback chain: marker -> docling -> pymupdf4llm -> mineru -> tesseract
- Step 3b: re-run prepare-image-analysis after Step 6c creates manifest
- Quality gate triggers automatic extractor fallback
- fcntl registry locking for concurrent pipeline runs
- Fix: --no-images gate for all extractor paths
- Fix: run_command() timeout parameter
- Fix: figure_num type safety for mixed int/string values
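
The fallback chain and the quality gate could interact roughly as in the sketch below. Only the extractor order comes from the commit message. The function names, the `extractors` mapping, and the character-count gate are hypothetical stand-ins, not the project's actual code.

```python
from typing import Callable, Dict, Sequence, Tuple

# Extractor order as listed in the commit message.
FALLBACK_CHAIN = ["marker", "docling", "pymupdf4llm", "mineru", "tesseract"]


def passes_quality_gate(markdown: str, min_chars: int = 200) -> bool:
    """Hypothetical gate: accept output with enough non-whitespace text."""
    return len("".join(markdown.split())) >= min_chars


def extract_with_fallback(
    pdf_path: str,
    extractors: Dict[str, Callable[[str], str]],
    chain: Sequence[str] = FALLBACK_CHAIN,
    gate: Callable[[str], bool] = passes_quality_gate,
) -> Tuple[str, str]:
    """Try each extractor in order; return (name, markdown) for the first
    result that passes the gate. Failures and rejected output fall through."""
    for name in chain:
        extract = extractors.get(name)
        if extract is None:
            continue  # extractor not installed or not configured
        try:
            markdown = extract(pdf_path)
        except Exception:
            continue  # treat a crash like a failed gate and move on
        if gate(markdown):
            return name, markdown
    raise RuntimeError(f"no extractor produced acceptable output for {pdf_path}")
```

In this sketch a crash and a rejected result are handled the same way, so a single loop covers both the error path and the quality-gate path.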
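
For the registry locking item, a minimal sketch of `fcntl`-based locking is shown below. It assumes a JSON registry file and a hypothetical `locked_registry` helper; the project's real registry format and function names are not stated in the commit message. `fcntl` is available on Unix-like systems only.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def locked_registry(path):
    """Yield the registry dict while holding an exclusive lock on the file.
    Changes are written back only if the with-body finishes without error."""
    # Open for read/write, creating the file if needed, without truncating it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            raw = fh.read()
            data = json.loads(raw) if raw.strip() else {}
            yield data
            # Rewrite the file in place while the lock is still held.
            fh.seek(0)
            fh.truncate()
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

A concurrent pipeline run would wrap each read-modify-write cycle, for example `with locked_registry("registry.json") as reg: reg["paper-1"] = "done"`, so two runs cannot interleave their updates.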