Self Checks
RAGFlow workspace code commit ID
none
RAGFlow image version
v0.24.0
Other environment information
**Environment:**
* **RAGFlow Version:** v0.24.0 (Docker deployment)
* **MinerU Version:** (Installed via uv inside `/ragflow/uv_tools/.venv`)
* **OS/Deployment:** Docker (Local/Offline deployment)
Actual behavior
Description:
When MinerU is configured as the chunking method through the API backend (`MINERU_APISERVER`), RAGFlow successfully sends the PDF, and MinerU processes it and returns a ZIP file. However, RAGFlow fails to locate the expected JSON output file inside the unzipped directory and raises a `Missing output file` error. This occurs even with simple, purely English filenames (e.g., `Pre.pdf`).
Additionally, after the file lookup fails, RAGFlow incorrectly falls back to executing the local CLI command, which produces a misleading `[ERROR]MinerU not found` log because the `mineru` executable is inside the `uv_tools` virtual environment rather than on the global `$PATH`.
Root Cause Analysis & Suggestions:
- Output File Naming Mismatch: It appears the latest version of MinerU has changed its ZIP output structure or JSON naming conventions. RAGFlow strictly expects `<filename>_content_list.json`, which no longer exists in the MinerU API response. RAGFlow's extraction logic (LLMBundle) needs to be updated to match the current MinerU output schema.
- Fallback Mechanism Bug: If the API successfully returns a ZIP but RAGFlow fails to find the file, it shouldn't silently fall back to the local CLI mode. Furthermore, if CLI mode is intended as a fallback, RAGFlow should use the path defined in `MINERU_EXECUTABLE` rather than assuming `mineru` is in the global `$PATH`.
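The two suggestions above could look roughly like the following. This is a minimal sketch, not RAGFlow's actual code: `find_content_list` and `resolve_mineru_executable` are hypothetical helper names, and the recursive search assumes the content list still exists somewhere in the extracted tree under a different directory or prefix, which this report has not confirmed.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_content_list(extract_dir: Path, stem: str) -> Optional[Path]:
    """Return the best-matching *content_list.json under extract_dir, or None."""
    # Prefer the exact name RAGFlow currently expects, at any depth.
    exact = sorted(extract_dir.rglob(f"{stem}_content_list.json"))
    if exact:
        return exact[0]
    # Otherwise accept any file ending in the same suffix (assumed naming).
    loose = sorted(extract_dir.rglob("*content_list.json"))
    return loose[0] if loose else None


def resolve_mineru_executable() -> Optional[str]:
    """Honor MINERU_EXECUTABLE before falling back to a $PATH lookup."""
    configured = os.environ.get("MINERU_EXECUTABLE")
    if configured and os.path.isfile(configured) and os.access(configured, os.X_OK):
        return configured
    # Returns None when mineru is not on $PATH, so the caller can report that.
    return shutil.which("mineru")
```

With helpers like these, a `None` from `find_content_list` after a successful API call could be reported as an output-schema error directly, instead of triggering the CLI fallback.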
Expected behavior
Error Logs:
2026-04-02 20:02:54,676 INFO 98 [MinerU] Extract zip: zip_path=/tmp/mineru_pdf_mckn9b8v/Pre_auto_z5hagbcn.zip, extract_to=/tmp/mineru_pdf_mckn9b8v/Pre_auto_z5hagbcn, root_hint=Pre/
2026-04-02 20:02:54,689 INFO 98 [MinerU] Api completed successfully.
2026-04-02 20:02:54,690 INFO 98 [MinerU] Expected output files: Pre_content_list.json
2026-04-02 20:02:54,690 INFO 98 [MinerU] Searching output in: /tmp/mineru_pdf_mckn9b8v/Pre_auto_z5hagbcn
2026-04-02 20:02:54,692 ERROR 98 Failed to parse pdf via LLMBundle MinerU (Mineru): [MinerU] Missing output file, tried: /tmp/mineru_pdf_mckn9b8v/Pre_auto_z5hagbcn/Pre_content_list.json, /tmp/mineru_pdf_mckn9b8v/Pre_auto_z5hagbcn/Pre/Pre_content_list.json
2026-04-02 20:02:54,705 INFO 98 [ERROR]MinerU not found.
Steps to reproduce
**Steps to Reproduce:**
1. Deploy RAGFlow via Docker Compose.
2. Start the MinerU API server manually inside the container (`mineru-api --host 0.0.0.0 --port 8886`).
3. Set environment variables in `entrypoint.sh` or `.env`: `MINERU_APISERVER=http://127.0.0.1:8886`, `MINERU_BACKEND=pipeline`.
4. Upload a simple English PDF (e.g., `Pre.pdf`) and set the Chunk Method to `MinerU`.
5. Observe the parsing failure in the task execution logs.
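To confirm the naming mismatch after reproducing, one option is to list the JSON entries in the ZIP that MinerU returned. The snippet below is a small diagnostic sketch using only the standard library. `list_json_members` is a hypothetical helper, and it assumes you have kept a copy of the archive, since the temporary directory shown in the logs may be cleaned up after the task fails.

```python
import zipfile
from pathlib import Path
from typing import List


def list_json_members(zip_path: Path) -> List[str]:
    """Return the names of all .json entries in the archive, sorted."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".json"))
```

Calling `list_json_members(Path("result.zip"))` on a saved copy (the filename here is a placeholder) shows which JSON files the API actually produced, which can then be compared against the `Pre_content_list.json` name RAGFlow searches for.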
Additional information
No response