feat: Support instance caching for docling #11030
base: main
Conversation
Co-Authored-By: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI. You can disable this status message in your configuration.

Walkthrough

The changes introduce LRU caching for DocumentConverter creation in docling_utils to improve performance by reusing converters across runs, and migrate docling_inline from multiprocessing to threading-based worker management with updated queue handling and helper function signatures.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant DoclingInline
    participant ThreadPool as Worker Thread
    participant Converter as _get_cached_converter()
    participant Cache as LRU Cache
    participant DocumentConverter

    Caller->>DoclingInline: process_files()
    activate DoclingInline
    DoclingInline->>ThreadPool: Thread(target=docling_worker, ...)
    activate ThreadPool
    ThreadPool->>Converter: _get_cached_converter(pipeline, ocr_engine, ...)
    activate Converter
    Note over Converter: Check cache key
    Converter->>Cache: Lookup (pipeline, ocr_engine, ...)
    alt Cache Hit
        Cache-->>Converter: Return cached DocumentConverter
        Note over Converter: Reuse existing (seconds)
    else Cache Miss
        Converter->>DocumentConverter: Create new instance
        activate DocumentConverter
        DocumentConverter-->>Converter: Converter ready
        deactivate DocumentConverter
        Note over Converter: Load models (15-20 min)
        Converter->>Cache: Store in LRU cache
    end
    Converter-->>ThreadPool: DocumentConverter instance
    deactivate Converter
    ThreadPool->>ThreadPool: docling_worker(converter, ...)
    Note over ThreadPool: Process documents
    ThreadPool->>DoclingInline: result_queue.put(result)
    deactivate ThreadPool
    DoclingInline->>DoclingInline: _wait_for_result_with_thread_monitoring()
    Note over DoclingInline: Monitor thread liveness
    DoclingInline->>DoclingInline: _stop_thread_gracefully()
    Caller-->>DoclingInline: Processing complete
    deactivate DoclingInline
```
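The cache-hit/cache-miss flow in the diagram boils down to a module-level `functools.lru_cache` factory. A self-contained sketch follows (the `FakeConverter` class and argument names are illustrative stand-ins for docling's `DocumentConverter`, not the PR's code):

```python
from functools import lru_cache


class FakeConverter:
    """Stand-in for docling's DocumentConverter (illustration only)."""

    instances_created = 0

    def __init__(self, pipeline: str, ocr_engine: str):
        # Real converter creation loads models and can take minutes.
        FakeConverter.instances_created += 1
        self.pipeline = pipeline
        self.ocr_engine = ocr_engine


@lru_cache(maxsize=8)
def get_cached_converter(pipeline: str, ocr_engine: str) -> FakeConverter:
    # The cache key is the argument tuple; identical configs reuse one instance.
    return FakeConverter(pipeline, ocr_engine)


a = get_cached_converter("standard", "easyocr")
b = get_cached_converter("standard", "easyocr")    # cache hit: same object
c = get_cached_converter("standard", "tesseract")  # cache miss: new object
```

Because `lru_cache` keys on the argument values, every setting that affects converter construction must be a hashable parameter of the factory; unhashable options force the non-cached path.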
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 4 warnings)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main   #11030      +/-   ##
==========================================
- Coverage   33.21%   33.08%   -0.14%
==========================================
  Files        1389     1389
  Lines       65682    65707      +25
  Branches     9720     9721       +1
==========================================
- Hits        21818    21736      -82
- Misses      42749    42856     +107
  Partials     1115     1115
```

Flags with carried forward coverage won't be shown. Click here to find out more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lfx/src/lfx/components/docling/docling_inline.py (1)
220-223: Log message still references "process" instead of "thread".

The log message at line 222 says "Docling process cancelled" but should say "Docling thread" to be consistent with the threading migration.

```diff
 if "Worker interrupted by SIGINT" in error_msg or "shutdown" in result:
-    self.log("Docling process cancelled by user")
+    self.log("Docling thread cancelled by user")
     result = []
```
🧹 Nitpick comments (4)
src/lfx/src/lfx/base/data/docling_utils.py (2)
242-278: Signal handlers in worker threads won't receive signals as expected.

In Python, signal handlers are only delivered to the main thread. When `docling_worker` runs in a thread (not a process), registering `signal.signal(SIGTERM, ...)` will either raise an error (caught at line 276) or have no effect; the signals go to the main thread instead.

The current error handling degrades gracefully, but the comment and logic suggest signals are expected to work. Since this is now threading-based, consider:
- Removing signal registration entirely in favor of a threading.Event for shutdown coordination
- Updating the docstring/comments to reflect this limitation
```diff
-    # Register signal handlers early
-    try:
-        signal.signal(signal.SIGTERM, signal_handler)
-        signal.signal(signal.SIGINT, signal_handler)
-        logger.debug("Signal handlers registered for graceful shutdown")
-    except (OSError, ValueError) as e:
-        # Some signals might not be available on all platforms
-        logger.warning(f"Warning: Could not register signal handlers: {e}")
+    # Note: Signal handlers only work in the main thread.
+    # In a worker thread, we rely on the main thread to coordinate shutdown.
+    # The signal registration below will fail in threads, which is expected.
+    try:
+        signal.signal(signal.SIGTERM, signal_handler)
+        signal.signal(signal.SIGINT, signal_handler)
+        logger.debug("Signal handlers registered for graceful shutdown")
+    except (OSError, ValueError) as e:
+        # Expected in non-main threads - signals are handled by main thread
+        logger.debug(f"Signal handlers not registered (expected in threads): {e}")
```
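A cooperative shutdown built on `threading.Event`, as suggested in this comment, might look like the following minimal sketch (worker and queue names are illustrative, not the project's code):

```python
import queue
import threading


def worker(tasks: queue.Queue, stop_event: threading.Event) -> None:
    # Instead of signal handlers (which only fire in the main thread),
    # poll a shared Event between short, timed queue reads.
    while not stop_event.is_set():
        try:
            item = tasks.get(timeout=0.1)
        except queue.Empty:
            continue
        # ... process item here ...
        tasks.task_done()


stop_event = threading.Event()
tasks: queue.Queue = queue.Queue()
t = threading.Thread(target=worker, args=(tasks, stop_event))
t.start()

stop_event.set()   # main thread requests shutdown
t.join(timeout=5)  # worker notices the event within ~0.1s and exits
```

The short `get` timeout bounds how long the worker can ignore a shutdown request without busy-waiting.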
319-353: Code duplication between cached and non-cached converter paths.

The non-cached fallback path duplicates significant logic from `_get_cached_converter` (OCR setup, format options). While understandable given the "known limitation" comment about picture description caching, this creates a maintenance burden.

Consider extracting the shared setup logic into a helper function that both paths can use, with the picture description configuration added only in the non-cached path.
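Such an extraction might take roughly this shape (hypothetical helper and option names; docling's real pipeline/OCR option classes would replace the plain dicts):

```python
def _build_base_options(pipeline: str, ocr_engine: str) -> dict:
    """Shared OCR/format setup used by both converter paths (sketch)."""
    return {"pipeline": pipeline, "ocr_engine": ocr_engine}


def build_cached_config(pipeline: str, ocr_engine: str) -> dict:
    # Cached path: only hashable, cache-safe options.
    return _build_base_options(pipeline, ocr_engine)


def build_uncached_config(pipeline: str, ocr_engine: str,
                          picture_description: dict) -> dict:
    # Non-cached path layers the cache-unfriendly picture-description
    # config on top of the same shared base, so the setup logic
    # lives in exactly one place.
    opts = _build_base_options(pipeline, ocr_engine)
    opts["picture_description"] = picture_description
    return opts
```

With this split, a change to OCR setup touches only `_build_base_options`, and the two paths cannot drift apart silently.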
src/lfx/tests/unit/base/data/test_docling_utils.py (2)
156-204: Consider using a pytest fixture for cache management.

Each test manually calls `cache_clear()` at the start. Using a pytest fixture with cleanup ensures the cache is cleared even if tests fail mid-execution, improving test isolation.

```python
@pytest.fixture(autouse=True)
def clear_converter_cache(self):
    """Clear the converter cache before and after each test."""
    from lfx.base.data.docling_utils import _get_cached_converter

    _get_cached_converter.cache_clear()
    yield
    _get_cached_converter.cache_clear()
```
237-276: Time-based test may be flaky in CI environments.

The assertion `second_call_duration < first_call_duration / 10` could fail on heavily loaded CI runners where even a cache hit might experience scheduling delays. Consider:

- Using a larger delay (e.g., 0.1s) with a more lenient ratio
- Adding a minimum threshold check (e.g., `second_call_duration < 0.01`)
- Marking the test with a retry decorator if flakiness occurs
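A sketch of the absolute-threshold variant (the delay and ceiling values here are illustrative; pick values with comfortable margins for the target CI):

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def expensive_setup() -> int:
    time.sleep(0.1)  # stand-in for heavy converter creation
    return 42


start = time.perf_counter()
expensive_setup()
first_call_duration = time.perf_counter() - start

start = time.perf_counter()
expensive_setup()  # cache hit: no sleep
second_call_duration = time.perf_counter() - start

# An absolute ceiling on the cache hit is more robust than a pure
# ratio against the first call when runners are heavily loaded.
assert first_call_duration >= 0.1
assert second_call_duration < 0.05, f"cache hit too slow: {second_call_duration:.4f}s"
```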
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/lfx/src/lfx/base/data/docling_utils.py (5 hunks)
- src/lfx/src/lfx/components/docling/docling_inline.py (3 hunks)
- src/lfx/tests/unit/base/data/test_docling_utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/test_*.py
📄 CodeRabbit inference engine (Custom checks)
- Review test files for excessive use of mocks that may indicate poor test design; check if tests have too many mock objects that obscure what is actually being tested
- Warn when mocks are used instead of testing real behavior and interactions, and suggest using real objects or test doubles when mocks become excessive
- Ensure mocks are used appropriately for external dependencies only, not for core logic
- Backend test files should follow the naming convention test_*.py with proper pytest structure
- Test files should have descriptive test function names that explain what is being tested
- Tests should be organized logically with proper setup and teardown
- Consider including edge cases and error conditions for comprehensive test coverage
- Verify tests cover both positive and negative scenarios where appropriate
- For async functions in backend tests, ensure proper async testing patterns are used with pytest
- For API endpoints, verify both success and error response testing
Files:
src/lfx/tests/unit/base/data/test_docling_utils.py
🧬 Code graph analysis (2)
src/lfx/tests/unit/base/data/test_docling_utils.py (1)
src/lfx/src/lfx/base/data/docling_utils.py (1)
- _get_cached_converter (155-224)
src/lfx/src/lfx/components/docling/docling_inline.py (1)
src/lfx/src/lfx/base/data/docling_utils.py (1)
- docling_worker (226-463)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Update Component Index
🔇 Additional comments (6)
src/lfx/src/lfx/base/data/docling_utils.py (2)
284-289: Imports with `noqa: F401` are appropriate for dependency checking.

These imports verify Docling dependencies are available before proceeding. The `noqa` comments correctly suppress unused-import warnings since these modules are intentionally imported only for availability checking.
152-224: LRU cache does not prevent duplicate initialization on concurrent cache misses.

Python's `@lru_cache` is thread-safe for its internal state but does not guarantee call-once semantics. When multiple threads simultaneously request the cache with the same missing key, each executes the wrapped function independently. Given that converter creation takes 15-20 minutes as noted in the docstring, concurrent requests with identical configurations (pipeline, ocr_engine, picture classification settings) could trigger redundant heavy initialization.

Current usage (single thread per operation) mitigates this, but if concurrent requests with identical converter configs become a requirement, implement a lock-based caching pattern (e.g., cachetools with a condition, or per-key locks) to prevent cache stampede.
src/lfx/tests/unit/base/data/test_docling_utils.py (1)
142-154: Tests appropriately mock external dependencies and focus on caching behavior.

The mock usage is appropriate per coding guidelines: `DocumentConverter` is an external dependency from the docling package, and the tests correctly focus on verifying the LRU caching mechanism rather than the converter's internals.

src/lfx/src/lfx/components/docling/docling_inline.py (3)
95-130: Thread monitoring logic is sound and handles edge cases well.

The implementation correctly:
- Polls the queue with timeout to avoid blocking
- Checks thread liveness to detect crashes
- Raises appropriate errors with clear messages
This is a clean translation from process monitoring to thread monitoring.
132-145: Correct handling of Python's thread termination limitations.

The implementation correctly acknowledges that Python threads cannot be forcefully killed. The warning when a thread remains alive after the timeout is appropriate for debugging.

Note: if the worker thread is stuck in a long-running operation (e.g., model loading), it could be left running indefinitely. Consider documenting this behavior or implementing a cooperative shutdown mechanism using `threading.Event`.
167-182: Thread setup is appropriate for the caching use case.

Using `daemon=False` ensures the worker completes even if the main thread exits, preventing data loss. `queue.Queue` is the correct choice for thread-safe communication, and this setup enables the global `@lru_cache` on `_get_cached_converter` to work as intended.
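The thread-plus-queue wiring described in this comment follows the standard pattern, sketched here with a stand-in worker (names and payloads are illustrative, not the component's actual code):

```python
import queue
import threading


def docling_worker_sketch(files: list, result_queue: queue.Queue) -> None:
    # In the real component this would fetch the cached converter and
    # convert documents; here we just echo the inputs to show the flow.
    result_queue.put([f"processed:{name}" for name in files])


result_queue: queue.Queue = queue.Queue()
worker = threading.Thread(
    target=docling_worker_sketch,
    args=(["a.pdf", "b.pdf"], result_queue),
    daemon=False,  # non-daemon: the thread is allowed to finish its work
)
worker.start()
result = result_queue.get(timeout=5)  # blocks until the worker reports back
worker.join()
```

Because both ends live in one process, the worker sees the same module-level cache as every other run, which is exactly what the multiprocessing version could not do.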
This pull request introduces a major optimization to Docling document processing by caching heavy model objects and switching from multiprocessing to threading. The most important changes are the introduction of a global LRU cache for `DocumentConverter` instances (eliminating repeated 15-20 minute model load times) and the refactoring of the inline component to use threads instead of processes, enabling shared memory and cache reuse. Additional improvements include code cleanup, updated error handling, and improved thread monitoring.

Performance Optimization and Caching

- Cache `DocumentConverter` instances using `@lru_cache`, drastically reducing subsequent processing times from 15-20 minutes to seconds by reusing loaded models. (src/lfx/src/lfx/base/data/docling_utils.py)

Refactoring for Threading

- Run the worker in a thread with `queue.Queue` instead of multiprocessing, enabling use of the global cache and reducing memory overhead. (src/lfx/src/lfx/components/docling/docling_inline.py)

Code Cleanup and Import Updates

- Mark availability-check imports with `noqa: F401` for clarity and to avoid linter warnings. (src/lfx/src/lfx/base/data/docling_utils.py)

Testing

- New unit tests for the caching behavior. (src/lfx/tests/unit/base/data/test_docling_utils.py)

Summary by CodeRabbit
Performance
Tests