feat: Support instance caching for docling #11030
base: main
Conversation
Co-Authored-By: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI. You can disable this status message in your configuration.

Walkthrough

The changes introduce LRU caching for DocumentConverter creation in docling_utils to improve performance by reusing converters across runs, and migrate docling_inline from multiprocessing to threading-based worker management with updated queue handling and helper function signatures.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant DoclingInline
    participant ThreadPool as Worker Thread
    participant Converter as _get_cached_converter()
    participant Cache as LRU Cache
    participant DocumentConverter

    Caller->>DoclingInline: process_files()
    activate DoclingInline
    DoclingInline->>ThreadPool: Thread(target=docling_worker, ...)
    activate ThreadPool
    ThreadPool->>Converter: _get_cached_converter(pipeline, ocr_engine, ...)
    activate Converter
    Note over Converter: Check cache key
    Converter->>Cache: Lookup (pipeline, ocr_engine, ...)
    alt Cache Hit
        Cache-->>Converter: Return cached DocumentConverter
        Note over Converter: Reuse existing (seconds)
    else Cache Miss
        Converter->>DocumentConverter: Create new instance
        activate DocumentConverter
        DocumentConverter-->>Converter: Converter ready
        deactivate DocumentConverter
        Note over Converter: Load models (15-20 min)
        Converter->>Cache: Store in LRU cache
    end
    Converter-->>ThreadPool: DocumentConverter instance
    deactivate Converter
    ThreadPool->>ThreadPool: docling_worker(converter, ...)
    Note over ThreadPool: Process documents
    ThreadPool->>DoclingInline: result_queue.put(result)
    deactivate ThreadPool
    DoclingInline->>DoclingInline: _wait_for_result_with_thread_monitoring()
    Note over DoclingInline: Monitor thread liveness
    DoclingInline->>DoclingInline: _stop_thread_gracefully()
    Caller-->>DoclingInline: Processing complete
    deactivate DoclingInline
```
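The cache-hit/cache-miss flow in the diagram boils down to a module-level `functools.lru_cache` factory. A self-contained sketch follows (the `FakeConverter` class and argument names are illustrative stand-ins for docling's `DocumentConverter`, not the PR's code):

```python
from functools import lru_cache


class FakeConverter:
    """Stand-in for docling's DocumentConverter (illustration only)."""

    instances_created = 0

    def __init__(self, pipeline: str, ocr_engine: str):
        # Real converter creation loads models and can take minutes.
        FakeConverter.instances_created += 1
        self.pipeline = pipeline
        self.ocr_engine = ocr_engine


@lru_cache(maxsize=8)
def get_cached_converter(pipeline: str, ocr_engine: str) -> FakeConverter:
    # The cache key is the argument tuple; identical configs reuse one instance.
    return FakeConverter(pipeline, ocr_engine)


a = get_cached_converter("standard", "easyocr")
b = get_cached_converter("standard", "easyocr")    # cache hit: same object
c = get_cached_converter("standard", "tesseract")  # cache miss: new object
```

Because `lru_cache` keys on the argument values, every setting that affects converter construction must be a hashable parameter of the factory; unhashable options force the non-cached path.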
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 4 warnings)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main   #11030      +/-   ##
==========================================
- Coverage   33.21%   33.08%   -0.14%
==========================================
  Files        1389     1389
  Lines       65682    65707      +25
  Branches     9720     9721       +1
==========================================
- Hits        21818    21736      -82
- Misses      42749    42856     +107
  Partials     1115     1115
```

Flags with carried forward coverage won't be shown. Click here to find out more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lfx/src/lfx/components/docling/docling_inline.py (1)
220-223: Log message still references "process" instead of "thread".

The log message at line 222 says "Docling process cancelled" but should say "Docling thread" to be consistent with the threading migration.

```diff
 if "Worker interrupted by SIGINT" in error_msg or "shutdown" in result:
-    self.log("Docling process cancelled by user")
+    self.log("Docling thread cancelled by user")
     result = []
```
🧹 Nitpick comments (4)
src/lfx/src/lfx/base/data/docling_utils.py (2)
242-278: Signal handlers in worker threads won't receive signals as expected.

In Python, signal handlers are only delivered to the main thread. When `docling_worker` runs in a thread (not a process), registering `signal.signal(SIGTERM, ...)` will either raise an error (caught at line 276) or have no effect; the signals go to the main thread instead.

The current error handling degrades gracefully, but the comment and logic suggest signals are expected to work. Since this is now threading-based, consider:
- Removing signal registration entirely in favor of a threading.Event for shutdown coordination
- Updating the docstring/comments to reflect this limitation
```diff
-    # Register signal handlers early
-    try:
-        signal.signal(signal.SIGTERM, signal_handler)
-        signal.signal(signal.SIGINT, signal_handler)
-        logger.debug("Signal handlers registered for graceful shutdown")
-    except (OSError, ValueError) as e:
-        # Some signals might not be available on all platforms
-        logger.warning(f"Warning: Could not register signal handlers: {e}")
+    # Note: Signal handlers only work in the main thread.
+    # In a worker thread, we rely on the main thread to coordinate shutdown.
+    # The signal registration below will fail in threads, which is expected.
+    try:
+        signal.signal(signal.SIGTERM, signal_handler)
+        signal.signal(signal.SIGINT, signal_handler)
+        logger.debug("Signal handlers registered for graceful shutdown")
+    except (OSError, ValueError) as e:
+        # Expected in non-main threads - signals are handled by main thread
+        logger.debug(f"Signal handlers not registered (expected in threads): {e}")
```
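A cooperative shutdown built on `threading.Event`, as suggested in this comment, might look like the following minimal sketch (worker and queue names are illustrative, not the project's code):

```python
import queue
import threading


def worker(tasks: queue.Queue, stop_event: threading.Event) -> None:
    # Instead of signal handlers (which only fire in the main thread),
    # poll a shared Event between short, timed queue reads.
    while not stop_event.is_set():
        try:
            item = tasks.get(timeout=0.1)
        except queue.Empty:
            continue
        # ... process item here ...
        tasks.task_done()


stop_event = threading.Event()
tasks: queue.Queue = queue.Queue()
t = threading.Thread(target=worker, args=(tasks, stop_event))
t.start()

stop_event.set()   # main thread requests shutdown
t.join(timeout=5)  # worker notices the event within ~0.1s and exits
```

The short `get` timeout bounds how long the worker can ignore a shutdown request without busy-waiting.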
319-353: Code duplication between cached and non-cached converter paths.

The non-cached fallback path duplicates significant logic from `_get_cached_converter` (OCR setup, format options). While understandable given the "known limitation" comment about picture description caching, this creates a maintenance burden.

Consider extracting the shared setup logic into a helper function that both paths can use, with the picture description configuration added only in the non-cached path.
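Such an extraction might take roughly this shape (hypothetical helper and option names; docling's real pipeline/OCR option classes would replace the plain dicts):

```python
def _build_base_options(pipeline: str, ocr_engine: str) -> dict:
    """Shared OCR/format setup used by both converter paths (sketch)."""
    return {"pipeline": pipeline, "ocr_engine": ocr_engine}


def build_cached_config(pipeline: str, ocr_engine: str) -> dict:
    # Cached path: only hashable, cache-safe options.
    return _build_base_options(pipeline, ocr_engine)


def build_uncached_config(pipeline: str, ocr_engine: str,
                          picture_description: dict) -> dict:
    # Non-cached path layers the cache-unfriendly picture-description
    # config on top of the same shared base, so the setup logic
    # lives in exactly one place.
    opts = _build_base_options(pipeline, ocr_engine)
    opts["picture_description"] = picture_description
    return opts
```

With this split, a change to OCR setup touches only `_build_base_options`, and the two paths cannot drift apart silently.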
src/lfx/tests/unit/base/data/test_docling_utils.py (2)
156-204: Consider using a pytest fixture for cache management.

Each test manually calls `cache_clear()` at the start. Using a pytest fixture with cleanup ensures the cache is cleared even if tests fail mid-execution, improving test isolation.

```python
@pytest.fixture(autouse=True)
def clear_converter_cache(self):
    """Clear the converter cache before and after each test."""
    from lfx.base.data.docling_utils import _get_cached_converter

    _get_cached_converter.cache_clear()
    yield
    _get_cached_converter.cache_clear()
```
237-276: Time-based test may be flaky in CI environments.

The assertion `second_call_duration < first_call_duration / 10` could fail on heavily loaded CI runners where even a cache hit might experience scheduling delays. Consider:

- Using a larger delay (e.g., 0.1s) with a more lenient ratio
- Adding a minimum threshold check (e.g., `second_call_duration < 0.01`)
- Marking the test with a retry decorator if flakiness occurs
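A sketch of the absolute-threshold variant (the delay and ceiling values here are illustrative; pick values with comfortable margins for the target CI):

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def expensive_setup() -> int:
    time.sleep(0.1)  # stand-in for heavy converter creation
    return 42


start = time.perf_counter()
expensive_setup()
first_call_duration = time.perf_counter() - start

start = time.perf_counter()
expensive_setup()  # cache hit: no sleep
second_call_duration = time.perf_counter() - start

# An absolute ceiling on the cache hit is more robust than a pure
# ratio against the first call when runners are heavily loaded.
assert first_call_duration >= 0.1
assert second_call_duration < 0.05, f"cache hit too slow: {second_call_duration:.4f}s"
```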
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/lfx/src/lfx/base/data/docling_utils.py (5 hunks)
- src/lfx/src/lfx/components/docling/docling_inline.py (3 hunks)
- src/lfx/tests/unit/base/data/test_docling_utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/test_*.py
📄 CodeRabbit inference engine (Custom checks)
- Review test files for excessive use of mocks that may indicate poor test design; check if tests have too many mock objects that obscure what is actually being tested
- Warn when mocks are used instead of testing real behavior and interactions, and suggest using real objects or test doubles when mocks become excessive
- Ensure mocks are used appropriately for external dependencies only, not for core logic
- Backend test files should follow the naming convention test_*.py with proper pytest structure
- Test files should have descriptive test function names that explain what is being tested
- Tests should be organized logically with proper setup and teardown
- Consider including edge cases and error conditions for comprehensive test coverage
- Verify tests cover both positive and negative scenarios where appropriate
- For async functions in backend tests, ensure proper async testing patterns are used with pytest
- For API endpoints, verify both success and error response testing
Files:
src/lfx/tests/unit/base/data/test_docling_utils.py
🧬 Code graph analysis (2)
src/lfx/tests/unit/base/data/test_docling_utils.py (1)
src/lfx/src/lfx/base/data/docling_utils.py (1)
- _get_cached_converter (155-224)
src/lfx/src/lfx/components/docling/docling_inline.py (1)
src/lfx/src/lfx/base/data/docling_utils.py (1)
- docling_worker (226-463)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Update Component Index
🔇 Additional comments (6)
src/lfx/src/lfx/base/data/docling_utils.py (2)
284-289: Imports with `noqa: F401` are appropriate for dependency checking.

These imports verify Docling dependencies are available before proceeding. The `noqa` comments correctly suppress unused-import warnings since these modules are intentionally imported only for availability checking.
152-224: LRU cache does not prevent duplicate initialization on concurrent cache misses.

Python's `@lru_cache` is thread-safe for its internal state but does not guarantee call-once semantics. When multiple threads simultaneously request the cache with the same missing key, each executes the wrapped function independently. Given that converter creation takes 15-20 minutes as noted in the docstring, concurrent requests with identical configurations (pipeline, ocr_engine, picture classification settings) could trigger redundant heavy initialization.

Current usage (single thread per operation) mitigates this, but if concurrent requests with identical converter configs become a requirement, implement a lock-based caching pattern (e.g., cachetools with a condition, or per-key locks) to prevent cache stampede.
src/lfx/tests/unit/base/data/test_docling_utils.py (1)
142-154: Tests appropriately mock external dependencies and focus on caching behavior.

The mock usage is appropriate per coding guidelines: `DocumentConverter` is an external dependency from the docling package, and the tests correctly focus on verifying the LRU caching mechanism rather than the converter's internals.

src/lfx/src/lfx/components/docling/docling_inline.py (3)
95-130: Thread monitoring logic is sound and handles edge cases well.

The implementation correctly:
- Polls the queue with timeout to avoid blocking
- Checks thread liveness to detect crashes
- Raises appropriate errors with clear messages
This is a clean translation from process monitoring to thread monitoring.
132-145: Correct handling of Python's thread termination limitations.

The implementation correctly acknowledges that Python threads cannot be forcefully killed. The warning when a thread remains alive after the timeout is appropriate for debugging.

Note: if the worker thread is stuck in a long-running operation (e.g., model loading), it could be left running indefinitely. Consider documenting this behavior or implementing a cooperative shutdown mechanism using `threading.Event`.
167-182: Thread setup is appropriate for the caching use case.

Using `daemon=False` ensures the worker completes even if the main thread exits, preventing data loss. `queue.Queue` is the correct choice for thread-safe communication, and this setup enables the global `@lru_cache` on `_get_cached_converter` to work as intended.
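The thread-plus-queue wiring described in this comment follows the standard pattern, sketched here with a stand-in worker (names and payloads are illustrative, not the component's actual code):

```python
import queue
import threading


def docling_worker_sketch(files: list, result_queue: queue.Queue) -> None:
    # In the real component this would fetch the cached converter and
    # convert documents; here we just echo the inputs to show the flow.
    result_queue.put([f"processed:{name}" for name in files])


result_queue: queue.Queue = queue.Queue()
worker = threading.Thread(
    target=docling_worker_sketch,
    args=(["a.pdf", "b.pdf"], result_queue),
    daemon=False,  # non-daemon: the thread is allowed to finish its work
)
worker.start()
result = result_queue.get(timeout=5)  # blocks until the worker reports back
worker.join()
```

Because both ends live in one process, the worker sees the same module-level cache as every other run, which is exactly what the multiprocessing version could not do.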
This pull request introduces a major optimization to Docling document processing by caching heavy model objects and switching from multiprocessing to threading. The most important changes are the introduction of a global LRU cache for `DocumentConverter` instances (eliminating repeated 15-20 minute model load times) and the refactoring of the inline component to use threads instead of processes, enabling shared memory and cache reuse. Additional improvements include code cleanup, updated error handling, and improved thread monitoring.

Performance Optimization and Caching

- Cache `DocumentConverter` instances using `@lru_cache`, drastically reducing subsequent processing times from 15-20 minutes to seconds by reusing loaded models. (src/lfx/src/lfx/base/data/docling_utils.py)

Refactoring for Threading

- Run the worker in a thread with `queue.Queue` instead of multiprocessing, enabling use of the global cache and reducing memory overhead. (src/lfx/src/lfx/components/docling/docling_inline.py)

Code Cleanup and Import Updates

- Mark availability-check imports with `noqa: F401` for clarity and to avoid linter warnings. (src/lfx/src/lfx/base/data/docling_utils.py)

Testing

- New unit tests for the caching behavior. (src/lfx/tests/unit/base/data/test_docling_utils.py)

Summary by CodeRabbit
Performance
Tests