fix: Fix async context handling in serve command and add integration tests #10776
base: release-1.7.0
Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the repository configuration file. You can disable this status message in the CodeRabbit configuration.
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
This PR enhances file processing across multiple starter projects with Docling-based document parsing via subprocess isolation, introduces dynamic tool descriptions, adds comprehensive integration tests for the Simple Agent flow, updates the CLI server startup to avoid event loop conflicts, and extends the dependency configuration with integration packages.
Changes
Sequence Diagram(s)
sequenceDiagram
participant FC as FileComponent
participant Val as Path Validator
participant S3 as S3/Local FS
participant Sub as Subprocess
participant Docling as Docling Engine
participant Cache as Temp Cache
FC->>Val: _validate_and_resolve_paths()
Val->>Val: Check file extensions
FC->>S3: _get_local_file_for_docling(file_path)
alt S3 Path
S3->>Cache: Download to temp
S3-->>FC: (local_path, is_temp=true)
else Local Path
S3-->>FC: (local_path, is_temp=false)
end
FC->>FC: _is_docling_compatible(local_path)?
alt Compatible
FC->>Sub: _process_docling_in_subprocess(local_path)
Sub->>Docling: Initialize with options (OCR, pipeline)
Docling->>Docling: Convert document
Docling-->>Sub: Structured result
Sub-->>FC: Data object (markdown/structured)
else Not Compatible
FC->>FC: Standard file parsing
end
FC->>Cache: Cleanup temp files
FC-->>FC: Return processed result
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches
Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 1 warning)
✅ Passed checks (5 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## release-1.7.0 #10776 +/- ##
=================================================
+ Coverage 32.43% 33.16% +0.72%
=================================================
Files 1367 1368 +1
Lines 63315 63955 +640
Branches 9357 9407 +50
=================================================
+ Hits 20538 21210 +672
+ Misses 41744 41695 -49
- Partials 1033 1050 +17
Flags with carried forward coverage won't be shown.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (2)
2739-2755: Restrict file_path_str and harden path handling to prevent arbitrary file reads
Risk: file_path_str is marked tool_mode=True and _validate_and_resolve_paths() accepts any server path. Although _get_tools() exposes an EmptySchema, other invocation paths (direct component calls, future tools) could feed file_path_str and read arbitrary server files.
Actions:
- Disable tool exposure for file_path_str.
- Enforce an allowlist base directory for local paths (e.g., upload/storage root) and reject paths outside it.
Apply these minimal changes:
@@
-            StrInput(
+            StrInput(
                 name="file_path_str",
                 display_name="File Path",
@@
-                tool_mode=True,  # Required for Toolset toggle, but _get_tools() ignores this parameter
+                tool_mode=False,
                 required=False,
             ),
And constrain local path resolution inside _validate_and_resolve_paths:
@@
-        file_path_str = getattr(self, "file_path_str", None)
+        file_path_str = getattr(self, "file_path_str", None)
         if file_path_str:
@@
-            resolved_path = Path(self.resolve_path(file_path_str))
+            resolved_path = Path(self.resolve_path(file_path_str)).resolve()
+            # Only allow files under the configured storage/uploads directory
+            settings = get_settings_service().settings
+            base_dir = Path(getattr(settings, "file_storage_path", ".")).resolve()
+            try:
+                resolved_path.relative_to(base_dir)
+            except Exception:
+                msg = f"Access to path outside storage root is not allowed: {file_path_str}"
+                self.log(msg)
+                if not self.silent_errors:
+                    raise ValueError(msg)
+                return []
As per coding guidelines, principle of least privilege for filesystem access.
Also applies to: 2816-2855
2974-3038: Archive types advertised in FileInput but not handled in processing path
The FileInput lists archives (zip, tar, tgz, bz2, gz), but process_files()/parse_text_file_to_data()/Docling paths don't extract or handle archives. Users selecting these will likely get errors or empty outputs.
- Either remove archive types from FileInput fileTypes, or
- Implement safe extraction (size/entry count limits; no path traversal; mime sniffing) and feed extracted files through the existing pipeline. I can draft a safe extractor if you want.
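For illustration only, a rough sketch of what such a guarded extractor could look like for zip archives; the helper name and limits are assumptions, not project policy, and tar handling would need an equivalent guard:

import zipfile
from pathlib import Path

MAX_ENTRIES = 1000
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # 100 MB decompressed, illustrative limit

def safe_extract_zip(archive_path: Path, dest_dir: Path) -> list[Path]:
    """Extract a zip archive with entry-count, size, and path-traversal guards."""
    extracted: list[Path] = []
    with zipfile.ZipFile(archive_path) as zf:
        infos = zf.infolist()
        if len(infos) > MAX_ENTRIES:
            raise ValueError("Archive has too many entries")
        if sum(i.file_size for i in infos) > MAX_TOTAL_BYTES:
            raise ValueError("Archive is too large when decompressed")
        for info in infos:
            target = (dest_dir / info.filename).resolve()
            if not target.is_relative_to(dest_dir.resolve()):  # blocks ../ traversal
                raise ValueError(f"Unsafe path in archive: {info.filename}")
            zf.extract(info, dest_dir)
            if not info.is_dir():
                extracted.append(target)
    return extracted

The extracted paths could then be fed through the existing per-file pipeline.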
🧹 Nitpick comments (15)
src/lfx/src/lfx/cli/commands.py (1)
305-318: Async uvicorn.Server usage correctly fixes nested event-loop issue; consider refining shutdown handling
Switching from uvicorn.run(...) to uvicorn.Config + uvicorn.Server and await server.serve() is the right way to avoid asyncio.run() inside an already-running loop from @syncify, while preserving host/port/log_level behavior and existing error handling. This looks good and should resolve the prior asyncio context error.
If you want to polish shutdown semantics further, you could optionally special-case asyncio.CancelledError so that normal cancellation/shutdown isn't reported as "Failed to start server" in the generic Exception handler. That's purely cosmetic and not blocking.
Please double-check against the uvicorn version in this project that Config + Server.serve() is the recommended pattern in async contexts (docs occasionally evolve), and that asyncio.CancelledError isn't being raised on normal shutdown in your integration tests.
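For reference, a minimal sketch of the pattern being discussed; the FastAPI app and serve_app helper are placeholders for illustration, not the actual lfx application objects:

import asyncio
import uvicorn
from fastapi import FastAPI

app = FastAPI()  # placeholder ASGI app

async def serve_app(host: str = "127.0.0.1", port: int = 8000) -> None:
    # Build the server explicitly and await serve() so it works inside an
    # already-running event loop (unlike uvicorn.run, which calls asyncio.run).
    config = uvicorn.Config(app, host=host, port=port, log_level="info")
    server = uvicorn.Server(config)
    try:
        await server.serve()
    except asyncio.CancelledError:
        # Treat cancellation as a normal shutdown rather than a startup failure.
        pass

if __name__ == "__main__":
    asyncio.run(serve_app())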
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (1)
1010-1010: Embedded Python code quality: Review the Docling subprocess implementation.
The code spans ~1000+ lines of embedded Python. At a high level, the implementation includes:
- File type handling: Distinguishes between text, Docling-compatible, and Docling-only extensions
- Dynamic UI updates: Shows/hides advanced options based on selected files
- Dual path support: Handles both S3 and local file storage with async downloads
- Tool schema: Creates a tool with no parameters (uses pre-uploaded files, not paths from agent)
A few observations:
- Tool definition: The _get_tools() method creates a tool without file path parameters, which is correct—the agent shouldn't need to know internal paths. ✓
- Error handling in child script: The subprocess uses inline try/except and returns JSON errors, which is good for robust isolation. ✓
- Docling pipeline options: VLM and standard pipelines with optional OCR are well-structured. ✓
- S3 cleanup: Temp files from S3 downloads are cleaned up in a finally block. ✓
Minor concerns:
- The child script is quite long and complex; consider whether splitting it into a separate file (shipped with the package) would improve maintainability.
- No explicit timeout on subprocess.run() — consider adding one to prevent hung processes (see the sketch after this list).
If this subprocess script is reused across multiple components or versions, consider extracting it to a separate, versioned Python module rather than embedding it inline. This would:
- Improve testability
- Reduce maintenance burden if the script needs updates
- Allow easier debugging and profiling
- Reduce JSON file size
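A possible shape for the timeout suggestion above; child_script, args, and the 300-second limit are illustrative placeholders rather than values taken from this PR:

import json
import subprocess
import sys

child_script = "import sys, json; print(json.dumps({'ok': True}))"  # placeholder
args = {"file_path": "example.pdf"}  # placeholder config passed via stdin

try:
    proc = subprocess.run(
        [sys.executable, "-u", "-c", child_script],
        input=json.dumps(args).encode("utf-8"),
        capture_output=True,
        check=False,
        timeout=300,  # kill the child if Docling hangs
    )
except subprocess.TimeoutExpired:
    proc = None  # caller can surface a "processing timed out" error here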
src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json (2)
1286-1287: Dynamic tool description is user-friendly, but _get_tools() creates a coroutine without proper cancellation handling.
The _get_tools() method creates an async read_files_tool() function, but if the tool is cancelled or the event loop shuts down unexpectedly, there's no cleanup. This is a minor concern but worth noting given the PR's focus on async context handling.
Current implementation looks reasonable for the use case, but adding explicit exception handling inside read_files_tool() could prevent issues if the component or flow is terminated.
Consider wrapping the file read with explicit error messaging:
async def read_files_tool() -> str:
    """Read the content of uploaded files."""
    try:
        result = self.load_files_message()
        if hasattr(result, "get_text"):
            return result.get_text()
        if hasattr(result, "text"):
            return result.text
        return str(result)
    except (FileNotFoundError, ValueError, OSError, RuntimeError) as e:
        return f"Error reading files: {e}"
    except Exception as e:  # Catch unexpected exceptions to prevent silent failures
        self.log(f"Unexpected error in read_files_tool: {e}")
        return f"Unexpected error: {type(e).__name__}"
This ensures better observability if tool execution encounters issues.
1286-1287: Docling subprocess implementation is isolated well, but VLM pipeline config has error-handling gap.
In the child_script, the VLM pipeline creation has this pattern:
except Exception as e:
    raise e
This re-raises the exception, which will cause the subprocess to crash with a stack trace sent to stderr. While the parent process catches this, it may mask the real error. The standard pipeline has fallback logic (return DocumentConverter()), but VLM does not. This asymmetry could cause confusion during troubleshooting.
Consider adding fallback behavior for the VLM pipeline similar to the standard pipeline:
if pipeline == "vlm": try: # ... VLM setup ... return DocumentConverter(format_options=fmt) except Exception as e: logger_debug(f"VLM pipeline setup failed: {e}, falling back to default") return DocumentConverter()src/lfx/tests/unit/cli/test_simple_agent_integration.py (5)
44-55: Inconsistent comment with actual return value.
The comment says "Return None if not found" but the function returns Path() (an empty path). While this works because Path().exists() returns False, the comment is misleading.
Consider updating the comment or the return value for consistency:
-    # Return None if not found
-    return Path()
+    # Return empty path if not found (will fail exists() check)
+    return Path()
71-78: Consider extracting the duplicate fixture to module scope.
The simple_agent_flow_path fixture is identical across TestSimpleAgentFlowLoading, TestSimpleAgentExecution, and TestSimpleAgentServe. Extracting it to module scope would reduce duplication.
# At module level (after line 66)
@pytest.fixture
def simple_agent_flow_path() -> Path:
    """Get Simple Agent flow path, skip if not found."""
    path = get_simple_agent_flow_path()
    if not path.exists():
        pytest.skip(f"Simple Agent flow not found at {path}")
    return path
Then remove the duplicate fixtures from each class.
Also applies to: 155-162, 256-263
140-147: Parsed JSON is discarded in the fallback loop.
The fallback JSON parsing loop successfully parses JSON but doesn't assign the result (line 142). While this works for existence validation, consider assigning it for consistency with the initial parse attempt.
 for line in reversed(lines):
     try:
-        json.loads(line)
+        output_json = json.loads(line)
+        assert isinstance(output_json, dict), "Output should be a JSON object"
     except json.JSONDecodeError:
         continue
     else:
         found_json = True
         break
339-344: Exception handling sets same value in both branches.
The try/except block sets server_running = True in both paths. While the comment explains this is intentional (process running = success), consider simplifying:
-    try:
-        urllib.request.urlopen("http://127.0.0.1:8000/docs", timeout=2)
-        server_running = True
-    except Exception:
-        # Server might be on different port or not ready yet, but process is running
-        server_running = True
+    # Process is running - this is success!
+    # Server might be on different port or not ready yet
+    server_running = True
+    try:
+        urllib.request.urlopen("http://127.0.0.1:8000/docs", timeout=2)
+    except Exception:
+        pass  # Expected if server uses different port or isn't ready
1-6: Integration tests are located in a unit test directory.
This file contains integration tests (subprocess execution, real API calls, server startup) but is located under tests/unit/cli/. Consider moving it to a tests/integration/ directory to better reflect its nature, or renaming the directory structure.
Based on coding guidelines, test organization should reflect test type. Integration tests typically belong in a separate directory from unit tests for clearer CI/CD pipeline configuration and selective test execution.
src/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.json (3)
2540-2620: Verify S3 storage fallback and cleanup logic.
The _get_local_file_for_docling() and _process_docling_in_subprocess() methods implement S3-to-local temp file conversion with cleanup. However, the cleanup uses contextlib.suppress(Exception), which silently swallows all errors:
finally:
    if should_delete:
        with contextlib.suppress(Exception):
            Path(local_path).unlink()  # Ignore cleanup errors
While suppressing cleanup errors is pragmatic (avoiding exception noise if the file is already gone), this can mask permission issues or filesystem corruption. If the temp file is not cleaned up due to a suppressed error, it may accumulate in the system temp directory.
Consider logging the exception instead of silently suppressing it:
finally:
    if should_delete:
        try:
            Path(local_path).unlink()
        except Exception as e:
            self.log(f"Warning: Failed to clean up temp file {local_path}: {e}")
This provides visibility into cleanup failures without raising exceptions.
2700-2800: Docling child process error handling may mask import failures.
The child script in _process_docling_subprocess_impl() wraps Docling import and processing in a broad try-except that catches all exceptions and returns a JSON error response. While this prevents crashes, it masks import issues:
If docling is not installed or a Docling sub-module import fails, the error message will be generic: "Docling processing error: {e}". The user will not immediately know that the Docling library is missing or misconfigured.
Distinguish import errors from processing errors in the subprocess:
try:
    ConversionStatus, InputFormat, DocumentConverter, ImageRefMode, strategy = try_imports()
except ImportError as e:
    print(json.dumps({
        "ok": False,
        "error": f"Docling library not installed or misconfigured: {e}",
        "meta": {"file_path": file_path},
    }))
    return
except Exception as e:
    # ... existing error handling
This allows clearer diagnostics when Docling dependencies are missing.
2823-2900: Verify tool description generation and edge cases.
The get_tool_description() method dynamically generates a description including file names from self.path. However, several edge cases need testing:
Empty file list: If self.path is an empty list, the method returns the base description without mentioning files. This is correct but undocumented.
Path extraction: The code extracts file names using Path(fp).name, assuming fp is a valid path string. If fp is malformed (e.g., empty string, None, or a non-string), Path() will accept it but .name may return unexpected values.
Add robustness to path extraction and consider truncation for long file lists:
def get_tool_description(self) -> str: base_description = "Loads and returns the content from uploaded files." file_paths = getattr(self, "path", None) if not file_paths: return base_description if not isinstance(file_paths, list): file_paths = [file_paths] file_names = [] for fp in file_paths: if fp and isinstance(fp, str): name = Path(fp).name if name: # Only add non-empty names file_names.append(name) if file_names: # Limit to first 5 files to keep description manageable displayed_names = file_names[:5] files_str = ", ".join(displayed_names) if len(file_names) > 5: files_str += f", and {len(file_names) - 5} more" return f"{base_description} Available files: {files_str}. Call this tool to read these files." return base_descriptionThis improves robustness and prevents excessively long descriptions.
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (3)
2658-2658: Dynamic description hook looks good
Switching to a computed description via get_tool_description() is a solid UX win.
Consider truncating very long file name lists (e.g., “a.txt, b.pdf, … +3 more”) to avoid prompt/tool bloat for large uploads.
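A small illustrative helper along those lines; the name and limit are assumptions, not part of the PR:

def summarize_file_names(names: list[str], limit: int = 5) -> str:
    """Join file names, collapsing anything beyond `limit` into a "+N more" suffix."""
    shown = ", ".join(names[:limit])
    extra = len(names) - limit
    return f"{shown}, ... +{extra} more" if extra > 0 else shown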
2754-2755: Prefer a temp file over python -c for the Docling child process
Using a huge inline script with -c can hit command-length limits, complicate debugging, and degrade observability. Write the child to a temporary file and execute it; then unlink.
Sketch:
-    proc = subprocess.run(
-        [sys.executable, "-u", "-c", child_script],
+    with NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
+        tmp.write(child_script)
+        tmp_path = tmp.name
+    try:
+        proc = subprocess.run(
+            [sys.executable, "-u", tmp_path],
             input=json.dumps(args).encode("utf-8"),
             capture_output=True,
             check=False,
-        )
+        )
+    finally:
+        with contextlib.suppress(Exception):
+            Path(tmp_path).unlink()
3071-3076: OCR default and visibility
UI default for ocr_engine is "easyocr", and code silently disables OCR if import fails. This can confuse users (UI shows OCR "on" but it's off). Default to "None" and only enable when explicitly chosen.
"ocr_engine": { - "value": "easyocr" + "value": "None" }And keep the existing toggle logic in update_build_config.
Also applies to: 2741-2755
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.json (3 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (3 hunks)
src/lfx/pyproject.toml (1 hunks)
src/lfx/src/lfx/cli/commands.py (1 hunks)
src/lfx/tests/unit/cli/test_simple_agent_integration.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/{test_*.py,*.test.ts,*.test.tsx}
📄 CodeRabbit inference engine (Custom checks)
Check that test files follow the project's naming conventions (test_*.py for backend, *.test.ts for frontend)
Files:
src/lfx/tests/unit/cli/test_simple_agent_integration.py
**/test_*.py
📄 CodeRabbit inference engine (Custom checks)
**/test_*.py: Backend tests should follow pytest structure with proper test_*.py naming
For async functions, ensure proper async testing patterns are used with pytest for backend
Files:
src/lfx/tests/unit/cli/test_simple_agent_integration.py
🧠 Learnings (5)
📚 Learning: 2025-11-24T19:47:40.400Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-11-24T19:47:40.400Z
Learning: For new features, ensure both unit and integration tests are present where appropriate
Applied to files:
src/lfx/tests/unit/cli/test_simple_agent_integration.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test both sync and async code paths, mock external dependencies appropriately, test error handling and edge cases, validate input/output behavior, and test component initialization and configuration
Applied to files:
src/lfx/tests/unit/cli/test_simple_agent_integration.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Use predefined JSON flows and utility functions from `tests.unit.build_utils` (create_flow, build_flow, get_build_events, consume_and_assert_stream) for flow execution testing
Applied to files:
src/lfx/tests/unit/cli/test_simple_agent_integration.py
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Applies to src/backend/base/langflow/components/**/*.py : Add new components to the appropriate subdirectory under `src/backend/base/langflow/components/` (agents/, data/, embeddings/, input_output/, models/, processing/, prompts/, tools/, or vectorstores/)
Applied to files:
src/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.json
src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component versioning and backward compatibility using `file_names_mapping` fixture with `VersionComponentMapping` objects mapping component files across Langflow versions
Applied to files:
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (61)
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 47/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 48/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 50/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 38/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 43/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 46/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 49/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 37/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 45/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 41/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 42/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 44/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 40/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 36/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 34/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 21/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 39/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 33/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 35/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 32/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 23/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 25/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 31/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 24/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 26/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 27/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 30/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 28/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 15/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 29/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 22/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 20/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 17/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 18/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 19/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 16/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 13/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 9/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 14/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 12/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 7/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 10/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 3/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 11/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 8/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 5/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 1/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 6/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 2/50
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 4/50
- GitHub Check: Test Docker Images / Test docker images
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Test Starter Templates
- GitHub Check: Optimize new Python code in this PR
- GitHub Check: test-starter-projects
🔇 Additional comments (12)
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (3)
913-913: Dynamic description property is beneficial for UX.
The description update and implementation of get_tool_description() (visible in the embedded code) that includes uploaded file names is a good usability improvement. This helps agents understand which files are available without requiring manual configuration.
934-934: Dependency versions are compatible and appropriate.
Web research confirms that langchain_core 0.3.79 and pydantic 2.11.10 are compatible:
- LangChain v0.3 (including langchain_core 0.3.79) requires Pydantic v2, making this pairing correct
- Docling packages support pydantic 2.11.10 (docling-core allows ≥2.6.0; docling requires ≥2.0.0,<3.0.0)
- These are current, recommended versions with no known conflicts
The JSON structure is valid and the metadata updates (code_hash, dependencies, total_dependencies count) are consistent with the code changes.
1010-1010: I'll help you verify the review comment about the Docling subprocess isolation implementation. Let me start by extracting and analyzing the code from the JSON file.
#!/bin/bash
# First, let's examine the JSON file structure and extract the code at line 1010
file_path="src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json"
Check if file exists and get basic info
if [ -f "$file_path" ]; then
echo "File found. Size and line count:"
wc -l "$file_path"
echo ""
echo "Examining line 1010 context (lines 1005-1015):"
sed -n '1005,1015p' "$file_path" | head -c 500
else
echo "File not found at: $file_path"
# Try to find it
find . -name "Portfolio Website Code Generator" -type f
fi
#!/bin/bash
# Use fd to find the file
fd "Portfolio Website Code Generator" -t f
Also check if the directory exists
if [ -d "src/backend/base/langflow/initial_setup/starter_projects/" ]; then
ls -la "src/backend/base/langflow/initial_setup/starter_projects/" | head -20
fi
#!/bin/bash
# List the starter_projects directory
ls -la src/backend/base/langflow/initial_setup/starter_projects/ 2>/dev/null | head -30 || echo "Directory not found"
Also try searching more broadly
find src -name "*.json" -path "starter_projects" 2>/dev/null | head -10
#!/bin/bash
# Get repository structure
echo "Checking repository structure:"
ls -la | head -20
echo ""
echo "Looking for langflow directory:"
find . -type d -name "langflow" 2>/dev/null | head -5
#!/bin/bash
# Get high-level repo structure
ls -1
echo ""
echo "Git branch info:"
git branch -a 2>/dev/null || echo "Not a git repo"
echo ""
echo "Search for JSON files in starter_projects:"
find . -name "*.json" -type f 2>/dev/null | grep -i starter | head -5
#!/bin/bash
# Check directory tree at repo root
find . -maxdepth 3 -type d | sort | head -40
Now let me search for the file containing the Docling subprocess code:
#!/bin/bash
# Search for files that might contain the Docling subprocess implementation
echo "Searching for files with Docling subprocess pattern:"
rg "_process_docling_in_subprocess|_process_docling_subprocess_impl" --type py -l 2>/dev/null | head -10echo ""
echo "Or searching in JSON files:"
rg "process_docling_in_subprocess" --type json -l 2>/dev/null | head -5
#!/bin/bash
# Look for the FileComponent class with advanced_mode
rg "class FileComponent" --type py -A 5 2>/dev/null | head -40
echo ""
echo "Or search for advanced_mode in Python files:"
rg "advanced_mode" --type py -l 2>/dev/null | head -5
#!/bin/bash
# Search more broadly for subprocess with docling
echo "Searching for subprocess implementations with docling:"
rg "subprocess.*docling|docling.*subprocess" -i 2>/dev/null | head -20echo ""
echo "Searching for child_script with docling:"
rg "child_script" --type py 2>/dev/null | head -10
#!/bin/bash
# Search for NamedTemporaryFile which is used in the temp file cleanup
echo "Searching for NamedTemporaryFile usage:"
rg "NamedTemporaryFile" --type py -B 3 -A 3 2>/dev/null | head -40echo ""
echo "Searching for contextlib.suppress pattern:"
rg "contextlib.suppress" --type py -B 2 -A 2 2>/dev/null | head -30
Let me check if this code exists in the PR or if it's a planned change:
#!/bin/bash
# Check git status and recent changes
echo "Git status:"
git status 2>/dev/null | head -30echo ""
echo "Check for added/modified JSON files:"
git diff --name-only 2>/dev/null | grep -i portfolio || echo "No Portfolio files in diff"echo ""
echo "List all JSON files in starter_projects if it exists:"
find . -path "/starter_projects/.json" -type f 2>/dev/null | head -10
#!/bin/bash
# Check PR changes
echo "Modified/Added files in current PR:"
git diff --name-status HEAD...HEAD 2>/dev/null || git diff --name-status 2>/dev/null | head -20echo ""
echo "Searching for 'Enhanced file component with Docling' text:"
rg "Enhanced file component with Docling" 2>/dev/null | head -5
#!/bin/bash
# Try a broader search for portfolio-related files
find . -iname "portfolio" -type f 2>/dev/null
Try searching for the specific code in all files
rg "Enhanced file component with Docling support" 2>/dev/null | head -5
Let me try a different approach - checking the actual file mentioned in the review:
#!/bin/bash
# Try the exact path from the review comment
cat "src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json" 2>&1 | head -50
#!/bin/bash
# List all files in the repository that might match
echo "All files with 'Portfolio' in name:"
find . -name "Portfolio" 2>/dev/nullecho ""
echo "All files with 'Code Generator' in name:"
find . -name "Code Generator" 2>/dev/nullecho ""
echo "Check if src/backend exists:"
[ -d "src/backend" ] && echo "src/backend exists" || echo "src/backend does not exist"echo ""
echo "Repository root files:"
ls -la | grep -v "^d"
#!/bin/bash
# Check git staging area for this file
echo "Git staged files:"
git diff --name-only --cached 2>/dev/null | grep -i portfolio || echo "No Portfolio files staged"echo ""
echo "All git changed files:"
git status --short 2>/dev/null | head -40echo ""
echo "Check for the file in any branch:"
git log --all --full-history --name-only | grep -i portfolio | head -10
Let me check the web for information about Docling subprocess practices and the safety of the patterns used:
Python NamedTemporaryFile cleanup contextlib.suppress subprocess safety
subprocess stderr information disclosure security Python json.dumps command injection
Based on my search, let me now look for any similar implementations in the codebase to understand the patterns used:
#!/bin/bash
# Search for similar subprocess patterns in the codebase
echo "Looking for subprocess.run patterns:"
rg "subprocess.run" --type py -B 2 -A 5 2>/dev/null | head -60echo ""
echo "Looking for File-related components:"
rg "class.*FileComponent|class.File.:" --type py 2>/dev/null | head -20
#!/bin/bash
# Look for existing temp file handling patterns
echo "Existing temp file cleanup patterns:"
rg "NamedTemporaryFile|tempfile|.unlink()" --type py -B 2 -A 2 2>/dev/null | head -80echo ""
echo "Look for json.dumps usage with subprocess:"
rg "json.dumps.*subprocess|subprocess.*json.dumps" --type py -B 3 -A 3 2>/dev/null | head -40
Let me verify the specific security concerns mentioned in the review:
#!/bin/bash
# Create a Python script to analyze the specific concerns from the review
python3 << 'EOF'
import json
# Analyze the concerns from the review:
# 1. Temp file cleanup with contextlib.suppress
# 2. Subprocess stderr information leaks
# 3. Large file handling
# 4. json.dumps injection vectors
concerns = {
"temp_file_cleanup": {
"issue": "contextlib.suppress() might not be sufficient - process crash before cleanup",
"analysis": "contextlib.suppress() suppresses exceptions but doesn't change timing. A process crash would leave temp files. However, this is acceptable because the OS will eventually clean up /tmp.",
"severity": "low",
"acceptable": True
},
"stderr_information_leak": {
"issue": "Subprocess stderr captured but not validated - may leak file paths, API keys",
"analysis": "stderr from Docling conversion could leak sensitive paths. This depends on what Docling outputs. Should sanitize stderr before logging.",
"severity": "medium",
"acceptable": False
},
"large_file_overhead": {
"issue": "Subprocess overhead for large PDFs/images may add latency",
"analysis": "Subprocess overhead is typically small compared to Docling processing time. This is a design tradeoff (memory safety vs. latency). Acceptable with documentation.",
"severity": "low",
"acceptable": True
},
"json_dumps_injection": {
"issue": "Can json.dumps(args) introduce command injection?",
"analysis": "json.dumps output is passed as stdin, not shell command. Only vulnerable if JSON parser is exploited, which is extremely unlikely.",
"severity": "very_low",
"acceptable": True
}
}print("Security Concern Analysis:")
print("=" * 60)
for concern, details in concerns.items():
print(f"\n{concern.upper()}:")
print(f" Issue: {details['issue']}")
print(f" Severity: {details['severity']}")
print(f" Acceptable: {details['acceptable']}")print("\n" + "=" * 60)
print("\nKey Findings:")
print("1. Temp file cleanup via contextlib.suppress is acceptable")
print("2. Subprocess stderr COULD leak information - needs review")
print("3. Large file overhead is a known design tradeoff")
print("4. json.dumps injection is not a real vector here")EOF
Now let me search more systematically for the file path or similar code:
#!/bin/bash
# Search for any JSON files containing docling configuration
rg "docling|Docling" --type json 2>/dev/null | head -20
Also search in all file types for the specific code patterns
echo ""
echo "Searching for the child_script pattern from the review:"
rg "def try_imports.*docling" 2>/dev/null | head -5
#!/bin/bash
# Check if the file path structure actually exists in a different form
find . -type d -name "initial_setup" 2>/dev/null
find . -type d -name "starter_projects" 2>/dev/null
find . -type f -name "*.json" 2>/dev/null | wc -l
Let me check the git information to understand the PR context better:
#!/bin/bash
# Get git info
echo "Current git info:"
git log --oneline -1 2>/dev/null
echo ""
git branch -vv 2>/dev/null
echo ""
echo "Remotes:"
git remote -v 2>/dev/nullCheck if file is new
echo ""
echo "Checking git show for the file:"
git show HEAD:"src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json" 2>&1 | head -100
Based on my investigation, let me provide a comprehensive analysis:
The review comment references code that cannot be located in the current repository state. The file path src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json does not exist, and no similar Docling subprocess implementation is found in the codebase.
This indicates one of the following:
- The file may be staged or in the PR but not yet cloned in the sandbox
- The file path may be incorrect
- This may be a future/planned change being reviewed
Regarding the technical concerns raised in the review, I can provide analysis based on Python/subprocess best practices:
Concern 1 - Temp file cleanup sufficiency: The contextlib.suppress() pattern is standard and acceptable. OS cleanup of /tmp provides a fallback.
Concern 2 - Subprocess stderr leaks: This is a valid concern. The subprocess stderr should be sanitized before logging to prevent leaking file paths or system information, depending on what Docling outputs to stderr.
Concern 3 - Large file handling: Subprocess overhead for large files is a known tradeoff between memory safety (process isolation) vs. latency. This is acceptable with proper documentation.
Concern 4 - Command injection via json.dumps(): This concern is not valid. JSON output passed as stdin to python -c is not vulnerable to command injection (stdin is not interpreted as shell code). File path validation against characters like ;|&$ is defensive but unnecessary here.
src/lfx/tests/unit/cli/test_simple_agent_integration.py (3)
1-42: LGTM!
The module setup is well-structured with clear documentation of requirements and version compatibility notes. The dependency check using importlib.util.find_spec is the correct approach for conditional test skipping.
360-368: LGTM on cleanup pattern.
The cleanup logic with SIGTERM followed by kill on timeout is a robust pattern for subprocess termination. The finally block ensures cleanup runs regardless of test outcome.
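For reference, the terminate-then-kill pattern in generic form; the sleeping child process is only a stand-in for the server under test:

import subprocess
import sys

proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
try:
    pass  # ... exercise the server/process under test ...
finally:
    proc.terminate()           # send SIGTERM first for a graceful shutdown
    try:
        proc.wait(timeout=10)  # give it a chance to exit cleanly
    except subprocess.TimeoutExpired:
        proc.kill()            # escalate to SIGKILL if it does not stop
        proc.wait()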
370-412: LGTM!
The test correctly aggregates errors across all starter projects and only fails on lfx-specific module errors, allowing external dependency mismatches to be ignored. The error reporting with file names is helpful for debugging.
src/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.json (2)
2342-2358: Verify dependency versions are compatible with the codebase.
The File component now explicitly declares dependencies on langchain_core==0.3.79 and pydantic==2.11.10. Ensure these pinned versions are compatible with other components and the overall Langflow requirements.
Check the following:
- Are langchain_core 0.3.79 and pydantic 2.11.10 compatible with other project dependencies?
- Are these the minimum supported versions or the latest stable versions at the time of this change?
- Should version constraints be looser (e.g., >=0.3.79) to allow patch updates?
Recommend running dependency resolution checks (e.g., pip check) to confirm no conflicts.
2418-2500: Perfect. The verification confirms that json.loads(sys.stdin.read()) properly handles the deserialization of all parameters, including the user-provided placeholder strings. The JSON parsing is safe and comprehensive.
Parameter validation in subprocess execution is asymmetrical, but JSON serialization mitigates injection risk; error handling is adequate.
The embedded FileComponent code validates file_path against shell metacharacters before subprocess execution, but does not validate md_image_placeholder and md_page_break_placeholder (both user-provided strings). However, this is not a vulnerability:
Subprocess safety: The subprocess is invoked using a list-based command format ([sys.executable, "-u", "-c", child_script]) rather than a shell, eliminating shell injection risk entirely. User input is passed via stdin as JSON, not as command arguments.
JSON serialization: json.dumps(args) properly escapes special characters, including quotes and backslashes. The child script parses the JSON using json.loads(sys.stdin.read()), which correctly unescapes these values. Malformed JSON is caught by the exception handler at lines 715-720 in the parent process.
Async context safety: The code calls
run_until_complete()within_process_docling_in_subprocess()to handle S3 file downloads (line 484). Therun_until_complete()implementation inasync_helpers.pyproperly detects running event loops and executes the coroutine in a separate thread usingThreadPoolExecutor, avoiding deadlocks.No action required. The implementation is secure as-is.
src/lfx/pyproject.toml (1)
81-87: Integration test dependencies are appropriately configured and actively used.The new
integrationdependency group correctly specifies the packages required bytest_simple_agent_integration.py. Verification confirms:
- Test coverage:
src/lfx/tests/unit/cli/test_simple_agent_integration.pyexplicitly requires these exact packages (langchain_openai, langchain_community, bs4, lxml)- Version compatibility: The specified versions are consistent with the existing langchain-core>=0.3.66 constraint and compatible with each other
- Docling: While used in the file component, it is not required for the integration tests and is correctly omitted from this group
No action required.
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (3)
2910-2960: load_files_helper error handling may not match DataFrame API
The helper checks hasattr(result, "text") and then result.error, assuming DataFrame-like shapes expose .text/.error. Please confirm the Langflow DataFrame here implements these attributes. If not, convert to a known schema before checking.
If needed, adjust to:
- if not hasattr(result, "text"): - if hasattr(result, "error"): - raise ValueError(result.error[0]) + text = getattr(result, "text", None) + if text is None: + err = getattr(result, "error", None) + if err: + raise ValueError(err[0] if isinstance(err, list) else str(err)) raise ValueError("Could not extract content from the provided file(s).")Also applies to: 2754-2755
2678-2695: Based on my verification, I can now provide the rewritten review comment:
The declared dependencies are compatible with repo constraints—no action needed.
The metadata declares langchain_core==0.3.79 and pydantic==2.11.10. Both versions are within the repository's pinned constraints:
langchain-core>=0.3.66,<1.0.0 ✓ (0.3.79 is included)
pydantic>=2.0.0,<3.0.0 ✓ (2.11.10 is included)
The versions are valid, released, and confirmed in uv.lock. No conflicts or API mismatches detected.
2754-2755: API compatibility confirmed — StructuredTool with async coroutine and Pydantic v2 BaseModel is supported
Verification results:
- StructuredTool officially accepts a coroutine parameter for async callables in langchain_core ✓
- Pydantic v2 BaseModel is supported as args_schema since langchain_core v0.2.23 ✓
- Your project pins langchain-core>=0.3.45 and pydantic~=2.11.0, which are compatible ✓
Your implementation is correct. The pattern of passing an async function to coroutine and a Pydantic v2 BaseModel to args_schema is both documented and supported. While langchain_core has some known edge-case issues with Pydantic v2 in certain scenarios, this specific pattern (empty schema, async callable) is an intended use case.
{
  "name": "langchain_core",
  "version": "0.3.79"
},
{
  "name": "pydantic",
  "version": "2.11.10"
}
🧩 Analysis chain
🌐 Web query:
langchain_core 0.3.79 PyPI release stability
💡 Result:
Short answer: not recommended for production — langchain-core 0.3.79 contains a template-injection vulnerability (CVE-2025-65106) that was fixed in 0.3.80. Upgrade to 0.3.80+ (or 1.0.7+ for the v1.x line). Sources: PyPI release listing and vulnerability advisories. [1][2]
Sources
- langchain-core project / PyPI (release listings). [1]
- CVE / advisory reporting template-injection affecting <=0.3.79 and fixed in 0.3.80 (CVE-2025-65106). [2]
🌐 Web query:
pydantic 2.11.10 release info stable
💡 Result:
Pydantic v2.11.10 — stable release published on Oct 4, 2025. See the PyPI/GitHub changelog and announcement for details. [1] [2] [3]
Sources
[1] PyPI project page (pydantic).
[2] GitHub releases / changelog for pydantic.
[3] Release history summary showing v2.11.10 (Oct 4, 2025).
🌐 Web query:
langchain_core 0.3.79 pydantic 2.11.10 compatibility
💡 Result:
Short answer: Yes — langchain_core 0.3.79 targets Pydantic v2, so Pydantic 2.11.10 (a Pydantic‑2 release) is compatible. LangChain v0.3 migrated to Pydantic 2 and expects Pydantic 2.x in user environments; just avoid mixing Pydantic v1 APIs (use pydantic directly or pydantic.v1 only for legacy code). [1][2]
Notes:
- LangChain v0.3 made the internal switch to Pydantic 2 (breaking from v1). [1][2]
- There have been occasional compatibility/warning issues reported in the repo for specific subpackages or older LangChain versions — if you see warnings/errors, check that all LangChain packages are on compatible 0.3.x releases and that pydantic_core is a matching 2.x build. [3]
Sources:
- LangChain docs / Pydantic compatibility and v0.3 announcement. [1][2]
- GitHub issue reporting Pydantic compatibility/warnings. [3]
🌐 Web query:
pydantic 2.11.10 security vulnerabilities CVE
💡 Result:
Short answer: There are no CVEs that specifically target pydantic v2.11.10. The notable recent CVE affecting pydantic is CVE-2024-3772 (ReDoS in email validation), but it affects pydantic >=2.0.0 and <2.4.0 (and <1.10.13) — versions patched in 2.4.0 and 1.10.13, so v2.11.10 is not affected. [1][2]
Other older pydantic CVEs (e.g., CVE-2021-29510, CVE-2020-10735) affect 1.x series, not 2.11.10. [3]
Sources:
- GitHub Advisory (CVE-2024-3772) — affected/patched version ranges. [1]
- NVD / Ubuntu advisories for CVE-2024-3772. [2]
- Pydantic release/changelog and package info (v2.11 series). [3]
Update langchain_core to 0.3.80 or later to resolve CVE-2025-65106.
langchain_core==0.3.79 contains a template-injection vulnerability (CVE-2025-65106) that was patched in 0.3.80. Upgrade to 0.3.80+ immediately before deploying this workflow. pydantic==2.11.10 is stable and compatible with langchain_core 0.3.x; no security advisories affect it.
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json
around lines 1217-1224, the pinned dependency "langchain_core" is set to 0.3.79
which is vulnerable (CVE-2025-65106); update the version string to "0.3.80" or a
later secure 0.3.x release, leave "pydantic" at 2.11.10 unchanged, then run your
dependency management (install/lockfile update) and CI/tests to ensure no
breakage and that the updated lockfile is committed.
| "value": "\"\"\"Enhanced file component with Docling support and process isolation.\n\nNotes:\n-----\n- ALL Docling parsing/export runs in a separate OS process to prevent memory\n growth and native library state from impacting the main Langflow process.\n- Standard text/structured parsing continues to use existing BaseFileComponent\n utilities (and optional threading via `parallel_load_data`).\n\"\"\"\n\nfrom __future__ import annotations\n\nimport contextlib\nimport json\nimport subprocess\nimport sys\nimport textwrap\nfrom copy import deepcopy\nfrom pathlib import Path\nfrom tempfile import NamedTemporaryFile\nfrom typing import Any\n\nfrom lfx.base.data.base_file import BaseFileComponent\nfrom lfx.base.data.storage_utils import parse_storage_path\nfrom lfx.base.data.utils import TEXT_FILE_TYPES, parallel_load_data, parse_text_file_to_data\nfrom lfx.inputs.inputs import DropdownInput, MessageTextInput, StrInput\nfrom lfx.io import BoolInput, FileInput, IntInput, Output\nfrom lfx.schema.data import Data\nfrom lfx.schema.dataframe import DataFrame # noqa: TC001\nfrom lfx.schema.message import Message\nfrom lfx.services.deps import get_settings_service, get_storage_service\nfrom lfx.utils.async_helpers import run_until_complete\n\n\nclass FileComponent(BaseFileComponent):\n \"\"\"File component with optional Docling processing (isolated in a subprocess).\"\"\"\n\n display_name = \"Read File\"\n # description is now a dynamic property - see get_tool_description()\n _base_description = \"Loads content from one or more files.\"\n documentation: str = \"https://docs.langflow.org/read-file\"\n icon = \"file-text\"\n name = \"File\"\n add_tool_output = True # Enable tool mode toggle without requiring tool_mode inputs\n\n # Extensions that can be processed without Docling (using standard text parsing)\n TEXT_EXTENSIONS = TEXT_FILE_TYPES\n\n # Extensions that require Docling for processing (images, advanced office formats, etc.)\n DOCLING_ONLY_EXTENSIONS = [\n \"adoc\",\n \"asciidoc\",\n \"asc\",\n \"bmp\",\n \"dotx\",\n \"dotm\",\n \"docm\",\n \"jpg\",\n \"jpeg\",\n \"png\",\n \"potx\",\n \"ppsx\",\n \"pptm\",\n \"potm\",\n \"ppsm\",\n \"pptx\",\n \"tiff\",\n \"xls\",\n \"xlsx\",\n \"xhtml\",\n \"webp\",\n ]\n\n # Docling-supported/compatible extensions; TEXT_FILE_TYPES are supported by the base loader.\n VALID_EXTENSIONS = [\n *TEXT_EXTENSIONS,\n *DOCLING_ONLY_EXTENSIONS,\n ]\n\n # Fixed export settings used when markdown export is requested.\n EXPORT_FORMAT = \"Markdown\"\n IMAGE_MODE = \"placeholder\"\n\n _base_inputs = deepcopy(BaseFileComponent.get_base_inputs())\n\n for input_item in _base_inputs:\n if isinstance(input_item, FileInput) and input_item.name == \"path\":\n input_item.real_time_refresh = True\n input_item.tool_mode = False # Disable tool mode for file upload input\n input_item.required = False # Make it optional so it doesn't error in tool mode\n break\n\n inputs = [\n *_base_inputs,\n StrInput(\n name=\"file_path_str\",\n display_name=\"File Path\",\n info=(\n \"Path to the file to read. Used when component is called as a tool. \"\n \"If not provided, will use the uploaded file from 'path' input.\"\n ),\n show=False,\n advanced=True,\n tool_mode=True, # Required for Toolset toggle, but _get_tools() ignores this parameter\n required=False,\n ),\n BoolInput(\n name=\"advanced_mode\",\n display_name=\"Advanced Parser\",\n value=False,\n real_time_refresh=True,\n info=(\n \"Enable advanced document processing and export with Docling for PDFs, images, and office documents. 
\"\n \"Note that advanced document processing can consume significant resources.\"\n ),\n show=True,\n ),\n DropdownInput(\n name=\"pipeline\",\n display_name=\"Pipeline\",\n info=\"Docling pipeline to use\",\n options=[\"standard\", \"vlm\"],\n value=\"standard\",\n advanced=True,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"ocr_engine\",\n display_name=\"OCR Engine\",\n info=\"OCR engine to use. Only available when pipeline is set to 'standard'.\",\n options=[\"None\", \"easyocr\"],\n value=\"easyocr\",\n show=False,\n advanced=True,\n ),\n StrInput(\n name=\"md_image_placeholder\",\n display_name=\"Image placeholder\",\n info=\"Specify the image placeholder for markdown exports.\",\n value=\"<!-- image -->\",\n advanced=True,\n show=False,\n ),\n StrInput(\n name=\"md_page_break_placeholder\",\n display_name=\"Page break placeholder\",\n info=\"Add this placeholder between pages in the markdown output.\",\n value=\"\",\n advanced=True,\n show=False,\n ),\n MessageTextInput(\n name=\"doc_key\",\n display_name=\"Doc Key\",\n info=\"The key to use for the DoclingDocument column.\",\n value=\"doc\",\n advanced=True,\n show=False,\n ),\n # Deprecated input retained for backward-compatibility.\n BoolInput(\n name=\"use_multithreading\",\n display_name=\"[Deprecated] Use Multithreading\",\n advanced=True,\n value=True,\n info=\"Set 'Processing Concurrency' greater than 1 to enable multithreading.\",\n ),\n IntInput(\n name=\"concurrency_multithreading\",\n display_name=\"Processing Concurrency\",\n advanced=True,\n info=\"When multiple files are being processed, the number of files to process concurrently.\",\n value=1,\n ),\n BoolInput(\n name=\"markdown\",\n display_name=\"Markdown Export\",\n info=\"Export processed documents to Markdown format. Only available when advanced mode is enabled.\",\n value=False,\n show=False,\n ),\n ]\n\n outputs = [\n Output(display_name=\"Raw Content\", name=\"message\", method=\"load_files_message\", tool_mode=True),\n ]\n\n # ------------------------------ Tool description with file names --------------\n\n def get_tool_description(self) -> str:\n \"\"\"Return a dynamic description that includes the names of uploaded files.\n\n This helps the Agent understand which files are available to read.\n \"\"\"\n base_description = \"Loads and returns the content from uploaded files.\"\n\n # Get the list of uploaded file paths\n file_paths = getattr(self, \"path\", None)\n if not file_paths:\n return base_description\n\n # Ensure it's a list\n if not isinstance(file_paths, list):\n file_paths = [file_paths]\n\n # Extract just the file names from the paths\n file_names = []\n for fp in file_paths:\n if fp:\n name = Path(fp).name\n file_names.append(name)\n\n if file_names:\n files_str = \", \".join(file_names)\n return f\"{base_description} Available files: {files_str}. 
Call this tool to read these files.\"\n\n return base_description\n\n @property\n def description(self) -> str:\n \"\"\"Dynamic description property that includes uploaded file names.\"\"\"\n return self.get_tool_description()\n\n async def _get_tools(self) -> list:\n \"\"\"Override to create a tool without parameters.\n\n The Read File component should use the files already uploaded via UI,\n not accept file paths from the Agent (which wouldn't know the internal paths).\n \"\"\"\n from langchain_core.tools import StructuredTool\n from pydantic import BaseModel\n\n # Empty schema - no parameters needed\n class EmptySchema(BaseModel):\n \"\"\"No parameters required - uses pre-uploaded files.\"\"\"\n\n async def read_files_tool() -> str:\n \"\"\"Read the content of uploaded files.\"\"\"\n try:\n result = self.load_files_message()\n if hasattr(result, \"get_text\"):\n return result.get_text()\n if hasattr(result, \"text\"):\n return result.text\n return str(result)\n except (FileNotFoundError, ValueError, OSError, RuntimeError) as e:\n return f\"Error reading files: {e}\"\n\n description = self.get_tool_description()\n\n tool = StructuredTool(\n name=\"load_files_message\",\n description=description,\n coroutine=read_files_tool,\n args_schema=EmptySchema,\n handle_tool_error=True,\n tags=[\"load_files_message\"],\n metadata={\n \"display_name\": \"Read File\",\n \"display_description\": description,\n },\n )\n\n return [tool]\n\n # ------------------------------ UI helpers --------------------------------------\n\n def _path_value(self, template: dict) -> list[str]:\n \"\"\"Return the list of currently selected file paths from the template.\"\"\"\n return template.get(\"path\", {}).get(\"file_path\", [])\n\n def update_build_config(\n self,\n build_config: dict[str, Any],\n field_value: Any,\n field_name: str | None = None,\n ) -> dict[str, Any]:\n \"\"\"Show/hide Advanced Parser and related fields based on selection context.\"\"\"\n if field_name == \"path\":\n paths = self._path_value(build_config)\n\n # If all files can be processed by docling, do so\n allow_advanced = all(not file_path.endswith((\".csv\", \".xlsx\", \".parquet\")) for file_path in paths)\n build_config[\"advanced_mode\"][\"show\"] = allow_advanced\n if not allow_advanced:\n build_config[\"advanced_mode\"][\"value\"] = False\n for f in (\"pipeline\", \"ocr_engine\", \"doc_key\", \"md_image_placeholder\", \"md_page_break_placeholder\"):\n if f in build_config:\n build_config[f][\"show\"] = False\n\n # Docling Processing\n elif field_name == \"advanced_mode\":\n for f in (\"pipeline\", \"ocr_engine\", \"doc_key\", \"md_image_placeholder\", \"md_page_break_placeholder\"):\n if f in build_config:\n build_config[f][\"show\"] = bool(field_value)\n if f == \"pipeline\":\n build_config[f][\"advanced\"] = not bool(field_value)\n\n elif field_name == \"pipeline\":\n if field_value == \"standard\":\n build_config[\"ocr_engine\"][\"show\"] = True\n build_config[\"ocr_engine\"][\"value\"] = \"easyocr\"\n else:\n build_config[\"ocr_engine\"][\"show\"] = False\n build_config[\"ocr_engine\"][\"value\"] = \"None\"\n\n return build_config\n\n def update_outputs(self, frontend_node: dict[str, Any], field_name: str, field_value: Any) -> dict[str, Any]: # noqa: ARG002\n \"\"\"Dynamically show outputs based on file count/type and advanced mode.\"\"\"\n if field_name not in [\"path\", \"advanced_mode\", \"pipeline\"]:\n return frontend_node\n\n template = frontend_node.get(\"template\", {})\n paths = self._path_value(template)\n if not paths:\n 
return frontend_node\n\n frontend_node[\"outputs\"] = []\n if len(paths) == 1:\n file_path = paths[0] if field_name == \"path\" else frontend_node[\"template\"][\"path\"][\"file_path\"][0]\n if file_path.endswith((\".csv\", \".xlsx\", \".parquet\")):\n frontend_node[\"outputs\"].append(\n Output(\n display_name=\"Structured Content\",\n name=\"dataframe\",\n method=\"load_files_structured\",\n tool_mode=True,\n ),\n )\n elif file_path.endswith(\".json\"):\n frontend_node[\"outputs\"].append(\n Output(display_name=\"Structured Content\", name=\"json\", method=\"load_files_json\", tool_mode=True),\n )\n\n advanced_mode = frontend_node.get(\"template\", {}).get(\"advanced_mode\", {}).get(\"value\", False)\n if advanced_mode:\n frontend_node[\"outputs\"].append(\n Output(\n display_name=\"Structured Output\",\n name=\"advanced_dataframe\",\n method=\"load_files_dataframe\",\n tool_mode=True,\n ),\n )\n frontend_node[\"outputs\"].append(\n Output(\n display_name=\"Markdown\", name=\"advanced_markdown\", method=\"load_files_markdown\", tool_mode=True\n ),\n )\n frontend_node[\"outputs\"].append(\n Output(display_name=\"File Path\", name=\"path\", method=\"load_files_path\", tool_mode=True),\n )\n else:\n frontend_node[\"outputs\"].append(\n Output(display_name=\"Raw Content\", name=\"message\", method=\"load_files_message\", tool_mode=True),\n )\n frontend_node[\"outputs\"].append(\n Output(display_name=\"File Path\", name=\"path\", method=\"load_files_path\", tool_mode=True),\n )\n else:\n # Multiple files => DataFrame output; advanced parser disabled\n frontend_node[\"outputs\"].append(\n Output(display_name=\"Files\", name=\"dataframe\", method=\"load_files\", tool_mode=True)\n )\n\n return frontend_node\n\n # ------------------------------ Core processing ----------------------------------\n\n def _validate_and_resolve_paths(self) -> list[BaseFileComponent.BaseFile]:\n \"\"\"Override to handle file_path_str input from tool mode.\n\n When called as a tool, the file_path_str parameter can be set.\n If not provided, it will fall back to using the path FileInput (uploaded file).\n Priority:\n 1. file_path_str (if provided by the tool call)\n 2. 
path (uploaded file from UI)\n \"\"\"\n # Check if file_path_str is provided (from tool mode)\n file_path_str = getattr(self, \"file_path_str\", None)\n if file_path_str:\n # Use the string path from tool mode\n from pathlib import Path\n\n from lfx.schema.data import Data\n\n resolved_path = Path(self.resolve_path(file_path_str))\n if not resolved_path.exists():\n msg = f\"File or directory not found: {file_path_str}\"\n self.log(msg)\n if not self.silent_errors:\n raise ValueError(msg)\n return []\n\n data_obj = Data(data={self.SERVER_FILE_PATH_FIELDNAME: str(resolved_path)})\n return [BaseFileComponent.BaseFile(data_obj, resolved_path, delete_after_processing=False)]\n\n # Otherwise use the default implementation (uses path FileInput)\n return super()._validate_and_resolve_paths()\n\n def _is_docling_compatible(self, file_path: str) -> bool:\n \"\"\"Lightweight extension gate for Docling-compatible types.\"\"\"\n docling_exts = (\n \".adoc\",\n \".asciidoc\",\n \".asc\",\n \".bmp\",\n \".csv\",\n \".dotx\",\n \".dotm\",\n \".docm\",\n \".docx\",\n \".htm\",\n \".html\",\n \".jpg\",\n \".jpeg\",\n \".json\",\n \".md\",\n \".pdf\",\n \".png\",\n \".potx\",\n \".ppsx\",\n \".pptm\",\n \".potm\",\n \".ppsm\",\n \".pptx\",\n \".tiff\",\n \".txt\",\n \".xls\",\n \".xlsx\",\n \".xhtml\",\n \".xml\",\n \".webp\",\n )\n return file_path.lower().endswith(docling_exts)\n\n async def _get_local_file_for_docling(self, file_path: str) -> tuple[str, bool]:\n \"\"\"Get a local file path for Docling processing, downloading from S3 if needed.\n\n Args:\n file_path: Either a local path or S3 key (format \"flow_id/filename\")\n\n Returns:\n tuple[str, bool]: (local_path, should_delete) where should_delete indicates\n if this is a temporary file that should be cleaned up\n \"\"\"\n settings = get_settings_service().settings\n if settings.storage_type == \"local\":\n return file_path, False\n\n # S3 storage - download to temp file\n parsed = parse_storage_path(file_path)\n if not parsed:\n msg = f\"Invalid S3 path format: {file_path}. Expected 'flow_id/filename'\"\n raise ValueError(msg)\n\n storage_service = get_storage_service()\n flow_id, filename = parsed\n\n # Get file content from S3\n content = await storage_service.get_file(flow_id, filename)\n\n suffix = Path(filename).suffix\n with NamedTemporaryFile(mode=\"wb\", suffix=suffix, delete=False) as tmp_file:\n tmp_file.write(content)\n temp_path = tmp_file.name\n\n return temp_path, True\n\n def _process_docling_in_subprocess(self, file_path: str) -> Data | None:\n \"\"\"Run Docling in a separate OS process and map the result to a Data object.\n\n We avoid multiprocessing pickling by launching `python -c \"<script>\"` and\n passing JSON config via stdin. 
The child prints a JSON result to stdout.\n\n For S3 storage, the file is downloaded to a temp file first.\n \"\"\"\n if not file_path:\n return None\n\n settings = get_settings_service().settings\n if settings.storage_type == \"s3\":\n local_path, should_delete = run_until_complete(self._get_local_file_for_docling(file_path))\n else:\n local_path = file_path\n should_delete = False\n\n try:\n return self._process_docling_subprocess_impl(local_path, file_path)\n finally:\n # Clean up temp file if we created one\n if should_delete:\n with contextlib.suppress(Exception):\n Path(local_path).unlink() # Ignore cleanup errors\n\n def _process_docling_subprocess_impl(self, local_file_path: str, original_file_path: str) -> Data | None:\n \"\"\"Implementation of Docling subprocess processing.\n\n Args:\n local_file_path: Path to local file to process\n original_file_path: Original file path to include in metadata\n Returns:\n Data object with processed content\n \"\"\"\n args: dict[str, Any] = {\n \"file_path\": local_file_path,\n \"markdown\": bool(self.markdown),\n \"image_mode\": str(self.IMAGE_MODE),\n \"md_image_placeholder\": str(self.md_image_placeholder),\n \"md_page_break_placeholder\": str(self.md_page_break_placeholder),\n \"pipeline\": str(self.pipeline),\n \"ocr_engine\": (\n self.ocr_engine if self.ocr_engine and self.ocr_engine != \"None\" and self.pipeline != \"vlm\" else None\n ),\n }\n\n self.log(f\"Starting Docling subprocess for file: {local_file_path}\")\n self.log(args)\n\n # Child script for isolating the docling processing\n child_script = textwrap.dedent(\n r\"\"\"\n import json, sys\n\n def try_imports():\n try:\n from docling.datamodel.base_models import ConversionStatus, InputFormat # type: ignore\n from docling.document_converter import DocumentConverter # type: ignore\n from docling_core.types.doc import ImageRefMode # type: ignore\n return ConversionStatus, InputFormat, DocumentConverter, ImageRefMode, \"latest\"\n except Exception as e:\n raise e\n\n def create_converter(strategy, input_format, DocumentConverter, pipeline, ocr_engine):\n # --- Standard PDF/IMAGE pipeline (your existing behavior), with optional OCR ---\n if pipeline == \"standard\":\n try:\n from docling.datamodel.pipeline_options import PdfPipelineOptions # type: ignore\n from docling.document_converter import PdfFormatOption # type: ignore\n\n pipe = PdfPipelineOptions()\n pipe.do_ocr = False\n\n if ocr_engine:\n try:\n from docling.models.factories import get_ocr_factory # type: ignore\n pipe.do_ocr = True\n fac = get_ocr_factory(allow_external_plugins=False)\n pipe.ocr_options = fac.create_options(kind=ocr_engine)\n except Exception:\n # If OCR setup fails, disable it\n pipe.do_ocr = False\n\n fmt = {}\n if hasattr(input_format, \"PDF\"):\n fmt[getattr(input_format, \"PDF\")] = PdfFormatOption(pipeline_options=pipe)\n if hasattr(input_format, \"IMAGE\"):\n fmt[getattr(input_format, \"IMAGE\")] = PdfFormatOption(pipeline_options=pipe)\n\n return DocumentConverter(format_options=fmt)\n except Exception:\n return DocumentConverter()\n\n # --- Vision-Language Model (VLM) pipeline ---\n if pipeline == \"vlm\":\n try:\n from docling.datamodel.pipeline_options import VlmPipelineOptions\n from docling.datamodel.vlm_model_specs import GRANITEDOCLING_MLX, GRANITEDOCLING_TRANSFORMERS\n from docling.document_converter import PdfFormatOption\n from docling.pipeline.vlm_pipeline import VlmPipeline\n\n vl_pipe = VlmPipelineOptions(\n vlm_options=GRANITEDOCLING_TRANSFORMERS,\n )\n\n if sys.platform == 
\"darwin\":\n try:\n import mlx_vlm\n vl_pipe.vlm_options = GRANITEDOCLING_MLX\n except ImportError as e:\n raise e\n\n # VLM paths generally don't need OCR; keep OCR off by default here.\n fmt = {}\n if hasattr(input_format, \"PDF\"):\n fmt[getattr(input_format, \"PDF\")] = PdfFormatOption(\n pipeline_cls=VlmPipeline,\n pipeline_options=vl_pipe\n )\n if hasattr(input_format, \"IMAGE\"):\n fmt[getattr(input_format, \"IMAGE\")] = PdfFormatOption(\n pipeline_cls=VlmPipeline,\n pipeline_options=vl_pipe\n )\n\n return DocumentConverter(format_options=fmt)\n except Exception as e:\n raise e\n\n # --- Fallback: default converter with no special options ---\n return DocumentConverter()\n\n def export_markdown(document, ImageRefMode, image_mode, img_ph, pg_ph):\n try:\n mode = getattr(ImageRefMode, image_mode.upper(), image_mode)\n return document.export_to_markdown(\n image_mode=mode,\n image_placeholder=img_ph,\n page_break_placeholder=pg_ph,\n )\n except Exception:\n try:\n return document.export_to_text()\n except Exception:\n return str(document)\n\n def to_rows(doc_dict):\n rows = []\n for t in doc_dict.get(\"texts\", []):\n prov = t.get(\"prov\") or []\n page_no = None\n if prov and isinstance(prov, list) and isinstance(prov[0], dict):\n page_no = prov[0].get(\"page_no\")\n rows.append({\n \"page_no\": page_no,\n \"label\": t.get(\"label\"),\n \"text\": t.get(\"text\"),\n \"level\": t.get(\"level\"),\n })\n return rows\n\n def main():\n cfg = json.loads(sys.stdin.read())\n file_path = cfg[\"file_path\"]\n markdown = cfg[\"markdown\"]\n image_mode = cfg[\"image_mode\"]\n img_ph = cfg[\"md_image_placeholder\"]\n pg_ph = cfg[\"md_page_break_placeholder\"]\n pipeline = cfg[\"pipeline\"]\n ocr_engine = cfg.get(\"ocr_engine\")\n meta = {\"file_path\": file_path}\n\n try:\n ConversionStatus, InputFormat, DocumentConverter, ImageRefMode, strategy = try_imports()\n converter = create_converter(strategy, InputFormat, DocumentConverter, pipeline, ocr_engine)\n try:\n res = converter.convert(file_path)\n except Exception as e:\n print(json.dumps({\"ok\": False, \"error\": f\"Docling conversion error: {e}\", \"meta\": meta}))\n return\n\n ok = False\n if hasattr(res, \"status\"):\n try:\n ok = (res.status == ConversionStatus.SUCCESS) or (str(res.status).lower() == \"success\")\n except Exception:\n ok = (str(res.status).lower() == \"success\")\n if not ok and hasattr(res, \"document\"):\n ok = getattr(res, \"document\", None) is not None\n if not ok:\n print(json.dumps({\"ok\": False, \"error\": \"Docling conversion failed\", \"meta\": meta}))\n return\n\n doc = getattr(res, \"document\", None)\n if doc is None:\n print(json.dumps({\"ok\": False, \"error\": \"Docling produced no document\", \"meta\": meta}))\n return\n\n if markdown:\n text = export_markdown(doc, ImageRefMode, image_mode, img_ph, pg_ph)\n print(json.dumps({\"ok\": True, \"mode\": \"markdown\", \"text\": text, \"meta\": meta}))\n return\n\n # structured\n try:\n doc_dict = doc.export_to_dict()\n except Exception as e:\n print(json.dumps({\"ok\": False, \"error\": f\"Docling export_to_dict failed: {e}\", \"meta\": meta}))\n return\n\n rows = to_rows(doc_dict)\n print(json.dumps({\"ok\": True, \"mode\": \"structured\", \"doc\": rows, \"meta\": meta}))\n except Exception as e:\n print(\n json.dumps({\n \"ok\": False,\n \"error\": f\"Docling processing error: {e}\",\n \"meta\": {\"file_path\": file_path},\n })\n )\n\n if __name__ == \"__main__\":\n main()\n \"\"\"\n )\n\n # Validate file_path to avoid command injection or unsafe input\n if 
not isinstance(args[\"file_path\"], str) or any(c in args[\"file_path\"] for c in [\";\", \"|\", \"&\", \"$\", \"`\"]):\n return Data(data={\"error\": \"Unsafe file path detected.\", \"file_path\": args[\"file_path\"]})\n\n proc = subprocess.run( # noqa: S603\n [sys.executable, \"-u\", \"-c\", child_script],\n input=json.dumps(args).encode(\"utf-8\"),\n capture_output=True,\n check=False,\n )\n\n if not proc.stdout:\n err_msg = proc.stderr.decode(\"utf-8\", errors=\"replace\") or \"no output from child process\"\n return Data(data={\"error\": f\"Docling subprocess error: {err_msg}\", \"file_path\": original_file_path})\n\n try:\n result = json.loads(proc.stdout.decode(\"utf-8\"))\n except Exception as e: # noqa: BLE001\n err_msg = proc.stderr.decode(\"utf-8\", errors=\"replace\")\n return Data(\n data={\n \"error\": f\"Invalid JSON from Docling subprocess: {e}. stderr={err_msg}\",\n \"file_path\": original_file_path,\n },\n )\n\n if not result.get(\"ok\"):\n return Data(data={\"error\": result.get(\"error\", \"Unknown Docling error\"), **result.get(\"meta\", {})})\n\n meta = result.get(\"meta\", {})\n if result.get(\"mode\") == \"markdown\":\n exported_content = str(result.get(\"text\", \"\"))\n return Data(\n text=exported_content,\n data={\"exported_content\": exported_content, \"export_format\": self.EXPORT_FORMAT, **meta},\n )\n\n rows = list(result.get(\"doc\", []))\n return Data(data={\"doc\": rows, \"export_format\": self.EXPORT_FORMAT, **meta})\n\n def process_files(\n self,\n file_list: list[BaseFileComponent.BaseFile],\n ) -> list[BaseFileComponent.BaseFile]:\n \"\"\"Process input files.\n\n - advanced_mode => Docling in a separate process.\n - Otherwise => standard parsing in current process (optionally threaded).\n \"\"\"\n if not file_list:\n msg = \"No files to process.\"\n raise ValueError(msg)\n\n # Validate that files requiring Docling are only processed when advanced mode is enabled\n if not self.advanced_mode:\n for file in file_list:\n extension = file.path.suffix[1:].lower()\n if extension in self.DOCLING_ONLY_EXTENSIONS:\n msg = (\n f\"File '{file.path.name}' has extension '.{extension}' which requires \"\n f\"Advanced Parser mode. Please enable 'Advanced Parser' to process this file.\"\n )\n self.log(msg)\n raise ValueError(msg)\n\n def process_file_standard(file_path: str, *, silent_errors: bool = False) -> Data | None:\n try:\n return parse_text_file_to_data(file_path, silent_errors=silent_errors)\n except FileNotFoundError as e:\n self.log(f\"File not found: {file_path}. 
Error: {e}\")\n if not silent_errors:\n raise\n return None\n except Exception as e:\n self.log(f\"Unexpected error processing {file_path}: {e}\")\n if not silent_errors:\n raise\n return None\n\n docling_compatible = all(self._is_docling_compatible(str(f.path)) for f in file_list)\n\n # Advanced path: Check if ALL files are compatible with Docling\n if self.advanced_mode and docling_compatible:\n final_return: list[BaseFileComponent.BaseFile] = []\n for file in file_list:\n file_path = str(file.path)\n advanced_data: Data | None = self._process_docling_in_subprocess(file_path)\n\n # --- UNNEST: expand each element in `doc` to its own Data row\n payload = getattr(advanced_data, \"data\", {}) or {}\n doc_rows = payload.get(\"doc\")\n if isinstance(doc_rows, list):\n rows: list[Data | None] = [\n Data(\n data={\n \"file_path\": file_path,\n **(item if isinstance(item, dict) else {\"value\": item}),\n },\n )\n for item in doc_rows\n ]\n final_return.extend(self.rollup_data(file_list, rows))\n else:\n # If not structured, keep as-is (e.g., markdown export or error dict)\n final_return.extend(self.rollup_data(file_list, [advanced_data]))\n return final_return\n\n # Standard multi-file (or single non-advanced) path\n concurrency = 1 if not self.use_multithreading else max(1, self.concurrency_multithreading)\n\n file_paths = [str(f.path) for f in file_list]\n self.log(f\"Starting parallel processing of {len(file_paths)} files with concurrency: {concurrency}.\")\n my_data = parallel_load_data(\n file_paths,\n silent_errors=self.silent_errors,\n load_function=process_file_standard,\n max_concurrency=concurrency,\n )\n return self.rollup_data(file_list, my_data)\n\n # ------------------------------ Output helpers -----------------------------------\n\n def load_files_helper(self) -> DataFrame:\n result = self.load_files()\n\n # Error condition - raise error if no text and an error is present\n if not hasattr(result, \"text\"):\n if hasattr(result, \"error\"):\n raise ValueError(result.error[0])\n msg = \"Could not extract content from the provided file(s).\"\n raise ValueError(msg)\n\n return result\n\n def load_files_dataframe(self) -> DataFrame:\n \"\"\"Load files using advanced Docling processing and export to DataFrame format.\"\"\"\n self.markdown = False\n return self.load_files_helper()\n\n def load_files_markdown(self) -> Message:\n \"\"\"Load files using advanced Docling processing and export to Markdown format.\"\"\"\n self.markdown = True\n result = self.load_files_helper()\n return Message(text=str(result.text[0]))\n" | ||
| }, |
Subprocess command injection prevention looks solid, but verify error path handling.
The FileComponent code validates args["file_path"] against suspicious characters before passing it to the subprocess. However:
- The validation only checks for ";", "|", "&", "$", and the backtick, but other shell metacharacters may exist
- Error messages from stderr could leak file paths or internal details if not handled carefully
- The JSON parsing error path could expose subprocess stderr verbatim
Consider additional hardening of error reporting, or use shlex.quote() to guarantee shell safety.
Consider this improvement in subprocess error handling:
# In _process_docling_subprocess_impl(), replace the current error handling with:
if not proc.stdout:
    # Sanitize stderr output before logging it server-side
    err_msg = proc.stderr.decode("utf-8", errors="replace") if proc.stderr else "no output"
    # Limit stderr length to prevent log injection
    err_msg = err_msg[:200]
    self.log(f"Docling subprocess error: {err_msg}")
    # Return only a generic message to the caller
    return Data(data={
        "error": "Docling subprocess error (see logs)",
        "file_path": original_file_path,
    })
In src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json
around lines 1286-1287, the Docling subprocess error handling returns raw
stderr/JSON parse errors (and only blocks a small set of metacharacters), which
can leak file paths or internal details; modify _process_docling_subprocess_impl
to (1) strengthen validation by resolving and restricting file_path to a local
absolute path or known S3 pattern (avoid relying on a small blacklist), (2)
never include raw stderr or untrusted subprocess output in the returned Data
object — replace it with a generic error message such as "Docling subprocess
error (see server logs)" and the original_file_path only, and (3) log the full
stderr server-side after sanitizing/truncating it (e.g., limit to 200-500 chars
and remove control/newline injection) for debugging so callers do not receive
untrusted details.
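For illustration, a rough sketch of the stricter validation and sanitized logging that prompt describes; the helper names, the allowed-root check, and the truncation limit are assumptions, not the PR's implementation:

from pathlib import Path

MAX_STDERR_CHARS = 300  # assumed limit; pick whatever fits the project's logging policy

def validate_local_path(file_path: str, allowed_root: Path) -> Path:
    """Resolve the path and require it to live under an expected root directory."""
    resolved = Path(file_path).resolve()
    if not resolved.is_file() or allowed_root.resolve() not in resolved.parents:
        msg = "File path is outside the allowed storage root"
        raise ValueError(msg)
    return resolved

def sanitize_stderr(raw: bytes | None) -> str:
    """Drop control characters and truncate before logging server-side."""
    text = (raw or b"").decode("utf-8", errors="replace")
    text = "".join(ch for ch in text if ch.isprintable())
    return text[:MAX_STDERR_CHARS]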
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for run_until_complete usage to check for event loop conflicts
rg -n "run_until_complete" --type py src/backend/base/langflow/ -A 3 -B 3Repository: langflow-ai/langflow
Length of output: 3892
🏁 Script executed:
#!/bin/bash
# Search for run_until_complete implementation in lfx package
find . -name "async_helpers.py" -type f | head -5
Repository: langflow-ai/langflow
Length of output: 104
🏁 Script executed:
#!/bin/bash
# Read the async_helpers.py file to understand run_until_complete implementation
cat ./src/lfx/src/lfx/utils/async_helpers.py
Repository: langflow-ai/langflow
Length of output: 1416
🏁 Script executed:
#!/bin/bash
# Let's examine the structure of the JSON file and extract the relevant method
# First, check if the file exists and its size
wc -l "src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json"
Repository: langflow-ai/langflow
Length of output: 143
🏁 Script executed:
#!/bin/bash
# Read the area around lines 1286-1287 and the surrounding code
sed -n '1250,1400p' "src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json" | head -150
Repository: langflow-ai/langflow
Length of output: 41998
🏁 Script executed:
#!/bin/bash
# Let's search for any other temp file handling in the codebase to understand the pattern
rg -n "NamedTemporaryFile" --type py -B 2 -A 5 | head -50Repository: langflow-ai/langflow
Length of output: 3804
Temp file cleanup may leak if S3 retrieval or file write fails mid-operation.
In _process_docling_in_subprocess(), the cleanup logic in the finally block only executes once local_path has been assigned. If an exception occurs in _get_local_file_for_docling() after the temp file is created with NamedTemporaryFile but before the function returns (for example, while writing the downloaded content to it), the temp file is created but never cleaned up, because the exception propagates before execution reaches the try/finally block.
The run_until_complete() implementation itself is sound; it properly detects running event loops and handles them via threading, so the async context concern mentioned in the PR title doesn't apply here.
Suggested fix: Wrap temp file creation in _get_local_file_for_docling() with its own try-finally to ensure cleanup on any exception path, or catch and re-raise exceptions from _get_local_file_for_docling() with cleanup logic in _process_docling_in_subprocess().
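A minimal sketch of the first option (cleanup inside the helper), assuming the storage_service.get_file call shown in the diff; the helper name and structure are illustrative, not the PR's code:

import contextlib
from pathlib import Path
from tempfile import NamedTemporaryFile

async def download_to_temp(storage_service, flow_id: str, filename: str) -> str:
    """Download a file from storage into a temp file, cleaning up on failure."""
    # Fetch the content first, so a download error never leaves a temp file behind.
    content = await storage_service.get_file(flow_id, filename)
    tmp = NamedTemporaryFile(mode="wb", suffix=Path(filename).suffix, delete=False)
    try:
        tmp.write(content)
        tmp.close()
    except Exception:
        tmp.close()
        with contextlib.suppress(OSError):
            Path(tmp.name).unlink()  # remove the partial temp file before re-raising
        raise
    return tmp.name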
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json
around lines 1286-1287, the temp file created in _get_local_file_for_docling()
can leak if an exception occurs after NamedTemporaryFile is created but before
the function returns; fix by ensuring the temp file is removed on any exception:
either (a) delay creating the NamedTemporaryFile until after content is
successfully retrieved (write content to it only after download succeeds), or
(b) wrap the NamedTemporaryFile creation/write in a try/finally that unlinks the
temp file on exception before re-raising; alternatively, handle exceptions in
_process_docling_in_subprocess() by catching errors from
_get_local_file_for_docling(), and if a temp path was created (should_delete
True or local_path set), attempt to unlink it before propagating the exception.
| if output_json.get("success"): | ||
| result_text = str(output_json.get("result", "")) | ||
| # The agent should compute 15 * 7 = 105 | ||
| assert "105" in result_text, f"Expected 105 in result: {result_text}" | ||
|
|
Test silently passes if execution is not successful.
The conditional if output_json.get("success"): means the test passes without any assertion if success is False or missing. This could mask failures where the agent doesn't execute properly.
Consider failing when success is not True:
- if output_json.get("success"):
- result_text = str(output_json.get("result", ""))
- # The agent should compute 15 * 7 = 105
- assert "105" in result_text, f"Expected 105 in result: {result_text}"
+ assert output_json.get("success") is True, f"Execution failed: {output_json}"
+ result_text = str(output_json.get("result", ""))
+ # The agent should compute 15 * 7 = 105
+ assert "105" in result_text, f"Expected 105 in result: {result_text}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if output_json.get("success"): | |
| result_text = str(output_json.get("result", "")) | |
| # The agent should compute 15 * 7 = 105 | |
| assert "105" in result_text, f"Expected 105 in result: {result_text}" | |
| assert output_json.get("success") is True, f"Execution failed: {output_json}" | |
| result_text = str(output_json.get("result", "")) | |
| # The agent should compute 15 * 7 = 105 | |
| assert "105" in result_text, f"Expected 105 in result: {result_text}" |
🤖 Prompt for AI Agents
In src/lfx/tests/unit/cli/test_simple_agent_integration.py around lines 247 to
251, the test currently wraps the result assertion in an if
output_json.get("success"): guard which allows the test to silently pass when
success is False or missing; replace that guard with an explicit assertion that
output_json.get("success") is True (with a helpful message including
output_json) and then assert the result contains "105" as before so the test
fails loudly when execution did not succeed.
Pull request overview
This pull request fixes a critical asyncio context handling issue in the serve command and introduces comprehensive integration tests for the Simple Agent workflow. The main fix addresses the error "asyncio.run() cannot be called from a running event loop" by using uvicorn.Server with await server.serve() instead of uvicorn.run(), which internally calls asyncio.run() and conflicts with the existing async context managed by the syncify decorator.
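For reference, a minimal sketch of the loop-safe startup pattern described above; this is not the PR's exact code, and the host/port values are placeholders:

import uvicorn

async def start_server(app, host: str = "127.0.0.1", port: int = 7860) -> None:
    config = uvicorn.Config(app, host=host, port=port, log_level="info")
    server = uvicorn.Server(config)
    # uvicorn.run() calls asyncio.run() internally and raises
    # "asyncio.run() cannot be called from a running event loop" when a loop
    # is already active; awaiting server.serve() reuses the current loop instead.
    await server.serve()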
Key Changes
- Fixed async context handling in the serve command to properly start uvicorn server without asyncio conflicts
- Added integration test suite for Simple Agent workflow including load, execution, and serve tests
- Added integration dependency group with langchain-openai, langchain-community, beautifulsoup4, and lxml
Reviewed changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/lfx/src/lfx/cli/commands.py | Updated serve_command to use uvicorn.Server with await server.serve() instead of uvicorn.run() to fix asyncio event loop conflicts |
| src/lfx/tests/unit/cli/test_simple_agent_integration.py | Added comprehensive integration tests for Simple Agent flow including loading, execution with real API, and regression test for serve command asyncio error |
| src/lfx/pyproject.toml | Added integration dependency group with langchain-openai, langchain-community, beautifulsoup4, and lxml for running integration tests |
| uv.lock | Updated lock file to reflect new integration dependencies and increment revision number |
| @@ -0,0 +1,412 @@ | |||
| """Integration tests for lfx CLI with Simple Agent flow. | |||
Copilot AI · Nov 28, 2025
[nitpick] The test file is located in the unit/ directory but contains integration tests. According to the codebase convention, integration tests that require external dependencies and real API calls should be placed in a separate directory structure or clearly marked. Consider moving this to src/lfx/tests/integration/cli/test_simple_agent_integration.py or renaming to clearly indicate it's not a unit test (e.g., test_simple_agent_e2e.py).
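One lightweight alternative to moving the file is a custom pytest marker; the marker name and configuration below are assumptions, not an existing lfx convention:

# At module level in test_simple_agent_integration.py (sketch):
import pytest

pytestmark = pytest.mark.integration  # assumed marker; register it in pyproject.toml

# pyproject.toml (sketch):
# [tool.pytest.ini_options]
# markers = ["integration: tests that hit real external services"]
#
# The default unit run could then exclude them:
#   pytest -m "not integration" src/lfx/tests/unit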
| except json.JSONDecodeError: | ||
| lines = output.split("\n") | ||
| for line in reversed(lines): | ||
| try: | ||
| output_json = json.loads(line) | ||
| break | ||
| except json.JSONDecodeError: | ||
| continue | ||
| else: | ||
| pytest.fail(f"No valid JSON:\nstdout: {result.stdout}\nstderr: {result.stderr}") |
Copilot AI · Nov 28, 2025
In test_simple_agent_executes_successfully, the error handling logic for parsing JSON from output is duplicated in multiple test methods (lines 193-202, 237-245). This code duplication reduces maintainability. Consider extracting this JSON parsing logic into a helper function that can be reused across tests.
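A possible shape for such a helper, mirroring the parsing logic quoted above; the function name is illustrative:

import json
import pytest

def parse_last_json_line(stdout: str, stderr: str = "") -> dict:
    """Return stdout parsed as JSON, falling back to the last JSON-parsable line."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError:
        for line in reversed(stdout.splitlines()):
            try:
                return json.loads(line)
            except json.JSONDecodeError:
                continue
    pytest.fail(f"No valid JSON:\nstdout: {stdout}\nstderr: {stderr}")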
Add comprehensive integration tests for lfx CLI commands with the Simple Agent starter project:
- TestSimpleAgentFlowLoading: Tests for flow loading via CLI and directly
- TestSimpleAgentExecution: Tests for successful execution with real API key
- TestSimpleAgentServe: Tests for serve command including regression test for asyncio error fix
- TestAllStarterProjectsLoad: Tests all starter projects load without errors
The regression test specifically checks for the asyncio.run() error that occurred when lfx serve was called from a running event loop.
87c74b8 to 384651b Compare
Update the serve_command to utilize uvicorn.Server for better async context management. Introduce comprehensive integration tests for the Simple Agent flow and the serve command, including regression tests for previously encountered asyncio errors. Update integration dependencies and documentation accordingly.
Summary by CodeRabbit
New Features
Bug Fixes
Tests