fix: Resolve MCP complex schema handling and add missing dependency #10772
base: main
Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI. You can disable this status message in the settings.

Walkthrough

The changes add MCP (Model Context Protocol) support with enhanced argument handling and JSON schema parsing. A new MCP dependency is introduced, utility functions handle default-filling and post-processing of tool arguments, and schema parsing is improved to handle complex structures, anyOf types, list-type fields, and provide UI-friendly fallbacks for intricate schemas. Supporting unit tests validate the new functionality across multiple scenarios.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Areas requiring extra attention:
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (9)
src/lfx/tests/unit/base/mcp/test_repro_list_type.py (1)
1-16: Consider adding stronger assertions to validate the generated model.

The tests correctly verify that `create_input_schema_from_json_schema` doesn't raise an exception for list-type schemas. However, they only assert truthiness. Consider validating the actual field annotations to ensure the conversion produces expected types (e.g., `Union[str, int]` or `Optional[str]`).

```diff
 def test_type_as_list():
     # Schema with type as a list (JSON Schema feature)
     schema = {"type": "object", "properties": {"field": {"type": ["string", "integer"]}}}
     # This should now succeed
     model = create_input_schema_from_json_schema(schema)
     assert model
+    # Verify the field type is a Union
+    field_info = model.model_fields["field"]
+    assert field_info.annotation is not None


 def test_type_as_list_with_null():
     # Schema with type as a list including null
     schema = {"type": "object", "properties": {"field": {"type": ["string", "null"]}}}
     model = create_input_schema_from_json_schema(schema)
     assert model
+    # Verify the field allows None (nullable)
+    field_info = model.model_fields["field"]
+    assert field_info.annotation is not None
```

src/lfx/tests/unit/base/mcp/test_util_custom.py (1)
72-75: Type annotation check could produce false positives.

The check `str in getattr(annotation, "__args__", [annotation])` may not work correctly for all Union representations in Python's typing system. For `Optional[str]` (which is `Union[str, None]`), `__args__` would be `(str, NoneType)`, which works. However, if the implementation returns just `str` directly, the fallback `[annotation]` comparison would fail since `str in [str]` uses identity/equality on the type itself.

Consider using `typing.get_args()` for more robust type introspection:

```diff
+from typing import get_args, get_origin
+
 def test_create_input_schema_complex():
     # ...
     annotation = field_info.annotation
-    # It might be Union[str, NoneType]
-    assert str in getattr(annotation, "__args__", [annotation])
+    # Handle both direct str and Optional[str]/Union[str, None]
+    args = get_args(annotation) or (annotation,)
+    assert str in args or annotation is str
```

src/lfx/src/lfx/base/mcp/util.py (4)
306-327: Type detection using string matching is fragile.

The current approach uses string comparison (e.g., `"list" in field_type_str`) which can produce false positives. For example, a field named `playlist` or a type like `SomeListWrapper` would incorrectly match. Additionally, `Union` and `Optional` types aren't handled: a field typed as `Optional[str]` would fall through to `None` instead of `""`.

Consider using `typing.get_origin()` and `typing.get_args()` for more robust type introspection:

```diff
 def _fill_defaults(arg_schema: type[BaseModel], provided_args: dict) -> None:
     """Fill default values for missing fields in the provided arguments."""
+    from typing import get_origin, get_args
+
     for field, field_info in arg_schema.model_fields.items():
         if field not in provided_args:
             field_type = field_info.annotation
-            field_type_str = str(field_type).lower()
-
-            if "list" in field_type_str or str(field_type) == "list":
+            origin = get_origin(field_type)
+
+            # Handle Optional/Union by extracting the first non-None type
+            if origin is Union:
+                args = [a for a in get_args(field_type) if a is not type(None)]
+                field_type = args[0] if args else field_type
+                origin = get_origin(field_type)
+
+            if origin is list or field_type is list:
                 provided_args[field] = []
-            elif "dict" in field_type_str or str(field_type) == "dict" or "object" in field_type_str:
+            elif origin is dict or field_type is dict:
                 provided_args[field] = {}
-            elif "str" in field_type_str or str(field_type) == "str":
+            elif field_type is str:
                 provided_args[field] = ""
-            elif "int" in field_type_str or str(field_type) == "int":
+            elif field_type is int:
                 provided_args[field] = 0
-            elif "float" in field_type_str or str(field_type) == "float":
+            elif field_type is float:
                 provided_args[field] = 0.0
-            elif "bool" in field_type_str or str(field_type) == "bool":
+            elif field_type is bool:
                 provided_args[field] = False
             else:
                 provided_args[field] = None
```
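As an illustration of the suggested approach, here is a standalone sketch of origin-based default selection. The helper name `default_for` and its exact rules are hypothetical; the PR's `_fill_defaults` works on Pydantic field annotations rather than bare types:

```python
from typing import Optional, Union, get_args, get_origin


def default_for(field_type):
    """Pick a neutral default for an annotation using origin-based introspection."""
    origin = get_origin(field_type)
    if origin is Union:
        # Unwrap Optional/Union by taking the first non-None member
        non_none = [a for a in get_args(field_type) if a is not type(None)]
        if non_none:
            return default_for(non_none[0])
    if origin is list or field_type is list:
        return []
    if origin is dict or field_type is dict:
        return {}
    if field_type is str:
        return ""
    if field_type is bool:
        return False
    if field_type is int:
        return 0
    if field_type is float:
        return 0.0
    return None


assert default_for(list[str]) == []      # generic alias resolves via get_origin
assert default_for(Optional[str]) == ""  # Optional unwraps to str, not None
assert default_for(dict) == {}
assert default_for(bool) is False
```

Because these are identity checks on the annotation itself, a field named `playlist` or a wrapper type whose repr happens to contain "list" can no longer be misclassified.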
329-334: Move imports to module level for efficiency.

The `contextlib`, `json`, and `typing` imports inside `_post_process_arguments` are already available at module level (lines 2, 4, 11). Re-importing them on each function call adds unnecessary overhead.

```diff
 def _post_process_arguments(arg_schema: type[BaseModel], arguments: dict) -> None:
     """Post-process arguments to handle JSON parsing and type normalization."""
-    import contextlib
-    import json
-    from typing import get_args, get_origin
+    from typing import get_args, get_origin  # contextlib and json already at module level
```

Note: `contextlib` (line 2) and `json` (line 4) are already imported at module level. Only `get_args` and `get_origin` need to be imported here, or they could be moved to line 11 alongside the existing `Union` import.
363-382: Document the specific transformation logic for list of dicts.

This transformation wraps each primitive value in `{"value": v}` and converts numbers to strings. This appears to be a specific API format requirement, but there's no documentation explaining why this transformation is needed or which downstream system expects this format.

Add a docstring or comment explaining the purpose:

```diff
+            # Transform records to match the expected API format for the downstream tool.
+            # Each primitive value is wrapped as {"value": <string_value>}
+            # to support consistent handling by the receiving system.
             if (
                 isinstance(parsed_value, list)
                 and len(parsed_value) > 0
```
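To make the transformation concrete, here is an isolated sketch of the wrapping this comment describes. `wrap_record_values` is a hypothetical name, and the exact rules (which types get wrapped) are assumptions read off the diff, not the actual `util.py` code:

```python
def wrap_record_values(records: list[dict]) -> list[dict]:
    """Wrap each primitive value as {"value": <string>}, leaving nested containers alone."""
    wrapped = []
    for record in records:
        wrapped.append(
            {
                key: {"value": str(value)} if isinstance(value, (str, int, float)) else value
                for key, value in record.items()
            }
        )
    return wrapped


# Numbers become strings, and every primitive is wrapped in a {"value": ...} object
assert wrap_record_values([{"id": 1, "val": "a"}]) == [
    {"id": {"value": "1"}, "val": {"value": "a"}}
]
```

A short comment like the one suggested above, placed next to the real transformation, would answer the review's question about which downstream consumer expects this shape.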
396-397: Bare except clause catches and suppresses all exceptions.

While the `# noqa: S110, BLE001` comments acknowledge this, silently suppressing all exceptions from `ast.literal_eval` could hide legitimate parsing errors or unexpected issues.

Consider catching more specific exceptions:

```diff
-            except Exception:  # noqa: S110, BLE001
-                pass
+            except (ValueError, SyntaxError, TypeError):
+                pass  # ast.literal_eval failed; will log a warning and potentially raise below
```

src/lfx/src/lfx/schema/json_schema.py (3)
10-12: Consider making complexity thresholds configurable.

The magic numbers `MAX_ANYOF_ITEMS = 2`, `MAX_OBJECT_PROPERTIES_IN_ANYOF = 3`, and `MAX_OBJECT_PROPERTIES = 5` determine when schemas fall back to string input. These thresholds may need adjustment based on different UI capabilities or use cases.

Consider exposing these as configurable settings or, at minimum, documenting the rationale:

```diff
+# Complexity thresholds for UI rendering. Schemas exceeding these limits
+# fall back to raw JSON string input to prevent UI rendering issues.
 MAX_ANYOF_ITEMS = 2
 MAX_OBJECT_PROPERTIES_IN_ANYOF = 3
 MAX_OBJECT_PROPERTIES = 5
```
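One lightweight way to make the thresholds configurable, as this comment suggests, is to read them from the environment with the current values as defaults. The variable names below are hypothetical, not existing project settings:

```python
import os

# Complexity thresholds for UI rendering; schemas exceeding these limits
# fall back to raw JSON string input. Defaults match the current constants.
MAX_ANYOF_ITEMS = int(os.environ.get("LFX_MAX_ANYOF_ITEMS", "2"))
MAX_OBJECT_PROPERTIES_IN_ANYOF = int(os.environ.get("LFX_MAX_OBJECT_PROPERTIES_IN_ANYOF", "3"))
MAX_OBJECT_PROPERTIES = int(os.environ.get("LFX_MAX_OBJECT_PROPERTIES", "5"))
```

This keeps existing behavior unchanged while letting a deployment with a more capable UI raise the limits without a code change.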
145-146: Simplify None type check.

The check `parsed is not type(None)` compares against the `NoneType` class, but since `parse_type` can return `None` for null schemas, this check might not behave as intended. A cleaner approach would use the `types` module.

```diff
+from types import NoneType
+
 # In the anyOf handling:
-    if parsed is not None and parsed is not type(None):
+    if parsed is not None and parsed is not NoneType:
```

Alternatively, to filter out both `None` values and `NoneType` as a type:

```python
if parsed not in (None, type(None)):
```
177-209: Duplicated complexity detection logic.

The complexity checks for array items (lines 182-200) largely duplicate the `is_complex_schema` function defined at lines 69-94. Consider refactoring to reuse the helper function.

```diff
         if item_schema:
-            # Check for complex structures that UI cannot handle properly
-            is_complex = False
-
-            # Check if items schema has additionalProperties with anyOf (very complex)
-            if "additionalProperties" in item_schema:
-                additional_props = item_schema["additionalProperties"]
-                if isinstance(additional_props, dict) and "anyOf" in additional_props:
-                    anyof_items = additional_props.get("anyOf", [])
-                    # If anyOf has more than 2 items or contains complex nested structures
-                    if len(anyof_items) > MAX_ANYOF_ITEMS:
-                        is_complex = True
-                    else:
-                        # Check if anyOf items are complex objects themselves
-                        for item in anyof_items:
-                            if isinstance(item, dict) and item.get("type") == "object":
-                                is_complex = True
-                                break
-
-            if not is_complex and item_schema.get("type") == "object" and "properties" in item_schema:
-                # Complex object with many properties
-                properties_count = len(item_schema.get("properties", {}))
-                if properties_count > MAX_OBJECT_PROPERTIES:  # Threshold for complexity
-                    is_complex = True
+            # Check for complex structures that UI cannot handle properly
+            is_complex = is_complex_schema(item_schema)

             # For complex array items, fall back to list[dict[str, Any]] instead of str
```
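For reference, here is a self-contained sketch of what such a reusable helper could look like, reconstructed from the duplicated checks in the diff (the real `is_complex_schema` at lines 69-94 may differ):

```python
MAX_ANYOF_ITEMS = 2
MAX_OBJECT_PROPERTIES = 5


def is_complex_schema(schema: dict) -> bool:
    """Return True when a schema is too complex for the UI to render natively."""
    additional = schema.get("additionalProperties")
    if isinstance(additional, dict) and "anyOf" in additional:
        anyof_items = additional.get("anyOf", [])
        # Too many anyOf branches, or any branch that is itself an object
        if len(anyof_items) > MAX_ANYOF_ITEMS:
            return True
        if any(isinstance(item, dict) and item.get("type") == "object" for item in anyof_items):
            return True
    # Plain objects with many properties are also treated as complex
    if schema.get("type") == "object" and len(schema.get("properties", {})) > MAX_OBJECT_PROPERTIES:
        return True
    return False


assert is_complex_schema({"additionalProperties": {"anyOf": [{}, {}, {}]}})
assert not is_complex_schema({"type": "object", "properties": {"a": {"type": "string"}}})
```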
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (6)
- src/lfx/pyproject.toml (1 hunks)
- src/lfx/src/lfx/base/mcp/util.py (4 hunks)
- src/lfx/src/lfx/schema/json_schema.py (5 hunks)
- src/lfx/tests/unit/base/mcp/test_repro_error.py (1 hunks)
- src/lfx/tests/unit/base/mcp/test_repro_list_type.py (1 hunks)
- src/lfx/tests/unit/base/mcp/test_util_custom.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/{test_*.py,*.test.ts,*.test.tsx}
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
Check that test files follow the project's naming conventions (test_*.py for backend, *.test.ts for frontend)
Files:
- src/lfx/tests/unit/base/mcp/test_repro_list_type.py
- src/lfx/tests/unit/base/mcp/test_util_custom.py
- src/lfx/tests/unit/base/mcp/test_repro_error.py
**/test_*.py
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
**/test_*.py: Backend tests should follow pytest structure with proper test_*.py naming
For async functions, ensure proper async testing patterns are used with pytest for backend
Files:
- src/lfx/tests/unit/base/mcp/test_repro_list_type.py
- src/lfx/tests/unit/base/mcp/test_util_custom.py
- src/lfx/tests/unit/base/mcp/test_repro_error.py
🧠 Learnings (1)
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Use predefined JSON flows and utility functions from `tests.unit.build_utils` (create_flow, build_flow, get_build_events, consume_and_assert_stream) for flow execution testing
Applied to files:
src/lfx/tests/unit/base/mcp/test_util_custom.py
🧬 Code graph analysis (1)
src/lfx/tests/unit/base/mcp/test_repro_error.py (1)
src/lfx/src/lfx/schema/json_schema.py (1)
`create_input_schema_from_json_schema` (45-324)
🔇 Additional comments (6)
src/lfx/tests/unit/base/mcp/test_repro_error.py (1)
1-56: Good coverage of anyOf edge cases.

The tests cover important scenarios for JSON Schema anyOf handling: simple arrays, complex object/array combinations, nested anyOf structures, and schemas with list defaults. These serve as effective regression tests for the schema parsing improvements.

Similar to `test_repro_list_type.py`, consider adding assertions that validate the generated field types match expectations (e.g., verifying Union types are created correctly).

src/lfx/src/lfx/schema/json_schema.py (2)
170-173: Good fix for the "unhashable type: 'list'" error.

This correctly handles JSON Schema where `type` is specified as a list (e.g., `["string", "null"]`) by converting it to an equivalent `anyOf` structure. This directly addresses the TypeError mentioned in the PR objectives.
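The list-to-anyOf rewrite being approved here can be illustrated with a minimal standalone transform. `normalize_type_list` is a hypothetical name; in the PR this happens inside `parse_type`:

```python
def normalize_type_list(schema: dict) -> dict:
    """Rewrite {"type": [t1, t2, ...]} into the equivalent anyOf structure."""
    if isinstance(schema.get("type"), list):
        schema = dict(schema)  # shallow copy so the caller's schema is untouched
        schema["anyOf"] = [{"type": t} for t in schema.pop("type")]
    return schema


assert normalize_type_list({"type": ["string", "null"]}) == {
    "anyOf": [{"type": "string"}, {"type": "null"}]
}
# Scalar types pass through unchanged
assert normalize_type_list({"type": "string"}) == {"type": "string"}
```

After this rewrite the downstream code only ever sees a hashable scalar `type` or an `anyOf` list, which is what avoids the original `TypeError`.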
270-294: I'm unable to complete the verification due to repository access issues. The web search confirms that JSON Schema's `additionalProperties` is meant to handle properties NOT in the explicit `properties` list, but I cannot verify the actual Python implementation concern without access to the full code context.

To properly verify the reviewer's claim about "inconsistent handling" and potential "data loss," I need:

Required information:
- Full function implementation - lines covering the entire function (from the beginning through the end) that processes this schema
- The code path when both `properties` AND `additionalProperties` exist - what happens after the `not props` check on line 281? Is there a separate branch?
- How `props` is populated - where does the `properties` dict come from?
- The caller/return path - how does this function handle cases where explicit properties exist alongside `additionalProperties`?

Could you provide:
- The function name and signature
- Lines 200-300+ of the file to show the complete logic flow
- Or a link to the file in the repository
Without this context, I cannot confirm whether the reviewer's concern is valid or if there's existing logic handling the combined case that makes the review comment incorrect.
src/lfx/src/lfx/base/mcp/util.py (1)
413-417: I'll help you verify this review comment about the numeric-to-string conversion issue. Let me start by examining the code context and understanding the type handling in this utility file.

```shell
# First, let's look at the file structure and get line numbers
wc -l src/lfx/src/lfx/base/mcp/util.py

# Examine the context around lines 413-417 (force conversion) and lines 344-349 (Union handling)
sed -n '340,420p' src/lfx/src/lfx/base/mcp/util.py

# Search for comments or documentation about numeric-to-string conversion
rg -n "Force.*string|force.*converting|numeric.*argument|MCP.*argument" --type=py src/lfx/src/lfx/base/mcp/util.py -B 3 -A 3

# Look for where arguments are used after this conversion
rg -n "arguments\[" --type=py src/lfx/src/lfx/base/mcp/util.py -A 2 -B 2 | head -100

# Check the function signature and docstring to understand the purpose
sed -n '300,430p' src/lfx/src/lfx/base/mcp/util.py | cat -n
```

src/lfx/pyproject.toml (1)
44-45: LGTM! Dependency verified — mcp 1.17.0 exists on PyPI.

The `mcp>=1.17.0,<2.0.0` constraint is valid and follows the project's versioning pattern. Version 1.17.0 was released October 10, 2025, and the constraint appropriately pins to the 1.x series while allowing patch and minor updates.

src/lfx/tests/unit/base/mcp/test_util_custom.py (1)
26-43: Based on the web search results, I found general patterns for `_post_process_arguments` that focus on JSON parsing and unescaping strings. However, the test you're verifying expects a different and more specific transformation: wrapping primitive values in `{"value": ...}` format and converting numbers to strings.

The web search results show the typical pattern is simply parsing JSON strings with `json.loads()`, but the test assertions show:

- `arguments["data"][0]["id"]` → `{"value": "1"}` (wrapped, converted to string)
- `arguments["data"][0]["val"]` → `{"value": "a"}` (wrapped)
- `arguments["config"]["timeout"]` → `10` (not wrapped, stays as int)

This inconsistency between the general documentation pattern and the test expectations, combined with the infrastructure limitation preventing repository access, means I cannot definitively verify if the test assertions match the actual implementation.
The original review comment's concern remains valid: the value-wrapping transformation is quite specific and needs verification against the actual implementation.
Without access to the codebase, I recommend:
- Manually verify that the `_post_process_arguments` implementation actually produces the `{"value": "1"}` format
- Confirm the mixed behavior (wrapping array items but not top-level config values) is intentional
- Add explanatory comments in the test or implementation if this format is required by downstream consumers
Description
This PR addresses several issues related to MCP (Model Context Protocol) integration, specifically focusing on schema parsing robustness and dependency management.
Key Changes
Fix `unhashable type: 'list'` Error:
- Fixed a `TypeError` in `src/lfx/src/lfx/schema/json_schema.py` that occurred when a JSON schema `type` field was defined as a list (e.g., `["string", "null"]`).
- Updated `parse_type` to correctly handle list types by converting them into an `anyOf` structure.

Improve Complex Schema Handling:
- Enhanced the `is_complex_schema` logic to detect schemas that are too complex for the UI to render natively (e.g., nested `anyOf` in `additionalProperties`, objects with many properties).
- Such schemas now fall back to `str` (JSON input) to ensure the UI remains usable and doesn't crash or render incorrectly.

Add Missing Dependency:
- Added `mcp` to `src/lfx/pyproject.toml`, as it is required by `src/lfx/src/lfx/base/mcp/util.py`.

Refactor `util.py`:
- Extracted the helper functions `_fill_defaults` and `_post_process_arguments` in `src/lfx/src/lfx/base/mcp/util.py`.

Tests:
- Added `src/lfx/tests/unit/base/mcp/test_util_custom.py` to test the new helper functions and complex schema detection.
- Added `src/lfx/tests/unit/base/mcp/test_repro_list_type.py` as a regression test for the list type error.

CodeRabbit Review Fixes:
- Removed the `.schema` patch in `json_schema.py` in favor of Pydantic v2 standards.
- Applied review fixes in `util.py` and `json_schema.py`.

CI Fixes:
Type of Change
How Has This Been Tested?
Summary by CodeRabbit
New Features
Chores
Tests