-
Notifications
You must be signed in to change notification settings - Fork 8.2k
fix: Split text should only have dataframe output #8362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes consolidate the outputs of the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant SplitTextComponent
User->>SplitTextComponent: Call split_text()
SplitTextComponent->>SplitTextComponent: Process and split documents
SplitTextComponent->>SplitTextComponent: Convert to Data objects
SplitTextComponent->>SplitTextComponent: Construct DataFrame from Data objects
SplitTextComponent-->>User: Return DataFrame
Suggested labels
Suggested reviewers
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/backend/base/langflow/components/processing/split_text.py (1)
134-135: Add missing docstring for the split_text method.The implementation correctly returns a DataFrame by wrapping the result of
_docs_to_data(self.split_text_base()). However, the method is missing a docstring as flagged by the static analysis tool.+ def split_text(self) -> DataFrame: + """Split the input text into chunks and return as a DataFrame. + + Returns: + DataFrame: A DataFrame containing the text chunks with metadata. + """ - def split_text(self) -> DataFrame: return DataFrame(self._docs_to_data(self.split_text_base()))🧰 Tools
🪛 Pylint (3.3.7)
[convention] 134-134: Missing function or method docstring
(C0116)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/backend/base/langflow/components/processing/split_text.py(2 hunks)src/backend/tests/unit/components/processing/test_split_text_component.py(9 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/backend/tests/unit/components/processing/test_split_text_component.py (2)
src/backend/base/langflow/components/processing/split_text.py (2)
split_text(134-135)SplitTextComponent(9-135)src/backend/base/langflow/schema/dataframe.py (1)
DataFrame(11-206)
🪛 Pylint (3.3.7)
src/backend/tests/unit/components/processing/test_split_text_component.py
[convention] 58-58: Line too long (113/100)
(C0301)
[convention] 88-88: Line too long (113/100)
(C0301)
[convention] 118-118: Line too long (113/100)
(C0301)
[convention] 200-200: Line too long (101/100)
(C0301)
[convention] 220-220: Line too long (103/100)
(C0301)
[convention] 221-221: Line too long (105/100)
(C0301)
[convention] 222-222: Line too long (107/100)
(C0301)
[convention] 223-223: Line too long (107/100)
(C0301)
[convention] 244-244: Line too long (103/100)
(C0301)
[convention] 245-245: Line too long (105/100)
(C0301)
[convention] 246-246: Line too long (107/100)
(C0301)
[convention] 247-247: Line too long (107/100)
(C0301)
src/backend/base/langflow/components/processing/split_text.py
[convention] 134-134: Missing function or method docstring
(C0116)
🔇 Additional comments (6)
src/backend/base/langflow/components/processing/split_text.py (1)
66-66: LGTM! Output consolidation aligns with PR objectives.The change to a single DataFrame output named "dataframe" properly consolidates the previous separate outputs as intended by the PR.
src/backend/tests/unit/components/processing/test_split_text_component.py (5)
55-67: Excellent test updates for DataFrame interface validation.The test correctly verifies the new DataFrame output interface:
- Proper type assertion for DataFrame
- Row count validation using
len(data_frame)- Column structure verification
- DataFrame indexing with
.iloc[0]["text"]for accessing chunk contentThese changes properly validate the consolidated output format.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 58-58: Line too long (113/100)
(C0301)
155-162: Well-implemented metadata preservation test.The test correctly uses
data_frame.iterrows()to iterate over DataFrame rows and validates that metadata is preserved in each chunk. This is the proper way to access row data in a DataFrame.
200-200: Consistent DataFrame column access pattern.The test properly uses
results["text"][0]to access the text column data directly, which is an efficient way to access DataFrame column values.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 200-200: Line too long (101/100)
(C0301)
220-223: Proper DataFrame access for multiple inputs test.The test correctly validates multiple text inputs by accessing the DataFrame columns directly with
results["text"][index]pattern, ensuring all chunks are properly created and accessible.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 220-220: Line too long (103/100)
(C0301)
[convention] 221-221: Line too long (105/100)
(C0301)
[convention] 222-222: Line too long (107/100)
(C0301)
[convention] 223-223: Line too long (107/100)
(C0301)
269-269: Correct DataFrame type assertion for URL loader integration.The test properly verifies that the split_text method returns a DataFrame when working with URL loader output, ensuring the integration works with the new consolidated interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json (1)
879-879: Embedded component code updated correctly.The JSON-embedded Python block for
SplitTextComponentnow reflects the singledataframeoutput viasplit_text, with no lingeringas_dataframeor list-based outputs. Ensure that the JSON string literal uses proper escaping to remain valid.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json(1 hunks)src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Run Ruff Check and Format
- GitHub Check: Optimize new Python code in this PR
- GitHub Check: Update Starter Projects
- GitHub Check: Ruff Style Check (3.13)
🔇 Additional comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json (1)
925-925: Formatting-only change detectedThis JSON configuration shows a re-serialization/formatting update with no semantic modifications to the
APIRequestComponent. No action is needed.
| "dataType": "SplitText", | ||
| "id": "SplitText-aHhAi", | ||
| "name": "chunks", | ||
| "output_types": [ | ||
| "Data" | ||
| ] | ||
| "output_types": [] | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align output handle metadata with refactored component.
The "name": "chunks" and empty "output_types": [] must be updated to match the new output alias and type, i.e.:
- "name": "chunks",
- "output_types": []
+ "name": "dataframe",
+ "output_types": ["DataFrame"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "dataType": "SplitText", | |
| "id": "SplitText-aHhAi", | |
| "name": "chunks", | |
| "output_types": [ | |
| "Data" | |
| ] | |
| "output_types": [] | |
| }, | |
| "dataType": "SplitText", | |
| "id": "SplitText-aHhAi", | |
| "name": "dataframe", | |
| "output_types": ["DataFrame"] | |
| }, |
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Vector Store
RAG.json around lines 152 to 156, update the "name" and "output_types" fields in
the output handle metadata to match the refactored component's new output alias
and type. Replace "name": "chunks" with the new output alias and set
"output_types" to the correct array reflecting the new output type.
| "id": "reactflow__edge-SplitText-aHhAi{œdataTypeœ:œSplitTextœ,œidœ:œSplitText-aHhAiœ,œnameœ:œchunksœ,œoutput_typesœ:[œDataœ]}-AstraDB-lXzoG{œfieldNameœ:œingest_dataœ,œidœ:œAstraDB-lXzoGœ,œinputTypesœ:[œDataœ,œDataFrameœ],œtypeœ:œotherœ}", | ||
| "selected": false, | ||
| "source": "SplitText-aHhAi", | ||
| "sourceHandle": "{œdataTypeœ: œSplitTextœ, œidœ: œSplitText-aHhAiœ, œnameœ: œchunksœ, œoutput_typesœ: [œDataœ]}", | ||
| "sourceHandle": "{œdataTypeœ: œSplitTextœ, œidœ: œSplitText-aHhAiœ, œnameœ: œchunksœ, œoutput_typesœ: []}", | ||
| "target": "AstraDB-lXzoG", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Update edge sourceHandle to new output handle.
The JSON-encoded sourceHandle still references "name": "chunks" and an empty output_types. It should reflect the new handle:
- "sourceHandle": "{œdataTypeœ: œSplitTextœ, œidœ: œSplitText-aHhAiœ, œnameœ: œchunksœ, œoutput_typesœ: []}"
+ "sourceHandle": "{œdataTypeœ: œSplitTextœ, œidœ: œSplitText-aHhAiœ, œnameœ: œdataframeœ, œoutput_typesœ: [œDataFrameœ]}"so that the starter project wires into the DataFrame output correctly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "id": "reactflow__edge-SplitText-aHhAi{œdataTypeœ:œSplitTextœ,œidœ:œSplitText-aHhAiœ,œnameœ:œchunksœ,œoutput_typesœ:[œDataœ]}-AstraDB-lXzoG{œfieldNameœ:œingest_dataœ,œidœ:œAstraDB-lXzoGœ,œinputTypesœ:[œDataœ,œDataFrameœ],œtypeœ:œotherœ}", | |
| "selected": false, | |
| "source": "SplitText-aHhAi", | |
| "sourceHandle": "{œdataTypeœ: œSplitTextœ, œidœ: œSplitText-aHhAiœ, œnameœ: œchunksœ, œoutput_typesœ: [œDataœ]}", | |
| "sourceHandle": "{œdataTypeœ: œSplitTextœ, œidœ: œSplitText-aHhAiœ, œnameœ: œchunksœ, œoutput_typesœ: []}", | |
| "target": "AstraDB-lXzoG", | |
| "id": "reactflow__edge-SplitText-aHhAi{œdataTypeœ:œSplitTextœ,œidœ:œSplitText-aHhAiœ,œnameœ:œchunksœ,œoutput_typesœ:[œDataœ]}-AstraDB-lXzoG{œfieldNameœ:œingest_dataœ,œidœ:œAstraDB-lXzoGœ,œinputTypesœ:[œDataœ,œDataFrameœ],œtypeœ:œotherœ}", | |
| "selected": false, | |
| "source": "SplitText-aHhAi", | |
| "sourceHandle": "{œdataTypeœ: œSplitTextœ, œidœ: œSplitText-aHhAiœ, œnameœ: œdataframeœ, œoutput_typesœ: [œDataFrameœ]}", | |
| "target": "AstraDB-lXzoG", |
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Vector Store
RAG.json around lines 167 to 171, the edge's sourceHandle field still references
the old output handle with "name": "chunks" and an empty output_types array.
Update the sourceHandle to reflect the new output handle by changing the "name"
to the correct new output name and setting the "output_types" to include
"DataFrame" so that the wiring correctly connects to the DataFrame output.
Yukiyukiyeah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
* fix: Split text should only have dataframe output * [autofix.ci] apply automated fixes * Update templates * test fix * Update Vector Store RAG.json * Update starter projects * fix tests * add shards --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Mike Fortman <[email protected]> Co-authored-by: cristhianzl <[email protected]>
* fix: Split text should only have dataframe output * [autofix.ci] apply automated fixes * Update templates * test fix * Update Vector Store RAG.json * Update starter projects * fix tests * add shards --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Mike Fortman <[email protected]> Co-authored-by: cristhianzl <[email protected]>
This pull request renames the output of the split text component, and removes the Data output in favor of dataframe.
Summary by CodeRabbit
New Features
Bug Fixes
Tests