fix: Knowledge base component refactor #9543

erichare · 2025-08-26T16:21:00Z

This pull request refactors the knowledge base ingestion and retrieval components in Langflow, moving them out of the data module into a dedicated knowledge_bases module, and introduces dynamic/lazy importing for these components. It also updates the starter project configuration and improves input/output handling and error management for knowledge base operations.

Component Refactoring and Module Organization:

The KBIngestionComponent and KBRetrievalComponent have been renamed to KnowledgeIngestionComponent and KnowledgeRetrievalComponent, respectively, and moved from langflow.components.data to a new langflow.components.knowledge_bases module for better code organization. [1] [2]
Knowledge base utility imports have been updated to use the new module path, reflecting the refactor. [1] [2]

Dynamic Import and API Improvements:

The new knowledge_bases/__init__.py uses Python’s __getattr__ and __dir__ to lazily import components only when accessed, improving startup time and modularity.
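
The lazy-loading idea described above can be sketched with module-level `__getattr__` and `__dir__` hooks (PEP 562). This is a minimal illustration, not the code from this PR: `build_lazy_package` and `_LAZY_EXPORTS` are hypothetical names, and standard-library targets stand in for the real component submodules.

```python
import importlib
import types

# Hypothetical mapping: exported name -> (module to import, attribute to fetch).
# Standard-library targets stand in for the real component submodules.
_LAZY_EXPORTS = {
    "OrderedDict": ("collections", "OrderedDict"),
    "dumps": ("json", "dumps"),
}


def build_lazy_package(name: str, exports: dict) -> types.ModuleType:
    """Create a module object whose exports are imported on first access."""
    pkg = types.ModuleType(name)
    pkg.__all__ = list(exports)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the module fails.
        if attr not in exports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_name, target = exports[attr]
        value = getattr(importlib.import_module(module_name), target)
        setattr(pkg, attr, value)  # cache so later lookups bypass __getattr__
        return value

    def __dir__() -> list:
        # Advertise the lazy names even before they have been loaded.
        return list(pkg.__all__)

    pkg.__getattr__ = __getattr__
    pkg.__dir__ = __dir__
    return pkg


lazy_pkg = build_lazy_package("lazy_demo", _LAZY_EXPORTS)
```

Accessing `lazy_pkg.dumps` triggers the import once; afterwards the value lives in the module's namespace and the hook is no longer consulted for that name.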

Input/Output Handling Enhancements:

The ingestion component now uses HandleInput for the input dataframe, supports both Data and DataFrame types (including lists), and automatically converts Data objects to dataframes, making it more flexible for various input sources. [1] [2]
The retrieval component adds an advanced option to include embeddings in the output and refines metadata inclusion logic for more granular control over returned data. Output method names have been clarified. [1] [2] [3] [4]
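
The ingestion-side input normalization mentioned above (a single Data object, a list of Data objects, or a DataFrame all ending up as one table) might look roughly like the following. `Record` and `Table` are hypothetical stand-ins for Langflow's Data and DataFrame types so that the sketch runs without third-party packages; the real component works with pandas-backed objects.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical stand-in for a Data object wrapping a dict payload."""
    data: dict = field(default_factory=dict)


@dataclass
class Table:
    """Hypothetical stand-in for a DataFrame: an ordered list of row dicts."""
    rows: list = field(default_factory=list)


def normalize_to_table(value: Any) -> Table:
    """Coerce any accepted input shape into a Table, or raise on other input."""
    if isinstance(value, Table):
        return value  # already tabular, pass through unchanged
    if isinstance(value, Record):
        return Table(rows=[dict(value.data)])  # one record becomes one row
    if isinstance(value, list) and all(isinstance(item, Record) for item in value):
        return Table(rows=[dict(item.data) for item in value])
    msg = f"Unsupported input type: {type(value).__name__}"
    raise TypeError(msg)
```

Rejecting unknown shapes with an exception, rather than guessing, matches the PR's stated move toward raising on failure.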

Error Handling Improvements:

Error management in the ingestion component now raises exceptions instead of returning error data objects, ensuring failures are handled consistently and transparently.
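
As a rough illustration of raising instead of returning an error payload, the sketch below validates rows and wraps any failure in a RuntimeError that keeps the original cause. The function name, the `text` key, and the in-memory `store` list are assumptions made for the example, not the component's actual API.

```python
def ingest_rows(rows: list, store: list) -> int:
    """Append rows to `store`; raise RuntimeError rather than return an error object."""

    def _validate(row: dict) -> dict:
        if "text" not in row:
            raise ValueError(f"row is missing 'text': {row!r}")
        return row

    try:
        # Validate everything first so a bad row cannot leave a partial write.
        validated = [_validate(row) for row in rows]
    except ValueError as exc:
        msg = f"Error during ingestion: {exc}"
        raise RuntimeError(msg) from exc  # chain the cause for debugging

    store.extend(validated)
    return len(validated)
```

Callers then see a failure as an exception with `__cause__` set, instead of having to inspect a returned object for an error field.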

Starter Project Updates:

The starter project JSON for Knowledge Ingestion has been updated to use the new component names, handle IDs, and input types, reflecting the refactored API and improved input flexibility. [1] [2] [3] [4] [5] [6] [7] [8]

Summary by CodeRabbit

New Features
- Added “Knowledge Bases” category to the sidebar.
- Knowledge Retrieval now supports optional embeddings and improved query tool mode.
Refactor
- Renamed components: KBIngestion → KnowledgeIngestion (icon: upload), KBRetrieval → KnowledgeRetrieval (icon: download).
- Knowledge Ingestion accepts Data, DataFrame, and lists; supports real-time refresh when selecting knowledge bases.
- Consolidated Knowledge Base components under a dedicated package for cleaner access.
Chores
- Updated starter projects to use the new Knowledge Ingestion and Retrieval components.

coderabbitai · 2025-08-26T16:21:09Z

Walkthrough

Removes KBIngestion/KBRetrieval re-exports from components.data; introduces knowledge_bases package with lazy-loaded exports. Renames and relocates ingestion/retrieval components, updates their inputs, methods, and UI metadata; adjusts embedding/metadata retrieval behavior. Updates starter project JSONs, tests, and a frontend sidebar category.

Changes

Cohort / File(s)	Summary
Data package exports cleanup `src/backend/base/langflow/components/data/__init__.py`	Drops re-exports/imports of KBIngestionComponent and KBRetrievalComponent from data package. __all__ pruned accordingly.
Knowledge bases package API (lazy loading) `src/backend/base/langflow/components/knowledge_bases/__init__.py`	Adds lazy-loading public API for KnowledgeIngestionComponent and KnowledgeRetrievalComponent via __getattr__, __all__, __dir__, and TYPE_CHECKING stubs.
Knowledge ingestion component refactor `src/backend/base/langflow/components/knowledge_bases/ingestion.py`	Renames KBIngestionComponent → KnowledgeIngestionComponent; icon/name updated. Input changes: DataFrameInput → HandleInput (supports Data/DataFrame, lists), real_time_refresh on KB input. Logic accepts Data/List[Data], converts to DataFrame; on errors now raises RuntimeError.
Knowledge retrieval component refactor `src/backend/base/langflow/components/knowledge_bases/retrieval.py`	Renames KBRetrievalComponent → KnowledgeRetrievalComponent; icon/name updated. Method get_chroma_kb_data → retrieve_data; output port renamed. Adds include_embeddings input; search_query tool_mode enabled; embeddings fetched conditionally and attached when requested; import path for utilities updated.
Starter projects updates (ingestion) `src/backend/base/langflow/initial_setup/starter_projects/Knowledge Ingestion.json`	Node/type/module renamed to KnowledgeIngestion; inputs schema updated (HandleInput, KB dialog real_time_refresh, table schema changes); IDs/edges regenerated; utility import paths updated.
Starter projects updates (retrieval) `src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json`	Node/type/module renamed to KnowledgeRetrieval; output/method renamed to retrieve_data; adds include_embeddings; rewires edges and updates metadata/paths.
Tests: kb utils import move `src/backend/tests/unit/base/data/test_kb_utils.py`	Updates imports from base.data.kb_utils → base.knowledge_bases.knowledge_base_utils; test logic unchanged.
Tests: ingestion component `src/backend/tests/unit/components/knowledge_bases/test_ingestion.py`	Updates to KnowledgeIngestionComponent and new module/utility paths; patches/mocks redirected; test class renamed.
Tests: retrieval component `src/backend/tests/unit/components/knowledge_bases/test_retrieval.py`	Updates to KnowledgeRetrievalComponent and new module/utility paths; method calls renamed to retrieve_data; patches adjusted; test class/methods renamed.
Frontend sidebar category `src/frontend/src/utils/styleUtils.ts`	Adds “Knowledge Bases” category to SIDEBAR_CATEGORIES.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Importer
  participant KBPkg as components.knowledge_bases.__init__
  participant Mod as ingestion/retrieval submodules

  Dev->>KBPkg: from ...knowledge_bases import KnowledgeIngestionComponent
  alt attribute not yet loaded
    KBPkg->>KBPkg: __getattr__("KnowledgeIngestionComponent")
    KBPkg->>Mod: import_mod("KnowledgeIngestionComponent", "ingestion", parent)
    Mod-->>KBPkg: class KnowledgeIngestionComponent
    KBPkg->>KBPkg: cache in globals
  end
  KBPkg-->>Dev: KnowledgeIngestionComponent

sequenceDiagram
  autonumber
  participant UI as Flow/UI
  participant Ingest as KnowledgeIngestionComponent
  participant Utils as knowledge_base_utils
  participant Store as Vector Store

  UI->>Ingest: run(input_df=Data|DataFrame|List[Data], kb, config)
  Ingest->>Ingest: normalize input_df to DataFrame
  Ingest->>Utils: get_knowledge_bases(), settings, user
  Ingest->>Store: ingest DataFrame into KB collection
  Note over Ingest: On error: raise RuntimeError
  Store-->>Ingest: ack
  Ingest-->>UI: status/result DataFrame

sequenceDiagram
  autonumber
  participant UI as Flow/UI
  participant Retr as KnowledgeRetrievalComponent
  participant Utils as knowledge_base_utils
  participant VS as Vector Store

  UI->>Retr: retrieve_data(kb, search_query, include_metadata, include_embeddings)
  Retr->>Utils: load KB config, embeddings provider
  Retr->>VS: query (similarity_search[_with_score])
  alt include_metadata or include_embeddings
    Retr->>VS: collection.get(ids, include metadatas[, embeddings])
    VS-->>Retr: metadatas[, embeddings]
  end
  Retr->>Retr: build rows: content + optional metadata + optional _embeddings
  Retr-->>UI: DataFrame "Results"
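
The row-building step at the end of the retrieval diagram (content plus optional metadata plus optional `_embeddings`) can be sketched with plain dictionaries. The function name and the shape of `hits` are assumptions for illustration; the actual component works with vector-store documents and returns a DataFrame.

```python
from typing import Any


def build_result_rows(
    hits: list,
    metadata_by_id: dict,
    embeddings_by_id: dict,
    *,
    include_metadata: bool = True,
    include_embeddings: bool = False,
) -> list:
    """Turn (doc_id, content) hits into output rows with optional extras."""
    rows = []
    for doc_id, content in hits:
        row: dict = {"content": content}
        if include_metadata:
            # Merge stored metadata fields into the row when requested.
            row.update(metadata_by_id.get(doc_id, {}))
        if include_embeddings:
            # Attach the vector (or None if the store returned nothing for this id).
            row["_embeddings"] = embeddings_by_id.get(doc_id)
        rows.append(row)
    return rows
```

Keeping the two flags independent mirrors the behavior described in the walkthrough, where embeddings are fetched and attached only when requested.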

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: Make knowledge bases user-stored and support global vars #9458: Touches the same ingestion/retrieval components with renames/moves and API changes in knowledge_bases.
feat: Add support for Ingestion and Retrieval of Knowledge Bases #9088: Earlier work placing KBIngestion/KBRetrieval under components.data; this PR supersedes it by relocating them to components.knowledge_bases.
ref: URL and File components with Dataframe output #8117: Migrates kb_utils imports to knowledge_bases.knowledge_base_utils, matching this PR’s utility path changes.

Suggested labels

refactor, size:XL, lgtm

Suggested reviewers

rodrigosnader
edwinjosechittilappilly

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)

src/backend/base/langflow/components/knowledge_bases/ingestion.py (4)

526-553: Potential UnboundLocalError: embedding_model/api_key used before assignment.

If embedding_metadata.json does not exist (or lacks keys), embedding_model and/or api_key are undefined when passed to _create_vector_store. This violates the PR objective of “raise on failures” and will surface as an unhelpful UnboundLocalError.

Apply this diff to initialize, validate, and fail fast with a clear error:

@@ async def build_kb_info(self) -> Data:
-            # Read the embedding info from the knowledge base folder
+            # Read the embedding info from the knowledge base folder
             kb_path = await self._kb_path()
@@
-            metadata_path = kb_path / "embedding_metadata.json"
+            metadata_path = kb_path / "embedding_metadata.json"
+            embedding_model: str | None = None
+            api_key: str | None = None
@@
-            if metadata_path.exists():
+            if metadata_path.exists():
                 settings_service = get_settings_service()
                 metadata = json.loads(metadata_path.read_text())
-                embedding_model = metadata.get("embedding_model")
+                embedding_model = metadata.get("embedding_model")
                 try:
                     api_key = decrypt_api_key(metadata["api_key"], settings_service)
                 except (InvalidToken, TypeError, ValueError) as e:
                     logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
@@
-            if self.api_key:
+            if self.api_key:
                 api_key = self.api_key
                 self._save_embedding_metadata(
                     kb_path=kb_path,
                     embedding_model=embedding_model,
                     api_key=api_key,
                 )
+
+            # Validate presence of embedding model (and api_key if required)
+            if not embedding_model:
+                raise ValueError(
+                    "Embedding metadata not found for knowledge base. "
+                    "Create the KB via the 'Create new knowledge' dialog or ensure embedding_metadata.json exists."
+                )
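
A stripped-down version of the initialize-then-validate pattern suggested here is shown below. It is only a sketch: `load_embedding_model` is a hypothetical helper, and the real method also handles API key decryption, which is omitted.

```python
import json
from pathlib import Path
from typing import Optional


def load_embedding_model(metadata_path: Path) -> str:
    """Return the embedding model name, failing with a clear ValueError if absent."""
    # Initialise up front so the name is always bound, even when the file is missing.
    embedding_model: Optional[str] = None

    if metadata_path.exists():
        metadata = json.loads(metadata_path.read_text())
        embedding_model = metadata.get("embedding_model")

    if not embedding_model:
        msg = f"Embedding metadata not found or incomplete: {metadata_path}"
        raise ValueError(msg)
    return embedding_model
```

Without the initial assignment, a missing file would surface as an UnboundLocalError at the point of use rather than as a descriptive error.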

342-381: Swallowing exceptions in _create_vector_store hides ingestion failures.

The PR states ingestion should raise on failures. _create_vector_store logs errors but does not propagate them, so build_kb_info may report success with zero documents.

Propagate exceptions after logging:

@@ async def _create_vector_store(self, df_source: pd.DataFrame, config_list: list[dict[str, Any]], embedding_model: str, api_key: str) -> None:
-        try:
+        try:
@@
-        except (OSError, ValueError, RuntimeError) as e:
-            self.log(f"Error creating vector store: {e}")
+        except (OSError, ValueError, RuntimeError) as e:
+            self.log(f"Error creating vector store: {e}")
+            raise

416-454: Confusing/incorrect use of content vs identifier columns when building text and IDs.

The comment says “Build content text from identifier columns,” but the code uses content_cols. Then page_content is reassigned from identifier_cols and reused for hashing. This conflates concerns and is error-prone.

Build text_content from content_cols for embeddings.
Build id_source from identifier_cols if present; else fall back to text_content.
Hash id_source for _id.

@@     async def _convert_df_to_data_objects(self, df_source: pd.DataFrame, config_list: list[dict[str, Any]]) -> list[Data]:
-        for _, row in df_source.iterrows():
-            # Build content text from identifier columns using list comprehension
-            identifier_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]
-
-            # Join all parts into a single string
-            page_content = " ".join(identifier_parts)
-
-            # Build metadata from NON-vectorized columns only (simple key-value pairs)
-            data_dict = {
-                "text": page_content,  # Main content for vectorization
-            }
-
-            # Add identifier columns if they exist
-            if identifier_cols:
-                identifier_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]
-                page_content = " ".join(identifier_parts)
+        for _, row in df_source.iterrows():
+            # Build the text content from content (vectorized) columns
+            content_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]
+            text_content = " ".join(content_parts)
+
+            # Determine identifier source (prefer identifier cols; else fall back to text_content)
+            if identifier_cols:
+                id_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]
+                id_source = " ".join(id_parts)
+            else:
+                id_source = text_content
+
+            # Build metadata from NON-vectorized columns only (simple key-value pairs)
+            data_dict = {
+                "text": text_content,  # Main content to embed
+            }
@@
-            # Hash the page_content for unique ID
-            page_content_hash = hashlib.sha256(page_content.encode()).hexdigest()
+            # Hash the identifier source for unique ID
+            page_content_hash = hashlib.sha256(id_source.encode()).hexdigest()
             data_dict["_id"] = page_content_hash
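
The separation proposed in this comment (embed the content columns, hash the identifier columns) can be tried out in isolation. The sketch below uses plain dicts instead of DataFrame rows, and `build_record` is a hypothetical name; it is not the component's method.

```python
import hashlib


def build_record(row: dict, content_cols: list, identifier_cols: list) -> dict:
    """Build the text to embed and a stable ID from separate column sets."""
    # Text to embed comes only from the content (vectorized) columns.
    content_parts = [str(row[c]) for c in content_cols if row.get(c) is not None]
    text_content = " ".join(content_parts)

    # ID source prefers identifier columns and falls back to the text itself.
    if identifier_cols:
        id_parts = [str(row[c]) for c in identifier_cols if row.get(c) is not None]
        id_source = " ".join(id_parts)
    else:
        id_source = text_content

    return {
        "text": text_content,
        "_id": hashlib.sha256(id_source.encode()).hexdigest(),
    }
```

With this split, two rows that share an identifier but differ in body text hash to the same `_id`, which is what duplicate detection keyed on identifiers would need.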

635-642: Catching the wrong timeout exception during embedding validation.

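On Python 3.10 and earlier, asyncio.wait_for raises asyncio.TimeoutError, which is distinct from the built-in TimeoutError, so the current except block won’t trigger there. From Python 3.11 onward the two names refer to the same class and the existing clause already matches.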

Fix the except clause:

-                except TimeoutError as e:
+                except asyncio.TimeoutError as e:
                     msg = "Embedding validation timed out. Please verify network connectivity and key."
                     raise ValueError(msg) from e
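
One way to make the handler independent of the interpreter version is to catch both names. In the sketch below, `validate_with_timeout` is a hypothetical helper and `asyncio.sleep` stands in for the real embedding call.

```python
import asyncio


async def validate_with_timeout(delay: float, timeout: float) -> str:
    """Await a stand-in operation and convert a timeout into ValueError."""
    try:
        # asyncio.sleep stands in for the real embedding validation request.
        await asyncio.wait_for(asyncio.sleep(delay), timeout=timeout)
    except (asyncio.TimeoutError, TimeoutError) as exc:
        # Listing both names covers interpreters where they are separate classes
        # as well as those where asyncio.TimeoutError is an alias of TimeoutError.
        msg = "Embedding validation timed out. Please verify network connectivity and key."
        raise ValueError(msg) from exc
    return "ok"
```

For example, `asyncio.run(validate_with_timeout(1.0, 0.01))` raises ValueError, while a zero delay with a generous timeout returns normally.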

src/backend/base/langflow/components/knowledge_bases/retrieval.py (1)

239-257: Bug: Constructing Data with arbitrary kwargs will likely fail validation

Data expects its payload under the data field (see DataFrame.to_data_list usage across the codebase). Instantiating as Data(**kwargs) risks Pydantic validation errors and silent schema drift. Wrap the row dict in Data(data=...).

Apply this diff:
-            data_list.append(Data(**kwargs))
+            data_list.append(Data(data=kwargs))

src/backend/base/langflow/initial_setup/starter_projects/Knowledge Ingestion.json (1)

853-869: Potential UnboundLocalError and weak defaults in build_kb_info

Inside the embedded component code (template.code.value), build_kb_info uses embedding_model and api_key variables that may be unset if metadata_path doesn’t exist or decrypt fails. That can throw NameError/UnboundLocalError and makes error handling brittle.

Patch the method to initialize variables and fail loudly when metadata is missing; also normalize SecretStr to str:

@@
-    async def build_kb_info(self) -> Data:
+    async def build_kb_info(self) -> Data:
@@
-            # Read the embedding info from the knowledge base folder
+            # Read the embedding info from the knowledge base folder
             kb_path = await self._kb_path()
             if not kb_path:
                 msg = "Knowledge base path is not set. Please create a new knowledge base first."
                 raise ValueError(msg)
             metadata_path = kb_path / "embedding_metadata.json"
 
-            # If the API key is not provided, try to read it from the metadata file
-            if metadata_path.exists():
-                settings_service = get_settings_service()
-                metadata = json.loads(metadata_path.read_text())
-                embedding_model = metadata.get("embedding_model")
-                try:
-                    api_key = decrypt_api_key(metadata["api_key"], settings_service)
-                except (InvalidToken, TypeError, ValueError) as e:
-                    logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
+            # Initialize defaults
+            embedding_model: str | None = None
+            api_key: str | None = None
+
+            # If the API key is not provided, try to read it from the metadata file
+            if metadata_path.exists():
+                settings_service = get_settings_service()
+                metadata = json.loads(metadata_path.read_text())
+                embedding_model = metadata.get("embedding_model")
+                try:
+                    enc = metadata.get("api_key")
+                    api_key = decrypt_api_key(enc, settings_service) if enc else None
+                except (InvalidToken, TypeError, ValueError) as e:
+                    logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
+                    api_key = None
+            else:
+                msg = "Embedding metadata not found. Create the knowledge base first from the dropdown dialog."
+                raise ValueError(msg)
 
             # Check if a custom API key was provided, update metadata if so
-            if self.api_key:
-                api_key = self.api_key
+            if self.api_key:
+                api_key = self.api_key.get_secret_value() if hasattr(self.api_key, "get_secret_value") else str(self.api_key)
                 self._save_embedding_metadata(
                     kb_path=kb_path,
                     embedding_model=embedding_model,
                     api_key=api_key,
                 )
 
+            if not embedding_model:
+                raise ValueError("Embedding model unavailable. Metadata is missing or invalid.")
+
             # Create vector store following Local DB component pattern
             await self._create_vector_store(df_source, config_list, embedding_model=embedding_model, api_key=api_key)

🧹 Nitpick comments (19)

src/backend/tests/unit/base/data/test_kb_utils.py (4)
5-5: Rename test class to reflect the new module name.

Keeps naming consistent with the new package.
-class TestKBUtils:
+class TestKnowledgeBaseUtils:
45-47: Strengthen the assertion to match the comment (tokenization doesn’t stem “cats”→“cat”).

As written, >= 0.0 always passes and doesn’t validate the stated behavior.
-        # Third document contains both "cats" and "dogs", but case-insensitive matching should work
-        # Note: "cats" != "cat" exactly, so this tests the term matching behavior
-        assert scores[2] >= 0.0
+        # Third document contains "cats" and "dogs" (plural), which should NOT match "cat"/"dog"
+        # given simple whitespace tokenization without stemming.
+        assert scores[2] == 0.0
178-181: Make “different scores” assertions robust to edge cases and float quirks.

Direct list inequality can be brittle if parameters incidentally yield identical vectors; check for any meaningful element-wise difference instead.
-        # Scores should be different with different parameters
-        assert scores_default != scores_k1
-        assert scores_default != scores_b
+        # Scores should be different with different parameters (tolerate float quirks)
+        assert any(abs(a - b) > 1e-12 for a, b in zip(scores_default, scores_k1))
+        assert any(abs(a - b) > 1e-12 for a, b in zip(scores_default, scores_b))
1-459: (Optional) Consider relocating/renaming this test to mirror the new package path.

For long-term maintainability, consider moving the file to:

src/backend/tests/unit/base/knowledge_bases/test_knowledge_base_utils.py

This keeps tests aligned with the refactor’s package structure.

If you’d like, I can draft the minimal PR-wide file move plan (including updating any import paths) to keep everything consistent.
src/frontend/src/utils/styleUtils.ts (1)
214-218: Add icon/color mappings for the new 'knowledge_bases' category to keep UI consistent.

You added the category to SIDEBAR_CATEGORIES, but there are no corresponding entries in categoryIcons, nodeIconToDisplayIconMap, nodeColors, or nodeColorsName. Some UI surfaces use those maps; missing keys can cause fallback icons/colors or blank icons.

Apply this diff to add consistent mappings:
--- a/src/frontend/src/utils/styleUtils.ts
+++ b/src/frontend/src/utils/styleUtils.ts
@@
 export const nodeColors: { [char: string]: string } = {
@@
   Tool: "#00fbfc",
+  knowledge_bases: "#7c3aed", // violet-ish, distinct from vectorstores/retrievers
 };
@@
 export const nodeColorsName: { [char: string]: string } = {
@@
   DataFrame: "pink",
+  knowledge_bases: "violet",
 };
@@
 export const categoryIcons: Record<string, string> = {
@@
   toolkits: "Package2",
   tools: "Hammer",
+  knowledge_bases: "Package2",
   custom: "Edit",
   custom_components: "GradientInfinity",
 };
@@
 export const nodeIconToDisplayIconMap: Record<string, string> = {
@@
   textsplitters: "Scissors",
   toolkits: "Package2",
   tools: "Hammer",
+  knowledge_bases: "Package2",
   custom_components: "GradientInfinity",
src/backend/base/langflow/components/knowledge_bases/ingestion.py (1)

285-290: Prefer atomic write for embedding_metadata.json to avoid partial files.

A crash between write_text and flush could leave a truncated JSON. Use a temp file + replace.

I can provide a small helper to write atomically using NamedTemporaryFile if you want it integrated here.
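
A sketch of such a helper, using only the standard library, is below. `write_json_atomically` is a hypothetical name and the layout (temporary file in the same directory, then `os.replace`) is one common approach rather than something taken from the PR.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write JSON to a temp file beside `path`, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target so os.replace stays on one filesystem.
    tmp = tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=path.parent, suffix=".tmp", delete=False
    )
    try:
        with tmp:
            json.dump(payload, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp.name, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        # Clean up the temp file if anything failed before the swap completed.
        Path(tmp.name).unlink(missing_ok=True)
        raise
```

Because the target is only ever replaced by a fully written file, a crash mid-write leaves the previous metadata intact.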

src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (3)

246-295: Good duplicate-prevention test; consider adding a variant with allow_duplicates=True.

Verifies skip on hash collision. Add a complementary test asserting both rows are returned when allow_duplicates=True.

If helpful, I can draft the test body.

103-108: Backward-compatibility mapping is missing.

Guidelines ask for file_names_mapping to keep version compatibility. Since this is a rename/move from KBIngestionComponent, provide a mapping from the old component file to the new one.

If you share the exact legacy path/class (e.g., langflow/components/data/kb_ingestion.py::KBIngestionComponent), I can propose the precise mapping snippet.

312-333: Successful path test is solid; add failure-path coverage for missing metadata.

Consider a test where embedding_metadata.json is absent: build_kb_info should now raise ValueError (after the suggested fix), and the test should assert the error message.

Happy to provide the assertion once the error text is finalized.
src/backend/base/langflow/components/knowledge_bases/__init__.py (1)
19-31: Lazy importer is clean; consider minor introspection polish.

Return a sorted list from __dir__ for deterministic introspection order. No functional change.
-def __dir__() -> list[str]:
-    return list(__all__)
+def __dir__() -> list[str]:
+    return sorted(__all__)
src/backend/tests/unit/components/knowledge_bases/test_retrieval.py (2)

90-95: Backward-compatibility mapping likely needed here, too.

As with ingestion tests, add file_names_mapping to cover the rename from KBRetrievalComponent.

Share the old path/class and I’ll draft the mapping block.

369-385: Optional: Verify include_embeddings behavior end-to-end.

You can patch the underlying Chroma client to return embeddings and assert that results include an _embeddings key when include_embeddings=True and omit it when False.

I can draft a focused unit test that stubs Chroma._client.get_collection(...).get(...) to return deterministic vectors.
src/backend/base/langflow/components/knowledge_bases/retrieval.py (4)
71-77: Mismatch between “Include Embeddings” help text and behavior

The input’s info states embeddings are “Only applicable if 'Include Metadata' is enabled,” but the code adds _embeddings regardless of include_metadata. Either enforce the dependency or clarify the description.

Two options:

Update help text (simple, no behavior change):
-            info="Whether to include embeddings in the output. Only applicable if 'Include Metadata' is enabled.",
+            info="Whether to include embeddings in the output (adds an _embeddings field when available).",
Or gate by include_metadata:
-            if self.include_embeddings:
+            if self.include_embeddings and self.include_metadata:
                 kwargs["_embeddings"] = id_to_embedding.get(doc[0].metadata.get("_id"))
I recommend the first to keep current flexibility while avoiding confusion.

Also applies to: 245-251

231-238: Avoid reliance on private chroma._client

Accessing chroma._client is private API and brittle across langchain_chroma/chromadb versions. Prefer a public accessor if available, or guard with feature detection and clear error messages.

Example defensive pattern:
# Prefer the wrapper's collection handle when it exposes a callable get()
collection = getattr(chroma, "_collection", None)
if not callable(getattr(collection, "get", None)):
    # Fall back to the client, failing loudly if it is not accessible
    if not hasattr(chroma, "_client"):
        raise RuntimeError("Underlying Chroma client not accessible to fetch embeddings safely.")
    collection = chroma._client.get_collection(name=self.knowledge_base)
embeddings_result = collection.get(where={"_id": {"$in": doc_ids}}, include=["metadatas", "embeddings"])
If the wrapper exposes a public get/collection accessor in your pinned version, prefer that instead.

206-222: Empty-query path uses similarity_search("")

When no search_query is provided, similarity_search("") may yield implementation-defined results. If the intended behavior is “top-k arbitrary/most recent rows,” consider fetching via collection.get(limit=k) or a deterministic fallback, rather than embedding an empty string.

Sketch:
if self.search_query:
    ...
else:
    # Deterministic fallback without embeddings call
    collection = chroma._client.get_collection(name=self.knowledge_base)
    fetched = collection.get(limit=self.top_k, include=["documents", "metadatas"])
    results = [(Document(page_content=doc, metadata=meta), 0.0) for doc, meta in zip(fetched["documents"], fetched["metadatas"])]
This also avoids an unnecessary embedding of "".

245-246: Score semantics: expose distance directly or a normalized similarity

You invert the distance via -1 * doc[1], producing negative scores. Consider exposing both distance (lower is better) and a normalized similarity in [0, 1], or just distance to avoid confusion.

Example:
-    kwargs["_score"] = -1 * doc[1]
+    kwargs["distance"] = float(doc[1])
+    # Optional: bounded similarity
+    kwargs["similarity"] = 1.0 / (1.0 + float(doc[1]))
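
The distance-to-similarity mapping suggested above can be checked on its own. `score_fields` is a hypothetical helper; the `1 / (1 + d)` transform is the one from the suggestion and assumes non-negative distances.

```python
def score_fields(distance: float) -> dict:
    """Return the raw distance plus a bounded similarity in (0, 1]."""
    d = float(distance)
    if d < 0:
        raise ValueError(f"distance must be non-negative, got {d}")
    # Distance 0 maps to similarity 1.0; larger distances approach 0.
    return {"distance": d, "similarity": 1.0 / (1.0 + d)}
```

Exposing both fields lets consumers sort by either one without having to interpret a negated score.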
src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json (1)
656-673: Clarify “Include Embeddings” description in template

The description repeats the “Only applicable if 'Include Metadata' is enabled” constraint, which the backend does not enforce. To avoid UX confusion, update the info text here too.

Apply this diff:
-                "info": "Whether to include embeddings in the output. Only applicable if 'Include Metadata' is enabled.",
+                "info": "Whether to include embeddings in the output (adds an _embeddings field when available).",
src/backend/base/langflow/initial_setup/starter_projects/Knowledge Ingestion.json (2)
853-869: Catching the wrong timeout exception in update_build_config

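On Python 3.10 and earlier, asyncio.wait_for raises asyncio.TimeoutError rather than the built-in TimeoutError, so the current except can miss the timeout path there. From Python 3.11 onward the two names are aliases of the same class.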

Inside update_build_config, adjust:
-                except TimeoutError as e:
+                except asyncio.TimeoutError as e:
                     msg = "Embedding validation timed out. Please verify network connectivity and key."
                     raise ValueError(msg) from e
933-939: Default column_config sets both “identifier” and “vectorize” to true

This is valid (ID used for dedup hash, text used for vectors), just calling it out. If users expect a distinct identifier (e.g., URL) you might consider an example row with a non-vectorized identifier column in docs, but no change required.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 933198d and 84e6c05.

📒 Files selected for processing (10)

src/backend/base/langflow/components/data/__init__.py (0 hunks)
src/backend/base/langflow/components/knowledge_bases/__init__.py (1 hunks)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (5 hunks)
src/backend/base/langflow/components/knowledge_bases/retrieval.py (7 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Knowledge Ingestion.json (15 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json (13 hunks)
src/backend/tests/unit/base/data/test_kb_utils.py (1 hunks)
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (6 hunks)
src/backend/tests/unit/components/knowledge_bases/test_retrieval.py (8 hunks)
src/frontend/src/utils/styleUtils.ts (1 hunks)

💤 Files with no reviewable changes (1)

src/backend/base/langflow/components/data/__init__.py

🧰 Additional context used

📓 Path-based instructions (9)

src/frontend/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

src/frontend/src/**/*.{ts,tsx,js,jsx}: All frontend TypeScript and JavaScript code should be located under src/frontend/src/ and organized into components, pages, icons, stores, types, utils, hooks, services, and assets directories as per the specified directory layout.
Use React 18 with TypeScript for all UI components in the frontend.
Format all TypeScript and JavaScript code using the make format_frontend command.
Lint all TypeScript and JavaScript code using the make lint command.

Files:

src/frontend/src/utils/styleUtils.ts

src/frontend/src/utils/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

All utility functions should be placed in the utils directory.

Files:

src/frontend/src/utils/styleUtils.ts

{src/backend/**/*.py,tests/**/*.py,Makefile}

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code

Files:

src/backend/tests/unit/base/data/test_kb_utils.py
src/backend/base/langflow/components/knowledge_bases/ingestion.py
src/backend/tests/unit/components/knowledge_bases/test_retrieval.py
src/backend/base/langflow/components/knowledge_bases/retrieval.py
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py
src/backend/base/langflow/components/knowledge_bases/__init__.py

src/backend/tests/unit/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

Test component integration within flows using create_flow, build_flow, and get_build_events utilities

Files:

src/backend/tests/unit/base/data/test_kb_utils.py
src/backend/tests/unit/components/knowledge_bases/test_retrieval.py
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py

src/backend/tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)

src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...

Files:

src/backend/tests/unit/base/data/test_kb_utils.py
src/backend/tests/unit/components/knowledge_bases/test_retrieval.py
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py

src/backend/base/langflow/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

src/backend/base/langflow/components/**/*.py: Add new backend components to the appropriate subdirectory under src/backend/base/langflow/components/
Implement async component methods using async def and await for asynchronous operations
Use asyncio.create_task for background work in async components and ensure proper cleanup on cancellation
Use asyncio.Queue for non-blocking queue operations in async components and handle timeouts appropriately

Files:

src/backend/base/langflow/components/knowledge_bases/ingestion.py
src/backend/base/langflow/components/knowledge_bases/retrieval.py
src/backend/base/langflow/components/knowledge_bases/__init__.py

src/backend/**/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)

In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).

Files:

src/backend/base/langflow/components/knowledge_bases/ingestion.py
src/backend/tests/unit/components/knowledge_bases/test_retrieval.py
src/backend/base/langflow/components/knowledge_bases/retrieval.py
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py
src/backend/base/langflow/components/knowledge_bases/__init__.py

src/backend/tests/unit/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

src/backend/tests/unit/components/**/*.py: Mirror the component directory structure for unit tests in src/backend/tests/unit/components/
Use ComponentTestBaseWithClient or ComponentTestBaseWithoutClient as base classes for component unit tests
Provide file_names_mapping for backward compatibility in component tests
Create comprehensive unit tests for all new components

Files:

src/backend/tests/unit/components/knowledge_bases/test_retrieval.py
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py

src/backend/base/langflow/components/**/__init__.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

Update __init__.py with alphabetical imports when adding new components

Files:

src/backend/base/langflow/components/knowledge_bases/__init__.py

🧠 Learnings (4)

📚 Learning: 2025-07-21T14:16:14.125Z

Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-07-21T14:16:14.125Z
Learning: Applies to src/backend/tests/**/*.py : Test backward compatibility across Langflow versions in backend Python tests by mapping component files to supported versions using 'VersionComponentMapping'.

Applied to files:

src/backend/tests/unit/components/knowledge_bases/test_retrieval.py

📚 Learning: 2025-08-05T22:51:27.961Z

Learnt from: edwinjosechittilappilly
PR: langflow-ai/langflow#0
File: :0-0
Timestamp: 2025-08-05T22:51:27.961Z
Learning: The TestComposioComponentAuth test in src/backend/tests/unit/components/bundles/composio/test_base_composio.py demonstrates proper integration testing patterns for external API components, including real API calls with mocking for OAuth completion, comprehensive resource cleanup, and proper environment variable handling with pytest.skip() fallbacks.

Applied to files:

src/backend/tests/unit/components/knowledge_bases/test_retrieval.py
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py

📚 Learning: 2025-07-21T14:16:14.125Z

Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-07-21T14:16:14.125Z
Learning: Applies to src/backend/tests/**/*.py : When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.

Applied to files:

src/backend/tests/unit/components/knowledge_bases/test_retrieval.py
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py

📚 Learning: 2025-07-18T18:25:54.486Z

Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Applies to src/backend/tests/unit/components/**/*.py : Use ComponentTestBaseWithClient or ComponentTestBaseWithoutClient as base classes for component unit tests

Applied to files:

src/backend/tests/unit/components/knowledge_bases/test_retrieval.py
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py

🧬 Code graph analysis (6)

src/backend/tests/unit/base/data/test_kb_utils.py (1)

src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (2)

compute_bm25 (53-109)

compute_tfidf (10-50)

src/backend/base/langflow/components/knowledge_bases/ingestion.py (4)

src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (1)

get_knowledge_bases (112-137)

src/backend/base/langflow/inputs/inputs.py (2)

HandleInput (76-87)

TableInput (38-73)

src/backend/base/langflow/schema/data.py (1)

Data (23-285)

src/backend/base/langflow/schema/dataframe.py (1)

DataFrame (11-206)

src/backend/tests/unit/components/knowledge_bases/test_retrieval.py (4)

src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (1)

get_knowledge_bases (112-137)

src/backend/base/langflow/components/knowledge_bases/retrieval.py (2)

KnowledgeRetrievalComponent (27-256)

retrieve_data (173-256)

src/backend/tests/base.py (1)

ComponentTestBaseWithoutClient (163-164)

src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (5)

component_class (16-18)

mock_knowledge_base_path (21-24)

mock_user_data (32-36)

default_kwargs (62-101)

mock_user_id (57-59)

src/backend/base/langflow/components/knowledge_bases/retrieval.py (2)

src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (1)

get_knowledge_bases (112-137)

src/backend/base/langflow/schema/dataframe.py (1)

DataFrame (11-206)

src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (2)

src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (1)

get_knowledge_bases (112-137)

src/backend/base/langflow/components/knowledge_bases/ingestion.py (1)

KnowledgeIngestionComponent (41-668)

src/backend/base/langflow/components/knowledge_bases/__init__.py (3)

src/backend/base/langflow/components/_importing.py (1)

import_mod (8-37)

src/backend/base/langflow/components/knowledge_bases/ingestion.py (1)

KnowledgeIngestionComponent (41-668)

src/backend/base/langflow/components/knowledge_bases/retrieval.py (1)

KnowledgeRetrievalComponent (27-256)
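
For context on the lazy exports that `knowledge_bases/__init__.py` provides, here is a minimal sketch of the module-level `__getattr__` / `__dir__` technique (PEP 562). It does not use Langflow's `import_mod` helper; the `make_lazy_module` function and the name-to-module mapping are hypothetical, and standard-library targets are used so the example runs anywhere:

```python
import importlib
import types

# Hypothetical mapping of exported name -> module that defines it.
_DYNAMIC_IMPORTS = {"OrderedDict": "collections", "dumps": "json"}


def make_lazy_module(name: str, dynamic_imports: dict[str, str]) -> types.ModuleType:
    """Build a module whose attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr_name: str):
        # Called only when normal attribute lookup on the module fails.
        if attr_name not in dynamic_imports:
            msg = f"module {name!r} has no attribute {attr_name!r}"
            raise AttributeError(msg)
        target = importlib.import_module(dynamic_imports[attr_name])
        value = getattr(target, attr_name)
        setattr(module, attr_name, value)  # cache so later lookups skip this hook
        return value

    def __dir__() -> list[str]:
        # Advertise the lazy names even before they have been imported.
        return sorted(dynamic_imports)

    module.__getattr__ = __getattr__
    module.__dir__ = __dir__
    return module


lazy = make_lazy_module("lazy_exports", _DYNAMIC_IMPORTS)
```

In a real package the two hooks would be defined at the top level of `__init__.py` rather than attached to a `types.ModuleType`, but the lookup behavior is the same.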

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test-starter-projects

🔇 Additional comments (15)

src/backend/tests/unit/base/data/test_kb_utils.py (1)

2-2: Import path refactor verified – ready to merge

All checks passed:

No references to the old langflow.base.data.kb_utils path were found.

The new module knowledge_base_utils.py exists under src/backend/base/langflow/base/knowledge_bases/.

Both compute_tfidf and compute_bm25 are correctly defined in that module.

Great work moving these utilities into the new knowledge_bases package.

src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (2)

212-223: Patch targets for settings/key encryption look correct. LGTM.

Mocks are scoped to the module under test and won’t leak. Good practice.

109-120: Validate behavior for list inputs (Data and DataFrame).

Given the component now accepts HandleInput with is_list=True for Data/DataFrame, add tests for:

input_df: list[Data]

input_df: list[pd.DataFrame]

mixed list [Data, DataFrame] (if you adopt the defensive handling)

I can generate these unit tests with fixtures producing the data and asserting the concatenated row count.

src/backend/tests/unit/components/knowledge_bases/test_retrieval.py (3)

96-109: KB listing tests are accurate and mirror ingestion-side expectations. LGTM.

Covers hidden dirs and multiple KBs correctly.
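
The kind of check described here can be sketched with the standard library alone. `list_knowledge_bases` below is a hypothetical stand-in, not the real `get_knowledge_bases` (whose signature is not shown in this thread); it assumes each knowledge base is a non-hidden subdirectory of a root folder:

```python
import tempfile
from pathlib import Path


def list_knowledge_bases(root: Path) -> list[str]:
    """Return sorted names of non-hidden subdirectories of ``root``."""
    if not root.exists():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_dir() and not p.name.startswith(".")
    )


def demo() -> list[str]:
    """Create two visible KB folders, one hidden folder, and a stray file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("kb_one", "kb_two", ".hidden_kb"):
            (root / name).mkdir()
        (root / "notes.txt").write_text("not a knowledge base")
        return list_knowledge_bases(root)
```

Under these assumptions, `demo()` lists only the two visible directories; the hidden directory and the plain file are skipped.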

302-329: Great override test for user-supplied API key.

Using a real SecretStr ensures parity with runtime behavior.

330-342: Test for missing metadata is valuable; mirror the ingestion-side failure.

Once ingestion raises on missing metadata, both components will behave consistently. No action here; just noting the alignment.
src/backend/base/langflow/components/knowledge_bases/retrieval.py (2)
27-33: Renaming and output alignment look good

Component metadata (display_name/name/icon), output name/method (retrieve_data), and dynamic KB options via update_build_config align with the PR goals and frontend mapping.

Also applies to: 80-87, 89-101

1-25: Formatting/Linting Verification Required

The automated check could not run in this environment because the make command was not found. Please verify locally that this module passes the repository’s formatting and linting standards by running:
make format_backend
make lint
src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json (4)

26-31: Edge wiring: KnowledgeRetrieval → ChatOutput is consistent

Source handle exposes DataFrame; target ChatOutput accepts Data/DataFrame/Message. Connection metadata and ids look consistent.

603-617: Output port/method rename correctly reflected

Output port name retrieve_data and method retrieve_data match the backend component. Good alignment for lazy exports and UI.

715-737: Search query tool_mode enabled — good call

Marking search_query as tool_mode enables smoother interop with tool-calling models. No action needed.

535-561: Component identity updates are consistent

Module path, icon (“download”), display/title, and code_hash updates reflect the rename/move. No issues spotted.

src/backend/base/langflow/initial_setup/starter_projects/Knowledge Ingestion.json (3)

713-777: Rename and module relocation look correct

Node type/name/icon updated to Knowledge Ingestion with module langflow.components.knowledge_bases.ingestion.KnowledgeIngestionComponent. Starter graph metadata consistent.

940-960: Input flexibility: HandleInput for input_df (Data | DataFrame, list) is aligned with PR goals

This improves usability by accepting Data directly and lists. Matches the described auto-conversion behavior.

1-1122: No outdated references detected; please run formatting and lint manually

Ripgrep search for KBIngestionComponent, KBRetrievalComponent, get_chroma_kb_data, chroma_kb_data, components.data.kb_retrieval, components.data.kb_ingestion, or kb_utils returned no matches.

The make format_backend and make lint commands weren’t available in this environment—please run your project’s formatting and lint steps (e.g. via your Makefile, npm scripts, or CI commands) to ensure code style consistency.

coderabbitai · 2025-08-26T16:36:12Z

src/backend/base/langflow/components/knowledge_bases/ingestion.py

+        HandleInput(
            name="input_df",
-            display_name="Data",
-            info="Table with all original columns (already chunked / processed).",
+            display_name="Input",
+            info=(
+                "Table with all original columns (already chunked / processed). "
+                "Accepts Data or DataFrame. If Data is provided, it is converted to a DataFrame automatically."
+            ),
+            input_types=["Data", "DataFrame"],
+            is_list=True,
            required=True,


⚠️ Potential issue

Input handling misses list[DataFrame] support; can crash downstream.

The HandleInput advertises input_types ["Data", "DataFrame"] with is_list=True, but build_kb_info only handles Data, list[Data], or single DataFrame. Passing list[DataFrame] leaves df_source as a Python list, breaking validation and ingestion.

Apply this diff to support list[DataFrame] (and mixed Data/DataFrame lists defensively):

@@ async def build_kb_info(self) -> Data:
-        if isinstance(self.input_df, Data):
-            df_source: pd.DataFrame = self.input_df.to_dataframe()
-        elif isinstance(self.input_df, list) and all(isinstance(item, Data) for item in self.input_df):
-            # If input_df is a list of Data objects, concatenate them into a single DataFrame
-            df_source: pd.DataFrame = pd.concat([item.to_dataframe() for item in self.input_df], ignore_index=True)
-        else:
-            df_source: pd.DataFrame = self.input_df
+        if isinstance(self.input_df, Data):
+            df_source: pd.DataFrame = self.input_df.to_dataframe()
+        elif isinstance(self.input_df, list):
+            # Accept lists of Data or DataFrame (or a mix); normalize to DataFrame
+            frames: list[pd.DataFrame] = []
+            for item in self.input_df:
+                if isinstance(item, Data):
+                    frames.append(item.to_dataframe())
+                elif isinstance(item, pd.DataFrame):
+                    frames.append(item)
+                else:
+                    msg = f"Unsupported input type in list: {type(item).__name__}. Expected Data or DataFrame."
+                    raise ValueError(msg)
+            if not frames:
+                raise ValueError("Empty input list provided for 'input_df'.")
+            df_source = pd.concat(frames, ignore_index=True)
+        elif isinstance(self.input_df, pd.DataFrame):
+            df_source = self.input_df
+        else:
+            msg = f"Unsupported input type for 'input_df': {type(self.input_df).__name__}. Expected Data or DataFrame."
+            raise ValueError(msg)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

+        HandleInput(
             name="input_df",
-            display_name="Data",
-            info="Table with all original columns (already chunked / processed).",
+            display_name="Input",
+            info=(
+                "Table with all original columns (already chunked / processed). "
+                "Accepts Data or DataFrame. If Data is provided, it is converted to a DataFrame automatically."
+            ),
+            input_types=["Data", "DataFrame"],
+            is_list=True,
             required=True,

+    async def build_kb_info(self) -> Data:
+        # … previous logic …
+        if isinstance(self.input_df, Data):
+            df_source: pd.DataFrame = self.input_df.to_dataframe()
+        elif isinstance(self.input_df, list):
+            # Accept lists of Data or DataFrame (or a mix); normalize to a single DataFrame
+            frames: list[pd.DataFrame] = []
+            for item in self.input_df:
+                if isinstance(item, Data):
+                    frames.append(item.to_dataframe())
+                elif isinstance(item, pd.DataFrame):
+                    frames.append(item)
+                else:
+                    msg = (
+                        f"Unsupported input type in list: {type(item).__name__}. "
+                        "Expected Data or DataFrame."
+                    )
+                    raise ValueError(msg)
+            if not frames:
+                raise ValueError("Empty input list provided for 'input_df'.")
+            df_source = pd.concat(frames, ignore_index=True)
+        elif isinstance(self.input_df, pd.DataFrame):
+            df_source = self.input_df
+        else:
+            msg = (
+                f"Unsupported input type for 'input_df': {type(self.input_df).__name__}. "
+                "Expected Data or DataFrame."
+            )
+            raise ValueError(msg)
+        # … subsequent logic …

🤖 Prompt for AI Agents

In src/backend/base/langflow/components/knowledge_bases/ingestion.py around lines 107-116, build_kb_info currently accepts ["Data","DataFrame"] with is_list=True but only handles list[Data] or single DataFrame; this misses list[DataFrame] and mixed lists causing df_source to remain a raw Python list. Fix by normalizing incoming values: if input is a list, iterate elements and classify each element as Data (leave as-is) or DataFrame (convert to Data by records dicts or collect DataFrames); if all elements are DataFrames, concat them into a single DataFrame and proceed down the single-DataFrame branch; otherwise convert DataFrame elements to Data and treat the whole list as list[Data] so existing list-Data handling works; add defensive type checks and clear error messages for unsupported element types.

edwinjosechittilappilly

Code LGTM

edwinjosechittilappilly · 2025-08-26T17:38:13Z

please check the Icons in template vs sidebar
@erichare

github-actions · 2025-08-26T17:39:58Z

Frontend Unit Test Coverage Report

Coverage Summary

Lines	Statements	Branches	Functions
	6.47% (1680/25935)	3.51% (690/19625)	3.47% (194/5584)

Unit Test Results

Tests	Skipped	Failures	Errors	Time
682	0 💤	0 ❌	0 🔥	11.501s ⏱️

codecov · 2025-08-26T17:40:17Z

Codecov Report

❌ Patch coverage is 62.50000% with 9 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-1.6.0@e423526). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...e/langflow/components/knowledge_bases/retrieval.py	41.66%	7 Missing ⚠️
...e/langflow/components/knowledge_bases/ingestion.py	83.33%	2 Missing ⚠️
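
The headline patch figure can be cross-checked against the per-file rows. The short calculation below infers each file's changed-line count from its missing-line count and reported percentage; those inferred totals are an assumption derived from the rounded percentages, not numbers stated in the report:

```python
# (missing lines, reported patch coverage %) per file, taken from the table above.
files = {"retrieval.py": (7, 41.66), "ingestion.py": (2, 83.33)}


def implied_total(missing: int, pct: float) -> int:
    """Infer the number of changed lines from missing lines and coverage %."""
    return round(missing / (1 - pct / 100))


totals = {name: implied_total(m, p) for name, (m, p) in files.items()}
total_lines = sum(totals.values())
total_missing = sum(m for m, _ in files.values())
patch_coverage = 100 * (total_lines - total_missing) / total_lines
```

Both files come out at 12 changed lines each, so 15 of 24 changed lines are covered, which agrees with the reported 62.50% and the 9 missing lines.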

Additional details and impacted files

@@               Coverage Diff                @@
##             release-1.6.0    #9543   +/-   ##
================================================
  Coverage                 ?   34.68%           
================================================
  Files                    ?     1209           
  Lines                    ?    57112           
  Branches                 ?     5419           
================================================
  Hits                     ?    19812           
  Misses                   ?    37156           
  Partials                 ?      144

Flag	Coverage Δ
backend	`56.19% <62.50%> (?)`
frontend	`5.81% <ø> (?)`

Files with missing lines	Coverage Δ
...gflow/base/knowledge_bases/knowledge_base_utils.py	`90.76% <ø> (ø)`
src/frontend/src/utils/styleUtils.ts	`49.09% <ø> (ø)`
...e/langflow/components/knowledge_bases/ingestion.py	`75.95% <83.33%> (ø)`
...e/langflow/components/knowledge_bases/retrieval.py	`73.98% <41.66%> (ø)`

sonarqubecloud · 2025-08-27T15:01:51Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.6% Duplication on New Code

See analysis details on SonarQube Cloud

…ent (#9415) * 🐛 (dataframe_operations.py): Fix bug in DataFrameOperationsComponent where "not contains" filter option was missing, causing incorrect filtering behavior. * [autofix.ci] apply automated fixes * Update pyproject versions * fix: Avoid namespace collision for Astra (#9544) * fix: Avoid namespace collision for Astra * [autofix.ci] apply automated fixes * Update Vector Store RAG.json * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Revert to a working composio release for module import (#9569) fix: revert to stable composio version * fix: Knowledge base component refactor (#9543) * fix: Knowledge base component refactor * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Update styleUtils.ts * Update ingestion.py * [autofix.ci] apply automated fixes * Fix ingestion of df * [autofix.ci] apply automated fixes * Update Knowledge Ingestion.json * Fix one failing test * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * Revert composio versions for CI * Revert "Revert composio versions for CI" This reverts commit 9bcb694. 
--------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Edwin Jose <[email protected]> Co-authored-by: Carlos Coelho <[email protected]> * fix: Fix env file handling in Windows build scripts (#9414) fix .env load on windows script Co-authored-by: Ítalo Johnny <[email protected]> * fix: update agent_llm display name to "Model Provider" in AgentComponent (#9564) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * 📝 (test_mcp_util.py): add a check to skip test if DeepWiki server is rate limiting requests to avoid false test failures * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Jordan Frazier <[email protected]> Co-authored-by: Eric Hare <[email protected]> Co-authored-by: Edwin Jose <[email protected]> Co-authored-by: Carlos Coelho <[email protected]> Co-authored-by: Ítalo Johnny <[email protected]>

* Update pyproject versions * fix: Avoid namespace collision for Astra (#9544) * fix: Avoid namespace collision for Astra * [autofix.ci] apply automated fixes * Update Vector Store RAG.json * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Revert to a working composio release for module import (#9569) fix: revert to stable composio version * fix: Knowledge base component refactor (#9543) * fix: Knowledge base component refactor * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Update styleUtils.ts * Update ingestion.py * [autofix.ci] apply automated fixes * Fix ingestion of df * [autofix.ci] apply automated fixes * Update Knowledge Ingestion.json * Fix one failing test * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * Revert composio versions for CI * Revert "Revert composio versions for CI" This reverts commit 9bcb694. --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Edwin Jose <[email protected]> Co-authored-by: Carlos Coelho <[email protected]> * fix: Fix env file handling in Windows build scripts (#9414) fix .env load on windows script Co-authored-by: Ítalo Johnny <[email protected]> * fix: update agent_llm display name to "Model Provider" in AgentComponent (#9564) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: use custom file handler on chat view, disable mcp_composer by default (#9550) * Use custom voice assistant on chat input * Changed mcp composer to default disabled --------- Co-authored-by: Carlos Coelho <[email protected]> * fix: Use the newest file component in Vector Store RAG Template (#9571) fix: Use newest file component in RAG * fix: AI/ML icon is missing (#9553) * refactor: clean up imports and improve code readability in AIML and FlowSidebar components - Organized import statements in 
aiml.py and index.tsx for better structure. - Enhanced formatting in aiml.py for the update_build_config method. - Updated nodeIconToDisplayIconMap in styleUtils.ts for consistency in AIML label. - Removed unnecessary console log in FlowSidebarComponent for cleaner code. * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Coelho <[email protected]> * fix: Allow updates to the file component in templates using it (#9572) fix: Allow updates to file component in templates * fix: Fixes filtering so legacy components aren't shown by default (#9575) fix filtering so legacy components aren't shown by default * fix: changed name on tool mode to slug, added close button to sidebar (#9589) * Changed Name to Slug, added Close button * Updated data test id * Tested closing the sidebar * fixed test * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: enhance scroll behavior on playground (#9586) * Added stick to bottom dependency * Removed scroll direction dependency * Added scroll to bottom action in voice assistant and chat input * Made messages occupy full width * Changed chat view to use StickToBottom instead of previous scroll handling mechanism * Delete unused chat scroll anchor * Set initial as instant * Update session name styling * fix: delete duplicate Serper api from google bundle (#9601) Deleted google serper api core * fix: allow deletion of mcp servers, add tests for mcp sidebar (#9587) * Added onDelete prop to sidebarDraggableComponent * Removed unused props * Added handleDeleteMcpServer * Add tests for on delete functionality, fixed linting errors * Format * Add test on mcp-server to test adding and deleting mcp server from sidebar * Adds data test id to delete select item * Changed data 
test id to not be mcp related * Added delete confirmation modal to mcp sidebar group * Changed test to contain modal * fix: change zoom in and out limit, create tests for zooming in and out, change zoom out logic in canvasControls (#9595) * Fix zoom out to 0.6 instead of 1.0 * remove min zoom on canvascontrolsdropdown, since it's enforced by reactflow * Changed min zoom to 0.25 and max zoom to 2.0 * Added tests for zoom in and zoom out in canvas controls dropdown * fix: Add localStorage persistence for feature toggles (#9597) * fix: Add help text to Lock Flow option (#9600) * fix: Add comprehensive tests and improve minimal condition logic (#9611) * fix: change icon color for mcp, remove color setting of icons (#9594) * Changed node icon to not have icon color * Added portion of test that checks if color is right for mcp component * Refactor nodeIcon * removed lucideIcon check for performance * Changed the test to evaluate color from computed style * fix: remove unsupported styling options from chats components (#9610) * fix: disable mcp auto install for not installed applications, refactor mcp-projects logic (#9599) * Add new available field to installed mcps * Disable auto install field when program not present * Refactor logic and get back the Available field for the installed * Added tooltip * Fixed linting * fix: Properly allow the non-specification of an OCR Engine (#9617) * fix: Properly allow no OCR engine * [autofix.ci] apply automated fixes * Set default to easyocr * Update docling_inline.py * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Support objects with data attribute in body processing (#9644) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Add comprehensive message sorting + tests (#9641) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Fix audio recording resource cleanup 
(#9623) * fix: Add voice mode availability detection (#9621) * fix: Remove formatting from agent input text content (#9638) * fix: added most important types at the beginning of the extensions array on File component (#9639) * Changed file type order * Changed starter projects that had the file component * order tooltip types alphabetically * changed order of text_file_types * Removed duplicate types * Changed starter projects that used past types * changed test * Fixed data test id * Changed test to expect correct types * fix: Include flow ID in webhook URLs (#9624) * fix(logger): add optional cache to configure; update caching behavior (#9532) * fix: update logger configuration to use environment variable for log level * fix: remove default log level configuration and set logger initialization * fix: enhance logger configuration to prevent redundant setup and improve cache handling * fix: improve cache handling in logger configuration to prevent unintended defaults * fix: enhance logger configuration to prevent redundant setup and improve early-exit logic * fix: remove defensive comment in logger configuration for clarity --------- Co-authored-by: Jordan Frazier <[email protected]> * fix: Update sidebar border styles (#9625) style fix * fix: Remove top padding from sidebar groups (#9636) Co-authored-by: Carlos Coelho <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: disable message editing on playground, fix new session not persisting after message is sent (#9662) * update session storage in the same useeffect as the refetchSessions * updated to send just the chat id * added useGetFlowId custom hook * updated places to use new currentFlowId hook * updated to use new id, to edit the message in the api and to set the flowId in the message * Restore current flow id from chat view * put on cell value changed only if it exists to enable read-only tables * removed call to backend when updating messages on 
playground * disable editing session view when on playground page * delete unused props, show edit only when not in playground * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: disable elevate edges on node select (#9658) disable elevate edges on select * fix: Properly respect the order parameter for Message History (#9605) * fix: Respect the order parameter for Message History * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Return multi-row dataframe when Structured Output data supports it (#9659) * fix: Return multi-row dataframe output in SO * [autofix.ci] apply automated fixes * Tool support in message schema --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: apply to fields in settings page (#9602) * fix: Segmented Sidebar switch to search on value change (#9615) * search icon selection behavior * switch to search on input change * unit test fix * test fix * update test * ✨ (frontend): add mock modules for @jsonquerylang/jsonquery and vanilla-jsoneditor packages 📝 (frontend): update test file to improve robustness and add debugging information for CI environment * [autofix.ci] apply automated fixes --------- Co-authored-by: cristhianzl <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: deprecate claude 3 sonnet model (#9622) * fix: Properly import Langchain ToolMessage for Message options (#9675) * fix: Properly import the ToolMessage from langchain * Update logger.py * Fix one line output * fix: fixed user settings test (#9690) Fixed userSettings test * fix: Remove warning log for unset TRACELOOP_API_KEY in configuration validation (#9536) fix: remove warning log for unset TRACELOOP_API_KEY in configuration validation * fix: knowledge base fixes for 1.6 pointing to release 
branch (#9683) * refactor: Improve code readability and organization in Knowledge Ingestion component - Reorganized import statements for better clarity. - Enhanced formatting of lists and function parameters for improved readability. - Removed unused parameters and streamlined the column configuration in the Knowledge Bases tab. - Updated JSON configuration for Knowledge Ingestion to reflect changes in code structure. These changes aim to enhance maintainability and readability of the codebase. * fix: Remove extraneous flag from package-lock and update column configuration in knowledge base - Removed the extraneous flag from the `@clack/prompts` dependency in `package-lock.json`. - Updated the `editable` property in the knowledge base columns configuration to `false`, enhancing the integrity of the data structure. * refactor: Update FlowToolbar and related components for improved API modal handling - Refactored FlowToolbar to replace openCodeModal with openApiModal for better clarity in modal management. - Updated FlowToolbarOptions to accept openApiModal and setOpenApiModal props, enhancing the component's flexibility. - Adjusted PublishDropdown to utilize the new API modal state, ensuring consistent behavior across the toolbar. - Cleaned up import statements for better organization and readability. * refactor: Clean up imports and streamline knowledge base column configuration - Reorganized import statements in KnowledgeBasesTab and knowledgeBaseColumns for improved clarity and consistency. - Removed unused parameters from the createKnowledgeBaseColumns function, simplifying its signature. - Adjusted column flex properties for better layout in the knowledge base table. - Enhanced overall readability and maintainability of the codebase. 
* [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: remove github link on discord button (#9655) * Fixed discord opening github * [autofix.ci] apply automated fixes * fixed mcp server tab test * Fixed flakiness on files test * fixed flaky file upload * Try to fix file upload component test --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Coelho <[email protected]> * fix: remove python code component, fix placeholder not appearing (#9660) * remove python code component and experimental category * Refactor code area modal test to use custom component * fixed placeholder not returning default value * fix: add margins to <p> tag in markdown (#9656) * Added margins to message paragraph * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: delete unused components, delete [deprecated] tag on the component title, add Replace and legacy tag functionality to components (#9645) * Remove deprecated components * Removed deprecated tags * removed mcp deprecated components * Add new applyComponentFilter * add replacement to node parameters * added filtercomponent to flow store * Added replacement to api class type * Made sidebar filter component more modular * remove unused props and pass props to filtercomponent * Apply component filters and get name and description for filter * Add resetting to handle and page * Added types to sidebar header * Added legacy and replacement to node description, activate component filter on click * updated sidebar header test * format test * update sidebar filter component test to match current behavior * Refactor to allow multiple replacements * removed legacy from node description * added dismissed legacy 
nodes * removed unused props * add node legacy component * changed replacement type to list * Instantiate nodelegacycomponent on generic node when component is legacy * Add components filtering in nodelegacycomponent * added replacement instead of display name * Added legacy tag to component name * Add replacement to some components * Added replacements to majority of legacy components * Made component name not be capitalized * fixed bundles not appearing at component filter * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) * added replacement for crew ai --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Coelho <[email protected]> * fix: Ensure correct Docling Remote URL for API (#9708) fix: Correct url for docling remote * feat: remove agent dual output (#9700) * remove agent dual output * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * refactor: Agent component enhancements for release v1.6 (#9685) * refactor: improve code structure and add NodeDrawer component - Refactored import statements for better organization in agent.py and dropdownComponent. - Enhanced the AgentComponent's description and memory_inputs formatting for clarity. - Introduced a new NodeDrawer component for improved UI handling in the dropdown. - Updated Dropdown component to integrate NodeDrawer functionality, allowing for side panel interactions. * refactor: simplify NodeDrawer component and enhance Dropdown integration - Removed unnecessary props from NodeDrawer, streamlining its interface. - Updated the Dropdown component to improve the integration of NodeDrawer, ensuring better handling of side panel interactions. - Refactored the NodeDrawer's structure for improved readability and maintainability. 
* fix * refactor: enhance Dropdown and input components with externalOptions support - Updated Dropdown and related components to incorporate externalOptions for improved flexibility. - Refactored input classes to maintain consistent formatting and readability. - Removed deprecated dialogInputs functionality in favor of the new externalOptions structure. * fix: reorganize imports after cherry-pick resolution * refactor: enhance Dropdown component with loading state and source options - Introduced a loading state to the Dropdown component to indicate when a response is awaited. - Updated the logic to utilize sourceOptions instead of dialogInputs for better clarity and maintainability. - Refactored the rendering of options and associated UI elements to improve user experience. * refactor: improve Dropdown component structure and styling - Cleaned up import statements for better organization. - Enhanced the loading state display and adjusted the layout for better user experience. - Updated styling for CommandItem components to ensure consistent padding and font weight. - Refactored option rendering logic for improved clarity and maintainability. * refactor: reorganize imports and adjust Dropdown component behavior - Moved import statements for better clarity and organization. - Commented out the setOpen function call to modify Dropdown behavior when dialog inputs are present. * refactor: enhance Dropdown component functionality and logging - Removed unnecessary console log for source options. - Introduced handleSourceOptions function to streamline value handling and state management. - Updated onSelect logic to utilize handleSourceOptions for improved clarity and functionality. * refactor: enhance Dropdown component with flow store integration - Added useFlowStore to manage node state within the Dropdown component. - Introduced a new handleSourceOptions function to streamline value handling and API interaction. 
- Updated onSelect logic to ensure proper value handling when selecting options. * refactor: Update agent component to support custom model connections - Changed the agent component's dropdown input to allow selection of "connect_other_models" for custom model integration. - Enhanced the dropdown options and metadata for better user guidance. - Updated the build configuration to reflect these changes and ensure proper input handling. * refactor: Reorganize imports and enhance dropdown component logic - Moved and re-imported necessary dependencies for clarity. - Updated dropdown rendering logic to improve handling of selected values and loading states. - Ensured compatibility with agent component requirements by refining option checks. * small fix and revert * refactor: Clean up imports and improve dropdown component styling - Removed duplicate imports for PopoverAnchor and Fuse. - Simplified class names in the dropdown component for better readability. - Adjusted layout properties for improved visual consistency. * refactor: Enhance dropdown component functionality and clean up imports - Reorganized imports for better clarity and removed duplicates. - Implemented a new feature to handle "connect_other_models" option, improving the dropdown's interaction with flow store and types store. - Added logic to manage input types and display compatible handles, enhancing user experience. - Updated utility functions for better integration with the dropdown component. * style: format options_metadata in agent component * refactor: Update import statements in starter project JSON files and adjust proxy settings in frontend configuration - Refactored import statements in multiple starter project JSON files to improve readability by breaking long lines. - Changed proxy settings from "http://localhost:7860" to "http://127.0.0.1:7860" in frontend configuration files for consistency and to avoid potential issues with localhost resolution. 
* [autofix.ci] apply automated fixes * revert and fix * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * fixed dropdown * [autofix.ci] apply automated fixes * kb clean up * [autofix.ci] apply automated fixes (attempt 2/3) * update to templates with display name change --------- Co-authored-by: Edwin Jose <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * feat: mcp composer integration (#9506) * encrypt oauth auth settings at rest * [autofix.ci] apply automated fixes * Fix rebase changes and add env to env server config * Correctly unmask secretstr before encryption * update mcp-composer args * [autofix.ci] apply automated fixes * ruff * ruff * ruff * [autofix.ci] apply automated fixes * ruff * catch invalidtoken error * ruff * [autofix.ci] apply automated fixes * ruff * [autofix.ci] apply automated fixes * ruff * ruff * [autofix.ci] apply automated fixes * ruff * [autofix.ci] apply automated fixes * fix test * Add initial mcp composer service and startup * remove token url * Register server on project creation * WARN: fall back to superuser on no auth params, to allow mcp-composer to connect. 
also fixes race condition in server creation * update sse url args * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Add langflow api keys to the server configs * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * add port searching * [autofix.ci] apply automated fixes * Fix for dead servers - use devnull on subprocess to avoid pipe from filling up * uvlock * [autofix.ci] apply automated fixes * Update composer startup behavior re: auth settings * [autofix.ci] apply automated fixes * fix some auth logic, add dynamic fetch of new url * Clean up sse-url parameters * [autofix.ci] apply automated fixes * Only call composer url when composer is enabled * [autofix.ci] apply automated fixes * improve shutdown * starter projects update * [autofix.ci] apply automated fixes * update logging git push * revert hack to auth mcp composer * [autofix.ci] apply automated fixes * Fix 500 on composer-url query * [autofix.ci] apply automated fixes * Update feature flag; update api key addition to auto-install * [autofix.ci] apply automated fixes * Fix composer url and re-add auth * Changed needs_api_key logic * Refactor use-get-composer-url * remove python fallback for now, then pipe stderr to pipe * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Changed api key logic to allow connection if not api key and auto login is off * fix oauth addition to cmd * restart server when auth values change * Restart server on oauth values changes * [autofix.ci] apply automated fixes * Changed project port to be the same as OAuth port * Changed endpoint to provide port_available * add is_port_available prop * Added port_available to request * Edit mutation to not have linting errors * Added port not available state to authentication * [autofix.ci] apply automated fixes * Added port and host to get composer url * Invalidate project composer url queries * Changed to display port and host 
that is not running * Cleanup old port finding and some mypy fixes * Add print, remove unused env var * Use mcp-composer directly in client and a lot of fixes * changed starter projects * refactor mcp_projects to always use IP generated for WSL * changed to check args -4 too on installed servers * changed to just check if sse url is in args * added member servers in gitignore * add check for ff * Handle secret request response cycle securely and add better logging * Use async logger * Add decorator to check if composer is enabled in settings * more logging changes * Much better handling of existing oauth servers when the flag is disabled on restart * Reset oauth projects to apikey or none when composer flag is disabled * fix url for api key auth * Fix auth check; set project auth to api key when auto login disabled * Ruff, comments, cleanup * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Consolidate the auth handling since it's used in two endpoints * [autofix.ci] apply automated fixes * Ruff * [autofix.ci] apply automated fixes * last ruff * Update FE env var naming and don't unnecessarily decrypt auth settings at times * update feature flag usage - remove mcp composer * [autofix.ci] apply automated fixes * Update timeout methods to have more reliable startup * more feature flag changes * Attempt to extract helpful user messages * [autofix.ci] apply automated fixes * Added loading on mcp server tab auth * Changed to load on start too * cleanup mcp composer on project deletion * [autofix.ci] apply automated fixes * remove nested retry mech * Ruff * lint * Fix unit tests * [autofix.ci] apply automated fixes * ruff * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Edwin Jose <[email protected]> Co-authored-by: Lucas Oliveira <[email protected]> Co-authored-by: Mike Fortman <[email protected]> * fix: Adjust padding and layout for 
improved spacing (#9711) Co-authored-by: Carlos Coelho <[email protected]> * fix: remove Groq from Agents model provider list (#9616) Co-authored-by: Carlos Coelho <[email protected]> Co-authored-by: Deon Sanchez <[email protected]> * fix: conditional scheduling logic to not run branch (#9722) * Use separate conditional router flag to check if-else branch execution * clean comments * [autofix.ci] apply automated fixes * Ruff --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: disable keys when flow is locked (#9726) * Disable keys when isLocked * Disabled adding and deleting nodes when flow is locked * fix: superuser credential handling and AUTO_LOGIN security (#9542) * Refactor superuser credential handling for security * [autofix.ci] apply automated fixes * refactor: enhance superuser credential handling in setup process (#9574) * [autofix.ci] apply automated fixes * refactor: streamline superuser flow tests and enhance error handling (#9577) * [autofix.ci] apply automated fixes * None Super user is not allowed hence for a valid string resetting it to * None Super user is not allowed hence for a valid string resetting it to "" * use secret str for password everywhere * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Jordan Frazier <[email protected]> * chore: Update version to 1.6.0 in package files (#9746) fix: update version to 1.6.0 in package.json and package-lock.json * fix: update logs position to be absolute (#9724) * removed left auto from log canvas controls * made initial position be fetched from event for notes * added relative class and put shadow box outside of the div that contains reactflow --------- Co-authored-by: Carlos Coelho <[email protected]> * fix: make entire playground chat area be clickable (#9721) * Add stop propagation to chat input buttons * Made entire area focus chat when clicked * fix: 
Restore old description from file description. (#9752) fix: Restore file component description * fix: Preserve flows and components during project updates (#9750) * 📝 (projects.py): add logic to separate flows and components from a single query result for better organization and readability 🐛 (projects.py): fix logic to correctly exclude project flows from the list of excluded flows * ✨ (test_projects.py): add tests to ensure project renaming preserves associated flows and components * 📝 (projects.py): remove unnecessary comment to improve code readability and maintainability * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix(langwatch): prevent trace errors with proper API key validation (#9720) * fix(URLComponent): filter out `None` in headers to avoid silent serialization errors (#9596) fix: Filter out None values from headers in URLComponent * fix: put knowledge bases under feature flag (#9749) Added enable_knowledge_bases feature flag everywhere it's been used * refactor: Padding misaligned for sidebar icons and other issues fix for 1.6 (#9713) * sidebar fixes * [autofix.ci] apply automated fixes * refactor: update FlowMenu and CanvasControlsDropdown styles, enhance MemoizedCanvasControls with flow lock status * feat: add 'Sticky Notes' functionality to sidebar and enhance note handling - Introduced a new 'add_note' section in the sidebar for adding sticky notes. - Implemented event listeners to manage the add-note flow, allowing for better integration with the sidebar. - Updated styles and structure in various components to accommodate the new feature. - Refactored existing components for improved organization and readability. 
* fix: adjust button height in FlowSidebarComponent for improved UI consistency * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Updated Agent Starter Projects with new Templates (#9772) * updates templates * fix formatting * fix: remove agents from skipped components list in setup (#9785) * fix: improve error handling for missing OCR dependencies (#9753) * fix(docling): improve error handling for missing OCR dependencies - Add DoclingDependencyError custom exception - Detect specific OCR engine installation issues - Provide clear installation instructions to users - Suggest OCR disable as alternative solution - Fail fast when dependencies are missing Fixes issue where users received unclear error messages when OCR engines like ocrmac, easyocr, tesserocr, or rapidocr were not properly installed. * fix: prevent missing clean_data attribute error * chore: update starter_projects * [autofix.ci] apply automated fixes * refactor(docling): update dependency error messages to use uv and suggest complete install Address code review feedback by using 'uv pip install' and offering langflow[docling] as alternative * refactor(docling): simplify worker error handling per code review --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: adjust casing on Add MCP Server buttons (#9774) * Added span to buttons to not remove casing * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: set gpt4.1 as default model (#9780) * feat: double clicking a component will add it to the canvas (#9730) * fix * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: release branch tests (#9820) * Fix auth tests * [autofix.ci] apply automated fixes * Modularized lock flow * Added more assertions * t 
Fixed flow lock test * Fixed sticky notes test * Removed warning on mcpsidebargroup test * fixed warnings on sidebar footer buttons * fixed sidebar footer buttons and test * Re-added padding * Fixed sidebar segmented nav test * fixed searchconfigtrigger * Fixed sidebar header test * fixed sidebar items list test * Fixed flow lock test * fixed sticky notes test * Fixed sidebarfooterbuttons test * Revert fix * Fixed test_refresh_starter_projects * attempt to fix fe merge conflicts * [autofix.ci] apply automated fixes * lint * [autofix.ci] apply automated fixes * mypy * Improve sorting of sender messages * ruff * starter projects --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Lucas Oliveira <[email protected]> * fix: Properly serialize documents for Graph RAG in Astra (#9777) fix: Serialize documents for graph RAG * fix: Standardize content dict format for LLM provider compatibility (#9745) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Coelho <[email protected]> Co-authored-by: Jordan Frazier <[email protected]> Co-authored-by: Lucas Oliveira <[email protected]> * fix: make components added with filter come with the output pre-selected, fix Agent filtering (#9787) * Added output selection if there is a filter and the output exists * Fixed wrong pseudo source handle * Removed handle dragging --------- Co-authored-by: Cristhian Zanforlin Lousa <[email protected]> Co-authored-by: Jordan Frazier <[email protected]> * fix: make mcp tools refresh when changing server with the same name (#9778) * Delete cache when editing servers * Add check to refresh tools if the cache is changed from the current tools * Updated mcp to only delete from cache if it exists there * Added test to tools refresh functionality * remove tools count from updated server * [autofix.ci] apply automated fixes --------- Co-authored-by: Carlos Coelho <[email protected]> 
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Jordan Frazier <[email protected]> * fix: remove extra spaces from playground, added separator (#9779) Adjusted margins for items in playground * fix: Update message schema tests for image_url structure (#9823) * 📝 (test_schema_message.py): update image content type to "image_url" for consistency and clarity in message schema ♻️ (test_schema_message.py): refactor image URL handling to improve readability and maintainability of the code * [autofix.ci] apply automated fixes * fixed mcp sidebar * revert sidebar change * Fixed test --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Lucas Oliveira <[email protected]> * fix: Restore Embedding Model Connections in Vector Store RAG (#9776) fix: Restore Embedding model connections * fix: update CORS configuration and add env vars to allow for user control (#9744) * Update CORS configuration and add env vars to allow for user control * Add unit test * fix: add noqa comment for linting in create_refresh_token function Added a noqa comment to suppress linting warnings for the token type check in the create_refresh_token function, ensuring code clarity while maintaining compliance with linting standards. * refactor: enhance CORS tests with temporary directory usage and improve exception handling Updated unit tests for CORS configuration to utilize a temporary directory for environment variables, ensuring isolation during tests. Improved exception handling in refresh token tests to raise HTTPException with appropriate status codes and messages. This enhances test reliability and clarity. * fix: enhance CORS origin validation to ensure consistent list output Updated the validate_cors_origins method to convert a single origin string into a list for consistency, in addition to handling comma-separated strings. 
This change improves the robustness of CORS configuration handling. * refactor: update CORS test cases to ensure single origin is consistently treated as a list Modified unit tests for CORS configuration to assert that single origin strings are converted to lists, enhancing consistency in CORS origin handling. This change aligns with recent updates to the CORS validation logic. * re-add uvlock * [autofix.ci] apply automated fixes * feat: revert CORS changes and add deprecation warnings for v1.7 - Revert CORS behavior to maintain backward compatibility until v1.7 - Add CORS configuration fields with current defaults (*, True, *, *) - Implement deprecation warnings for upcoming v1.7 security changes - Preserve existing behavior while enabling early configuration adoption * [autofix.ci] apply automated fixes * Add back imports * Fix tests * [autofix.ci] apply automated fixes * Ruff * Ruff --------- Co-authored-by: Gabriel Luiz Freitas Almeida <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: italojohnny <[email protected]> Co-authored-by: Lucas Oliveira <[email protected]> * fix: handle pandas Series in get_message boolean evaluation (#9765) * fix: handle pandas Series in get_message boolean evaluation Resolves ValueError when message is pandas Series by checking .empty instead of relying on ambiguous truth value evaluation. 
* fix: mypy error * fix: mypy error * [autofix.ci] apply automated fixes --------- Co-authored-by: Carlos Coelho <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Add VLM support for Docling and improve deps (#9757) * fix: Support the VLM pipeline in docling * fix: Add VLM support and opencv dep * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Update comments and fix ruff errors * [autofix.ci] apply automated fixes * Hide OCR engine when VLM selected * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Fix VLM pipeline * [autofix.ci] apply automated fixes * Add type annotation for visited and excluded * [autofix.ci] apply automated fixes * Small fix for the templates * Update Vector Store RAG.json --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Jordan Frazier <[email protected]> * fix: make cursor position not go to end on input list component (#9782) * added new cursor input with cursor handling to input list component * Change to use cursor input on input list component * Test cursor position preservation on inputlistcomponent --------- Co-authored-by: Carlos Coelho <[email protected]> Co-authored-by: Jordan Frazier <[email protected]> * fix: remove metadata building to speed up load times (#9819) Co-authored-by: Jordan Frazier <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: turn mcp composer feature on by default on frontend (#9831) fix mcp composer not on by default * fix: make astra db component not disconnect, fix handle color when type is undefined (#9829) * remove hidden from vector store * Added vector store color * Added fallback for when the connected color is not found * Changed starter projects * fixed null check * fix: Handle ImportError for parse_api_endpoint and remove debug logging 
(#9830) * fix: remove debug logging for processed modules in _process_single_module * fix: move parse_api_endpoint import inside try-except block to handle ImportError gracefully * fix: make session_id visible in Message History Retrieve mode (#9557) * fix: make session_id visible in Message History Retrieve mode - Added session_id to default_keys to make it visible by default - Added session_id to Retrieve mode config to show it when mode is selected - Kept advanced=True as requested to maintain field categorization - Fixes issue where session_id input was hidden preventing dynamic session retrieval Fixes #9464 * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * Fix api key detection for skip test * Oops. Remove print --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Lucas Oliveira <[email protected]> Co-authored-by: Jordan Frazier <[email protected]> Co-authored-by: Jordan Frazier <[email protected]> * feat: Enhance logging configuration with caller information and conditional callsite data (#9747) * fix: enhance logging configuration to include caller information and improve log level handling * fix: add caller information to logging configuration * fix: conditionally add callsite information to logging configuration based on DEV environment * fix: update DEV variable initialization to use environment variable * fix: update logging configuration to include callsite fields in DEV environment * mypy * [autofix.ci] apply automated fixes * mypy --------- Co-authored-by: Jordan Frazier <[email protected]> Co-authored-by: Jordan Frazier <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * ref: update auto login behavior to use secure defaults (#9825) * Update auto login behavior to use secure defaults; remove skip auth option * Add warning back * ordering of params * skip unimplemented cors 
tests * fix: make mcp tools update when auth settings is null (#9844) * allow trailing and starting _ * Fixed selection * fixed auth settings being called when it's null * fix: adjust tools title and description padding (#9847) * Fixed padding on tools table * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Apply a per-user uniqueness for mcp (#9840) * fix: Apply a per-user uniqueness for mcp * Test updates * [autofix.ci] apply automated fixes * Fix tests * Update mcp.py * Update 1cb603706752_modify_uniqueness_constraint_on_file_.py * Update 1cb603706752_modify_uniqueness_constraint_on_file_.py * Update 1cb603706752_modify_uniqueness_constraint_on_file_.py * Update 1cb603706752_modify_uniqueness_constraint_on_file_.py --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: make file button clickable after focusing on chat (#9863) Fixed file button unclickable * fix: make input clickable on chat (#9864) Fixed not able to select text * test: Increase loading performance test timeout (#9873) * Revert "ref: update auto login behavior to use secure defaults (#9825)" This reverts commit 567d0fa. 
* refactor: update skip_auth_auto_login behavior and update messaging timelines for removal (#9874)
* Update auto login warnings to reflect new timelines
* Add regex match to test
* [autofix.ci] apply automated fixes
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* fix: Remove the uniqueness constraint on file names (#9872)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* Move mcp composer service to lfx
* MCP Composer service factory fixes
* Add back script changes for lfx
* Clean up mcp project imports
* Update d37bc4322900_drop_single_constraint_on_files_name_.py
* Fix mcp imports
* Update CORS deprecation notice
* Add back the external options in dropdown input
* Add back starter projects
* Remove old components/data init file and update starter projects
* remove todo
* Re-adds the clean_data param for chat output
* [autofix.ci] apply automated fixes
* Update starter projects with clean_data
* Update oauth timeouts and fix socket binding check
* Fix import and main duplicate code
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* Import and test fixes
* [autofix.ci] apply automated fixes
* Temporarily skipping knowledge bases import issue to allow other CI tests to run
* Add python function to lfx test
* fixed typesStore test
* Ruff and image unit test fixes
* [autofix.ci] apply automated fixes
* 📝 (image.py): Update function documentation to reflect changes in the content dictionary structure ♻️ (test_schema_data.py): Refactor tests to use "image" type instead of "image_url" and remove unnecessary checks for "source_type" in the content dictionary
* Knowledge bases update for merge
* Update __init__.py
* Revert the changes to package lock
* Fix kb tests to use client fixture instead of mocks
* Clean up mcp init task
* Fix other kb tests
* fixed text file typs with repeated data
* [autofix.ci] apply automated fixes
---------
Co-authored-by: Eric Hare <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Edwin Jose <[email protected]>
Co-authored-by: Carlos Coelho <[email protected]>
Co-authored-by: Cristhian Zanforlin Lousa <[email protected]>
Co-authored-by: Ítalo Johnny <[email protected]>
Co-authored-by: Lucas Oliveira <[email protected]>
Co-authored-by: Deon Sanchez <[email protected]>
Co-authored-by: Mike Fortman <[email protected]>
Co-authored-by: Gabriel Luiz Freitas Almeida <[email protected]>
Co-authored-by: Lucas Oliveira <[email protected]>
Co-authored-by: VICTOR CORREA GOMES <[email protected]>

fix: Knowledge base component refactor

2a0dabd

erichare requested review from edwinjosechittilappilly and rodrigosnader August 26, 2025 16:21

github-actions bot added the bug label (Something isn't working) Aug 26, 2025

[autofix.ci] apply automated fixes

78cd280