@lucaseduoli (Collaborator) commented Sep 25, 2025

This pull request introduces major improvements to the Knowledge Base (KB) and Variable management systems, focusing on making KB vector store selection user-configurable via the database, enhancing metadata extraction, and extending the Variable API with category support. The changes increase flexibility for multi-user and multi-provider environments and improve API robustness.

Knowledge Base Vector Store and Metadata System Overhaul:

  • Introduced a new vector_store_factory module that builds KB vector store instances (e.g., Chroma, OpenSearch) based on per-user configuration stored in the database, using a protocol and adapter pattern for extensibility. This enables dynamic, user-specific backend selection.
  • Refactored KB metadata extraction to use the new factory and adapter system, allowing provider-specific metadata extraction and ensuring the correct vector store is used for each user. The get_kb_metadata function is now async and receives user_id to support this. [1] [2] [3] [4] [5]
  • Updated all usages of KB metadata extraction to pass the current user's ID, ensuring correct configuration is applied. [1] [2]
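
The protocol, adapter, and factory arrangement described in this list can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the adapter classes keep texts in memory, the `USER_KB_CONFIG` dictionary stands in for the per-user settings that the PR reads from the database, and the `build_kb_vector_store` signature and the fallback to Chroma are assumptions.

```python
from dataclasses import dataclass, field
from typing import Protocol


class VectorStoreProtocol(Protocol):
    """Minimal interface each KB vector store adapter is assumed to expose."""

    def add_texts(self, texts: list[str]) -> None: ...

    def count(self) -> int: ...


@dataclass
class InMemoryChromaAdapter:
    """Hypothetical stand-in for a Chroma adapter; keeps texts in a list."""

    persist_directory: str
    _texts: list[str] = field(default_factory=list)

    def add_texts(self, texts: list[str]) -> None:
        self._texts.extend(texts)

    def count(self) -> int:
        return len(self._texts)


@dataclass
class InMemoryOpenSearchAdapter:
    """Hypothetical stand-in for an OpenSearch adapter."""

    url: str
    index_name: str
    _texts: list[str] = field(default_factory=list)

    def add_texts(self, texts: list[str]) -> None:
        self._texts.extend(texts)

    def count(self) -> int:
        return len(self._texts)


# Assumed shape of per-user settings; the PR stores these in the database.
USER_KB_CONFIG = {
    "user-a": {"provider": "chroma", "persist_directory": "/tmp/kb-a"},
    "user-b": {
        "provider": "opensearch",
        "url": "http://localhost:9200",
        "index_name": "kb-b",
    },
}
DEFAULT_CONFIG = {"provider": "chroma", "persist_directory": "/tmp/kb-default"}


def build_kb_vector_store(user_id: str) -> VectorStoreProtocol:
    """Pick an adapter from the user's stored configuration."""
    config = USER_KB_CONFIG.get(user_id, DEFAULT_CONFIG)
    provider = config["provider"]
    if provider == "chroma":
        return InMemoryChromaAdapter(persist_directory=config["persist_directory"])
    if provider == "opensearch":
        return InMemoryOpenSearchAdapter(
            url=config["url"], index_name=config["index_name"]
        )
    raise ValueError(f"Unsupported KB provider: {provider!r}")


store = build_kb_vector_store("user-b")
store.add_texts(["first chunk", "second chunk"])
```

Callers depend only on `VectorStoreProtocol`, so supporting another backend would mean adding one adapter and one branch in the factory.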

Variable API and Database Enhancements:

  • Added a new Alembic migration that introduces a non-nullable category column to the variable table, defaulting to "global". This supports categorizing variables (e.g., for KB configuration).
  • Extended the Variable API to allow filtering variables by category, including input validation against a list of valid categories. [1] [2]
  • Updated variable creation to support the new category field.
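
As a rough illustration of category filtering with validation, the sketch below works on plain dictionaries rather than the real `Variable` model. `CATEGORY_KB` and `VALID_CATEGORIES` are named in this PR, but the string value `"kb"`, the `CATEGORY_GLOBAL` constant, and both helper functions are assumptions made for the example.

```python
CATEGORY_GLOBAL = "global"  # default category named in the migration
CATEGORY_KB = "kb"  # assumed string value for the KB category
VALID_CATEGORIES = [CATEGORY_GLOBAL, CATEGORY_KB]


def validate_category(category: str) -> str:
    """Reject categories that are not in the predefined list."""
    if category not in VALID_CATEGORIES:
        raise ValueError(
            f"Invalid category {category!r}; expected one of {VALID_CATEGORIES}"
        )
    return category


def filter_by_category(variables: list[dict], category: str) -> list[dict]:
    """Return only the variables that belong to the requested category."""
    validate_category(category)
    # Variables without an explicit category are treated as "global".
    return [v for v in variables if v.get("category", CATEGORY_GLOBAL) == category]


variables = [
    {"name": "KB_PROVIDER", "category": "kb"},
    {"name": "OPENAI_API_KEY"},
]
kb_variables = filter_by_category(variables, "kb")
```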

Dependency and Import Updates:

  • Removed direct imports of Chroma from files that now rely on the new vector store factory, ensuring all vector store interactions go through the new abstraction layer. [1] [2]

These changes lay the groundwork for supporting multiple KB backends per user and provide a more robust, extensible foundation for knowledge base and variable management.

Summary by CodeRabbit

  • New Features

    • Added Knowledge Base settings page to configure vector store (Chroma or OpenSearch) with per-user persistence.
    • Introduced variable categories and a new endpoint to fetch variables by category, plus a frontend hook to use it.
  • Improvements

    • Per-user Knowledge Base vector stores with more robust ingestion, retrieval, and metadata extraction; metadata fetching is now async.
    • Faster LLM loading via configuration caching and support for user-specific overrides.
  • Database

    • Variables now support categories (auto-migrated; no action required).
  • Tests

    • Added comprehensive unit tests for KB settings, vector stores, metadata, and LLM loading.

- Introduced a new migration script to add a 'category' column to the 'variable' table.
- Updated the VariableBase model to include a 'category' field with a default value and validation against predefined categories.
- Enhanced the VariableCreate and VariableUpdate models to accommodate the new 'category' field.
- Added constants for category types and valid categories in the constants module.
- Introduced new module for loading chat models based on user settings.
- Implemented caching for initialized chat models to optimize performance.
- Added utility functions for saving and loading LLM settings from the database.
- Enhanced documentation for new functions to ensure clarity and usability.
- Added a new endpoint to retrieve variables by category, improving organization and access.
- Updated the DatabaseVariableService to handle category assignments for variables.
- Introduced utility methods for saving and retrieving LLM settings, allowing for better integration with user preferences.
- Enhanced validation for category input to ensure compliance with predefined categories.
- Added an asynchronous method to set the LLM based on user ID and session context.
- Introduced a new generate method to produce responses from the LLM, handling both string inputs and Message objects.
- Enhanced error handling for uninitialized LLM and unsupported input types.
- Created a new test file for unit tests related to LLM functionality.
- Introduced a new test file for unit tests related to loading LLM instances.
- Implemented tests for various scenarios including missing API keys, caching behavior, and concurrent calls.
- Added integration tests to verify model loading with real API keys for OpenAI and Anthropic providers.
- Ensured proper handling of settings and parameters during model initialization.
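
The caching and concurrent-call behaviour mentioned in the commits above could look something like the sketch below. It is an assumption-laden illustration: `ChatModelCache`, its `factory` callable, and the `(provider, model)` cache key are hypothetical names, and a single lock is used for simplicity even though it serializes loads for unrelated keys.

```python
import asyncio
from typing import Any, Callable


class ChatModelCache:
    """Build each (provider, model) pair once, even under concurrent calls."""

    def __init__(self, factory: Callable[[str, str], Any]) -> None:
        self._factory = factory
        self._models: dict = {}
        self._lock = asyncio.Lock()
        self.builds = 0  # counts how often the factory actually ran

    async def get(self, provider: str, model: str) -> Any:
        key = (provider, model)
        async with self._lock:
            if key not in self._models:
                self.builds += 1
                self._models[key] = self._factory(provider, model)
            return self._models[key]


async def demo() -> int:
    # The factory returns a placeholder object instead of a real chat model.
    cache = ChatModelCache(lambda provider, model: object())
    results = await asyncio.gather(
        *(cache.get("openai", "some-model") for _ in range(5))
    )
    # Every caller receives the same cached instance.
    assert all(result is results[0] for result in results)
    return cache.builds
```

Running `asyncio.run(demo())` issues five concurrent requests for the same key and reports how many times the factory was invoked.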
…ariable posting

- Introduced a new hook, `useGetVariablesByCategory`, to fetch variables based on the specified category, improving data organization and access.
- Updated the `usePostGlobalVariables` function to include an optional `category` parameter, allowing for more flexible variable management.
- Enhanced error handling and type definitions for better integration with Axios and React Query.
- Introduced a new LLMSettingsPage component for configuring Large Language Model settings.
- Updated the settings navigation to include a link to the new LLM settings page.
- Enhanced the settings page with form handling for provider, model, base URL, and API key, utilizing React Hook Form for state management.
- Implemented asynchronous saving of settings with error handling and user feedback through alerts.
- Added validation to ensure that required fields (provider, model, api_key) are present in the LLM settings.
- Introduced a clear error message for users when any of the required fields are missing, improving user experience and robustness of the service.
- Introduced `get_global_llm` method to fetch the global LLM instance asynchronously.
- Enhanced the component's ability to manage LLM initialization and retrieval, improving overall performance and usability.
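
A minimal sketch of the required-field check described above, assuming the settings arrive as a plain dictionary. The function name `validate_llm_settings` and the exact wording of the error message are hypothetical; only the three field names come from the commit message.

```python
REQUIRED_FIELDS = ("provider", "model", "api_key")


def validate_llm_settings(settings: dict) -> dict:
    """Raise a descriptive error when a required field is missing or empty."""
    missing = [name for name in REQUIRED_FIELDS if not settings.get(name)]
    if missing:
        raise ValueError(
            "LLM settings are missing required fields: " + ", ".join(missing)
        )
    return settings


valid = validate_llm_settings(
    {"provider": "openai", "model": "some-model", "api_key": "secret"}
)
```
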
…ition

- Corrected the down_revision identifier in the migration script to ensure proper migration order.
- Maintained consistency with the latest migration changes, enhancing the robustness of the migration process.
Removed the `_llm` attribute from the `Component` class, which no longer needs it.
Introduced an abstract method `get_by_category` in the `VariableService` class for asynchronously retrieving a user's variables in a specified category.
Added a new category, `CATEGORY_KB`, to the `VALID_CATEGORIES` list in `constants.py` so that variables can be categorized as Knowledge Base settings.
Updated the Alembic migration script to add a 'category' column to the 'variable' table as non-nullable with a default value of 'global'. If a nullable 'category' column already exists, the migration first replaces its NULL values and then alters the column to be non-nullable.
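
The two paths of that migration can be illustrated with plain SQL through the standard-library `sqlite3` module. This is not the Alembic script from the PR: the helper `ensure_category_column` is hypothetical, and backfilling NULLs with 'global' is an assumption based on the column default.

```python
import sqlite3


def ensure_category_column(conn: sqlite3.Connection) -> None:
    """Add the category column, or backfill it if it already exists."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(variable)")}
    if "category" not in columns:
        # Fresh path: add the column as NOT NULL with a default.
        conn.execute(
            "ALTER TABLE variable ADD COLUMN category TEXT NOT NULL DEFAULT 'global'"
        )
    else:
        # Existing nullable column: backfill NULLs first.
        conn.execute("UPDATE variable SET category = 'global' WHERE category IS NULL")
        # An Alembic migration would follow this with
        # op.alter_column("variable", "category", nullable=False);
        # SQLite has no ALTER COLUMN, so that step is omitted here.
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO variable (name, category) VALUES (?, ?)",
    [("A", None), ("B", "kb")],
)
ensure_category_column(conn)
rows = conn.execute("SELECT name, category FROM variable ORDER BY id").fetchall()
```

Backfilling before tightening the constraint matters because making a column non-nullable fails on databases that enforce the constraint while NULL rows remain.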
…ependencies

Removed several deprecated and unused dependencies from package-lock.json, trimming the dependency tree and reducing its exposure to potential security vulnerabilities.
…configuration

Replaced the LLMSettingsPage on the settings page with a new KBSettingsPage, and updated routing and UI elements to point to the Knowledge Base settings.
Added a new module for a database-driven vector store factory that allows the creation of vector store instances based on user-specific configurations. This includes the implementation of a protocol for vector stores, an adapter for the LangChain Chroma vector store, and a mock OpenSearch vector store for testing purposes. The factory function `build_kb_vector_store` retrieves user configurations and builds the appropriate vector store instance, enhancing the flexibility and robustness of the knowledge base management system.
Introduced a new module containing abstract and concrete classes for metadata adapters that facilitate provider-agnostic metadata extraction from various vector store backends, including Chroma and OpenSearch. This enhancement provides a unified interface for metadata operations, improving the flexibility and robustness of knowledge base management. The implementation includes methods for retrieving documents, metadata, and embeddings, as well as provider-specific information.
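
A rough sketch of what an abstract metadata adapter with provider-specific subclasses might look like. All class and method names other than `get_document_count` are hypothetical, the backends are replaced by tiny fakes, and the fallback from a count call to a zero-size search is only one plausible way to count documents.

```python
from abc import ABC, abstractmethod
from typing import Any


class MetadataAdapter(ABC):
    """Provider-agnostic view of a KB backend (hypothetical interface)."""

    @abstractmethod
    def get_document_count(self) -> int:
        """Return the number of stored documents."""

    @abstractmethod
    def get_provider_info(self) -> dict:
        """Return provider-specific details."""


class ChromaStyleAdapter(MetadataAdapter):
    def __init__(self, collection: Any) -> None:
        self._collection = collection

    def get_document_count(self) -> int:
        return int(self._collection.count())

    def get_provider_info(self) -> dict:
        return {"provider": "chroma"}


class OpenSearchStyleAdapter(MetadataAdapter):
    def __init__(self, client: Any, index_name: str) -> None:
        self._client = client
        self._index_name = index_name

    def get_document_count(self) -> int:
        # Prefer a dedicated count call; otherwise use a zero-size search.
        if hasattr(self._client, "count"):
            return int(self._client.count(index=self._index_name)["count"])
        response = self._client.search(
            index=self._index_name, body={"size": 0, "track_total_hits": True}
        )
        return int(response["hits"]["total"]["value"])

    def get_provider_info(self) -> dict:
        return {"provider": "opensearch", "index": self._index_name}


class FakeCollection:
    """Test double that mimics a collection exposing count()."""

    def count(self) -> int:
        return 3


class FakeSearchOnlyClient:
    """Test double that only supports search(), forcing the fallback path."""

    def search(self, index: str, body: dict) -> dict:
        return {"hits": {"total": {"value": 7}}}


adapters = [
    ChromaStyleAdapter(FakeCollection()),
    OpenSearchStyleAdapter(FakeSearchOnlyClient(), "kb-demo"),
]
counts = [adapter.get_document_count() for adapter in adapters]
```
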
… configuration

Refactored the `get_kb_metadata` function to be asynchronous and user-aware, allowing for enhanced metadata extraction from knowledge base directories. Integrated a new vector store creation process that utilizes user-specific configurations, improving the robustness and flexibility of metadata operations. This change also includes better error handling and a structured return of minimal metadata in case of failures.
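
The async, user-aware shape of `get_kb_metadata` with a minimal-metadata fallback might be sketched as below. The parameter order, the returned keys, and the helpers `build_store_for_user` and `minimal_metadata` are assumptions for illustration; the real function's signature may differ.

```python
import asyncio
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


async def build_store_for_user(user_id: str, kb_path: Path) -> dict:
    """Hypothetical stand-in for the per-user vector store factory."""
    if not user_id:
        raise ValueError("user_id is required to build a vector store")
    await asyncio.sleep(0)  # placeholder for the database lookup
    return {"provider": "chroma", "documents": ["doc-1", "doc-2"]}


def minimal_metadata(kb_path: Path) -> dict:
    """Structured fallback returned when extraction fails."""
    return {"name": kb_path.name, "provider": "unknown", "chunks": 0}


async def get_kb_metadata(kb_path: Path, user_id: str) -> dict:
    try:
        store = await build_store_for_user(user_id, kb_path)
    except Exception as exc:  # broad on purpose: any failure yields the fallback
        logger.warning("Metadata extraction failed for %s: %s", kb_path, exc)
        return minimal_metadata(kb_path)
    return {
        "name": kb_path.name,
        "provider": store["provider"],
        "chunks": len(store["documents"]),
    }


metadata = asyncio.run(get_kb_metadata(Path("kbs/demo"), "user-a"))
```
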
Updated the KnowledgeIngestionComponent and KnowledgeRetrievalComponent to utilize a new database-driven vector store factory, enhancing user context handling during vector store creation and retrieval. This change improves error handling by ensuring user IDs are validated before operations, and replaces direct Chroma instantiation with a more flexible factory approach. The refactor aims to streamline vector store interactions and improve the robustness of knowledge base management.
Introduced new unit tests for knowledge bases, including tests for database-driven vector store factory, enhanced metadata adapters, and KB variable service integration. These tests improve coverage and ensure robust functionality across various components, enhancing the reliability of knowledge base management. The additions include tests for creating and retrieving metadata, handling different vector store providers, and validating KB variable structures.
…d configuration types

- Introduced `OpenSearchVectorStoreAdapter` to implement the `VectorStoreProtocol`, enabling seamless interaction with OpenSearch.
- Added `BaseKBConfig`, `ChromaKBConfig`, and `OpenSearchKBConfig` TypedDicts for structured configuration management.
- Updated `build_kb_vector_store` to return the new OpenSearch adapter and handle configuration more robustly.
- Removed the mock OpenSearch vector store, replacing it with a real implementation for improved functionality and testing.
- Enhanced error handling and documentation throughout the OpenSearch integration for better clarity and usability.
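
For illustration, the three configuration TypedDicts could be laid out as follows. The type names come from the commit above, but every field, the `parse_kb_config` helper, and the default provider are assumptions rather than the PR's actual definitions.

```python
from typing import Literal, TypedDict, Union


class BaseKBConfig(TypedDict):
    # Field name and allowed values are assumed for this sketch.
    provider: Literal["chroma", "opensearch"]


class ChromaKBConfig(BaseKBConfig, total=False):
    persist_directory: str


class OpenSearchKBConfig(BaseKBConfig, total=False):
    url: str
    index_name: str
    username: str
    password: str


def parse_kb_config(raw: dict) -> Union[ChromaKBConfig, OpenSearchKBConfig]:
    """Turn a raw settings dict into one of the typed configurations."""
    provider = raw.get("provider", "chroma")  # assumed default
    if provider == "chroma":
        return ChromaKBConfig(
            provider="chroma",
            persist_directory=raw.get("persist_directory", ""),
        )
    if provider == "opensearch":
        return OpenSearchKBConfig(
            provider="opensearch",
            url=raw["url"],
            index_name=raw["index_name"],
        )
    raise ValueError(f"Unsupported KB provider: {provider!r}")


config = parse_kb_config(
    {"provider": "opensearch", "url": "http://localhost:9200", "index_name": "kb"}
)
```

TypedDicts are plain dictionaries at runtime, so the benefit is static: a type checker can flag a misspelled key or a field that belongs to the other provider.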
…ling

- Enhanced the `OpenSearchMetadataAdapter` to support both `get` and `search` methods for retrieving documents and metadata, improving flexibility.
- Updated error logging to use a consistent format for better clarity.
- Refactored `get_document_count` to handle different methods of counting documents, ensuring robustness.
- Adjusted docstrings for clarity and accuracy regarding embeddings retrieval support.
- Refactored the authentication configuration methods in the `OpenSearchVectorStoreComponent` to support both class-level and instance-level authentication setups.
- Improved the `_build_auth_kwargs` method to accept parameters directly, allowing for more flexible authentication configurations.
- Added a new instance method `_build_auth_kwargs_instance` to utilize instance attributes for authentication.
- Updated the `build_client` and `build_client_instance` methods to streamline client creation with enhanced authentication handling.
- Enhanced docstrings for clarity and completeness regarding authentication modes and parameters.
- Added a blank line in `test_llm_load.py` to enhance code readability and maintain consistent formatting throughout the test suite.
- Replaced instances of `MockOpenSearchVectorStore` with `OpenSearchVectorStoreAdapter` in unit tests for improved accuracy and consistency.
- Enhanced test cases to create mock OpenSearch clients and ensure proper functionality of the adapter.
- Updated assertions to reflect changes in the adapter's behavior, ensuring robust testing of metadata extraction and adapter creation.
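
The split between a parameter-driven `_build_auth_kwargs` and an instance-level `_build_auth_kwargs_instance` might be arranged roughly like this. The class name `OpenSearchClientBuilder`, the attribute names, and the two auth modes are assumptions; the returned `http_auth` and `headers` keys mirror keyword arguments commonly passed to an OpenSearch client, and no client is actually constructed here.

```python
from typing import Optional


class OpenSearchClientBuilder:
    """Hypothetical holder for authentication settings."""

    def __init__(
        self,
        auth_mode: str = "basic",
        username: Optional[str] = None,
        password: Optional[str] = None,
        jwt_token: Optional[str] = None,
    ) -> None:
        self.auth_mode = auth_mode
        self.username = username
        self.password = password
        self.jwt_token = jwt_token

    @staticmethod
    def _build_auth_kwargs(
        auth_mode: str,
        username: Optional[str] = None,
        password: Optional[str] = None,
        jwt_token: Optional[str] = None,
    ) -> dict:
        """Build client kwargs from explicit parameters (no instance needed)."""
        if auth_mode == "basic":
            if not username or not password:
                raise ValueError("Basic auth requires a username and a password")
            return {"http_auth": (username, password)}
        if auth_mode == "jwt":
            if not jwt_token:
                raise ValueError("JWT auth requires a token")
            return {"headers": {"Authorization": f"Bearer {jwt_token}"}}
        raise ValueError(f"Unsupported auth mode: {auth_mode!r}")

    def _build_auth_kwargs_instance(self) -> dict:
        """Reuse the static helper with this instance's attributes."""
        return self._build_auth_kwargs(
            self.auth_mode, self.username, self.password, self.jwt_token
        )


builder = OpenSearchClientBuilder(auth_mode="basic", username="admin", password="pw")
instance_kwargs = builder._build_auth_kwargs_instance()
static_kwargs = OpenSearchClientBuilder._build_auth_kwargs("jwt", jwt_token="abc")
```

Keeping the logic in one parameter-driven helper lets a factory build auth settings from stored configuration without instantiating the component, while the instance method stays a thin wrapper.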
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Oct 16, 2025
@erichare erichare self-requested a review October 16, 2025 19:04
@github-actions (Contributor)

Component index has been automatically updated due to changes in src/lfx/src/lfx/components/

New Index:
🧩 Components: 356
📁 Categories: 91

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 0% with 324 lines in your changes missing coverage. Please review.
✅ Project coverage is 30.00%. Comparing base (58dd11f) to head (54f8f19).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../src/lfx/base/knowledge_bases/metadata_adapters.py | 0.00% | 173 Missing ⚠️ |
| ...c/lfx/base/knowledge_bases/vector_store_factory.py | 0.00% | 151 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (39.45%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9981      +/-   ##
==========================================
- Coverage   30.16%   30.00%   -0.17%     
==========================================
  Files        1313     1315       +2     
  Lines       59259    59535     +276     
  Branches     8876     8909      +33     
==========================================
- Hits        17877    17865      -12     
- Misses      40565    40852     +287     
- Partials      817      818       +1     
| Flag | Coverage Δ |
|---|---|
| lfx | 39.45% <0.00%> (-0.68%) ⬇️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...rc/backend/base/langflow/api/v1/knowledge_bases.py | 19.17% <ø> (+1.80%) ⬆️ |
| src/backend/base/langflow/api/v1/variable.py | 64.81% <ø> (-0.64%) ⬇️ |
| ...angflow/services/database/models/variable/model.py | 100.00% <ø> (ø) |
| ...rc/backend/base/langflow/services/variable/base.py | 100.00% <ø> (ø) |
| ...ckend/base/langflow/services/variable/constants.py | 100.00% <ø> (ø) |
| ...backend/base/langflow/services/variable/service.py | 87.85% <ø> (-0.44%) ⬇️ |
| ...API/queries/variables/use-post-global-variables.ts | 31.25% <ø> (ø) |
| src/frontend/src/customization/feature-flags.ts | 100.00% <ø> (ø) |
| src/frontend/src/pages/SettingsPage/index.tsx | 0.00% <ø> (ø) |
| src/frontend/src/routes.tsx | 0.00% <ø> (ø) |
... and 2 more

... and 1 file with indirect coverage changes

@sonarqubecloud

Labels

DO NOT MERGE (Don't Merge this PR), enhancement (New feature or request)

6 participants