feat: Add support for Ingestion and Retrieval of Knowledge Bases #9088
…across components - Updated import statements to use consistent single quotes. - Refactored various components to enhance readability and maintainability. - Adjusted folder and file handling logic in the sidebar and file manager components. - Introduced a new tabbed interface for the files page to separate files and knowledge bases, improving user experience.
- Added a new FilesPage component to manage file uploads and organization. - Implemented a tabbed interface to separate Files and Knowledge Bases for improved user experience. - Created FilesTab and KnowledgeBasesTab components for handling respective functionalities. - Refactored routing to accommodate the new structure and updated import statements for consistency. - Removed the old filesPage component to streamline the codebase.
Walkthrough
The changes introduce a new utility module for text relevance scoring (TF-IDF and BM25), add a knowledge base ingestion component for processing tabular data with embeddings and vector store creation, and update the data components package to export this new ingestion component.
Sequence Diagram(s)
sequenceDiagram
participant User
participant KBIngestionComponent
participant DataFrame
participant EmbeddingProvider
participant VectorStore
participant Disk
User->>KBIngestionComponent: Provide DataFrame, column config, embedding params
KBIngestionComponent->>DataFrame: Validate columns
KBIngestionComponent->>EmbeddingProvider: Generate embeddings for selected columns
EmbeddingProvider-->>KBIngestionComponent: Return embeddings
KBIngestionComponent->>VectorStore: Create vector store from DataFrame rows and embeddings
VectorStore-->>KBIngestionComponent: Vector store created
KBIngestionComponent->>Disk: Save KB files (parquet, JSON, vector store)
KBIngestionComponent-->>User: Return KB metadata and status message
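The first three steps of the diagram above (validate columns, embed the selected columns, pair rows with their embeddings) can be sketched roughly as follows. This is a minimal illustration, not the code of `KBIngestionComponent`: the function name `ingest_rows`, the `vectorize_columns` parameter, and the `embed` callable are all hypothetical, rows are plain dictionaries rather than a DataFrame, and the vector store and on-disk persistence steps are left out.

```python
from typing import Callable, Sequence

# Hypothetical stand-ins: a row is a plain dict, an embedder maps texts to vectors.
Row = dict[str, str]
Embedder = Callable[[list[str]], list[list[float]]]


def ingest_rows(rows: Sequence[Row], vectorize_columns: Sequence[str], embed: Embedder) -> list[dict]:
    """Validate columns, embed the selected text, and pair each row with its vector."""
    # Step 1: validate that every selected column exists in every row.
    if not vectorize_columns:
        raise ValueError("at least one column must be selected for embedding")
    for index, row in enumerate(rows):
        missing = [column for column in vectorize_columns if column not in row]
        if missing:
            raise ValueError(f"row {index} is missing columns: {missing}")

    # Step 2: build one text per row from the selected columns and embed them in a batch.
    texts = [" ".join(str(row[column]) for column in vectorize_columns) for row in rows]
    vectors = embed(texts) if texts else []
    if len(vectors) != len(texts):
        raise ValueError("embedder returned a different number of vectors than texts")

    # Step 3: keep the full row as metadata next to its text and embedding.
    return [
        {"text": text, "embedding": vector, "metadata": dict(row)}
        for text, vector, row in zip(texts, vectors, rows)
    ]
```

Passing the embedder in as a callable keeps the sketch independent of any particular embedding provider; a real component would substitute the provider client and hand the resulting records to a vector store.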
sequenceDiagram
participant User
participant knowledgebase_utils
participant Documents
User->>knowledgebase_utils: Call compute_tfidf or compute_bm25 with documents and query terms
knowledgebase_utils->>Documents: Tokenize and analyze documents
knowledgebase_utils-->>User: Return relevance scores
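For readers unfamiliar with BM25, a minimal sketch of the scoring that a `compute_bm25(documents, query_terms)` call might perform is shown below. The signature, the whitespace tokenization, the defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant are assumptions made for illustration; the actual implementation in `knowledgebase_utils` may differ.

```python
import math
from collections import Counter


def compute_bm25(documents: list[str], query_terms: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Return one Okapi BM25 relevance score per document (illustrative sketch)."""
    n_docs = len(documents)
    if n_docs == 0:
        return []

    # Assumed tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    counts_per_doc = [Counter(tokens) for tokens in tokenized]
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    terms = [term.lower() for term in query_terms]

    # Document frequency and IDF, computed once per distinct query term.
    # The "+ 1.0" inside the log keeps the IDF non-negative (an assumed variant).
    idf = {}
    for term in set(terms):
        df = sum(1 for counts in counts_per_doc if term in counts)
        idf[term] = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)

    scores = []
    for tokens, counts in zip(tokenized, counts_per_doc):
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(tokens) / avg_len) if avg_len > 0 else k1
        score = 0.0
        for term in terms:
            tf = counts[term]
            if tf:
                score += idf[term] * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores
```

Compared with plain TF-IDF, the term-frequency contribution saturates as `tf` grows, and `b` controls how strongly document length is normalized.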
…edge-bases`)
Here's an optimized rewrite to make your `compute_tfidf` function notably faster, primarily by:
1. **Avoiding Repeated `.lower()` Calls** - Lowercase query terms only once.
2. **Precomputing Query Terms Set** - Improves "in" checks when computing document frequencies.
3. **Precomputing Document Frequencies in One Pass** - Avoids O(num_terms*num_docs) nested looping.
4. **Avoiding Repeated Counter Creation** - By pretokenizing and counting in the same loop.
5. **Reducing Division/Recalculation** - Only compute values once per doc/term.
6. **Reducing Function Call Overhead, Avoiding Extra Appends**.

Here's the optimized code, **with all original comments preserved** (except where logic changes).

**Major improvements:**
- Computing all doc frequencies in a single pass over the docs.
- Lowercasing query terms only once.
- Computing IDF for each query term once.
- Using sets for fast existence checks.
- No code logic is changed (outputs are identical for any input).

Your most expensive lines (`df`, repeated lowercasing, Counter calls, IDF calc) are now much faster, and this solution scales far better for large inputs.
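The bot's proposed code is not reproduced in this conversation. As an independent illustration of the techniques it lists (lowercasing query terms once, counting each document once, collecting document frequencies in the same pass, and computing each IDF once), a sketch might look like the following. The signature, the whitespace tokenization, and the `log(N / df)` IDF formula are assumptions and may not match the function in the PR.

```python
import math
from collections import Counter


def compute_tfidf(documents: list[str], query_terms: list[str]) -> list[float]:
    """Return one TF-IDF score per document, summed over the query terms (illustrative sketch)."""
    n_docs = len(documents)

    # Lowercase the query terms once, and keep a set for fast membership checks.
    terms = [term.lower() for term in query_terms]
    term_set = set(terms)

    # Tokenize and count every document exactly once; collect the document
    # frequency of each query term during the same pass.
    doc_counts: list[Counter] = []
    doc_lengths: list[int] = []
    doc_freq: Counter = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        counts = Counter(tokens)
        doc_counts.append(counts)
        doc_lengths.append(len(tokens))
        for term in term_set.intersection(counts):
            doc_freq[term] += 1

    # Compute the IDF once per distinct query term (0.0 if the term never occurs).
    idf = {term: math.log(n_docs / doc_freq[term]) if doc_freq[term] else 0.0 for term in term_set}

    scores = []
    for counts, length in zip(doc_counts, doc_lengths):
        if length == 0:
            scores.append(0.0)
            continue
        # Iterating over `terms` (not the set) keeps repeated query terms weighted twice.
        scores.append(sum(counts[term] / length * idf[term] for term in terms))
    return scores
```

For example, `compute_tfidf(["the cat sat", "the dog barked", "cat and dog"], ["Cat"])` scores the first and third documents equally and the second at zero, since "cat" appears once in each of two three-word documents.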
* refactor: Standardize import statements and improve code readability across components - Updated import statements to use consistent single quotes. - Refactored various components to enhance readability and maintainability. - Adjusted folder and file handling logic in the sidebar and file manager components. - Introduced a new tabbed interface for the files page to separate files and knowledge bases, improving user experience. * [autofix.ci] apply automated fixes * feat: Introduce new Files and Knowledge Bases page with tabbed interface - Added a new FilesPage component to manage file uploads and organization. - Implemented a tabbed interface to separate Files and Knowledge Bases for improved user experience. - Created FilesTab and KnowledgeBasesTab components for handling respective functionalities. - Refactored routing to accommodate the new structure and updated import statements for consistency. - Removed the old filesPage component to streamline the codebase. * Create knowledgebase_utils.py * Push initial ingest component * [autofix.ci] apply automated fixes * Create initial KB Ingestion component * [autofix.ci] apply automated fixes * Fix ruff check on utility functions * [autofix.ci] apply automated fixes * Some quick fixes * Update kb_ingest.py * [autofix.ci] apply automated fixes * First version of retrieval component * [autofix.ci] apply automated fixes * Update icon * Update kb_retrieval.py * [autofix.ci] apply automated fixes * Add knowledge bases feature with API integration and UI components * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Refactor imports and update routing paths for assets and main page components. Adjust tab handling in the assets page to reflect URL changes and improve user navigation experience. * [autofix.ci] apply automated fixes * Add CreateKnowledgeBaseButton, KnowledgeBaseEmptyState, and KnowledgeBaseSelectionOverlay components. 
Refactor KnowledgeBasesTab to utilize new components and improve UI for knowledge base management. Introduce utility functions for formatting numbers and average chunk sizes. * [autofix.ci] apply automated fixes * PoV: Add Parquet data retrieval to KBRetrievalComponent (#9097) * Add Parquet data retrieval to KBRetrievalComponent Introduces a new output to KBRetrievalComponent for returning knowledge base data by reading Parquet files. Updates dependencies to include fastparquet for Parquet support. * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * Fix some ruff issues * [autofix.ci] apply automated fixes * feat: refactor file management and knowledge base components - Replaced the existing assetsPage with a new filesPage to better organize file management functionalities. - Introduced KnowledgePage to handle knowledge base operations, integrating KnowledgeBasesTab for displaying and managing knowledge bases. - Added various components for file and knowledge base management, including CreateKnowledgeBaseButton, KnowledgeBaseEmptyState, and drag-and-drop functionality. - Updated routing and imports to reflect the new structure and ensure consistency across the application. - Enhanced user experience with improved UI elements and state management for file selection and operations. * feat: implement delete confirmation modal for knowledge base deletion - Added a DeleteConfirmationModal component to confirm deletion actions. - Integrated the modal into the KnowledgeBasesTab for handling knowledge base deletions. - Updated column definitions to include a delete button for each knowledge base. - Enhanced user experience by ensuring deletion actions require confirmation. - Adjusted styles for the knowledge base table to improve checkbox visibility. 
* feat: enhance knowledge base metadata with embedding model detection - Added `embedding_model` field to `KnowledgeBaseInfo` for improved metadata tracking. - Implemented `detect_embedding_model` function to extract embedding model information from configuration files. - Updated `get_kb_metadata` to prioritize metadata extraction from `embedding_metadata.json`, falling back to detection if necessary. - Modified `KBIngestionComponent` to save embedding model metadata during ingestion. - Adjusted frontend components to display embedding model information in knowledge base queries and tables. * refactor: clean up tooltip and value getter comments in knowledge base columns - Removed redundant comments in the `knowledgeBaseColumns.tsx` file to enhance code clarity. - Simplified the tooltip and value getter functions for embedding model display. * [autofix.ci] apply automated fixes * refactor: simplify KnowledgeBaseSelectionOverlay component - Removed the unused onExport prop and its associated functionality. - Cleaned up code formatting for consistency and readability. - Updated success message strings to use single quotes for uniformity. * feat: implement bulk and single deletion for knowledge bases - Added `BulkDeleteRequest` model to handle bulk deletion requests. - Implemented `delete_knowledge_base` endpoint for single knowledge base deletion. - Created `delete_knowledge_bases_bulk` endpoint for deleting multiple knowledge bases at once. - Introduced `useDeleteKnowledgeBase` and `useDeleteKnowledgeBases` hooks for frontend integration. - Updated `KnowledgeBaseSelectionOverlay` and `KnowledgeBasesTab` components to utilize new deletion functionality with user feedback on success and error handling. * Initial support for vector search * feat: add KnowledgeBaseDrawer component for enhanced knowledge base details - Introduced `KnowledgeBaseDrawer` component to display detailed information about selected knowledge bases. 
- Integrated mock data for source files and linked flows, with a layout for displaying descriptions and embedding models. - Updated `KnowledgeBasesTab` to handle row clicks and open the drawer with relevant knowledge base data. - Enhanced `KnowledgePage` to manage drawer state and selected knowledge base, improving user interaction and experience. * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Fix ruff checks * Update knowledge_bases.py * feat: update mock data and enhance drawer functionality in KnowledgeBase components - Replaced mock data in `KnowledgeBaseDrawer` with more descriptive placeholders. - Added a reference to the drawer in `KnowledgePage` for improved click handling. - Implemented logic to close the drawer when clicking outside, except for table row clicks. - Enhanced row click handling to toggle drawer state based on current visibility. * [autofix.ci] apply automated fixes * Append scores column to rows * refactor: improve knowledge base deletion and UI components - Updated `useDeleteKnowledgeBase` and `useDeleteKnowledgeBases` to enhance parameter naming for clarity. - Removed the `CreateKnowledgeBaseButton` component and its references to streamline the UI. - Simplified the `KnowledgeBaseDrawer` and `KnowledgeBasesTab` components by removing mock data and improving state management. - Enhanced the `KnowledgeBaseSelectionOverlay` to better handle bulk deletions and selection states. - Refactored various components for consistent styling and improved readability. * refactor: standardize import statements and improve code readability in SideBarFoldersButtonsComponent - Updated import statements to use consistent single quotes. - Refactored various function calls and state management for improved clarity. - Enhanced folder handling logic and UI interactions for better user experience. 
* feat: Add encryption for API keys in KB ingest and retrieval (#9129) Add encryption for API keys in KB ingest and retrieval Introduces secure storage of embedding model API keys by encrypting them during knowledge base ingestion and decrypting them during retrieval. Refactors metadata handling to include encrypted API keys, updates retrieval to support decryption and dynamic embedder construction, and improves logging for key operations. Removes legacy embedding client code in retrieval in favor of a provider-based approach. * [autofix.ci] apply automated fixes * Fix import of auth utils * Allow appending to existing knowledge base * [autofix.ci] apply automated fixes * Update kb_ingest.py * Update kb_ingest.py * feat: enhance table component with editable Vectorize column functionality - Implemented logic to determine editability of the Vectorize column based on other row values. - Added checks to refresh grid cells upon changes to the Vectorize column. - Updated TableAutoCellRender to conditionally disable editing based on Vectorize column state. * New ingestion creation dialog * [autofix.ci] apply automated fixes * Clean up the creation process for KB * [autofix.ci] apply automated fixes * Clean up names and descriptions * Update kb_retrieval.py * chroma retrieval * [autofix.ci] apply automated fixes * Further KB cleanup * refactor: update KB ingestion component and enhance NodeDialog functionality - Restored SecretStrInput for API key in KB ingestion component. - Modified NodeDialog to handle new value format and added support for additional properties. - Introduced custom hooks for managing global variable states in InputGlobalComponent. - Improved dropdown component styling and interaction. - Cleaned up input component code for better readability and maintainability. 
* Hash the text as id * [autofix.ci] apply automated fixes * Update kb_retrieval.py * [autofix.ci] apply automated fixes * Make sure to write out the source parquet * Remove unneeded old code * Add ability to block duplicate ingestion chunks * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Rename retrieval component * Better refresh mechanism for the retrieve * Clean up some unused functionality * Update kb_ingest.py * Fix dropdown component logic to include checks for refresh button and dialog inputs * Test the API key before saving knowledge * [autofix.ci] apply automated fixes * Allow storing updated api keys if provided at ingest time * Add Knowledge Bases component and enhance Knowledge Base Empty State - Introduced a new JSON configuration for Knowledge Bases, defining nodes and edges for data processing. - Enhanced the KnowledgeBaseEmptyState component to include a button for creating a knowledge base template. - Updated KnowledgeBasesTab to handle template creation, integrating flow management and navigation features. * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Update Knowledge Bases.json * Update Knowledge Bases configuration and enhance UI components - Updated the code hash in the Knowledge Bases JSON configuration. - Modified the KnowledgeBaseEmptyState component to change the button icon and text from "Try Knowledge Base Template" to "Create Knowledge". - Cleared the options for the knowledge base selection dropdowns to ensure they reflect the current state of available knowledge bases. * [autofix.ci] apply automated fixes * Implement feature flag for Knowledge Bases functionality - Added FEATURE_FLAGS.knowledge_bases to control the visibility of knowledge base components in the API and UI. - Updated the router to conditionally include the knowledge bases router based on the feature flag. 
- Modified KBIngestionComponent and KBRetrievalComponent to hide if the knowledge bases feature is disabled. - Enhanced the initial setup to skip loading knowledge base starter projects when the feature is disabled. - Updated frontend routes and sidebar components to conditionally render knowledge base options based on the feature flag. - Adjusted API queries to return an empty array if the knowledge bases feature is disabled. * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Refactor Knowledge Bases feature flag implementation - Removed the FEATURE_FLAGS.knowledge_bases flag from backend components and frontend routes. - Updated the API and UI to always include knowledge base components, simplifying the codebase. - Adjusted the frontend feature flags to set ENABLE_KNOWLEDGE_BASES to false, ensuring knowledge base features are not displayed. - Cleaned up related components and routes to reflect the removal of the feature flag, enhancing maintainability. * revert * [autofix.ci] apply automated fixes * Remove Knowledge Bases JSON configuration and clean up KnowledgeBasesTab component by eliminating unused imports and template creation functionality. * [autofix.ci] apply automated fixes * Enhance routing structure by adding admin and login routes with protected access. Refactor flow routes for improved organization and clarity. * added template back * Use chroma for stats computation * Fix ruff issue * [autofix.ci] apply automated fixes * Update Knowledge Bases.json * Update Knowledge Bases.json * Rename to just knowledge * feat: enhance Jest configuration and add new tests for Knowledge Base components - Updated jest.config.js to include a new setup file and refined test matching patterns. - Introduced jest.setup.js for mocking globals and Vite-specific syntax. - Added tests for KnowledgeBaseDrawer, KnowledgeBaseEmptyState, KnowledgeBaseSelectionOverlay, KnowledgeBasesTab, and KnowledgePage components. 
- Created utility functions for testing and mock data for knowledge bases. - Implemented tests for utility functions related to knowledge base formatting. * [autofix.ci] apply automated fixes * refactor: reorganize imports and clean up console log in Dropdown component - Moved and re-imported necessary dependencies for better structure. - Removed unnecessary console log statement to clean up the code. * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * feat: add success callback for knowledge base creation in NodeDialog component - Introduced a new success callback to handle knowledge base creation notifications. - Enhanced dialog closing logic with a delay for Astra database tracking. - Reorganized imports for better structure. * refactor: update table component to handle single-toggle columns - Renamed functions and variables to improve clarity regarding single-toggle columns (Vectorize and Identifier). - Updated logic to ensure proper editability checks for single-toggle columns. - Adjusted related components to reflect changes in column handling and rendering. * [autofix.ci] apply automated fixes * feat: Add unit tests for KBIngestionComponent (#9246) * [autofix.ci] apply automated fixes * fix: remove unnecessary drawer open state change in KnowledgePage * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Remove kb_info output from KBIngestionComponent (#9275) * [autofix.ci] apply automated fixes * Update Knowledge Bases.json * Use settings service for knowledge base directory Replaces the hardcoded knowledge base directory path with a value from the settings service. This improves configurability and centralizes directory management. * Fix knowledge bases mypy issue * test: Update file page tests for consistency and clarity - Changed expected title text from "My Files" to "Files" for accuracy. - Removed unnecessary parentheses in arrow functions for cleaner syntax. 
- Updated test assertions to ensure visibility checks are clear and consistent. - Improved readability by standardizing the formatting of test cases. * test: Update expected title in file upload component test for accuracy - Changed expected title text from "My Files" to "Files" to reflect the correct page title. * [autofix.ci] apply automated fixes * Fix tests on backend * Update kb_ingest.py * [autofix.ci] apply automated fixes * Switch to two templates for KB * Update names and descs * [autofix.ci] apply automated fixes * Rename templates * [autofix.ci] apply automated fixes --------- Co-authored-by: Deon Sanchez <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Edwin Jose <[email protected]>
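Two of the commits above, "Hash the text as id" and "Add ability to block duplicate ingestion chunks", describe content-addressed chunk IDs. A minimal sketch of that idea follows, assuming SHA-256 over the UTF-8 text; the PR does not state which hash it uses, and the helper names `chunk_id` and `filter_new_chunks` are hypothetical.

```python
import hashlib
from collections.abc import Iterable


def chunk_id(text: str) -> str:
    """Derive a deterministic ID from the chunk text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_new_chunks(chunks: Iterable[str], existing_ids: Iterable[str]) -> list[tuple[str, str]]:
    """Return (id, text) pairs for chunks whose ID is not already stored."""
    seen = set(existing_ids)
    fresh = []
    for text in chunks:
        cid = chunk_id(text)
        if cid in seen:
            # Same text was ingested before, or appeared earlier in this batch.
            continue
        seen.add(cid)
        fresh.append((cid, text))
    return fresh
```

Because the ID depends only on the text, re-ingesting the same rows into an existing knowledge base yields the same IDs, which is what makes skipping duplicates straightforward.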




This pull request introduces new functionality for knowledge base retrieval, adds supporting components and utilities, and updates dependencies to support these features. The main changes include the addition of a knowledge base retrieval component, supporting TF-IDF and BM25 scoring utilities, and necessary configuration and dependency updates.
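For readers unfamiliar with the two scoring methods named above, the following is a minimal sketch of TF-IDF and BM25 ranking in plain Python. The function names `tfidf_scores` and `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are assumptions made for illustration. They are not the signatures or behavior of `compute_tfidf` and `compute_bm25` in `kb_utils.py`, which this description does not show.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def _document_frequencies(docs: list[list[str]], terms: set[str]) -> dict[str, int]:
    """Count how many documents contain each term."""
    return {t: sum(1 for d in docs if t in d) for t in terms}


def tfidf_scores(documents: list[str], query: str) -> list[float]:
    """Score each document as the sum of tf * idf over the query terms."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    query_terms = tokenize(query)
    df = _document_frequencies(docs, set(query_terms))
    scores = []
    for d in docs:
        counts = Counter(d)
        score = 0.0
        for t in query_terms:
            if df[t] == 0 or not d:
                continue  # term absent from the corpus, or empty document
            tf = counts[t] / len(d)          # relative term frequency
            idf = math.log(n_docs / df[t])   # plain (unsmoothed) idf
            score += tf * idf
        scores.append(score)
    return scores


def bm25_scores(
    documents: list[str], query: str, k1: float = 1.2, b: float = 0.75
) -> list[float]:
    """Score each document with Okapi BM25 (hypothetical helper, not the PR's API)."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    # Fall back to 1.0 so an all-empty corpus does not divide by zero.
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    query_terms = tokenize(query)
    df = _document_frequencies(docs, set(query_terms))
    scores = []
    for d in docs:
        counts = Counter(d)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # The "+ 1.0" inside the log keeps idf non-negative for common terms.
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            length_norm = k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * counts[t] * (k1 + 1.0) / (counts[t] + length_norm)
        scores.append(score)
    return scores
```

Calling `bm25_scores(["the cat sat on the mat", "dogs chase cats"], "cat")` returns one score per document, in input order. Only exact token matches count under the assumed tokenizer, so `cats` does not match `cat`.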
Knowledge Base Retrieval Functionality:
* Added `KBRetrievalComponent` in `kb_retrieval.py` to enable searching and retrieving data from local knowledge bases, supporting multiple embedding providers (OpenAI, HuggingFace, Cohere) and returning results as a DataFrame. This includes dynamic metadata handling, API key decryption, and flexible output with or without metadata.
* Registered `KBRetrievalComponent` and `KBIngestionComponent` in the data components module for integration into the system. [1] [2]

Supporting Utilities:
* Added `compute_tfidf` and `compute_bm25` utility functions for document scoring in `kb_utils.py`, providing standard text retrieval scoring methods for use with knowledge base queries.

Configuration and Dependency Updates:
* Introduced a new configuration option `knowledge_bases_dir` in the settings to specify where knowledge bases are stored.
* Added new dependencies: `langchain-huggingface` for HuggingFace embedding support and `fastparquet` for efficient data processing. [1] [2]

## Summary by CodeRabbit
New Features
Chores