feat: Add support for Ingestion and Retrieval of Knowledge Bases #9088
Merged
Commits
187 commits
| Commit | Author | Message |
|---|---|---|
| 9be2d30 | deon-sanchez | refactor: Standardize import statements and improve code readability … |
| 941bc81 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 4df3225 | deon-sanchez | feat: Introduce new Files and Knowledge Bases page with tabbed interface |
| c32d451 | erichare | Create knowledgebase_utils.py |
| 75409c1 | erichare | Push initial ingest component |
| 1c9a2aa | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| de3ade8 | erichare | Create initial KB Ingestion component |
| 5ea7224 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| c22e59b | erichare | Fix ruff check on utility functions |
| ccd0f79 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| b9f9e01 | erichare | Some quick fixes |
| c00f486 | erichare | Update kb_ingest.py |
| 4ada462 | erichare | Merge branch 'main' into feat-knowledge-bases |
| cabf676 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 350461e | erichare | First version of retrieval component |
| b0b62a3 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 7dad9d6 | erichare | Update icon |
| 6a0f187 | erichare | Update kb_retrieval.py |
| 8da44b2 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 0d25004 | deon-sanchez | Merge branch 'lfoss-1813' into feat-knowledge-bases |
| 1247bed | deon-sanchez | Add knowledge bases feature with API integration and UI components |
| 66da30e | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 5951200 | autofix-ci[bot] | [autofix.ci] apply automated fixes (attempt 2/3) |
| d9c9cb9 | deon-sanchez | Refactor imports and update routing paths for assets and main page co… |
| 75189e8 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| 81367fb | edwinjosechittilappilly | Merge branch 'main' into feat-knowledge-bases |
| d7940af | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| db49a96 | deon-sanchez | Add CreateKnowledgeBaseButton, KnowledgeBaseEmptyState, and Knowledge… |
| 5503c78 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| 845f0a7 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| ef94bcf | edwinjosechittilappilly | PoV: Add Parquet data retrieval to KBRetrievalComponent (#9097) |
| 6d82934 | erichare | Fix some ruff issues |
| 79e3425 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| b43333f | erichare | Merge branch 'main' into feat-knowledge-bases |
| 109363c | deon-sanchez | Merge branch 'main' of https://github.com/langflow-ai/langflow into f… |
| 49c0db0 | erichare | Merge branch 'main' into feat-knowledge-bases |
| d7e5c33 | deon-sanchez | Merge branch 'main' of https://github.com/langflow-ai/langflow into f… |
| bd1d91f | deon-sanchez | feat: refactor file management and knowledge base components |
| d5d2a5e | deon-sanchez | feat: implement delete confirmation modal for knowledge base deletion |
| 63dd4c9 | deon-sanchez | feat: enhance knowledge base metadata with embedding model detection |
| 14b87c4 | deon-sanchez | refactor: clean up tooltip and value getter comments in knowledge bas… |
| 8daab25 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 8268740 | deon-sanchez | refactor: simplify KnowledgeBaseSelectionOverlay component |
| c3d286b | deon-sanchez | feat: implement bulk and single deletion for knowledge bases |
| 388e98a | erichare | Merge branch 'main' into feat-knowledge-bases |
| 2c78dd0 | erichare | Initial support for vector search |
| 2adcc77 | deon-sanchez | feat: add KnowledgeBaseDrawer component for enhanced knowledge base d… |
| c4bf9bf | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| 3b88885 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 6b3a349 | autofix-ci[bot] | [autofix.ci] apply automated fixes (attempt 2/3) |
| 4116cae | erichare | Fix ruff checks |
| 810c717 | erichare | Update knowledge_bases.py |
| c883ae1 | deon-sanchez | feat: update mock data and enhance drawer functionality in KnowledgeB… |
| 24e7715 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| dd8855b | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 2c02cc0 | erichare | Append scores column to rows |
| 0d36985 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 77bc57f | deon-sanchez | refactor: improve knowledge base deletion and UI components |
| 98766fc | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| 9c7fb6a | deon-sanchez | refactor: standardize import statements and improve code readability … |
| 63fb9b9 | edwinjosechittilappilly | feat: Add encryption for API keys in KB ingest and retrieval (#9129) |
| 049e39f | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 8ec1341 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 8adcd12 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 0ca5a67 | erichare | Merge branch 'main' into feat-knowledge-bases |
| f251c73 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 1def7f6 | erichare | Fix import of auth utils |
| 9146f7e | erichare | Allow appending to existing knowledge base |
| 06211a6 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| d3a7120 | erichare | Update kb_ingest.py |
| 67d5ae5 | erichare | Update kb_ingest.py |
| bc10c6e | deon-sanchez | Merge branch 'main' of https://github.com/langflow-ai/langflow into f… |
| bad02f3 | deon-sanchez | feat: enhance table component with editable Vectorize column function… |
| fe36a36 | erichare | New ingestion creation dialog |
| d139d5b | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 4cb23b7 | erichare | Clean up the creation process for KB |
| 6ece64b | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 69aed9a | erichare | Clean up names and descriptions |
| bd4ae10 | erichare | Update kb_retrieval.py |
| 1469ecf | erichare | Merge branch 'main' into feat-knowledge-bases |
| a654109 | erichare | chroma retrieval |
| 5d0916d | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| a8ea48e | erichare | Further KB cleanup |
| 4440e08 | deon-sanchez | refactor: update KB ingestion component and enhance NodeDialog functi… |
| 93b5149 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| 16555cd | erichare | Hash the text as id |
| 1e66ae2 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 1c4c209 | erichare | Update kb_retrieval.py |
| 86c8e55 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 4b7de6d | erichare | Merge branch 'main' into feat-knowledge-bases |
| 4f49445 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 0a43c94 | erichare | Make sure to write out the source parquet |
| 72d88c0 | erichare | Remove unneeded old code |
| 2048c42 | erichare | Merge branch 'main' into feat-knowledge-bases |
| cf7d64d | erichare | Add ability to block duplicate ingestion chunks |
| 36fac5a | erichare | Merge branch 'main' into feat-knowledge-bases |
| 9341c41 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 45f14f7 | autofix-ci[bot] | [autofix.ci] apply automated fixes (attempt 2/3) |
| e6ab6cb | erichare | Rename retrieval component |
| 542984b | erichare | Better refresh mechanism for the retrieve |
| 4864640 | erichare | Clean up some unused functionality |
| 3aeb0c5 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 8ab4368 | erichare | Update kb_ingest.py |
| 80e223e | deon-sanchez | Fix dropdown component logic to include checks for refresh button and… |
| 9058976 | erichare | Test the API key before saving knowledge |
| 03a8c2e | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 96ee3f4 | erichare | Allow storing updated api keys if provided at ingest time |
| 896bf61 | erichare | Merge branch 'main' into feat-knowledge-bases |
| d3fc9e8 | deon-sanchez | Add Knowledge Bases component and enhance Knowledge Base Empty State |
| 5718eb3 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| b33a3c9 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 602f39d | autofix-ci[bot] | [autofix.ci] apply automated fixes (attempt 2/3) |
| 502436d | erichare | Update Knowledge Bases.json |
| 00da454 | deon-sanchez | Update Knowledge Bases configuration and enhance UI components |
| 76f0035 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| c9fbbdd | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 5dcf0b8 | deon-sanchez | Implement feature flag for Knowledge Bases functionality |
| 14909d9 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| 41ba6ec | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 3662d50 | autofix-ci[bot] | [autofix.ci] apply automated fixes (attempt 2/3) |
| 20d4382 | deon-sanchez | Refactor Knowledge Bases feature flag implementation |
| de4edf7 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| 1e7ffce | deon-sanchez | revert |
| 6e7b061 | deon-sanchez | Merge branch 'main' of https://github.com/langflow-ai/langflow into f… |
| 8277cb6 | erichare | Merge branch 'main' into feat-knowledge-bases |
| ed009cd | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 8700133 | deon-sanchez | Remove Knowledge Bases JSON configuration and clean up KnowledgeBases… |
| aaaae03 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 02d4874 | deon-sanchez | Enhance routing structure by adding admin and login routes with prote… |
| ae0d378 | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| 43ef981 | deon-sanchez | added template back |
| 9c21594 | erichare | Use chroma for stats computation |
| 71eaf96 | erichare | Fix ruff issue |
| 6ce2414 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 86334cf | erichare | Update Knowledge Bases.json |
| d3d176f | erichare | Update Knowledge Bases.json |
| dfcfe7b | erichare | Rename to just knowledge |
| e072f0d | erichare | Merge branch 'main' into feat-knowledge-bases |
| 6645b25 | deon-sanchez | Merge branch 'main' of https://github.com/langflow-ai/langflow into f… |
| 3efe3be | deon-sanchez | feat: enhance Jest configuration and add new tests for Knowledge Base… |
| 2dc9c55 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 8fa29e5 | deon-sanchez | refactor: reorganize imports and clean up console log in Dropdown com… |
| aacf468 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| f61689a | autofix-ci[bot] | [autofix.ci] apply automated fixes (attempt 2/3) |
| 6416d51 | deon-sanchez | feat: add success callback for knowledge base creation in NodeDialog … |
| b780edd | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| d20c2c6 | deon-sanchez | refactor: update table component to handle single-toggle columns |
| 8c40cf7 | deon-sanchez | Merge branch 'main' of https://github.com/langflow-ai/langflow into f… |
| 5536a3d | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 2a4dba8 | edwinjosechittilappilly | feat: Add unit tests for KBIngestionComponent (#9246) |
| de843c8 | deon-sanchez | Merge branch 'main' of https://github.com/langflow-ai/langflow into f… |
| fb45847 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| c053983 | deon-sanchez | fix: remove unnecessary drawer open state change in KnowledgePage |
| 3f24571 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 62a1023 | autofix-ci[bot] | [autofix.ci] apply automated fixes (attempt 2/3) |
| e80a68e | edwinjosechittilappilly | Remove kb_info output from KBIngestionComponent (#9275) |
| 663b819 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 414a7b9 | edwinjosechittilappilly | Update Knowledge Bases.json |
| 6498a83 | edwinjosechittilappilly | Use settings service for knowledge base directory |
| 60c6da5 | deon-sanchez | Merge branch 'main' of https://github.com/langflow-ai/langflow into f… |
| 4516cca | erichare | Fix knowledge bases mypy issue |
| 9121c1d | deon-sanchez | test: Update file page tests for consistency and clarity |
| 9a9717a | deon-sanchez | test: Update expected title in file upload component test for accuracy |
| 1871c1d | deon-sanchez | Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai… |
| d8f3d0f | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 7565e95 | erichare | Fix tests on backend |
| b62a7eb | erichare | Merge branch 'main' into feat-knowledge-bases |
| 706040f | erichare | Update kb_ingest.py |
| 4072499 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 4ace8d8 | erichare | Merge branch 'main' into feat-knowledge-bases |
| baeb113 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 11d7b17 | erichare | Merge branch 'main' into feat-knowledge-bases |
| dda21d7 | erichare | Merge branch 'main' into feat-knowledge-bases |
| fd1b2ae | edwinjosechittilappilly | Merge branch 'main' into feat-knowledge-bases |
| b6b60fa | erichare | Merge branch 'main' into feat-knowledge-bases |
| 600e0e9 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 933233a | erichare | Merge branch 'main' into feat-knowledge-bases |
| d88b479 | erichare | Merge branch 'main' into feat-knowledge-bases |
| fb5294c | erichare | Merge branch 'main' into feat-knowledge-bases |
| ef664d8 | erichare | Merge branch 'main' into feat-knowledge-bases |
| 9c90eeb | erichare | Merge branch 'main' into feat-knowledge-bases |
| a37c8a8 | erichare | Switch to two templates for KB |
| 0600f8c | erichare | Merge branch 'main' into feat-knowledge-bases |
| f831d9b | erichare | Update names and descs |
| 71ef5f5 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
| 58044d0 | erichare | Rename templates |
| 4d49c95 | autofix-ci[bot] | [autofix.ci] apply automated fixes |
src/backend/base/langflow/base/data/knowledgebase_utils.py
124 changes: 124 additions & 0 deletions
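For reference, the two scoring functions added in this file can be summarized as follows. This is a reading of the code, not a formula stated in the PR: $N$ is the number of documents, $n(t)$ the number of documents containing term $t$, $f(t,d)$ the count of $t$ in document $d$, $|d|$ the token length of $d$, and $\overline{|d|}$ the average token length. Terms with $n(t) = 0$ contribute nothing. The BM25 line uses the classic IDF, which becomes negative whenever $n(t) > N/2$; Lucene-style BM25 instead uses $\ln\bigl(1 + \frac{N - n(t) + 0.5}{n(t) + 0.5}\bigr)$, which stays positive.

$$
\mathrm{TFIDF}(d, q) = \sum_{t \in q} \frac{f(t, d)}{|d|} \, \ln \frac{N}{n(t)}
$$

$$
\mathrm{BM25}(d, q) = \sum_{t \in q} \ln\!\left(\frac{N - n(t) + 0.5}{n(t) + 0.5}\right) \cdot \frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b \, \dfrac{|d|}{\overline{|d|}}\right)}
$$

As a worked check on the sample corpus at the bottom of the file ($N = 4$; "quick" and "brown" each occur in 3 documents), document 2 has 6 tokens and one occurrence of each query term, so its TF-IDF score is $2 \cdot \frac{1}{6} \cdot \ln\frac{4}{3} \approx 0.0959$.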
```python
import math
from collections import Counter


def compute_tfidf(documents: list[str], query_terms: list[str]) -> list[float]:
    """Compute TF-IDF scores for query terms across a collection of documents.

    Args:
        documents: List of document strings.
        query_terms: List of query terms to score.

    Returns:
        List of TF-IDF scores, one per document.
    """
    # Tokenize documents (lowercase, simple whitespace splitting)
    tokenized_docs = [doc.lower().split() for doc in documents]
    n_docs = len(documents)

    # Document frequency: number of documents containing each (lowercased) term
    df: dict[str, int] = {}
    for term in query_terms:
        term_lower = term.lower()
        df[term_lower] = sum(1 for doc in tokenized_docs if term_lower in doc)

    scores: list[float] = []
    for doc_tokens in tokenized_docs:
        doc_score = 0.0
        doc_length = len(doc_tokens)
        term_counts = Counter(doc_tokens)

        for term in query_terms:
            term_lower = term.lower()

            # Term frequency (TF), normalized by document length
            tf = term_counts[term_lower] / doc_length if doc_length > 0 else 0.0

            # Inverse document frequency (IDF); terms found in no document contribute 0
            idf = math.log(n_docs / df[term_lower]) if df[term_lower] > 0 else 0.0

            # TF-IDF score
            doc_score += tf * idf

        scores.append(doc_score)

    return scores


def compute_bm25(documents: list[str], query_terms: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Compute BM25 scores for query terms across a collection of documents.

    Args:
        documents: List of document strings.
        query_terms: List of query terms to score.
        k1: Controls term frequency saturation (default: 1.2).
        b: Controls document length normalization (default: 0.75).

    Returns:
        List of BM25 scores, one per document.
    """
    # Tokenize documents (lowercase, simple whitespace splitting)
    tokenized_docs = [doc.lower().split() for doc in documents]
    n_docs = len(documents)

    # Average document length in tokens
    avg_doc_length = sum(len(doc) for doc in tokenized_docs) / n_docs if n_docs > 0 else 0.0

    # Document frequency: number of documents containing each (lowercased) term
    df: dict[str, int] = {}
    for term in query_terms:
        term_lower = term.lower()
        df[term_lower] = sum(1 for doc in tokenized_docs if term_lower in doc)

    scores: list[float] = []
    for doc_tokens in tokenized_docs:
        doc_score = 0.0
        doc_length = len(doc_tokens)
        term_counts = Counter(doc_tokens)

        for term in query_terms:
            term_lower = term.lower()

            # Raw term frequency in this document; absent terms contribute nothing.
            # Skipping them also guarantees avg_doc_length > 0 and df > 0 below.
            tf = term_counts[term_lower]
            if tf == 0:
                continue

            # IDF with +1 inside the log (Lucene-style) so it stays positive even
            # when a term occurs in more than half of the documents
            idf = math.log(1 + (n_docs - df[term_lower] + 0.5) / (df[term_lower] + 0.5))

            # BM25 term weight with document length normalization
            numerator = tf * (k1 + 1)
            denominator = tf + k1 * (1 - b + b * (doc_length / avg_doc_length))
            doc_score += idf * (numerator / denominator)

        scores.append(doc_score)

    return scores


# Example usage
if __name__ == "__main__":
    # Sample documents
    docs = [
        "The quick brown fox jumps over the lazy dog",
        "A quick brown dog runs fast",
        "The lazy cat sleeps all day",
        "Brown animals are quick and fast",
    ]

    # Query terms
    query = ["quick", "brown"]

    # Compute TF-IDF scores
    tfidf_scores = compute_tfidf(docs, query)
    print("TF-IDF Scores:")
    for i, score in enumerate(tfidf_scores, start=1):
        print(f"Document {i}: {score:.4f}")

    print("\n" + "=" * 40 + "\n")

    # Compute BM25 scores
    bm25_scores = compute_bm25(docs, query)
    print("BM25 Scores:")
    for i, score in enumerate(bm25_scores, start=1):
        print(f"Document {i}: {score:.4f}")
```
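With the classic BM25 IDF, the sample corpus in this file exposes a ranking pitfall: "quick" and "brown" each appear in 3 of the 4 documents, so $\ln(1.5/3.5) < 0$ and every document that matches the query scores below the one document that matches neither term. The self-contained sketch below is not part of the PR; the `bm25` and `rank` helpers and the `plus_one_idf` flag are hypothetical names introduced only for illustration. It recomputes BM25 on that corpus with both IDF variants (classic, and Lucene-style with +1 inside the log) and prints the resulting rankings as 0-based document indices.

```python
import math
from collections import Counter

# Same sample corpus and query as the example block in knowledgebase_utils.py
DOCS = [
    "The quick brown fox jumps over the lazy dog",
    "A quick brown dog runs fast",
    "The lazy cat sleeps all day",
    "Brown animals are quick and fast",
]
QUERY = ["quick", "brown"]


def bm25(
    documents: list[str],
    query_terms: list[str],
    *,
    plus_one_idf: bool,
    k1: float = 1.2,
    b: float = 0.75,
) -> list[float]:
    """Hypothetical BM25 helper; plus_one_idf selects the Lucene-style IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores: list[float] = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in (q.lower() for q in query_terms):
            tf = counts[term]
            if tf == 0:
                continue  # absent terms contribute nothing
            df = sum(1 for other in tokenized if term in other)
            ratio = (n_docs - df + 0.5) / (df + 0.5)
            # Classic IDF goes negative when df > n_docs / 2; the +1 form does not
            idf = math.log(1 + ratio) if plus_one_idf else math.log(ratio)
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores: list[float]) -> list[int]:
    """Document indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


classic = bm25(DOCS, QUERY, plus_one_idf=False)
plus_one = bm25(DOCS, QUERY, plus_one_idf=True)
print("classic IDF ranking: ", rank(classic))   # prints [2, 0, 1, 3]
print("plus-one IDF ranking:", rank(plus_one))  # prints [1, 3, 0, 2]
```

The two runs differ only in the IDF term. With the classic form, index 2 ("The lazy cat sleeps all day") ranks first because its score of 0 beats the negative scores of the matching documents; with the +1 form it drops to last, and the two short documents containing both terms (indices 1 and 3) tie for first.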