-
Notifications
You must be signed in to change notification settings - Fork 1k
Summarize retrieved edges to compact string [COG-1181] #522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request introduces a new summarization engine and extends search functionality. A new prompt file is added for summarizing relationships into natural sentences. The search module is updated to support a new search type, Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant S as Search Module
participant G as Graph Summary Completion
participant DB as Graph Database
participant LLM as LLM Client
U->>S: Submit GRAPH_SUMMARY_COMPLETION query
S->>G: Delegate query processing
G->>DB: Execute query
DB-->>G: Return graph data
G->>LLM: Generate summary via retrieved_edges_to_summary
LLM-->>G: Return summarized response
G-->>S: Return graph summary result
S-->>U: Deliver final answer
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
cognee/infrastructure/llm/prompts/summarize_search_results.txt (1)
1-4: 🛠️ Refactor suggestionEnhance the prompt with examples and detailed instructions.
The current prompt is too minimal and lacks crucial details needed for consistent and reliable summarization. Consider enhancing it with:
- Example input-output pairs demonstrating the desired summarization
- Specific format requirements for the output
- Guidelines for handling edge cases (e.g., malformed relationships, duplicates)
- Error handling instructions
Here's a suggested enhancement:
-You are a top-tier summarization engine that is meant to eliminate redundancies. -The input contains relationships enclosed by "--" . -Summarize the input into natural sentences, listing all relationships. +You are a top-tier summarization engine designed to eliminate redundancies while preserving all unique relationships. + +Input Format: +- The input contains relationships enclosed by "--" +- Each relationship represents a connection between entities + +Output Requirements: +- Generate clear, natural sentences +- Preserve all unique relationships +- Group related information +- Maintain a consistent writing style +- Use proper punctuation and conjunctions + +Example Input: +"David Thompson --has experience-- Creative Design" +"David Thompson --worked at-- Design Studio X" +"David Thompson --has experience-- Graphic Design" +"David Thompson --graduated from-- Art Institute" + +Example Output: +"David Thompson has experience in Creative and Graphic Design. He worked at Design Studio X and graduated from Art Institute." + +Error Handling: +- If a relationship is malformed (missing "--"), preserve the text as-is +- For duplicate relationships, include only unique instances +- If the input is empty, return "No relationships found."
🧹 Nitpick comments (5)
cognee/tasks/completion/graph_query_summary_completion.py (3)
1-9: Consider adding inline documentation or comments for imports.
While the import statements are generally self-explanatory, adding short clarifications or docstrings about their usage context can improve readability, especially for modules likeget_all_subclassesorretrieved_edges_to_stringwhich may be less obvious to new contributors.
10-24: Add a dedicated test forretrieved_edges_to_summary.
This function is a core piece of logic that generates a summarized context from retrieved edges. A separate unit test would help ensure that edge cases (e.g., empty lists, very large lists) are handled consistently and that the summarized context remains correct and stable over time.Would you like me to propose or generate a test suite for this function?
26-73: Refine return type for consistency and clarity.
graph_query_summary_completioncurrently returns a list with one element that holds the computed answer string. Depending on usage across the codebase, returning just the string directly may simplify consumers of this function. If multiple outputs are ever needed, consider returning a dictionary with well-defined keys.Consider explicit error handling for LLM calls.
While the function properly handles the case of no triplets found, consider wrapping the LLM call in a try-except block to handle network issues, timeouts, or unexpected response formats, and either retry or provide a more descriptive error message.Should I open an issue for adding robust error handling logic for the LLM calls?
cognee/modules/search/types/SearchType.py (1)
10-10: Confirm naming alignment with existing enum members.
AddingGRAPH_SUMMARY_COMPLETIONis a reasonable extension, though consider aligning it with the naming style ofGRAPH_COMPLETIONif these two operations perform related tasks. This helps maintain consistent naming conventions throughout the application.cognee/modules/search/methods/search.py (1)
53-53: Encourage a unified interface among search tasks.
Each search task invoked byspecific_searchaccepts(query)except for certain tasks that might need additional arguments (e.g.,user,datasets). If needed, consider updatinggraph_query_summary_completionor other tasks to consistently accept the same signature for uniformity, potentially by includinguserif user-based context is relevant.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
cognee/infrastructure/llm/prompts/summarize_search_results.txt(1 hunks)cognee/modules/search/methods/search.py(2 hunks)cognee/modules/search/types/SearchType.py(1 hunks)cognee/tasks/completion/__init__.py(1 hunks)cognee/tasks/completion/graph_query_summary_completion.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- cognee/tasks/completion/init.py
⏰ Context from checks skipped due to timeout of 90000ms (16)
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: test
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: docker-compose-test
🔇 Additional comments (1)
cognee/modules/search/methods/search.py (1)
17-17: Validate the import usage.
Ensuregraph_query_summary_completionis utilized in the codebase (as it is in thesearch_tasksdictionary). If new tests are planned, confirm that you have coverage for this newly added function.
|
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 8719688 | Triggered | Generic Password | bdbf66b | notebooks/cognee_graphiti_demo.ipynb | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
cognee/tasks/completion/graph_query_summary_completion.py (2)
9-13: Complete the docstring with parameters and return type.The docstring should include the parameters and return type for better documentation.
async def retrieved_edges_to_summary(retrieved_edges: list) -> str: """ Converts a list of retrieved graph edges into a summary without redundancies. - + + Parameters: + - retrieved_edges (list): List of graph edges to be summarized. + + Returns: + - str: A summarized string representation of the graph edges. """
25-26: Add a docstring to explain the function's purpose.The function should have a docstring explaining its purpose, parameters, and return type.
async def graph_query_summary_completion(query: str) -> list: + """ + Executes a graph query and returns a summarized completion of the results. + + Parameters: + - query (str): The query string to execute. + + Returns: + - list: A list containing the summarized completion. + """ return await graph_query_completion(query, edges_to_string_converter=retrieved_edges_to_summary)cognee/tasks/completion/graph_query_completion.py (2)
10-22: Remove unnecessary async declaration.The function contains no async operations, so it doesn't need to be async. This would avoid the overhead of creating and managing coroutines.
-async def retrieved_edges_to_string(retrieved_edges: list) -> str: +def retrieved_edges_to_string(retrieved_edges: list) -> str:
25-41: Update docstring to include the new parameter.The docstring should document the new
edges_to_string_converterparameter.async def graph_query_completion( query: str, edges_to_string_converter: Callable = retrieved_edges_to_string ) -> list: """ Executes a query on the graph database and retrieves a relevant completion based on the found data. Parameters: - query (str): The query string to compute. + - edges_to_string_converter (Callable): Function to convert edges to string format. + Defaults to retrieved_edges_to_string. Returns: - list: Answer to the query. Notes: - The `brute_force_triplet_search` is used to retrieve relevant graph data. - Prompts are dynamically rendered and provided to the LLM for contextual understanding. - Ensure that the LLM client and graph database are properly configured and accessible. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cognee/tasks/completion/graph_query_completion.py(3 hunks)cognee/tasks/completion/graph_query_summary_completion.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: docker-compose-test
- GitHub Check: profiler
hajdul88
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its good. I am curious what difference does it make to summarize the triplets like this. Did you check them manually how do they compare to the other edges_to_string_converter outputs? Would be nice to see some comparison of the prompts.
|
@hajdul88 yes! |
lxobr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Left one minor comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
cognee/tasks/completion/graph_query_completion.py (1)
10-22: 🛠️ Refactor suggestionRemove unnecessary async keyword.
The function doesn't contain any asynchronous operations (no
awaitstatements). Making it async adds unnecessary overhead since it only performs synchronous string manipulation and list operations.Apply this diff to make the function synchronous:
-async def retrieved_edges_to_string(retrieved_edges: list) -> str: +def retrieved_edges_to_string(retrieved_edges: list) -> str:
🧹 Nitpick comments (1)
cognee/tasks/completion/graph_query_completion.py (1)
25-27: LGTM! Consider an alternative approach for defaulting.The parameter changes look good. However, considering the past discussion, here's an alternative approach that might improve readability:
-async def graph_query_completion( - query: str, edges_to_string_converter: Callable = retrieved_edges_to_string -) -> list: +async def graph_query_completion( + query: str, edges_to_string_converter: Callable | None = None +) -> list: + if edges_to_string_converter is None: + edges_to_string_converter = retrieved_edges_to_stringThis approach:
- Makes it explicit that the parameter is optional
- Follows the Python idiom of using None as default
- Maintains the same functionality
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cognee/modules/search/methods/search.py(2 hunks)cognee/tasks/completion/graph_query_completion.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cognee/modules/search/methods/search.py
⏰ Context from checks skipped due to timeout of 90000ms (29)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on macos-15
- GitHub Check: lint (ubuntu-latest, 3.11.x)
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: windows-latest
- GitHub Check: Test on macos-13
- GitHub Check: lint (ubuntu-latest, 3.10.x)
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: docker-compose-test
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: profiler
🔇 Additional comments (1)
cognee/tasks/completion/graph_query_completion.py (1)
61-61: LGTM! Correct usage of async/await.The function correctly awaits the edges_to_string_converter, which is necessary since it could be any async callable passed as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
cognee/tasks/completion/graph_query_completion.py (1)
25-39: Add documentation for the new parameter.The function's docstring is missing documentation for the
context_resolverparameter.Add the parameter to the docstring:
Parameters: - query (str): The query string to compute. + - context_resolver (Callable, optional): Function to convert retrieved edges to string format. + Defaults to retrieved_edges_to_string if None.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cognee/tasks/completion/graph_query_completion.py(3 hunks)cognee/tasks/completion/graph_query_summary_completion.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cognee/tasks/completion/graph_query_summary_completion.py
⏰ Context from checks skipped due to timeout of 90000ms (27)
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-13
- GitHub Check: run_notebook_test / test
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: Test on macos-15
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on macos-13
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: docker-compose-test
- GitHub Check: profiler
🔇 Additional comments (3)
cognee/tasks/completion/graph_query_completion.py (3)
7-7: LGTM!The
Callableimport is appropriately added to support type hinting for the newcontext_resolverparameter.
57-63: LGTM!The implementation of the optional context resolver with a default fallback is clean and follows good practices. The approach allows for flexible customization while maintaining backward compatibility.
10-22:Details
❓ Verification inconclusive
Verify the need for async conversion.
The function has been made async but contains no async operations. While this might be for future-proofing or consistency, consider if this change is necessary.
Let's check if there are any async operations planned for this function:
🏁 Script executed:
#!/bin/bash # Search for TODOs or related comments about async operations rg -i "todo.*async.*edge.*string" || rg -i "todo.*retrieved_edges_to_string" # Check if this function is used in other async contexts ast-grep --pattern 'await retrieved_edges_to_string'Length of output: 132
Async Usage Consideration for
retrieved_edges_to_string
- The function is now declared async but does not invoke any awaitable or asynchronous operations.
- If this change is intended for consistency or future enhancements, please add a clarifying comment to indicate its purpose.
- Otherwise, consider reverting it to a synchronous function to avoid unnecessary async overhead.
Description
Summarize retrieved edges to compact string with no redundancies.
Example:
Before summarization:
CV example:
visual innovations -- employs -- visual innovations
CV 4: Not Relevant
Name: David Thompson
Contact Information:
Email: [email protected]
Phone: (555) 456-7890
Summary:
Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals.
Education:
B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:
Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand strategies.
Skills:
Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
-- contains -- creativeworks agency
CV 4: Not Relevant
Name: David Thompson
Contact Information:
Email: [email protected]
Phone: (555) 456-7890
Summary:
Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals.
Education:
B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:
Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand strategies.
Skills:
Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
-- contains -- visual innovations
CV 4: Not Relevant
Name: David Thompson
Contact Information:
Email: [email protected]
Phone: (555) 456-7890
Summary:
Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals.
Education:
B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:
Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand strategies.
Skills:
Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography
-- contains -- rhode island school of design
Experienced Graphic Designer with over 8 years in visual design and branding, specializing in Adobe Creative Suite and enthusiastic about producing engaging visuals. -- made_from --
CV 4: Not Relevant
Name: David Thompson
Contact Information:
Email: [email protected]
Phone: (555) 456-7890
Summary:
Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals.
Education:
B.F.A. in Graphic Design, Rhode Island School of Design (2012)
Experience:
Senior Graphic Designer, CreativeWorks Agency (2015 – Present)
Led design projects for clients in various industries.
Created branding materials that increased client engagement by 30%.
Graphic Designer, Visual Innovations (2012 – 2015)
Designed marketing collateral, including brochures, logos, and websites.
Collaborated with the marketing team to develop cohesive brand strategies.
Skills:
Design Software: Adobe Photoshop, Illustrator, InDesign
Web Design: HTML, CSS
Specialties: Branding and Identity, Typography���
After summarization:
David Thompson is a Creative Graphic Designer with over 8 years of experience in visual design and branding, proficient in Adobe Creative Suite and passionate about creating compelling visuals. He holds a B.F.A. in Graphic Design from the Rhode Island School of Design (2012). His experience includes working as a Senior Graphic Designer at CreativeWorks Agency (2015 – Present), where he led design projects and created branding materials that increased client engagement by 30%, and as a Graphic Designer at Visual Innovations (2012 – 2015), where he designed marketing collateral and collaborated with the marketing team to develop cohesive brand strategies. His skills include design software such as Adobe Photoshop, Illustrator, and InDesign, as well as web design in HTML and CSS, with specialties in Branding and Identity and Typography.
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit