Skip to content

Conversation

@dm1tryG
Copy link
Contributor

@dm1tryG dm1tryG commented Mar 24, 2025

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

I added one example "get all connected nodes to entity"

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 24, 2025

Walkthrough

This pull request introduces a new class, NaturalLanguageRetriever, which extends BaseRetriever to support natural language queries against a graph database. The class defines asynchronous methods for fetching graph schemas, generating and executing Cypher queries, retrieving context, and obtaining completions. Additionally, the PR updates the search module to incorporate the new SearchType.NATURAL_LANGUAGE and integrates the retriever into the specific_search function. Test cases have been modified to reflect the updated expected outputs for search results and history.

Changes

File(s) Summary
cognee/modules/retrieval/natural_language_retriever.py Introduces NaturalLanguageRetriever with new async methods: _get_graph_schema, _generate_cypher_query, _execute_cypher_query, get_context, and get_completion.
cognee/modules/search/methods/search.py, cognee/modules/search/types/SearchType.py Updates search functionality by importing NaturalLanguageRetriever and mapping SearchType.NATURAL_LANGUAGE to its get_completion method; adds new enum value.
cognee/tests/test_neo4j.py Modifies assertions and output messages to match the updated behavior in search results and history length expectations.
cognee/infrastructure/llm/prompts/natural_language_retriever_system.txt Introduces a new file outlining guidelines for generating Cypher queries from natural language, detailing graph schema requirements and optimization strategies.

Possibly related issues

Possibly related PRs

  • feat: adds cypher search to retrievers module #648: Introduces the NaturalLanguageRetriever with methods that include get_completion, which is directly utilized in the specific_search function of the retrieved PR.
  • test: test retrievers [cog-1433] #635: The changes in the main PR, which introduce the NaturalLanguageRetriever class and its methods, are directly related to the modifications in the retrieved PR as it adds a new retrieval option in the specific_search function.
  • feat: entity brute force triplet search [COG-1325] #589: The changes in the main PR introduce a new class NaturalLanguageRetriever with methods that include get_completion, which is directly utilized in the specific_search function of the retrieved PR, indicating a strong connection at the code level.

Suggested reviewers

  • Vasilije1990
  • borisarzentar
  • hajdul88
  • dexters1

Poem

I'm a rabbit hopping in code so spry,
New queries bloom like carrots in the sky.
With NaturalLanguage magic on the run,
Graph quests now shine just like the sun.
Debug and test—we cheer and sigh,
For changes crafted with a twinkle in my eye!
🥕🐇

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
cognee/tests/test_neo4j.py (1)

71-71: Refine log message
While “Extracted results” is acceptable, consider providing a more context-specific log message to help differentiate between different types of returned data in your logs.

cognee/modules/retrieval/natural_language_retriever.py (5)

7-7: Remove unused import
generate_completion is imported but never used. It is best practice to remove unused imports to keep the module lean.

- from cognee.modules.retrieval.utils.completion import generate_completion
🧰 Tools
🪛 Ruff (0.8.2)

7-7: cognee.modules.retrieval.utils.completion.generate_completion imported but unused

Remove unused import: cognee.modules.retrieval.utils.completion.generate_completion

(F401)


28-45: Check for empty schema
This method retrieves node and edge schemas from the graph. Consider handling scenarios where no relevant nodes or edges exist in the database to avoid unexpected results.


46-131: Externalize large system prompt
The embedded system prompt is quite extensive. For clarity, consider storing the prompt in a separate file or using a dedicated prompt manager to improve readability and maintainability.


132-169: Enhance retry handling
The multi-attempt logic here is good. As an optional improvement, consider adding a small delay or exponential backoff between attempts, especially if rate limits or transient errors could occur.


183-189: Return structure clarity
Returning context directly in get_completion is straightforward. If you plan to do any additional processing on the context, consider clarifying or documenting the structure of the returned data for maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7bf30f7 and 7357cc6.

📒 Files selected for processing (4)
  • cognee/modules/retrieval/natural_language_retriever.py (1 hunks)
  • cognee/modules/search/methods/search.py (2 hunks)
  • cognee/modules/search/types/SearchType.py (1 hunks)
  • cognee/tests/test_neo4j.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
cognee/tests/test_neo4j.py (1)
cognee/api/v1/search/routers/get_search_router.py (1)
  • get_search_history (27-33)
🪛 Ruff (0.8.2)
cognee/modules/retrieval/natural_language_retriever.py

7-7: cognee.modules.retrieval.utils.completion.generate_completion imported but unused

Remove unused import: cognee.modules.retrieval.utils.completion.generate_completion

(F401)

🔇 Additional comments (5)
cognee/modules/search/types/SearchType.py (1)

13-13: Addition of new enum value
This addition cleanly expands the existing search types to support natural language. Good job!

cognee/tests/test_neo4j.py (1)

77-77: Validate updated history length
Verifying that the length of the history is now exactly 10 could introduce fragility if the test environment or search sequence changes. Please confirm that 10 is not an arbitrary magic number and that each new search action is truly expected to increment the history up to 10.

cognee/modules/search/methods/search.py (2)

16-16: Import usage
The NaturalLanguageRetriever import looks good, and it is indeed used below.


71-71: Appropriate new search task
Great job mapping SearchType.NATURAL_LANGUAGE to NaturalLanguageRetriever().get_completion. This cleanly integrates the new retriever into existing search logic.

cognee/modules/retrieval/natural_language_retriever.py (1)

170-182: Context retrieval logic
This function nicely handles the graph engine check and raises a SearchTypeNotSupported if using NetworkX. The error handling is clear and properly logged.

@dm1tryG
Copy link
Contributor Author

dm1tryG commented Mar 24, 2025

@Vasilije1990 @borisarzentar @hajdul88

added new PR (instead #646) - can you check it please and give a feedback?

@dm1tryG dm1tryG changed the title Feat: Natural Language Retriever Feat: Natural Language Retriever (text2cypher) Mar 24, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
cognee/tests/test_neo4j.py (1)

73-75: Consider removing or converting print statements.
Using print in test code can clutter logs. Consider using logging instead for better maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7357cc6 and 19c44cc.

📒 Files selected for processing (1)
  • cognee/tests/test_neo4j.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
cognee/tests/test_neo4j.py (2)
cognee/api/v1/search/search_v2.py (1)
  • search (10-31)
cognee/api/v1/search/routers/get_search_router.py (2)
  • search (36-47)
  • get_search_history (27-33)
🪛 GitHub Actions: test | neo4j
cognee/tests/test_neo4j.py

[error] 106-106: OSError: Missing required Neo4j credentials.


[warning] 1-1: Ontology file 'None' not found. Using fallback ontology at http://example.org/empty_ontology.


[warning] 1-1: Ontology file 'None' not found. Using fallback ontology at http://example.org/empty_ontology.

⏰ Context from checks skipped due to timeout of 90000ms (29)
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: chromadb test
  • GitHub Check: Test on macos-13
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_notebook_test / test
  • GitHub Check: windows-latest
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: run_simple_example_test
  • GitHub Check: lint (ubuntu-latest, 3.10.x)
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: docker-compose-test
  • GitHub Check: Build Cognee Backend Docker App Image
🔇 Additional comments (3)
cognee/tests/test_neo4j.py (3)

72-72: Text update looks good.
Renaming "Extracted summaries are:" to "Extracted results are:" is more aligned with the broader result set.


76-81: Good addition of NATURAL_LANGUAGE query test.
This addresses the request to verify the new search type and ensures the results are not empty.


87-87:

Details

✅ Verification successful

Confirm updated search history length.
Ensure the test environment indeed logs 10 entries, given the new searches introduced.


🏁 Script executed:

#!/bin/bash
# Searching for references to the logging of search history in the codebase.
rg 'search_history' -A 5 -B 5

Length of output: 9744


Confirm updated search history length

The assertion in cognee/tests/test_neo4j.py now correctly expects 10 search history entries. Please verify that the test environment is logging exactly 10 entries for Neo4j after incorporating the new searches.

borisarzentar
borisarzentar previously approved these changes Mar 27, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (8)
cognee/modules/retrieval/natural_language_retriever.py (4)

17-25: Update docstring to match the current parameters

The docstring mentions "custom prompt paths" (plural) but the constructor only accepts a single prompt path parameter (system_prompt_path).

-        """Initialize retriever with optional custom prompt paths."""
+        """Initialize retriever with optional custom system prompt path and maximum retry attempts."""

67-71: Simplify log string formatting

The conditional formatting of the log message is a bit verbose. Consider using a simpler approach.

-                logger.info(
-                    f"Executing generated Cypher query (attempt {attempt + 1}): {cypher_query[:100]}..."
-                    if len(cypher_query) > 100
-                    else cypher_query
-                )
+                log_query = f"{cypher_query[:100]}..." if len(cypher_query) > 100 else cypher_query
+                logger.info(f"Executing generated Cypher query (attempt {attempt + 1}): {log_query}")

14-16: Enhance class docstring with more details

The class docstring is quite brief. Consider expanding it to provide more context about what natural language search entails and how this retriever works.

-    """Retriever for handling natural language search"""
+    """
+    Retriever for handling natural language search.
+    
+    This class converts natural language queries into Cypher queries that can be
+    executed against a graph database. It uses LLM to generate the Cypher queries
+    based on the graph schema and implements retry logic to handle failed attempts.
+    """

105-110: Consider adding more functionality to get_completion

The get_completion method currently just returns the context. Consider enhancing it to provide more valuable completions based on the context, perhaps by using an LLM to generate a response.

     async def get_completion(self, query: str, context: Optional[Any] = None) -> Any:
-        """Returns a completion based on the query and context."""
+        """
+        Returns a completion based on the query and context.
+        
+        Currently, this method simply returns the context. Consider enhancing it
+        to generate more informative responses by using an LLM to process the
+        context and generate a human-readable answer.
+        """
         if context is None:
             context = await self.get_context(query)
 
         return context
cognee/infrastructure/llm/prompts/natural_language_retriever_system.txt (4)

60-61: Fix grammar in the prompt text

There's a grammatical error in the sentence structure.

-This queries doesn't work. Do NOT use them:
+These queries don't work. Do NOT use them:
🧰 Tools
🪛 LanguageTool

[grammar] ~60-~60: Consider using the singular form after the singular determiner “This”.
Context: ...p properties): {{edge_schemas}} This queries doesn't work. Do NOT use them: `{{previ...

(AGREEMENT_SENT_START)


1-66: Consider adding more examples to the prompt

The prompt currently only contains one example (Get all nodes connected to John). More examples of varying complexity would help the LLM better understand the expected output format and handle a wider range of user queries.

Consider adding 2-3 more examples with different query patterns, such as:

  1. Finding paths between entities
  2. Filtering based on properties
  3. Aggregating data

For example:

Example 2:
Find all TextDocuments created after Jan 1, 2023 that mention "AI"
MATCH (d:TextDocument)
WHERE d.created_at > '2023-01-01' AND d.text CONTAINS "AI" 
RETURN d.id, d.name, d.created_at
LIMIT 10

Example 3:
Find the shortest path between entities "Machine Learning" and "Natural Language Processing"
MATCH p=shortestPath((a:Entity {name: 'Machine Learning'})-[*]-(b:Entity {name: 'Natural Language Processing'}))
RETURN p
🧰 Tools
🪛 LanguageTool

[grammar] ~60-~60: Consider using the singular form after the singular determiner “This”.
Context: ...p properties): {{edge_schemas}} This queries doesn't work. Do NOT use them: `{{previ...

(AGREEMENT_SENT_START)


57-58: Enhance edge schema template placeholder

The current placeholder {{edge_schemas}} will be replaced with a list of edge properties, but it doesn't provide context about how these edge schemas will be structured in the template.

Consider adding a sample format comment to indicate how the edge schemas will be presented:

Edge schema (relationship properties):
-`{{edge_schemas}}`
+`{{edge_schemas}}` 
+<!-- Example format:
+- HAS_ENTITY: Connects TextDocument to Entity
+- BELONGS_TO: Connects Entity to EntityType
+- CONTAINS: Connects TextDocument to DocumentChunk
+-->

This will make the template more maintainable and easier to understand when reviewing the code.


13-20: Add guidance for handling complex results

The "QUERY REQUIREMENTS" section focuses on query generation but doesn't provide guidance on handling complex result structures. This could be crucial for ensuring the generated queries return results in a useful format.

QUERY REQUIREMENTS:
1. Return ONLY the exact Cypher query with NO explanations, comments, or markdown
2. Generate syntactically correct Neo4j Cypher code (Neo4j 4.4+ compatible)
3. Be precise - match the exact property names and relationship types from the schema
4. Handle complex queries by breaking them into logical pattern matching parts
5. Use parameters (e.g., $name) for literal values when appropriate
6. Use appropriate data types for parameters (strings, numbers, booleans)
+7. Structure results appropriately - use aliases and object construction for complex results
+8. Consider the intended use of the results - prioritize properties that answer the user's question
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9529f2c and ee585d5.

📒 Files selected for processing (2)
  • cognee/infrastructure/llm/prompts/natural_language_retriever_system.txt (1 hunks)
  • cognee/modules/retrieval/natural_language_retriever.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
cognee/infrastructure/llm/prompts/natural_language_retriever_system.txt

[grammar] ~60-~60: Consider using the singular form after the singular determiner “This”.
Context: ...p properties): {{edge_schemas}} This queries doesn't work. Do NOT use them: `{{previ...

(AGREEMENT_SENT_START)

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: windows-latest
  • GitHub Check: run_simple_example_test
  • GitHub Check: docker-compose-test

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
cognee/modules/retrieval/natural_language_retriever.py (1)

1-118: 🛠️ Refactor suggestion

Missing documentation about system prompt format requirements

The class depends heavily on a system prompt template but doesn't provide any documentation about its required format or content, making it harder for other developers to work with this class.

Add class-level documentation explaining the expected format and variables for the system prompt:

 class NaturalLanguageRetriever(BaseRetriever):
-    """Retriever for handling natural language search"""
+    """Retriever for handling natural language search
+    
+    This class converts natural language queries into Cypher queries for Neo4j graph databases.
+    It requires a system prompt template that instructs the LLM how to generate valid Cypher.
+    
+    The system prompt should include the following variables:
+    - `{{edge_schemas}}`: Will be replaced with schema information from the graph database
+    - `{{previous_attempts}}`: Will be replaced with information about previous query attempts
+    
+    The prompt should instruct the LLM to:
+    1. Analyze the natural language query
+    2. Understand the available schema
+    3. Generate a valid Cypher query that answers the question
+    4. Handle errors from previous attempts (if any)
+    5. Return only the Cypher query as a string without any explanations
+    """
♻️ Duplicate comments (1)
cognee/modules/retrieval/natural_language_retriever.py (1)

13-25: 🛠️ Refactor suggestion

Class initializer needs to handle the prompt path more robustly

The class initializes with a hard-coded prompt path that doesn't include the full directory structure. This might lead to issues finding the prompt file.

-    def __init__(
-        self,
-        system_prompt_path: str = "natural_language_retriever_system.txt",
-        max_attempts: int = 3,
-    ):
-        """Initialize retriever with optional custom prompt paths."""
-        self.system_prompt_path = system_prompt_path
+    def __init__(
+        self,
+        system_prompt_path: str = "llm/prompts/natural_language_retriever_system.txt",
+        max_attempts: int = 3,
+    ):
+        """Initialize retriever with optional custom prompt paths.
+        
+        Args:
+            system_prompt_path: Path to the system prompt template file relative to prompts directory
+            max_attempts: Maximum number of attempts to generate and execute a valid Cypher query
+        """
+        self.system_prompt_path = system_prompt_path
🧹 Nitpick comments (5)
cognee/modules/retrieval/natural_language_retriever.py (5)

26-43: Graph schema query could be more comprehensive

The node schema query returns labels and properties, but the edge schema query only returns property keys without relationship types, which may limit the ability to generate accurate Cypher queries.

     async def _get_graph_schema(self, graph_engine) -> tuple:
         """Retrieve the node and edge schemas from the graph database."""
         node_schemas = await graph_engine.query(
             """
             MATCH (n)
             UNWIND keys(n) AS prop
             RETURN DISTINCT labels(n) AS NodeLabels, collect(DISTINCT prop) AS Properties;
             """
         )
         edge_schemas = await graph_engine.query(
             """
-            MATCH ()-[r]->()
-            UNWIND keys(r) AS key
-            RETURN DISTINCT key;
+            MATCH ()-[r]->()
+            UNWIND keys(r) AS prop
+            RETURN DISTINCT type(r) AS RelationshipType, collect(DISTINCT prop) AS Properties;
             """
         )
         return node_schemas, edge_schemas

61-98: Improve error handling and result validation in query execution

The method catches all exceptions without differentiating between query generation errors and execution errors. Additionally, there's no validation of the context format or structure.

     async def _execute_cypher_query(self, query: str, graph_engine: GraphDBInterface) -> Any:
         """Execute the natural language query against Neo4j with multiple attempts."""
         node_schemas, edge_schemas = await self._get_graph_schema(graph_engine)
         previous_attempts = ""
         cypher_query = ""
 
         for attempt in range(self.max_attempts):
             logger.info(f"Starting attempt {attempt + 1}/{self.max_attempts} for query generation")
             try:
-                cypher_query = await self._generate_cypher_query(
-                    query, edge_schemas, previous_attempts
-                )
+                try:
+                    cypher_query = await self._generate_cypher_query(
+                        query, edge_schemas, previous_attempts
+                    )
+                except Exception as e:
+                    logger.error(f"Error in query generation (attempt {attempt + 1}): {str(e)}")
+                    previous_attempts += f"Query generation error: {str(e)}\n"
+                    continue
 
                 logger.info(
                     f"Executing generated Cypher query (attempt {attempt + 1}): {cypher_query[:100]}..."
                     if len(cypher_query) > 100
                     else cypher_query
                 )
-                context = await graph_engine.query(cypher_query)
+                try:
+                    context = await graph_engine.query(cypher_query)
+                except Exception as e:
+                    logger.error(f"Error executing Cypher query (attempt {attempt + 1}): {str(e)}")
+                    previous_attempts += f"Query: {cypher_query} -> Execution error: {str(e)}\n"
+                    continue
 
                 if context:
                     result_count = len(context) if isinstance(context, list) else 1
                     logger.info(
                         f"Successfully executed query (attempt {attempt + 1}): returned {result_count} result(s)"
                     )
                     return context
+                elif context == []:
+                    logger.info(f"Query executed successfully but returned empty results (attempt {attempt + 1})")
+                    previous_attempts += f"Query: {cypher_query} -> Result: Empty list\n"
+                else:
+                    logger.warning(f"Query returned unexpected result type: {type(context)} (attempt {attempt + 1})")
+                    previous_attempts += f"Query: {cypher_query} -> Result: Unexpected type {type(context)}\n"
-                previous_attempts += f"Query: {cypher_query} -> Result: None\n"
+                previous_attempts += f"Query: {cypher_query} -> Result: No data returned\n"
 
             except Exception as e:
                 previous_attempts += f"Query: {cypher_query if 'cypher_query' in locals() else 'Not generated'} -> Executed with error: {e}\n"
                 logger.error(f"Error executing query: {str(e)}")
 
         logger.warning(
             f"Failed to get results after {self.max_attempts} attempts for query: '{query[:50]}...'"
         )
         return []

99-111: Add more specific exception handling in get_context

The method catches all exceptions without distinguishing between different types of errors that could occur.

     async def get_context(self, query: str) -> Optional[Any]:
         """Retrieves relevant context using a natural language query converted to Cypher."""
         try:
             graph_engine = await get_graph_engine()
 
             if isinstance(graph_engine, (NetworkXAdapter)):
                 raise SearchTypeNotSupported("Natural language search type not supported.")
 
             return await self._execute_cypher_query(query, graph_engine)
+        except SearchTypeNotSupported as e:
+            logger.error("Natural language search not supported with current graph engine: %s", str(e))
+            raise
+        except ValueError as e:
+            logger.error("Invalid query or parameters: %s", str(e))
+            raise
         except Exception as e:
-            logger.error("Failed to execute natural language search retrieval: %s", str(e))
-            raise e
+            logger.error("Unexpected error during natural language search retrieval: %s", str(e))
+            raise

112-118: Consider enhancing the get_completion method

The current implementation simply returns the context without any processing. Since this is a natural language retriever, consider formatting the results or generating a textual response based on the retrieved context.

     async def get_completion(self, query: str, context: Optional[Any] = None) -> Any:
         """Returns a completion based on the query and context."""
         if context is None:
             context = await self.get_context(query)
 
-        return context
+        # If no context was found, return an informative empty result
+        if not context:
+            return {"result": [], "message": "No data found matching your query."}
+            
+        # Return structured response with the original query and results
+        return {
+            "query": query,
+            "result": context,
+            "message": f"Found {len(context) if isinstance(context, list) else 1} result(s) matching your query."
+        }

19-21: Consider adding parameter type annotations

The constructor parameters lack type annotations, which would improve code readability and IDE support.

     def __init__(
         self,
-        system_prompt_path: str = "natural_language_retriever_system.txt",
-        max_attempts: int = 3,
+        system_prompt_path: str = "natural_language_retriever_system.txt",
+        max_attempts: int = 3,
     ):
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee585d5 and b21726c.

📒 Files selected for processing (1)
  • cognee/modules/retrieval/natural_language_retriever.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (27)
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-15
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: chromadb test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: run_simple_example_test
  • GitHub Check: Build Cognee Backend Docker App Image
  • GitHub Check: docker-compose-test
🔇 Additional comments (1)
cognee/modules/retrieval/natural_language_retriever.py (1)

1-11: Imports are well-organized and appropriate

The imports are logically grouped and include all necessary components for the natural language retriever functionality.

@dm1tryG
Copy link
Contributor Author

dm1tryG commented Mar 27, 2025

@borisarzentar I have fixed by your comments (tests + moved prompt to llm/prompts)

@borisarzentar
Copy link
Member

@borisarzentar I have fixed by your comments (tests + moved prompt to llm/prompts)

Great!

@borisarzentar borisarzentar merged commit de5b7f2 into topoteretes:dev Mar 27, 2025
10 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants