
Conversation

@Vasilije1990
Contributor

@Vasilije1990 Vasilije1990 commented Jan 10, 2025

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features

    • Added a visualization tool for the knowledge graph.
    • Introduced functions for named entity recognition and part-of-speech tagging.
  • Improvements

    • Enhanced the server's image-processing capabilities.
    • Added functionality to retrieve the most recent PNG file.
    • Integrated image handling with PIL library.
    • Enabled saving graph visualizations as PNG images with timestamps.
    • Updated documentation with debugging and installation reset instructions.
  • Dependency Updates

    • Updated various dependencies to newer versions.
    • Specified minimum Python version requirement.
  • Workflows

    • Introduced automated DCO validation for pull requests.
    • Streamlined Docker image build and push workflow.
    • Added steps to download NLTK tokenizer data for testing workflows.

@coderabbitai
Contributor

coderabbitai bot commented Jan 10, 2025

Walkthrough

The pull request introduces enhanced image processing and visualization capabilities for the Cognee knowledge graph system. The changes span several files, primarily cognee_mcp/server.py and cognee/shared/utils.py. Modifications include the addition of a new "visualize" tool for graph visualization, the ability to save visualizations as PNG files, and a function to retrieve the most recently created PNG file. Additionally, updates to the README and pyproject.toml ensure proper documentation and dependency management.

Changes

File Change Summary
cognee_mcp/server.py - Added get_freshest_png() function to retrieve the latest PNG file
- Introduced "visualize" tool in handle_list_tools
- Updated handle_call_tool to support visualization
cognee/shared/utils.py - Added get_entities() and extract_pos_tags() functions
- Updated create_cognee_style_network_with_logo() for PNG export
cognee-mcp/README.md - Added debugging and installation reset instructions
cognee-mcp/pyproject.toml - Updated Python version requirement and several package dependencies
cognee/__init__.py - Replaced import of visualize with visualize_graph
pyproject.toml - Specified version for fastapi-users and updated nltk dependency
.github/workflows/dockerhub.yml - Updated workflow name and steps for building and pushing Docker images
.github/pull_request_template.md - Introduced a new pull request template with sections for description and DCO affirmation
.github/workflows/approve_dco.yaml - Added a workflow to validate DCO compliance in pull requests

Possibly related PRs

  • Format with Ruff 0.9.0 #425: The get_filtered_graph_data method of the Neo4jAdapter class received string-formatting changes in a query, echoing the formatting changes made to cognee_mcp/server.py in this PR.

Poem

🐰 A rabbit's tale of graphs so bright,
Visualizing knowledge with pure delight,
PNG images, fresh and new,
Timestamps dancing, a graphing breakthrough!
Cognee's magic, now clear to see! 🖼️

Finishing Touches

  • 📝 Generate Docstrings (Beta)


@Vasilije1990 Vasilije1990 changed the title Add data visualization for Anthropic feat: Add data visualization for Anthropic Jan 10, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e983c21 and daf2d54.

📒 Files selected for processing (2)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
  • cognee/shared/utils.py (3 hunks)
🧰 Additional context used
🪛 GitHub Actions: ruff format
cognee-mcp/cognee_mcp/server.py

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

🔇 Additional comments (1)
cognee-mcp/cognee_mcp/server.py (1)

92-101: LGTM: Tool definition looks good

The "visualize" tool is properly defined with a clear description and appropriate schema.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

Comment on lines 13 to 14
from PIL import Image
from PIL import Image as PILImage
Contributor

🛠️ Refactor suggestion

Remove duplicate PIL Image import

The Image module from PIL is imported twice with different aliases. This is redundant and could cause confusion.

-from PIL import Image
-from PIL import Image as PILImage
+from PIL import Image
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

Comment on lines 447 to 454
# Get the latest Unix timestamp as an integer
timestamp = int(time.time())

# Construct your filename
filename = f"{timestamp}.png"

export_png(p, filename=filename)

Contributor

🛠️ Refactor suggestion

Add error handling for PNG export

The PNG export functionality lacks error handling and cleanup of old files.

     # Get the latest Unix timestamp as an integer
     timestamp = int(time.time())

     # Construct your filename
     filename = f"{timestamp}.png"

-    export_png(p, filename=filename)
+    try:
+        # Cleanup old PNG files to prevent disk space issues
+        cleanup_old_pngs(directory=".", keep_latest=5)
+        
+        # Export the new PNG
+        export_png(p, filename=filename)
+    except Exception as e:
+        logging.error(f"Failed to export PNG: {str(e)}")
+        raise

Consider adding a helper function to cleanup old PNG files:

import logging
import os

def cleanup_old_pngs(directory: str, keep_latest: int = 5):
    """Cleanup old PNG files, keeping only the N latest files.

    Assumes files are named with Unix timestamps, e.g. '1673185134.png'.
    """
    png_files = [f for f in os.listdir(directory) if f.endswith(".png")]
    if len(png_files) <= keep_latest:
        return

    # Sort by the timestamp encoded in each filename
    sorted_files = sorted(png_files, key=lambda x: int(x.replace(".png", "")))

    # Remove the older files
    for f in sorted_files[:-keep_latest]:
        try:
            os.remove(os.path.join(directory, f))
        except OSError as e:
            logging.warning(f"Failed to remove old PNG file {f}: {str(e)}")

Vasilije1990 and others added 2 commits January 11, 2025 16:57
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
cognee-mcp/cognee_mcp/server.py (1)

Fix import statements organization

The file has the following formatting issues:

  • Remove duplicate PIL import: from PIL import Image as PILImage is redundant since Image is already imported
  • Organize imports according to PEP 8 style guide:
    1. Standard library imports
    2. Third-party imports
    3. Local application imports
🔗 Analysis chain

Line range hint 1-230: Fix formatting issues

The file requires formatting according to Ruff's standards. Please run the formatter:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

ruff format cognee-mcp/cognee_mcp/server.py

Length of output: 140


Script:

#!/bin/bash
# Check for basic formatting patterns
echo "=== Checking import statements ==="
rg "^import|^from.*import" cognee-mcp/cognee_mcp/server.py

echo -e "\n=== Checking blank lines between functions ==="
rg "^def|^async def" -A 1 cognee-mcp/cognee_mcp/server.py

echo -e "\n=== Checking indentation ==="
rg "^[[:space:]]+[^[:space:]]" cognee-mcp/cognee_mcp/server.py

Length of output: 8158

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

🧹 Nitpick comments (1)
cognee-mcp/cognee_mcp/server.py (1)

105-130: Add docstring to document timestamp format requirement

The function assumes PNG files are named with timestamps but this requirement isn't documented.

 def get_freshest_png(directory: str) -> Image.Image:
+    """
+    Retrieve the most recently created PNG file from a directory.
+    
+    Args:
+        directory (str): Path to directory containing PNG files
+        
+    Returns:
+        Image.Image: PIL Image object of the most recent PNG
+        
+    Raises:
+        FileNotFoundError: If directory doesn't exist or no PNG files found
+        ValueError: If PNG filenames don't follow timestamp format (e.g., '1673185134.png')
+        IOError: If PNG file can't be opened
+        
+    Note:
+        PNG files must be named with Unix timestamps (e.g., '1673185134.png')
+    """
     if not os.path.exists(directory):
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between daf2d54 and 7b0bfe9.

📒 Files selected for processing (1)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
🧰 Additional context used
🪛 GitHub Actions: ruff format
cognee-mcp/cognee_mcp/server.py

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: docker-compose-test
🔇 Additional comments (1)
cognee-mcp/cognee_mcp/server.py (1)

13-14: Remove duplicate PIL Image import

The Image module from PIL is imported twice with different aliases. This is redundant and could cause confusion.

-from PIL import Image
-from PIL import Image as PILImage
+from PIL import Image
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

with redirect_stdout(fnull), redirect_stderr(fnull):
try:
await cognee.visualize
img = get_freshest_png(".")
Contributor

🛠️ Refactor suggestion

Use configured output directory instead of current directory

Using "." as the directory path is brittle and depends on the current working directory when the server starts. Consider using a configured output directory.

-                    img = get_freshest_png(".")
+                    img = get_freshest_png(cognee.config.visualization_output_dir)

You'll need to ensure the output directory is properly configured in the Cognee configuration.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

Comment on lines +92 to +101
types.Tool(
name="visualize",
description="Visualize the knowledge graph.",
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string"},
},
},
),
Contributor

@coderabbitai coderabbitai bot Jan 11, 2025

🛠️ Refactor suggestion

Clarify query parameter requirement in visualization tool schema

The tool's schema defines a "query" property but doesn't mark it as required. Either:

  1. Mark it as required if visualization needs a query parameter
  2. Document that it's optional if visualization can work without a query
         types.Tool(
             name="visualize",
             description="Visualize the knowledge graph.",
             inputSchema={
                 "type": "object",
                 "properties": {
                     "query": {"type": "string"},
                 },
+                "required": ["query"],  # Add this if query is required
             },
         ),
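To see what adding `"required"` changes for callers, here is a small stdlib-only sketch of the kind of check a server could perform against this schema before dispatching the tool. The `validate_required` helper is hypothetical (not part of the mcp library); it only illustrates the effect of the `"required"` key:

```python
def validate_required(arguments: dict, schema: dict) -> list:
    """Return the names of required properties missing from arguments."""
    return [key for key in schema.get("required", []) if key not in arguments]

schema = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}

# With "required" present, an empty argument dict can be rejected up front
missing = validate_required({}, schema)             # ["query"]
ok = validate_required({"query": "graph"}, schema)  # []
```

Without the `"required"` key, both calls would return an empty list and a missing query would only surface later inside the tool handler.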
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

Member

This is a good suggestion from coderabbit

Contributor

@borisarzentar Thank you for confirming! I'm glad the suggestion was helpful.


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
from mcp.server import NotificationOptions, Server
from mcp.server.models import InitializationOptions
from PIL import Image
from PIL import Image as PILImage
Member

Double import here

Comment on lines +92 to +101
types.Tool(
name="visualize",
description="Visualize the knowledge graph.",
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string"},
},
},
),
Member

This is a good suggestion from coderabbit

with open(os.devnull, "w") as fnull:
with redirect_stdout(fnull), redirect_stderr(fnull):
try:
await cognee.visualize()
Member

Should visualize receive the parameter where to put the image it generates? Then we can send that path to get_freshest_png to retrieve the image.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
cognee-mcp/README.md (2)

88-91: Minor grammar fix needed in documentation.

Add "the" before "debugger" for better readability.

-To use debugger, run:
+To use the debugger, run:
🧰 Tools
🪛 LanguageTool

[uncategorized] ~88-~88: You might be missing the article “the” here.
Context: ...y Restart your Claude desktop. To use debugger, run: ```bash npx @modelcontextprotocol...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


93-94: Add command description for clarity.

The reset installation command would benefit from a brief description of when to use it.

-To reset the installation
-uv sync --dev --all-extras --reinstall 
+To reset the installation (useful when dependencies are corrupted):
+```bash
+uv sync --dev --all-extras --reinstall
+```
cognee-mcp/cognee_mcp/server.py (1)

6-7: Remove unused logging import.

The logging import from sqlalchemy.testing.plugin.plugin_base is not used in the code.

-from sqlalchemy.testing.plugin.plugin_base import logging
cognee/shared/utils.py (1)

Line range hint 377-494: Consider removing duplicate logo embedding.

The logo is embedded twice in the visualization, at both bottom-right and top-left positions. This might be unnecessary and could affect performance. Consider keeping only one logo instance unless there's a specific requirement for two logos.

    logging.info("Embedding logo into visualization...")
    embed_logo(p, layout_scale, logo_alpha, "bottom_right")
-    embed_logo(p, layout_scale, logo_alpha, "top_left")
🧰 Tools
🪛 Ruff (0.8.2)

429-429: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf4737b and 047948a.

⛔ Files ignored due to path filters (2)
  • cognee-mcp/uv.lock is excluded by !**/*.lock
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • cognee-mcp/README.md (1 hunks)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
  • cognee-mcp/pyproject.toml (4 hunks)
  • cognee/__init__.py (1 hunks)
  • cognee/shared/utils.py (4 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🪛 LanguageTool
cognee-mcp/README.md

[uncategorized] ~88-~88: You might be missing the article “the” here.
Context: ...y Restart your Claude desktop. To use debugger, run: ```bash npx @modelcontextprotocol...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 Ruff (0.8.2)
cognee/shared/utils.py

429-429: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🔇 Additional comments (6)
cognee/__init__.py (1)

7-7: LGTM! Verify function usage across codebase.

The renamed import to visualize_graph is more descriptive of its purpose.

Let's verify all usages of the old and new function names:

✅ Verification successful

✅ Function rename verification complete

The visualize to visualize_graph rename is consistently applied across the codebase. The new function is properly used in cognee-mcp/cognee_mcp/server.py and no instances of the old function name remain.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining usages of old 'visualize' function
rg -l "\.visualize\(" 

# Search for usages of new 'visualize_graph' function
rg -l "\.visualize_graph\("

Length of output: 85


Script:

#!/bin/bash
# Check the implementation in server.py
rg "visualize_graph" cognee-mcp/cognee_mcp/server.py -A 2 -B 2

Length of output: 273

cognee-mcp/pyproject.toml (2)

94-94: LGTM! Script entry point is properly configured.

The cognee script entry point is correctly defined.


6-6: Verify compatibility with major version updates.

Several dependencies have significant version jumps:

  • pandas (2.0.3 → 2.2.3)
  • instructor (1.5.2 → 1.7.2)
  • litellm (1.49.1 → 1.57.2)

Let's check for any breaking changes in the major updates:

Also applies to: 9-9, 24-25, 28-28, 35-35, 47-48, 54-54, 59-59

✅ Verification successful

Dependencies are compatible with current usage

The codebase uses basic, stable features from all updated libraries:

  • pandas: Simple DataFrame creation and iteration
  • instructor: Basic litellm integration
  • litellm: Standard completion API calls
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for pandas DataFrame operations that might be affected
rg "pd\.DataFrame" -A 3

# Search for instructor decorators and usage
rg "@instruction" -A 3

# Search for litellm configurations
rg "litellm\.completion" -A 3

Length of output: 1927

pyproject.toml (1)

66-66: LGTM! Version alignment with cognee-mcp.

The fastapi-users version is properly pinned and aligned with cognee-mcp's requirements.

cognee-mcp/cognee_mcp/server.py (1)

94-103: Clarify query parameter requirement in visualization tool schema.

The tool's schema defines a "query" property but doesn't mark it as required. Either mark it as required if visualization needs a query parameter, or document that it's optional.

cognee/shared/utils.py (1)

425-432: 🛠️ Refactor suggestion

Remove unused variable and add PNG export.

The filename variable is created but never used. Additionally, the PNG export functionality is incomplete.

    # Get the latest Unix timestamp as an integer
    timestamp = int(time.time())

    # Construct your filename
    filename = f"{timestamp}.png"
+
+    # Export the visualization as PNG
+    try:
+        export_png(p, filename=filename)
+    except Exception as e:
+        logging.error(f"Failed to export PNG: {str(e)}")
+        raise

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

429-429: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

Comment on lines 198 to 212
elif name == "visualize":
with open(os.devnull, "w") as fnull:
with redirect_stdout(fnull), redirect_stderr(fnull):
try:
result = await cognee.visualize_graph()
results = retrieved_edges_to_string(result)

return [
types.TextContent(
type="text",
text=results,
)
]
except (FileNotFoundError, IOError, ValueError) as e:
raise ValueError(f"Failed to create visualization: {str(e)}")
Contributor

🛠️ Refactor suggestion

Return visualization output instead of text content.

The visualization tool returns text content (types.TextContent) despite being described as a tool to "Visualize the knowledge graph." Consider returning the visualization as an image using types.ImageContent or as an embedded resource using types.EmbeddedResource.

                    return [
-                       types.TextContent(
-                           type="text",
-                           text=results,
-                       )
+                       types.ImageContent(
+                           type="image",
+                           data=await cognee.visualize_graph(),
+                           format="png"
+                       )
                    ]

Committable suggestion skipped: line range outside the PR's diff.
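If the tool does switch to image output, note that MCP image content typically carries base64-encoded bytes rather than a raw image object. A hedged sketch of the payload shape follows; the field names reflect my understanding of the MCP content model and should be verified against the mcp library's types before committing:

```python
import base64

def png_to_image_content(png_bytes: bytes) -> dict:
    """Build an MCP-style image content payload from raw PNG bytes."""
    return {
        "type": "image",
        # MCP transports image data as a base64 string, not raw bytes
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    }

# Using the 8-byte PNG file signature as stand-in data
payload = png_to_image_content(b"\x89PNG\r\n\x1a\n")
```

The handler would read the exported PNG from disk, pass its bytes through such a conversion, and return the result in place of `types.TextContent`.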

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
cognee/shared/utils.py (1)

Line range hint 1-526: Format code using Ruff

The pipeline indicates that the file requires formatting using the Ruff formatter.

Run the following command to format the code:

ruff format cognee/shared/utils.py
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter

♻️ Duplicate comments (1)
cognee/shared/utils.py (1)

453-460: ⚠️ Potential issue

Add error handling for PNG export

The PNG export functionality lacks error handling and cleanup of old files.

A past review has already suggested adding error handling and cleanup functionality for PNG exports. The same issues still apply here.

🧰 Tools
🪛 Ruff (0.8.2)

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff lint

[error] 457-457: Local variable filename is assigned to but never used

🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter

🧹 Nitpick comments (2)
cognee/shared/utils.py (2)

44-47: Optimize NLTK resource downloads

Downloading NLTK resources on every function call is inefficient. Consider moving these downloads to a module initialization or a dedicated setup function.

+def setup_nltk_resources():
+    """Download required NLTK resources if not already present."""
+    try:
+        nltk.download("words", quiet=True)
+        nltk.download("punkt", quiet=True)
+        nltk.download("averaged_perceptron_tagger", quiet=True)
+        nltk.download("maxent_ne_chunker", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise
+
 def extract_pos_tags(sentence):
     """Extract Part-of-Speech (POS) tags for words in a sentence."""
-    # Ensure that the necessary NLTK resources are downloaded
-    nltk.download("words", quiet=True)
-    nltk.download("punkt", quiet=True)
-    nltk.download("averaged_perceptron_tagger", quiet=True)
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter


Line range hint 405-526: Consider splitting visualization code into smaller components

The create_cognee_style_network_with_logo function is handling too many responsibilities, making it hard to maintain and test.

Consider:

  1. Extracting logo embedding into a separate class
  2. Creating a dedicated NetworkVisualizer class
  3. Separating layout and styling logic
  4. Moving PNG export functionality to a dedicated service

This will improve maintainability, testability, and reusability of the visualization components.

🧰 Tools
🪛 Ruff (0.8.2)

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff lint

[error] 457-457: Local variable filename is assigned to but never used

🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ba98b2 and ad07bae.

📒 Files selected for processing (1)
  • cognee/shared/utils.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/shared/utils.py

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff lint
cognee/shared/utils.py

[error] 457-457: Local variable filename is assigned to but never used

🪛 GitHub Actions: ruff format
cognee/shared/utils.py

[warning] File requires formatting using Ruff formatter

🪛 GitHub Actions: test | weaviate
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the required package using 'poetry add nltk'.

🪛 GitHub Actions: test | milvus
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the required package using 'pip install nltk' or add it to poetry dependencies.

🪛 GitHub Actions: test | neo4j
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the package using poetry or pip.

🪛 GitHub Actions: test | qdrant
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the nltk package.

🪛 GitHub Actions: test | deduplication
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the package using poetry add nltk or add it to pyproject.toml.

🪛 GitHub Actions: test | pgvector
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the nltk package.

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test

Comment on lines +34 to +38
def get_entities(tagged_tokens):
    nltk.download("maxent_ne_chunker", quiet=True)
    from nltk.chunk import ne_chunk

    return ne_chunk(tagged_tokens)

🛠️ Refactor suggestion

Add docstring and error handling for get_entities()

The function lacks a docstring explaining its purpose and parameters. Also, NLTK resource downloads should be handled with error checking.

 def get_entities(tagged_tokens):
+    """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.
+    
+    Args:
+        tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()
+    
+    Returns:
+        A tree containing chunks of named entities
+    """
+    try:
         nltk.download("maxent_ne_chunker", quiet=True)
         from nltk.chunk import ne_chunk
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

         return ne_chunk(tagged_tokens)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    def get_entities(tagged_tokens):
        """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.

        Args:
            tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()

        Returns:
            A tree containing chunks of named entities
        """
        try:
            nltk.download("maxent_ne_chunker", quiet=True)
            from nltk.chunk import ne_chunk
        except Exception as e:
            logging.error(f"Failed to download NLTK resources: {str(e)}")
            raise

        return ne_chunk(tagged_tokens)
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
cognee-mcp/README.md (3)

40-42: Add language specifications to code blocks.

The code blocks should specify their language for proper syntax highlighting.

Apply these changes:

-```
+```bash
cd ~/Library/Application\ Support/Claude/

-```
+```bash
nano claude_desktop_config.json



Also applies to: 46-48

🧰 Tools
🪛 Markdownlint (0.37.0)

40-40: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

44-45: Improve configuration file instructions.

The instructions about adding paths and the API key could be more specific and include a warning about securing sensitive information.

Consider adding:
  • A note about keeping the API key secure
  • Examples of correct path formatting
  • A warning about not committing the config file with API keys

101-105: Enhance development workflow documentation.

The development workflow steps could be more detailed to prevent confusion.

Consider:
  1. Explaining why each step is necessary
  2. Adding expected output or success indicators
  3. Mentioning potential troubleshooting steps

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad07bae9a781c909c67658261d579de5834ea3b1 and 61118dda0353158ed949d04acf95867d30011d47.

📒 Files selected for processing (1)
  • cognee-mcp/README.md (2 hunks)
🧰 Additional context used
🪛 Markdownlint (0.37.0)
cognee-mcp/README.md

40-40: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

46-46: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


To use debugger, run:
```bash
npx @modelcontextprotocol/inspector uv --directory /Users/name/folder run cognee
```

⚠️ Potential issue

Replace hardcoded paths with placeholders.

The instructions contain specific user paths that won't work for other users.

Apply these changes:

-npx @modelcontextprotocol/inspector uv --directory /Users/name/folder run cognee
+npx @modelcontextprotocol/inspector uv --directory /Users/{username}/path/to/folder run cognee

-npx @modelcontextprotocol/inspector uv --directory /Users/vasilije/cognee/cognee-mcp run cognee
+npx @modelcontextprotocol/inspector uv --directory /Users/{username}/cognee/cognee-mcp run cognee

Also applies to: 105-105

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
.github/workflows/dockerhub.yml (2)

44-45: Consider using dynamic cache tags.

The current cache configuration uses a static buildcache tag which could lead to race conditions in parallel builds. Consider using a branch-specific cache tag.

-          cache-from: type=registry,ref=cognee/cognee:buildcache
-          cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max
+          cache-from: type=registry,ref=cognee/cognee:buildcache-${{ github.ref_name }}
+          cache-to: type=registry,ref=cognee/cognee:buildcache-${{ github.ref_name }},mode=max

26-35: Enhance image metadata.

Consider adding more metadata to improve image discoverability and traceability:

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: cognee/cognee
          tags: |
            type=ref,event=branch
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}
+          labels: |
+            org.opencontainers.image.title=Cognee
+            org.opencontainers.image.description=Knowledge graph system with visualization capabilities
+            maintainer=Cognee Team
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61118dd and 933d21a.

📒 Files selected for processing (1)
  • .github/workflows/dockerhub.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/dockerhub.yml

48-48: property "build" is not defined in object type {meta: {conclusion: string; outcome: string; outputs: {annotations: string; bake-file: string; bake-file-annotations: string; bake-file-labels: string; bake-file-tags: string; json: string; labels: string; tags: string; version: string}}}

(expression)

🪛 yamllint (1.35.1)
.github/workflows/dockerhub.yml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test
🔇 Additional comments (1)
.github/workflows/dockerhub.yml (1)

1-8: LGTM! Good addition of the dev branch.

The workflow trigger configuration is well-structured, allowing for testing in the dev branch before changes reach main.

Comment on lines +47 to +48
      - name: Image digest
        run: echo ${{ steps.build.outputs.digest }}

⚠️ Potential issue

Fix the image digest step.

There are two issues to address:

  1. The digest step references an undefined step ID 'build'
  2. The file is missing a newline at the end

Apply this diff to fix both issues:

      - name: Image digest
-        run: echo ${{ steps.build.outputs.digest }}
+        run: echo ${{ steps.meta.outputs.json }}
+
📝 Committable suggestion


Suggested change

      - name: Image digest
        run: echo ${{ steps.meta.outputs.json }}
🧰 Tools
🪛 actionlint (1.7.4)

48-48: property "build" is not defined in object type {meta: {conclusion: string; outcome: string; outputs: {annotations: string; bake-file: string; bake-file-annotations: string; bake-file-labels: string; bake-file-tags: string; json: string; labels: string; tags: string; version: string}}}

(expression)

🪛 yamllint (1.35.1)

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

Comment on lines 9 to 13
  docker-build-and-push:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository

🛠️ Refactor suggestion

Add explicit permissions configuration.

Consider adding explicit permissions to follow the principle of least privilege:

  docker-build-and-push:
    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
📝 Committable suggestion


Suggested change

    jobs:
      docker-build-and-push:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write

        steps:

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/approve_dco.yaml (2)

15-23: Consider making the DCO validation more robust.

The current exact text matching might be too strict. Consider:

  1. Case-insensitive matching
  2. Normalizing whitespace
  3. Supporting multiple statement formats

Here's a suggested improvement:

-            const requiredStatement = "I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin";
-            
-            if (!prBody.includes(requiredStatement)) {
+            const requiredStatement = "I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin";
+            const normalizedPrBody = prBody.toLowerCase().replace(/\s+/g, ' ').trim();
+            const normalizedStatement = requiredStatement.toLowerCase().replace(/\s+/g, ' ').trim();
+            
+            if (!normalizedPrBody.includes(normalizedStatement)) {
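The suggested normalization is easy to sanity-check outside the workflow. A minimal Python sketch of the same matching logic (function names here are illustrative, not part of the workflow):

```python
import re

REQUIRED_STATEMENT = (
    "I affirm that all code in every commit of this pull request conforms "
    "to the terms of the Topoteretes Developer Certificate of Origin"
)


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, mirroring the suggested JS.
    return re.sub(r"\s+", " ", text).strip().lower()


def dco_statement_present(pr_body: str) -> bool:
    return _normalize(REQUIRED_STATEMENT) in _normalize(pr_body)
```

This accepts the statement regardless of casing or line wrapping in the PR body.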
🧰 Tools
🪛 yamllint (1.35.1)

[error] 18-18: trailing spaces

(trailing-spaces)


18-18: Remove trailing whitespace.

There is trailing whitespace on line 18 that should be removed.

-            
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 18-18: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 662faeb and 4a87df9.

📒 Files selected for processing (2)
  • .github/pull_request_template.md (1 hunks)
  • .github/workflows/approve_dco.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .github/pull_request_template.md
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/approve_dco.yaml

12-12: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 yamllint (1.35.1)
.github/workflows/approve_dco.yaml

[error] 18-18: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test
🔇 Additional comments (1)
.github/workflows/approve_dco.yaml (1)

1-5: LGTM! Well-defined workflow triggers.

The workflow name is descriptive and the trigger configuration comprehensively covers all relevant pull request events.

Comment on lines +8 to +13
  check-dco:
    runs-on: ubuntu-latest
    steps:
      - name: Validate Developer Certificate of Origin statement
        uses: actions/github-script@v6
        with:

💡 Codebase verification

Update actions/github-script from v6 to v7

The latest version is v7.0.1, which includes improvements and bug fixes. Since this is a major version update, review the changelog for any breaking changes before updating.

🔗 Analysis chain

Update the GitHub Script action version.

The current version of actions/github-script@v6 might be outdated. Consider updating to the latest version to ensure compatibility and security.

Let's verify the latest version and any security advisories:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the latest version and security advisories for actions/github-script
gh api repos/actions/github-script/releases/latest --jq .tag_name

# Check for any security advisories
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: ACTIONS, package: "github-script") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 425

🧰 Tools
🪛 actionlint (1.7.4)

12-12: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
.github/workflows/approve_dco.yaml (1)

12-12: ⚠️ Potential issue

Update actions/github-script from v6 to v7.

The current version is outdated and should be updated to the latest v7 for compatibility and security improvements.

-        uses: actions/github-script@v6
+        uses: actions/github-script@v7
🧰 Tools
🪛 actionlint (1.7.4)

12-12: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🧹 Nitpick comments (1)
.github/workflows/approve_dco.yaml (1)

37-44: Enhance error handling with specific error messages.

The error handling could be more informative by providing specific error messages for different error scenarios.

             } catch (error) {
               // If we get a 404, user is NOT an org member
               if (error.status === 404) {
                 console.log(`${prUser} is NOT a member of ${orgName}. Enforcing DCO check.`);
               } else {
                 // Some other error—fail the workflow or handle accordingly
-                core.setFailed(`Error checking organization membership: ${error.message}`);
+                core.setFailed(
+                  `Failed to verify organization membership: ${error.status === 403 ? 
+                    'Insufficient permissions. Ensure GITHUB_TOKEN has read:org scope.' : 
+                    `Unexpected error (${error.status}): ${error.message}`}`
+                );
               }
             }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a87df9 and b2355de.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • .github/workflows/approve_dco.yaml (1 hunks)
  • cognee/shared/utils.py (5 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/approve_dco.yaml

12-12: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 Ruff (0.8.2)
cognee/shared/utils.py

26-26: Redefinition of unused nltk from line 14

Remove definition: nltk

(F811)


457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff format
cognee/shared/utils.py

[warning] File needs formatting according to Ruff standards

🪛 GitHub Actions: ruff lint
cognee/shared/utils.py

[error] 26-26: Redefinition of unused 'nltk' from line 14


[error] 457-457: Local variable 'filename' is assigned to but never used

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (3)
.github/workflows/approve_dco.yaml (1)

1-6: LGTM! Comprehensive PR trigger events.

The workflow configuration is well-structured with appropriate trigger events for PR-related actions.

cognee/shared/utils.py (2)

Line range hint 60-550: LGTM! Well-structured utility functions

The remaining utility functions are well-implemented with proper error handling and documentation.

🧰 Tools
🪛 Ruff (0.8.2)

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards

🪛 GitHub Actions: ruff lint

[error] 457-457: Local variable 'filename' is assigned to but never used


34-38: 🛠️ Refactor suggestion

Add error handling and docstring for get_entities()

The function lacks a docstring and proper error handling for NLTK resource downloads.

 def get_entities(tagged_tokens):
+    """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.
+    
+    Args:
+        tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()
+    
+    Returns:
+        A tree containing chunks of named entities
+    
+    Raises:
+        Exception: If NLTK resource download fails
+    """
+    try:
         nltk.download("maxent_ne_chunker", quiet=True)
         from nltk.chunk import ne_chunk
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

         return ne_chunk(tagged_tokens)

Likely invalid or redundant comment.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards

          # In GitHub Enterprise or private orgs, you might need a PAT (personal access token) with read:org scope.
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const orgName = 'YOUR_ORGANIZATION_NAME'; // Replace with your org

⚠️ Potential issue

Replace placeholder organization name.

The organization name is currently set to a placeholder value 'YOUR_ORGANIZATION_NAME'. This needs to be replaced with the actual organization name 'topoteretes' as indicated in the PR objectives.

-            const orgName = 'YOUR_ORGANIZATION_NAME'; // Replace with your org
+            const orgName = 'topoteretes';
📝 Committable suggestion


Suggested change

            const orgName = 'topoteretes';

import pandas as pd
import matplotlib.pyplot as plt
import tiktoken
import nltk

⚠️ Potential issue

Remove duplicate nltk import

The nltk module is imported twice. Remove the duplicate import on line 26 to fix the F811 error.

import nltk
import base64
import time
...
- import nltk
from cognee.shared.exceptions import IngestionError

Also applies to: 26-26

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards

Comment on lines 453 to 460
# Get the latest Unix timestamp as an integer
timestamp = int(time.time())

# Construct your filename
filename = f"{timestamp}.png"




⚠️ Potential issue

Add error handling and use the filename variable for PNG export

The PNG export functionality lacks error handling and cleanup of old files. Also, the filename variable is created but never used.

     # Get the latest Unix timestamp as an integer
     timestamp = int(time.time())

     # Construct your filename
     filename = f"{timestamp}.png"

+    try:
+        # Cleanup old PNG files to prevent disk space issues
+        cleanup_old_pngs(directory=".", keep_latest=5)
+        
+        # Export the new PNG
+        export_png(p, filename=filename)
+    except Exception as e:
+        logging.error(f"Failed to export PNG: {str(e)}")
+        raise

Consider adding a helper function to cleanup old PNG files:

def cleanup_old_pngs(directory: str, keep_latest: int = 5):
    """Cleanup old PNG files, keeping only the N latest files."""
    png_files = [f for f in os.listdir(directory) if f.endswith('.png')]
    if len(png_files) <= keep_latest:
        return
        
    # Sort by timestamp in filename
    sorted_files = sorted(png_files, key=lambda x: int(x.replace(".png", "")))
    
    # Remove older files
    for f in sorted_files[:-keep_latest]:
        try:
            os.remove(os.path.join(directory, f))
        except OSError as e:
            logging.warning(f"Failed to remove old PNG file {f}: {str(e)}")
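On the read side, the PR's walkthrough mentions a `get_freshest_png()` helper that retrieves the most recent PNG. A minimal sketch of how such a lookup could work under the same `<timestamp>.png` naming assumption (illustrative only, not the server's actual implementation):

```python
import os


def get_freshest_png(directory: str = ".") -> str:
    """Return the path of the most recent timestamped PNG in a directory.

    Assumes filenames follow the '<unix_timestamp>.png' convention used
    when the graph visualization is exported.
    """
    png_files = [f for f in os.listdir(directory) if f.endswith(".png")]
    if not png_files:
        raise FileNotFoundError(f"No PNG files found in {directory}")
    # The integer timestamp embedded in the name orders files chronologically.
    freshest = max(png_files, key=lambda name: int(name.replace(".png", "")))
    return os.path.join(directory, freshest)
```

Pairing this with the cleanup helper above keeps the directory bounded while the freshest file remains retrievable.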
🧰 Tools
🪛 Ruff (0.8.2)

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards

🪛 GitHub Actions: ruff lint

[error] 457-457: Local variable 'filename' is assigned to but never used

Comment on lines +41 to +59
def extract_pos_tags(sentence):
    """Extract Part-of-Speech (POS) tags for words in a sentence."""

    # Ensure that the necessary NLTK resources are downloaded
    nltk.download("words", quiet=True)
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    from nltk.tag import pos_tag
    from nltk.tokenize import word_tokenize

    # Tokenize the sentence into words
    tokens = word_tokenize(sentence)

    # Tag each word with its corresponding POS tag
    pos_tags = pos_tag(tokens)

    return pos_tags


🛠️ Refactor suggestion

Add input validation and improve error handling for extract_pos_tags()

The function should validate input and handle NLTK resource downloads more robustly.

 def extract_pos_tags(sentence):
     """Extract Part-of-Speech (POS) tags for words in a sentence.
+    
+    Args:
+        sentence (str): Input sentence to be POS tagged
+    
+    Returns:
+        list: A list of tuples containing (word, POS_tag)
+    
+    Raises:
+        ValueError: If sentence is not a string or is empty
+        Exception: If NLTK resource download fails
+    """
+    if not isinstance(sentence, str) or not sentence.strip():
+        raise ValueError("Input must be a non-empty string")

+    try:
         # Ensure that the necessary NLTK resources are downloaded
         nltk.download("words", quiet=True)
         nltk.download("punkt", quiet=True)
         nltk.download("averaged_perceptron_tagger", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

     from nltk.tag import pos_tag
     from nltk.tokenize import word_tokenize

     # Tokenize the sentence into words
     tokens = word_tokenize(sentence)

     # Tag each word with its corresponding POS tag
     pos_tags = pos_tag(tokens)

     return pos_tags
📝 Committable suggestion


Suggested change

    def extract_pos_tags(sentence):
        """Extract Part-of-Speech (POS) tags for words in a sentence.

        Args:
            sentence (str): Input sentence to be POS tagged

        Returns:
            list: A list of tuples containing (word, POS_tag)

        Raises:
            ValueError: If sentence is not a string or is empty
            Exception: If NLTK resource download fails
        """
        if not isinstance(sentence, str) or not sentence.strip():
            raise ValueError("Input must be a non-empty string")

        try:
            # Ensure that the necessary NLTK resources are downloaded
            nltk.download("words", quiet=True)
            nltk.download("punkt", quiet=True)
            nltk.download("averaged_perceptron_tagger", quiet=True)
        except Exception as e:
            logging.error(f"Failed to download NLTK resources: {str(e)}")
            raise

        from nltk.tag import pos_tag
        from nltk.tokenize import word_tokenize

        # Tokenize the sentence into words
        tokens = word_tokenize(sentence)

        # Tag each word with its corresponding POS tag
        pos_tags = pos_tag(tokens)

        return pos_tags
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards
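The suggested input guard can be exercised in isolation, without the NLTK calls (which require downloaded corpora). A standalone sketch of just the validation step:

```python
def validate_sentence(sentence):
    """Input guard mirroring the one proposed for extract_pos_tags."""
    if not isinstance(sentence, str) or not sentence.strip():
        raise ValueError("Input must be a non-empty string")
    return sentence
```

Rejecting empty and non-string input early gives a clearer error than the downstream tokenizer failure would.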

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
.github/workflows/test_python_3_10.yml (1)

45-47: ⚠️ Potential issue

Fix NLTK tokenizer name and add download verification.

The NLTK tokenizer package name is incorrect. It should be 'punkt' instead of 'punkt_tab'. Also, consider adding download verification.

.github/workflows/test_python_3_11.yml (1)

47-49: ⚠️ Potential issue

Fix NLTK tokenizer name and add download verification.

The NLTK tokenizer package name is incorrect. It should be 'punkt' instead of 'punkt_tab'. Also, consider adding download verification.

🧹 Nitpick comments (4)
.github/workflows/test_python_3_10.yml (1)

48-48: Remove extra blank line.

Remove the extra blank line after the NLTK download step to maintain consistent spacing between workflow steps.

.github/workflows/test_python_3_11.yml (1)

50-51: Remove extra blank lines.

Remove the two extra blank lines after the NLTK download step to maintain consistent spacing between workflow steps.

cognee/tests/unit/processing/chunks/chunk_by_word_test.py (1)

Line range hint 1-22: Consider batch formatting all test files.

Given the consistent pattern of assertion formatting changes across multiple test files, consider:

  1. Running ruff format on all test files in a single batch
  2. Adding a pre-commit hook to automatically format files
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

cognee/tests/test_deduplication.py (1)

33-35: LGTM! The assertion reformatting improves readability.

The reformatting of assertions using parentheses improves code readability while maintaining the same functionality.

Please run the Ruff formatter to fix the formatting issues flagged by the pipeline:

#!/bin/bash
# Description: Check which files need formatting
ruff format --check .

Also applies to: 64-66

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2355de and bbd51e8.

📒 Files selected for processing (18)
  • .github/workflows/test_python_3_10.yml (1 hunks)
  • .github/workflows/test_python_3_11.yml (1 hunks)
  • .github/workflows/test_python_3_12.yml (1 hunks)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
  • cognee/shared/utils.py (4 hunks)
  • cognee/tests/integration/documents/AudioDocument_test.py (1 hunks)
  • cognee/tests/integration/documents/ImageDocument_test.py (1 hunks)
  • cognee/tests/integration/documents/PdfDocument_test.py (1 hunks)
  • cognee/tests/integration/documents/TextDocument_test.py (1 hunks)
  • cognee/tests/integration/documents/UnstructuredDocument_test.py (1 hunks)
  • cognee/tests/test_deduplication.py (2 hunks)
  • cognee/tests/test_falkordb.py (1 hunks)
  • cognee/tests/test_library.py (1 hunks)
  • cognee/tests/test_pgvector.py (2 hunks)
  • cognee/tests/unit/processing/chunks/chunk_by_paragraph_2_test.py (3 hunks)
  • cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py (1 hunks)
  • cognee/tests/unit/processing/chunks/chunk_by_sentence_test.py (2 hunks)
  • cognee/tests/unit/processing/chunks/chunk_by_word_test.py (1 hunks)
✅ Files skipped from review due to trivial changes (9)
  • cognee/tests/test_falkordb.py
  • cognee/tests/unit/processing/chunks/chunk_by_sentence_test.py
  • cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py
  • cognee/tests/integration/documents/ImageDocument_test.py
  • cognee/tests/integration/documents/PdfDocument_test.py
  • cognee/tests/test_library.py
  • cognee/tests/integration/documents/TextDocument_test.py
  • cognee/tests/test_pgvector.py
  • cognee/tests/unit/processing/chunks/chunk_by_paragraph_2_test.py
🧰 Additional context used
🪛 GitHub Actions: ruff format
cognee/tests/unit/processing/chunks/chunk_by_word_test.py

[warning] File requires formatting with Ruff formatter

cognee/tests/integration/documents/AudioDocument_test.py

[warning] File requires formatting with Ruff formatter

cognee/tests/test_deduplication.py

[warning] File requires formatting with Ruff formatter

cognee/tests/integration/documents/UnstructuredDocument_test.py

[warning] File requires formatting with Ruff formatter

⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (8)
cognee/tests/integration/documents/AudioDocument_test.py (1)

Line range hint 1-47: Verify PR scope and title accuracy.

This file contains only test assertion formatting changes, which seems unrelated to the PR's stated purpose of "Add data visualization for Anthropic". Consider either:

  1. Updating the PR title/description to reflect these formatting changes, or
  2. Moving these changes to a separate PR focused on test improvements
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

cognee/tests/unit/processing/chunks/chunk_by_word_test.py (1)

Line range hint 1-1: Overall PR Review Summary

  1. Scope Mismatch: The PR title suggests adding data visualization features, but the changes only contain test assertion formatting modifications. Consider either:

    • Updating the PR title/description to accurately reflect these changes
    • Moving these formatting changes to a separate PR
    • Adding the missing data visualization changes
  2. Formatting Issues: All modified files have Ruff formatting warnings. Recommend:

    • Running ruff format on all modified files
    • Setting up pre-commit hooks to prevent formatting issues
    • Consider batch formatting all test files for consistency
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter
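To make the pre-commit suggestion concrete, a minimal hook configuration for Ruff formatting might look like this (a sketch — the `rev` pin is an assumption and should match the Ruff version used in CI):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.0  # assumption: pin to the team's Ruff version
    hooks:
      - id: ruff-format
```

With this in `.pre-commit-config.yaml`, running `pre-commit install` once wires the formatter into every local commit, catching these warnings before CI does.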

cognee-mcp/cognee_mcp/server.py (3)

94-103: Mark query parameter as required in visualization tool schema.

The tool's schema defines a "query" property but doesn't mark it as required. This should be fixed to ensure proper validation.

         types.Tool(
             name="visualize",
             description="Visualize the knowledge graph.",
             inputSchema={
                 "type": "object",
                 "properties": {
                     "query": {"type": "string"},
                 },
+                "required": ["query"],
             },
         ),

107-132: LGTM! Well-implemented function with proper error handling.

The get_freshest_png function is well-implemented with:

  • Proper error handling for directory and file operations
  • Clear documentation
  • Good use of type hints
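For readers without the diff open, a function matching this description could look like the following — a hedged sketch, not the PR's actual implementation (the `directory` parameter and the return-`None`-on-failure behavior are assumptions):

```python
import os
from typing import Optional


def get_freshest_png(directory: str = ".") -> Optional[str]:
    """Return the path of the most recently modified .png in `directory`, or None."""
    try:
        candidates = [
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if name.lower().endswith(".png")
        ]
    except OSError:
        # Directory missing or unreadable: treat as "no PNG available".
        return None
    if not candidates:
        return None
    # Pick the newest file by modification time.
    return max(candidates, key=os.path.getmtime)
```

Returning `None` instead of raising keeps the caller's control flow simple; the PR's version may choose to raise instead.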

199-213: Return visualization output as image content.

The visualization tool returns text content despite being described as a tool to "Visualize the knowledge graph."

                    return [
-                       types.TextContent(
-                           type="text",
-                           text=results,
-                       )
+                       types.ImageContent(
+                           type="image",
+                           data=await cognee.visualize_graph(),
+                           format="png"
+                       )
                    ]
cognee/shared/utils.py (3)

397-397: Add error handling for PNG export.

The PNG export functionality lacks error handling and cleanup of old files.

     # Get the latest Unix timestamp as an integer
     timestamp = int(time.time())

     # Construct your filename
     filename = f"{timestamp}.png"

-    export_png(p, filename=filename)
+    try:
+        # Cleanup old PNG files to prevent disk space issues
+        cleanup_old_pngs(directory=".", keep_latest=5)
+        
+        # Export the new PNG
+        export_png(p, filename=filename)
+    except Exception as e:
+        logging.error(f"Failed to export PNG: {str(e)}")
+        raise

31-35: 🛠️ Refactor suggestion

Add docstring and error handling for get_entities().

The function lacks a docstring explaining its purpose and parameters. Also, NLTK resource downloads should be handled with error checking.

 def get_entities(tagged_tokens):
+    """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.
+    
+    Args:
+        tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()
+    
+    Returns:
+        A tree containing chunks of named entities
+    
+    Raises:
+        Exception: If NLTK resource download fails
+    """
+    try:
         nltk.download("maxent_ne_chunker", quiet=True)
         from nltk.chunk import ne_chunk
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

     return ne_chunk(tagged_tokens)

Likely invalid or redundant comment.


38-56: 🛠️ Refactor suggestion

Add input validation and improve error handling for extract_pos_tags().

The function should validate input and handle NLTK resource downloads more robustly.

 def extract_pos_tags(sentence):
-    """Extract Part-of-Speech (POS) tags for words in a sentence."""
+    """Extract Part-of-Speech (POS) tags for words in a sentence.
+    
+    Args:
+        sentence (str): Input sentence to be POS tagged
+    
+    Returns:
+        list: A list of tuples containing (word, POS_tag)
+    
+    Raises:
+        ValueError: If sentence is not a string or is empty
+        Exception: If NLTK resource download fails
+    """
+    if not isinstance(sentence, str) or not sentence.strip():
+        raise ValueError("Input must be a non-empty string")

+    try:
         nltk.download("words", quiet=True)
         nltk.download("punkt", quiet=True)
         nltk.download("averaged_perceptron_tagger", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

     from nltk.tag import pos_tag
     from nltk.tokenize import word_tokenize

     tokens = word_tokenize(sentence)
     pos_tags = pos_tag(tokens)

     return pos_tags

Likely invalid or redundant comment.

Comment on lines +46 to +48
- name: Download NLTK tokenizer data
run: |
poetry run python -m nltk.downloader punkt_tab averaged_perceptron_tagger_eng

⚠️ Potential issue

Fix NLTK tokenizer name and add download verification.

The NLTK tokenizer package name is incorrect. It should be 'punkt' instead of 'punkt_tab'. Also, consider adding download verification.

Apply this diff to fix the issues:

       - name: Download NLTK tokenizer data
         run: |
-          poetry run python -m nltk.downloader punkt_tab averaged_perceptron_tagger_eng
+          poetry run python -c "
+            import nltk
+            for package in ['punkt', 'averaged_perceptron_tagger']:
+                nltk.download(package)
+                # Verify download
+                try:
+                    if package == 'punkt':
+                        nltk.data.find('tokenizers/punkt')
+                    else:
+                        nltk.data.find('taggers/averaged_perceptron_tagger')
+                    print(f'Successfully downloaded and verified {package}')
+                except LookupError as e:
+                    print(f'Failed to verify {package}')
+                    exit(1)
+          "

Comment on lines +39 to +47
assert (
ground_truth["word_count"] == paragraph_data.word_count
), f'{ground_truth["word_count"] = } != {paragraph_data.word_count = }'
assert ground_truth["len_text"] == len(
paragraph_data.text
), f'{ground_truth["len_text"] = } != {len(paragraph_data.text) = }'
assert (
ground_truth["cut_type"] == paragraph_data.cut_type
), f'{ground_truth["cut_type"] = } != {paragraph_data.cut_type = }'

⚠️ Potential issue

Fix formatting to comply with Ruff.

The assertion formatting needs to be adjusted according to Ruff's requirements. While the multi-line format improves readability, it should follow the project's automated formatting rules.

Run ruff format on this file to automatically fix the formatting issues.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

Comment on lines +74 to +102
assert (
"sentence_cut" == paragraph_data.cut_type
), f" sentence_cut != {paragraph_data.cut_type = }"

# Test DOCX
for paragraph_data in docx_document.read(chunk_size=1024, chunker="text_chunker"):
assert 16 == paragraph_data.word_count, f" 16 != {paragraph_data.word_count = }"
assert 145 == len(paragraph_data.text), f" 145 != {len(paragraph_data.text) = }"
assert "sentence_end" == paragraph_data.cut_type, (
f" sentence_end != {paragraph_data.cut_type = }"
)
assert (
"sentence_end" == paragraph_data.cut_type
), f" sentence_end != {paragraph_data.cut_type = }"

# TEST CSV
for paragraph_data in csv_document.read(chunk_size=1024, chunker="text_chunker"):
assert 15 == paragraph_data.word_count, f" 15 != {paragraph_data.word_count = }"
assert "A A A A A A A A A,A A A A A A,A A" == paragraph_data.text, (
f"Read text doesn't match expected text: {paragraph_data.text}"
)
assert "sentence_cut" == paragraph_data.cut_type, (
f" sentence_cut != {paragraph_data.cut_type = }"
)
assert (
"A A A A A A A A A,A A A A A A,A A" == paragraph_data.text
), f"Read text doesn't match expected text: {paragraph_data.text}"
assert (
"sentence_cut" == paragraph_data.cut_type
), f" sentence_cut != {paragraph_data.cut_type = }"

# Test XLSX
for paragraph_data in xlsx_document.read(chunk_size=1024, chunker="text_chunker"):
assert 36 == paragraph_data.word_count, f" 36 != {paragraph_data.word_count = }"
assert 171 == len(paragraph_data.text), f" 171 != {len(paragraph_data.text) = }"
assert "sentence_cut" == paragraph_data.cut_type, (
f" sentence_cut != {paragraph_data.cut_type = }"
)
assert (
"sentence_cut" == paragraph_data.cut_type
), f" sentence_cut != {paragraph_data.cut_type = }"

⚠️ Potential issue

Fix formatting to comply with Ruff.

Multiple assertion blocks need formatting adjustments to comply with Ruff's requirements.

Run ruff format on this file to automatically fix the formatting issues.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

Comment on lines +20 to +22
assert (
reconstructed_text == input_text
), f"texts are not identical: {len(input_text) = }, {len(reconstructed_text) = }"

⚠️ Potential issue

Fix formatting to comply with Ruff.

The assertion formatting needs to be adjusted according to Ruff's requirements.

Run ruff format on this file to automatically fix the formatting issues.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

Comment on lines +6 to +7
from sqlalchemy.testing.plugin.plugin_base import logging


🛠️ Refactor suggestion

Use the standard Python logging module instead.

Importing logging from sqlalchemy's testing plugin is not recommended. Use Python's built-in logging module instead.

-from sqlalchemy.testing.plugin.plugin_base import logging
+import logging

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbd51e8 and fe47253.

⛔ Files ignored due to path filters (1)
  • cognee-mcp/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
  • cognee/api/v1/visualize/visualize.py (1 hunks)
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2 hunks)
  • cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (3 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1 hunks)
  • cognee/shared/utils.py (5 hunks)
✅ Files skipped from review due to trivial changes (4)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
  • cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py
  • cognee/api/v1/visualize/visualize.py
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee-mcp/cognee_mcp/server.py
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (3)
cognee/shared/utils.py (3)

38-56: 🛠️ Refactor suggestion

Add input validation and improve error handling for extract_pos_tags()

The function should validate input and handle NLTK resource downloads more robustly.

 def extract_pos_tags(sentence):
     """Extract Part-of-Speech (POS) tags for words in a sentence."""
+    if not isinstance(sentence, str) or not sentence.strip():
+        raise ValueError("Input must be a non-empty string")

+    try:
         # Ensure that the necessary NLTK resources are downloaded
         nltk.download("words", quiet=True)
         nltk.download("punkt", quiet=True)
         nltk.download("averaged_perceptron_tagger", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

     from nltk.tag import pos_tag
     from nltk.tokenize import word_tokenize

     # Tokenize the sentence into words
     tokens = word_tokenize(sentence)

     # Tag each word with its corresponding POS tag
     pos_tags = pos_tag(tokens)

     return pos_tags

Likely invalid or redundant comment.


31-35: 🛠️ Refactor suggestion

Add docstring and error handling for get_entities()

The function lacks a docstring explaining its purpose and parameters. Also, NLTK resource downloads should be handled with error checking.

 def get_entities(tagged_tokens):
+    """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.
+    
+    Args:
+        tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()
+    
+    Returns:
+        A tree containing chunks of named entities
+    
+    Raises:
+        Exception: If NLTK resource download fails
+    """
+    try:
         nltk.download("maxent_ne_chunker", quiet=True)
         from nltk.chunk import ne_chunk
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

         return ne_chunk(tagged_tokens)

Likely invalid or redundant comment.


397-397: 🛠️ Refactor suggestion

Add error handling for PNG export

The PNG export functionality lacks error handling and cleanup of old files.

     from bokeh.io import export_png
+    import os
+
+    def cleanup_old_pngs(directory=".", keep_latest=5):
+        """Cleanup old PNG files, keeping only the N latest files."""
+        png_files = [f for f in os.listdir(directory) if f.endswith('.png')]
+        if len(png_files) <= keep_latest:
+            return
+            
+        # Sort by timestamp in filename
+        sorted_files = sorted(png_files, key=lambda x: int(x.replace(".png", "")))
+        
+        # Remove older files
+        for f in sorted_files[:-keep_latest]:
+            try:
+                os.remove(os.path.join(directory, f))
+            except OSError as e:
+                logging.warning(f"Failed to remove old PNG file {f}: {str(e)}")

Likely invalid or redundant comment.

Comment on lines +448 to +453
home_dir = os.path.expanduser("~")

# Construct the final output file path
output_filepath = os.path.join(home_dir, output_filename)
with open(output_filepath, "w") as f:
f.write(html_content)

🛠️ Refactor suggestion

Add error handling for file operations

The file writing operations lack error handling and directory existence checks.

     home_dir = os.path.expanduser("~")
+    
+    # Ensure output directory exists
+    try:
+        os.makedirs(home_dir, exist_ok=True)
+    except OSError as e:
+        logging.error(f"Failed to create output directory: {str(e)}")
+        raise

     # Construct the final output file path
     output_filepath = os.path.join(home_dir, output_filename)
+    try:
         with open(output_filepath, "w") as f:
             f.write(html_content)
+        logging.info(f"Successfully saved visualization to {output_filepath}")
+    except IOError as e:
+        logging.error(f"Failed to write output file: {str(e)}")
+        raise
