
Conversation

@Vasilije1990
Contributor

@Vasilije1990 Vasilije1990 commented Jan 10, 2025

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features

    • Added a visualization tool for the knowledge graph.
    • Introduced functions for named entity recognition and part-of-speech tagging.
  • Improvements

    • Enhanced the server's image-processing capabilities.
    • Added functionality to retrieve the most recent PNG file.
    • Integrated image handling with PIL library.
    • Enabled saving graph visualizations as PNG images with timestamps.
    • Updated documentation with debugging and installation reset instructions.
  • Dependency Updates

    • Updated various dependencies to newer versions.
    • Specified minimum Python version requirement.
  • Workflows

    • Introduced automated DCO validation for pull requests.
    • Streamlined Docker image build and push workflow.
    • Added steps to download NLTK tokenizer data for testing workflows.

@coderabbitai
Contributor

coderabbitai bot commented Jan 10, 2025

Walkthrough

The pull request introduces enhanced image processing and visualization capabilities for the Cognee knowledge graph system. The changes span several files, primarily cognee_mcp/server.py and cognee/shared/utils.py. Modifications include the addition of a new "visualize" tool for graph visualization, the ability to save visualizations as PNG files, and a function to retrieve the most recently created PNG file. Additionally, updates to the README and pyproject.toml ensure proper documentation and dependency management.

Changes

File Change Summary
cognee_mcp/server.py - Added get_freshest_png() function to retrieve the latest PNG file
- Introduced "visualize" tool in handle_list_tools
- Updated handle_call_tool to support visualization
cognee/shared/utils.py - Added get_entities() and extract_pos_tags() functions
- Updated create_cognee_style_network_with_logo() for PNG export
cognee-mcp/README.md - Added debugging and installation reset instructions
cognee-mcp/pyproject.toml - Updated Python version requirement and several package dependencies
cognee/__init__.py - Replaced import of visualize with visualize_graph
pyproject.toml - Specified version for fastapi-users and updated nltk dependency
.github/workflows/dockerhub.yml - Updated workflow name and steps for building and pushing Docker images
.github/pull_request_template.md - Introduced a new pull request template with sections for description and DCO affirmation
.github/workflows/approve_dco.yaml - Added a workflow to validate DCO compliance in pull requests

Possibly related PRs

  • Format with Ruff 0.9.0 #425: The get_filtered_graph_data method of the Neo4jAdapter class received string-formatting changes in a query, echoing the formatting changes made to cognee_mcp/server.py in this PR.

Poem

🐰 A rabbit's tale of graphs so bright,
Visualizing knowledge with pure delight,
PNG images, fresh and new,
Timestamps dancing, a graphing breakthrough!
Cognee's magic, now clear to see! 🖼️

Finishing Touches

  • 📝 Generate Docstrings (Beta)


@Vasilije1990 Vasilije1990 changed the title Add data visualization for Anthropic feat: Add data visualization for Anthropic Jan 10, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e983c21 and daf2d54.

📒 Files selected for processing (2)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
  • cognee/shared/utils.py (3 hunks)
🧰 Additional context used
🪛 GitHub Actions: ruff format
cognee-mcp/cognee_mcp/server.py

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

🔇 Additional comments (1)
cognee-mcp/cognee_mcp/server.py (1)

92-101: LGTM: Tool definition looks good

The "visualize" tool is properly defined with a clear description and appropriate schema.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

Comment on lines 13 to 14
from PIL import Image
from PIL import Image as PILImage
Contributor

🛠️ Refactor suggestion

Remove duplicate PIL Image import

The Image module from PIL is imported twice with different aliases. This is redundant and could cause confusion.

-from PIL import Image
-from PIL import Image as PILImage
+from PIL import Image
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff formatting standards.

Comment on lines 447 to 454
# Get the latest Unix timestamp as an integer
timestamp = int(time.time())

# Construct your filename
filename = f"{timestamp}.png"

export_png(p, filename=filename)

Contributor

🛠️ Refactor suggestion

Add error handling for PNG export

The PNG export functionality lacks error handling and cleanup of old files.

     # Get the latest Unix timestamp as an integer
     timestamp = int(time.time())

     # Construct your filename
     filename = f"{timestamp}.png"

-    export_png(p, filename=filename)
+    try:
+        # Cleanup old PNG files to prevent disk space issues
+        cleanup_old_pngs(directory=".", keep_latest=5)
+        
+        # Export the new PNG
+        export_png(p, filename=filename)
+    except Exception as e:
+        logging.error(f"Failed to export PNG: {str(e)}")
+        raise

Consider adding a helper function to cleanup old PNG files:

import logging
import os

def cleanup_old_pngs(directory: str, keep_latest: int = 5):
    """Cleanup old PNG files, keeping only the N latest files.

    Assumes files are named with Unix timestamps, e.g. '1673185134.png'.
    """
    png_files = [f for f in os.listdir(directory) if f.endswith(".png")]
    if len(png_files) <= keep_latest:
        return

    # Sort by the timestamp encoded in each filename
    sorted_files = sorted(png_files, key=lambda x: int(x.replace(".png", "")))

    # Remove the older files
    for f in sorted_files[:-keep_latest]:
        try:
            os.remove(os.path.join(directory, f))
        except OSError as e:
            logging.warning(f"Failed to remove old PNG file {f}: {str(e)}")

Vasilije1990 and others added 2 commits January 11, 2025 16:57
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
cognee-mcp/cognee_mcp/server.py (1)

Fix import statements organization

The file has the following formatting issues:

  • Remove duplicate PIL import: from PIL import Image as PILImage is redundant since Image is already imported
  • Organize imports according to PEP 8 style guide:
    1. Standard library imports
    2. Third-party imports
    3. Local application imports
🔗 Analysis chain

Line range hint 1-230: Fix formatting issues

The file requires formatting according to Ruff's standards. Please run the formatter:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

ruff format cognee-mcp/cognee_mcp/server.py

Length of output: 140


Script:

#!/bin/bash
# Check for basic formatting patterns
echo "=== Checking import statements ==="
rg "^import|^from.*import" cognee-mcp/cognee_mcp/server.py

echo -e "\n=== Checking blank lines between functions ==="
rg "^def|^async def" -A 1 cognee-mcp/cognee_mcp/server.py

echo -e "\n=== Checking indentation ==="
rg "^[[:space:]]+[^[:space:]]" cognee-mcp/cognee_mcp/server.py

Length of output: 8158

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

🧹 Nitpick comments (1)
cognee-mcp/cognee_mcp/server.py (1)

105-130: Add docstring to document timestamp format requirement

The function assumes PNG files are named with timestamps but this requirement isn't documented.

 def get_freshest_png(directory: str) -> Image.Image:
+    """
+    Retrieve the most recently created PNG file from a directory.
+    
+    Args:
+        directory (str): Path to directory containing PNG files
+        
+    Returns:
+        Image.Image: PIL Image object of the most recent PNG
+        
+    Raises:
+        FileNotFoundError: If directory doesn't exist or no PNG files found
+        ValueError: If PNG filenames don't follow timestamp format (e.g., '1673185134.png')
+        IOError: If PNG file can't be opened
+        
+    Note:
+        PNG files must be named with Unix timestamps (e.g., '1673185134.png')
+    """
     if not os.path.exists(directory):
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between daf2d54 and 7b0bfe9.

📒 Files selected for processing (1)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
🧰 Additional context used
🪛 GitHub Actions: ruff format
cognee-mcp/cognee_mcp/server.py

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: docker-compose-test
🔇 Additional comments (1)
cognee-mcp/cognee_mcp/server.py (1)

13-14: Remove duplicate PIL Image import

The Image module from PIL is imported twice with different aliases. This is redundant and could cause confusion.

-from PIL import Image
-from PIL import Image as PILImage
+from PIL import Image
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

with redirect_stdout(fnull), redirect_stderr(fnull):
try:
await cognee.visualize
img = get_freshest_png(".")
Contributor

🛠️ Refactor suggestion

Use configured output directory instead of current directory

Using "." as the directory path is brittle and depends on the current working directory when the server starts. Consider using a configured output directory.

-                    img = get_freshest_png(".")
+                    img = get_freshest_png(cognee.config.visualization_output_dir)

You'll need to ensure the output directory is properly configured in the Cognee configuration.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

Comment on lines +92 to +101
types.Tool(
name="visualize",
description="Visualize the knowledge graph.",
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string"},
},
},
),
Contributor

@coderabbitai coderabbitai bot Jan 11, 2025

🛠️ Refactor suggestion

Clarify query parameter requirement in visualization tool schema

The tool's schema defines a "query" property but doesn't mark it as required. Either:

  1. Mark it as required if visualization needs a query parameter
  2. Document that it's optional if visualization can work without a query
         types.Tool(
             name="visualize",
             description="Visualize the knowledge graph.",
             inputSchema={
                 "type": "object",
                 "properties": {
                     "query": {"type": "string"},
                 },
+                "required": ["query"],  # Add this if query is required
             },
         ),
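To see what adding `"required"` changes for callers, here is a small stdlib-only sketch of the kind of check a server could perform against this schema before dispatching the tool. The `validate_required` helper is hypothetical (not part of the mcp library); it only illustrates the effect of the `"required"` key:

```python
def validate_required(arguments: dict, schema: dict) -> list:
    """Return the names of required properties missing from arguments."""
    return [key for key in schema.get("required", []) if key not in arguments]

schema = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}

# With "required" present, an empty argument dict can be rejected up front
missing = validate_required({}, schema)             # ["query"]
ok = validate_required({"query": "graph"}, schema)  # []
```

Without the `"required"` key, both calls would return an empty list and a missing query would only surface later inside the tool handler.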
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting. Code does not conform to Ruff's formatting standards.

Member

This is a good suggestion from coderabbit

Contributor

@borisarzentar Thank you for confirming! I'm glad the suggestion was helpful.


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
from mcp.server import NotificationOptions, Server
from mcp.server.models import InitializationOptions
from PIL import Image
from PIL import Image as PILImage
Member

Double import here

Comment on lines +92 to +101
types.Tool(
name="visualize",
description="Visualize the knowledge graph.",
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string"},
},
},
),
Member

This is a good suggestion from coderabbit

with open(os.devnull, "w") as fnull:
with redirect_stdout(fnull), redirect_stderr(fnull):
try:
await cognee.visualize()
Member

Should visualize receive the parameter where to put the image it generates? Then we can send that path to get_freshest_png to retrieve the image.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
cognee-mcp/README.md (2)

88-91: Minor grammar fix needed in documentation.

Add "the" before "debugger" for better readability.

-To use debugger, run:
+To use the debugger, run:
🧰 Tools
🪛 LanguageTool

[uncategorized] ~88-~88: You might be missing the article “the” here.
Context: ...y Restart your Claude desktop. To use debugger, run: ```bash npx @modelcontextprotocol...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


93-94: Add command description for clarity.

The reset installation command would benefit from a brief description of when to use it.

-To reset the installation
-uv sync --dev --all-extras --reinstall 
+To reset the installation (useful when dependencies are corrupted):
+```bash
+uv sync --dev --all-extras --reinstall
+```
cognee-mcp/cognee_mcp/server.py (1)

6-7: Remove unused logging import.

The logging import from sqlalchemy.testing.plugin.plugin_base is not used in the code.

-from sqlalchemy.testing.plugin.plugin_base import logging
cognee/shared/utils.py (1)

Line range hint 377-494: Consider removing duplicate logo embedding.

The logo is embedded twice in the visualization, at both bottom-right and top-left positions. This might be unnecessary and could affect performance. Consider keeping only one logo instance unless there's a specific requirement for two logos.

    logging.info("Embedding logo into visualization...")
    embed_logo(p, layout_scale, logo_alpha, "bottom_right")
-    embed_logo(p, layout_scale, logo_alpha, "top_left")
🧰 Tools
🪛 Ruff (0.8.2)

429-429: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf4737b and 047948a.

⛔ Files ignored due to path filters (2)
  • cognee-mcp/uv.lock is excluded by !**/*.lock
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • cognee-mcp/README.md (1 hunks)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
  • cognee-mcp/pyproject.toml (4 hunks)
  • cognee/__init__.py (1 hunks)
  • cognee/shared/utils.py (4 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🪛 LanguageTool
cognee-mcp/README.md

[uncategorized] ~88-~88: You might be missing the article “the” here.
Context: ...y Restart your Claude desktop. To use debugger, run: ```bash npx @modelcontextprotocol...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 Ruff (0.8.2)
cognee/shared/utils.py

429-429: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🔇 Additional comments (6)
cognee/__init__.py (1)

7-7: LGTM! Verify function usage across codebase.

The renamed import to visualize_graph is more descriptive of its purpose.

Let's verify all usages of the old and new function names:

✅ Verification successful

✅ Function rename verification complete

The visualize to visualize_graph rename is consistently applied across the codebase. The new function is properly used in cognee-mcp/cognee_mcp/server.py and no instances of the old function name remain.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining usages of old 'visualize' function
rg -l "\.visualize\(" 

# Search for usages of new 'visualize_graph' function
rg -l "\.visualize_graph\("

Length of output: 85


Script:

#!/bin/bash
# Check the implementation in server.py
rg "visualize_graph" cognee-mcp/cognee_mcp/server.py -A 2 -B 2

Length of output: 273

cognee-mcp/pyproject.toml (2)

94-94: LGTM! Script entry point is properly configured.

The cognee script entry point is correctly defined.


6-6: Verify compatibility with major version updates.

Several dependencies have significant version jumps:

  • pandas (2.0.3 → 2.2.3)
  • instructor (1.5.2 → 1.7.2)
  • litellm (1.49.1 → 1.57.2)

Let's check for any breaking changes in the major updates:

Also applies to: 9-9, 24-25, 28-28, 35-35, 47-48, 54-54, 59-59

✅ Verification successful

Dependencies are compatible with current usage

The codebase uses basic, stable features from all updated libraries:

  • pandas: Simple DataFrame creation and iteration
  • instructor: Basic litellm integration
  • litellm: Standard completion API calls
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for pandas DataFrame operations that might be affected
rg "pd\.DataFrame" -A 3

# Search for instructor decorators and usage
rg "@instruction" -A 3

# Search for litellm configurations
rg "litellm\.completion" -A 3

Length of output: 1927

pyproject.toml (1)

66-66: LGTM! Version alignment with cognee-mcp.

The fastapi-users version is properly pinned and aligned with cognee-mcp's requirements.

cognee-mcp/cognee_mcp/server.py (1)

94-103: Clarify query parameter requirement in visualization tool schema.

The tool's schema defines a "query" property but doesn't mark it as required. Either mark it as required if visualization needs a query parameter, or document that it's optional.

cognee/shared/utils.py (1)

425-432: 🛠️ Refactor suggestion

Remove unused variable and add PNG export.

The filename variable is created but never used. Additionally, the PNG export functionality is incomplete.

    # Get the latest Unix timestamp as an integer
    timestamp = int(time.time())

    # Construct your filename
    filename = f"{timestamp}.png"
+
+    # Export the visualization as PNG
+    try:
+        export_png(p, filename=filename)
+    except Exception as e:
+        logging.error(f"Failed to export PNG: {str(e)}")
+        raise

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

429-429: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

Comment on lines 198 to 212
elif name == "visualize":
with open(os.devnull, "w") as fnull:
with redirect_stdout(fnull), redirect_stderr(fnull):
try:
result = await cognee.visualize_graph()
results = retrieved_edges_to_string(result)

return [
types.TextContent(
type="text",
text=results,
)
]
except (FileNotFoundError, IOError, ValueError) as e:
raise ValueError(f"Failed to create visualization: {str(e)}")
Contributor

🛠️ Refactor suggestion

Return visualization output instead of text content.

The visualization tool returns text content (types.TextContent) despite being described as a tool to "Visualize the knowledge graph." Consider returning the visualization as an image using types.ImageContent or as an embedded resource using types.EmbeddedResource.

                    return [
-                       types.TextContent(
-                           type="text",
-                           text=results,
-                       )
+                       types.ImageContent(
+                           type="image",
+                           data=await cognee.visualize_graph(),
+                           format="png"
+                       )
                    ]

Committable suggestion skipped: line range outside the PR's diff.
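If the tool does switch to image output, note that MCP image content typically carries base64-encoded bytes rather than a raw image object. A hedged sketch of the payload shape follows; the field names reflect my understanding of the MCP content model and should be verified against the mcp library's types before committing:

```python
import base64

def png_to_image_content(png_bytes: bytes) -> dict:
    """Build an MCP-style image content payload from raw PNG bytes."""
    return {
        "type": "image",
        # MCP transports image data as a base64 string, not raw bytes
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    }

# Using the 8-byte PNG file signature as stand-in data
payload = png_to_image_content(b"\x89PNG\r\n\x1a\n")
```

The handler would read the exported PNG from disk, pass its bytes through such a conversion, and return the result in place of `types.TextContent`.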

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
cognee/shared/utils.py (1)

Line range hint 1-526: Format code using Ruff

The pipeline indicates that the file requires formatting using the Ruff formatter.

Run the following command to format the code:

ruff format cognee/shared/utils.py
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter

♻️ Duplicate comments (1)
cognee/shared/utils.py (1)

453-460: ⚠️ Potential issue

Add error handling for PNG export

The PNG export functionality lacks error handling and cleanup of old files.

A past review has already suggested adding error handling and cleanup functionality for PNG exports. The same issues still apply here.

🧰 Tools
🪛 Ruff (0.8.2)

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff lint

[error] 457-457: Local variable filename is assigned to but never used

🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter

🧹 Nitpick comments (2)
cognee/shared/utils.py (2)

44-47: Optimize NLTK resource downloads

Downloading NLTK resources on every function call is inefficient. Consider moving these downloads to a module initialization or a dedicated setup function.

+def setup_nltk_resources():
+    """Download required NLTK resources if not already present."""
+    try:
+        nltk.download("words", quiet=True)
+        nltk.download("punkt", quiet=True)
+        nltk.download("averaged_perceptron_tagger", quiet=True)
+        nltk.download("maxent_ne_chunker", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise
+
 def extract_pos_tags(sentence):
     """Extract Part-of-Speech (POS) tags for words in a sentence."""
-    # Ensure that the necessary NLTK resources are downloaded
-    nltk.download("words", quiet=True)
-    nltk.download("punkt", quiet=True)
-    nltk.download("averaged_perceptron_tagger", quiet=True)
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter


Line range hint 405-526: Consider splitting visualization code into smaller components

The create_cognee_style_network_with_logo function is handling too many responsibilities, making it hard to maintain and test.

Consider:

  1. Extracting logo embedding into a separate class
  2. Creating a dedicated NetworkVisualizer class
  3. Separating layout and styling logic
  4. Moving PNG export functionality to a dedicated service

This will improve maintainability, testability, and reusability of the visualization components.

🧰 Tools
🪛 Ruff (0.8.2)

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff lint

[error] 457-457: Local variable filename is assigned to but never used

🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ba98b2 and ad07bae.

📒 Files selected for processing (1)
  • cognee/shared/utils.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/shared/utils.py

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff lint
cognee/shared/utils.py

[error] 457-457: Local variable filename is assigned to but never used

🪛 GitHub Actions: ruff format
cognee/shared/utils.py

[warning] File requires formatting using Ruff formatter

🪛 GitHub Actions: test | weaviate
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the required package using 'poetry add nltk'.

🪛 GitHub Actions: test | milvus
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the required package using 'pip install nltk' or add it to poetry dependencies.

🪛 GitHub Actions: test | neo4j
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the package using poetry or pip.

🪛 GitHub Actions: test | qdrant
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the nltk package.

🪛 GitHub Actions: test | deduplication
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the package using poetry add nltk or add it to pyproject.toml.

🪛 GitHub Actions: test | pgvector
cognee/shared/utils.py

[error] 14-14: Missing required dependency: Module 'nltk' not found. Please install the nltk package.

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test

Comment on lines +34 to +38
def get_entities(tagged_tokens):
    nltk.download("maxent_ne_chunker", quiet=True)
    from nltk.chunk import ne_chunk

    return ne_chunk(tagged_tokens)

🛠️ Refactor suggestion

Add docstring and error handling for get_entities()

The function lacks a docstring explaining its purpose and parameters. Also, NLTK resource downloads should be handled with error checking.

 def get_entities(tagged_tokens):
+    """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.
+    
+    Args:
+        tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()
+    
+    Returns:
+        A tree containing chunks of named entities
+    """
+    try:
         nltk.download("maxent_ne_chunker", quiet=True)
         from nltk.chunk import ne_chunk
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

         return ne_chunk(tagged_tokens)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    def get_entities(tagged_tokens):
        """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.

        Args:
            tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()

        Returns:
            A tree containing chunks of named entities
        """
        try:
            nltk.download("maxent_ne_chunker", quiet=True)
            from nltk.chunk import ne_chunk
        except Exception as e:
            logging.error(f"Failed to download NLTK resources: {str(e)}")
            raise

        return ne_chunk(tagged_tokens)
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting using Ruff formatter

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
cognee-mcp/README.md (3)

40-42: Add language specifications to code blocks.

The code blocks should specify their language for proper syntax highlighting.

Apply these changes:

-```
+```bash
cd ~/Library/Application\ Support/Claude/

-```
+```bash
nano claude_desktop_config.json



Also applies to: 46-48

🧰 Tools
🪛 Markdownlint (0.37.0)

40-40: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

44-45: Improve configuration file instructions.

The instructions about adding paths and the API key could be more specific and include a warning about securing sensitive information.

Consider adding:
  • A note about keeping the API key secure
  • Examples of correct path formatting
  • A warning about not committing the config file with API keys

101-105: Enhance development workflow documentation.

The development workflow steps could be more detailed to prevent confusion.

Consider:
  1. Explaining why each step is necessary
  2. Adding expected output or success indicators
  3. Mentioning potential troubleshooting steps

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad07bae9a781c909c67658261d579de5834ea3b1 and 61118dda0353158ed949d04acf95867d30011d47.

📒 Files selected for processing (1)
  • cognee-mcp/README.md (2 hunks)
🧰 Additional context used
🪛 Markdownlint (0.37.0)
cognee-mcp/README.md

40-40: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

46-46: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


To use debugger, run:
```bash
npx @modelcontextprotocol/inspector uv --directory /Users/name/folder run cognee
```

⚠️ Potential issue

Replace hardcoded paths with placeholders.

The instructions contain specific user paths that won't work for other users.

Apply these changes:

-npx @modelcontextprotocol/inspector uv --directory /Users/name/folder run cognee
+npx @modelcontextprotocol/inspector uv --directory /Users/{username}/path/to/folder run cognee

-npx @modelcontextprotocol/inspector uv --directory /Users/vasilije/cognee/cognee-mcp run cognee
+npx @modelcontextprotocol/inspector uv --directory /Users/{username}/cognee/cognee-mcp run cognee

Also applies to: 105-105

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
.github/workflows/dockerhub.yml (2)

44-45: Consider using dynamic cache tags.

The current cache configuration uses a static buildcache tag which could lead to race conditions in parallel builds. Consider using a branch-specific cache tag.

-          cache-from: type=registry,ref=cognee/cognee:buildcache
-          cache-to: type=registry,ref=cognee/cognee:buildcache,mode=max
+          cache-from: type=registry,ref=cognee/cognee:buildcache-${{ github.ref_name }}
+          cache-to: type=registry,ref=cognee/cognee:buildcache-${{ github.ref_name }},mode=max

26-35: Enhance image metadata.

Consider adding more metadata to improve image discoverability and traceability:

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: cognee/cognee
          tags: |
            type=ref,event=branch
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}
+          labels: |
+            org.opencontainers.image.title=Cognee
+            org.opencontainers.image.description=Knowledge graph system with visualization capabilities
+            maintainer=Cognee Team
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61118dd and 933d21a.

📒 Files selected for processing (1)
  • .github/workflows/dockerhub.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/dockerhub.yml

48-48: property "build" is not defined in object type {meta: {conclusion: string; outcome: string; outputs: {annotations: string; bake-file: string; bake-file-annotations: string; bake-file-labels: string; bake-file-tags: string; json: string; labels: string; tags: string; version: string}}}

(expression)

🪛 yamllint (1.35.1)
.github/workflows/dockerhub.yml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test
🔇 Additional comments (1)
.github/workflows/dockerhub.yml (1)

1-8: LGTM! Good addition of the dev branch.

The workflow trigger configuration is well-structured, allowing for testing in the dev branch before changes reach main.

Comment on lines +47 to +48
      - name: Image digest
        run: echo ${{ steps.build.outputs.digest }}

⚠️ Potential issue

Fix the image digest step.

There are two issues to address:

  1. The digest step references an undefined step ID 'build'
  2. The file is missing a newline at the end

Apply this diff to fix both issues:

      - name: Image digest
-        run: echo ${{ steps.build.outputs.digest }}
+        run: echo ${{ steps.meta.outputs.json }}
+
📝 Committable suggestion


Suggested change

      - name: Image digest
        run: echo ${{ steps.meta.outputs.json }}
🧰 Tools
🪛 actionlint (1.7.4)

48-48: property "build" is not defined in object type {meta: {conclusion: string; outcome: string; outputs: {annotations: string; bake-file: string; bake-file-annotations: string; bake-file-labels: string; bake-file-tags: string; json: string; labels: string; tags: string; version: string}}}

(expression)

🪛 yamllint (1.35.1)

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

Comment on lines 9 to 13
  docker-build-and-push:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository

🛠️ Refactor suggestion

Add explicit permissions configuration.

Consider adding explicit permissions to follow the principle of least privilege:

  docker-build-and-push:
    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
📝 Committable suggestion


Suggested change

    jobs:
      docker-build-and-push:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write

        steps:

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/approve_dco.yaml (2)

15-23: Consider making the DCO validation more robust.

The current exact text matching might be too strict. Consider:

  1. Case-insensitive matching
  2. Normalizing whitespace
  3. Supporting multiple statement formats

Here's a suggested improvement:

-            const requiredStatement = "I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin";
-            
-            if (!prBody.includes(requiredStatement)) {
+            const requiredStatement = "I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin";
+            const normalizedPrBody = prBody.toLowerCase().replace(/\s+/g, ' ').trim();
+            const normalizedStatement = requiredStatement.toLowerCase().replace(/\s+/g, ' ').trim();
+            
+            if (!normalizedPrBody.includes(normalizedStatement)) {
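The suggested normalization is easy to sanity-check outside the workflow. A minimal Python sketch of the same matching logic (function names here are illustrative, not part of the workflow):

```python
import re

REQUIRED_STATEMENT = (
    "I affirm that all code in every commit of this pull request conforms "
    "to the terms of the Topoteretes Developer Certificate of Origin"
)


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, mirroring the suggested JS.
    return re.sub(r"\s+", " ", text).strip().lower()


def dco_statement_present(pr_body: str) -> bool:
    return _normalize(REQUIRED_STATEMENT) in _normalize(pr_body)
```

This accepts the statement regardless of casing or line wrapping in the PR body.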
🧰 Tools
🪛 yamllint (1.35.1)

[error] 18-18: trailing spaces

(trailing-spaces)


18-18: Remove trailing whitespace.

There is trailing whitespace on line 18 that should be removed.

-            
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 18-18: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 662faeb and 4a87df9.

📒 Files selected for processing (2)
  • .github/pull_request_template.md (1 hunks)
  • .github/workflows/approve_dco.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .github/pull_request_template.md
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/approve_dco.yaml

12-12: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 yamllint (1.35.1)
.github/workflows/approve_dco.yaml

[error] 18-18: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test
🔇 Additional comments (1)
.github/workflows/approve_dco.yaml (1)

1-5: LGTM! Well-defined workflow triggers.

The workflow name is descriptive and the trigger configuration comprehensively covers all relevant pull request events.

Comment on lines +8 to +13
  check-dco:
    runs-on: ubuntu-latest
    steps:
      - name: Validate Developer Certificate of Origin statement
        uses: actions/github-script@v6
        with:

💡 Codebase verification

Update actions/github-script from v6 to v7

The latest version is v7.0.1, which includes improvements and bug fixes. Since this is a major version update, review the changelog for any breaking changes before updating.

🔗 Analysis chain

Update the GitHub Script action version.

The current version of actions/github-script@v6 might be outdated. Consider updating to the latest version to ensure compatibility and security.

Let's verify the latest version and any security advisories:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the latest version and security advisories for actions/github-script
gh api repos/actions/github-script/releases/latest --jq .tag_name

# Check for any security advisories
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: ACTIONS, package: "github-script") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 425

🧰 Tools
🪛 actionlint (1.7.4)

12-12: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
.github/workflows/approve_dco.yaml (1)

12-12: ⚠️ Potential issue

Update actions/github-script from v6 to v7.

The current version is outdated and should be updated to the latest v7 for compatibility and security improvements.

-        uses: actions/github-script@v6
+        uses: actions/github-script@v7
🧰 Tools
🪛 actionlint (1.7.4)

12-12: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🧹 Nitpick comments (1)
.github/workflows/approve_dco.yaml (1)

37-44: Enhance error handling with specific error messages.

The error handling could be more informative by providing specific error messages for different error scenarios.

             } catch (error) {
               // If we get a 404, user is NOT an org member
               if (error.status === 404) {
                 console.log(`${prUser} is NOT a member of ${orgName}. Enforcing DCO check.`);
               } else {
                 // Some other error—fail the workflow or handle accordingly
-                core.setFailed(`Error checking organization membership: ${error.message}`);
+                core.setFailed(
+                  `Failed to verify organization membership: ${error.status === 403 ? 
+                    'Insufficient permissions. Ensure GITHUB_TOKEN has read:org scope.' : 
+                    `Unexpected error (${error.status}): ${error.message}`}`
+                );
               }
             }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a87df9 and b2355de.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • .github/workflows/approve_dco.yaml (1 hunks)
  • cognee/shared/utils.py (5 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/approve_dco.yaml

12-12: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 Ruff (0.8.2)
cognee/shared/utils.py

26-26: Redefinition of unused nltk from line 14

Remove definition: nltk

(F811)


457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff format
cognee/shared/utils.py

[warning] File needs formatting according to Ruff standards

🪛 GitHub Actions: ruff lint
cognee/shared/utils.py

[error] 26-26: Redefinition of unused 'nltk' from line 14


[error] 457-457: Local variable 'filename' is assigned to but never used

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (3)
.github/workflows/approve_dco.yaml (1)

1-6: LGTM! Comprehensive PR trigger events.

The workflow configuration is well-structured with appropriate trigger events for PR-related actions.

cognee/shared/utils.py (2)

Line range hint 60-550: LGTM! Well-structured utility functions

The remaining utility functions are well-implemented with proper error handling and documentation.

🧰 Tools
🪛 Ruff (0.8.2)

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards

🪛 GitHub Actions: ruff lint

[error] 457-457: Local variable 'filename' is assigned to but never used


34-38: 🛠️ Refactor suggestion

Add error handling and docstring for get_entities()

The function lacks a docstring and proper error handling for NLTK resource downloads.

 def get_entities(tagged_tokens):
+    """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.
+    
+    Args:
+        tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()
+    
+    Returns:
+        A tree containing chunks of named entities
+    
+    Raises:
+        Exception: If NLTK resource download fails
+    """
+    try:
         nltk.download("maxent_ne_chunker", quiet=True)
         from nltk.chunk import ne_chunk
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

         return ne_chunk(tagged_tokens)

Likely invalid or redundant comment.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards

          # In GitHub Enterprise or private orgs, you might need a PAT (personal access token) with read:org scope.
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const orgName = 'YOUR_ORGANIZATION_NAME'; // Replace with your org

⚠️ Potential issue

Replace placeholder organization name.

The organization name is currently set to a placeholder value 'YOUR_ORGANIZATION_NAME'. This needs to be replaced with the actual organization name 'topoteretes' as indicated in the PR objectives.

-            const orgName = 'YOUR_ORGANIZATION_NAME'; // Replace with your org
+            const orgName = 'topoteretes';
📝 Committable suggestion


Suggested change

            const orgName = 'topoteretes';

import pandas as pd
import matplotlib.pyplot as plt
import tiktoken
import nltk

⚠️ Potential issue

Remove duplicate nltk import

The nltk module is imported twice. Remove the duplicate import on line 26 to fix the F811 error.

import nltk
import base64
import time
...
- import nltk
from cognee.shared.exceptions import IngestionError

Also applies to: 26-26

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards

Comment on lines 453 to 460
# Get the latest Unix timestamp as an integer
timestamp = int(time.time())

# Construct your filename
filename = f"{timestamp}.png"




⚠️ Potential issue

Add error handling and use the filename variable for PNG export

The PNG export functionality lacks error handling and cleanup of old files. Also, the filename variable is created but never used.

     # Get the latest Unix timestamp as an integer
     timestamp = int(time.time())

     # Construct your filename
     filename = f"{timestamp}.png"

+    try:
+        # Cleanup old PNG files to prevent disk space issues
+        cleanup_old_pngs(directory=".", keep_latest=5)
+        
+        # Export the new PNG
+        export_png(p, filename=filename)
+    except Exception as e:
+        logging.error(f"Failed to export PNG: {str(e)}")
+        raise

Consider adding a helper function to cleanup old PNG files:

def cleanup_old_pngs(directory: str, keep_latest: int = 5):
    """Cleanup old PNG files, keeping only the N latest files."""
    png_files = [f for f in os.listdir(directory) if f.endswith('.png')]
    if len(png_files) <= keep_latest:
        return
        
    # Sort by timestamp in filename
    sorted_files = sorted(png_files, key=lambda x: int(x.replace(".png", "")))
    
    # Remove older files
    for f in sorted_files[:-keep_latest]:
        try:
            os.remove(os.path.join(directory, f))
        except OSError as e:
            logging.warning(f"Failed to remove old PNG file {f}: {str(e)}")
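On the read side, the PR's walkthrough mentions a `get_freshest_png()` helper that retrieves the most recent PNG. A minimal sketch of how such a lookup could work under the same `<timestamp>.png` naming assumption (illustrative only, not the server's actual implementation):

```python
import os


def get_freshest_png(directory: str = ".") -> str:
    """Return the path of the most recent timestamped PNG in a directory.

    Assumes filenames follow the '<unix_timestamp>.png' convention used
    when the graph visualization is exported.
    """
    png_files = [f for f in os.listdir(directory) if f.endswith(".png")]
    if not png_files:
        raise FileNotFoundError(f"No PNG files found in {directory}")
    # The integer timestamp embedded in the name orders files chronologically.
    freshest = max(png_files, key=lambda name: int(name.replace(".png", "")))
    return os.path.join(directory, freshest)
```

Pairing this with the cleanup helper above keeps the directory bounded while the freshest file remains retrievable.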
🧰 Tools
🪛 Ruff (0.8.2)

457-457: Local variable filename is assigned to but never used

Remove assignment to unused variable filename

(F841)

🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards

🪛 GitHub Actions: ruff lint

[error] 457-457: Local variable 'filename' is assigned to but never used

Comment on lines +41 to +59
def extract_pos_tags(sentence):
    """Extract Part-of-Speech (POS) tags for words in a sentence."""

    # Ensure that the necessary NLTK resources are downloaded
    nltk.download("words", quiet=True)
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    from nltk.tag import pos_tag
    from nltk.tokenize import word_tokenize

    # Tokenize the sentence into words
    tokens = word_tokenize(sentence)

    # Tag each word with its corresponding POS tag
    pos_tags = pos_tag(tokens)

    return pos_tags


🛠️ Refactor suggestion

Add input validation and improve error handling for extract_pos_tags()

The function should validate input and handle NLTK resource downloads more robustly.

 def extract_pos_tags(sentence):
     """Extract Part-of-Speech (POS) tags for words in a sentence.
+    
+    Args:
+        sentence (str): Input sentence to be POS tagged
+    
+    Returns:
+        list: A list of tuples containing (word, POS_tag)
+    
+    Raises:
+        ValueError: If sentence is not a string or is empty
+        Exception: If NLTK resource download fails
+    """
+    if not isinstance(sentence, str) or not sentence.strip():
+        raise ValueError("Input must be a non-empty string")

+    try:
         # Ensure that the necessary NLTK resources are downloaded
         nltk.download("words", quiet=True)
         nltk.download("punkt", quiet=True)
         nltk.download("averaged_perceptron_tagger", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

     from nltk.tag import pos_tag
     from nltk.tokenize import word_tokenize

     # Tokenize the sentence into words
     tokens = word_tokenize(sentence)

     # Tag each word with its corresponding POS tag
     pos_tags = pos_tag(tokens)

     return pos_tags
📝 Committable suggestion


Suggested change

    def extract_pos_tags(sentence):
        """Extract Part-of-Speech (POS) tags for words in a sentence.

        Args:
            sentence (str): Input sentence to be POS tagged

        Returns:
            list: A list of tuples containing (word, POS_tag)

        Raises:
            ValueError: If sentence is not a string or is empty
            Exception: If NLTK resource download fails
        """
        if not isinstance(sentence, str) or not sentence.strip():
            raise ValueError("Input must be a non-empty string")

        try:
            # Ensure that the necessary NLTK resources are downloaded
            nltk.download("words", quiet=True)
            nltk.download("punkt", quiet=True)
            nltk.download("averaged_perceptron_tagger", quiet=True)
        except Exception as e:
            logging.error(f"Failed to download NLTK resources: {str(e)}")
            raise

        from nltk.tag import pos_tag
        from nltk.tokenize import word_tokenize

        # Tokenize the sentence into words
        tokens = word_tokenize(sentence)

        # Tag each word with its corresponding POS tag
        pos_tags = pos_tag(tokens)

        return pos_tags
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File needs formatting according to Ruff standards
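The suggested input guard can be exercised in isolation, without the NLTK calls (which require downloaded corpora). A standalone sketch of just the validation step:

```python
def validate_sentence(sentence):
    """Input guard mirroring the one proposed for extract_pos_tags."""
    if not isinstance(sentence, str) or not sentence.strip():
        raise ValueError("Input must be a non-empty string")
    return sentence
```

Rejecting empty and non-string input early gives a clearer error than the downstream tokenizer failure would.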

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
.github/workflows/test_python_3_10.yml (1)

45-47: ⚠️ Potential issue

Fix NLTK tokenizer name and add download verification.

The NLTK tokenizer package name is incorrect. It should be 'punkt' instead of 'punkt_tab'. Also, consider adding download verification.

.github/workflows/test_python_3_11.yml (1)

47-49: ⚠️ Potential issue

Fix NLTK tokenizer name and add download verification.

The NLTK tokenizer package name is incorrect. It should be 'punkt' instead of 'punkt_tab'. Also, consider adding download verification.

🧹 Nitpick comments (4)
.github/workflows/test_python_3_10.yml (1)

48-48: Remove extra blank line.

Remove the extra blank line after the NLTK download step to maintain consistent spacing between workflow steps.

.github/workflows/test_python_3_11.yml (1)

50-51: Remove extra blank lines.

Remove the two extra blank lines after the NLTK download step to maintain consistent spacing between workflow steps.

cognee/tests/unit/processing/chunks/chunk_by_word_test.py (1)

Line range hint 1-22: Consider batch formatting all test files.

Given the consistent pattern of assertion formatting changes across multiple test files, consider:

  1. Running ruff format on all test files in a single batch
  2. Adding a pre-commit hook to automatically format files
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

cognee/tests/test_deduplication.py (1)

33-35: LGTM! The assertion reformatting improves readability.

The reformatting of assertions using parentheses improves code readability while maintaining the same functionality.

Please run the Ruff formatter to fix the formatting issues flagged by the pipeline:

#!/bin/bash
# Description: Check which files need formatting
ruff format --check .

Also applies to: 64-66

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2355de and bbd51e8.

📒 Files selected for processing (18)
  • .github/workflows/test_python_3_10.yml (1 hunks)
  • .github/workflows/test_python_3_11.yml (1 hunks)
  • .github/workflows/test_python_3_12.yml (1 hunks)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
  • cognee/shared/utils.py (4 hunks)
  • cognee/tests/integration/documents/AudioDocument_test.py (1 hunks)
  • cognee/tests/integration/documents/ImageDocument_test.py (1 hunks)
  • cognee/tests/integration/documents/PdfDocument_test.py (1 hunks)
  • cognee/tests/integration/documents/TextDocument_test.py (1 hunks)
  • cognee/tests/integration/documents/UnstructuredDocument_test.py (1 hunks)
  • cognee/tests/test_deduplication.py (2 hunks)
  • cognee/tests/test_falkordb.py (1 hunks)
  • cognee/tests/test_library.py (1 hunks)
  • cognee/tests/test_pgvector.py (2 hunks)
  • cognee/tests/unit/processing/chunks/chunk_by_paragraph_2_test.py (3 hunks)
  • cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py (1 hunks)
  • cognee/tests/unit/processing/chunks/chunk_by_sentence_test.py (2 hunks)
  • cognee/tests/unit/processing/chunks/chunk_by_word_test.py (1 hunks)
✅ Files skipped from review due to trivial changes (9)
  • cognee/tests/test_falkordb.py
  • cognee/tests/unit/processing/chunks/chunk_by_sentence_test.py
  • cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py
  • cognee/tests/integration/documents/ImageDocument_test.py
  • cognee/tests/integration/documents/PdfDocument_test.py
  • cognee/tests/test_library.py
  • cognee/tests/integration/documents/TextDocument_test.py
  • cognee/tests/test_pgvector.py
  • cognee/tests/unit/processing/chunks/chunk_by_paragraph_2_test.py
🧰 Additional context used
🪛 GitHub Actions: ruff format
cognee/tests/unit/processing/chunks/chunk_by_word_test.py

[warning] File requires formatting with Ruff formatter

cognee/tests/integration/documents/AudioDocument_test.py

[warning] File requires formatting with Ruff formatter

cognee/tests/test_deduplication.py

[warning] File requires formatting with Ruff formatter

cognee/tests/integration/documents/UnstructuredDocument_test.py

[warning] File requires formatting with Ruff formatter

⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (8)
cognee/tests/integration/documents/AudioDocument_test.py (1)

Line range hint 1-47: Verify PR scope and title accuracy.

This file contains only test assertion formatting changes, which seems unrelated to the PR's stated purpose of "Add data visualization for Anthropic". Consider either:

  1. Updating the PR title/description to reflect these formatting changes, or
  2. Moving these changes to a separate PR focused on test improvements
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

cognee/tests/unit/processing/chunks/chunk_by_word_test.py (1)

Line range hint 1-1: Overall PR Review Summary

  1. Scope Mismatch: The PR title suggests adding data visualization features, but the changes only contain test assertion formatting modifications. Consider either:

    • Updating the PR title/description to accurately reflect these changes
    • Moving these formatting changes to a separate PR
    • Adding the missing data visualization changes
  2. Formatting Issues: All modified files have Ruff formatting warnings. Recommend:

    • Running ruff format on all modified files
    • Setting up pre-commit hooks to prevent formatting issues
    • Consider batch formatting all test files for consistency
🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter
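To make the pre-commit suggestion concrete, a minimal hook configuration for Ruff formatting might look like this (a sketch — the `rev` pin is an assumption and should match the Ruff version used in CI):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.0  # assumption: pin to the team's Ruff version
    hooks:
      - id: ruff-format
```

With this in `.pre-commit-config.yaml`, running `pre-commit install` once wires the formatter into every local commit, catching these warnings before CI does.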

cognee-mcp/cognee_mcp/server.py (3)

94-103: Mark query parameter as required in visualization tool schema.

The tool's schema defines a "query" property but doesn't mark it as required. This should be fixed to ensure proper validation.

         types.Tool(
             name="visualize",
             description="Visualize the knowledge graph.",
             inputSchema={
                 "type": "object",
                 "properties": {
                     "query": {"type": "string"},
                 },
+                "required": ["query"],
             },
         ),

107-132: LGTM! Well-implemented function with proper error handling.

The get_freshest_png function is well-implemented with:

  • Proper error handling for directory and file operations
  • Clear documentation
  • Good use of type hints
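For readers without the diff open, a function matching this description could look like the following — a hedged sketch, not the PR's actual implementation (the `directory` parameter and the return-`None`-on-failure behavior are assumptions):

```python
import os
from typing import Optional


def get_freshest_png(directory: str = ".") -> Optional[str]:
    """Return the path of the most recently modified .png in `directory`, or None."""
    try:
        candidates = [
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if name.lower().endswith(".png")
        ]
    except OSError:
        # Directory missing or unreadable: treat as "no PNG available".
        return None
    if not candidates:
        return None
    # Pick the newest file by modification time.
    return max(candidates, key=os.path.getmtime)
```

Returning `None` instead of raising keeps the caller's control flow simple; the PR's version may choose to raise instead.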

199-213: Return visualization output as image content.

The visualization tool returns text content despite being described as a tool to "Visualize the knowledge graph."

                    return [
-                       types.TextContent(
-                           type="text",
-                           text=results,
-                       )
+                       types.ImageContent(
+                           type="image",
+                           data=await cognee.visualize_graph(),
+                           format="png"
+                       )
                    ]
cognee/shared/utils.py (3)

397-397: Add error handling for PNG export.

The PNG export functionality lacks error handling and cleanup of old files.

     # Get the latest Unix timestamp as an integer
     timestamp = int(time.time())

     # Construct your filename
     filename = f"{timestamp}.png"

-    export_png(p, filename=filename)
+    try:
+        # Cleanup old PNG files to prevent disk space issues
+        cleanup_old_pngs(directory=".", keep_latest=5)
+        
+        # Export the new PNG
+        export_png(p, filename=filename)
+    except Exception as e:
+        logging.error(f"Failed to export PNG: {str(e)}")
+        raise

31-35: 🛠️ Refactor suggestion

Add docstring and error handling for get_entities().

The function lacks a docstring explaining its purpose and parameters. Also, NLTK resource downloads should be handled with error checking.

 def get_entities(tagged_tokens):
+    """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.
+    
+    Args:
+        tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()
+    
+    Returns:
+        A tree containing chunks of named entities
+    
+    Raises:
+        Exception: If NLTK resource download fails
+    """
+    try:
         nltk.download("maxent_ne_chunker", quiet=True)
         from nltk.chunk import ne_chunk
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

     return ne_chunk(tagged_tokens)

Likely invalid or redundant comment.


38-56: 🛠️ Refactor suggestion

Add input validation and improve error handling for extract_pos_tags().

The function should validate input and handle NLTK resource downloads more robustly.

 def extract_pos_tags(sentence):
-    """Extract Part-of-Speech (POS) tags for words in a sentence."""
+    """Extract Part-of-Speech (POS) tags for words in a sentence.
+    
+    Args:
+        sentence (str): Input sentence to be POS tagged
+    
+    Returns:
+        list: A list of tuples containing (word, POS_tag)
+    
+    Raises:
+        ValueError: If sentence is not a string or is empty
+        Exception: If NLTK resource download fails
+    """
+    if not isinstance(sentence, str) or not sentence.strip():
+        raise ValueError("Input must be a non-empty string")

+    try:
         nltk.download("words", quiet=True)
         nltk.download("punkt", quiet=True)
         nltk.download("averaged_perceptron_tagger", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

     from nltk.tag import pos_tag
     from nltk.tokenize import word_tokenize

     tokens = word_tokenize(sentence)
     pos_tags = pos_tag(tokens)

     return pos_tags

Likely invalid or redundant comment.

Comment on lines +46 to +48
- name: Download NLTK tokenizer data
run: |
poetry run python -m nltk.downloader punkt_tab averaged_perceptron_tagger_eng

⚠️ Potential issue

Fix NLTK tokenizer name and add download verification.

The NLTK tokenizer package name is incorrect. It should be 'punkt' instead of 'punkt_tab'. Also, consider adding download verification.

Apply this diff to fix the issues:

       - name: Download NLTK tokenizer data
         run: |
-          poetry run python -m nltk.downloader punkt_tab averaged_perceptron_tagger_eng
+          poetry run python -c "
+            import nltk
+            for package in ['punkt', 'averaged_perceptron_tagger']:
+                nltk.download(package)
+                # Verify download
+                try:
+                    if package == 'punkt':
+                        nltk.data.find('tokenizers/punkt')
+                    else:
+                        nltk.data.find('taggers/averaged_perceptron_tagger')
+                    print(f'Successfully downloaded and verified {package}')
+                except LookupError as e:
+                    print(f'Failed to verify {package}')
+                    exit(1)
+          "

Comment on lines +39 to +47
assert (
ground_truth["word_count"] == paragraph_data.word_count
), f'{ground_truth["word_count"] = } != {paragraph_data.word_count = }'
assert ground_truth["len_text"] == len(
paragraph_data.text
), f'{ground_truth["len_text"] = } != {len(paragraph_data.text) = }'
assert (
ground_truth["cut_type"] == paragraph_data.cut_type
), f'{ground_truth["cut_type"] = } != {paragraph_data.cut_type = }'

⚠️ Potential issue

Fix formatting to comply with Ruff.

The assertion formatting needs to be adjusted according to Ruff's requirements. While the multi-line format improves readability, it should follow the project's automated formatting rules.

Run ruff format on this file to automatically fix the formatting issues.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

Comment on lines +74 to +102
assert (
"sentence_cut" == paragraph_data.cut_type
), f" sentence_cut != {paragraph_data.cut_type = }"

# Test DOCX
for paragraph_data in docx_document.read(chunk_size=1024, chunker="text_chunker"):
assert 16 == paragraph_data.word_count, f" 16 != {paragraph_data.word_count = }"
assert 145 == len(paragraph_data.text), f" 145 != {len(paragraph_data.text) = }"
assert "sentence_end" == paragraph_data.cut_type, (
f" sentence_end != {paragraph_data.cut_type = }"
)
assert (
"sentence_end" == paragraph_data.cut_type
), f" sentence_end != {paragraph_data.cut_type = }"

# TEST CSV
for paragraph_data in csv_document.read(chunk_size=1024, chunker="text_chunker"):
assert 15 == paragraph_data.word_count, f" 15 != {paragraph_data.word_count = }"
assert "A A A A A A A A A,A A A A A A,A A" == paragraph_data.text, (
f"Read text doesn't match expected text: {paragraph_data.text}"
)
assert "sentence_cut" == paragraph_data.cut_type, (
f" sentence_cut != {paragraph_data.cut_type = }"
)
assert (
"A A A A A A A A A,A A A A A A,A A" == paragraph_data.text
), f"Read text doesn't match expected text: {paragraph_data.text}"
assert (
"sentence_cut" == paragraph_data.cut_type
), f" sentence_cut != {paragraph_data.cut_type = }"

# Test XLSX
for paragraph_data in xlsx_document.read(chunk_size=1024, chunker="text_chunker"):
assert 36 == paragraph_data.word_count, f" 36 != {paragraph_data.word_count = }"
assert 171 == len(paragraph_data.text), f" 171 != {len(paragraph_data.text) = }"
assert "sentence_cut" == paragraph_data.cut_type, (
f" sentence_cut != {paragraph_data.cut_type = }"
)
assert (
"sentence_cut" == paragraph_data.cut_type
), f" sentence_cut != {paragraph_data.cut_type = }"

⚠️ Potential issue

Fix formatting to comply with Ruff.

Multiple assertion blocks need formatting adjustments to comply with Ruff's requirements.

Run ruff format on this file to automatically fix the formatting issues.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

Comment on lines +20 to +22
assert (
reconstructed_text == input_text
), f"texts are not identical: {len(input_text) = }, {len(reconstructed_text) = }"

⚠️ Potential issue

Fix formatting to comply with Ruff.

The assertion formatting needs to be adjusted according to Ruff's requirements.

Run ruff format on this file to automatically fix the formatting issues.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

Comment on lines +6 to +7
from sqlalchemy.testing.plugin.plugin_base import logging


🛠️ Refactor suggestion

Use the standard Python logging module instead.

Importing logging from sqlalchemy's testing plugin is not recommended. Use Python's built-in logging module instead.

-from sqlalchemy.testing.plugin.plugin_base import logging
+import logging

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbd51e8 and fe47253.

⛔ Files ignored due to path filters (1)
  • cognee-mcp/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • cognee-mcp/cognee_mcp/server.py (3 hunks)
  • cognee/api/v1/visualize/visualize.py (1 hunks)
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2 hunks)
  • cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (3 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1 hunks)
  • cognee/shared/utils.py (5 hunks)
✅ Files skipped from review due to trivial changes (4)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
  • cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py
  • cognee/api/v1/visualize/visualize.py
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee-mcp/cognee_mcp/server.py
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (3)
cognee/shared/utils.py (3)

38-56: 🛠️ Refactor suggestion

Add input validation and improve error handling for extract_pos_tags()

The function should validate input and handle NLTK resource downloads more robustly.

 def extract_pos_tags(sentence):
     """Extract Part-of-Speech (POS) tags for words in a sentence."""
+    if not isinstance(sentence, str) or not sentence.strip():
+        raise ValueError("Input must be a non-empty string")

+    try:
         # Ensure that the necessary NLTK resources are downloaded
         nltk.download("words", quiet=True)
         nltk.download("punkt", quiet=True)
         nltk.download("averaged_perceptron_tagger", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

     from nltk.tag import pos_tag
     from nltk.tokenize import word_tokenize

     # Tokenize the sentence into words
     tokens = word_tokenize(sentence)

     # Tag each word with its corresponding POS tag
     pos_tags = pos_tag(tokens)

     return pos_tags

Likely invalid or redundant comment.


31-35: 🛠️ Refactor suggestion

Add docstring and error handling for get_entities()

The function lacks a docstring explaining its purpose and parameters. Also, NLTK resource downloads should be handled with error checking.

 def get_entities(tagged_tokens):
+    """Extract named entities from POS-tagged tokens using NLTK's ne_chunk.
+    
+    Args:
+        tagged_tokens: A list of POS-tagged tokens from nltk.pos_tag()
+    
+    Returns:
+        A tree containing chunks of named entities
+    
+    Raises:
+        Exception: If NLTK resource download fails
+    """
+    try:
         nltk.download("maxent_ne_chunker", quiet=True)
         from nltk.chunk import ne_chunk
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resources: {str(e)}")
+        raise

         return ne_chunk(tagged_tokens)

Likely invalid or redundant comment.


397-397: 🛠️ Refactor suggestion

Add error handling for PNG export

The PNG export functionality lacks error handling and cleanup of old files.

     from bokeh.io import export_png
+    import os
+
+    def cleanup_old_pngs(directory=".", keep_latest=5):
+        """Cleanup old PNG files, keeping only the N latest files."""
+        png_files = [f for f in os.listdir(directory) if f.endswith('.png')]
+        if len(png_files) <= keep_latest:
+            return
+            
+        # Sort by timestamp in filename
+        sorted_files = sorted(png_files, key=lambda x: int(x.replace(".png", "")))
+        
+        # Remove older files
+        for f in sorted_files[:-keep_latest]:
+            try:
+                os.remove(os.path.join(directory, f))
+            except OSError as e:
+                logging.warning(f"Failed to remove old PNG file {f}: {str(e)}")

Likely invalid or redundant comment.

Comment on lines +448 to +453
home_dir = os.path.expanduser("~")

# Construct the final output file path
output_filepath = os.path.join(home_dir, output_filename)
with open(output_filepath, "w") as f:
f.write(html_content)

🛠️ Refactor suggestion

Add error handling for file operations

The file writing operations lack error handling and directory existence checks.

     home_dir = os.path.expanduser("~")
+    
+    # Ensure output directory exists
+    try:
+        os.makedirs(home_dir, exist_ok=True)
+    except OSError as e:
+        logging.error(f"Failed to create output directory: {str(e)}")
+        raise

     # Construct the final output file path
     output_filepath = os.path.join(home_dir, output_filename)
+    try:
         with open(output_filepath, "w") as f:
             f.write(html_content)
+        logging.info(f"Successfully saved visualization to {output_filepath}")
+    except IOError as e:
+        logging.error(f"Failed to write output file: {str(e)}")
+        raise
