
Conversation

@Vasilije1990
Contributor

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Vasilije1990 and others added 30 commits April 18, 2025 16:31

## Description
Resolve issue with .venv being broken when using docker compose with
Cognee

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris Arzentar <[email protected]>
… 1947 (#760)

## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>

## Description
Add support for both UV and Poetry package management

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

## Description
Switch typing from str to UUID for NetworkX node_id

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

## Description
Add both sse and stdio support for Cognee MCP

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
…83] (#782)


## Description
Add log handling options for cognee exceptions

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

## Description
Fix issue with failing versions gh actions

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Vasilije <[email protected]>
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

Co-authored-by: Vasilije <[email protected]>
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Hande <[email protected]>
Co-authored-by: Vasilije <[email protected]>
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Hande <[email protected]>
Co-authored-by: Vasilije <[email protected]>

## Description
This PR adds support for the Memgraph graph database following the
[graph database integration
guide](https://docs.cognee.ai/contributing/adding-providers/graph-db/graph-database-integration):
- Implemented `MemgraphAdapter` for interfacing with Memgraph.
- Updated `get_graph_engine.py` to return MemgraphAdapter when
appropriate.
- Added a test script: `test_memgraph.py`.
- Created a dedicated test workflow: `.github/workflows/test_memgraph.yml`.
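
As a rough usage sketch (not part of this commit's diff), the adapter could be selected through Cognee's config before running a pipeline. The `set_graph_db_config` helper and the exact key names below are assumptions modeled on the `set_vector_db_config` call quoted later in this review; verify them against the `MemgraphAdapter` and `get_graph_engine.py` introduced here:

```python
import asyncio
import cognee

async def main():
    # Assumed helper and keys: adjust to whatever MemgraphAdapter/get_graph_engine.py expect.
    cognee.config.set_graph_db_config(
        {
            "graph_database_provider": "memgraph",          # select the new adapter
            "graph_database_url": "bolt://localhost:7687",  # default Memgraph Bolt endpoint
            "graph_database_username": "",
            "graph_database_password": "",
        }
    )

    await cognee.add("Memgraph is an in-memory graph database.")
    await cognee.cognify()  # builds the knowledge graph through the configured adapter

if __name__ == "__main__":
    asyncio.run(main())
```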

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Vasilije <[email protected]>
Co-authored-by: Boris <[email protected]>

## Description
refactor: Handle boto3 s3fs dependencies better

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

## Description
Update LanceDB and rewrite data points to run async

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <[email protected]>
Co-authored-by: Boris Arzentar <[email protected]>
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

## Description
As discussed with @hande-k and Lazar, I've created a short demo to
illustrate how to get the PageRank rankings from the knowledge graph
given the nx engine. This is a POC, and a first step towards solving
#643.

Please let me know what you think, and how to proceed from here. :)
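
For reference, a minimal sketch of what such a demo could look like, assuming the NetworkX graph engine exposes its underlying `networkx` graph on a `graph` attribute (as the adapter snippets quoted later in this review suggest); the accessor name is otherwise an assumption:

```python
import asyncio
import networkx as nx
from cognee.infrastructure.databases.graph import get_graph_engine

async def pagerank_demo(top_k: int = 10):
    graph_engine = await get_graph_engine()

    # Assumption: the NetworkX adapter stores its data in `graph_engine.graph`.
    graph = graph_engine.graph

    scores = nx.pagerank(graph)  # maps node id -> PageRank score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    for node_id, score in ranked:
        print(f"{node_id}: {score:.4f}")

if __name__ == "__main__":
    asyncio.run(pagerank_demo())
```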

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <[email protected]>
Co-authored-by: Hande <[email protected]>
Co-authored-by: Vasilije <[email protected]>

## Description
Added tools to check current cognify and codify status

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
borisarzentar and others added 16 commits May 15, 2025 10:05
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

## Description
Add ability to map column values from relational databases to graph

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: vasilije <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Vasilije <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Hande <[email protected]>
Co-authored-by: Matea Pesic <[email protected]>
Co-authored-by: hajdul88 <[email protected]>
Co-authored-by: Daniel Molnar <[email protected]>
Co-authored-by: Diego Baptista Theuerkauf <[email protected]>
## Description

This PR manages function calls through `/api/v1/responses`:
- search
- cognify
- prune

Next steps:
- codify
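
For context, a hedged sketch of exercising the new endpoint with the standard OpenAI Python client, mirroring the `cognee_openai_compatable_demo.ipynb` notebook added in this PR (model name and base URL are taken from that notebook; the search prompt is illustrative):

```python
from openai import OpenAI

# Point the regular OpenAI client at Cognee's OpenAI-compatible responses endpoint.
client = OpenAI(api_key="COGNEE_API_KEY", base_url="http://localhost:8000/api/v1/")

# Ingest and cognify some text through /api/v1/responses.
client.responses.create(
    model="cognee-v1",
    input="Cognify: Natural language processing (NLP) is an interdisciplinary "
    "subfield of computer science and information retrieval.",
)

# Ask a question; the router can dispatch the `search` function call internally.
response = client.responses.create(
    model="cognee-v1",
    input="Search: what is NLP about?",
)
print(response)
```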


## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Signed-off-by: Diego B Theuerkauf <[email protected]>
Co-authored-by: Hande <[email protected]>
Co-authored-by: Vasilije <[email protected]>
Co-authored-by: Diego Baptista Theuerkauf <[email protected]>
Co-authored-by: Boris <[email protected]>
Co-authored-by: Boris <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Boris <[email protected]>
Co-authored-by: Vasilije <[email protected]>

## Description
Fixes Anthropic bug as reported by the user
#812

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Igor Ilic <[email protected]>
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
## Description

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
…exist case


## Description
Fixes pipeline run status migration

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

## Description
Fixes graph completion limit

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai
Contributor

coderabbitai bot commented May 19, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This update introduces broad enhancements and refactoring across the codebase. Major changes include: new OpenAI-compatible response APIs, expanded support for graph/vector databases (e.g., Memgraph), improved pipeline and dataset handling, refined error handling, and new example scripts. The frontend and documentation are updated for clarity and new features, while several tests and legacy files are removed or adjusted.

Changes

| File(s) / Path(s) | Change Summary |
| --- | --- |
| CONTRIBUTING.md, Dockerfile, README.md, entrypoint.sh, assets/graph_visualization.html | Refactored Dockerfile to multi-stage build, updated PR instructions and README, replaced static visualization with UI, and modified entrypoint script logic. |
| alembic/versions/... | Alembic migrations: added new pipeline run status, improved error handling in user creation. |
| cognee-frontend/src/app/..., cognee-frontend/src/modules/..., cognee-frontend/src/ui/... | Changed dataset prop types from id to name, added "Cognify" button, improved notification and error handling, updated fetch base URL. |
| cognee-mcp/src/server.py | Refactored MCP server: replaced monolithic dispatcher with explicit async tool functions, added developer rules tool, improved background task handling and status reporting. |
| cognee/api/client.py, cognee/api/v1/responses/** | Introduced new OpenAI-compatible responses API: routers, models, default tools, dispatch logic, and endpoints. |
| cognee/api/v1/cognify/, cognee/api/v1/config/, cognee/api/v1/datasets/* | Improved observability integration, error message syntax, and pipeline status scoping. |
| cognee/base_config.py, cognee/modules/observability/*, cognee/shared/data_models.py | Refactored monitoring tool configuration: replaced MonitoringTool enum with Observer, centralized observe decorator retrieval. |
| cognee/modules/data/methods/* | Added async unique dataset ID generation, updated dataset creation to use user object, and adjusted related ingestion logic. |
| cognee/modules/engine/models/* | Introduced new ColumnValue model for representing column-level data. |
| cognee/modules/pipelines/models/, operations/ | Added new pipeline run status, logging for run initiation, stricter dataset handling, and pipeline status filtering by name. |
| cognee/modules/pipelines/operations/run_tasks*.py | Extended task execution to support context propagation through pipelines. |
| cognee/modules/retrieval/* | Improved error handling in brute force search, removed unused exceptions, and updated test logic. |
| cognee/infrastructure/databases/graph/* | Added Memgraph support, adjusted node/edge type annotations, and logging levels. |
| cognee/infrastructure/databases/vector/* | Centralized collection retrieval, improved error handling for missing collections, removed redundant distance methods, standardized search limits and signatures, and fully async Weaviate adapter. |
| cognee/infrastructure/llm/* | Unified observability decorator usage, switched Anthropic client to async. |
| cognee/modules/visualization/* | Added color mapping for new node type. |
| cognee/tasks/ingestion/* | Updated dataset creation to use user object, added option to migrate column data as nodes in relational DB migration. |
| cognee/tasks/temporal_awareness/* | Adjusted graph data access patterns for new data structure. |
| examples/data/car_and_tech_companies.txt, examples/database_examples/* | Added new example scripts and data for various database integrations. |
| notebooks/* | Updated for new API usage, user context, formatting, and removed obsolete/legacy files. |
| cognee/tests/* | Added Memgraph integration test, updated test assertions for new graph structure, improved async test execution, removed obsolete or redundant tests. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API
    participant ResponsesRouter
    participant Dispatcher
    participant ToolFunction
    participant DB/Engine

    Client->>API: POST /api/v1/responses (input, tools)
    API->>ResponsesRouter: Validate and parse request
    ResponsesRouter->>OpenAI API: Call model with input/tools
    OpenAI API-->>ResponsesRouter: Response (may include function call)
    alt Function call present
        ResponsesRouter->>Dispatcher: dispatch_function(tool_call)
        Dispatcher->>ToolFunction: handle_search/cognify/prune(...)
        ToolFunction->>DB/Engine: Execute async operation
        DB/Engine-->>ToolFunction: Result
        ToolFunction-->>Dispatcher: Output/result
        Dispatcher-->>ResponsesRouter: Tool call output
    end
    ResponsesRouter-->>Client: Structured response (status, tool_calls, usage)

Possibly related PRs

  • topoteretes/cognee#501: Both PRs involve changes to pipeline run status handling and dataset ID usage in pipeline execution.
  • topoteretes/cognee#792: Both PRs add or modify the OpenAI-compatible responses API, including routers, models, and dispatch logic.
  • topoteretes/cognee#751: Both PRs add Memgraph support and related adapter integration for graph databases.

Suggested labels

run-checks

Suggested reviewers

  • Vasilije1990
  • borisarzentar

Poem

In a warren of code, the rabbits hopped,
Refactoring here, old errors stopped.
Pipelines now smarter, responses anew,
Graphs and vectors—support grew!
With context and colors, and tests that delight,
This release is a carrot—crunchy and bright!
🥕✨

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor, and Windsurf. This brings AI code reviews directly into the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.


Note

⚡️ Faster reviews with caching

CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 30th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.
Enjoy the performance boost—your workflow just got faster.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate Unit Tests
  • Create PR with Unit Tests
  • Commit Unit Tests in branch dev
  • Post Copyable Unit Tests in Comment

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@gitguardian

gitguardian bot commented May 19, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 17116131 | Triggered | Generic Password | 3b07f3c | examples/database_examples/neo4j_example.py |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely (see the sketch after this list). Learn the best practices here.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
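
As an illustration of step 2 only (a sketch, not official remediation guidance): the hardcoded value flagged in `examples/database_examples/neo4j_example.py` could be read from the environment, the way the Qdrant example in this PR already uses `os.getenv()`. The variable names below are hypothetical:

```python
import os

# Hypothetical variable names; use whatever your deployment defines.
neo4j_url = os.getenv("GRAPH_DATABASE_URL", "bolt://localhost:7687")
neo4j_user = os.getenv("GRAPH_DATABASE_USERNAME", "neo4j")
neo4j_password = os.getenv("GRAPH_DATABASE_PASSWORD")  # no default: fail loudly if unset

if neo4j_password is None:
    raise RuntimeError("Set GRAPH_DATABASE_PASSWORD instead of hardcoding the password.")
```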

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 42

🔭 Outside diff range comments (6)
cognee/exceptions/exceptions.py (1)

38-44: 🛠️ Refactor suggestion

Update child exception classes to use new parameters

The child exception classes (ServiceError, InvalidValueError, InvalidAttributeError, etc.) don't forward the new log and log_level parameters to the parent constructor, limiting their logging flexibility.

 def __init__(
     self,
     message: str = "Service is unavailable.",
     name: str = "ServiceError",
     status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+    log=True,
+    log_level="ERROR",
 ):
-    super().__init__(message, name, status_code)
+    super().__init__(message, name, status_code, log, log_level)

Apply similar updates to other child exception classes to ensure consistent behavior.

cognee/modules/observability/get_observe.py (1)

1-12: 🛠️ Refactor suggestion

Good centralization of observability decorator, but add handling for all Observer enum values

This centralized function for retrieving observability decorators helps eliminate duplicate conditional import logic across the codebase, which is a good practice.

However, the function only explicitly handles Observer.LANGFUSE without providing fallback behavior for other enum values like Observer.LLMLITE and Observer.LANGSMITH. Consider adding explicit handling for all possible values:

 from cognee.base_config import get_base_config
 from .observers import Observer
 
 
 def get_observe():
     monitoring = get_base_config().monitoring_tool
 
     if monitoring == Observer.LANGFUSE:
         from langfuse.decorators import observe
 
         return observe
+    elif monitoring == Observer.LLMLITE:
+        # Import and return the LLMLITE observe decorator once available, e.g.:
+        # from llmlite.decorators import observe
+        # return observe
+        pass  # placeholder so the branch stays syntactically valid
+    elif monitoring == Observer.LANGSMITH:
+        # Import and return the LANGSMITH observe decorator once available, e.g.:
+        # from langsmith.decorators import observe
+        # return observe
+        pass  # placeholder so the branch stays syntactically valid
+    else:
+        # Return a no-op decorator as fallback
+        def noop_observe(name=None, **kwargs):
+            def decorator(func):
+                return func
+            return decorator
+        return noop_observe
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)

170-191: ⚠️ Potential issue

Update convertToSearchTypeOutput function for new search types

The convertToSearchTypeOutput function still handles "INSIGHTS", "SUMMARIES", and "CHUNKS" cases, but none of these match the current search options in the dropdown ("GRAPH_COMPLETION" and "RAG_COMPLETION").

The function needs to be updated to handle the new search types:

function convertToSearchTypeOutput(systemMessages: any[], searchType: string): string {
  if (systemMessages.length > 0 && typeof(systemMessages[0]) === "string") {
    return systemMessages[0];
  }

  switch (searchType) {
-   case 'INSIGHTS':
+   case 'GRAPH_COMPLETION':
      return systemMessages.map((message: InsightMessage) => {
        const [node1, relationship, node2] = message;
        if (node1.name && node2.name) {
          return `${node1.name} ${relationship.relationship_name} ${node2.name}.`;
        }
        return '';
      }).join('\n');
-   case 'SUMMARIES':
+   case 'RAG_COMPLETION':
      return systemMessages.map((message: { text: string }) => message.text).join('\n');
    case 'CHUNKS':
      return systemMessages.map((message: { text: string }) => message.text).join('\n');
    default:
      return "";
  }
}
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

267-279: ⚠️ Potential issue

Mixed identifier types in edge-removal helpers

The public API now demands list[UUID], but deeper in the code self.graph.has_edge() is called with the edge_label as key (OK) while the stored edge endpoints were converted to str in add_edges(). This mismatch silently leaves “string” nodes dangling.

Either:

  1. Stop casting UUIDs to str in add_edges, or
  2. Cast incoming node_id back to str here.

Failing to normalise will corrupt the graph.
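
A minimal sketch of option 2, normalising incoming UUIDs back to `str` at the boundary; the `graph` attribute and `add_edges()` behaviour are taken from the snippets quoted in this review, while the helper shape itself is assumed:

```python
from uuid import UUID
import networkx as nx

def has_edge_normalized(graph: nx.MultiDiGraph, from_node: UUID, to_node: UUID, edge_label: str) -> bool:
    # add_edges() stored endpoints as strings, so cast the UUIDs back to str
    # before querying the underlying networkx graph.
    return graph.has_edge(str(from_node), str(to_node), key=edge_label)

# Tiny self-check with hypothetical ids:
if __name__ == "__main__":
    g = nx.MultiDiGraph()
    a, b = UUID(int=1), UUID(int=2)
    g.add_edge(str(a), str(b), key="mentions")
    assert has_edge_normalized(g, a, b, "mentions")
```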

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (2)

85-103: ⚠️ Potential issue

Collection may not exist – create_data_points should ensure it

create_data_points fetches the collection first:

collection = await self.get_collection(collection_name)

If the collection is missing, a CollectionNotFoundError is raised.
All external callers (e.g. index_data_points) rely on this method and do not create the collection beforehand, leading to runtime errors.

Add an existence check or simply call await self.create_collection(collection_name) at the start:

+await self.create_collection(collection_name)
 collection = await self.get_collection(collection_name)

90-98: 🛠️ Refactor suggestion

Inefficient index() lookup turns O(n²)

Inside convert_to_weaviate_data_points:

vector = data_vectors[data_points.index(data_point)]

list.index is O(n). For large batches this becomes quadratic.
Iterate with enumerate instead:

-data_points = [convert_to_weaviate_data_points(dp) for dp in data_points]
+data_points = [
+    DataObject(
+        uuid=dp.id,
+        properties={**dp.model_dump(), "uuid": str(dp.id), "id": None},
+        vector=vec,
+    )
+    for dp, vec in zip(data_points, data_vectors)
+]
♻️ Duplicate comments (2)
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

624-636: Interface-mismatch recurrence

get_node(s) repeats the UUID parameter divergence noted earlier. Please align with whichever identifier strategy you settle on.

cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)

81-98: Dynamic label & property placeholders render invalid Cypher

ON CREATE SET n:node.label (and the similar ON MATCH) tries to use node.label as a literal label. This suffers from the same problem as above and will break batch insertion.

Additionally, because you overwrite the list nodes later (nodes = [ {...} for node in nodes ]) the outer reference is lost – consider renaming the temporary variable to avoid confusion.

🧹 Nitpick comments (60)
cognee/modules/engine/models/__init__.py (1)

6-6: Consider addressing the unused import linter warning.

The import for ColumnValue is correctly added to make it part of the package's public API, but it's flagged as unused by the static analysis tool.

To silence the linter warning while maintaining intended behavior, consider one of these approaches:

- from .ColumnValue import ColumnValue
+ from .ColumnValue import ColumnValue  # noqa: F401 - Exported as part of public API

Or define an __all__ list:

from .Entity import Entity
from .EntityType import EntityType
from .TableRow import TableRow
from .TableType import TableType
from .node_set import NodeSet
from .ColumnValue import ColumnValue

+ __all__ = ["Entity", "EntityType", "TableRow", "TableType", "NodeSet", "ColumnValue"]
🧰 Tools
🪛 Ruff (0.11.9)

6-6: .ColumnValue.ColumnValue imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/modules/pipelines/operations/__init__.py (1)

1-1: Consider addressing the unused import linter warning.

The import for log_pipeline_run_initiated makes this function part of the package's public API, consistent with other similar functions, but it's flagged as unused by the static analysis tool.

To silence the linter warning while maintaining intended behavior, consider one of these approaches:

- from .log_pipeline_run_initiated import log_pipeline_run_initiated
+ from .log_pipeline_run_initiated import log_pipeline_run_initiated  # noqa: F401 - Exported as part of public API

Or define an __all__ list:

from .log_pipeline_run_initiated import log_pipeline_run_initiated
from .log_pipeline_run_start import log_pipeline_run_start
from .log_pipeline_run_complete import log_pipeline_run_complete
from .log_pipeline_run_error import log_pipeline_run_error
from .pipeline import cognee_pipeline

+ __all__ = [
+     "log_pipeline_run_initiated",
+     "log_pipeline_run_start",
+     "log_pipeline_run_complete",
+     "log_pipeline_run_error",
+     "cognee_pipeline"
+ ]
🧰 Tools
🪛 Ruff (0.11.9)

1-1: .log_pipeline_run_initiated.log_pipeline_run_initiated imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/modules/data/methods/__init__.py (1)

10-10: Add imported function to __all__ list

You've imported get_unique_dataset_id to make it part of the package's public API, which is good for module organization. However, the static analyzer has flagged this as potentially unused. To explicitly mark it as part of the public API and silence the linter warning, consider adding an __all__ list to the file.

# Create
from .create_dataset import create_dataset

# Get
from .get_dataset import get_dataset
from .get_datasets import get_datasets
from .get_datasets_by_name import get_datasets_by_name
from .get_dataset_data import get_dataset_data
from .get_data import get_data
from .get_unique_dataset_id import get_unique_dataset_id

+__all__ = [
+    "create_dataset",
+    "get_dataset",
+    "get_datasets",
+    "get_datasets_by_name",
+    "get_dataset_data",
+    "get_data",
+    "get_unique_dataset_id",
+    "delete_dataset",
+    "delete_data",
+]

# Delete
from .delete_dataset import delete_dataset
from .delete_data import delete_data
🧰 Tools
🪛 Ruff (0.11.9)

10-10: .get_unique_dataset_id.get_unique_dataset_id imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/api/v1/add/add.py (1)

2-6: Reordered imports

The import statements have been reordered without changing functionality. While this is fine, consider following a consistent import ordering convention across the codebase (e.g., standard library imports first, third-party imports second, local imports third) and documenting this in your coding guidelines.

cognee-frontend/src/utils/fetch.ts (1)

4-4: Using localhost instead of IP address is more readable

Changing from '127.0.0.1' to 'localhost' improves readability while maintaining the same functionality.

Consider using an environment variable for the API base URL instead of hardcoding it, which would make it easier to configure for different environments (development, testing, production):

-  return global.fetch('http://localhost:8000/api' + url, {
+  const baseUrl = process.env.NEXT_PUBLIC_API_BASE_URL || 'http://localhost:8000/api';
+  return global.fetch(baseUrl + url, {
cognee/infrastructure/databases/graph/graph_db_interface.py (1)

88-88: Consider adding explicit error handling for commit failures

While logging level has been changed to debug, note that unlike the previous error blocks, there's no explicit handling of commit failures (such as retry logic or transaction management). Consider whether additional error handling would be appropriate here.

            try:
                await session.commit()
            except Exception as e:
                logger.debug(f"Error committing session: {e}")
+               # Consider adding retry logic or additional error handling here
+               # await session.rollback()  # Ensure clean state if needed
cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)

19-28: Consider handling empty dataset name

The code doesn't check if dataset.name is empty before calling getExplorationGraphUrl. If an empty name is provided, this could result in unexpected errors.

  const exploreData = useCallback(() => {
+   if (!dataset.name) {
+     setError(new Error('Dataset name is required'));
+     return;
+   }
    getExplorationGraphUrl(dataset)
      .then((graphHtml) => {
        setError(null);
        setGraphHtml(graphHtml);
      })
      .catch((error) => {
        setError(error);
      });
  }, [dataset]);
cognee/api/v1/config/config.py (1)

158-158: Inconsistent exception handling pattern in config methods

While removing the message= keyword argument makes this line use standard Python exception syntax, it creates inconsistency with other similar methods in this file. Other methods like set_llm_config, set_relational_db_config, etc. use InvalidAttributeError with a message parameter.

Consider standardizing the exception handling approach across all similar methods either by:

-raise AttributeError(f"'{key}' is not a valid attribute of the config.")
+raise InvalidAttributeError(message=f"'{key}' is not a valid attribute of the config.")

Or updating all other methods to use standard exceptions without keyword arguments for consistency.

cognee/modules/data/methods/get_unique_dataset_id.py (1)

5-6: Consider removing async keyword for synchronous function

This function is marked as async but contains no await statements. Since UUID generation is a synchronous operation, the async keyword is unnecessary and may mislead callers into using await with this function.

-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
cognee/base_config.py (1)

11-11: Consider using a more specific type annotation

The type annotation for monitoring_tool is object, which is very general. Consider using Observer as the type to provide better type checking and code completion.

-    monitoring_tool: object = Observer.LANGFUSE
+    monitoring_tool: Observer = Observer.LANGFUSE
cognee/modules/observability/observers.py (1)

1-9: Consider adding enum validation method

For added robustness, you might want to include a static method to validate string values against the enum. This would be helpful when processing configuration from external sources.

class Observer(str, Enum):
    """Monitoring tools"""

    LANGFUSE = "langfuse"
    LLMLITE = "llmlite"
    LANGSMITH = "langsmith"
+
+    @classmethod
+    def is_valid(cls, value: str) -> bool:
+        """Check if a string value is a valid observer type"""
+        return value in [item.value for item in cls]
cognee/modules/engine/models/ColumnValue.py (1)

5-7: Consider adding field type annotations and documentation

The string fields would benefit from more detailed type annotations like Field() with descriptions to better document their purpose.

class ColumnValue(DataPoint):
-    name: str
-    description: str
-    properties: str
+    name: str = Field(description="The name of the column")
+    description: str = Field(description="Description of the column's purpose")
+    properties: str = Field(description="The properties/values of the column used for indexing")
cognee/tests/test_relational_db_migration.py (1)

115-193: Consider making expected counts more maintainable

With multiple hard-coded count values throughout the test, future schema changes might require updates in multiple places. Consider defining these as constants at the top of the file to improve maintainability.

import json
import pathlib
import os
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.infrastructure.databases.relational import (
    get_migration_relational_engine,
    create_db_and_tables as create_relational_db_and_tables,
)
from cognee.infrastructure.databases.vector.pgvector import (
    create_db_and_tables as create_pgvector_db_and_tables,
)
from cognee.tasks.ingestion import migrate_relational_database
from cognee.modules.search.types import SearchType
import cognee

+# Expected counts for graph elements
+EXPECTED_DISTINCT_NODE_COUNT = 12
+EXPECTED_DISTINCT_EDGE_COUNT = 15
+
+# Provider-specific expected counts
+SQLITE_EXPECTED_NODE_COUNT = 543
+SQLITE_EXPECTED_EDGE_COUNT = 1317
+POSTGRES_EXPECTED_NODE_COUNT = 522
+POSTGRES_EXPECTED_EDGE_COUNT = 961

Then use these constants in the assertions.

cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2)

108-110: Good error handling enhancement

Adding error handling to restore the input text if the fetch request fails is a great UX improvement, ensuring users don't lose their message on network failures.

Consider enhancing this further with visual feedback to the user when a fetch error occurs:

.catch(() => {
  setInputValue(inputValue);
+ // Add a toast/notification to inform the user about the error
+ // e.g., showErrorNotification("Failed to send message. Please try again.");
});

149-152: Fix typo in Stack alignment attribute

There's a typo in the Stack component's align prop: align="end/" should likely be align="end".

-<Stack orientation="horizontal" align="end/" gap="2">
+<Stack orientation="horizontal" align="end" gap="2">
alembic/versions/1d0bb7fede17_add_pipeline_run_status.py (1)

13-14: Remove unused imports

The static analysis tool correctly identified that PipelineRun and PipelineRunStatus are imported but not used in this file.

- from cognee.modules.pipelines.models.PipelineRun import PipelineRun, PipelineRunStatus
+ from cognee.modules.pipelines.models.PipelineRun import PipelineRunStatus

Or remove both imports if PipelineRunStatus isn't used either:

- from cognee.modules.pipelines.models.PipelineRun import PipelineRun, PipelineRunStatus
🧰 Tools
🪛 Ruff (0.11.9)

13-13: cognee.modules.pipelines.models.PipelineRun.PipelineRun imported but unused

Remove unused import

(F401)


13-13: cognee.modules.pipelines.models.PipelineRun.PipelineRunStatus imported but unused

Remove unused import

(F401)

cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)

6-22: Clean implementation of pipeline run initiation logging.

This function effectively creates and persists a new pipeline run record with the DATASET_PROCESSING_INITIATED status. The implementation follows good practices:

  1. Properly typed parameters
  2. Appropriate UUID generation
  3. Correct use of async database session
  4. Clean error handling with proper session commit

One improvement suggestion would be to add docstrings to clearly document the function's purpose, parameters, and return value.

 async def log_pipeline_run_initiated(pipeline_id: str, pipeline_name: str, dataset_id: UUID):
+    """
+    Create and persist a new pipeline run record with DATASET_PROCESSING_INITIATED status.
+    
+    Args:
+        pipeline_id: Unique identifier for the pipeline
+        pipeline_name: Name of the pipeline to run
+        dataset_id: UUID of the dataset being processed
+        
+    Returns:
+        PipelineRun: The created pipeline run record
+    """
     pipeline_run = PipelineRun(
cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1)

37-39: Add explanation for the expected result calculation

The hardcoded expected result (4586471424) is not immediately obvious. Consider adding a comment explaining how this value is calculated from the pipeline tasks.

-    final_result = 4586471424
+    # Expected calculation: ((5 + 7) * 2)^7 = (12 * 2)^7 = 24^7 = 4586471424
+    final_result = 4586471424
examples/database_examples/kuzu_example.py (1)

1-6: Remove unused import

The os module is imported but not used in the code.

-import os
 import pathlib
 import asyncio
 import cognee
 from cognee.modules.search.types import SearchType
🧰 Tools
🪛 Ruff (0.11.9)

1-1: os imported but unused

Remove unused import: os

(F401)

cognee/tasks/temporal_awareness/index_graphiti_objects.py (1)

63-66: Consider more robust edge data access

Accessing the relationship name at a fixed index position (edge[2]) makes the code brittle to changes in the underlying data structure. Consider using a more descriptive approach to access this data.

-    edge_types = Counter(
-        edge[2]  # The edge key (relationship name) is at index 2
-        for edge in edges_data
-    )
+    # Access relationship name (at index 2) from edge data
+    edge_types = Counter(
+        relationship_name  # More descriptive variable name
+        for _, _, relationship_name, *_ in edges_data
+    )
examples/database_examples/falkordb_example.py (3)

1-1: Remove unused import

The os module is imported but not used in the code. Consider removing this import to maintain clean dependencies.

-import os
 import pathlib
 import asyncio
 import cognee
 from cognee.modules.search.types import SearchType
🧰 Tools
🪛 Ruff (0.11.9)

1-1: os imported but unused

Remove unused import: os

(F401)


20-30: Add more guidance on directory configuration

The example sets up data directories relative to the script location. Consider adding more explanation about how users should adapt these paths for their own environment, especially for production use cases.

 # Set up data directories for storing documents and system files
-# You should adjust these paths to your needs
+# NOTE: These paths are relative to the example script location.
+# For production use, you should:
+# - Use absolute paths
+# - Ensure the directories are persistent and have appropriate permissions
+# - Consider environment-specific configurations
 current_dir = pathlib.Path(__file__).parent
 data_directory_path = str(current_dir / "data_storage")
 cognee.config.data_root_directory(data_directory_path)

83-85: Clarify cleanup comment

The commented-out cleanup code might confuse users. Consider adding a note explaining when it would be appropriate to uncomment and use these lines.

 # Clean up (optional)
+# Uncomment the following lines if you want to remove all data after running the example
+# Note: This will delete all datasets and system data created by Cognee
 # await cognee.prune.prune_data()
 # await cognee.prune.prune_system(metadata=True)
cognee/tasks/ingestion/migrate_relational_database.py (1)

111-117: Improve ColumnValue node properties for better searchability

The current properties field is a simple space-separated string, which might not be optimal for semantic searches. Consider using a more structured format like JSON for better searchability and clarity.

 column_node = ColumnValue(
     id=uuid5(NAMESPACE_OID, name=column_node_id),
     name=column_node_id,
-    properties=f"{key} {value} {table_name}",
+    properties=f"{{\"column\": \"{key}\", \"value\": \"{value}\", \"table\": \"{table_name}\"}}",
-    description=f"Column name={key} and value={value} from column from table={table_name}",
+    description=f"Column '{key}' with value '{value}' from table '{table_name}'",
 )
examples/database_examples/milvus_example.py (3)

29-29: Use consistent pathlib approach instead of os.path

Since you're already using pathlib for directory path management, consider using it consistently throughout the code rather than mixing with os.path.

-local_milvus_db_path = os.path.join(cognee_directory_path, "databases", "milvus.db")
+local_milvus_db_path = str(pathlib.Path(cognee_directory_path) / "databases" / "milvus.db")

46-53: Enhance sample text with Cognee integration details

The sample text describes Milvus but doesn't mention how it integrates with Cognee specifically. Consider adding information about the integration to make the example more informative.

 # Add sample text to the dataset
 sample_text = """Milvus is an open-source vector database built to power AI applications.
 It is designed for storing, indexing, and querying large-scale vector datasets.
 Milvus implements efficient approximate nearest neighbor search algorithms.
 It features advanced indexing techniques like HNSW, IVF, PQ, and more.
 Milvus supports hybrid searches combining vector similarity with scalar filtering.
-The system can be deployed standalone, in clusters, or through a cloud service."""
+The system can be deployed standalone, in clusters, or through a cloud service.
+When integrated with Cognee, Milvus provides fast vector similarity search capabilities
+that enable semantic search, knowledge retrieval, and AI-powered insights generation."""

83-85: Clarify cleanup comment

The commented-out cleanup code might confuse users. Consider adding a note explaining when it would be appropriate to uncomment and use these lines.

 # Clean up (optional)
+# Uncomment the following lines if you want to remove all data after running the example
+# Note: This will delete all datasets and system data created by Cognee
 # await cognee.prune.prune_data()
 # await cognee.prune.prune_system(metadata=True)
notebooks/cognee_openai_compatable_demo.ipynb (3)

28-31: Fix URL redirection by using the correct endpoint.

The log shows a 307 redirect from /api/v1/responses to /api/v1/responses/ (with trailing slash). Using the correct URL directly would save an HTTP request.

-client = OpenAI(api_key="COGNEE_API_KEY", base_url="http://localhost:8000/api/v1/")
+client = OpenAI(api_key="COGNEE_API_KEY", base_url="http://localhost:8000/api/v1/")
 
 client.responses.create(
     model="cognee-v1",
     input="Cognify: Natural language processing (NLP) is an interdisciplinary subfield of computer science and information retrieval.",
 )

Note: While the URL in the client initialization is correct, the OpenAI client is still making the request without a trailing slash. This appears to be an internal implementation detail of the OpenAI client rather than an issue with your code.


1-109: Add markdown cells to improve notebook documentation.

The notebook lacks descriptive markdown cells that would help users understand what each cell is demonstrating. Adding markdown cells between code cells would significantly improve clarity.

Consider adding:

  1. An introductory markdown cell at the beginning explaining the purpose of the notebook
  2. A markdown cell before each code example explaining what it demonstrates
  3. A markdown cell after each output explaining the structure of the response

Example structure:

# Cognee OpenAI-Compatible API Demo

This notebook demonstrates how to use Cognee's OpenAI-compatible API to perform various operations.

## Setup and Cognify Example

The following cell demonstrates how to initialize an OpenAI client pointed at a Cognee server and run a cognify operation.

[Code cell 1]

## Search Example

The following cell demonstrates how to perform a search operation using the same API interface.

[Code cell 2]

55-62: Extract client initialization to avoid code duplication.

The OpenAI client initialization is duplicated in both cells. Consider extracting this to a reusable cell at the beginning of the notebook.

You could create a new first cell that initializes the client, then use that client instance in subsequent cells:

import os
from openai import OpenAI

# Get API key from environment variable with a fallback for demo purposes
api_key = os.environ.get("COGNEE_API_KEY", "demo_api_key")

# Initialize client that will be reused throughout the notebook
client = OpenAI(api_key=api_key, base_url="http://localhost:8000/api/v1/")

Then in subsequent cells, you would just use the existing client variable instead of reinitializing it.

examples/database_examples/chromadb_example.py (2)

1-1: Remove unused import.

The os module is imported but never used in this example. Consider removing it for cleaner code.

-import os
 import pathlib
 import asyncio
 import cognee
🧰 Tools
🪛 Ruff (0.11.9)

1-1: os imported but unused

Remove unused import: os

(F401)


19-26: Add configuration flexibility for different environments.

The ChromaDB URL is hardcoded which limits configuration flexibility. Consider adding a comment explaining how to adapt this for different environments.

 # Configure ChromaDB as the vector database provider
 cognee.config.set_vector_db_config(
     {
-        "vector_db_url": "http://localhost:8000",  # Default ChromaDB server URL
+        "vector_db_url": "http://localhost:8000",  # Default local ChromaDB server URL
         "vector_db_key": "",  # ChromaDB doesn't require an API key by default
         "vector_db_provider": "chromadb",  # Specify ChromaDB as provider
     }
 )
+
+# Note: For production environments, you might want to:
+# 1. Load the URL from environment variables
+# 2. Use a different port or hostname
+# 3. Add authentication if using a managed ChromaDB instance
cognee/modules/pipelines/operations/run_tasks.py (2)

23-25: Update docstring to document the new context parameter.

You've added a new context parameter, but there's no docstring explaining its purpose, expected structure, or usage. This would be helpful for developers using this API.

 async def run_tasks_with_telemetry(
     tasks: list[Task], data, user: User, pipeline_name: str, context: dict = None
 ):
+    """
+    Run a list of tasks with telemetry tracking.
+    
+    Args:
+        tasks: List of Task objects to execute
+        data: The data to process
+        user: The user running the tasks
+        pipeline_name: Name of the pipeline for telemetry and logging
+        context: Optional dictionary containing contextual information to be passed to tasks
+               that support receiving context
+    
+    Yields:
+        Results from the executed tasks
+    """
     config = get_current_settings()

71-78: Update docstring to document the new context parameter.

Similar to the run_tasks_with_telemetry function, this function also needs a docstring update to include information about the new context parameter.

 async def run_tasks(
     tasks: list[Task],
     dataset_id: UUID = uuid4(),
     data: Any = None,
     user: User = None,
     pipeline_name: str = "unknown_pipeline",
     context: dict = None,
 ):
+    """
+    Run a list of tasks with pipeline run logging.
+    
+    Args:
+        tasks: List of Task objects to execute
+        dataset_id: UUID for the dataset being processed
+        data: The data to process
+        user: The user running the tasks (defaults to the system default user if None)
+        pipeline_name: Name of the pipeline for logging and identification
+        context: Optional dictionary containing contextual information to be passed to tasks
+               that support receiving context
+    
+    Yields:
+        Pipeline run status objects
+    """
     pipeline_id = uuid5(NAMESPACE_OID, pipeline_name)
🧰 Tools
🪛 Ruff (0.11.9)

73-73: Do not perform function call uuid4 in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

examples/database_examples/pgvector_example.py (1)

1-1: Unused import.

The 'os' module is imported but not used in this file. Unlike the Qdrant example which uses os.getenv(), this example uses hardcoded credentials.

-import os
import pathlib
import asyncio
import cognee
from cognee.modules.search.types import SearchType
🧰 Tools
🪛 Ruff (0.11.9)

1-1: os imported but unused

Remove unused import: os

(F401)

cognee/api/v1/responses/routers/get_responses_router.py (2)

36-42: Client can be reused instead of re-instantiated per request
openai.AsyncOpenAI creation is cheap but not free. Re-creating it on every request causes unnecessary overhead and socket exhaustion under load. Cache it at module level or store it in app.state.

-    def _get_model_client():
+    _client: Optional[openai.AsyncOpenAI] = None
+
+    def _get_model_client():
         """
         Get appropriate client based on model name
         """
-        llm_config = get_llm_config()
-        return openai.AsyncOpenAI(api_key=llm_config.llm_api_key)
+        nonlocal _client
+        if _client is None:
+            llm_config = get_llm_config()
+            _client = openai.AsyncOpenAI(api_key=llm_config.llm_api_key)
+        return _client

72-75: Depends(...) false-positive from Ruff B008 – suppress or refactor
FastAPI relies on Depends as a sentinel object, not a function call. Add # noqa: B008 or configure Ruff to ignore FastAPI-specific patterns to keep CI green.

🧰 Tools
🪛 Ruff (0.11.9)

74-74: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

examples/database_examples/weaviate_example.py (1)

42-45: Bulk pruning in examples – add a disclaimer
await cognee.prune.prune_data() and prune_system(metadata=True) irreversibly delete user data. Consider adding a loud comment or runtime prompt so newcomers don’t copy-paste this into production by accident.

cognee/tests/test_memgraph.py (1)

35-37: Relative path may break when tests are executed from project root
explanation_file_path is built via os.path.join(pathlib.Path(__file__).parent, "test_data/…"), but test_data is a directory, not a file. Use Path / .joinpath() and call .resolve() to ensure cross-platform correctness:

explanation_file_path = (
    pathlib.Path(__file__).parent
    / "test_data"
    / "Natural_language_processing.txt"
).resolve()
cognee/modules/pipelines/operations/run_tasks_base.py (1)

35-37: Simplify conditional context appending logic

The current approach of checking a condition and then appending is straightforward but could be more concise.

-    if has_context:
-        args.append(context)
+    args.extend([context] if has_context else [])

Alternatively, for even more clarity, you could use a guard clause approach:

if has_context and context is not None:
    args.append(context)
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

66-73: Use contextlib.suppress for cleaner error handling

The current try-except-pass pattern can be simplified using contextlib.suppress.

+from contextlib import suppress

# Later in the code:
-    try:
-        await memory_fragment.project_graph_from_db(
-            graph_engine,
-            node_properties_to_project=properties_to_project,
-            edge_properties_to_project=["relationship_name"],
-        )
-    except EntityNotFoundError:
-        pass
+    with suppress(EntityNotFoundError):
+        await memory_fragment.project_graph_from_db(
+            graph_engine,
+            node_properties_to_project=properties_to_project,
+            edge_properties_to_project=["relationship_name"],
+        )
🧰 Tools
🪛 Ruff (0.11.9)

66-73: Use contextlib.suppress(EntityNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(EntityNotFoundError)

(SIM105)

examples/data/car_and_tech_companies.txt (1)

17-17: Fix grammatical error in text description

There's a grammatical error in the text - the plural determiner "these" doesn't agree with the singular noun "manufacturer".

-Each of these car manufacturer contributes to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence.
+Each of these car manufacturers contributes to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence.
🧰 Tools
🪛 LanguageTool

[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...

(THIS_NNS)


[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)

Dockerfile (2)

8-8: Consider enabling bytecode compilation for production

The commented ENV for bytecode compilation could improve runtime performance. Consider enabling it for production builds.

-# ENV UV_COMPILE_BYTECODE=1
+ENV UV_COMPILE_BYTECODE=1

50-52: Clean up commented lines

These lines appear to be from the transition to the current build process and are no longer needed.

-# COPY --from=uv /app/.venv /app/.venv
-# COPY --from=uv /root/.local /root/.local
cognee/api/v1/responses/dispatch_function.py (1)

58-68: Consider using Enum validation directly

Instead of validating against string lists, consider using the SearchType enum's built-in validation capabilities.

-    valid_search_types = (
-        search_tool["parameters"]["properties"]["search_type"]["enum"]
-        if search_tool
-        else ["INSIGHTS", "CODE", "GRAPH_COMPLETION", "SEMANTIC", "NATURAL_LANGUAGE"]
-    )
-
-    if search_type_str not in valid_search_types:
-        logger.warning(f"Invalid search_type: {search_type_str}, defaulting to GRAPH_COMPLETION")
-        search_type_str = "GRAPH_COMPLETION"
-
-    query_type = SearchType[search_type_str]
+    try:
+        query_type = SearchType[search_type_str]
+    except (KeyError, ValueError):
+        logger.warning(f"Invalid search_type: {search_type_str}, defaulting to GRAPH_COMPLETION")
+        query_type = SearchType.GRAPH_COMPLETION
cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)

181-196: Ensure client is closed in all execution paths

While there's a finally block that closes the client, it appears there's also a separate client close in the happy path. This could be redundant or confusing.

 try:
     client = self.get_qdrant_client()
 
     results = await client.search(
         collection_name=collection_name,
         query_vector=models.NamedVector(
             name="text",
             vector=query_vector
             if query_vector is not None
             else (await self.embed_data([query_text]))[0],
         ),
         limit=limit if limit > 0 else None,
         with_vectors=with_vector,
     )
 
-    await client.close()
 
     return [
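
A minimal sketch of relying on the finally block alone (arguments copied from the snippet above; result mapping elided):

client = self.get_qdrant_client()
try:
    results = await client.search(
        collection_name=collection_name,
        query_vector=models.NamedVector(
            name="text",
            vector=query_vector
            if query_vector is not None
            else (await self.embed_data([query_text]))[0],
        ),
        limit=limit if limit > 0 else None,
        with_vectors=with_vector,
    )
    return [...]  # map results to the adapter's return type exactly as before
finally:
    await client.close()  # runs on success and on error alike, so no in-body close is needed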
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

241-252: I/O efficiency & with_vector flag are ignored

  1. You call closest_items.all() inside the loop (241) — this runs the query result materialisation on every iteration. Move it outside:
-        for vector in closest_items.all():
+        for vector in closest_items:

Since session.execute() already returns a Result, you can iterate directly.

  2. The with_vector parameter is accepted by the signature but never honoured. Either forward it in the select() (i.e. add PGVectorDataPoint.c.vector when with_vector is True) or drop the argument to avoid API confusion.
cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (2)

176-183: Early exit for limit<=0 is good – redundant None branch can be removed
Because you return early, the later ternary (limit if limit > 0 else None) is unreachable. Trim it to keep intent crystal-clear.


220-232: Consolidate duplicate “collection not found” handling
You handle missing collections in three separate blocks (187-192, 219-223, 226-230). This bloats the method and risks divergence. Refactor to:

  1. A single existence check at the top (if not has_collection: return [])
  2. A single except CollectionNotExistException handler.

This keeps the happy-path tight and the error path obvious.
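
A rough sketch of that shape (whether has_collection is awaited and the exception import path are assumptions about the adapter):

from pymilvus.exceptions import CollectionNotExistException  # import path assumed

async def search(self, collection_name: str, *args, **kwargs):
    if not await self.has_collection(collection_name):  # single existence check up front
        return []
    try:
        # happy path: embed the query, run the Milvus search, map the hits
        ...
    except CollectionNotExistException:
        # single handler for the race where the collection disappears after the check
        return []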

cognee/infrastructure/databases/graph/networkx/adapter.py (1)

218-230: Possible duplicate entries in neighbour aggregation

predecessors + successors may contain the same neighbour twice when a node has both in- and out-edges to the target. Consider de-duplicating:

return list({n["id"]: n for n in (predecessors + successors)}.values())
cognee-mcp/src/server.py (2)

162-166: Exception chaining improves debuggability

Static analysis (B904) flags re-raising bare exceptions. Add context while preserving the traceback:

-            except Exception as e:
-                logger.error("Cognify process failed.")
-                raise ValueError(f"Failed to cognify: {str(e)}")
+            except Exception as e:
+                logger.error("Cognify process failed.")
+                raise ValueError(f"Failed to cognify: {e}") from e
🧰 Tools
🪛 Ruff (0.11.9)

166-166: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


221-235: Background task error leaks are swallowed

codify_task logs failures but the outer create_task ignores the task handle, so unhandled exceptions will be dumped to the asyncio default handler and lost to the caller. Consider storing the task and adding:

task.add_done_callback(lambda t: logger.error(t.exception()) if t.exception() else None)

or gathering tasks inside a supervisor coroutine.
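
A small sketch of keeping the handle and surfacing failures (the task-set bookkeeping and the codify_task arguments are illustrative):

background_tasks: set = set()

def _report_failure(task):
    background_tasks.discard(task)
    if not task.cancelled() and task.exception() is not None:
        logger.error(f"codify background task failed: {task.exception()}")

task = asyncio.create_task(codify_task(repo_path))  # argument assumed
background_tasks.add(task)                          # strong reference so the task is not garbage-collected
task.add_done_callback(_report_failure)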

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2)

151-166: limit=0 repurposed as “return all” but negative / None are unchecked

Edge-cases:

  • Passing limit=None (allowed by batch_search) bubbles into the query unchanged.
  • Negative limits are not rejected.

Add validation:

if limit is None or limit <= 0:
    limit = await collection.count_rows()

205-211: Serial deletes are O(N) round-trips

Deleting one ID at a time will hammer the DB for large lists. LanceDB supports SQL-like predicates; you can delete in one shot:

-        for data_point_id in data_point_ids:
-            await collection.delete(f"id = '{data_point_id}'")
+        id_list = ", ".join(f"'{data_point_id}'" for data_point_id in data_point_ids)
+        await collection.delete(f"id IN ({id_list})")

Huge latency reduction and atomicity.

cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (2)

430-445: Mutable default argument violates best-practice

def serialize_properties(self, properties=dict()): creates a single shared dict across calls.

-def serialize_properties(self, properties=dict()):
+def serialize_properties(self, properties: dict | None = None):
     serialized_properties = {}
-    for property_key, property_value in properties.items():
+    for property_key, property_value in (properties or {}).items():

The implementation logic itself needs no further changes once the default is fixed.

🧰 Tools
🪛 Ruff (0.11.9)

430-430: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


594-604: Hard-coded labels ‘Node’ & ‘EDGE’ break on heterogeneous graphs

Metric queries assume every vertex has label Node and every relationship EDGE, which contradicts earlier dynamic labelling (type(node).__name__ / arbitrary relationship_name). This will lead to 0-row results and inaccurate metrics.

Consider:

  1. Removing label filters altogether.
  2. Dynamically building label strings via CALL db.labels() / CALL db.relationshipTypes() or using the helper methods you already wrote (get_node_labels_string, get_relationship_labels_string).
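
For option 1, the label-agnostic variants of the metric queries would look roughly like this (query text illustrative):

node_count_query = "MATCH (n) RETURN count(n) AS num_nodes"           # counts every vertex, not only :Node
edge_count_query = "MATCH ()-[r]->() RETURN count(r) AS num_edges"    # counts every relationship, not only :EDGE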
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)

128-133: Race-condition possibility – ensure collection actually exists

create_data_points calls create_collection and immediately calls get_collection.
If two coroutines create the same collection concurrently, the second call may still raise CollectionNotFoundError. Consider retrying once, or, since create_collection is already idempotent, waiting for collection creation to finish before proceeding.
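
One way to absorb the race with a single retry (assuming the adapter's get_collection/create_collection coroutines and the CollectionNotFoundError referenced elsewhere in this review):

try:
    collection = await self.get_collection(collection_name)
except CollectionNotFoundError:
    # another coroutine may still be finishing the create; create (idempotent) and retry once
    await self.create_collection(collection_name)
    collection = await self.get_collection(collection_name)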


317-321: Guard against mixed return types more robustly

list_collections handles two variants (object.name or dict["name"]). Unexpected types will raise AttributeError/KeyError.

-return [
-    collection.name if hasattr(collection, "name") else collection["name"]
-    for collection in collections
-]
+names = []
+for coll in collections:
+    try:
+        names.append(coll.name)
+    except AttributeError:
+        names.append(coll["name"])
+return names

Not critical, but prevents hidden errors when the Chroma client changes its return type again.

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)

37-47: Repeated await client.connect() incurs overhead

get_client calls await self.client.connect() on every invocation. The underlying SDK is idempotent but performs an HTTP round-trip.
Cache a self._connected flag or call connect() once in __init__ to avoid unnecessary latency.
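
A minimal sketch of the flag approach (attribute name illustrative):

async def get_client(self):
    if not getattr(self, "_connected", False):
        await self.client.connect()  # pay the connection round-trip only once
        self._connected = True
    return self.client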

cognee/api/v1/responses/routers/default_tools.py (1)

38-59: Cognify tool definition is complete but could use required parameters.

The cognify tool definition is properly structured with appropriate parameter types and descriptions. However, consider specifying which parameters, if any, should be required, similar to how "search_query" is required for the search tool.

            },
+           "required": [],
        },
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1b4ae3 and f8f7877.

⛔ Files ignored due to path filters (11)
  • .github/actions/cognee_setup/action.yml is excluded by !**/*.yml
  • .github/workflows/db_examples_tests.yml is excluded by !**/*.yml
  • .github/workflows/python_version_tests.yml is excluded by !**/*.yml
  • .github/workflows/test_memgraph.yml is excluded by !**/*.yml
  • .github/workflows/test_suites.yml is excluded by !**/*.yml
  • assets/graph_visualization.png is excluded by !**/*.png, !**/*.png
  • cognee-mcp/pyproject.toml is excluded by !**/*.toml
  • cognee-mcp/uv.lock is excluded by !**/*.lock, !**/*.lock
  • poetry.lock is excluded by !**/*.lock, !**/*.lock
  • pyproject.toml is excluded by !**/*.toml
  • uv.lock is excluded by !**/*.lock, !**/*.lock
📒 Files selected for processing (104)
  • CONTRIBUTING.md (1 hunks)
  • Dockerfile (1 hunks)
  • README.md (1 hunks)
  • alembic/versions/1d0bb7fede17_add_pipeline_run_status.py (1 hunks)
  • alembic/versions/482cd6517ce4_add_default_user.py (1 hunks)
  • assets/graph_visualization.html (0 hunks)
  • cognee-frontend/src/app/page.tsx (3 hunks)
  • cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx (1 hunks)
  • cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1 hunks)
  • cognee-frontend/src/app/wizard/WizardPage.tsx (1 hunks)
  • cognee-frontend/src/modules/datasets/cognifyDataset.ts (1 hunks)
  • cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1 hunks)
  • cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (4 hunks)
  • cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1 hunks)
  • cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2 hunks)
  • cognee-frontend/src/utils/fetch.ts (1 hunks)
  • cognee-mcp/src/server.py (3 hunks)
  • cognee/api/client.py (2 hunks)
  • cognee/api/v1/add/add.py (1 hunks)
  • cognee/api/v1/cognify/code_graph_pipeline.py (3 hunks)
  • cognee/api/v1/cognify/cognify.py (1 hunks)
  • cognee/api/v1/config/config.py (1 hunks)
  • cognee/api/v1/datasets/datasets.py (1 hunks)
  • cognee/api/v1/responses/__init__.py (1 hunks)
  • cognee/api/v1/responses/default_tools.py (1 hunks)
  • cognee/api/v1/responses/dispatch_function.py (1 hunks)
  • cognee/api/v1/responses/models.py (1 hunks)
  • cognee/api/v1/responses/routers/__init__.py (1 hunks)
  • cognee/api/v1/responses/routers/default_tools.py (1 hunks)
  • cognee/api/v1/responses/routers/get_responses_router.py (1 hunks)
  • cognee/base_config.py (1 hunks)
  • cognee/exceptions/exceptions.py (1 hunks)
  • cognee/infrastructure/databases/graph/get_graph_engine.py (1 hunks)
  • cognee/infrastructure/databases/graph/graph_db_interface.py (2 hunks)
  • cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1 hunks)
  • cognee/infrastructure/databases/graph/networkx/adapter.py (7 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2 hunks)
  • cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (8 hunks)
  • cognee/infrastructure/databases/vector/exceptions/exceptions.py (1 hunks)
  • cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (8 hunks)
  • cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (9 hunks)
  • cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (5 hunks)
  • cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (6 hunks)
  • cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (9 hunks)
  • cognee/infrastructure/llm/anthropic/adapter.py (1 hunks)
  • cognee/infrastructure/llm/gemini/adapter.py (1 hunks)
  • cognee/infrastructure/llm/openai/adapter.py (2 hunks)
  • cognee/modules/data/methods/__init__.py (1 hunks)
  • cognee/modules/data/methods/create_dataset.py (2 hunks)
  • cognee/modules/data/methods/get_unique_dataset_id.py (1 hunks)
  • cognee/modules/engine/models/ColumnValue.py (1 hunks)
  • cognee/modules/engine/models/__init__.py (1 hunks)
  • cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
  • cognee/modules/observability/get_observe.py (1 hunks)
  • cognee/modules/observability/observers.py (1 hunks)
  • cognee/modules/pipelines/models/PipelineRun.py (1 hunks)
  • cognee/modules/pipelines/operations/__init__.py (1 hunks)
  • cognee/modules/pipelines/operations/get_pipeline_status.py (2 hunks)
  • cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1 hunks)
  • cognee/modules/pipelines/operations/pipeline.py (3 hunks)
  • cognee/modules/pipelines/operations/run_tasks.py (4 hunks)
  • cognee/modules/pipelines/operations/run_tasks_base.py (4 hunks)
  • cognee/modules/retrieval/exceptions/__init__.py (1 hunks)
  • cognee/modules/retrieval/exceptions/exceptions.py (0 hunks)
  • cognee/modules/retrieval/graph_completion_retriever.py (1 hunks)
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py (4 hunks)
  • cognee/modules/settings/get_settings.py (2 hunks)
  • cognee/modules/visualization/cognee_network_visualization.py (1 hunks)
  • cognee/shared/data_models.py (0 hunks)
  • cognee/shared/logging_utils.py (1 hunks)
  • cognee/tasks/ingestion/ingest_data.py (1 hunks)
  • cognee/tasks/ingestion/migrate_relational_database.py (2 hunks)
  • cognee/tasks/temporal_awareness/index_graphiti_objects.py (2 hunks)
  • cognee/tests/integration/run_toy_tasks/conftest.py (0 hunks)
  • cognee/tests/test_memgraph.py (1 hunks)
  • cognee/tests/test_neo4j.py (1 hunks)
  • cognee/tests/test_relational_db_migration.py (3 hunks)
  • cognee/tests/test_weaviate.py (1 hunks)
  • cognee/tests/unit/modules/pipelines/run_tasks_test.py (1 hunks)
  • cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1 hunks)
  • cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (4 hunks)
  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1 hunks)
  • cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1 hunks)
  • cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py (0 hunks)
  • entrypoint.sh (3 hunks)
  • examples/data/car_and_tech_companies.txt (1 hunks)
  • examples/database_examples/chromadb_example.py (1 hunks)
  • examples/database_examples/falkordb_example.py (1 hunks)
  • examples/database_examples/kuzu_example.py (1 hunks)
  • examples/database_examples/milvus_example.py (1 hunks)
  • examples/database_examples/neo4j_example.py (1 hunks)
  • examples/database_examples/pgvector_example.py (1 hunks)
  • examples/database_examples/qdrant_example.py (1 hunks)
  • examples/database_examples/weaviate_example.py (1 hunks)
  • examples/python/graphiti_example.py (2 hunks)
  • notebooks/cognee_demo.ipynb (3 hunks)
  • notebooks/cognee_graphiti_demo.ipynb (4 hunks)
  • notebooks/cognee_llama_index.ipynb (2 hunks)
  • notebooks/cognee_openai_compatable_demo.ipynb (1 hunks)
  • notebooks/cognee_simple_demo.ipynb (7 hunks)
  • notebooks/github_graph_visualization.html (0 hunks)
  • notebooks/graphrag_vs_rag.ipynb (7 hunks)
  • notebooks/hr_demo.ipynb (0 hunks)
  • notebooks/llama_index_cognee_integration.ipynb (5 hunks)
💤 Files with no reviewable changes (7)
  • cognee/modules/retrieval/exceptions/exceptions.py
  • cognee/shared/data_models.py
  • cognee/tests/integration/run_toy_tasks/conftest.py
  • assets/graph_visualization.html
  • notebooks/github_graph_visualization.html
  • cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py
  • notebooks/hr_demo.ipynb
🧰 Additional context used
🧬 Code Graph Analysis (30)
cognee/tests/test_weaviate.py (1)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
  • get_vector_engine (5-6)
cognee/api/v1/responses/__init__.py (1)
cognee/api/v1/responses/routers/get_responses_router.py (1)
  • get_responses_router (25-149)
cognee/modules/data/methods/__init__.py (1)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
  • get_unique_dataset_id (5-6)
cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)
cognee/modules/retrieval/summaries_retriever.py (1)
  • SummariesRetriever (9-33)
cognee/modules/engine/models/__init__.py (1)
cognee/modules/engine/models/ColumnValue.py (1)
  • ColumnValue (4-9)
cognee/modules/pipelines/operations/__init__.py (1)
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)
  • log_pipeline_run_initiated (6-22)
cognee/infrastructure/llm/gemini/adapter.py (4)
cognee/shared/logging_utils.py (1)
  • get_logger (137-158)
cognee/modules/observability/get_observe.py (1)
  • get_observe (5-11)
cognee/exceptions/exceptions.py (1)
  • InvalidValueError (47-54)
cognee/infrastructure/llm/rate_limiter.py (2)
  • rate_limit_async (220-243)
  • sleep_and_retry_async (331-376)
alembic/versions/482cd6517ce4_add_default_user.py (1)
cognee/modules/users/methods/create_default_user.py (1)
  • create_default_user (5-19)
cognee/api/v1/datasets/datasets.py (1)
cognee/modules/pipelines/operations/get_pipeline_status.py (1)
  • get_pipeline_status (8-35)
cognee/api/v1/responses/routers/__init__.py (1)
cognee/api/v1/responses/routers/get_responses_router.py (1)
  • get_responses_router (25-149)
cognee/api/client.py (1)
cognee/api/v1/responses/routers/get_responses_router.py (1)
  • get_responses_router (25-149)
cognee/infrastructure/llm/openai/adapter.py (1)
cognee/modules/observability/get_observe.py (1)
  • get_observe (5-11)
cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1)
cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)
  • Explorer (15-61)
cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1)
  • test_run_tasks (42-43)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
cognee/modules/users/models/User.py (1)
  • User (12-39)
cognee/modules/engine/models/ColumnValue.py (1)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (16-96)
cognee/base_config.py (1)
cognee/modules/observability/observers.py (1)
  • Observer (4-9)
cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)
cognee-frontend/src/utils/fetch.ts (1)
  • fetch (3-12)
cognee/modules/pipelines/operations/get_pipeline_status.py (1)
cognee/modules/pipelines/models/PipelineRun.py (1)
  • PipelineRun (15-27)
cognee/modules/observability/get_observe.py (3)
cognee/base_config.py (1)
  • get_base_config (29-30)
cognee/modules/observability/observers.py (1)
  • Observer (4-9)
cognee/api/v1/config/config.py (1)
  • monitoring_tool (37-39)
cognee/tasks/ingestion/ingest_data.py (1)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (11-33)
cognee/tests/test_neo4j.py (1)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (12-37)
examples/database_examples/qdrant_example.py (2)
cognee/modules/search/types/SearchType.py (1)
  • SearchType (4-13)
cognee/api/v1/config/config.py (4)
  • config (15-194)
  • set_vector_db_config (161-172)
  • data_root_directory (32-34)
  • system_root_directory (17-29)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)
  • MemgraphAdapter (20-690)
cognee/modules/data/methods/create_dataset.py (2)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
  • get_unique_dataset_id (5-6)
cognee/modules/users/models/User.py (1)
  • User (12-39)
cognee/tasks/ingestion/migrate_relational_database.py (3)
cognee/modules/engine/models/TableRow.py (1)
  • TableRow (6-12)
cognee/modules/engine/models/TableType.py (1)
  • TableType (4-8)
cognee/modules/engine/models/ColumnValue.py (1)
  • ColumnValue (4-9)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
  • CollectionNotFoundError (5-14)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
  • get_async_session (34-40)
cognee/exceptions/exceptions.py (1)
cognee/shared/logging_utils.py (4)
  • error (127-128)
  • warning (124-125)
  • info (121-122)
  • debug (133-134)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (5)
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (4)
  • get_collection (75-80)
  • has_collection (51-53)
  • create_data_points (82-132)
  • delete_data_points (218-226)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (5)
  • get_collection (121-126)
  • has_collection (111-113)
  • get_connection (99-106)
  • create_data_points (128-144)
  • delete_data_points (300-304)
cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (3)
  • has_collection (74-78)
  • create_data_points (99-130)
  • delete_data_points (259-262)
cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
  • CollectionNotFoundError (5-14)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (16-96)
cognee/infrastructure/databases/graph/networkx/adapter.py (4)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (7)
  • has_node (66-75)
  • get_edges (264-275)
  • extract_node (121-124)
  • extract_nodes (126-136)
  • get_neighbors (381-383)
  • get_node (385-392)
  • get_nodes (394-402)
cognee/infrastructure/databases/graph/kuzu/adapter.py (7)
  • has_node (167-171)
  • get_edges (439-475)
  • extract_node (284-304)
  • extract_nodes (306-325)
  • get_neighbors (479-481)
  • get_node (483-502)
  • get_nodes (504-521)
cognee/infrastructure/databases/graph/graph_db_interface.py (4)
  • get_edges (177-179)
  • get_neighbors (182-184)
  • get_node (125-127)
  • get_nodes (130-132)
cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (2)
  • extract_node (235-238)
  • extract_nodes (240-241)
🪛 Ruff (0.11.9)
cognee/modules/data/methods/__init__.py

10-10: .get_unique_dataset_id.get_unique_dataset_id imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/modules/retrieval/exceptions/__init__.py

7-7: .exceptions.SearchTypeNotSupported imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


7-7: .exceptions.CypherSearchError imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/modules/engine/models/__init__.py

6-6: .ColumnValue.ColumnValue imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

cognee/modules/pipelines/operations/__init__.py

1-1: .log_pipeline_run_initiated.log_pipeline_run_initiated imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

alembic/versions/482cd6517ce4_add_default_user.py

24-27: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)

alembic/versions/1d0bb7fede17_add_pipeline_run_status.py

13-13: cognee.modules.pipelines.models.PipelineRun.PipelineRun imported but unused

Remove unused import

(F401)


13-13: cognee.modules.pipelines.models.PipelineRun.PipelineRunStatus imported but unused

Remove unused import

(F401)

cognee/api/v1/responses/routers/get_responses_router.py

74-74: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

examples/database_examples/chromadb_example.py

1-1: os imported but unused

Remove unused import: os

(F401)

cognee/modules/pipelines/operations/run_tasks_base.py

32-32: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

examples/database_examples/kuzu_example.py

1-1: os imported but unused

Remove unused import: os

(F401)

examples/database_examples/falkordb_example.py

1-1: os imported but unused

Remove unused import: os

(F401)

cognee/modules/retrieval/utils/brute_force_triplet_search.py

66-73: Use contextlib.suppress(EntityNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(EntityNotFoundError)

(SIM105)

examples/database_examples/pgvector_example.py

1-1: os imported but unused

Remove unused import: os

(F401)

cognee-mcp/src/server.py

166-166: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py

430-430: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🪛 LanguageTool
examples/data/car_and_tech_companies.txt

[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...

(ENGLISH_WORD_REPEAT_RULE)


[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...

(A_VARIETY_OF)


[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...

(THIS_NNS)


[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)


[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...

(ENGLISH_WORD_REPEAT_RULE)


[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...

(IN_RECENT_STYLE)


[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...

(ENGLISH_WORD_REPEAT_RULE)

🪛 Biome (1.9.4)
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx

[error] 36-36: Do not shadow the global "DataView" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🔇 Additional comments (154)
CONTRIBUTING.md (1)

100-100: Clean and concise wording improvement.

The removal of "make sure to" makes the instruction more direct and concise without changing the meaning, improving readability.

README.md (1)

133-139: Great addition of cognee UI documentation.

The new section introduces users to the cognee UI feature with a clear description and visual representation, enhancing the user's understanding of available interfaces.

cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (2)

19-19: Improved test isolation with specific directory paths.

Replacing generic test directory names with more specific ones (e.g., test_rag_context → test_chunks_context_simple) creates better isolation between test environments and makes the test purpose clearer.

Also applies to: 23-23, 76-76, 80-80, 165-165, 169-169


193-198: Enhanced async test execution.

Consolidating all three test method calls within a single async main() function improves the structure and efficiency of the test execution by running all tests in a single event loop.

cognee/exceptions/exceptions.py (2)

15-16: Expanding exception flexibility with logging control

The addition of log and log_level parameters to the CogneeApiError constructor enables more flexible logging behavior, which is a valuable improvement.


23-30: Well-implemented conditional logging

Good implementation of conditional logging with support for different log levels. This enhancement allows for more granular control over exception logging.

cognee/modules/visualization/cognee_network_visualization.py (1)

24-24: Color mapping for new "ColumnValue" node type

Good addition of color mapping for the "ColumnValue" node type, which maintains consistent visual representation in the network visualization.

cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)

53-54: Added script execution capability

Adding the if __name__ == "__main__" guard is a good practice that enables direct execution of the test script, improving developer workflow.

cognee/modules/pipelines/models/PipelineRun.py (1)

9-9: Well-integrated pipeline status addition.

The new status DATASET_PROCESSING_INITIATED is a logical addition that enhances the pipeline execution workflow by adding a separate initiation step before processing starts. This follows the established naming convention and integrates well with the existing statuses.

cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)

130-130: Updated parameter name from limit to top_k

Good update to align with the parameter naming changes in the SummariesRetriever implementation. This ensures the test uses the correct parameter name which has been standardized across retrieval components.

cognee/modules/settings/get_settings.py (1)

2-2: Made LLM endpoint and API version optional

Good change to make endpoint and api_version fields optional in the LLMConfig model. This provides more flexibility for different LLM provider configurations where these fields might not be required.

Also applies to: 24-25

cognee/tests/test_weaviate.py (1)

85-85: LGTM: Correctly updated to await the now-async list_all method

This change properly awaits the list_all() method, aligning with the refactoring of the Weaviate adapter to be fully asynchronous.

examples/python/graphiti_example.py (1)

17-17: LGTM: Properly integrated user context into the task pipeline

The changes correctly implement user context propagation through the task execution pipeline, which aligns with the broader modifications in the codebase.

Also applies to: 31-33, 41-41

cognee/api/v1/responses/routers/__init__.py (1)

1-3: Good implementation of module exports

This file follows Python best practices by explicitly defining the public API through the __all__ list, making the module's interface clear and maintainable.

cognee/infrastructure/databases/graph/graph_db_interface.py (2)

81-81: Ensure consistency in error logging approaches

This is another instance of changing log level from error to debug. While the code handles the exception correctly, verify this change is consistent with overall error handling strategy across the codebase.


61-61:


❓ Verification inconclusive

Consider the visibility impact of changing log level from error to debug

Changing this log level from error to debug will significantly reduce visibility of relationship addition issues. While the code properly handles the exception with rollback, ensure these errors are truly expected in normal operation and not indicative of actual problems that should be monitored.


🏁 Script executed:

#!/bin/bash
# Check how often these errors occur in logs if available
rg "Error adding relationship:" -A 1 --glob "*.log" 

# Also check if there are error handling patterns in other database adapters
rg "Error adding relationship:|Error committing session:" -A 2 --glob "*.py"

Length of output: 1221


Re-assess logging level for database errors

These exception handlers for both relationship additions and session commits are now logged at DEBUG, which may hide genuine issues from normal monitoring. Please confirm whether these failures are expected in routine operation—if not, consider raising the level to WARN or ERROR to ensure visibility.

Locations to review:

  • cognee/infrastructure/databases/graph/graph_db_interface.py:
    • Line ~61 – logger.debug(f"Error adding relationship: {e}")
    • Line ~?? – logger.debug(f"Error committing session: {e}")
cognee/api/v1/responses/__init__.py (1)

1-3: Good module structure for package-level exports

This file properly establishes a clean public API for the responses package by re-exporting the get_responses_router function. This follows Python best practices for package organization.

cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)

10-10:


✅ Verification successful

Verify that all callsites have been updated for the dataset prop change

The dataset prop type has been changed from { id: string } to { name: string }. While this component passes the dataset object directly to getExplorationGraphUrl() without accessing specific properties, verify that all components using Explorer have been updated to provide a dataset with a name property instead of an id property.


🏁 Script executed:

#!/bin/bash
# Search for usages of the Explorer component to check if they've all been updated
rg -A 5 -B 5 "<Explorer\s+dataset=" --type tsx

Length of output: 76


Let's retry searching for <Explorer> usages without the unsupported --type tsx flag:


🏁 Script executed:

#!/bin/bash
# Search all TSX files for <Explorer dataset= ... />
rg -g '*.tsx' -A5 -B5 '<Explorer[^>]*dataset='

Length of output: 1736


To confirm that both callsites now supply a dataset object with a name field (and no longer rely on id), let’s inspect their prop definitions and usages:


🏁 Script executed:

#!/bin/bash
echo "=== ExploreStep.tsx ==="
sed -n '1,200p' cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx

echo
echo "=== DatasetsView.tsx ==="
sed -n '1,200p' cognee-frontend/src/modules/ingestion/DatasetsView/DatasetsView.tsx

Length of output: 4213


All <Explorer> callsites now use a dataset with a name field

Both instances of <Explorer> have been updated to supply an object containing name rather than id, satisfying the new prop type:

  • ExploreStep.tsx

    • Defines ExploreStepProps as dataset: { name: string }
    • Passes dataset directly to <Explorer>
  • DatasetsView.tsx

    • Holds a state of type { id: string; name: string }
    • Passes dataset! to <Explorer> (extra id property is structurally compatible)

No further updates are required.

cognee/tests/test_neo4j.py (2)

77-85: Good troubleshooting approach for unstable test.

Commenting out the natural language search test with a clear explanation improves test stability for environments with weaker LLM models. This is a pragmatic solution to prevent CI failures.

For the longer term, consider implementing a conditional test that can detect LLM capabilities and skip this test automatically when using weaker models, rather than commenting it out entirely.


90-90: Correctly updated assertion to match removed test.

The assertion has been properly updated to expect 6 history entries instead of 8, accounting for the removed natural language search test. This ensures the test remains accurate.

cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1)

3-3:


✅ Verification successful

Parameter type change aligns with codebase refactoring.

The function parameter has been updated from { id: string } to { name: string }, which aligns with the broader change across the frontend to use dataset names instead of IDs for identification. This is a consistent pattern being applied throughout the codebase.

Let's verify that all components calling this function have been updated to pass the correct parameter:


🏁 Script executed:

#!/bin/bash
# Find all occurrences of getExplorationGraphUrl being called with dataset parameters
rg "getExplorationGraphUrl\(" -A 3 -B 3

Length of output: 1265


🏁 Script executed:

#!/bin/bash
rg "dataset" -n cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx

Length of output: 244


Verified dataset parameter type updated consistently

I’ve confirmed that getExplorationGraphUrl is defined to accept dataset: { name: string } in
cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts and that its sole call site in
cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx is passing the same dataset shape. No further updates are needed.

cognee/infrastructure/llm/gemini/adapter.py (2)

2-7: Import reorganization follows best practices.

The imports have been properly organized, separating standard library imports, third-party packages, and internal modules with blank lines. Adding explicit imports for BaseModel and type annotations improves code readability.


17-17: Centralized observability implementation improves maintainability.

Replacing conditional import logic with a centralized get_observe() function follows the DRY principle and improves code maintainability. This change is part of a broader effort to standardize observability across the codebase.

cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1)

157-162: Improved async test execution with consolidated event loop.

Consolidating all test executions into a single async main() function is a best practice for async code. This approach:

  1. Avoids creating multiple event loops
  2. Improves test execution efficiency
  3. Provides a clear sequential execution flow
  4. Aligns with patterns in other test modules
cognee/api/v1/cognify/cognify.py (1)

37-39: Explicit pipeline naming enhances observability.

Adding the pipeline_name="cognify_pipeline" parameter improves pipeline tracking and management without changing the external behavior of the function. This change supports enhanced pipeline status tracking and logging introduced in this PR.

cognee/api/client.py (2)

17-17: Good addition of OpenAI-compatible responses router import

The import statement for get_responses_router from the responses module follows the established pattern of importing routers in this file.


171-172: Well-integrated OpenAI-compatible responses API

The responses router is properly mounted at /api/v1/responses with appropriate tags, following the established pattern of router registration in this application. This addition enables OpenAI-compatible response generation with function call support.

cognee/infrastructure/llm/anthropic/adapter.py (1)

21-21: Good migration to async Anthropic client

Switching from anthropic.Anthropic() to anthropic.AsyncAnthropic() aligns with the existing async methods in this class and the broader effort to adopt asynchronous patterns for external service clients throughout the codebase. This change will improve performance by enabling better concurrency.

cognee/tasks/ingestion/ingest_data.py (1)

107-107: Correct update to match create_dataset signature

This change correctly updates the call to pass the full User object instead of just user.id, aligning with the updated signature of the create_dataset function as shown in the relevant code snippets.

cognee/api/v1/datasets/datasets.py (1)

31-31: Good improvement in specifying the pipeline name explicitly

Adding the pipeline name parameter ensures that status retrieval is scoped to the "cognify_pipeline" specifically, aligning with the updated get_pipeline_status function that now filters pipeline runs by name. This change improves the precision of pipeline status tracking.

cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx (1)

10-10:


❌ Incorrect review comment

Property type change from id to name

The dataset prop type has been changed from { id: string } to { name: string }. This appears to be part of a coordinated update across the frontend codebase to standardize dataset identification by name rather than ID.

Verify that the cognifyDataset function and all related components have been updated to handle this change consistently:


🏁 Script executed:

#!/bin/bash
# Check if cognifyDataset function has been updated to work with 'name' property

echo "Checking cognifyDataset implementation:"
rg -A 10 "function cognifyDataset" --type ts --type tsx src/modules/datasets/

Length of output: 196


It looks like ripgrep doesn’t recognize the built-in “tsx” type, so let’s re-run the search using file globs:


🏁 Script executed:

#!/bin/bash
# Re-check cognifyDataset implementation in both .ts and .tsx files
echo "Searching for cognifyDataset definition:"
rg -A 10 "function cognifyDataset" -g '*.ts' -g '*.tsx' src/modules/datasets/

Length of output: 232


It looks like the path was off—let’s search the entire repo for where cognifyDataset is defined and used:

#!/bin/bash
echo "Locating function definition of cognifyDataset:"
rg -n "function cognifyDataset" -g '*.ts' -g '*.tsx'

echo
echo "Locating all calls to cognifyDataset:"
rg -n "cognifyDataset\(" -g '*.ts' -g '*.tsx'

Let’s do a repo-wide search for every occurrence of cognifyDataset, both its definition and usage:

#!/bin/bash
echo "Searching for cognifyDataset occurrences in .ts/.tsx files:"
rg -n "cognifyDataset" -g '*.ts' -g '*.tsx' .

Let’s search the entire repo for any references to cognifyDataset and confirm all instances of dataset.id were replaced with dataset.name:

#!/bin/bash
echo "1) All occurrences of cognifyDataset:"
rg -n "cognifyDataset" .

echo
echo "2) Any remaining uses of dataset.id:"
rg -n "dataset\s*\.\s*id" -g '*.ts' -g '*.tsx' .

echo
echo "3) Confirm uses of dataset.name:"
rg -n "dataset\s*\.\s*name" -g '*.ts' -g '*.tsx' .

Ignore cognifyDataset verification
I was unable to locate any definition or usage of cognifyDataset in the repository. It looks like this function doesn’t exist, so the original request to verify its update can be disregarded.

Likely an incorrect or invalid review comment.

cognee/modules/data/methods/get_unique_dataset_id.py (1)

1-6: LGTM - Deterministic ID generation approach

The implementation correctly uses UUID v5 with NAMESPACE_OID to generate a deterministic, unique dataset ID based on the combination of dataset name and user ID. This ensures consistent IDs for the same dataset-user combination.
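
For illustration, the deterministic derivation boils down to something like this (the exact string composition inside get_unique_dataset_id is assumed):

from uuid import NAMESPACE_OID, uuid5

dataset_name = "my_dataset"      # illustrative inputs
user_id = "some-user-uuid"       # would normally come from user.id

dataset_id = uuid5(NAMESPACE_OID, f"{dataset_name}{user_id}")
# the same (dataset_name, user_id) pair always yields the same UUID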

cognee/base_config.py (1)

5-6: LGTM - Updated import for Observer enum

The import was correctly updated to use the new Observer enum from the observability module, replacing the deprecated MonitoringTool import.

cognee/modules/graph/cognee_graph/CogneeGraph.py (1)

131-135: LGTM - Updated to use standardized search interface

The code correctly updates to use the new search method with explicit keyword arguments, replacing the deprecated get_distance_from_collection_elements method. This aligns with the broader refactoring of vector database adapters mentioned in the summary.

cognee/modules/observability/observers.py (1)

4-9: LGTM - Well-structured Observer enum

The Observer enum is well-defined as a string enum with appropriate values for different monitoring tools. This approach provides type safety while maintaining string compatibility, which is excellent for configuration values.

cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)

3-3: Function interface improved for flexibility

The update to accept a dataset object with optional properties instead of separate parameters is a good improvement. It allows more flexibility in how datasets are identified.

Also applies to: 10-10

cognee/modules/engine/models/ColumnValue.py (2)

4-9: New ColumnValue class implementation looks good

The ColumnValue class correctly inherits from DataPoint and defines the necessary attributes for representing column data in the graph.


9-9: Indexing only on properties seems appropriate

The metadata configuration indexes only the "properties" field, which aligns with the likely use case of searching column values in the graph.

cognee/infrastructure/llm/openai/adapter.py (1)

5-5: Import refactoring improves code organization

The changes simplify imports and centralize how the observe decorator is obtained, which is a good practice for maintainability.

The refactoring aligns with similar changes in other LLM adapters, promoting consistency across the codebase.

Also applies to: 18-20

cognee/tests/test_relational_db_migration.py (2)

115-118: Test assertions updated to match enhanced graph modeling

The increased expected counts for distinct nodes and edges correctly reflect the architectural changes where individual column data is now migrated as separate ColumnValue nodes.


161-162: Database-specific node and edge counts properly updated

The significantly higher node and edge counts for both SQLite and PostgreSQL providers accurately reflect the expanded graph representation resulting from the migration enhancements.

Also applies to: 192-193

cognee-frontend/src/app/page.tsx (2)

46-46: Updated notification message to improve user flow

The notification message now prompts users to run "Cognify" after data is successfully added, providing clearer guidance on the next step in the workflow.


106-106: Added cognify capability to DataView component

This change properly passes the new onCognify callback to the DataView component, enabling users to trigger cognification directly from the data view.

cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)

72-72: Fixed apostrophe in comment

Simple comment correction from curly apostrophe to straight apostrophe for consistency.


333-334: Restricted schema list for table deletion

Changed from dynamically fetching all schemas to using a fixed list ["public", "public_staging"]. This is a safer approach as it restricts table deletion to only specific schemas, preventing accidental deletion of tables in other schemas.

cognee/modules/pipelines/operations/get_pipeline_status.py (2)

8-8: Added pipeline_name parameter to function signature

The function now accepts a pipeline name parameter, allowing for more specific pipeline status queries. This change aligns with the broader enhancement of pipeline management in the codebase.


23-23: Added filter by pipeline name

This change implements the filtering by pipeline name, restricting results to runs of the specified pipeline. This is consistent with the PipelineRun model which has a pipeline_name column.
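
Conceptually the change adds one more predicate to the existing query, roughly (variable names assumed):

statement = statement.where(PipelineRun.pipeline_name == pipeline_name)  # scope results to the named pipeline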

notebooks/cognee_llama_index.ipynb (2)

15-17: Standardized notebook cell format

Updated the code cell format to use a list of strings for the "source" field, which is the standard Jupyter notebook format. This change improves compatibility with notebook tooling.


123-124: Reordered cell metadata and outputs

Standardized the notebook JSON structure by reordering metadata and outputs fields. These formatting changes don't affect code execution or logic but ensure consistent notebook structure.

notebooks/cognee_demo.ipynb (3)

470-470: Import correction for Task class

The import statement has been corrected to use the lowercase module name, following Python's conventional naming patterns where module names are typically lowercase.


508-508: Function signature update with user parameter

The call to run_tasks now correctly includes the user parameter, aligning with changes in the task execution pipeline that now require user context for operations. This ensures proper authorization and user-specific data handling.


532-536: Improved user retrieval logic

The user retrieval logic has been refined to follow a two-step process: first fetch the default user, then retrieve the complete user object by ID. This approach ensures the full user object with all necessary attributes is available for dataset operations.

cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)

81-82: Improved UX with immediate input clearing

The change to clear the input field immediately after submission provides better feedback to the user, rather than waiting for the response to come back.

entrypoint.sh (5)

16-16: Migration command updated to use direct alembic invocation

Changing from Poetry-based alembic invocation to direct command is consistent with the Dockerfile refactoring mentioned in the PR summary.


25-25: Added helpful confirmation message

The added confirmation message improves observability of the container startup process.


27-27: More generic startup message

Changed from the specific "Starting Gunicorn" to the more generic "Starting server..." which is appropriate for the entrypoint script.


36-37: Simplified debugpy invocation

The debugpy invocation has been simplified by removing the python -m prefix. This works because debugpy is now directly available in the path.


38-42: Removed exec from server commands

Removing the exec command allows the shell script to continue running after the server starts, which might be necessary for additional steps after server startup.

Note: This change means the container will not receive termination signals directly. Ensure that there's a proper signal handling mechanism or container health checks to manage graceful shutdown.

cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)

1-4: Imports look appropriate and complete.

These imports correctly provide the necessary components for the function: UUID generation, database engine access, and pipeline run models.

cognee/api/v1/cognify/code_graph_pipeline.py (4)

5-6: Import reorganization improves clarity.

The imports were reorganized with the addition of get_observe from the observability module. This follows the project's import organization pattern.


14-16: Updated imports align with code functionality.

The import changes reflect removing unused imports and adding the necessary get_unique_dataset_id function, which aligns with the changes in dataset ID generation logic later in the file.


25-25: Simplified observability setup.

The conditional import of observe from langfuse.decorators was replaced by a direct assignment using the centralized get_observe() function. This simplifies the code and ensures consistent observability behavior across the codebase.


68-68: Improved dataset ID generation with user context.

The synchronous generation of a fixed UUID is replaced with an asynchronous call to get_unique_dataset_id that properly incorporates user context. This makes dataset IDs more user-specific and maintainable.

cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)

9-14: Enhanced exception class with improved configuration options.

The changes to CollectionNotFoundError are beneficial:

  1. Fixed the default name parameter from "DatabaseNotCreatedError" to the correct "CollectionNotFoundError"
  2. Added configurable logging parameters (log and log_level) for better error handling flexibility
  3. These changes align with good practices for error reporting and logging

The improvements support standardized error handling across vector database adapters, allowing for more fine-grained control of error visibility.

notebooks/cognee_graphiti_demo.ipynb (8)

18-18: Fixed typo in markdown text.

Corrected "libaries" to "libraries" in the markdown cell, improving documentation quality.


27-29: Added essential async support.

Added import asyncio which is necessary for executing the async functions in the notebook. This is a good addition as the notebook contains async code.


42-43: Important imports for user handling.

Added imports for get_llm_client and get_default_user, which align with the updated user-aware workflow implemented later in the notebook.


131-131: Simplified pruning code.

Simplified the prune_data call by removing an inline comment. This improves code readability.


135-137: Added user context initialization.

Added code to initialize the default user, which is now required for the pipeline execution. This ensures proper user context throughout the pipeline process.


143-143: Minor formatting adjustment.

Small formatting change with no functional impact.


145-145: Updated pipeline execution with user context.

Modified run_tasks to include the user parameter, aligning with updates to the pipeline execution system that now requires user context for proper operation.


148-148: Simplified result printing.

Simplified the pipeline result printing by removing string formatting and directly printing the result object. This is cleaner and will show more information about the pipeline status.

cognee/modules/data/methods/create_dataset.py (3)

7-11: Cleaner dependency organization with user object access

The code now imports and uses the User model directly, which is more type-safe and maintainable than working with just the UUID. This is a good architectural choice that makes the function's dependencies more explicit.


11-12: Improved function signature by using the User model

Changing from accepting a raw UUID to a full User object is a good design decision. It better enforces type safety and makes the function's requirements more explicit. The function correctly extracts the owner_id from the user.


24-26: Properly refactored dataset ID generation logic

The implementation now correctly uses the specialized get_unique_dataset_id helper function to generate dataset IDs consistently. The extracted functionality makes the code more maintainable and ensures consistent ID generation across the codebase.

cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (4)

10-13: Good test setup with proper data cleanup

The test properly cleans up data and system metadata before running, which prevents test interference from previous runs and ensures reproducible test results.


14-22: Well-structured task definitions with different context usage patterns

The test defines tasks with different parameter patterns:

  • task_1 uses both input and context
  • task_2 uses only input
  • task_3 uses both input and context

This properly tests the pipeline's ability to pass context to tasks that need it while handling tasks that don't require context.
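A minimal sketch of what such task signatures could look like (function names and bodies are illustrative, not the test's actual code):

async def task_1(data, context=None):
    # Consumes both the piped input and the shared pipeline context.
    return f"{data}:{context['run_id']}"


async def task_2(data):
    # Uses only the piped input; the pipeline must not force context on it.
    return data.upper()


async def task_3(data, context=None):
    # Context-aware again, e.g. for tagging results per run.
    return {"result": data, "run_id": context["run_id"]}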


26-35: Clear pipeline setup with context parameter

The test properly sets up the pipeline with tasks, input data, user, and context, demonstrating the intended usage pattern of the run_tasks_base function.


42-47: Well-structured test wrapper and execution

The synchronous wrapper and main guard allow for both automated test runs and manual execution, which is good for development and testing.

cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (4)

33-33: Good interface update with Promise return type

The onCognify property is correctly defined as returning a Promise, which makes it clear that this is an asynchronous operation.


60-70: Well-implemented button state management

The implementation correctly:

  1. Uses the useBoolean hook to manage button state
  2. Disables the button while the operation is in progress
  3. Re-enables the button regardless of success or failure using finally

This prevents multiple submissions and provides good user feedback.


80-84: Well-integrated Cognify button in the UI

The Cognify button is properly placed alongside existing controls and uses the appropriate UI component (CTAButton) for a primary action.


36-36:


❌ Incorrect review comment

Avoid shadowing the global DataView object

The component name is shadowing the global DataView property, which could lead to confusion.

Consider renaming this component to something more specific like DatasetView or CogneeDataView to avoid potential conflicts with the global DataView constructor.


🏁 Script executed:

#!/bin/bash
# Check if DataView is used as a global property in the codebase
rg "new DataView\(" --glob "*.{ts,tsx,js,jsx}"

Length of output: 48


No global DataView conflict detected
The DataView React component is declared and exported within its own module scope, so it doesn’t override or interfere with the built-in DataView constructor elsewhere. No occurrences of new DataView(...) or references to the global API were found in the codebase. You can safely keep the current name or rename it for clarity at your discretion.

Likely an incorrect or invalid review comment.

🧰 Tools
🪛 Biome (1.9.4)

[error] 36-36: Do not shadow the global "DataView" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

examples/database_examples/kuzu_example.py (4)

8-34: Well-structured example with clear configuration

The example clearly demonstrates how to:

  1. Configure KuzuDB as the graph database provider
  2. Set up data directories with proper paths
  3. Document each step with helpful comments

This makes it easy for users to understand the configuration process.


39-54: Good dataset creation and sample data ingestion

The example demonstrates dataset creation and data ingestion with realistic sample text about KuzuDB, which helps users understand the context and provides a meaningful example.


57-78: Comprehensive search examples

The example thoroughly demonstrates three different search types:

  1. Insights search
  2. Chunks search with dataset filtering
  3. Graph completion search

Each search is properly formatted and the results are clearly printed, which provides users with a complete guide to using search functionality.


79-81: Good practice showing optional cleanup

The example includes commented-out cleanup code, which is useful for users who want to run the example multiple times without data accumulation while making it clear that cleanup is optional.

cognee/tasks/temporal_awareness/index_graphiti_objects.py (2)

23-32: Consistent approach to graph engine queries with explicit parameters

The code now consistently passes empty parameter dictionaries (params={}) to all graph engine queries, aligning with the updated graph adapter interfaces mentioned in the PR summary. This makes the code more maintainable and consistent across the codebase.


34-40: Updated graph data retrieval interface

The change from get_model_independent_graph_data() to get_graph_data() and the updated node iteration pattern reflect the refactored graph database adapters mentioned in the PR summary. The new approach of iterating directly over node ID-data pairs is cleaner than the previous implementation.
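A rough sketch of the new iteration pattern, assuming get_graph_data() returns (nodes, edges) with nodes given as (node_id, node_data) pairs; the index_node callback is a stand-in for the actual indexing step:

async def index_all_nodes(graph_engine, index_node):
    nodes, _edges = await graph_engine.get_graph_data()
    for node_id, node_data in nodes:
        # node_id is the graph identifier, node_data the property dictionary.
        await index_node(node_id, node_data)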

examples/database_examples/falkordb_example.py (1)

29-39: ⚠️ Potential issue

Update FalkorDB configuration comment

Line 29 creates a Milvus database path, but the example is for FalkorDB. This appears to be a copy-paste error from the Milvus example file.

-local_milvus_db_path = os.path.join(cognee_directory_path, "databases", "milvus.db")

 # Configure Milvus as the vector database provider
 cognee.config.set_vector_db_config(
     {
-        "vector_db_url": local_milvus_db_path,  # Enter Milvus Endpoint if exist
+        "vector_db_url": "",  # Enter FalkorDB vector endpoint if needed 
         "vector_db_key": "",  # Enter Token
-        "vector_db_provider": "milvus",  # Specify Milvus as provider
+        "vector_db_provider": "falkordb",  # Specify FalkorDB as provider
     }
 )

Likely an incorrect or invalid review comment.

cognee/tasks/ingestion/migrate_relational_database.py (2)

98-133: Column data migration enhances graph representation capabilities

The addition of column-level nodes provides a more granular representation of relational data in the graph, enabling more detailed queries and insights. The implementation correctly excludes primary and foreign keys to avoid redundancy.
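A simplified sketch of the idea, with hypothetical graph helpers (add_node/add_edge) and assuming rows arrive as dictionaries:

async def migrate_columns(graph, table_name, rows, primary_key, foreign_keys):
    skip = {primary_key, *foreign_keys}
    for row in rows:
        row_node_id = f"{table_name}:{row[primary_key]}"
        for column, value in row.items():
            if column in skip:
                continue  # keys are already represented as nodes/edges
            column_node_id = f"{row_node_id}:{column}"
            await graph.add_node(column_node_id, {"column": column, "value": value})
            await graph.add_edge(row_node_id, column_node_id, "has_column_value")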


16-16: Function signature updated with clear optional parameter

The addition of the optional migrate_column_data parameter with a sensible default (True) allows users to control the granularity of the migration while maintaining backward compatibility.

examples/database_examples/milvus_example.py (1)

8-18: Well-structured example demonstrates clear process flow

The function documentation and example flow are clear, showing how to set up, process data, and perform searches with Milvus. This structure makes it easy for users to understand the key steps in using Cognee with Milvus.

examples/database_examples/neo4j_example.py (1)

1-95: Well-structured example with good documentation.

The example is well-organized, follows a logical flow, and has clear documentation. It successfully demonstrates the integration between Cognee and Neo4j with appropriate comments explaining each step.

examples/database_examples/chromadb_example.py (1)

1-88: Well-structured example with good documentation.

The example is well-organized, follows a logical flow, and has clear documentation. It successfully demonstrates the integration between Cognee and ChromaDB with appropriate comments explaining each step.

🧰 Tools
🪛 Ruff (0.11.9)

1-1: os imported but unused

Remove unused import: os

(F401)

cognee/modules/pipelines/operations/run_tasks.py (2)

87-93: Good use of explicit keyword arguments.

Good job using explicit keyword arguments for clarity when calling run_tasks_with_telemetry. This makes the code more readable and less prone to errors when parameters are reordered or added.


23-42: Well-implemented backward-compatible parameter addition.

The addition of the optional context parameter with a default value of None is a good example of how to extend functionality while maintaining backward compatibility with existing code.

examples/database_examples/pgvector_example.py (2)

1-100: Well-structured example for PGVector integration.

This example script provides a comprehensive demonstration of using Cognee with PostgreSQL and PGVector, with the same well-organized structure as other database examples. It appropriately configures both vector and relational database settings, which is necessary for PGVector.

🧰 Tools
🪛 Ruff (0.11.9)

1-1: os imported but unused

Remove unused import: os

(F401)


94-95: Clean-up code commented out.

Similar to the Qdrant example, the cleanup calls are commented out. This is appropriate for an example script as it allows users to inspect the results after running.

cognee/tests/test_memgraph.py (1)

88-92: Magic number in history length assertion
assert len(history) == 8 is brittle – the expected number of history entries is an implementation detail that may change. Either calculate the expected value dynamically (e.g., expected = len(SEARCH_TYPES_USED) * entries_per_search) or drop the count assertion and verify properties that matter (non-empty, chronological order, etc.).
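For example, a small helper along these lines (names are illustrative) keeps the assertion in sync with the searches the test actually performs:

def expected_history_length(search_types_used, entries_per_search=2):
    # e.g. one query entry plus one result entry per search
    return len(search_types_used) * entries_per_search


# In the test:
# assert len(history) == expected_history_length(SEARCH_TYPES_USED)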

cognee/modules/pipelines/operations/run_tasks_base.py (1)

66-82: LGTM - Context forwarding is properly implemented

The function signature update and context forwarding mechanism are correctly implemented. The context is passed through the pipeline execution, maintaining backward compatibility with tasks that don't expect a context parameter.
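One common way to implement this kind of forwarding, shown here only as a sketch and not necessarily how run_tasks_base does it, is to inspect the task signature before calling it:

import inspect


async def run_task(task_fn, data, context=None):
    # Forward context only to tasks whose signature declares it, so older
    # tasks without a context parameter keep working unchanged.
    if "context" in inspect.signature(task_fn).parameters:
        return await task_fn(data, context=context)
    return await task_fn(data)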

cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

173-174: LGTM - Improved error handling for collection not found case

Adding specific error handling for the CollectionNotFoundError case is a good improvement. It makes the function more robust by gracefully handling the case when a collection doesn't exist.
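As a sketch of the pattern (the search keyword arguments are assumptions, and the import path follows the exceptions file discussed earlier):

from cognee.infrastructure.databases.vector.exceptions.exceptions import (
    CollectionNotFoundError,
)


async def safe_search(vector_engine, collection_name, query_text, limit=10):
    # Treat a missing collection as "no results" instead of failing the search.
    try:
        return await vector_engine.search(collection_name, query_text=query_text, limit=limit)
    except CollectionNotFoundError:
        return []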

examples/data/car_and_tech_companies.txt (1)

1-37: Well-structured sample data for testing

The sample data is well-structured and provides a good representation of two different domains (automotive and technology) for testing knowledge graph functionality. The paragraph format with numbered lists is clear and consistent across both text samples.

🧰 Tools
🪛 LanguageTool

[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...

(ENGLISH_WORD_REPEAT_RULE)


[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...

(A_VARIETY_OF)


[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...

(THIS_NNS)


[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)


[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...

(ENGLISH_WORD_REPEAT_RULE)


[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...

(IN_RECENT_STYLE)


[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...

(ENGLISH_WORD_REPEAT_RULE)

Dockerfile (6)

1-2: Good choice of base image for modern Python development

Using a pre-installed uv image with Python 3.12 is a great improvement over the previous setup. This leverages newer Python features and a more efficient dependency management tool.


19-27: Well-structured dependency installation

Good approach combining all system dependencies in a single RUN command with proper cleanup of apt lists to reduce image size.


29-34: Effective use of Docker layer caching

Copying just the configuration files first is a good practice for leveraging Docker's layer caching mechanism.


33-34: Comprehensive dependency management with uv

Using uv sync with specific extras and flags shows a thoughtful approach to dependency management.


36-38: Good sequencing of file copying

Copying Alembic migrations before application code is a smart approach that allows for better layer caching.


56-60: Properly configured runtime environment

Good configuration of PATH and PYTHONPATH to ensure the application and its dependencies are correctly accessible.

cognee/modules/pipelines/operations/pipeline.py (3)

3-3: Clean import organization

The imports are well organized and the addition of new modules like get_unique_dataset_id and log_pipeline_run_initiated shows good modularization.

Also applies to: 5-9, 16-16


64-92: Improved dataset handling with user-scoped IDs

The refactoring to use user-scoped dataset IDs is a significant improvement over the previous implementation. The code properly handles existing datasets and creates new ones with unique IDs when needed.


148-154: Enhanced pipeline status checking

Good addition of early returns to prevent reprocessing of datasets that are already being processed or have been completed. This will improve efficiency and prevent duplicate work.
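A condensed sketch of the early-return flow; the status strings and helper names are illustrative assumptions:

async def run_pipeline_if_needed(dataset_id, get_pipeline_status, start_pipeline):
    status = await get_pipeline_status(dataset_id)
    if status in ("DATASET_PROCESSING_STARTED", "DATASET_PROCESSING_COMPLETED"):
        # Early return: the dataset is already in flight or already done.
        return status
    return await start_pipeline(dataset_id)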

cognee/api/v1/responses/dispatch_function.py (4)

19-45: Well-structured function dispatch implementation

The dispatch_function is cleanly implemented with proper error handling and logging. It correctly handles both dictionary and object inputs for broader compatibility.


47-84: Comprehensive search handling with proper validation

The handle_search function includes thorough validation of inputs, defaults for optional parameters, and proper error handling. Good approach to extracting schema information from tool definitions.


87-101: Clear cognify handling with conditional response

The handle_cognify function is well-implemented with a clear conditional response based on whether text was provided. Good separation of concerns between adding text and running cognify.


104-107: Simple and effective prune handler

The handle_prune function is straightforward and returns a clear success message.

cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (4)

4-8: Improved import organization

Better organization of imports with the logging utility first, followed by core components and exceptions.


100-125: Enhanced error handling for data point creation

Good improvement in error handling by specifically catching UnexpectedResponse and raising a more specific CollectionNotFoundError when appropriate.


155-179: Robust search method implementation with validation

The search method has been significantly improved with:

  • Parameter validation to ensure at least one query type is provided
  • Early returns for invalid limits
  • Collection existence checking
  • Proper embedding of text queries when needed

These changes make the method more robust and efficient.
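A compressed sketch of that validation flow; has_collection, embedding_engine.embed_text and run_vector_query are stand-ins, not the adapter's real method names:

async def search(adapter, collection_name, query_text=None, query_vector=None, limit=10):
    if query_text is None and query_vector is None:
        raise ValueError("One of query_text or query_vector must be provided")
    if limit <= 0:
        return []  # early return for invalid limits
    if not await adapter.has_collection(collection_name):
        return []  # avoid querying a collection that does not exist
    if query_vector is None:
        query_vector = (await adapter.embedding_engine.embed_text([query_text]))[0]
    return await adapter.run_vector_query(collection_name, query_vector, limit)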


208-215: Consistent error handling pattern

The error handling pattern for UnexpectedResponse is consistently implemented here, similar to the create_data_points method. This provides a unified approach to handling collection not found errors.

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)

16-21: Import grouping reads better – thanks!
Splitting utility, exception, and embedding-engine imports onto their own lines improves scan-ability and avoids very long lines.


186-188: Good use of CollectionNotFoundError with log-level control
Raising the domain-specific error instead of KeyError/RuntimeError makes the contract clearer for callers and allows the service to downgrade noisy “missing collection” events to DEBUG – nice touch.

cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (3)

122-125: Great: translate Milvus-specific exception to domain error
Catching CollectionNotExistException and re-raising CollectionNotFoundError keeps the adapter boundary clean and avoids leaking backend-specific types.
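The translation boils down to a pattern like the following; the import path for the pymilvus exception and the CollectionNotFoundError constructor arguments are assumptions:

from pymilvus.exceptions import CollectionNotExistException

from cognee.infrastructure.databases.vector.exceptions.exceptions import (
    CollectionNotFoundError,
)


async def query_collection(run_milvus_query, collection_name):
    # Re-raise the backend-specific error as the adapter's domain error so
    # callers never need to import pymilvus types.
    try:
        return await run_milvus_query(collection_name)
    except CollectionNotExistException as error:
        raise CollectionNotFoundError(f"Collection '{collection_name}' not found") from error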


148-166: retrieve() now accepts UUID but string interpolation still relies on str() cast
You join the IDs with f'"{id}"', which implicitly calls UUID.__str__; fine. However returning raw Milvus rows breaks parity with other adapters where retrieve returns a List[ScoredResult]. Consider normalising here or documenting the deviation.


251-251: Changed signature is a breaking change – ripple review recommended
delete_data_points now expects List[UUID] instead of List[str]. Verify callers were updated; otherwise a silent TypeError or string/UUID mismatch will surface at runtime.

cognee/infrastructure/databases/graph/networkx/adapter.py (1)

45-47:


❌ Incorrect review comment

Method signature now violates the parent interface

GraphDBInterface.has_node() is declared with a str argument, but the override was changed to UUID. Besides breaking the Liskov-substitution principle, this will trip static type-checkers and may crash at runtime when other code (e.g., Neo4j/Kùzu adapters or service code) calls the interface with str IDs.

Two safe options:

-async def has_node(self, node_id: UUID) -> bool:
+async def has_node(self, node_id: str | UUID) -> bool:

or refactor all adapters + the interface to accept UUID. Mixing the two types is a hidden time-bomb.


🏁 Script executed:

#!/bin/bash
# Locate the GraphDBInterface declaration
rg -n "class GraphDBInterface" -C 5

# Inspect the has_node method signature in that interface
rg -n "def has_node" -C 5

Length of output: 5302


🏁 Script executed:

#!/bin/bash
# Verify whether GraphDBInterface declares has_node and its signature
rg -n "has_node" -C 5 cognee/infrastructure/databases/graph/graph_db_interface.py

# Show the interface block around line 95 where methods are defined
sed -n '90,115p' cognee/infrastructure/databases/graph/graph_db_interface.py

Length of output: 902


Ignore incorrect override warning
The GraphDBInterface does not declare a has_node method at all, so the networkx adapter’s signature isn’t overriding anything in the interface. No change is needed here.

Likely an incorrect or invalid review comment.

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)

76-82: get_collection awaits a possibly synchronous method

AsyncConnection.open_table() in LanceDB is synchronous (it returns an AsyncTable immediately).
await connection.open_table(...) works only if the library changed this to async def. Double-check the version; otherwise drop the await:

-        return await connection.open_table(collection_name)
+        return connection.open_table(collection_name)

Failure will raise TypeError: object AsyncTable can’t be used in 'await' expression.

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

121-127: Asynchronous contract mismatch

client.get_collection in chromadb-python returns a plain object, not a coroutine; awaiting it may raise TypeError in newer versions. Guard with inspect.isawaitable or remove await once verified.

-return await client.get_collection(collection_name)
+return client.get_collection(collection_name)
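If the client's behaviour differs across versions, an isawaitable guard keeps the adapter compatible with both; a minimal sketch:

import inspect


async def get_collection(client, collection_name):
    # Await only when the returned value is actually awaitable, so the code
    # works whether get_collection is sync or async in the installed version.
    result = client.get_collection(collection_name)
    if inspect.isawaitable(result):
        result = await result
    return result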
notebooks/llama_index_cognee_integration.ipynb (4)

6-8: Notebook source formatting standardized to array structure.

The notebook cell's source format has been converted from a single string to an array format for consistency across the project. This is a structural change in the notebook JSON and doesn't affect functionality.


62-64: Code cell source formatting standardized.

Similar to the markdown cells, this code cell's source has been converted to array format for consistent notebook structure.


199-201: Standardized remaining markdown cell source formatting.

The remaining markdown cells have been updated to use consistent array-based source formatting, maintaining the notebook's JSON structure uniformity.

Also applies to: 219-221


282-283: Updated Python kernel version specification.

The Python kernel metadata has been updated to explicitly specify the Python version as "3.12.9" instead of just "python", which helps ensure reproducibility by clearly documenting the expected Python version.

cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (3)

5-5: Updated dataset prop type to use name instead of id.

The dataset prop type has been changed from { id: string } to { name: string }, aligning with a broader refactoring across the frontend to standardize on using dataset name as the identifier.


11-11: Simplified Explorer component usage.

Removed the non-null assertion operator (!) since it's unnecessary when passing the dataset prop directly. This improves type safety and readability.


4-6:


❓ Verification inconclusive

Verify dataset name property usage in Explorer component.

The Explorer component now expects to receive a dataset with a name property instead of id. Based on the relevant code snippet from Explorer.tsx, you should verify that the getExplorationGraphUrl function has been updated to work with dataset name rather than id.

Also applies to: 8-12


🏁 Script executed:

#!/bin/bash
# Check if getExplorationGraphUrl has been updated to work with dataset.name
grep -r "getExplorationGraphUrl" --include="*.ts" --include="*.tsx" src/modules/exploration/

Length of output: 150


It looks like the src/modules/exploration/ path doesn’t exist in this branch. Let’s search the entire repo for getExplorationGraphUrl:

#!/bin/bash
# Search for getExplorationGraphUrl across the entire codebase
grep -R "getExplorationGraphUrl" -n .


Please verify getExplorationGraphUrl uses dataset.name instead of dataset.id.

  • In ExploreStep.tsx, the prop now reads:
    interface ExploreStepProps {
      dataset: { name: string };
    }
  • Locate the definition or import of getExplorationGraphUrl (e.g. in Explorer.tsx or your URL-builder module) and ensure it has been updated to accept dataset.name rather than dataset.id.
  • Update any import paths or calls if the function has moved or been renamed.
notebooks/cognee_simple_demo.ipynb (3)

18-18: Updated cognee package version.

The cognee package has been upgraded from version 0.1.36 to 0.1.39. This ensures compatibility with the latest backend improvements including the new OpenAI-compatible responses API and enhanced dataset handling.


13-19: Standardized notebook cell metadata.

All code cells now explicitly include "execution_count": null and "outputs": [] fields, standardizing the notebook metadata format for better consistency across the project's notebooks.

Also applies to: 33-41, 53-60, 72-80, 92-98, 102-108, 112-118, 130-143


1-175: Ensure compatibility with new API features.

The notebook has been updated to use cognee 0.1.39, which introduces new features like OpenAI-compatible responses and improved dataset handling. While the notebook code itself hasn't changed functionally, you should verify that all existing code works with the new package version, especially regarding async dataset handling.

notebooks/graphrag_vs_rag.ipynb (4)

56-56: Package version update looks appropriate.

The notebook is updated to use cognee 0.1.39, which aligns with the new APIs being introduced in this PR. This ensures the notebook will work with the latest version of the package.


152-153: Correct update to the new API import path and method signature.

The import statement has been properly updated from cognee.modules.search.types to cognee.api.v1.search, and the search method now uses keyword arguments (query_type, query_text) instead of positional arguments, which follows the new API convention.


173-173: Consistent use of updated search API.

The RAG completion search is correctly updated to use the new keyword argument pattern and the appropriate SearchType enum value. This maintains consistency with the changes seen in the GraphRAG search call above.


202-202: Insights search syntax properly updated.

The insights search call has been updated to match the new API pattern with keyword arguments. All search calls in this notebook now consistently use the new method signature.

examples/database_examples/qdrant_example.py (10)

1-6: Appropriate imports for asynchronous operation.

The imports cover all necessary modules for asynchronous operation with the Cognee package, including the updated SearchType import from the new API path.


8-19: Well-documented main function with clear purpose.

The main function includes a thorough docstring explaining the example's purpose and steps. This follows good documentation practices and helps users understand how to use Cognee with Qdrant.


20-32: Secure credential handling and proper database configuration.

The code properly retrieves credentials from environment variables and configures Cognee to use Qdrant as the vector database provider. This is a secure approach to handling credentials rather than hardcoding them.


34-41: Appropriate path handling for data storage.

Using pathlib for path manipulation is a good practice. The code correctly sets up relative paths based on the script location, which makes the example more portable.


43-45: Optional data cleanup for fresh start.

Including the prune operations is helpful for ensuring a clean environment when running the example. The comment clearly indicates that this step is optional.


47-59: Clear dataset creation and sample data addition.

The sample data is well-formed and representative of Qdrant's features. The example demonstrates how to add text data to a named dataset, which is a fundamental operation in Cognee.


61-62: Proper knowledge extraction step.

The example correctly demonstrates how to process the added document using the cognify method with the dataset name.


64-86: Comprehensive demonstration of different search types.

The example showcases three different search types (INSIGHTS, CHUNKS, GRAPH_COMPLETION) which is valuable for users to understand the versatility of Cognee's search capabilities. Each search includes proper result handling and output formatting.


87-89: Optional cleanup steps are commented out.

It's good practice to include these cleanup steps but leave them commented out, allowing users to decide whether they want to retain the data for further experimentation.


92-93: Standard Python script entry point with proper asyncio handling.

The script correctly uses asyncio.run() to execute the async main function when the script is run directly.

cognee/api/v1/responses/routers/default_tools.py (2)

1-37: Well-structured definition of the search tool.

The search tool is properly defined with clear parameter types and descriptions. The search_type enum includes the appropriate options, and the required parameter is correctly specified. This follows OpenAI's function calling format for tool definitions.


60-86: Prune tool definition is well-defined with appropriate constraints.

The prune tool definition includes appropriate constraints for its parameters, such as the enum values for prune_strategy and min/max values for min_confidence. The default value for prune_strategy is also helpful.

root_logger.setLevel(log_level)

-if log_level > logging.WARNING:
+if log_level > logging.DEBUG:

🛠️ Refactor suggestion

Broadened SQLAlchemy warning suppression.

Changed from suppressing SQLAlchemy warnings only when log_level > WARNING to suppressing them when log_level > DEBUG. This means warnings will now be suppressed at INFO and WARNING levels where they were previously visible.

Consider adding a comment explaining the rationale behind this change, as it affects the visibility of potential SQLAlchemy issues during debugging.
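For reference, a sketch of how such a guard could read once commented; whether cognee filters on SQLAlchemy's SAWarning specifically is an assumption:

import logging
import warnings

from sqlalchemy.exc import SAWarning


def configure_sqlalchemy_warnings(log_level: int) -> None:
    # Suppress SQLAlchemy warnings unless running at DEBUG, where full
    # visibility is most useful for troubleshooting; at INFO and above they
    # are treated as noise.
    if log_level > logging.DEBUG:
        warnings.filterwarnings("ignore", category=SAWarning)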

🤖 Prompt for AI Agents
In cognee/shared/logging_utils.py at line 315, add a comment explaining why
SQLAlchemy warnings are now suppressed for log levels greater than DEBUG instead
of only greater than WARNING. This comment should clarify the intention behind
broadening the suppression to include INFO and WARNING levels, helping future
readers understand the impact on visibility of SQLAlchemy warnings during
debugging.

"""

-from .exceptions import SearchTypeNotSupported, CypherSearchError, CollectionDistancesNotFoundError
+from .exceptions import SearchTypeNotSupported, CypherSearchError

🛠️ Refactor suggestion

Address unused imports

Static analysis indicates that the imported exceptions (SearchTypeNotSupported and CypherSearchError) are not being used within this file.

Either:

  1. Add them to an __all__ list to make them available to importers:
from .exceptions import SearchTypeNotSupported, CypherSearchError

+__all__ = ["SearchTypeNotSupported", "CypherSearchError"]
  2. Or re-export them more explicitly:
-from .exceptions import SearchTypeNotSupported, CypherSearchError
+from .exceptions import SearchTypeNotSupported as SearchTypeNotSupported
+from .exceptions import CypherSearchError as CypherSearchError
  3. Or import with __all__ if defined in the exceptions module:
-from .exceptions import SearchTypeNotSupported, CypherSearchError
+from .exceptions import SearchTypeNotSupported, CypherSearchError, __all__
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from .exceptions import SearchTypeNotSupported, CypherSearchError
from .exceptions import SearchTypeNotSupported, CypherSearchError
__all__ = ["SearchTypeNotSupported", "CypherSearchError"]
🧰 Tools
🪛 Ruff (0.11.9)

7-7: .exceptions.SearchTypeNotSupported imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


7-7: .exceptions.CypherSearchError imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

🤖 Prompt for AI Agents
In cognee/modules/retrieval/exceptions/__init__.py at line 7, the imported
exceptions SearchTypeNotSupported and CypherSearchError are not used in the
file. To fix this, explicitly re-export these exceptions by adding an __all__
list including their names or by using a direct re-export statement. This will
make the imports available to other modules without causing unused import
warnings.

} = useBoolean(false);

-const dataset = { id: 'main' };
+const dataset = { name: 'main' };

💡 Verification agent

🧩 Analysis chain

Property name change is consistent with broader refactoring.

Changing the property from id to name aligns with the broader refactoring across the codebase as mentioned in the AI summary.

Let's verify that all usages of dataset.id have been updated to use dataset.name instead:


🏁 Script executed:

#!/bin/bash
# Check for any remaining instances of dataset.id that might have been missed during refactoring
echo "Searching for potential missed refactoring instances:"
rg "dataset\.id" --type=typescript --type=tsx

Length of output: 199


The rg error indicates “typescript” isn’t a recognized file type. Let’s broaden the search to all .ts/.tsx files:


🏁 Script executed:

#!/bin/bash
echo "Searching for any remaining `dataset.id` usages in .ts/.tsx files:"
rg "dataset\.id" -g '*.ts' -g '*.tsx'

Length of output: 1298


Incomplete refactoring: remaining dataset.id references need updating

Several files still reference dataset.id instead of dataset.name. Please update each occurrence to use the new name property (and adjust any related variable or API-parameter names), for example:

• cognee-frontend/src/modules/ingestion/useDatasets.ts

  • status: statuses[dataset.id]
  • datasets.filter((dataset) => dataset.id !== datasetId)

• cognee-frontend/src/modules/ingestion/addData.ts

  • if (dataset.id) {
    formData.append('datasetId', dataset.id);
    }

• cognee-frontend/src/modules/ingestion/DatasetsView/DatasetsView.tsx

  • <DatasetItem key={dataset.id} onClick={() => onDatasetClick(dataset)}>

• cognee-frontend/src/modules/datasets/deleteDataset.ts

  • fetch(`/v1/datasets/${dataset.id}`)

• cognee-frontend/src/modules/datasets/getDatasetData.ts

  • fetch(`/v1/datasets/${dataset.id}/data`)

• cognee-frontend/src/modules/datasets/cognifyDataset.ts

  • datasets: [dataset.id || dataset.name]

• cognee-frontend/src/app/page.tsx

  • .then(() => setSelectedDataset(dataset.id));
  • const dataset = datasets.find((dataset) => dataset.id === selectedDataset);

Example diff for one case:

- status: statuses[dataset.id]
+ status: statuses[dataset.name]

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee-frontend/src/app/wizard/WizardPage.tsx at line 32, the dataset object
uses the property name 'name' instead of 'id' as part of a broader refactoring.
However, there are still many references to 'dataset.id' across the codebase
that need to be updated to 'dataset.name' to maintain consistency. Review all
occurrences of 'dataset.id' in the project files, especially those listed in the
comment, and replace them with 'dataset.name'. Also, update any related variable
names, API parameters, and usages accordingly to reflect this change.

Comment on lines +24 to +27
try:
await_only(create_default_user())
except Exception:
pass

🛠️ Refactor suggestion

Improve exception handling with contextlib.suppress and add logging

While suppressing exceptions during default user creation may be intentional (perhaps to handle cases where the user already exists), the current implementation silently swallows all exceptions without any logging. This could mask important errors.

Apply this refactoring:

- try:
-     await_only(create_default_user())
- except Exception:
-     pass
+ import logging
+ 
+ logger = logging.getLogger(__name__)
+ 
+ try:
+     await_only(create_default_user())
+     logger.info("Default user created successfully")
+ except Exception as error:
+     # The default user may already exist; log the failure instead of
+     # silently swallowing it.
+     logger.warning(f"Failed to create default user: {error}")

Alternatively, for a simpler fix that addresses just the static analysis warning:

- try:
-     await_only(create_default_user())
- except Exception:
-     pass
+ from contextlib import suppress
+ 
+ with suppress(Exception):
+     await_only(create_default_user())

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.11.9)

24-27: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)

🤖 Prompt for AI Agents
In alembic/versions/482cd6517ce4_add_default_user.py around lines 24 to 27,
replace the try-except block that silently passes on all exceptions with
contextlib.suppress to explicitly suppress expected exceptions, and add logging
to record any suppressed exceptions or unexpected errors. This improves clarity
and ensures important errors are not silently ignored.

Comment on lines 77 to +78
-        triplets = await self.get_triplets(query)
-    except EntityNotFoundError:
-        return ""
+    triplets = await self.get_triplets(query)

💡 Verification agent

🧩 Analysis chain

Verify that exception propagation is handled appropriately by callers

The removal of the try-except block around get_triplets changes the error handling behavior - EntityNotFoundError will now propagate to callers instead of being caught and returning an empty string.

Make sure all callers of get_context are prepared to handle these exceptions. Run this script to identify callers that might need updates:


🏁 Script executed:

#!/bin/bash
# Find all places where get_context is called

echo "Finding direct callers of get_context method:"
rg --context=2 "\.get_context\(" --type py

echo -e "\nChecking if any of these callers have try-except blocks:"
rg --context=5 "try:.*\.get_context\(" --type py

Length of output: 15045


Preserve error‐handling around get_triplets or update callers of get_context

The removal of the try/except in GraphCompletionRetriever.get_context lets EntityNotFoundError (and DatabaseNotCreatedError) bubble up to every caller. A scan of the codebase found numerous direct callers—none of which wrap get_context in a try/except—so this change will break existing tests (e.g., empty‐graph scenarios expecting "") and may crash downstream consumers.

Call‐sites requiring attention:

  • cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py
  • cognee/modules/retrieval/EntityCompletionRetriever.py
  • cognee/modules/retrieval/cypher_search_retriever.py
  • cognee/modules/retrieval/summaries_retriever.py
  • cognee/modules/retrieval/natural_language_retriever.py
  • cognee/modules/retrieval/code_retriever.py
  • cognee/modules/retrieval/chunks_retriever.py
  • cognee/modules/retrieval/completion_retriever.py
  • cognee/modules/retrieval/insights_retriever.py
  • cognee/eval_framework/answer_generation/answer_generation_executor.py
  • cognee/api/v1/cognify/routers/get_code_pipeline_router.py

Suggested fixes:

  • Re-add a try/except EntityNotFoundError around get_triplets in GraphCompletionRetriever.get_context (returning "" on failure), or
  • Wrap each caller of get_context in its own try/except to handle the now-propagated exceptions.
🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_retriever.py around lines 77 to 78,
the removal of the try-except block around the call to get_triplets causes
EntityNotFoundError and DatabaseNotCreatedError exceptions to propagate to all
callers of get_context, which currently do not handle these exceptions and may
break existing functionality. To fix this, reintroduce a try-except block around
the get_triplets call inside get_context that catches EntityNotFoundError and
returns an empty string, preserving the original error handling behavior and
preventing exceptions from breaking downstream callers.

Comment on lines +64 to +78
async def add_node(self, node: DataPoint):
serialized_properties = self.serialize_properties(node.model_dump())

query = """
MERGE (node {id: $node_id})
ON CREATE SET node:$node_label, node += $properties, node.updated_at = timestamp()
ON MATCH SET node:$node_label, node += $properties, node.updated_at = timestamp()
RETURN ID(node) AS internal_id,node.id AS nodeId
"""

params = {
"node_id": str(node.id),
"node_label": type(node).__name__,
"properties": serialized_properties,
}

🛠️ Refactor suggestion

⚠️ Potential issue

Cypher label cannot be parameterized – query will fail at runtime

ON CREATE/ON MATCH SET node:$node_label attempts to pass a label through a Cypher parameter.
Cypher (both Neo4j & Memgraph) does not allow labels to be provided as parameters – the query engine interprets $node_label as a literal string, not as a label. The MERGE therefore creates/updates a node with no label at all, or raises a syntax error (depending on server version).

Replace the pattern with explicit string interpolation (after validating / whitelisting), or use APOC procedures if available.

- MERGE (node {id: $node_id})
- ON CREATE SET node:$node_label, node += $properties, node.updated_at = timestamp()
- ON MATCH  SET node:$node_label, node += $properties, node.updated_at = timestamp()
+MERGE (node {id: $node_id})
+SET   node:`{type(node).__name__}`
+SET   node += $properties,
+      node.updated_at = timestamp()

Be sure to validate type(node).__name__ against an allow-list to avoid Cypher-injection.
Same issue exists in add_nodes.
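A minimal sketch of the whitelist-then-interpolate approach (helper name and query shape are illustrative):

import re

ALLOWED_LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_merge_query(node_type_name: str) -> str:
    # Validate the label before interpolating it, since Cypher labels cannot
    # be supplied as query parameters.
    if not ALLOWED_LABEL.fullmatch(node_type_name):
        raise ValueError(f"Invalid node label: {node_type_name}")
    return (
        "MERGE (node {id: $node_id})\n"
        f"SET node:`{node_type_name}`\n"
        "SET node += $properties, node.updated_at = timestamp()\n"
        "RETURN node"
    )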

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 64 to 78, the Cypher query incorrectly uses a parameter for the node
label, which Cypher does not support and causes runtime errors. To fix this,
replace the parameterized label with explicit string interpolation of the label
after validating it against a whitelist to prevent Cypher injection. Apply the
same fix to the add_nodes method to ensure labels are safely and correctly
included in the query.

Comment on lines +155 to +171
UNWIND $edges AS edge
MATCH (a)-[r]->(b)
WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name
RETURN edge.from_node AS from_node, edge.to_node AS to_node, edge.relationship_name AS relationship_name, count(r) > 0 AS edge_exists
"""

try:
params = {
"edges": [
{
"from_node": str(edge[0]),
"to_node": str(edge[1]),
"relationship_name": edge[2],
}
for edge in edges
],
}

⚠️ Potential issue

Logic error – comparing property IDs to internal IDs

id(a) / id(b) returns the internal graph ID (an integer), yet you pass the string UUIDs as parameters.
This causes every comparison to evaluate to FALSE, so edge_exists will never be true.

Change the MATCH to use the same id property you use everywhere else:

-WHERE id(a) = edge.from_node AND id(b) = edge.to_node
+WHERE a.id = edge.from_node AND b.id = edge.to_node
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
UNWIND $edges AS edge
MATCH (a)-[r]->(b)
WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name
RETURN edge.from_node AS from_node, edge.to_node AS to_node, edge.relationship_name AS relationship_name, count(r) > 0 AS edge_exists
"""
try:
params = {
"edges": [
{
"from_node": str(edge[0]),
"to_node": str(edge[1]),
"relationship_name": edge[2],
}
for edge in edges
],
}
UNWIND $edges AS edge
MATCH (a)-[r]->(b)
- WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name
+ WHERE a.id = edge.from_node AND b.id = edge.to_node AND type(r) = edge.relationship_name
RETURN edge.from_node AS from_node, edge.to_node AS to_node, edge.relationship_name AS relationship_name, count(r) > 0 AS edge_exists
"""
try:
params = {
"edges": [
{
"from_node": str(edge[0]),
"to_node": str(edge[1]),
"relationship_name": edge[2],
}
for edge in edges
],
}
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 155 to 171, the MATCH clause incorrectly compares internal graph IDs
(id(a), id(b)) to string UUIDs passed as parameters, causing the comparison to
always fail. Update the MATCH condition to compare the node property 'id' (or
the appropriate UUID property) instead of using id(a) and id(b), ensuring the
property used matches the UUID strings in the parameters for correct edge
existence checking.

Comment on lines +414 to +418
query = f"""
UNWIND $node_ids AS id
MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
DELETE r;
"""

⚠️ Potential issue

Label/string interpolation misuse

MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)

uses {id} inside back-ticks, producing something like node:`1234`, which is invalid.
Use property matching instead (or APOC):

-MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
+MATCH (node {id: nid})<-[r:`{edge_label}`]-(successor)

Again, whitelist edge_label.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 414 to 418, the query incorrectly uses string interpolation for the node
label with `{id}` inside backticks, resulting in invalid Cypher syntax like
`node:`1234``. Instead, modify the query to match nodes by a property, for
example using `MATCH (node) WHERE node.id = $id` to safely match nodes by their
ID property. Also, ensure that the `edge_label` is whitelisted before
interpolation to prevent injection risks.

Comment on lines +188 to +196
query = dedent(
f"""\
MATCH (from_node {{id: $from_node}}),
(to_node {{id: $to_node}})
MERGE (from_node)-[r:{relationship_name}]->(to_node)
ON CREATE SET r += $properties, r.updated_at = timestamp()
ON MATCH SET r += $properties, r.updated_at = timestamp()
RETURN r
"""

💡 Verification agent

🧩 Analysis chain

Un-sanitised f-string may allow Cypher-injection

MERGE (from_node)-[r:{relationship_name}]->(to_node) interpolates relationship_name directly into the query.
If user input reaches this path, an attacker can inject arbitrary Cypher.

Validate relationship_name against a whitelist (re.fullmatch(r"[A-Z_][A-Z0-9_]*", …)) or use APOC’s apoc.create.relationship.


🏁 Script executed:

#!/bin/bash
# Find all occurrences of `relationship_name` to trace its origin and check for sanitization
rg -n 'relationship_name' cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py

Length of output: 960


🏁 Script executed:

#!/bin/bash
# Display lines 150-250 to inspect the function signature and usage of relationship_name
sed -n '150,250p' cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py

Length of output: 3490


🏁 Script executed:

#!/bin/bash
# Search for .add_edge usages
rg -n "add_edge\(" -n cognee

Length of output: 1872


Secure relationship_name interpolation in add_edge to prevent Cypher injection

The current add_edge implementation uses an f-string to inject relationship_name directly into the Cypher pattern, which can be exploited if an attacker controls that value. We need to either strictly validate the name against a whitelist or switch to a parameterized APOC call.

• File cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py, add_edge (lines 188–196)
• Vulnerable line:

MERGE (from_node)-[r:{relationship_name}]->(to_node)

Suggested fixes:

  1. Add a whitelist check before building the query:
+ import re
+ # ensure only valid relationship names are allowed
+ if not re.fullmatch(r"[A-Z_][A-Z0-9_]*", relationship_name):
+     raise ValueError(f"Invalid relationship name: {relationship_name}")
  2. Replace the dynamic MERGE with an APOC call:
- query = dedent(
-     f"""\
-     MATCH (from_node {{id: $from_node}}),
-           (to_node {{id: $to_node}})
-     MERGE (from_node)-[r:{relationship_name}]->(to_node)
-     ON CREATE SET r += $properties, r.updated_at = timestamp()
-     ON MATCH SET r += $properties, r.updated_at = timestamp()
-     RETURN r
-     """
- )
+ query = dedent(
+     """\
+     MATCH (from_node {id: $from_node}), (to_node {id: $to_node})
+     CALL apoc.create.relationship(
+         from_node,
+         $relationship_name,
+         $properties,
+         to_node
+     ) YIELD rel AS r
+     SET r.updated_at = timestamp()
+     RETURN r
+     """
+ )

This ensures relationship_name cannot inject arbitrary Cypher and uses a parameterized APOC procedure instead.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 188 to 196, the f-string directly interpolates relationship_name into the
Cypher query, risking Cypher injection. Fix this by validating relationship_name
against a strict whitelist regex pattern like r"[A-Z_][A-Z0-9_]*" before using
it in the query, or refactor the query to use the APOC procedure
apoc.create.relationship with relationship_name as a parameter to safely create
the relationship without direct string interpolation.

Comment on lines +1 to +86
DEFAULT_TOOLS = [
{
"type": "function",
"name": "search",
"description": "Search for information within the knowledge graph",
"parameters": {
"type": "object",
"properties": {
"search_query": {
"type": "string",
"description": "The query to search for in the knowledge graph",
},
"search_type": {
"type": "string",
"description": "Type of search to perform",
"enum": [
"INSIGHTS",
"CODE",
"GRAPH_COMPLETION",
"SEMANTIC",
"NATURAL_LANGUAGE",
],
},
"top_k": {
"type": "integer",
"description": "Maximum number of results to return",
"default": 10,
},
"datasets": {
"type": "array",
"items": {"type": "string"},
"description": "Optional list of dataset names to search within",
},
},
"required": ["search_query"],
},
},
{
"type": "function",
"name": "cognify",
"description": "Convert text into a knowledge graph or process all added content",
"parameters": {
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "Text content to be converted into a knowledge graph",
},
"graph_model_name": {
"type": "string",
"description": "Name of the graph model to use",
},
"graph_model_file": {
"type": "string",
"description": "Path to a custom graph model file",
},
},
},
},
{
"type": "function",
"name": "prune",
"description": "Remove unnecessary or outdated information from the knowledge graph",
"parameters": {
"type": "object",
"properties": {
"prune_strategy": {
"type": "string",
"enum": ["light", "moderate", "aggressive"],
"description": "Strategy for pruning the knowledge graph",
"default": "moderate",
},
"min_confidence": {
"type": "number",
"description": "Minimum confidence score to retain (0-1)",
"minimum": 0,
"maximum": 1,
},
"older_than": {
"type": "string",
"description": "ISO date string - prune nodes older than this date",
},
},
},
},
]

💡 Verification agent

🧩 Analysis chain

Verify that search_type enum values match the actual SearchType implementation.

The search_type enum values in the tool definition should match the actual implementation in the SearchType enum. The current set of values (INSIGHTS, CODE, GRAPH_COMPLETION, SEMANTIC, NATURAL_LANGUAGE) seems to be missing some values that are in the SearchType enum (e.g., CHUNKS, RAG_COMPLETION, GRAPH_SUMMARY_COMPLETION, CYPHER, SUMMARIES).


🏁 Script executed:

#!/bin/bash
# Check if all SearchType enum values are included in the DEFAULT_TOOLS search_type enum

# First, get all SearchType values from the enum definition
echo "SearchType enum values:"
rg -A 10 "class SearchType" --type py | grep -o '"[A-Z_]\+"' | sort

# Then compare with what's in the default_tools.py
echo -e "\nSearch type enum values in DEFAULT_TOOLS:"
rg -A 10 '"search_type":' cognee/api/v1/responses/routers/default_tools.py | grep -o '"[A-Z_]\+"' | sort

Length of output: 529


Align search_type enum in DEFAULT_TOOLS with the actual SearchType values

The search_type enum in cognee/api/v1/responses/routers/default_tools.py currently lists:

  • INSIGHTS
  • CODE
  • GRAPH_COMPLETION
  • SEMANTIC
  • NATURAL_LANGUAGE

But your SearchType enum defines:

  • CHUNKS
  • CODE
  • CYPHER
  • GRAPH_COMPLETION
  • GRAPH_SUMMARY_COMPLETION
  • INSIGHTS
  • NATURAL_LANGUAGE
  • RAG_COMPLETION
  • SUMMARIES

And SEMANTIC is not part of SearchType. Please update the tool definition so that search_type.enum matches exactly the SearchType values.

• File: cognee/api/v1/responses/routers/default_tools.py
• Location: inside the parameters → properties → search_type block of the search tool

Suggested replacement for the enum array:

-                    "enum": [
-                        "INSIGHTS",
-                        "CODE",
-                        "GRAPH_COMPLETION",
-                        "SEMANTIC",
-                        "NATURAL_LANGUAGE",
-                    ],
+                    "enum": [
+                        "CHUNKS",
+                        "CODE",
+                        "CYPHER",
+                        "GRAPH_COMPLETION",
+                        "GRAPH_SUMMARY_COMPLETION",
+                        "INSIGHTS",
+                        "NATURAL_LANGUAGE",
+                        "RAG_COMPLETION",
+                        "SUMMARIES",
+                    ],
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
DEFAULT_TOOLS = [
    {
        "type": "function",
        "name": "search",
        "description": "Search for information within the knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "search_query": {
                    "type": "string",
                    "description": "The query to search for in the knowledge graph",
                },
                "search_type": {
                    "type": "string",
                    "description": "Type of search to perform",
                    "enum": [
                        "INSIGHTS",
                        "CODE",
                        "GRAPH_COMPLETION",
                        "SEMANTIC",
                        "NATURAL_LANGUAGE",
                    ],
                },
                "top_k": {
                    "type": "integer",
                    "description": "Maximum number of results to return",
                    "default": 10,
                },
                "datasets": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Optional list of dataset names to search within",
                },
            },
            "required": ["search_query"],
        },
    },
    {
        "type": "function",
        "name": "cognify",
        "description": "Convert text into a knowledge graph or process all added content",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Text content to be converted into a knowledge graph",
                },
                "graph_model_name": {
                    "type": "string",
                    "description": "Name of the graph model to use",
                },
                "graph_model_file": {
                    "type": "string",
                    "description": "Path to a custom graph model file",
                },
            },
        },
    },
    {
        "type": "function",
        "name": "prune",
        "description": "Remove unnecessary or outdated information from the knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "prune_strategy": {
                    "type": "string",
                    "enum": ["light", "moderate", "aggressive"],
                    "description": "Strategy for pruning the knowledge graph",
                    "default": "moderate",
                },
                "min_confidence": {
                    "type": "number",
                    "description": "Minimum confidence score to retain (0-1)",
                    "minimum": 0,
                    "maximum": 1,
                },
                "older_than": {
                    "type": "string",
                    "description": "ISO date string - prune nodes older than this date",
                },
            },
        },
    },
]

DEFAULT_TOOLS = [
    {
        "type": "function",
        "name": "search",
        "description": "Search for information within the knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "search_query": {
                    "type": "string",
                    "description": "The query to search for in the knowledge graph",
                },
                "search_type": {
                    "type": "string",
                    "description": "Type of search to perform",
                    "enum": [
                        "CHUNKS",
                        "CODE",
                        "CYPHER",
                        "GRAPH_COMPLETION",
                        "GRAPH_SUMMARY_COMPLETION",
                        "INSIGHTS",
                        "NATURAL_LANGUAGE",
                        "RAG_COMPLETION",
                        "SUMMARIES",
                    ],
                },
            },
        },
    },
    {
        "type": "function",
        "name": "cognify",
        "description": "Convert text into a knowledge graph or process all added content",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Text content to be converted into a knowledge graph",
                },
                "graph_model_name": {
                    "type": "string",
                    "description": "Name of the graph model to use",
                },
                "graph_model_file": {
                    "type": "string",
                    "description": "Path to a custom graph model file",
                },
            },
        },
    },
    {
        "type": "function",
        "name": "prune",
        "description": "Remove unnecessary or outdated information from the knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "prune_strategy": {
                    "type": "string",
                    "enum": ["light", "moderate", "aggressive"],
                    "description": "Strategy for pruning the knowledge graph",
                    "default": "moderate",
                },
                "min_confidence": {
                    "type": "number",
                    "description": "Minimum confidence score to retain (0-1)",
                    "minimum": 0,
                    "maximum": 1,
                },
                "older_than": {
                    "type": "string",
                    "description": "ISO date string - prune nodes older than this date",
                },
            },
        },
    },
]
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/default_tools.py within lines 1 to 86, update
the enum values for the "search_type" property inside the "search" tool's
parameters to exactly match the SearchType enum values defined elsewhere.
Replace the current enum list ["INSIGHTS", "CODE", "GRAPH_COMPLETION",
"SEMANTIC", "NATURAL_LANGUAGE"] with ["CHUNKS", "CODE", "CYPHER",
"GRAPH_COMPLETION", "GRAPH_SUMMARY_COMPLETION", "INSIGHTS", "NATURAL_LANGUAGE",
"RAG_COMPLETION", "SUMMARIES"] and remove "SEMANTIC" since it is not part of the
SearchType enum.
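
One way to keep this from drifting again is to derive the JSON-schema enum from the SearchType enum itself rather than hardcoding the strings. The sketch below is only illustrative: the SearchType definition is inlined as a stand-in, and the real import path in the codebase is assumed rather than quoted here.

from enum import Enum


# Stand-in for the existing SearchType enum; in the codebase this would be an
# import (path assumed, not verified), e.g.:
# from cognee.modules.search.types import SearchType
class SearchType(Enum):
    CHUNKS = "CHUNKS"
    CODE = "CODE"
    CYPHER = "CYPHER"
    GRAPH_COMPLETION = "GRAPH_COMPLETION"
    GRAPH_SUMMARY_COMPLETION = "GRAPH_SUMMARY_COMPLETION"
    INSIGHTS = "INSIGHTS"
    NATURAL_LANGUAGE = "NATURAL_LANGUAGE"
    RAG_COMPLETION = "RAG_COMPLETION"
    SUMMARIES = "SUMMARIES"


# Build the JSON-schema enum from the Python enum so the two cannot diverge.
SEARCH_TYPE_VALUES = sorted(member.name for member in SearchType)

SEARCH_TOOL_PARAMETERS = {
    "type": "object",
    "properties": {
        "search_query": {
            "type": "string",
            "description": "The query to search for in the knowledge graph",
        },
        "search_type": {
            "type": "string",
            "description": "Type of search to perform",
            "enum": SEARCH_TYPE_VALUES,
        },
    },
    "required": ["search_query"],
}

With this pattern, adding a new member to SearchType automatically shows up in the tool schema.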

<!-- .github/pull_request_template.md -->

## Description
Adds modal parallel evaluation for retriever development

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
cognee/eval_framework/metrics_dashboard.py (1)

131-170: Docstring is now stale – update it to reflect the new return value

create_dashboard used to return a file path, but it now returns the full HTML string. The docstring still says “Create and save the dashboard with all visualizations.” without mentioning the HTML return, which can mislead downstream users and static-type checkers.

@@
-    """Create and save the dashboard with all visualizations."""
+    """
+    Create the dashboard, write it to `output_file`, and return the
+    HTML string.
+
+    Returns
+    -------
+    str
+        Complete dashboard HTML that was written to disk.
+    """
cognee/eval_framework/eval_config.py (1)

45-47: Document the new instance_filter field

The new optional list is great, but nothing in the class docstring (nor inline comment) explains its purpose or expected values. A brief description will help other contributors understand how to use it and avoid passing malformed data.

-    instance_filter: Optional[List[str]] = None
+#   instance_filter
+#   ---------------
+#   Optional list of instance IDs/names that should be evaluated.  When
+#   provided, only these instances will be sent to the corpus builder.
+    instance_filter: Optional[List[str]] = None
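
If EvalConfig is a pydantic-style model (an assumption; the base class is not visible in this hunk), the same intent can be captured with Field so the description travels with the attribute. A minimal sketch:

from typing import List, Optional

from pydantic import BaseModel, Field


class EvalConfig(BaseModel):
    # When set, only these instance IDs/names are passed to the corpus builder;
    # None means "evaluate everything".
    instance_filter: Optional[List[str]] = Field(
        default=None,
        description="Optional list of instance IDs/names to evaluate.",
    )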
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8f7877 and 5c36a5d.

📒 Files selected for processing (4)
  • Dockerfile_modal (1 hunks)
  • cognee/eval_framework/eval_config.py (3 hunks)
  • cognee/eval_framework/metrics_dashboard.py (1 hunks)
  • cognee/eval_framework/modal_run_eval.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
cognee/eval_framework/modal_run_eval.py

5-5: hashlib imported but unused

Remove unused import: hashlib

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: End-to-End Tests / Run Telemetry Test
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: End-to-End Tests / Deletion Test
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: End-to-End Tests / Server Start Test
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: Basic Tests / Run Basic Graph Tests
  • GitHub Check: Basic Tests / Run Simple Examples
  • GitHub Check: Publish Cognee Docker image
🔇 Additional comments (3)
Dockerfile_modal (3)

21-21: Good optimization: Consolidated file copying.

Including README.md in the initial COPY command streamlines the Docker build process by reducing the number of layers and eliminating a redundant copy operation that was likely present in the previous version.


25-26: Good practice: Disabled Poetry virtualenvs in Docker.

Disabling Poetry's virtual environment creation is the recommended approach when running in a Docker container, as containers already provide isolation. This eliminates unnecessary nesting of environments and potential path-related issues.


27-27:


❓ Verification inconclusive

Good optimization: Explicit dependency installation.

Specifying only the required extras (neo4j and qdrant) instead of installing all extras makes the Dockerfile more explicit and reduces the image size. The --no-root flag is appropriate since the application code is copied in a later step.


🏁 Script executed:

#!/bin/bash
# Verify that only neo4j and qdrant extras are needed for the modal environment
grep -r "RUN_MODE=modal" cognee/

Length of output: 32


🏁 Script executed:

#!/bin/bash
# Locate Dockerfile_modal anywhere in the repo
echo "Finding Dockerfile_modal paths:"
file_paths=$(fd Dockerfile_modal || true)
echo "$file_paths"

# If found, show context around the install line
if [ -n "$file_paths" ]; then
  for f in $file_paths; do
    echo -e "\n--- Context in $f (lines 1–60) ---"
    sed -n '1,60p' "$f"
  done
else
  echo "No Dockerfile_modal found via fd."
fi

# Search for RUN_MODE=modal usage globally
echo -e "\n--- Searching for RUN_MODE=modal ---"
grep -R "RUN_MODE=modal" -n . || echo "No occurrences of RUN_MODE=modal"

# Search for any 'modal' references to see how the modal environment is triggered
echo -e "\n--- Searching for 'modal' keyword occurrences ---"
grep -R "modal" -n . || echo "No 'modal' references found"

Length of output: 2298


🏁 Script executed:

#!/bin/bash
# 1. Display the extras defined in pyproject.toml
echo "=== tool.poetry.extras in pyproject.toml ==="
sed -n '/^\[tool\.poetry\.extras\]/,/^\[/{p}' pyproject.toml || echo "Error: pyproject.toml not found"

# 2. Search code for direct usage of neo4j and qdrant
echo -e "\n=== Searching for neo4j usage in cognee/ ==="
grep -R "neo4j" -n cognee/ || echo "No neo4j references found"

echo -e "\n=== Searching for qdrant usage in cognee/ ==="
grep -R "qdrant" -n cognee/ || echo "No qdrant references found"

# 3. Gather all extras keys and check for imports of each
echo -e "\n=== Checking for other extras usage ==="
extras=$(sed -n '/^\[tool\.poetry\.extras\]/,/^\[/{/=/p}' pyproject.toml \
         | sed -E 's/^\s*"?([a-zA-Z0-9_-]+)"?\s*=.*/\1/') || extras=""
for ex in $extras; do
  echo -e "\n-- Extra key: $ex"
  grep -R "import $ex" -n cognee/ && continue
  grep -R "$ex\." -n cognee/ || echo "No import matches for $ex"
done

Length of output: 7059


🏁 Script executed:

#!/bin/bash
# Locate pyproject.toml and poetry.lock in the repo
echo "=== Locating pyproject.toml ==="
fd pyproject.toml || echo "pyproject.toml not found"

echo -e "\n=== Locating poetry.lock ==="
fd poetry.lock || echo "poetry.lock not found"

# If pyproject.toml is found, display the '[tool.poetry.extras]' section
for file in $(fd pyproject.toml); do
  echo -e "\n--- Extras in $file ---"
  sed -n '/^\[tool\.poetry\.extras\]/,/^\[/{p}' "$file" || echo "Failed to read extras"
done

Length of output: 203


Verify Poetry extras in Dockerfile_modal

RUN poetry install --extras neo4j --extras qdrant --no-root
  • Ensure that this command includes all extras defined under [tool.poetry.extras] in your pyproject.toml that are required by the modal environment.
  • If there are additional extras used by cognee/eval_framework/modal_run_eval.py or other modal‐related code, add them here to avoid missing dependencies.

Comment on lines +5 to 7
import hashlib
import json
from cognee.shared.logging_utils import get_logger
Contributor


⚠️ Potential issue

Remove unused hashlib import

hashlib is imported but never referenced, and Ruff flags this (F401).

-import hashlib
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import hashlib
import json
from cognee.shared.logging_utils import get_logger
import json
from cognee.shared.logging_utils import get_logger
🧰 Tools
🪛 Ruff (0.11.9)

5-5: hashlib imported but unused

Remove unused import: hashlib

(F401)

🤖 Prompt for AI Agents
In cognee/eval_framework/modal_run_eval.py around lines 5 to 7, the hashlib
module is imported but not used anywhere in the code, causing an unused import
warning. Remove the line importing hashlib to clean up the code and resolve the
F401 warning flagged by Ruff.

Comment on lines +84 to +97
if eval_params.get("dashboard"):
logger.info("Generating dashboard...")
html_output = create_dashboard(
metrics_path=eval_params["metrics_path"],
aggregate_metrics_path=eval_params["aggregate_metrics_path"],
output_file=eval_params["dashboard_path"],
benchmark=eval_params["benchmark"],
)
return None

return read_and_combine_metrics(eval_params)
with open("/data/" + html_filename, "w") as f:
f.write(html_output)
vol.commit()

logger.info("Evaluation set finished...")
Contributor


⚠️ Potential issue

html_output may be undefined when dashboard is False

html_output is created only inside the if eval_params.get("dashboard"): block, but the write/commit logic that uses it is unconditionally executed right after the block.
For runs where dashboard=False, this will raise UnboundLocalError.

-    if eval_params.get("dashboard"):
-        logger.info("Generating dashboard...")
-        html_output = create_dashboard(
-            metrics_path=eval_params["metrics_path"],
-            aggregate_metrics_path=eval_params["aggregate_metrics_path"],
-            output_file=eval_params["dashboard_path"],
-            benchmark=eval_params["benchmark"],
-        )
-
-    with open("/data/" + html_filename, "w") as f:
-        f.write(html_output)
-    vol.commit()
+    if eval_params.get("dashboard"):
+        logger.info("Generating dashboard...")
+        html_output = create_dashboard(
+            metrics_path=eval_params["metrics_path"],
+            aggregate_metrics_path=eval_params["aggregate_metrics_path"],
+            output_file=eval_params["dashboard_path"],
+            benchmark=eval_params["benchmark"],
+        )
+
+        with open("/data/" + html_filename, "w") as f:
+            f.write(html_output)
+        vol.commit()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if eval_params.get("dashboard"):
logger.info("Generating dashboard...")
html_output = create_dashboard(
metrics_path=eval_params["metrics_path"],
aggregate_metrics_path=eval_params["aggregate_metrics_path"],
output_file=eval_params["dashboard_path"],
benchmark=eval_params["benchmark"],
)
return None
return read_and_combine_metrics(eval_params)
with open("/data/" + html_filename, "w") as f:
f.write(html_output)
vol.commit()
logger.info("Evaluation set finished...")
if eval_params.get("dashboard"):
logger.info("Generating dashboard...")
html_output = create_dashboard(
metrics_path=eval_params["metrics_path"],
aggregate_metrics_path=eval_params["aggregate_metrics_path"],
output_file=eval_params["dashboard_path"],
benchmark=eval_params["benchmark"],
)
with open("/data/" + html_filename, "w") as f:
f.write(html_output)
vol.commit()
logger.info("Evaluation set finished...")
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_run_eval.py around lines 84 to 97, the variable
html_output is only defined inside the if block checking
eval_params.get("dashboard"), but it is used unconditionally afterward, causing
an UnboundLocalError when dashboard is False. To fix this, move the file writing
and vol.commit() calls inside the if block so they only execute when html_output
is defined, or add a conditional check before using html_output to ensure it is
defined.
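
If nesting the write inside the branch is not desirable, an explicit guard achieves the same thing. A rough sketch, assuming html_filename, vol, create_dashboard, and logger exist as in the surrounding file:

html_output = None

if eval_params.get("dashboard"):
    logger.info("Generating dashboard...")
    html_output = create_dashboard(
        metrics_path=eval_params["metrics_path"],
        aggregate_metrics_path=eval_params["aggregate_metrics_path"],
        output_file=eval_params["dashboard_path"],
        benchmark=eval_params["benchmark"],
    )

# Only write and commit when a dashboard was actually generated.
if html_output is not None:
    with open("/data/" + html_filename, "w") as f:
        f.write(html_output)
    vol.commit()

logger.info("Evaluation set finished...")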

<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
- set the parallel option to None in Fastembed's embedding function

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py (1)

47-47:


❓ Verification inconclusive

Improved parallelism handling for text embeddings.

Changing the parallel parameter from 0 to None likely allows the fastembed library to use its default parallelism strategy rather than explicitly disabling it. This should improve embedding generation performance, especially for larger batches of text, by allowing the library to optimize resource allocation based on the system capabilities.

Consider adding a brief comment explaining this performance optimization choice for future maintainers:

embeddings = self.embedding_model.embed(
    text,
    batch_size=len(text),
-    parallel=None,
+    parallel=None,  # Use default parallelism strategy for optimal performance
)

🏁 Script executed:

#!/bin/bash
# Check fastembed documentation for parallel parameter
rg -A 5 -B 5 "parallel" --glob "*fastembed*" 

Length of output: 193


Manual verification required: Confirm parallel=None behavior in fastembed

I wasn’t able to find any references to the parallel parameter in the codebase or fastembed docs via ripgrep, so please double-check that passing parallel=None indeed defers to fastembed’s default parallelism (rather than disabling it) before relying on it for performance.

Suggested update once confirmed:

embeddings = self.embedding_model.embed(
    text,
    batch_size=len(text),
-    parallel=None,
+    parallel=None,  # Use fastembed’s default parallelism for optimal performance
)
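
A quick local check, assuming the installed fastembed release exposes TextEmbedding (the entry-point name may differ between versions), is to inspect how embed() documents the parallel parameter:

import inspect

from fastembed import TextEmbedding  # assumed public entry point

# Print the signature and docstring of embed() to see how `parallel` is
# described in the installed version (None vs 0 semantics).
print(inspect.signature(TextEmbedding.embed))
print(inspect.getdoc(TextEmbedding.embed))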
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c36a5d and 9d9ea63.

📒 Files selected for processing (1)
  • cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: End-to-End Tests / Deletion Test
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: Basic Tests / Run Basic Graph Tests
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: Basic Tests / Run Simple Examples
  • GitHub Check: Basic Tests / Run Integration Tests
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: Publish Cognee Docker image

<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
cognee/shared/logging_utils.py (1)

326-326: Add comment explaining the SQLAlchemy warning suppression change

Changed from suppressing SQLAlchemy warnings only when log_level > WARNING to suppressing them when log_level > DEBUG. Consider adding a comment explaining why SQLAlchemy warnings are now suppressed at INFO and WARNING levels.

-    if log_level > logging.DEBUG:
+    # Suppress SQLAlchemy warnings at INFO level and above to reduce noise in logs
+    # while still showing them at DEBUG level for detailed troubleshooting
+    if log_level > logging.DEBUG:
        import warnings
        from sqlalchemy.exc import SAWarning
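
For context, the usual shape of such a suppression block is sketched below; this is illustrative only, since the actual filterwarnings call is not part of this hunk:

import logging
import warnings

from sqlalchemy.exc import SAWarning

log_level = logging.INFO  # example value

if log_level > logging.DEBUG:
    # Hide SAWarning noise at INFO and above; it stays visible under DEBUG.
    warnings.filterwarnings("ignore", category=SAWarning)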
🧹 Nitpick comments (2)
cognee/shared/logging_utils.py (1)

7-12: Clean up unused import

The importlib.metadata import is not being used in this file and should be removed.

import os
import sys
import threading
import logging
import structlog
import traceback
import platform
from datetime import datetime
from pathlib import Path
-import importlib.metadata

from cognee import __version__ as cognee_version
🧰 Tools
🪛 Ruff (0.11.9)

10-10: importlib.metadata imported but unused

Remove unused import: importlib.metadata

(F401)

cognee/version.py (1)

7-24: Great implementation of version detection logic

The function provides a robust way to determine the package version in both development and installed environments.

Consider simplifying the nested with statements:

def get_cognee_version() -> str:
    """Returns either the version of installed cognee package or the one
    found in nearby pyproject.toml"""
    with suppress(FileNotFoundError, StopIteration):
-        with open(
-            os.path.join(Path(__file__).parent.parent, "pyproject.toml"), encoding="utf-8"
-        ) as pyproject_toml:
+        with open(os.path.join(Path(__file__).parent.parent, "pyproject.toml"), 
+                 encoding="utf-8") as pyproject_toml:
            version = (
                next(line for line in pyproject_toml if line.startswith("version"))
                .split("=")[1]
                .strip("'\"\n ")
            )
            # Mark the version as a local Cognee library by appending "-dev"
            return f"{version}-dev"
🧰 Tools
🪛 Ruff (0.11.9)

10-13: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)
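
For reference, the overall shape of such a helper could look like the sketch below; the importlib.metadata fallback for the installed-package case is an assumption here, not a quote of cognee/version.py:

import importlib.metadata
from contextlib import suppress
from pathlib import Path


def get_version(package: str = "cognee") -> str:
    """Return the local checkout version (pyproject.toml) or the installed
    distribution version."""
    with suppress(FileNotFoundError, StopIteration):
        pyproject = Path(__file__).parent.parent / "pyproject.toml"
        with open(pyproject, encoding="utf-8") as pyproject_toml:
            version = (
                next(line for line in pyproject_toml if line.startswith("version"))
                .split("=")[1]
                .strip("'\"\n ")
            )
            # Mark the version as a local development checkout.
            return f"{version}-dev"
    return importlib.metadata.version(package)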

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d9ea63 and 4c52ef6.

📒 Files selected for processing (3)
  • cognee/__init__.py (1 hunks)
  • cognee/shared/logging_utils.py (4 hunks)
  • cognee/version.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
cognee/__init__.py (1)
cognee/version.py (1)
  • get_cognee_version (7-24)
🪛 Ruff (0.11.9)
cognee/version.py

10-13: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)

cognee/shared/logging_utils.py

10-10: importlib.metadata imported but unused

Remove unused import: importlib.metadata

(F401)

🔇 Additional comments (3)
cognee/__init__.py (1)

1-5: Good addition of version tracking!

Adding a module-level __version__ variable is a good practice for Python packages. The comment about circular imports is helpful and explains why the version extraction needs to be at the top of the file.

cognee/shared/logging_utils.py (2)

43-48: Good addition of version tracking constants

Adding these version constants enables consistent tracking across logs, which is valuable for debugging and support.


340-351: Excellent enhancement to logging initialization

Adding system and version information to the logs is very valuable for troubleshooting and understanding the environment where issues occur.

<!-- .github/pull_request_template.md -->

## Description
Adds dashboard application to parallel modal evals to enable fast
retriever development/evaluation

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: lxobr <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
cognee/eval_framework/modal_eval_dashboard.py (5)

73-75: Add protection against division by zero.

While it's unlikely in this context, it's good practice to add protection against division by zero.

-                "avg_EM": round(total_em / num_q, 4),
-                "avg_F1": round(total_f1 / num_q, 4),
-                "avg_correctness": round(total_corr / num_q, 4),
+                "avg_EM": round(total_em / num_q, 4) if num_q > 0 else 0,
+                "avg_F1": round(total_f1 / num_q, 4) if num_q > 0 else 0,
+                "avg_correctness": round(total_corr / num_q, 4) if num_q > 0 else 0,

84-91: Consider adding interactive visualizations for better insights.

The current dashboard shows only tabular data. Consider enhancing it with interactive charts for better visualization of the metrics.

You could add bar charts or line charts to compare metrics across different benchmarks:

import plotly.express as px

# After creating the DataFrame
if not df.empty:
    # Create visualizations
    st.subheader("Metrics Visualization")
    fig = px.bar(
        df, 
        x="file", 
        y=["avg_EM", "avg_F1", "avg_correctness"],
        barmode="group",
        title="Average Metrics by File"
    )
    st.plotly_chart(fig, use_container_width=True)
    
    # Original tabular display
    st.subheader("Results by benchmark")
    # Rest of the code...

11-11: Consider configuring volume size and persistence options.

The current volume configuration doesn't specify size limits or persistence options, which might lead to unexpected behavior in production.

-metrics_volume = modal.Volume.from_name("evaluation_dashboard_results", create_if_missing=True)
+# Configure volume with appropriate size and persistence settings
+metrics_volume = modal.Volume.from_name(
+    "evaluation_dashboard_results",
+    create_if_missing=True,
+    size_mb=1024,  # Adjust based on expected data volume
+    persistent=True  # Ensure data persists between app deployments
+)

Note: Verify the Modal API documentation for the correct parameters as they may have changed since my last update.


41-41: Use subprocess.run instead of Popen for better process management.

Using subprocess.Popen without managing the process can lead to zombie processes if the parent process exits unexpectedly.

-    subprocess.Popen(cmd, shell=True)
+    # Using run() with shell=True is sufficient for this use case
+    # since Modal will manage the container lifecycle
+    subprocess.run(cmd, shell=True)

1-7: Organize imports according to PEP 8 conventions.

The imports could be organized more clearly by grouping them into standard library imports, third-party imports, and local imports.

-import os
-import json
-import pandas as pd
-import subprocess
-import modal
-import streamlit as st
+# Standard library imports
+import json
+import os
+import subprocess
+
+# Third-party imports
+import modal
+import pandas as pd
+import streamlit as st
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c52ef6 and 7eee769.

📒 Files selected for processing (1)
  • cognee/eval_framework/modal_eval_dashboard.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: Publish Cognee Docker image
🔇 Additional comments (1)
cognee/eval_framework/modal_eval_dashboard.py (1)

36-40:


❓ Verification inconclusive

Security concern: CORS and XSRF protection disabled.

Your Streamlit configuration disables both CORS and XSRF protection, which could expose the dashboard to cross-site attacks. While this might be convenient for development, it's not recommended for production environments.

Consider enabling these protections if the dashboard will be accessible outside a secure network:


🏁 Script executed:

#!/bin/bash
# Check if this is a development or staging environment
grep -r "development\|staging" . --include="*.py" --include="*.yaml" --include="*.json" | head -n 10

Length of output: 567


Security hardening: re-enable CORS and XSRF protection for production

Disabling both CORS and XSRF protection:

streamlit run /root/serve_dashboard.py \
  --server.port 8000 \
  --server.enableCORS=false \
  --server.enableXsrfProtection=false

may be fine for local development, but if this dashboard is ever exposed beyond a locked-down network, you’ll want to turn these protections back on.

• Confirm whether this service is strictly for local/dev use only or if it will run in staging/production.
• If it must remain open for external users, gate these flags behind an environment check (e.g. ENV != "production").
• Example pattern:

-  streamlit run /root/serve_dashboard.py \
-    --server.enableCORS=false \
-    --server.enableXsrfProtection=false
+  if [ "$ENV" = "production" ]; then
+    streamlit run /root/serve_dashboard.py \
+      --server.enableCORS=true \
+      --server.enableXsrfProtection=true
+  else
+    streamlit run /root/serve_dashboard.py \
+      --server.enableCORS=false \
+      --server.enableXsrfProtection=false
+  fi

Ensure you document and enforce the correct ENV value in your deployment pipeline.
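
If the command is assembled in Python inside the Modal entrypoint, the same gating can live there instead of the shell. A hypothetical sketch (function and variable names are illustrative):

import os


def build_streamlit_cmd(script: str = "/root/serve_dashboard.py", port: int = 8000) -> str:
    # Harden only outside local/dev; ENV is an assumed deployment variable.
    hardened = os.environ.get("ENV") == "production"
    flags = [
        f"--server.port {port}",
        f"--server.enableCORS={'true' if hardened else 'false'}",
        f"--server.enableXsrfProtection={'true' if hardened else 'false'}",
    ]
    return f"streamlit run {script} " + " ".join(flags)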

        total_corr = sum(q["metrics"]["correctness"]["score"] for q in items)
        records.append(
            {
                "file": parts[0].upper() + "_____" + parts[2],
Contributor


⚠️ Potential issue

Prevent potential IndexError in filename parsing.

If parts has fewer than 3 elements, accessing parts[2] would raise an IndexError.

-                "file": parts[0].upper() + "_____" + parts[2],
+                "file": parts[0].upper() + "_____" + (parts[2] if len(parts) > 2 else ""),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"file": parts[0].upper() + "_____" + parts[2],
- "file": parts[0].upper() + "_____" + parts[2],
+ "file": parts[0].upper() + "_____" + (parts[2] if len(parts) > 2 else ""),
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_eval_dashboard.py at line 70, the code accesses
parts[2] without checking if parts has at least 3 elements, which can cause an
IndexError. Add a condition to verify the length of parts before accessing
parts[2], and handle cases where parts has fewer than 3 elements safely, such as
by providing a default value or skipping the operation.

Comment on lines +65 to +67
        total_em = sum(q["metrics"]["EM"]["score"] for q in items)
        total_f1 = sum(q["metrics"]["f1"]["score"] for q in items)
        total_corr = sum(q["metrics"]["correctness"]["score"] for q in items)
Contributor


🛠️ Refactor suggestion

Add exception handling for JSON structure inconsistencies.

The code assumes a specific structure in the JSON files and would raise errors if the expected fields are missing.

-        total_em = sum(q["metrics"]["EM"]["score"] for q in items)
-        total_f1 = sum(q["metrics"]["f1"]["score"] for q in items)
-        total_corr = sum(q["metrics"]["correctness"]["score"] for q in items)
+        try:
+            total_em = sum(q.get("metrics", {}).get("EM", {}).get("score", 0) for q in items)
+            total_f1 = sum(q.get("metrics", {}).get("f1", {}).get("score", 0) for q in items)
+            total_corr = sum(q.get("metrics", {}).get("correctness", {}).get("score", 0) for q in items)
+        except Exception as e:
+            st.warning(f"Error processing metrics in {filename}: {str(e)}")
+            continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        total_em = sum(q["metrics"]["EM"]["score"] for q in items)
        total_f1 = sum(q["metrics"]["f1"]["score"] for q in items)
        total_corr = sum(q["metrics"]["correctness"]["score"] for q in items)
        try:
            total_em = sum(q.get("metrics", {}).get("EM", {}).get("score", 0) for q in items)
            total_f1 = sum(q.get("metrics", {}).get("f1", {}).get("score", 0) for q in items)
            total_corr = sum(q.get("metrics", {}).get("correctness", {}).get("score", 0) for q in items)
        except Exception as e:
            st.warning(f"Error processing metrics in {filename}: {str(e)}")
            continue
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_eval_dashboard.py around lines 65 to 67, the code
sums scores assuming all JSON items have the expected nested keys, which can
cause errors if keys are missing. Add exception handling such as try-except
blocks or use dict.get() with default values to safely access nested keys and
handle missing fields gracefully without raising exceptions.

Comment on lines +55 to +59
        if not filename.endswith(".json"):
            continue
        base = filename.rsplit(".", 1)[0]
        parts = base.split("_")
        benchmark = parts[1] if len(parts) >= 3 else ""
Contributor


🛠️ Refactor suggestion

Add robust error handling for filename parsing.

The current filename parsing logic assumes a specific format with underscores as separators. This could be fragile if filenames don't follow the expected pattern.

-        base = filename.rsplit(".", 1)[0]
-        parts = base.split("_")
-        benchmark = parts[1] if len(parts) >= 3 else ""
+        try:
+            base = filename.rsplit(".", 1)[0]
+            parts = base.split("_")
+            benchmark = parts[1] if len(parts) >= 3 else ""
+        except (IndexError, ValueError) as e:
+            st.warning(f"Skipping file {filename} due to unexpected format: {str(e)}")
+            continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        if not filename.endswith(".json"):
            continue
        base = filename.rsplit(".", 1)[0]
        parts = base.split("_")
        benchmark = parts[1] if len(parts) >= 3 else ""
        if not filename.endswith(".json"):
            continue
        try:
            base = filename.rsplit(".", 1)[0]
            parts = base.split("_")
            benchmark = parts[1] if len(parts) >= 3 else ""
        except (IndexError, ValueError) as e:
            st.warning(f"Skipping file {filename} due to unexpected format: {str(e)}")
            continue
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_eval_dashboard.py around lines 55 to 59, the
filename parsing assumes filenames have at least three underscore-separated
parts, which can cause errors if the format is unexpected. Add robust error
handling by checking the length of parts before accessing parts[1], and handle
cases where the filename format is invalid, such as by skipping those files or
logging a warning, to prevent crashes or incorrect behavior.

<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.