
Conversation

@borisarzentar

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Vasilije1990 and others added 30 commits April 18, 2025 16:31
Each of these commit messages carries the standard Topoteretes DCO affirmation; their descriptions are:

  • Resolve issue with .venv being broken when using docker compose with Cognee (co-authored by Boris Arzentar)
  • … 1947 (#760) — no description (co-authored by Boris and Igor Ilic)
  • Add support for UV and for Poetry package management
  • Switch typing from str to UUID for NetworkX node_id
  • Add both sse and stdio support for Cognee MCP
  • …83] (#782) — Add log handling options for cognee exceptions
  • Fix issue with failing versions gh actions
  • No description (co-authored by Vasilije)
  • No description
  • No description (co-authored by Vasilije)
  • No description (co-authored by Hande and Vasilije)
  • No description (co-authored by Hande and Vasilije)
  • Add support for the Memgraph graph database following the [graph database integration guide](https://docs.cognee.ai/contributing/adding-providers/graph-db/graph-database-integration): implement `MemgraphAdapter`, update `get_graph_engine.py` to return MemgraphAdapter when appropriate, add a test script `test_memgraph.py`, and create a dedicated test workflow `.github/workflows/test_memgraph.yml` (co-authored by Vasilije and Boris)
  • refactor: Handle boto3 s3fs dependencies better
  • No description
  • Update LanceDB and rewrite data points to run async (co-authored by Boris and Boris Arzentar)
  • No description
  • No description
  • Add a short demo, as discussed with @hande-k and Lazar, illustrating how to get the PageRank rankings from the knowledge graph with the NetworkX engine; a POC and a first step towards solving #643 (co-authored by Boris, Hande, and Vasilije)
  • Added tools to check current cognify and codify status

Vasilije1990 and others added 23 commits May 19, 2025 13:16
  • No description
  • …exist case — Fixes pipeline run status migration
  • Fixes graph completion limit
  • Adds modal parallel evaluation for retriever development
  • Set the parallel option to None in Fastembed's embedding function
  • No description (co-authored by Igor Ilic)
  • Adds dashboard application to parallel modal evals to enable fast retriever development/evaluation (co-authored by lxobr)
  • No description
  • Removes hardcoded user prompts from adapters (co-authored by lxobr)
  • Adds chain of thought retriever
  • Adds context extension search
  • No description (co-authored by Igor Ilic)
  • No description (co-authored by Igor Ilic)
  • Add info about installing Cognee locally
  • Adds subgraph retriever to graph based completion searches
  • Removes ontology resolver initialization at import
  • No description
  • No description (co-authored by Vasilije)
  • No description
  • Removes graph metrics calculation from dynamic steps and ontology demos
  • Removes unused properties from node and edge pydantic models (co-authored by Boris)
  • No description
  • No description

@borisarzentar borisarzentar self-assigned this May 30, 2025
@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@coderabbitai

coderabbitai bot commented May 30, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This update introduces major enhancements across the codebase, including support for new graph and vector database providers, expanded retriever and search functionality with node type and name filtering, new retriever classes, OpenAI-compatible API endpoints, improved pipeline execution with context propagation, and comprehensive documentation. Numerous bug fixes, test additions, and code refactoring are also included.
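One of the headline changes is dynamic adapter registration for graph and vector databases. Here is a short, hypothetical sketch of how a third-party adapter might be plugged in; the `use_vector_adapter` / `use_graph_adapter` entry points and their module paths appear in this PR's file list, while the adapter classes and provider names are placeholders:

```python
# Hypothetical usage sketch of the new adapter-registration hooks.
# The module paths and function names come from this PR's file list;
# MyVectorAdapter / MyGraphAdapter and the provider names are placeholders.
from cognee.infrastructure.databases.vector.use_vector_adapter import use_vector_adapter
from cognee.infrastructure.databases.graph.use_graph_adapter import use_graph_adapter


class MyVectorAdapter:
    """Placeholder class standing in for an adapter implementing the vector DB interface."""


class MyGraphAdapter:
    """Placeholder class standing in for an adapter implementing the graph DB interface."""


# Register the adapters under provider names so the engine factories
# (create_vector_engine / get_graph_engine) can resolve them dynamically.
use_vector_adapter("my_vector_db", MyVectorAdapter)
use_graph_adapter("my_graph_db", MyGraphAdapter)
```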

Changes

File(s) / Path(s) Change Summary
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (new), supported_databases.py, use_graph_adapter.py, cognee/infrastructure/databases/graph/get_graph_engine.py, neo4j_driver/adapter.py, kuzu/adapter.py, networkx/adapter.py Added MemgraphAdapter; introduced adapter registration mechanism for graph DBs; updated graph engine creation to support dynamic adapters; added get_nodeset_subgraph method and extensive docstrings.
cognee/infrastructure/databases/vector/supported_databases.py (new), use_vector_adapter.py, create_vector_engine.py, chromadb/ChromaDBAdapter.py, lancedb/LanceDBAdapter.py, milvus/MilvusAdapter.py, pgvector/PGVectorAdapter.py, qdrant/QDrantAdapter.py, vector_db_interface.py, weaviate_db/WeaviateAdapter.py Added vector DB adapter registration; updated vector engine creation for dynamic adapter support; unified search API; improved error handling; removed deprecated methods; added comprehensive docstrings.
cognee/modules/retrieval/graph_completion_cot_retriever.py, graph_completion_context_extension_retriever.py (new), graph_completion_retriever.py, graph_summary_completion_retriever.py, search/methods/search.py, search/types/SearchType.py Added new retriever classes for CoT and context extension; integrated node type/name filtering in retrievers and search; extended SearchType enum; updated search functions and retriever constructors.
cognee/modules/graph/cognee_graph/CogneeGraph.py, modules/retrieval/utils/brute_force_triplet_search.py Enhanced graph projection and triplet search with node type/name filtering; updated vector search integration to use search with limit=0; improved error handling.
cognee/modules/engine/models/ColumnValue.py (new), init.py, tasks/ingestion/migrate_relational_database.py Added ColumnValue entity for representing column values as nodes; updated relational migration to create these nodes; updated module exports.
cognee/modules/pipelines/operations/run_tasks_base.py, run_tasks.py, pipeline.py, operations/log_pipeline_run_initiated.py, operations/get_pipeline_status.py, operations/init.py Extended pipeline and task execution functions to propagate an optional context dictionary; added pipeline run initiation logging; enhanced pipeline status filtering by pipeline name; reordered imports.
cognee/api/v1/responses/init.py, default_tools.py, dispatch_function.py, models.py, routers/init.py, routers/default_tools.py, routers/get_responses_router.py, api/client.py Introduced OpenAI-compatible responses API, including models, routers, default tools, and dispatch logic; added API router to main client.
cognee/modules/observability/observers.py (new), get_observe.py (new), base_config.py Refactored observability/monitoring tool selection using Observer enum and dynamic decorator import; updated default monitoring tool.
cognee/infrastructure/llm/openai/adapter.py, anthropic/adapter.py, gemini/adapter.py, generic_llm_api/adapter.py, ollama/adapter.py, tokenizer/*, llm_interface.py, llm_rate_limiter.py, llm/utils.py, llm/config.py Added/expanded docstrings for LLM adapters, tokenizers, and utility functions; improved error handling and observability integration; switched to async clients where applicable; removed prompt prefixes from user messages.
cognee/modules/data/methods/create_dataset.py, get_unique_dataset_id.py, modules/data/methods/init.py Refactored dataset creation to use user object and unique ID generation function.
cognee/modules/pipelines/models/PipelineRun.py, modules/pipelines/operations/log_pipeline_run_initiated.py Added new pipeline run status and logging function; updated pipeline status tracking and queries.
cognee/modules/visualization/cognee_network_visualization.py Added color mapping for new node type ColumnValue.
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py, graph_completion_retriever_context_extension_test.py, test_memgraph.py, test_kuzu.py, test_neo4j.py, test_weaviate.py, unit/modules/pipelines/run_tasks_with_context_test.py, unit/modules/pipelines/run_tasks_test.py, unit/modules/retrieval/chunks_retriever_test.py, unit/modules/retrieval/graph_completion_retriever_test.py Added and updated integration and unit tests for new DB support, retriever features, and pipeline context propagation; updated test invocation style and paths.
cognee/tasks/ingestion/migrate_relational_database.py Added logic to create ColumnValue nodes for table rows during relational DB migration.
cognee/shared/data_models.py Removed MonitoringTool enum and properties fields from Node/Edge classes.
cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx, CognifyStep/CognifyStep.tsx, WizardPage.tsx, src/modules/datasets/cognifyDataset.ts, exploration/getExplorationGraphUrl.ts, src/ui/Partials/Explorer/Explorer.tsx Updated dataset prop types from id to name; improved dataset handling in frontend modules and functions.
cognee/api/v1/search/search.py, modules/search/methods/search.py Added node_type and node_name parameters to search functions for advanced filtering.
cognee/infrastructure/data/chunking/, files/storage/, tasks/chunks/chunk_by_sentence.py, chunk_by_paragraph.py, chunk_by_word.py Added comprehensive docstrings to chunking, file, and storage modules; expanded and clarified chunking function docstrings.
cognee/infrastructure/databases/graph/graph_db_interface.py, neo4j_driver/neo4j_metrics_utils.py, kuzu/adapter.py, memgraph/memgraph_adapter.py, networkx/adapter.py Added detailed docstrings and new get_nodeset_subgraph method to graph adapters; improved subgraph retrieval and type safety; added Memgraph adapter implementation.
cognee/infrastructure/databases/vector/embeddings/*, vector_db_interface.py, vector/utils.py Added/expanded docstrings for vector DB models, embeddings, and utility functions.
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py Removed deprecated get_distance_from_collection_elements; improved search method; added docstrings.
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py Removed deprecated get_distance_from_collection_elements; centralized collection retrieval; improved async handling.
cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py Removed duplicate search method; added error handling for collection missing; improved docstrings.
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py Refactored to fully async client usage; added batch search; improved error handling and docstrings.
cognee/infrastructure/databases/vector/exceptions/exceptions.py Extended CollectionNotFoundError constructor with logging control parameters.
cognee/infrastructure/databases/relational/SQLAlchemyAdapter.py, create_relational_engine.py, get_migration_relational_engine.py, get_relational_engine.py Added comprehensive docstrings to relational DB adapter and engine creation functions.
cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py Added comprehensive docstrings to FalkorDB adapter and methods.
cognee/infrastructure/files/storage/LocalStorage.py, StorageManager.py, add_file_to_storage.py, remove_file_from_storage.py Added detailed docstrings to local storage management classes and functions.
cognee/infrastructure/files/utils/extract_text_from_file.py, guess_file_type.py, is_text_content.py, get_file_metadata.py Added detailed docstrings to file utility functions and classes.
cognee/infrastructure/engine/models/DataPoint.py, ExtendableDataPoint.py Added detailed docstrings to DataPoint and ExtendableDataPoint classes and methods.
cognee/infrastructure/engine/utils/parse_id.py Added docstring to parse_id function.
cognee/infrastructure/entities/BaseEntityExtractor.py Expanded docstrings for BaseEntityExtractor and extract_entities method.
cognee/infrastructure/llm/gemini/adapter.py, anthropic/adapter.py, generic_llm_api/adapter.py, ollama/adapter.py, openai/adapter.py Added detailed docstrings; improved observability integration; switched to async clients; removed prompt prefixes.
cognee/infrastructure/llm/tokenizer/* Added detailed docstrings to tokenizer adapters and interface.
cognee/infrastructure/llm/prompts/* Added new prompt templates for chain-of-thought validation and follow-up questions.
cognee/infrastructure/llm/rate_limiter.py, embedding_rate_limiter.py Added detailed docstrings and explanations for rate limiting and retry decorators.
cognee/infrastructure/llm/utils.py Added detailed docstrings to LLM utility functions.
cognee/modules/retrieval/* Added detailed docstrings to retriever classes and methods; added new retrievers; updated exception handling.
cognee/modules/search/types/SearchType.py Added new search types GRAPH_COMPLETION_COT and GRAPH_COMPLETION_CONTEXT_EXTENSION.
cognee/modules/settings/get_settings.py Made LLMConfig fields endpoint and api_version optional.
cognee/modules/graph/utils/expand_with_nodes_and_edges.py Deferred OntologyResolver instantiation; added belongs_to_set attribute to entities.
cognee/modules/observability/get_observe.py, observers.py Added Observer enum and dynamic observe decorator import.
cognee/modules/pipelines/models/PipelineRun.py Added new pipeline run status DATASET_PROCESSING_INITIATED.
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py Added function to log pipeline run initiation.
cognee/modules/pipelines/operations/get_pipeline_status.py Added pipeline_name filter to pipeline status queries.
cognee/modules/pipelines/operations/run_tasks.py, run_tasks_base.py Added support for context parameter propagation in task execution.
cognee/modules/retrieval/graph_completion_context_extension_retriever.py, graph_completion_cot_retriever.py Added new retriever classes implementing context extension and chain-of-thought completion.
cognee/modules/retrieval/graph_completion_retriever.py Added node_type and node_name parameters; improved docstrings and error handling.
cognee/modules/retrieval/graph_summary_completion_retriever.py Added node_type and node_name parameters; enhanced docstrings.
cognee/modules/retrieval/utils/brute_force_triplet_search.py Added node_type and node_name filtering; improved error handling for missing collections.
cognee/modules/search/methods/search.py Added node_type and node_name parameters; integrated new retriever classes.
cognee/modules/engine/models/init.py Added export for new ColumnValue entity.
cognee/modules/engine/models/EntityType.py Added class-level docstring.
cognee/modules/engine/models/node_set.py Removed metadata index_fields from NodeSet class.
cognee/modules/graph/cognee_graph/CogneeGraph.py Added optional node_type and node_name parameters to project_graph_from_db; updated vector engine call.
cognee/modules/data/methods/get_unique_dataset_id.py Added function to generate unique dataset UUID based on name and user.
cognee/modules/data/methods/create_dataset.py Updated to use User object and unique dataset ID generation.
cognee/modules/retrieval/exceptions/init.py Removed import of CollectionDistancesNotFoundError.
cognee/modules/retrieval/exceptions/exceptions.py Removed CollectionDistancesNotFoundError class.
cognee/tasks/chunks/chunk_by_paragraph.py, chunk_by_sentence.py, chunk_by_word.py Expanded and clarified chunking function docstrings.
cognee/tasks/code/enrich_dependency_graph_checker.py, get_repo_dependency_graph_checker.py Added docstrings to main functions.
cognee/tasks/documents/classify_documents.py Expanded docstrings for classification functions.
cognee/tasks/graph/infer_data_ontology.py, models.py Added detailed docstrings for ontology extraction and graph models.
cognee/tasks/ingestion/get_dlt_destination.py Expanded docstring for destination retrieval function.
cognee/tasks/ingestion/ingest_data.py Updated call to create_dataset to pass User object.
cognee/tasks/ingestion/transform_data.py Added docstring to get_data_from_llama_index function.
cognee/tasks/repo_processor/get_local_dependencies.py, get_non_code_files.py, get_repo_file_dependencies.py Added detailed docstrings to classes and functions.
cognee/tasks/summarization/mock_summary.py, models.py, summarize_text.py Added detailed docstrings to summarization classes and functions.
cognee/tasks/temporal_awareness/graphiti_model.py, index_graphiti_objects.py Added docstrings and fixed graph data iteration.
cognee/shared/logging_utils.py Added environment and version info logging at setup; adjusted warning filter level.
cognee/tests/integration/run_toy_tasks/conftest.py (deleted) Removed test fixture copying database for integration tests.
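As a concrete illustration of the new retrieval options listed above, here is a minimal sketch of a search call using the new SearchType values and node filters; the exact `cognee.search` signature and the shape of the filter values are assumptions, not taken verbatim from the diff:

```python
# Hedged sketch of the new search filters and search types added in this PR.
# SearchType.GRAPH_COMPLETION_COT and the node_type / node_name parameters are
# introduced by this PR; the call signature and filter value shapes are assumed.
import asyncio

import cognee
from cognee.modules.search.types.SearchType import SearchType


async def main():
    results = await cognee.search(
        query_type=SearchType.GRAPH_COMPLETION_COT,
        query_text="How do the ingestion pipelines relate to the retrievers?",
        node_type="Entity",    # restrict the projected graph to a node type (assumed value)
        node_name=["cognee"],  # and/or to specific node names (assumed value and shape)
    )
    print(results)


asyncio.run(main())
```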

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant User
  participant API
  participant ResponsesRouter
  participant OpenAI
  participant ToolDispatcher
  participant Retriever
  participant DB

  User->>API: POST /api/v1/responses (input, tools, ...)
  API->>ResponsesRouter: create_response(request)
  ResponsesRouter->>OpenAI: Call responses API (input, tools)
  OpenAI-->>ResponsesRouter: Returns function_call(s)
  loop For each function_call
    ResponsesRouter->>ToolDispatcher: dispatch_function(tool_call)
    alt search
      ToolDispatcher->>Retriever: handle_search(arguments, user)
      Retriever->>DB: search/query (with node_type/node_name)
      DB-->>Retriever: Results
      Retriever-->>ToolDispatcher: Search results
    else cognify/prune
      ToolDispatcher->>DB: handle_cognify/prune(arguments, user)
      DB-->>ToolDispatcher: Status/result
    end
    ToolDispatcher-->>ResponsesRouter: ToolCallOutput
  end
  ResponsesRouter-->>API: ResponseBody (tool_calls, usage, status)
  API-->>User: JSON response
```
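A simplified Python sketch of the tool-call loop the diagram describes; `dispatch_function` and the handlers live under `cognee/api/v1/responses/` in this PR, but the signatures and payload shapes below are assumptions made only to illustrate the flow:

```python
# Hypothetical sketch of the responses-API tool-call loop. Handler names mirror
# the sequence diagram (handle_search, handle_cognify/prune); the stand-in
# implementations and payload shapes are assumptions, not the PR's actual code.
import asyncio


async def handle_search(arguments: dict, user: str) -> str:
    """Stand-in for the PR's search handler (retriever plus DB query)."""
    return f"search results for {arguments.get('query')!r}"


async def handle_cognify_or_prune(name: str, arguments: dict, user: str) -> str:
    """Stand-in for the PR's cognify / prune handlers."""
    return f"{name} completed"


async def run_tool_calls(function_calls: list[dict], user: str) -> list[dict]:
    outputs = []
    for tool_call in function_calls:
        if tool_call["name"] == "search":
            result = await handle_search(tool_call["arguments"], user)
        else:
            result = await handle_cognify_or_prune(tool_call["name"], tool_call["arguments"], user)
        outputs.append({"tool_call_id": tool_call.get("id"), "output": result})
    return outputs


print(asyncio.run(run_tool_calls(
    [{"id": "1", "name": "search", "arguments": {"query": "What is cognee?"}}],
    user="demo-user",
)))
```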

Possibly related PRs

  • topoteretes/cognee#766: Refactors vector database adapters by removing get_distance_from_collection_elements and updating search API, directly related to the main PR’s changes in vector engine integration and search method usage.
  • topoteretes/cognee#788: Adds a context parameter to pipeline task execution functions, matching the main PR’s extension of context propagation through pipeline and task execution (a minimal sketch of that pattern follows this list).
  • topoteretes/cognee#589: Enhances triplet search and context providers with memory projections and entity-based filtering, directly related to the main PR’s introduction of node type and name filtering in retrievers and search logic.
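For the context-propagation pattern referenced above, a minimal sketch assuming `run_tasks` simply threads an optional context dict through to each task; the real signatures in `run_tasks.py` and `run_tasks_base.py` may differ:

```python
# Minimal sketch of context propagation through pipeline tasks, assuming the
# pattern described for run_tasks / run_tasks_base in this PR. Names mirror the
# modules the PR touches; the exact signatures are assumptions.
import asyncio
from typing import Any, Awaitable, Callable, Optional

TaskFn = Callable[..., Awaitable[Any]]


async def run_tasks(tasks: list[TaskFn], data: Any, context: Optional[dict] = None) -> Any:
    # The optional context dict is passed to every task alongside the data it processes.
    for task in tasks:
        data = await task(data, context=context)
    return data


async def tag_with_dataset(data: list[str], context: Optional[dict] = None) -> list[str]:
    dataset = (context or {}).get("dataset", "unknown")
    return [f"{dataset}: {item}" for item in data]


print(asyncio.run(run_tasks([tag_with_dataset], ["chunk-1", "chunk-2"], context={"dataset": "docs"})))
```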

Suggested labels

run-checks

Poem

Oh, what a leap in the Cognee warren,
New graphs and vectors, adapters are darin’!
Retrievers now search by type and by name,
Pipelines pass context, no two runs the same.
🐇 With OpenAI routes and Memgraph in tow,
This bunny’s code garden continues to grow! 🌱

— Your ever-curious CodeRabbit


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@gitguardian

gitguardian bot commented May 30, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 17116131 | Triggered | Generic Password | 3b07f3c | examples/database_examples/neo4j_example.py |

🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
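For step 2, a common remediation for a hardcoded database password like the one flagged in examples/database_examples/neo4j_example.py is to read it from the environment; the sketch below uses placeholder variable names rather than the example file's actual code:

```python
# Hedged sketch: load the Neo4j password from the environment instead of
# hardcoding it in the example script. NEO4J_PASSWORD and the defaults are
# placeholders; the flagged example's actual variable names may differ.
import os

from neo4j import GraphDatabase

NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.environ.get("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.environ["NEO4J_PASSWORD"]  # fail fast if the secret is not provided

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
```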

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.


@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

🛑 Comments failed to post (70)
cognee/infrastructure/llm/prompts/render_prompt.py (1)

6-11: 🛠️ Refactor suggestion

⚠️ Potential issue

Docstring mismatch: function is synchronous, not asynchronous
The summary states “Render a Jinja2 template asynchronously,” but the implementation is purely synchronous. Please update the docstring to reflect the correct behavior.

Proposed diff:

-    """
-    Render a Jinja2 template asynchronously.
+    """
+    Render a Jinja2 template synchronously.
     Set the base directory if not provided, initialize the Jinja2 environment,
     load the specified template, and render it using the provided context.


<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
    """
    Render a Jinja2 template synchronously.

    Set the base directory if not provided, initialize the Jinja2 environment,
    load the specified template, and render it using the provided context.
```
🤖 Prompt for AI Agents
In cognee/infrastructure/llm/prompts/render_prompt.py around lines 6 to 11, the
docstring incorrectly states that the function renders a Jinja2 template
asynchronously, but the implementation is synchronous. Update the docstring
summary to say "Render a Jinja2 template synchronously" to accurately describe
the function's behavior.
cognee/infrastructure/llm/tokenizer/tokenizer_interface.py (1)

12-21: 🛠️ Refactor suggestion

Improve docstring completeness by adding return type documentation.

The docstring is well-structured but missing the return type specification. Based on the relevant code snippets from the adapters, this method returns different types depending on the implementation (token IDs for TikToken, token strings for HuggingFace).

Apply this diff to add the missing return type documentation:

 def extract_tokens(self, text: str) -> List[Any]:
     """
     Extract tokens from the given text.

     Parameters:
     -----------

         - text (str): The input text from which to extract tokens.
+
+    Returns:
+    --------
+
+        - List[Any]: A list of tokens extracted from the input text. The specific 
+          type of tokens depends on the implementation (e.g., token IDs or strings).
     """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    def extract_tokens(self, text: str) -> List[Any]:
        """
        Extract tokens from the given text.

        Parameters:
        -----------

            - text (str): The input text from which to extract tokens.

        Returns:
        --------

            - List[Any]: A list of tokens extracted from the input text. The specific 
              type of tokens depends on the implementation (e.g., token IDs or strings).
        """
        raise NotImplementedError
🤖 Prompt for AI Agents
In cognee/infrastructure/llm/tokenizer/tokenizer_interface.py around lines 12 to
21, the extract_tokens method's docstring lacks a return type description. Add a
Returns section specifying that the method returns a list of tokens, noting that
the exact type may vary by implementation (e.g., token IDs or token strings).
This will complete the docstring and clarify the expected output.
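To see why the return type varies by implementation, as the comment notes for the TikToken and HuggingFace adapters, here is a small standalone sketch using the underlying libraries directly (it does not touch cognee's adapter classes):

```python
# Token extraction returns different element types depending on the backend:
# tiktoken yields integer token IDs, a HuggingFace tokenizer yields token strings.
import tiktoken
from transformers import AutoTokenizer

text = "Cognee builds knowledge graphs."

tiktoken_tokens = tiktoken.get_encoding("cl100k_base").encode(text)
print(tiktoken_tokens)  # a list of ints (token IDs)

hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(hf_tokenizer.tokenize(text))  # a list of strings (subword tokens)
```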
Dockerfile_modal (1)

21-21: 💡 Verification agent

🧩 Analysis chain

Verify README.md necessity in Docker build.

Including README.md in the initial copy step is efficient if it's required for the build process. Please ensure that README.md is actually needed during the Docker build.


🏁 Script executed:

#!/bin/bash
# Check if README.md is referenced in pyproject.toml or used during Poetry installation
echo "Checking if README.md is referenced in pyproject.toml..."
grep -i "readme" pyproject.toml || echo "No readme references found in pyproject.toml"

echo -e "\nChecking for README references in Poetry-related files..."
rg -i "readme" --type toml

Length of output: 459


Let’s locate the actual file paths and check all references before concluding.


🏁 Script executed:

#!/bin/bash
set -eux

# Locate project files
echo "Locating pyproject.toml..."
fd --hidden --max-depth 4 --glob pyproject.toml

echo "Locating Dockerfile_modal..."
fd --hidden --max-depth 4 --glob Dockerfile_modal

# Use the first matches
PY_PATH=$(fd --hidden --max-depth 4 --glob pyproject.toml | head -n1)
DOCKER_PATH=$(fd --hidden --max-depth 4 --glob Dockerfile_modal | head -n1)

echo "pyproject.toml found at: $PY_PATH"
echo "Dockerfile_modal found at: $DOCKER_PATH"

# Check for README references in pyproject.toml
echo "Searching $PY_PATH for README:"
grep -i "readme" "$PY_PATH" || echo "No README references in $PY_PATH"

# List all COPY lines in Dockerfile_modal
echo "Listing COPY instructions in $DOCKER_PATH:"
grep -n "COPY " "$DOCKER_PATH"

# Show context around README.md in Dockerfile_modal
echo "Context around README.md usage:"
grep -n -C3 "README.md" "$DOCKER_PATH" || echo "No further README.md usage found"

Length of output: 1195


Let’s locate the actual pyproject.toml and inspect it for any readme metadata.


🏁 Script executed:

#!/bin/bash
set -eux

# Search for pyproject.toml anywhere in the repo
echo "Searching for pyproject.toml in the repository..."
find . -type f -iname 'pyproject.toml' -print || echo "No pyproject.toml found"

# If found, check for a 'readme' field
echo -e "\nChecking each pyproject.toml for a readme entry:"
find . -type f -iname 'pyproject.toml' -exec grep -H -i '^readme' {} \; || echo "No 'readme' metadata entries found"

Length of output: 353


Remove unnecessary README.md from Dockerfile_modal COPY

The Dockerfile_modal is copying README.md (alongside pyproject.toml and poetry.lock) on line 21, but there’s no readme field in any pyproject.toml and no evidence that README.md is used during the build. Keeping it will bloat your image and may even break the build if the file isn’t present in the context.

• File: Dockerfile_modal
Line 21:

- COPY pyproject.toml poetry.lock README.md /app/
+ COPY pyproject.toml poetry.lock /app/

If you do need README.md at build time (for example, for packaging or documentation), please add a comment in the Dockerfile explaining its usage or adjust its path accordingly.

🤖 Prompt for AI Agents
In Dockerfile_modal at line 21, the COPY instruction includes README.md, but
there is no indication that README.md is required during the build process or
referenced in pyproject.toml. To fix this, remove README.md from the COPY
command to avoid unnecessary image bloat and potential build errors if the file
is missing. If README.md is actually needed, add a comment explaining its
purpose or adjust the path accordingly.
Dockerfile (1)

32-44: 💡 Verification agent

🧩 Analysis chain

Verify compatibility with Python 3.12 upgrade and uv dependency resolution.

The change from Poetry to uv and Python 3.11 to 3.12 represents significant infrastructure changes that could affect:

  • Package compatibility with Python 3.12
  • Dependency resolution differences between Poetry and uv
  • Lock file format changes

Run the following script to verify the compatibility:


🏁 Script executed:

#!/bin/bash
# Description: Check for Python 3.12 compatibility issues and verify uv.lock exists

# Check if uv.lock exists and is valid
if [ -f "uv.lock" ]; then
    echo "✓ uv.lock file exists"
else
    echo "✗ uv.lock file missing - required for uv sync"
fi

# Check for Python version constraints in pyproject.toml
rg -A 5 -B 5 "python.*=" pyproject.toml

# Look for any Python 3.11 specific code or comments
rg -i "python.*3\.11|py311" --type py

Length of output: 355


Fix missing uv.lock and update Python version references

The Dockerfile’s uv sync steps require a committed uv.lock, and there’s still a hard-coded Python 3.11 reference in the modal dashboard. Please:

  • Add uv.lock to the repository root so that
    RUN uv sync … --frozen can resolve dependencies reproducibly.
  • Update cognee/eval_framework/modal_eval_dashboard.py:
    - modal.Image.debian_slim(python_version="3.11")
    + modal.Image.debian_slim(python_version="3.12")
  • Verify and update your project’s Python version constraint (e.g. in pyproject.toml or equivalent) to allow 3.12.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In Dockerfile lines 32 to 44, the uv sync commands require a uv.lock file to
ensure reproducible dependency resolution, but uv.lock is missing from the
repository. Add a valid uv.lock file at the repository root so that the uv sync
commands with --frozen flag can work correctly. Additionally, update the Python
version constraints in pyproject.toml and any hard-coded Python 3.11 references,
such as in cognee/eval_framework/modal_eval_dashboard.py, to support Python 3.12
compatibility.
cognee/infrastructure/databases/vector/__init__.py (1)

6-6: 🛠️ Refactor suggestion

Handle the unused import warning and explicitly expose the adapter.
The static analyzer flagged use_vector_adapter as an unused import. Since it’s meant to be part of the public API, consider adding an __all__ list to this module (e.g., __all__ = [..., "use_vector_adapter"]) or remove the import if it’s not intended for direct external usage.

🧰 Tools
🪛 Ruff (0.11.9)

6-6: .use_vector_adapter.use_vector_adapter imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/__init__.py at line 6, the import of
use_vector_adapter is flagged as unused. To fix this, add an __all__ list to
explicitly declare use_vector_adapter as part of the public API, for example
__all__ = ["use_vector_adapter"], so the import is recognized as intentional and
exposed properly.
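A minimal sketch of the suggested fix for the module; only the `use_vector_adapter` entry is confirmed by the comment, and the other re-export shown is an assumption about the module's contents:

```python
# Sketch of cognee/infrastructure/databases/vector/__init__.py with an explicit
# __all__, as the review suggests. Only use_vector_adapter is confirmed by the
# comment; the supported_databases re-export is an illustrative assumption.
from .supported_databases import supported_databases
from .use_vector_adapter import use_vector_adapter

__all__ = [
    "supported_databases",
    "use_vector_adapter",
]
```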
cognee/infrastructure/llm/prompts/cot_validation_system_prompt.txt (1)

1-1: ⚠️ Potential issue

Fix grammar error in the prompt.

The static analysis tool correctly identified a subject-verb disagreement. "You are a helpful agent who are allowed" should be "You are a helpful agent who is allowed."

Apply this diff to fix the grammar:

-You are a helpful agent who are allowed to use only the provided question answer and context.
+You are a helpful agent who is allowed to use only the provided question answer and context.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

You are a helpful agent who is allowed to use only the provided question answer and context.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~1-~1: This verb does not appear to agree with the subject. Consider using a different form.
Context: You are a helpful agent who are allowed to use only the provided questi...

(AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)

🤖 Prompt for AI Agents
In cognee/infrastructure/llm/prompts/cot_validation_system_prompt.txt at line 1,
fix the subject-verb agreement by changing "who are allowed" to "who is allowed"
to correct the grammar in the prompt.
cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1)

5-5: 💡 Verification agent

🧩 Analysis chain

Verify dataset prop type consistency across frontend components.

The change from { id: string } to { name: string } aligns with the frontend standardization efforts mentioned in the AI summary. Ensure this change is consistently applied across all components that use dataset props.


🏁 Script executed:

#!/bin/bash
# Description: Verify consistent dataset prop types across frontend components
# Expected: All dataset props should use {name: string} consistently

# Search for dataset prop type definitions
rg -A 2 -B 2 "dataset.*:\s*\{.*\}" cognee-frontend/src --type tsx --type ts

# Search for Explorer component usage
rg -A 3 -B 1 "<Explorer.*dataset" cognee-frontend/src --type tsx

Length of output: 200


To ensure we catch all occurrences without relying on rg’s built-in types, let’s search by file extension instead:


🏁 Script executed:

#!/bin/bash
# Verify all dataset prop definitions use { name: string }
rg -n "dataset\s*:\s*{[^}]+}" --glob '*.ts' --glob '*.tsx' cognee-frontend/src

# Verify all <Explorer> usages pass a dataset prop
rg -n "<Explorer[^>]*\bdataset\b" --glob '*.tsx' cognee-frontend/src

Length of output: 481


Let’s check for any lingering dataset.id usages and locate the Explorer component’s prop definition:


🏁 Script executed:

#!/bin/bash
# 1. Look for any dataset.id usages
rg -n "dataset\.id" --glob '*.ts' --glob '*.tsx' cognee-frontend/src

# 2. Find all dataset prop declarations (inline or via interface)
rg -n "dataset\s*:" --glob '*.ts' --glob '*.tsx' cognee-frontend/src

# 3. Locate the Explorer component definition and its props
rg -n "export \(function\|const\) Explorer" -A 10 --glob '*.ts' --glob '*.tsx' cognee-frontend/src

Length of output: 3262


Fix dataset prop type inconsistencies across the codebase

While ExploreStep and Explorer now use dataset: { name: string }, many other modules still expect or reference dataset.id. This mismatch will lead to type errors and broken runtime behavior. Please update all consumers and definitions of “dataset” to use the standardized { name: string } shape (or a shared Dataset type) consistently.

Locations needing attention:

  • src/modules/ingestion/addData.ts – function signature uses { id?: string, name?: string }
  • src/app/page.tsx
    • openDatasetData(dataset: { id: string })
    • onDataAdd(dataset: { id: string }, …)
    • onDatasetCognify(dataset: { id: string, name: string })
  • src/modules/datasets/getDatasetData.ts – expects { id: string }
  • src/modules/datasets/deleteDataset.ts – expects { id: string }
  • src/modules/datasets/cognifyDataset.ts – signature uses { id?: string, name?: string }
  • src/modules/ingestion/useDatasets.ts – references dataset.id for status mapping
  • Any other callsites or interfaces where dataset.id is used

Please refactor these to rely solely on dataset.name (or extend the shared Dataset interface) so that all consumers align with the new standard.

🤖 Prompt for AI Agents
In cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx at line 5, the
dataset prop type was changed from { id: string } to { name: string } to
standardize usage. However, many other files still use dataset.id, causing type
inconsistencies and potential runtime errors. To fix this, update all dataset
prop definitions, function signatures, and usages across the codebase—including
src/modules/ingestion/addData.ts, src/app/page.tsx,
src/modules/datasets/getDatasetData.ts, src/modules/datasets/deleteDataset.ts,
src/modules/datasets/cognifyDataset.ts, and
src/modules/ingestion/useDatasets.ts—to use { name: string } consistently or a
shared Dataset interface that includes name but not id. Replace all references
to dataset.id with dataset.name accordingly.
cognee/modules/data/methods/get_unique_dataset_id.py (1)

5-6: 🛠️ Refactor suggestion

Address potential UUID collision risk and async function design.

The function has several concerns:

  1. Collision Risk: Direct string concatenation of dataset_name and user.id could lead to collisions. For example, dataset_name="user123" with user.id="456" produces the same input as dataset_name="user123456" with user.id="".

  2. Unnecessary async: The function is declared as async but performs no asynchronous operations.

Consider this improved implementation:

-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
-    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
+def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+    """Generate a deterministic UUID for a dataset based on name and user."""
+    if not dataset_name or not dataset_name.strip():
+        raise ValueError("Dataset name cannot be empty")
+    # Use a delimiter to prevent collision
+    unique_string = f"{dataset_name}::{user.id}"
+    return uuid5(NAMESPACE_OID, unique_string)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    """Generate a deterministic UUID for a dataset based on name and user."""
    if not dataset_name or not dataset_name.strip():
        raise ValueError("Dataset name cannot be empty")
    # Use a delimiter to prevent collisions between name and ID
    unique_string = f"{dataset_name}::{user.id}"
    return uuid5(NAMESPACE_OID, unique_string)
🤖 Prompt for AI Agents
In cognee/modules/data/methods/get_unique_dataset_id.py at lines 5 to 6, the
function get_unique_dataset_id is unnecessarily declared async and concatenates
dataset_name and user.id directly, risking UUID collisions. Remove the async
keyword since no await is used, and instead of simple concatenation, combine
dataset_name and user.id with a clear delimiter or use a tuple-like structure to
ensure uniqueness and avoid collisions before passing to uuid5.
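The collision described above is easy to reproduce with the standard library; this snippet is independent of cognee and only shows why a delimiter between the two fields matters:

```python
# Demonstrates the collision risk from plain concatenation: two different
# (dataset_name, user_id) pairs hash to the same UUID without a delimiter.
from uuid import NAMESPACE_OID, uuid5

a = uuid5(NAMESPACE_OID, "user123" + "456")   # dataset_name="user123", user_id="456"
b = uuid5(NAMESPACE_OID, "user123456" + "")   # dataset_name="user123456", user_id=""
print(a == b)  # True: both inputs are "user123456"

# With a delimiter the inputs differ, so the UUIDs differ.
print(uuid5(NAMESPACE_OID, "user123::456") == uuid5(NAMESPACE_OID, "user123456::"))  # False
```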
cognee/api/v1/config/config.py (1)

158-158: ⚠️ Potential issue

Fix inconsistent error handling.

This method uses AttributeError while all other similar config setter methods in this file use InvalidAttributeError with the message= parameter. This creates inconsistency in error handling.

Apply this diff to maintain consistency:

-                raise AttributeError(f"'{key}' is not a valid attribute of the config.")
+                raise InvalidAttributeError(
+                    message=f"'{key}' is not a valid attribute of the config."
+                )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

                raise InvalidAttributeError(
                    message=f"'{key}' is not a valid attribute of the config."
                )
🤖 Prompt for AI Agents
In cognee/api/v1/config/config.py at line 158, replace the use of AttributeError
with InvalidAttributeError and pass the error message using the message=
parameter to align with the error handling style used in other config setter
methods. This will ensure consistent error handling across the file.
cognee/infrastructure/databases/vector/use_vector_adapter.py (1)

4-5: 🛠️ Refactor suggestion

Enhance function robustness and documentation.

The core functionality looks good for enabling dynamic vector adapter registration. Consider these improvements:

-def use_vector_adapter(vector_db_name, vector_db_adapter):
+def use_vector_adapter(vector_db_name: str, vector_db_adapter) -> None:
+    """
+    Register a vector database adapter for dynamic use.
+    
+    Args:
+        vector_db_name: Name of the vector database provider
+        vector_db_adapter: Adapter class or instance for the provider
+    """
+    if not vector_db_name:
+        raise ValueError("vector_db_name cannot be empty")
+    if vector_db_adapter is None:
+        raise ValueError("vector_db_adapter cannot be None")
+        
     supported_databases[vector_db_name] = vector_db_adapter

This adds type hints, documentation, and basic input validation to improve code quality and prevent common usage errors.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def use_vector_adapter(vector_db_name: str, vector_db_adapter) -> None:
    """
    Register a vector database adapter for dynamic use.
    
    Args:
        vector_db_name: Name of the vector database provider
        vector_db_adapter: Adapter class or instance for the provider
    """
    if not vector_db_name:
        raise ValueError("vector_db_name cannot be empty")
    if vector_db_adapter is None:
        raise ValueError("vector_db_adapter cannot be None")
    supported_databases[vector_db_name] = vector_db_adapter
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/use_vector_adapter.py at lines 4 to 5,
the function use_vector_adapter lacks type hints, documentation, and input
validation. Add appropriate type hints for the parameters and return type,
include a docstring explaining the function's purpose and usage, and add basic
input validation to check that vector_db_name is a string and vector_db_adapter
is a valid adapter object before registering it in supported_databases.
cognee/infrastructure/databases/graph/use_graph_adapter.py (1)

4-5: 🛠️ Refactor suggestion

Fix misleading parameter name and add input validation.

The parameter name vector_db_name is misleading since this function registers graph database adapters, not vector database adapters. Additionally, the function lacks input validation and documentation.

Apply this diff to improve the implementation:

-def use_graph_adapter(vector_db_name, vector_db_adapter):
-    supported_databases[vector_db_name] = vector_db_adapter
+def use_graph_adapter(graph_db_name: str, graph_db_adapter):
+    """
+    Register or update a graph database adapter in the supported databases registry.
+    
+    Args:
+        graph_db_name (str): The name identifier for the graph database
+        graph_db_adapter: The adapter class for the graph database
+        
+    Raises:
+        ValueError: If graph_db_name is empty or None
+        TypeError: If graph_db_adapter is None
+    """
+    if not graph_db_name:
+        raise ValueError("Graph database name cannot be empty or None")
+    if graph_db_adapter is None:
+        raise TypeError("Graph database adapter cannot be None")
+    
+    supported_databases[graph_db_name] = graph_db_adapter
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def use_graph_adapter(graph_db_name: str, graph_db_adapter):
    """
    Register or update a graph database adapter in the supported databases registry.
    
    Args:
        graph_db_name (str): The name identifier for the graph database
        graph_db_adapter: The adapter class for the graph database
        
    Raises:
        ValueError: If graph_db_name is empty or None
        TypeError: If graph_db_adapter is None
    """
    if not graph_db_name:
        raise ValueError("Graph database name cannot be empty or None")
    if graph_db_adapter is None:
        raise TypeError("Graph database adapter cannot be None")
    
    supported_databases[graph_db_name] = graph_db_adapter
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/use_graph_adapter.py around lines 4 to
5, rename the parameters vector_db_name and vector_db_adapter to graph_db_name and
graph_db_adapter to accurately reflect that the function registers graph database
adapters. Add input validation to check that graph_db_name is a non-empty string and graph_db_adapter is a valid
adapter object. Also, include a docstring explaining the function's purpose,
parameters, and behavior.
cognee/modules/observability/get_observe.py (1)

5-11: ⚠️ Potential issue

Fix missing return statement and improve error handling.

The function has a critical issue where it doesn't return anything when monitoring is not Observer.LANGFUSE, which will return None implicitly. Additionally, the function lacks error handling for import failures and proper documentation.

Apply this diff to fix the issues:

+from typing import Optional, Callable
+
 def get_observe():
+    """
+    Get the appropriate observation decorator based on the configured monitoring tool.
+    
+    Returns:
+        Optional[Callable]: The observe decorator if available, None otherwise
+        
+    Raises:
+        ImportError: If the required monitoring tool package is not installed
+    """
     monitoring = get_base_config().monitoring_tool

     if monitoring == Observer.LANGFUSE:
-        from langfuse.decorators import observe
-
-        return observe
+        try:
+            from langfuse.decorators import observe
+            return observe
+        except ImportError as e:
+            raise ImportError(f"Langfuse package not installed: {e}")
+    
+    # Return None for unsupported monitoring tools
+    return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from typing import Optional, Callable

def get_observe():
    """
    Get the appropriate observation decorator based on the configured monitoring tool.
    
    Returns:
        Optional[Callable]: The observe decorator if available, None otherwise
        
    Raises:
        ImportError: If the required monitoring tool package is not installed
    """
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        try:
            from langfuse.decorators import observe
            return observe
        except ImportError as e:
            raise ImportError(f"Langfuse package not installed: {e}")
    
    # Return None for unsupported monitoring tools
    return None
🧰 Tools
🪛 Pylint (3.3.7)

[error] 9-9: Unable to import 'langfuse.decorators'

(E0401)


[convention] 9-9: Import outside toplevel (langfuse.decorators.observe)

(C0415)


[refactor] 5-5: Either all return statements in a function should return an expression, or none of them should.

(R1710)

🤖 Prompt for AI Agents
In cognee/modules/observability/get_observe.py around lines 5 to 11, the
function get_observe lacks a return statement when monitoring is not
Observer.LANGFUSE, causing it to implicitly return None. To fix this, add a
default return value or raise an appropriate exception for unsupported
monitoring tools. Also, wrap the import statement in a try-except block to
handle import errors gracefully and add a docstring to document the function's
behavior and possible exceptions.
cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)

3-3: 🛠️ Refactor suggestion

Add validation for required parameters.

The function signature makes both id and name optional, but at least one should be provided for the API request to be meaningful.

-export default function cognifyDataset(dataset: { id?: string, name?: string }) {
+export default function cognifyDataset(dataset: { id?: string, name?: string }) {
+  if (!dataset.id && !dataset.name) {
+    throw new Error('Either dataset id or name must be provided');
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

export default function cognifyDataset(dataset: { id?: string, name?: string }) {
  if (!dataset.id && !dataset.name) {
    throw new Error('Either dataset id or name must be provided');
  }

  // …rest of the existing function body…
}
🤖 Prompt for AI Agents
In cognee-frontend/src/modules/datasets/cognifyDataset.ts at line 3, the
function parameters id and name are both optional, but the function requires at
least one to be provided. Add validation inside the function to check if either
id or name is present; if neither is provided, throw an error or return early to
prevent meaningless API requests.
cognee-frontend/src/app/page.tsx (1)

63-69: ⚠️ Potential issue

Add null safety check for dataset lookup.

The callback uses non-null assertion (dataset!) without verifying that the dataset was found, which could cause a runtime error if selectedDataset doesn't match any dataset ID.

 const onCognify = useCallback(() => {
   const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
+  if (!dataset) {
+    showNotification("Selected dataset not found. Please try again.", 5000);
+    return Promise.reject(new Error("Dataset not found"));
+  }
   return onDatasetCognify({
-    id: dataset!.id,
-    name: dataset!.name,
+    id: dataset.id,
+    name: dataset.name,
   });
 }, [datasets, onDatasetCognify, selectedDataset]);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

  const onCognify = useCallback(() => {
    const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
    if (!dataset) {
      showNotification("Selected dataset not found. Please try again.", 5000);
      return Promise.reject(new Error("Dataset not found"));
    }
    return onDatasetCognify({
      id: dataset.id,
      name: dataset.name,
    });
  }, [datasets, onDatasetCognify, selectedDataset]);
🤖 Prompt for AI Agents
In cognee-frontend/src/app/page.tsx around lines 63 to 69, the onCognify
callback uses non-null assertions on dataset without checking if the dataset
exists, risking runtime errors. Modify the callback to first check if the
dataset is found; if not, handle the null case gracefully (e.g., return early or
show an error). Only access dataset properties after confirming it is not null
or undefined.
cognee/eval_framework/metrics_dashboard.py (1)

170-170: 💡 Verification agent

🧩 Analysis chain

Verify breaking change in return type.

The function now returns HTML content instead of the output file path. This is a breaking change that could affect existing callers expecting a file path.

Run the following script to verify how this function is used across the codebase:


🏁 Script executed:

#!/bin/bash
# Description: Check all usages of create_dashboard function to verify the impact of changing return type from file path to HTML content.

# Search for function calls to create_dashboard
rg -A 5 -B 2 "create_dashboard\(" --type py

# Search for any variable assignments or return value usage
ast-grep --pattern $'$var = create_dashboard($$$)'

# Search for any code that might expect a file path return
rg -A 3 -B 3 "\.html" --type py | rg -A 3 -B 3 "create_dashboard"

Length of output: 4871


Update create_dashboard’s return type and dependent code

The change in create_dashboard from returning the output file path to returning the HTML content is a breaking change. Callers and tests that expect a file path must be updated to handle HTML output.

• In cognee/eval_framework/metrics_dashboard.py, update the docstring/signature to state that the function returns the rendered HTML, not the file path.
• In cognee/tests/unit/eval_framework/dashboard_test.py:
– Rename the returned variable to dashboard_html for clarity.
– Replace the file-path assertion with an HTML-content check, then assert file creation separately.
Diff example:

- output = create_dashboard(metrics_path, aggregate_metrics_path, output_file, "Test Benchmark")
- self.assertEqual(output, output_file)
+ dashboard_html = create_dashboard(metrics_path, aggregate_metrics_path, output_file, "Test Benchmark")
+ self.assertIn("<html", dashboard_html)
+ self.assertTrue(os.path.exists(output_file))

• In cognee/eval_framework/modal_run_eval.py, review how html_output = create_dashboard(...) is used downstream—ensure it’s treated as HTML, not a file path.
• In cognee/eval_framework/run_eval.py, consider whether you need to capture the return value now that it’s HTML (or explicitly ignore it).
• In cognee/eval_framework/analysis/dashboard_generator.py, if there’s a duplicate create_dashboard, ensure its signature and return semantics match.

Please update these locations to align with the new HTML-return behavior.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/eval_framework/metrics_dashboard.py at line 170, the create_dashboard
function now returns HTML content instead of a file path, which is a breaking
change. Update the function's docstring and signature to reflect that it returns
rendered HTML. Then, in cognee/tests/unit/eval_framework/dashboard_test.py,
rename variables to indicate HTML content, replace file path assertions with
checks on the HTML content, and separately assert that the output file is
created. Also, review and update all callers in
cognee/eval_framework/modal_run_eval.py, run_eval.py, and
analysis/dashboard_generator.py to handle the returned HTML correctly instead of
expecting a file path, adjusting variable names and logic as needed to align
with this change.
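
To make the new contract concrete, here is a minimal caller-side sketch, assuming create_dashboard still writes output_file as a side effect and now also returns the rendered HTML (names mirror the test snippet above):

import os

# Hedged sketch: consume the return value as HTML content, not as a file path.
dashboard_html = create_dashboard(
    metrics_path, aggregate_metrics_path, output_file, "Test Benchmark"
)

assert "<html" in dashboard_html        # the function now returns markup
assert os.path.exists(output_file)      # file creation is asserted separately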
cognee/tests/test_relational_db_migration.py (1)

161-162: ⚠️ Potential issue

Fix potential NameError for uninitialized variables.

The static analysis correctly identifies that node_count and edge_count may be used before assignment if an unsupported graph database provider is encountered.

Apply this diff to initialize the variables and improve error handling:

     else:
         raise ValueError(f"Unsupported graph database provider: {graph_db_provider}")
 
+    # Ensure variables are initialized before assertions
+    if 'node_count' not in locals() or 'edge_count' not in locals():
+        raise ValueError(f"Failed to retrieve node/edge counts for provider: {graph_db_provider}")
+
     # NOTE: Because of the different size of the postgres and sqlite databases,
     #       different number of nodes and edges are expected
     assert node_count == 543, f"Expected 543 nodes, got {node_count}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

     else:
         raise ValueError(f"Unsupported graph database provider: {graph_db_provider}")

     # Ensure variables are initialized before assertions
     if 'node_count' not in locals() or 'edge_count' not in locals():
         raise ValueError(f"Failed to retrieve node/edge counts for provider: {graph_db_provider}")

     # NOTE: Because of the different size of the postgres and sqlite databases,
     #       different number of nodes and edges are expected
     assert node_count == 543, f"Expected 543 nodes, got {node_count}"
     assert edge_count == 1317, f"Expected 1317 edges, got {edge_count}"
🧰 Tools
🪛 Pylint (3.3.7)

[error] 161-161: Possibly using variable 'node_count' before assignment

(E0606)


[error] 162-162: Possibly using variable 'edge_count' before assignment

(E0606)

🤖 Prompt for AI Agents
In cognee/tests/test_relational_db_migration.py around lines 161 to 162,
initialize the variables node_count and edge_count before their usage to prevent
potential NameError if an unsupported graph database provider is encountered.
Add default initializations for these variables at the start of the relevant
code block and include error handling to manage unsupported providers
gracefully, ensuring the variables are always assigned before the assertions.
cognee/shared/logging_utils.py (1)

7-7: ⚠️ Potential issue

Fix import issues identified by static analysis.

The static analysis tools correctly identified import-related issues that should be addressed for code quality:

 import os
 import sys
 import threading
 import logging
-import structlog
-import traceback
 import platform
-from datetime import datetime
-from pathlib import Path
-import importlib.metadata
+import traceback
+from datetime import datetime
+from pathlib import Path
+
+import structlog

-from cognee import __version__ as cognee_version
+from cognee import __version__ as cognee_version

The importlib.metadata import should be removed as it's unused (structlog.__version__ is accessed directly).

Also applies to: 10-10, 12-12

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 7-7: standard import "platform" should be placed before third party import "structlog"

(C0411)

🤖 Prompt for AI Agents
In cognee/shared/logging_utils.py at lines 7, 10, and 12, remove the import of
importlib.metadata since it is unused and structlog.__version__ is accessed
directly. This will fix the import issues flagged by static analysis and improve
code quality.
alembic/versions/1d0bb7fede17_add_pipeline_run_status.py (1)

32-33: 🛠️ Refactor suggestion

Consider documenting the downgrade limitation.

The empty downgrade function could be problematic if rollback is needed. PostgreSQL doesn't easily support removing enum values once they're in use, but this should be documented.

Consider adding a comment explaining the limitation:

 def downgrade() -> None:
-    pass
+    # Note: PostgreSQL does not support removing enum values that are in use.
+    # Manual intervention may be required if rollback is necessary.
+    pass

Alternatively, consider implementing a check to ensure the enum value is not in use before attempting removal.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def downgrade() -> None:
    # Note: PostgreSQL does not support removing enum values that are in use.
    # Manual intervention may be required if rollback is necessary.
    pass
🤖 Prompt for AI Agents
In alembic/versions/1d0bb7fede17_add_pipeline_run_status.py at lines 32-33, the
downgrade function is empty which can cause issues during rollback since
PostgreSQL does not easily support removing enum values. Add a comment inside
the downgrade function explaining this limitation and why the downgrade is not
implemented. Optionally, you can add a check to ensure the enum value is not in
use before attempting removal, but at minimum document the limitation clearly.
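
For the optional alternative mentioned above (checking that the enum value is not in use before attempting removal), a hedged sketch follows; the table name pipeline_runs, the column status, and the enum value string are assumptions and must be adapted to the actual schema:

import sqlalchemy as sa
from alembic import op


def downgrade() -> None:
    # Hypothetical guard: refuse to downgrade while the new status value is still referenced.
    connection = op.get_bind()
    in_use = connection.execute(
        sa.text("SELECT 1 FROM pipeline_runs WHERE status = 'DATASET_PROCESSING_INITIATED' LIMIT 1")
    ).first()
    if in_use is not None:
        raise RuntimeError("Cannot downgrade: the added pipeline run status is still in use.")
    # Even when unused, PostgreSQL cannot drop an enum value in place; recreating the type would be required.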
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)

6-10: 💡 Verification agent

🧩 Analysis chain

Verify pipeline_id type consistency.

The pipeline_id parameter is typed as str, but the PipelineRun model's pipeline_id field is defined as UUID type in the relevant code snippets. This could lead to type conversion issues.


🏁 Script executed:

#!/bin/bash
# Description: Check how pipeline_id is used across the codebase to verify expected type
# Expected: Find usages that clarify whether pipeline_id should be str or UUID

echo "=== Searching for pipeline_id field usage in PipelineRun model ==="
ast-grep --pattern 'pipeline_id = Column($$$)'

echo -e "\n=== Searching for pipeline_id parameter usage in other functions ==="
rg -A 3 -B 3 "pipeline_id.*:" --type py

echo -e "\n=== Searching for log_pipeline_run_initiated function calls ==="
rg -A 5 -B 2 "log_pipeline_run_initiated" --type py

Length of output: 6983


Ensure pipeline_id parameters use UUID, not str

The PipelineRun.pipeline_id column is defined as a UUID, but all log_pipeline_run_* functions currently annotate pipeline_id: str. Call sites already pass a uuid.UUID, so we should update the signatures (and any related tooling) to match:

• Update in cognee/modules/pipelines/operations/log_pipeline_run_initiated.py
• Also apply to log_pipeline_run_start, log_pipeline_run_complete, and log_pipeline_run_error

Example diff for log_pipeline_run_initiated:

- async def log_pipeline_run_initiated(pipeline_id: str, pipeline_name: str, dataset_id: UUID):
+ async def log_pipeline_run_initiated(pipeline_id: UUID, pipeline_name: str, dataset_id: UUID):

Repeat for the other three functions to prevent type mismatches when persisting to PipelineRun (pipeline_id = Column(UUID, …)).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

 async def log_pipeline_run_initiated(pipeline_id: UUID, pipeline_name: str, dataset_id: UUID):
     pipeline_run = PipelineRun(
         pipeline_run_id=uuid4(),
         pipeline_name=pipeline_name,
         pipeline_id=pipeline_id,
🤖 Prompt for AI Agents
In cognee/modules/pipelines/operations/log_pipeline_run_initiated.py around
lines 6 to 10, the pipeline_id parameter is typed as str but the PipelineRun
model expects a UUID type for pipeline_id. Change the type annotation of
pipeline_id from str to UUID in the function signature to ensure type
consistency and prevent conversion issues. Also, update the other related
log_pipeline_run_* functions similarly to use UUID for pipeline_id.
cognee/infrastructure/llm/tokenizer/HuggingFace/adapter.py (1)

67-78: ⚠️ Potential issue

Fix parameter name to match interface contract.

The static analysis correctly identifies that the parameter name should be token instead of encoding to match the interface definition in TokenizerInterface.

-    def decode_single_token(self, encoding: int):
+    def decode_single_token(self, token: int):
        """
        Attempt to decode a single token from its encoding, which is not implemented in this
        tokenizer.

        Parameters:
        -----------

-            - encoding (int): The integer encoding of the token to decode.
+            - token (int): The integer encoding of the token to decode.
        """
        # HuggingFace tokenizer doesn't have the option to decode tokens
        raise NotImplementedError
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    def decode_single_token(self, token: int):
        """
        Attempt to decode a single token from its encoding, which is not implemented in this
        tokenizer.

        Parameters:
        -----------

            - token (int): The integer encoding of the token to decode.
        """
        # HuggingFace tokenizer doesn't have the option to decode tokens
        raise NotImplementedError
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 67-67: Parameter 'token' has been renamed to 'encoding' in overriding 'HuggingFaceTokenizer.decode_single_token' method

(W0237)

🤖 Prompt for AI Agents
In cognee/infrastructure/llm/tokenizer/HuggingFace/adapter.py between lines 67
and 78, rename the parameter of the decode_single_token method from 'encoding'
to 'token' to match the interface definition in TokenizerInterface. This ensures
consistency with the expected method signature and resolves the static analysis
warning.
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (1)

36-36: 🛠️ Refactor suggestion

Consider renaming the component to avoid shadowing.

The component name DataView shadows the global DataView constructor, which could lead to confusion when debugging or referencing the global object.

Consider renaming the component to something more specific like DataSetView or CogneeDataView:

-export default function DataView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
+export default function DataSetView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

-export default function DataView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
+export default function DataSetView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
🧰 Tools
🪛 Biome (1.9.4)

[error] 36-36: Do not shadow the global "DataView" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🤖 Prompt for AI Agents
In cognee-frontend/src/modules/ingestion/DataView/DataView.tsx at line 36, the
component name DataView shadows the global DataView constructor, which can cause
confusion. Rename the component to a more specific name such as DataSetView or
CogneeDataView throughout the file, including the export statement and any
references to this component, to avoid shadowing the global object.
cognee/infrastructure/engine/models/DataPoint.py (1)

159-159: ⚠️ Potential issue

Fix class method parameter convention violation.

The static analysis correctly identifies that class methods should use cls as the first parameter instead of self. This violates Python conventions and could cause confusion.

Apply this fix:

 @classmethod
-def from_json(self, json_str: str):
+def from_json(cls, json_str: str):
     """
     Deserialize a DataPoint instance from a JSON string.
     ...
     """
-    return self.model_validate_json(json_str)
+    return cls.model_validate_json(json_str)

 @classmethod  
-def from_pickle(self, pickled_data: bytes):
+def from_pickle(cls, pickled_data: bytes):
     """
     Deserialize a DataPoint instance from a pickled byte stream.
     ...
     """
     data = pickle.loads(pickled_data)
-    return self(**data)
+    return cls(**data)

Also applies to: 195-195

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 159-159: Class method from_json should have 'cls' as first argument

(C0202)

🤖 Prompt for AI Agents
In cognee/infrastructure/engine/models/DataPoint.py at lines 159 and 195, the
class methods currently use 'self' as the first parameter, which violates Python
conventions. Change the first parameter of these class methods from 'self' to
'cls' to correctly follow the class method parameter convention.
cognee/eval_framework/answer_generation/answer_generation_executor.py (1)

15-15: 🛠️ Refactor suggestion

Type annotation weakens type safety.

Changing from Dict[str, BaseRetriever] to Dict[str, Any] reduces type safety. If the new retrievers inherit from BaseRetriever, consider keeping the stronger typing or using a Union type.

-retriever_options: Dict[str, Any] = {
+retriever_options: Dict[str, BaseRetriever] = {

Alternatively, if some retrievers don't inherit from BaseRetriever, consider using a Union type or creating a common interface.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

-retriever_options: Dict[str, Any] = {
+retriever_options: Dict[str, BaseRetriever] = {
🤖 Prompt for AI Agents
In cognee/eval_framework/answer_generation/answer_generation_executor.py at line
15, the type annotation for retriever_options is currently Dict[str, Any], which
weakens type safety. To fix this, change the annotation to Dict[str,
BaseRetriever] if all retrievers inherit from BaseRetriever. If some retrievers
do not inherit from BaseRetriever, use a Union type including BaseRetriever and
other relevant types or define a common interface that all retrievers implement,
then use that interface in the type annotation.
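
If keeping strong typing via a common interface is preferred, a minimal Protocol-based sketch is shown below; the get_completion method name is an assumption about the shared retriever API, not taken from the codebase:

from typing import Any, Dict, Protocol


class RetrieverProtocol(Protocol):
    """Structural interface the values in retriever_options are expected to satisfy."""

    async def get_completion(self, query: str) -> Any: ...


retriever_options: Dict[str, RetrieverProtocol] = {}

Depending on whether the mapping stores retriever classes or instances, Dict[str, Type[RetrieverProtocol]] may be the more accurate value type.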
cognee/tasks/chunks/chunk_by_sentence.py (1)

36-52: 🛠️ Refactor suggestion

Incomplete docstring - missing Returns section.

The docstring provides excellent detail about the function's behavior and parameters, but appears to be missing the Returns section that should describe the Iterator[Tuple[UUID, str, int, Optional[str]]] return type.

Add the missing Returns section:

         generated. (default None)
+
+    Returns:
+    --------
+
+        - Iterator[Tuple[UUID, str, int, Optional[str]]]: An iterator yielding tuples containing:
+          - UUID: Unique identifier for the paragraph
+          - str: The sentence text
+          - int: The size of the sentence in tokens
+          - Optional[str]: The sentence type ('sentence_end', 'paragraph_end', 'sentence_cut', etc.)
     """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Splits text into sentences while preserving word and paragraph boundaries.

    This function processes the input string, dividing it into sentences based on word-level
    tokenization. Each sentence is identified with a unique UUID, and it handles scenarios
    where the text may end mid-sentence by tagging it with a specific type. If a maximum
    sentence length is specified, the function ensures that sentences do not exceed this
    length, raising a ValueError if an individual word surpasses it. The function utilizes
    an external word processing function `chunk_by_word` to determine the structure of the
    text.

    Parameters:
    -----------

        - data (str): The input text to be split into sentences.
        - maximum_size (Optional[int]): An optional limit on the maximum size of sentences
          generated. (default None)

    Returns:
    --------

        - Iterator[Tuple[UUID, str, int, Optional[str]]]: An iterator yielding tuples containing:
          - UUID: Unique identifier for the paragraph
          - str: The sentence text
          - int: The size of the sentence in tokens
          - Optional[str]: The sentence type ('sentence_end', 'paragraph_end', 'sentence_cut', etc.)
🤖 Prompt for AI Agents
In cognee/tasks/chunks/chunk_by_sentence.py around lines 36 to 52, the
function's docstring lacks a Returns section describing the return type. Add a
Returns section that clearly states the function returns an Iterator of Tuples
containing a UUID, a sentence string, an integer, and an optional string, to
complete the documentation.
cognee/infrastructure/databases/vector/create_vector_engine.py (1)

42-49: ⚠️ Potential issue

Fix critical parameter name bug.

There's a typo in the adapter instantiation that will cause runtime errors.

        return adapter(
-            utl=vector_db_url,
+            url=vector_db_url,
             api_key=vector_db_key,
             embedding_engine=embedding_engine,
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    if vector_db_provider in supported_databases:
        adapter = supported_databases[vector_db_provider]

        return adapter(
            url=vector_db_url,
            api_key=vector_db_key,
            embedding_engine=embedding_engine,
        )
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/create_vector_engine.py between lines
42 and 49, the parameter name 'utl' used in the adapter instantiation is a typo
and should be corrected to 'url' to match the expected parameter name. Update
the adapter call to use 'url=vector_db_url' instead of 'utl=vector_db_url' to
fix the runtime error caused by this incorrect parameter name.
cognee/tasks/documents/classify_documents.py (1)

54-70: 🛠️ Refactor suggestion

Robustness: broaden the error guard in update_node_set.

json.loads(document.external_metadata) will also raise a TypeError if external_metadata is None – a scenario we occasionally hit when ingesting legacy records.
A tiny tweak makes the helper safer:

-    except json.JSONDecodeError:
+    except (json.JSONDecodeError, TypeError):

Optional, but prevents silent crashes during bulk imports.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def update_node_set(document):
    """
    Extracts node_set from document's external_metadata.

    Parses the external_metadata of the given document and updates the document's
    belongs_to_set attribute with NodeSet objects generated from the node_set found in the
    external_metadata. If the external_metadata is not valid JSON, is not a dictionary, does
    not contain the 'node_set' key, or if node_set is not a list, the function has no effect
    and will return early.

    Parameters:
    -----------

        - document: The document object which contains external_metadata from which the
          node_set will be extracted.
    """
    try:
        metadata = json.loads(document.external_metadata)
        if not isinstance(metadata, dict):
            return
        node_set = metadata.get("node_set")
        if not isinstance(node_set, list):
            return

        document.belongs_to_set = [
            NodeSet.from_dict(node) for node in node_set
        ]

    except (json.JSONDecodeError, TypeError):
        return
🤖 Prompt for AI Agents
In cognee/tasks/documents/classify_documents.py around lines 54 to 70, the
current error handling in update_node_set only catches JSONDecodeError when
parsing document.external_metadata, but it can also raise a TypeError if
external_metadata is None. To fix this, broaden the except clause to catch both
JSONDecodeError and TypeError exceptions to prevent silent crashes during bulk
imports with legacy records.
cognee/tasks/ingestion/migrate_relational_database.py (1)

100-108: ⚠️ Potential issue

Foreign-key filter uses the wrong column + identity operator bug.

  1. We want to skip columns in the current table that serve as FKs.
    The column resides in fk["column"], not fk["ref_column"].

  2. key is primary_key_col compares object identity; use equality.

Patch:

-    foreign_keys.append(fk["ref_column"])
+    foreign_keys.append(fk["column"])

-    if key is primary_key_col or key in foreign_keys:
+    if key == primary_key_col or key in foreign_keys:

This prevents accidental creation of ColumnValue nodes for FK columns and avoids the brittle is comparison.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

                    # Get foreign key columns to filter them out from column migration
                    foreign_keys = []
                    for fk in details.get("foreign_keys", []):
                        foreign_keys.append(fk["column"])

                    for key, value in row_properties.items():
                        # Skip mapping primary key information to itself and mapping of foreign key information (as it will be mapped bellow)
                        if key == primary_key_col or key in foreign_keys:
                            continue
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 106-106: Line too long (141/100)

(C0301)

🤖 Prompt for AI Agents
In cognee/tasks/ingestion/migrate_relational_database.py around lines 100 to
108, the foreign key filtering incorrectly uses fk["ref_column"] instead of
fk["column"] to identify foreign key columns in the current table, and it uses
the identity operator 'is' to compare key and primary_key_col, which should be
an equality check. Fix this by changing foreign_keys to collect fk["column"]
values and replace 'key is primary_key_col' with 'key == primary_key_col' to
correctly skip primary and foreign key columns.
cognee/tasks/temporal_awareness/index_graphiti_objects.py (1)

34-34: 💡 Verification agent

🧩 Analysis chain

Verify the method name change is consistent across adapters.

The change from get_model_independent_graph_data() to get_graph_data() appears to align with the updated graph database adapters. Please ensure this method exists in all graph adapters.


🏁 Script executed:

#!/bin/bash
# Description: Verify that get_graph_data method exists in all graph adapters
# Expected: All adapters should implement get_graph_data method

# Search for get_graph_data method implementations
ast-grep --pattern $'async def get_graph_data($_) {
  $$$
}'

# Also check for any remaining get_model_independent_graph_data references
rg "get_model_independent_graph_data" -A 2

Length of output: 2104


Implement get_graph_data in all graph adapters

The adapters still define and invoke get_model_independent_graph_data() without providing get_graph_data(). Please update the following files to either rename the existing method or add a thin wrapper so that every adapter implements async def get_graph_data(...):

  • cognee/infrastructure/databases/graph/kuzu/adapter.py
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py
  • cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py

For example, you can replace:

- async def get_model_independent_graph_data(self) -> Dict[str, List[str]]:
+ async def get_graph_data(self) -> Dict[str, List[str]]:
    """
    ...
    """
    # existing implementation

Or add:

async def get_graph_data(self, *args, **kwargs):
    return await self.get_model_independent_graph_data(*args, **kwargs)

so that the call in cognee/tasks/temporal_awareness/index_graphiti_objects.py line 34 resolves correctly.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/kuzu/adapter.py,
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py, and
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py, ensure each
adapter implements an async method named get_graph_data. This can be done by
either renaming the existing get_model_independent_graph_data method to
get_graph_data or by adding a new async get_graph_data method that calls and
returns the result of get_model_independent_graph_data with the same arguments.
This will align all adapters with the call made in
cognee/tasks/temporal_awareness/index_graphiti_objects.py line 34.
cognee/infrastructure/llm/tokenizer/Gemini/adapter.py (1)

48-59: ⚠️ Potential issue

Fix parameter name inconsistency.

The method parameter should be named token to match the interface definition, not encoding. This will resolve the static analysis warning and maintain consistency with the base interface.

Apply this diff to fix the parameter name:

-    def decode_single_token(self, encoding: int):
+    def decode_single_token(self, token: int):
         """
         Raise NotImplementedError when called, as Gemini tokenizer does not support decoding of
         tokens.

         Parameters:
         -----------

-            - encoding (int): The token encoding to decode.
+            - token (int): The token encoding to decode.
         """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    def decode_single_token(self, token: int):
        """
        Raise NotImplementedError when called, as Gemini tokenizer does not support decoding of
        tokens.

        Parameters:
        -----------

            - token (int): The token encoding to decode.
        """
        # Gemini tokenizer doesn't have the option to decode tokens
        raise NotImplementedError
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 48-48: Parameter 'token' has been renamed to 'encoding' in overriding 'GeminiTokenizer.decode_single_token' method

(W0237)

🤖 Prompt for AI Agents
In cognee/infrastructure/llm/tokenizer/Gemini/adapter.py around lines 48 to 59,
rename the method parameter from 'encoding' to 'token' in the
decode_single_token function to match the interface definition and resolve the
static analysis warning. This change ensures consistency with the base interface
without altering the method's behavior.
cognee/infrastructure/files/storage/StorageManager.py (1)

9-9: ⚠️ Potential issue

Fix type inconsistency between protocol and implementation.

There's a type mismatch between the Storage protocol and StorageManager implementation:

  • Protocol expects data: bytes (line 9)
  • Implementation expects data: BinaryIO (line 60)
  • LocalStorage actually accepts Union[BinaryIO, str] according to the relevant code snippets

This inconsistency could cause type checking issues and developer confusion.

Consider updating the protocol to match the actual usage:

-    def store(self, file_path: str, data: bytes):
+    def store(self, file_path: str, data: Union[BinaryIO, str]):

Or update the implementation to match the protocol if bytes-only is the intended interface.

Also applies to: 60-60

🤖 Prompt for AI Agents
In cognee/infrastructure/files/storage/StorageManager.py at lines 9 and 60,
there is a type inconsistency between the Storage protocol and StorageManager
implementation regarding the `data` parameter type. The protocol defines `data`
as bytes, but the implementation uses BinaryIO, and LocalStorage accepts
Union[BinaryIO, str]. To fix this, align the protocol and implementation types
by either updating the protocol's `data` parameter to accept the broader type
used in the implementation (e.g., BinaryIO or Union[BinaryIO, str]) or restrict
the implementation to accept bytes only if that is the intended interface,
ensuring consistent type annotations across all related classes.
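
A self-contained sketch of the broader-typed protocol, assuming Union[BinaryIO, str] (the type LocalStorage reportedly accepts) is the intended contract; only the store method is shown:

from typing import BinaryIO, Protocol, Union


class Storage(Protocol):
    """Storage protocol aligned with the widest data type the implementations accept."""

    def store(self, file_path: str, data: Union[BinaryIO, str]) -> None: ...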
cognee/infrastructure/llm/tokenizer/Mistral/adapter.py (1)

78-89: ⚠️ Potential issue

Fix parameter name inconsistency with interface.

The parameter name encoding doesn't match the interface expectation of token. This could cause issues when the method is called through the interface.

Apply this diff to fix the parameter name:

-    def decode_single_token(self, encoding: int):
+    def decode_single_token(self, token: int):
         """
         Attempt to decode a single token, although this functionality is not implemented and
         raises NotImplementedError.

         Parameters:
         -----------

-            - encoding (int): The integer representation of the token to decode.
+            - token (int): The integer representation of the token to decode.
         """
         # Mistral tokenizer doesn't have the option to decode tokens
         raise NotImplementedError
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    def decode_single_token(self, token: int):
        """
        Attempt to decode a single token, although this functionality is not implemented and
        raises NotImplementedError.

        Parameters:
        -----------

            - token (int): The integer representation of the token to decode.
        """
        # Mistral tokenizer doesn't have the option to decode tokens
        raise NotImplementedError
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 78-78: Parameter 'token' has been renamed to 'encoding' in overriding 'MistralTokenizer.decode_single_token' method

(W0237)

🤖 Prompt for AI Agents
In cognee/infrastructure/llm/tokenizer/Mistral/adapter.py around lines 78 to 89,
the method decode_single_token uses the parameter name 'encoding' which is
inconsistent with the interface that expects 'token'. Rename the parameter from
'encoding' to 'token' to match the interface and avoid potential issues when the
method is called through the interface.
cognee/api/v1/responses/routers/default_tools.py (1)

1-86: 🛠️ Refactor suggestion

Well-structured tool definitions with room for security improvements.

The tool definitions follow OpenAI function calling standards and provide comprehensive parameter specifications. However, consider adding input validation constraints for security.

Consider these security enhancements:

                 "search_query": {
                     "type": "string",
                     "description": "The query to search for in the knowledge graph",
+                    "maxLength": 1000,
+                    "pattern": "^[\\w\\s\\-.,!?()]+$"
                 },
                 "text": {
                     "type": "string",
                     "description": "Text content to be converted into a knowledge graph",
+                    "maxLength": 50000
                 },
                 "graph_model_file": {
                     "type": "string",
                     "description": "Path to a custom graph model file",
+                    "pattern": "^[\\w\\-./]+\\.(json|yaml|yml)$"
                 },

These constraints help prevent injection attacks and ensure reasonable input sizes.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

DEFAULT_TOOLS = [
    {
        "type": "function",
        "name": "search",
        "description": "Search for information within the knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "search_query": {
                    "type": "string",
                    "description": "The query to search for in the knowledge graph",
                    "maxLength": 1000,
                    "pattern": "^[\\w\\s\\-.,!?()]+$"
                },
                "search_type": {
                    "type": "string",
                    "description": "Type of search to perform",
                    "enum": [
                        "INSIGHTS",
                        "CODE",
                        "GRAPH_COMPLETION",
                        "SEMANTIC",
                        "NATURAL_LANGUAGE",
                    ],
                },
                "top_k": {
                    "type": "integer",
                    "description": "Maximum number of results to return",
                    "default": 10,
                },
                "datasets": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Optional list of dataset names to search within",
                },
            },
            "required": ["search_query"],
        },
    },
    {
        "type": "function",
        "name": "cognify",
        "description": "Convert text into a knowledge graph or process all added content",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Text content to be converted into a knowledge graph",
                    "maxLength": 50000
                },
                "graph_model_name": {
                    "type": "string",
                    "description": "Name of the graph model to use",
                },
                "graph_model_file": {
                    "type": "string",
                    "description": "Path to a custom graph model file",
                    "pattern": "^[\\w\\-./]+\\.(json|yaml|yml)$"
                },
            },
        },
    },
    {
        "type": "function",
        "name": "prune",
        "description": "Remove unnecessary or outdated information from the knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "prune_strategy": {
                    "type": "string",
                    "enum": ["light", "moderate", "aggressive"],
                    "description": "Strategy for pruning the knowledge graph",
                    "default": "moderate",
                },
                "min_confidence": {
                    "type": "number",
                    "description": "Minimum confidence score to retain (0-1)",
                    "minimum": 0,
                    "maximum": 1,
                },
                "older_than": {
                    "type": "string",
                    "description": "ISO date string - prune nodes older than this date",
                },
            },
        },
    },
]
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/default_tools.py within lines 1 to 86, the
tool definitions lack input validation constraints that can prevent injection
attacks and control input sizes. Add constraints such as maxLength for string
fields like "search_query", "text", "graph_model_name", and "graph_model_file";
minLength where appropriate; and maxItems for arrays like "datasets". Also,
consider pattern restrictions if applicable to further validate inputs. These
additions will enhance security by limiting input size and format.
cognee/api/v1/responses/routers/get_responses_router.py (4)

54-55: 🛠️ Refactor suggestion

Remove hardcoded model override or document the reasoning.

The TODO comment indicates this is temporary, but hardcoding the model to "gpt-4o" regardless of the request parameter could confuse users.

Either implement proper model support or document why this override is necessary:

-        # TODO: Support other models (e.g. cognee-v1-openai-gpt-3.5-turbo, etc.)
-        model = "gpt-4o"
+        # Currently only gpt-4o is supported for responses API
+        if model != "gpt-4o":
+            logger.warning(f"Model {model} not supported, using gpt-4o")
+            model = "gpt-4o"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        # Currently only gpt-4o is supported for responses API
        if model != "gpt-4o":
            logger.warning(f"Model {model} not supported, using gpt-4o")
            model = "gpt-4o"
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 54-54: TODO: Support other models (e.g. cognee-v1-openai-gpt-3.5-turbo, etc.)

(W0511)

🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 54 to
55, the model variable is hardcoded to "gpt-4o" overriding any request
parameter, which can confuse users. To fix this, either remove the hardcoded
assignment so the model parameter from the request is used as intended, or if
the override is necessary temporarily, add a clear comment explaining why this
is done and when it will be removed or replaced with proper model support.

43-49: ⚠️ Potential issue

Fix dangerous default value and improve parameter handling.

Using a mutable default value can lead to unexpected behavior when the list is modified.

Apply this diff to fix the dangerous default:

-    async def call_openai_api_for_model(
-        input_text: str,
-        model: str,
-        tools: Optional[List[Dict[str, Any]]] = DEFAULT_TOOLS,
-        tool_choice: Any = "auto",
-        temperature: float = 1.0,
-    ) -> Dict[str, Any]:
+    async def call_openai_api_for_model(
+        input_text: str,
+        model: str,
+        tools: Optional[List[Dict[str, Any]]] = None,
+        tool_choice: Any = "auto",
+        temperature: float = 1.0,
+    ) -> Dict[str, Any]:

Then handle the None case in the function:

+        if tools is None:
+            tools = DEFAULT_TOOLS
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    async def call_openai_api_for_model(
        input_text: str,
        model: str,
        tools: Optional[List[Dict[str, Any]]] = None,
        tool_choice: Any = "auto",
        temperature: float = 1.0,
    ) -> Dict[str, Any]:
        if tools is None:
            tools = DEFAULT_TOOLS

        # …rest of function…
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 43-43: Dangerous default value DEFAULT_TOOLS (builtins.list) as argument

(W0102)

🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 43 to
49, the function call_openai_api_for_model uses a mutable default argument
DEFAULT_TOOLS for the parameter tools, which can cause unexpected behavior if
the list is modified. To fix this, change the default value of tools to None and
inside the function, check if tools is None and if so, assign it to
DEFAULT_TOOLS. This prevents sharing the same list instance across function
calls and ensures safer parameter handling.

72-75: ⚠️ Potential issue

Fix dependency injection pattern and remove unused parameter.

The Depends call in the default argument should be avoided, and the unused user parameter should be handled properly.

Apply this diff to fix the dependency injection:

     @router.post("/", response_model=ResponseBody)
     async def create_response(
         request: ResponseRequest,
-        user: User = Depends(get_authenticated_user),
+        user: User = Depends(get_authenticated_user),
     ) -> ResponseBody:

If the user parameter is required for authentication but not used in the function body, add a comment explaining this:

         user: User = Depends(get_authenticated_user),
     ) -> ResponseBody:
         """
         OpenAI-compatible responses endpoint with function calling support
         """
+        # User parameter ensures authentication but is not used in processing

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.11.9)

74-74: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🪛 Pylint (3.3.7)

[refactor] 72-72: Too many local variables (18/15)

(R0914)


[warning] 74-74: Unused argument 'user'

(W0613)

🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 72 to
75, the function create_response uses Depends in the default argument for the
user parameter, which is not recommended, and the user parameter is unused.
Remove the user parameter from the function signature if it is not needed, or if
it is required for authentication side effects but not used, keep it but add a
comment explaining why it is present. Avoid using Depends directly in default
arguments and instead use it in the function parameters properly.

114-122: 🛠️ Refactor suggestion

Improve exception handling specificity.

Catching Exception is too broad and may hide important error details.

Apply this diff for more specific exception handling:

                 # Dispatch the function
                 try:
                     function_result = await dispatch_function(tool_call)
                     output_status = "success"
-                except Exception as e:
-                    logger.exception(f"Error executing function {function_name}: {e}")
+                except (ValueError, TypeError, KeyError) as e:
+                    logger.exception("Error executing function %s: %s", function_name, e)
+                    function_result = f"Error executing {function_name}: {str(e)}"
+                    output_status = "error"
+                except Exception as e:
+                    logger.exception("Unexpected error executing function %s: %s", function_name, e)
                     function_result = f"Error executing {function_name}: {str(e)}"
                     output_status = "error"
+                    # Re-raise unexpected errors after logging
+                    raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

                # Dispatch the function
                try:
                    function_result = await dispatch_function(tool_call)
                    output_status = "success"
                except (ValueError, TypeError, KeyError) as e:
                    logger.exception("Error executing function %s: %s", function_name, e)
                    function_result = f"Error executing {function_name}: {str(e)}"
                    output_status = "error"
                except Exception as e:
                    logger.exception("Unexpected error executing function %s: %s", function_name, e)
                    function_result = f"Error executing {function_name}: {str(e)}"
                    output_status = "error"
                    # Re-raise unexpected errors after logging
                    raise
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 118-118: Catching too general exception Exception

(W0718)


[warning] 119-119: Use lazy % formatting in logging functions

(W1203)

🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 114 to
122, the current code catches a broad Exception which can obscure specific error
types. Refine the exception handling by catching more specific exceptions
relevant to dispatch_function, such as asyncio.TimeoutError or any known custom
exceptions it may raise. This will improve error clarity and handling precision.
Adjust the except blocks accordingly to handle these specific exceptions before
a general fallback if necessary.
cognee/infrastructure/databases/exceptions/exceptions.py (1)

69-86: 🛠️ Refactor suggestion

New exception class follows consistent patterns but has a similar issue with base class initialization.

The NodesetFilterNotSupportedError class is well-documented and serves a clear purpose. However, like EntityNotFoundError, it doesn't call super().__init__(), which bypasses the base class logging and initialization logic.

Consider maintaining consistency with the base class pattern. If logging needs to be disabled for certain scenarios, consider adding a parameter to control it:

 def __init__(
     self,
     message: str = "The nodeset filter is not supported in the current graph database.",
     name: str = "NodeSetFilterNotSupportedError",
     status_code=status.HTTP_404_NOT_FOUND,
 ):
-    self.message = message
-    self.name = name
-    self.status_code = status_code
+    super().__init__(message, name, status_code, log=False)

This approach maintains inheritance patterns while allowing selective logging control.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

class NodesetFilterNotSupportedError(CogneeApiError):
    """
    Raise an exception when a nodeset filter is not supported by the current database.

    This exception inherits from `CogneeApiError` and is designed to provide information
    about the specific issue of unsupported nodeset filters in the context of graph
    databases.
    """

    def __init__(
        self,
        message: str = "The nodeset filter is not supported in the current graph database.",
        name: str = "NodeSetFilterNotSupportedError",
        status_code=status.HTTP_404_NOT_FOUND,
    ):
        super().__init__(message, name, status_code, log=False)
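For the log=False call above to work, CogneeApiError.__init__ would need to accept that flag; a hedged sketch of the base-class change follows (its current parameters are inferred from the subclasses shown in this file, and the real __init__ may do more bookkeeping than shown):

from cognee.shared.logging_utils import get_logger

logger = get_logger()


class CogneeApiError(Exception):
    """Sketch of a base class with an opt-out logging flag."""

    def __init__(self, message: str, name: str, status_code, log: bool = True):
        self.message = message
        self.name = name
        self.status_code = status_code
        if log:
            # Centralised logging stays in the base class; subclasses opt out with log=False.
            logger.error("%s (%s): %s", name, status_code, message)
        super().__init__(message)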
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 78-78: init method from base class 'CogneeApiError' is not called

(W0231)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/exceptions/exceptions.py around lines 69 to
86, the NodesetFilterNotSupportedError class does not call super().__init__(),
which skips the base class initialization and logging logic. To fix this, modify
the __init__ method to call super().__init__() with the appropriate parameters,
optionally adding a parameter to control logging if needed, ensuring consistent
inheritance and proper base class setup.
cognee/infrastructure/databases/graph/graph_db_interface.py (2)

93-96: 🛠️ Refactor suggestion

Log level & broad exception handling might hide production errors

Downgrading all exceptions to debug risks silently swallowing issues that corrupt the relationship ledger. Consider:

  1. Logging at warning (or error) when persistence fails.
  2. Narrowing the except clause (Exception → expected DB errors).
  3. Surfacing failures to callers (optional retry / circuit-breaker).

Keeping visibility high is critical for audit trails.

Also applies to: 120-121
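A minimal sketch of the narrowed handler, with the hypothetical names DatabaseError and persist_relationship_ledger standing in for the real driver exception and persistence call:

import logging

logger = logging.getLogger(__name__)


class DatabaseError(Exception):
    """Stand-in for the driver's real error hierarchy (e.g. a Neo4j/Kuzu error)."""


async def persist_relationship_ledger(entries: list[dict]) -> None:
    """Placeholder for the actual persistence call in graph_db_interface.py."""


async def record_relationships(entries: list[dict]) -> None:
    try:
        await persist_relationship_ledger(entries)
    except DatabaseError as error:
        # Warning (not debug) keeps persistence failures visible for audit trails.
        logger.warning("Failed to persist relationship ledger: %s", error)
        raise  # surface to callers so they can retry or trip a circuit breaker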

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/graph_db_interface.py around lines 93
to 96, the current exception handling logs all errors at debug level and catches
broad exceptions, which may hide critical issues. Change the log level to
warning or error to ensure visibility of persistence failures, narrow the except
clause to catch only expected database-related exceptions, and consider
re-raising the exception or implementing a retry mechanism to surface failures
to callers. Apply similar changes to lines 120-121 for consistency.

353-366: ⚠️ Potential issue

Return-type uses int IDs while the rest of the interface uses str

Node, EdgeData, and concrete adapters (e.g., kuzu, neo4j) all employ str for node IDs. The signature below introduces int, breaking static typing and causing mypy / IDE warnings.

-    ) -> Tuple[List[Tuple[int, dict]], List[Tuple[int, int, str, dict]]]:
+    ) -> Tuple[List[Tuple[str, dict]], List[Tuple[str, str, str, dict]]]:

Please align the type hints and docstring accordingly, or justify the divergent type with explicit conversion logic.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    @abstractmethod
    async def get_nodeset_subgraph(
        self, node_type: Type[Any], node_name: List[str]
    ) -> Tuple[List[Tuple[str, dict]], List[Tuple[str, str, str, dict]]]:
        """
        Fetch a subgraph consisting of a specific set of nodes and their relationships.

        Parameters:
        -----------

            - node_type (Type[Any]): The type of nodes to include in the subgraph.
            - node_name (List[str]): A list of names of the nodes to include in the subgraph.
        """
        raise NotImplementedError
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/graph_db_interface.py around lines 353
to 366, the method get_nodeset_subgraph uses int for node IDs in its return
type, while the rest of the interface and implementations use str for node IDs.
To fix this, change the return type hints from int to str for node IDs in both
the node tuples and edge tuples. Also update the docstring to reflect that node
IDs are strings, ensuring consistency with the rest of the interface and
avoiding static typing conflicts.
cognee/infrastructure/databases/graph/get_graph_engine.py (1)

79-87: 🛠️ Refactor suggestion

Adapter instantiation may omit mandatory parameters

The generic branch instantiates adapters with only url/username/password. Adapters such as KuzuAdapter (requires db_path) or future adapters needing port/file_path will break:

adapter = supported_databases[graph_database_provider]
return adapter(                       # missing graph_file_path / port
    graph_database_url=graph_database_url,
    graph_database_username=graph_database_username,
    graph_database_password=graph_database_password,
)

Consider forwarding all kwargs or registering lambdas/partial objects in supported_databases that satisfy each constructor.
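A hedged sketch of the factory-registration variant; the adapter classes and parameter names below are illustrative stand-ins for the real entries in supported_databases:

from functools import partial
from typing import Any, Callable, Dict


# Stand-ins: only the constructor signatures matter for this sketch.
class Neo4jAdapter:
    def __init__(self, graph_database_url: str, graph_database_username: str, graph_database_password: str): ...


class KuzuAdapter:
    def __init__(self, db_path: str): ...


def build_registry(url: str, username: str, password: str, db_path: str) -> Dict[str, Callable[[], Any]]:
    # Each factory already closes over the arguments its adapter needs,
    # so get_graph_engine() can simply call registry[provider]().
    return {
        "neo4j": partial(Neo4jAdapter, url, username, password),
        "kuzu": partial(KuzuAdapter, db_path),
    }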

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/get_graph_engine.py around lines 79 to
87, the adapter instantiation only passes url, username, and password, which
breaks adapters requiring additional parameters like db_path or port. To fix
this, modify the code to forward all relevant keyword arguments (kwargs) to the
adapter constructor or update supported_databases to store factory functions
(e.g., lambdas or partials) that supply the correct parameters for each adapter
type, ensuring all mandatory parameters are provided during instantiation.
cognee/modules/retrieval/graph_completion_cot_retriever.py (2)

74-80: ⚠️ Potential issue

Return type becomes a nested list

answer is already a list (initialised at L79 and assigned from generate_completion).
Wrapping it again (return [answer]) yields List[List[str]], which is unlikely to be what callers expect and breaks the type hints.

-        return [answer]
+        return answer

Also applies to: 125-125

🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_cot_retriever.py around lines 74 to
80 and line 125, the variable 'answer' is already a list of strings, but the
code wraps it again in another list when returning, causing the return type to
be a nested list (List[List[str]]). To fix this, remove the extra list wrapping
in the return statement so that 'answer' is returned directly as a List[str],
matching the expected type hints and avoiding type errors.

81-88: ⚠️ Potential issue

Off-by-one: max_iter + 1 executes one extra round

range(max_iter + 1) performs max_iter + 1 iterations while the docstring says “maximum number of iterations … (default 4)”.
Either make the loop range(max_iter) or clarify the docstring.

-        for round_idx in range(max_iter + 1):
+        for round_idx in range(max_iter):

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_cot_retriever.py around lines 81 to
88, the loop uses range(max_iter + 1) which causes one extra iteration beyond
the intended maximum. To fix this, change the loop to range(max_iter) to ensure
it runs exactly max_iter times as described in the docstring, or alternatively
update the docstring to reflect the current behavior if the extra iteration is
intentional.
cognee/modules/retrieval/graph_completion_context_extension_retriever.py (2)

1-8: ⚠️ Potential issue

Remove unused imports to satisfy Ruff/Pylint and avoid dead code

get_llm_client, read_query_prompt, and render_prompt are imported but never referenced.
Keeping stale imports bloats byte-code, hurts import-time perf, and now fails Ruff (F401) & Pylint (W0611).

-from cognee.infrastructure.llm.get_llm_client import get_llm_client
-from cognee.infrastructure.llm.prompts import read_query_prompt, render_prompt
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from typing import Any, Optional, List, Type
from cognee.shared.logging_utils import get_logger
from cognee.modules.retrieval.graph_completion_retriever import GraphCompletionRetriever
from cognee.modules.retrieval.utils.completion import generate_completion

logger = get_logger()
🧰 Tools
🪛 Ruff (0.11.9)

3-3: cognee.infrastructure.llm.get_llm_client.get_llm_client imported but unused

Remove unused import: cognee.infrastructure.llm.get_llm_client.get_llm_client

(F401)


6-6: cognee.infrastructure.llm.prompts.read_query_prompt imported but unused

Remove unused import

(F401)


6-6: cognee.infrastructure.llm.prompts.render_prompt imported but unused

Remove unused import

(F401)

🪛 Pylint (3.3.7)

[warning] 3-3: Unused get_llm_client imported from cognee.infrastructure.llm.get_llm_client

(W0611)


[warning] 6-6: Unused read_query_prompt imported from cognee.infrastructure.llm.prompts

(W0611)


[warning] 6-6: Unused render_prompt imported from cognee.infrastructure.llm.prompts

(W0611)

🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_context_extension_retriever.py
lines 1 to 8, remove the unused imports get_llm_client, read_query_prompt, and
render_prompt since they are not referenced anywhere in the file. This will
satisfy Ruff and Pylint by eliminating dead code and improve import-time
performance.

110-117: 🛠️ Refactor suggestion

Return signature mismatch – should return a string, not a single-item list

Callers of get_completion will likely expect the completion text, yet the method wraps it in a list (return [answer]).
Unless a higher-level protocol mandates a list, simplify:

-        return [answer]
+        return answer

Align the declared return type accordingly (-> str).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        answer = await generate_completion(
            query=query,
            context=context,
            user_prompt_path=self.user_prompt_path,
            system_prompt_path=self.system_prompt_path,
        )

        return answer
🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_context_extension_retriever.py
around lines 110 to 117, the method get_completion currently returns a
single-item list containing the answer, but it should return just the string
answer itself. Remove the list brackets around answer in the return statement so
it returns a string, and update the method's return type annotation to -> str to
reflect this change.
cognee/modules/pipelines/operations/pipeline.py (2)

113-135: 🛠️ Refactor suggestion

“Ugly hack” block risks data inconsistency & merits extraction

The manual uuid5 reconstruction and double log_pipeline_run_initiated calls (lines 116-134) suggest a workaround for stale status logic.

Problems
• Divergent dataset_id vs dataset.id (reset on 135) may desynchronise logs.
• Hard-coded pipeline names hinder future refactors.

Recommendation
Move this logic into log_pipeline_run_initiated or a dedicated helper that returns the consistent dataset_id used throughout the call, eliminating the re-assignment surprise.
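A hedged sketch of such a helper; the uuid5 namespace and the name/user concatenation below are assumptions, not the project's actual recipe, and should be copied verbatim from the existing hack:

from uuid import NAMESPACE_OID, UUID, uuid5


def resolve_dataset_id(dataset_name: str, user_id: UUID) -> UUID:
    # One deterministic derivation, reused by log_pipeline_run_initiated and the
    # rest of the pipeline call, so dataset_id never has to be reassigned mid-flow.
    return uuid5(NAMESPACE_OID, f"{dataset_name}{user_id}")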

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 118-118: Line too long (112/100)

(C0301)


[convention] 128-128: Line too long (116/100)

(C0301)

🤖 Prompt for AI Agents
In cognee/modules/pipelines/operations/pipeline.py around lines 113 to 135, the
current code manually reconstructs dataset_id using uuid5 and makes two separate
calls to log_pipeline_run_initiated with hard-coded pipeline names, then resets
dataset_id to dataset.id, causing potential data inconsistency and maintenance
issues. Refactor by extracting this logic into a dedicated helper function or
incorporate it into log_pipeline_run_initiated so that it consistently computes
and returns the dataset_id used in all calls, avoiding the manual reassignment
and hard-coded strings, ensuring consistent dataset_id usage and easier future
refactoring.

64-92: 🛠️ Refactor suggestion

Dataset resolution loop is O(n²) – convert lookup to hash-table

For each requested name you iterate over every existing dataset (nested loops).
With large tenant datasets this scales poorly.

Optimised sketch:

     existing_datasets = await get_datasets(user.id)
     ...
-    for dataset_name in datasets:
-        for existing_dataset in existing_datasets:
+    existing_by_name = {d.name: d for d in existing_datasets}
+    existing_by_id   = {str(d.id): d for d in existing_datasets}
+    for dataset_name in datasets:
+        existing_dataset = existing_by_name.get(dataset_name) or existing_by_id.get(dataset_name)
+        if existing_dataset:
+            dataset_instances.append(existing_dataset)
+            continue

This trims complexity to O(n).

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/modules/pipelines/operations/pipeline.py between lines 64 and 92, the
current dataset resolution uses nested loops causing O(n²) complexity. To fix
this, create a dictionary (hash table) mapping existing dataset names and IDs to
their instances before the loop. Then, for each dataset_name, directly check
this dictionary for existence to append the existing dataset or create a new one
if not found. This change reduces the complexity to O(n).
cognee/infrastructure/llm/openai/adapter.py (1)

18-21: ⚠️ Potential issue

observe may be None – provide a safe no-op fallback decorator

get_observe() can legitimately return None when no monitoring tool is configured.
Applying @None as a decorator raises a TypeError during module import, breaking every runtime path that imports this adapter.

-from cognee.modules.observability.get_observe import get_observe
-
-observe = get_observe()
+from cognee.modules.observability.get_observe import get_observe
+
+def _noop(func):
+    return func
+
+# Use the real decorator if available; otherwise fall back to a no-op
+observe = get_observe() or (lambda *_, **__: _noop)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from cognee.modules.observability.get_observe import get_observe

def _noop(func):
    return func

# Use the real decorator if available; otherwise fall back to a no-op
observe = get_observe() or (lambda *_, **__: _noop)
🤖 Prompt for AI Agents
In cognee/infrastructure/llm/openai/adapter.py around lines 18 to 21, the
variable observe assigned from get_observe() may be None, which causes a
TypeError if used as a decorator. To fix this, check if observe is None and if
so, assign it a no-op decorator function that simply returns the original
function unchanged. This ensures that applying @observe will not fail even when
no monitoring tool is configured.
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (1)

174-183: ⚠️ Potential issue

Manual runner invokes non-existent methods – will raise AttributeError

test.test_graph_completion_context_simple() (etc.) does not exist – the defined
method names include _cot_. The manual block is unused by pytest but will fail
for anyone invoking the file directly.

Either:

• rename the calls to the correct method names, or
• remove the if __name__ == "__main__": block entirely.
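If the block is kept, a sketch of the renamed runner; the exact *_cot_* method names below are assumptions and must be checked against the class before use:

if __name__ == "__main__":
    from asyncio import run

    test = TestGraphCompletionRetriever()

    async def main():
        # Hypothetical names -- replace with the actual *_cot_* test methods.
        await test.test_graph_completion_cot_context_simple()
        await test.test_graph_completion_cot_context_complex()

    run(main())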

🧰 Tools
🪛 Pylint (3.3.7)

[error] 179-179: Instance of 'TestGraphCompletionRetriever' has no 'test_graph_completion_context_simple' member

(E1101)


[error] 180-180: Instance of 'TestGraphCompletionRetriever' has no 'test_graph_completion_context_complex' member

(E1101)


[error] 181-181: Instance of 'TestGraphCompletionRetriever' has no 'test_get_graph_completion_context_on_empty_graph' member

(E1101)

🤖 Prompt for AI Agents
In cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
around lines 174 to 183, the manual test runner calls methods that do not exist
because the actual test method names include '_cot_'. To fix this, either rename
the calls in the manual runner to match the correct method names with '_cot_' or
remove the entire manual runner block to prevent AttributeError when running the
file directly.
cognee/eval_framework/modal_run_eval.py (1)

80-97: ⚠️ Potential issue

html_output may be undefined and files are opened without encoding

  1. html_output is only set inside the if eval_params.get("dashboard") block yet
    written unconditionally – raises UnboundLocalError when dashboard is falsy.
  2. open() defaults to platform encoding; specify encoding="utf-8" for
    deterministic behaviour.
-    with open("/data/" + answers_filename, "w") as f:
+    with open("/data/" + answers_filename, "w", encoding="utf-8") as f:
         json.dump(answers, f, ensure_ascii=False, indent=4)
@@
-    if eval_params.get("dashboard"):
+    if eval_params.get("dashboard"):
         ...
+        with open("/data/" + html_filename, "w", encoding="utf-8") as f:
+            f.write(html_output)
+        vol.commit()

Also place the second open() inside the if block to avoid the undefined-variable issue.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Pylint (3.3.7)

[warning] 80-80: Using open without explicitly specifying an encoding

(W1514)


[warning] 93-93: Using open without explicitly specifying an encoding

(W1514)


[error] 94-94: Possibly using variable 'html_output' before assignment

(E0606)

🤖 Prompt for AI Agents
In cognee/eval_framework/modal_run_eval.py lines 80 to 97, the variable
html_output is assigned only inside the if eval_params.get("dashboard") block
but used outside it, causing an UnboundLocalError if the condition is false.
Also, the open() calls lack explicit encoding, which can lead to inconsistent
behavior across platforms. To fix this, move the second open() call that writes
html_output inside the if block and add encoding="utf-8" to both open() calls to
ensure consistent file encoding.
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2)

620-620: ⚠️ Potential issue

Fix method name typo.

The static analysis correctly identifies that get_neighbours should be get_neighbors to match the actual method name defined in this class.

-        return await self.get_neighbours(node_id)
+        return await self.get_neighbors(node_id)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        return await self.get_neighbors(node_id)
🧰 Tools
🪛 Pylint (3.3.7)

[error] 620-620: Instance of 'Neo4jAdapter' has no 'get_neighbours' member; maybe 'get_neighbors'?

(E1101)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/neo4j_driver/adapter.py at line 620,
the method name called is get_neighbours, which is a typo. Change this to
get_neighbors to match the actual method name defined in the class and ensure
the call works correctly.

778-778: 🛠️ Refactor suggestion

Fix dangerous default mutable argument.

Using dict() as a default argument is dangerous because the same dictionary instance is shared across all function calls, potentially causing unexpected side effects.

-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties=None):
         """
         Convert properties of a node or edge into a serializable format suitable for storage.

         Parameters:
         -----------

             - properties: A dictionary of properties to serialize, defaults to an empty
               dictionary. (default dict())

         Returns:
         --------

             A dictionary with serialized property values.
         """
+        if properties is None:
+            properties = {}
         serialized_properties = {}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    def serialize_properties(self, properties=None):
        """
        Convert properties of a node or edge into a serializable format suitable for storage.

        Parameters:
        -----------

            - properties: A dictionary of properties to serialize, defaults to an empty
              dictionary. (default dict())

        Returns:
        --------

            A dictionary with serialized property values.
        """
        if properties is None:
            properties = {}
        serialized_properties = {}
🧰 Tools
🪛 Ruff (0.11.9)

778-778: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🪛 Pylint (3.3.7)

[warning] 778-778: Dangerous default value dict() (builtins.dict) as argument

(W0102)


[refactor] 778-778: Consider using '{}' instead of a call to 'dict'.

(R1735)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/neo4j_driver/adapter.py at line 778,
the method serialize_properties uses a mutable default argument dict(), which
can lead to shared state across calls. Change the default argument to None and
inside the method initialize properties to an empty dictionary if it is None to
avoid this issue.
cognee/tests/unit/modules/retrieval/graph_completion_retriever_context_extension_test.py (2)

176-185: ⚠️ Potential issue

__main__ runner is broken and will raise AttributeError

The manual runner calls non-existent method names (test_graph_completion_context_simple vs the actual test_graph_completion_extension_context_simple, etc.).
Besides being incorrect, the block is redundant because pytest is the authoritative test runner.

-if __name__ == "__main__":
-    from asyncio import run
-
-    test = TestGraphCompletionRetriever()
-
-    async def main():
-        await test.test_graph_completion_context_simple()
-        await test.test_graph_completion_context_complex()
-        await test.test_get_graph_completion_context_on_empty_graph()
-
-    run(main())
+# The explicit asyncio runner is unnecessary; invoke `pytest` instead.
+# If you want manual execution, ensure the method names match exactly:
+#   await test.test_graph_completion_extension_context_simple()
+# but ideally delete this whole block.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

# The explicit asyncio runner is unnecessary; invoke `pytest` instead.
# If you really need a manual entrypoint, make sure the test methods exist:
#   await test.test_graph_completion_extension_context_simple()
# but in most cases it’s best to delete this whole block and let pytest do its job.
🧰 Tools
🪛 Pylint (3.3.7)

[error] 181-181: Instance of 'TestGraphCompletionRetriever' has no 'test_graph_completion_context_simple' member

(E1101)


[error] 182-182: Instance of 'TestGraphCompletionRetriever' has no 'test_graph_completion_context_complex' member

(E1101)


[error] 183-183: Instance of 'TestGraphCompletionRetriever' has no 'test_get_graph_completion_context_on_empty_graph' member

(E1101)

🤖 Prompt for AI Agents
In
cognee/tests/unit/modules/retrieval/graph_completion_retriever_context_extension_test.py
around lines 176 to 185, the __main__ runner calls test methods with incorrect
names that do not exist, causing AttributeError. Remove this manual runner block
entirely since it is redundant and pytest should be used as the test runner
instead.

18-30: 🛠️ Refactor suggestion

Use tmp-based directories to prevent cross-test interference

All three tests write to hard-coded .cognee_system / .data_storage sub-folders under the test directory.
When tests are executed in parallel (pytest -xdist, CI matrix, etc.) they will race on the same folders, causing flaky failures and sporadic DatabaseAlreadyExists/permission errors. Prefer the tmp_path/tmp_path_factory fixtures (or tempfile.TemporaryDirectory) so each test gets an isolated workspace.

- system_directory_path = os.path.join(
-     pathlib.Path(__file__).parent, ".cognee_system/test_graph_context"
- )
+ system_directory_path = tmp_path_factory.mktemp("cognee_system")

(Apply to every path that is currently statically concatenated.)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
cognee/tests/unit/modules/retrieval/graph_completion_retriever_context_extension_test.py
around lines 18 to 30, the test uses hard-coded directories for system and data
storage which can cause race conditions and flaky failures when tests run in
parallel. Modify the code to use pytest's tmp_path or tmp_path_factory fixtures
to create temporary, unique directories for system_root_directory and
data_root_directory instead of static paths. This ensures each test runs in an
isolated workspace and prevents interference.
cognee/tasks/repo_processor/get_local_dependencies.py (1)

233-234: 🛠️ Refactor suggestion

Mutable default argument can lead to cross-call leakage

existing_nodes: list[DataPoint] = {} uses a shared dict between invocations.

-    tree_root: Node, script_path: str, existing_nodes: list[DataPoint] = {}
+    tree_root: Node,
+    script_path: str,
+    existing_nodes: Optional[dict[str, DataPoint]] = None,
 ):
@@
-    for child_node in tree_root.children:
+    if existing_nodes is None:
+        existing_nodes = {}
+
+    for child_node in tree_root.children:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

async def get_local_dependencies(
    tree_root: Node,
    script_path: str,
    existing_nodes: Optional[dict[str, DataPoint]] = None,
) -> AsyncGenerator[DataPoint, None]:
    if existing_nodes is None:
        existing_nodes = {}

    for child_node in tree_root.children:
        …
🧰 Tools
🪛 Ruff (0.11.9)

233-233: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In cognee/tasks/repo_processor/get_local_dependencies.py around lines 233 to
234, the function uses a mutable default argument existing_nodes set to an empty
dictionary, which can cause data to persist across calls unexpectedly. Change
the default value of existing_nodes to None and inside the function initialize
it to an empty dict if it is None to avoid cross-call data leakage.
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

68-77: 🛠️ Refactor suggestion

Silently swallowing EntityNotFoundError hides real problems

A bare try / except … : pass will mask unexpected states and make debugging difficult.
Using contextlib.suppress conveys intent better and keeps the block readable, but logging once is still valuable.

+import contextlib
 ...
-    try:
-        await memory_fragment.project_graph_from_db(
-            graph_engine,
-            node_properties_to_project=properties_to_project,
-            edge_properties_to_project=["relationship_name"],
-            node_type=node_type,
-            node_name=node_name,
-        )
-    except EntityNotFoundError:
-        pass
+    with contextlib.suppress(EntityNotFoundError):
+        await memory_fragment.project_graph_from_db(
+            graph_engine,
+            node_properties_to_project=properties_to_project,
+            edge_properties_to_project=["relationship_name"],
+            node_type=node_type,
+            node_name=node_name,
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

# at the top of the file
import contextlib

# … later in your async function …
    with contextlib.suppress(EntityNotFoundError):
        await memory_fragment.project_graph_from_db(
            graph_engine,
            node_properties_to_project=properties_to_project,
            edge_properties_to_project=["relationship_name"],
            node_type=node_type,
            node_name=node_name,
        )
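If that single log line is wanted, a plain try/except that logs once is a reasonable alternative to suppress (sketch only; logger is assumed to be the module's existing logger):

    try:
        await memory_fragment.project_graph_from_db(
            graph_engine,
            node_properties_to_project=properties_to_project,
            edge_properties_to_project=["relationship_name"],
            node_type=node_type,
            node_name=node_name,
        )
    except EntityNotFoundError:
        # Log once so the empty projection is still traceable, then continue.
        logger.debug("No matching entities found; continuing with an empty graph projection.")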
🧰 Tools
🪛 Ruff (0.11.9)

68-77: Use contextlib.suppress(EntityNotFoundError) instead of try-except-pass

Replace with contextlib.suppress(EntityNotFoundError)

(SIM105)

🤖 Prompt for AI Agents
In cognee/modules/retrieval/utils/brute_force_triplet_search.py around lines 68
to 77, the current try-except block silently swallows EntityNotFoundError, which
can hide real issues. Replace the bare except-pass with
contextlib.suppress(EntityNotFoundError) to clearly convey intent, and add a
single log statement to record when this exception occurs, improving readability
and aiding debugging without interrupting flow.
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

536-536: 💡 Verification agent

🧩 Analysis chain

Verify the existence of list_collections method.

The code calls self.list_collections() but this method is not defined in the class. This could cause a runtime error.


🏁 Script executed:

#!/bin/bash
# Description: Search for list_collections method definition in ChromaDBAdapter
ast-grep --pattern $'class ChromaDBAdapter {
  $$$
  def list_collections($$$) {
    $$$
  }
  $$$
}'

Length of output: 102


🏁 Script executed:

#!/bin/bash
# Search for the definition of list_collections in the adapter file
rg -n "def list_collections" cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py

# Search for any references to list_collections across the entire repository
rg -n "list_collections" .

Length of output: 581


🏁 Script executed:

#!/bin/bash
# Locate class definition and inspect its base classes
rg -n "class ChromaDBAdapter" -n cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py

# Show the first 200 lines of the file to see imports and class signature
sed -n '1,200p' cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py

Length of output: 6703


🏁 Script executed:

#!/bin/bash
# Locate the file containing VectorDBInterface definition
rg -l "class VectorDBInterface" -n .

# Once located, display its contents to check for list_collections definition
# Replace <path> with the actual path from the previous command
sed -n '1,200p' cognee/infrastructure/databases/vector/vector_db_interface.py

# Search within the interface for any mention of list_collections
rg -n "list_collections" cognee/infrastructure/databases/vector/vector_db_interface.py

Length of output: 4946


Fix missing list_collections implementation in ChromaDBAdapter

It appears ChromaDBAdapter calls self.list_collections() on line 536, but neither the adapter nor VectorDBInterface defines this method—this will lead to an AttributeError at runtime.

• File: cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py
• Line: 536

Suggested remedy: invoke the ChromaDB client’s list_collections method instead, for example:

- collections = await self.list_collections()
+ conn = await self.get_connection()
+ collections = await conn.list_collections()

Or, if you intend to keep self.list_collections(), add and implement that method in the adapter to delegate to the client.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        conn = await self.get_connection()
        collections = await conn.list_collections()
🧰 Tools
🪛 Pylint (3.3.7)

[error] 536-536: Instance of 'ChromaDBAdapter' has no 'list_collections' member

(E1101)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py at line
536, the code calls self.list_collections() but this method is not defined in
the class or its base classes, causing a runtime error. To fix this, replace the
call to self.list_collections() with a direct call to the ChromaDB client’s
list_collections method, or alternatively, define a list_collections method in
ChromaDBAdapter that delegates to the client’s list_collections method.
cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (2)

759-759: ⚠️ Potential issue

Fix method call with missing argument.

The call to delete_data_points is missing the required collection_name argument and should be awaited.

-        self.delete_data_points(data_point_ids)
+        await self.delete_data_points(collection_name, data_point_ids)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        await self.delete_data_points(collection_name, data_point_ids)
🧰 Tools
🪛 Pylint (3.3.7)

[error] 759-759: No value for argument 'data_point_ids' in method call

(E1120)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py at line
759, the method call to delete_data_points is missing the required
collection_name argument and is not awaited. Fix this by adding the appropriate
collection_name argument to the call and prefixing the call with await to
properly handle the asynchronous operation.

745-745: ⚠️ Potential issue

Fix method call with missing argument.

The call to delete_data_points is missing the required data_point_ids argument.

-        return await self.delete_data_points([data_point_id])
+        return await self.delete_data_points(collection_name, [data_point_id])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        return await self.delete_data_points(collection_name, [data_point_id])
🧰 Tools
🪛 Pylint (3.3.7)

[error] 745-745: No value for argument 'data_point_ids' in method call

(E1120)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py at line
745, the call to delete_data_points is missing the required argument
data_point_ids. Fix this by passing the data_point_id inside a list as the
argument to delete_data_points, ensuring the method receives the expected
parameter.
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

87-98: 💡 Verification agent

🧩 Analysis chain

Verify method signature consistency with base interface.

Static analysis indicates that the add_node method parameter count differs from the base interface (3 parameters expected vs 2 provided). Please ensure this method signature matches the GraphDBInterface definition.


🏁 Script executed:

#!/bin/bash
# Verify the add_node method signature in the base interface
ast-grep --pattern 'async def add_node($$$)'

Length of output: 14746


Align add_node signature with GraphDBInterface

The GraphDBInterface defines:

async def add_node(self, node_id: str, properties: Dict[str, Any]) -> None

but in networkx/adapter.py (and other adapters) we have:

async def add_node(self, node: DataPoint) -> None

These signatures don’t match and will break the polymorphic contract. Please choose one of the following approaches and apply it consistently across the interface and all adapters:

  • Update the interface to accept a DataPoint:
    --- graph_db_interface.py:166
    - async def add_node(self, node_id: str, properties: Dict[str, Any]) -> None:
    + async def add_node(self, node: DataPoint) -> None:
  • Or update all adapters to split the DataPoint into node_id and properties:
    --- networkx/adapter.py:87
    - async def add_node(self, node: DataPoint) -> None:
    + async def add_node(self, node_id: str, properties: Dict[str, Any]) -> None:
         self.graph.add_node(node_id, **properties)
         await self.save_graph_to_file(self.filename)

• Files needing updates:

  • cognee/infrastructure/databases/graph/graph_db_interface.py: 166
  • cognee/infrastructure/databases/graph/networkx/adapter.py: 87
  • (And the other adapters in falkordb, neo4j_driver, memgraph, kuzu)
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 87-87: Number of parameters was 3 in 'GraphDBInterface.add_node' and is now 2 in overriding 'NetworkXAdapter.add_node' method

(W0221)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/networkx/adapter.py lines 87 to 98, the
add_node method signature uses a single DataPoint parameter, which conflicts
with the GraphDBInterface definition expecting two parameters: node_id (str) and
properties (Dict[str, Any]). To fix this, refactor the add_node method to accept
node_id and properties separately, then update the method body to add the node
using these parameters. Also, ensure this change is applied consistently across
all adapter implementations and the interface to maintain polymorphic
compatibility.
cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)

274-276: ⚠️ Potential issue

await missing – points are never uploaded

AsyncQdrantClient.upload_points returns a coroutine.
Without await, the upload is skipped and a warning is raised at runtime.

-            client.upload_points(collection_name=collection_name, points=points)
+            await client.upload_points(collection_name=collection_name, points=points)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        try:
            await client.upload_points(collection_name=collection_name, points=points)
        except UnexpectedResponse as error:
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py around lines
274 to 276, the call to client.upload_points is missing an await keyword,
causing the coroutine to not execute and points not to be uploaded. Add the
await keyword before client.upload_points to properly await the asynchronous
upload operation and ensure points are uploaded as intended.
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (10)

178-195: 🛠️ Refactor suggestion

add_nodes batch query has two breaking issues

  1. n:node.label (and the analogue in the ON MATCH clause) suffers from the same dynamic-label problem as above.
  2. The UNWIND alias is node_data, yet the property accesses (node.node_id, node.properties, node.label) reference node and never resolve against the unwound rows; node.label also clashes with the Python dictionary key "label".

A minimal fix that keeps the single query could be:

 UNWIND $nodes AS node_data
-MERGE (n {id: node.node_id})
-ON CREATE SET n:node.label, n += node.properties, n.updated_at = timestamp()
-ON MATCH SET n:node.label, n += node.properties, n.updated_at = timestamp()
+MERGE (n:{label} {{id: node_data.node_id}})
+ON CREATE SET n += node_data.properties, n.updated_at = timestamp()
+ON MATCH  SET n += node_data.properties, n.updated_at = timestamp()
 RETURN ID(n) AS internal_id, n.id AS nodeId

with

query = query.format(label=label)  # one label per batch, or build one query per distinct label (see the sketch below)

Otherwise split the list by label and run one query per label.
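A hedged sketch of the split-by-label variant, intended to run inside add_nodes (hence self.query and await); the payload keys ("label", "node_id", "properties") follow the batch dictionaries described above and should be adjusted if the real payload differs:

from collections import defaultdict

nodes_by_label: dict[str, list[dict]] = defaultdict(list)
for node in nodes:
    nodes_by_label[node["label"]].append(node)

query_template = """
UNWIND $nodes AS node_data
MERGE (n:{label} {{id: node_data.node_id}})
ON CREATE SET n += node_data.properties, n.updated_at = timestamp()
ON MATCH  SET n += node_data.properties, n.updated_at = timestamp()
RETURN ID(n) AS internal_id, n.id AS nodeId
"""

for label, grouped_nodes in nodes_by_label.items():
    await self.query(query_template.format(label=label), {"nodes": grouped_nodes})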

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 195-195: Consider using '{"nodes": nodes}' instead of a call to 'dict'.

(R1735)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 178 to 195, the Cypher query incorrectly uses a dynamic label with
n:node.label which is not valid, and the unwind variable 'node' conflicts with
the property access node.label. To fix this, modify the query to use Python
string formatting to inject the label dynamically, for example by formatting the
query with the label placeholder replaced by the actual label string.
Alternatively, split the nodes list by label and run separate queries per label
to avoid dynamic label issues. Adjust the query and the call to self.query
accordingly to ensure the label is correctly applied in the Cypher MERGE
statement.

255-260: ⚠️ Potential issue

Malformed Cypher in delete_node

MATCH (node: {{id: $node_id}}) contains an extra colon and double braces, leading to a syntax error. Property-maps don’t use : after the variable and don’t need sanitising.

-sanitized_id = node_id.replace(":", "_")
-query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
-params = {"node_id": sanitized_id}
+query = "MATCH (node {id: $node_id}) DETACH DELETE node"
+params = {"node_id": node_id}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        query = "MATCH (node {id: $node_id}) DETACH DELETE node"
        params = {"node_id": node_id}

        return await self.query(query, params)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 255 to 260, the Cypher query in delete_node is malformed due to an extra
colon and double braces in MATCH (node: {{id: $node_id}}). Remove the colon
after node and replace double braces with single braces to correctly specify the
property map as MATCH (node {id: $node_id}). Also, remove the sanitization of
node_id since it is unnecessary for property values.

697-705: ⚠️ Potential issue

Syntax issues in remove_connection_to_predecessors_of

MATCH (node {id: nid}) is missing $ before nid; additionally, the curly braces after UNWIND should refer to the alias without extra braces.

UNWIND $node_ids AS nid
-MATCH (node {id: nid})-[r]->(predecessor)
+MATCH (node {id: nid})-[r]->(predecessor)
 WHERE type(r) = $edge_label
 DELETE r

(The first line is fine; it is the second that needs parameterization.)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 697 to 705, the Cypher query has syntax issues: the MATCH clause should
use parameterized syntax with `$nid` instead of `nid` without the dollar sign,
and the curly braces after UNWIND should not enclose the alias with extra
braces. Fix the MATCH line to use `(node {id: $nid})` and ensure the UNWIND line
correctly references the parameter without unnecessary braces.

330-350: ⚠️ Potential issue

has_edges mixes internal ids with business ids

The Cypher uses id(a)/id(b) (database-internal ids) whereas the supplied parameters are the application ids (edge.from_node, str(UUID)). The check always fails unless internal and business ids coincide.

Fix by matching on the id property instead:

-MATCH (a)-[r]->(b)
-WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name
+MATCH (a {id: edge.from_node})-[r]->(b {id: edge.to_node})
+WHERE type(r) = edge.relationship_name
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        query = """
            UNWIND $edges AS edge
            MATCH (a {id: edge.from_node})-[r]->(b {id: edge.to_node})
            WHERE type(r) = edge.relationship_name
            RETURN edge.from_node AS from_node, edge.to_node AS to_node, edge.relationship_name AS relationship_name, count(r) > 0 AS edge_exists
        """

        try:
            params = {
                "edges": [
                    {
                        "from_node": str(edge[0]),
                        "to_node": str(edge[1]),
                        "relationship_name": edge[2],
                    }
                    for edge in edges
                ],
            }

            results = await self.query(query, params)
            return [result["edge_exists"] for result in results]
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 330 to 350, the Cypher query incorrectly matches nodes using internal
database ids with id(a) and id(b), while the parameters use application-level
ids as strings. To fix this, change the query to match nodes by their 'id'
property (e.g., a.id = edge.from_node and b.id = edge.to_node) instead of using
the internal id() function, ensuring the query compares the correct identifiers.

750-777: 🛠️ Refactor suggestion

Dangerous mutable default argument

def serialize_properties(self, properties=dict()): binds a single dict shared by every call. Use None and initialise inside.

-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties: dict | None = None):
         ...
-        for property_key, property_value in properties.items():
+        for property_key, property_value in (properties or {}).items():
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    def serialize_properties(self, properties: dict | None = None):
        """
        Convert property values to a suitable representation for storage.

        Parameters:
        -----------

            - properties: A dictionary of properties to serialize. (default dict())

        Returns:
        --------

            A dictionary of serialized properties.
        """
        serialized_properties = {}

        for property_key, property_value in (properties or {}).items():
            if isinstance(property_value, UUID):
                serialized_properties[property_key] = str(property_value)
                continue

            if isinstance(property_value, dict):
                serialized_properties[property_key] = json.dumps(property_value, cls=JSONEncoder)
                continue

            serialized_properties[property_key] = property_value

        return serialized_properties
🧰 Tools
🪛 Ruff (0.11.9)

750-750: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🪛 Pylint (3.3.7)

[warning] 750-750: Dangerous default value dict() (builtins.dict) as argument

(W0102)


[refactor] 750-750: Consider using '{}' instead of a call to 'dict'.

(R1735)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 750 to 777, the method serialize_properties uses a mutable default
argument (properties=dict()), which can lead to unexpected behavior due to
shared state across calls. Change the default value of properties to None and
inside the method, initialize it to an empty dictionary if it is None before
proceeding with serialization.

65-70: 🛠️ Refactor suggestion

Add explicit driver-shutdown hook

AsyncGraphDatabase.driver() opens TCP connections that remain open for the entire process lifetime. The class currently never calls await self.driver.close(), which will eventually leak sockets (especially in test suites that spin up many adapters).

 class MemgraphAdapter(GraphDBInterface):
     ...
     def __init__(...):
         ...
         self.driver = driver or AsyncGraphDatabase.driver(
             graph_database_url,
             auth=(graph_database_username, graph_database_password),
             max_connection_lifetime=120,
         )
+
+    async def close(self) -> None:
+        """
+        Gracefully dispose the underlying Neo4j/Memgraph driver.
+        """
+        if self.driver:
+            await self.driver.close()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        self.driver = driver or AsyncGraphDatabase.driver(
            graph_database_url,
            auth=(graph_database_username, graph_database_password),
            max_connection_lifetime=120,
        )

    async def close(self) -> None:
        """
        Gracefully dispose the underlying Neo4j/Memgraph driver.
        """
        if self.driver:
            await self.driver.close()
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 65 to 70, the AsyncGraphDatabase.driver() creates TCP connections that
stay open indefinitely because the driver is never explicitly closed. To fix
this, add an asynchronous shutdown method in the class that calls await
self.driver.close() to properly close the connections when the adapter is no
longer needed, preventing socket leaks especially during tests.

651-676: 🛠️ Refactor suggestion

get_connections builds tuples from the wrong object

The loop variable neighbour shadows the record resulting in type errors; moreover it again indexes into the relationship object. Use the fields returned by the query:

for record in predecessors:
-    neighbour = neighbour["relation"]
-    connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))
+    rel = record["relation"]
+    connections.append(
+        (record["neighbour"]["id"], {"relationship_name": type(rel).__name__}, record["node"]["id"])
+    )

Do the analogous change for the successors loop.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 663-663: Consider using '{"node_id": str(node_id)}' instead of a call to 'dict'.

(R1735)


[refactor] 664-664: Consider using '{"node_id": str(node_id)}' instead of a call to 'dict'.

(R1735)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py lines 651
to 676, the variable 'neighbour' in the loops shadows the record and incorrectly
indexes into the relationship object. To fix this, rename the loop variable to
avoid shadowing and directly use the fields returned by the query (neighbour,
relation, node) or (node, relation, neighbour) as appropriate. Adjust the tuple
construction to use these fields correctly without re-indexing into the
relationship object. Apply the same fix to both the predecessors and successors
loops.

149-161: ⚠️ Potential issue

Dynamic label injection is syntactically invalid

node:$node_label (and the identical pattern in the ON MATCH clause) is not valid Cypher – label names cannot be parameterised. The statement will raise a Neo4jError: Parameter maps cannot be used for labels.
Embed the label into the query text instead (safe because it comes from type(node).__name__, i.e. trusted code) and drop the unused $node_label parameter.

-        MERGE (node {id: $node_id})
-        ON CREATE SET node:$node_label, node += $properties, node.updated_at = timestamp()
-        ON MATCH SET node:$node_label, node += $properties, node.updated_at = timestamp()
+        MERGE (node:{node_label} {{id: $node_id}})
+        ON CREATE SET node += $properties, node.updated_at = timestamp()
+        ON MATCH  SET node += $properties, node.updated_at = timestamp()

and

-        params = {
-            "node_id": str(node.id),
-            "node_label": type(node).__name__,
-            "properties": serialized_properties,
-        }
+        params = {
+            "node_id": str(node.id),
+            "properties": serialized_properties,
+        }
+        query = query.format(node_label=type(node).__name__)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        query = """
        MERGE (node:{node_label} {{id: $node_id}})
        ON CREATE SET node += $properties, node.updated_at = timestamp()
        ON MATCH SET node += $properties, node.updated_at = timestamp()
        RETURN ID(node) AS internal_id,node.id AS nodeId
        """

        params = {
            "node_id": str(node.id),
            "properties": serialized_properties,
        }
        query = query.format(node_label=type(node).__name__)
        return await self.query(query, params)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 149 to 161, the Cypher query incorrectly uses a parameter for the node
label, which is not allowed. To fix this, remove the parameterized label
`$node_label` from the query and instead directly embed the label string from
`type(node).__name__` into the query text. Also, remove the `node_label` entry
from the `params` dictionary since it will no longer be used.

468-478: ⚠️ Potential issue

Relationship extraction returns garbage

result["r"][1] indexes into the relationship object (which is not subscriptable). Return the relationship type captured in Cypher instead.

-MATCH (n {id: $node_id})-[r]-(m)
-RETURN n, r, m
+MATCH (n {id: $node_id})-[r]-(m)
+RETURN n, TYPE(r) AS rel_type, m
...
 return [
-    (result["n"]["id"], result["m"]["id"], {"relationship_name": result["r"][1]})
+    (result["n"]["id"], result["m"]["id"], {"relationship_name": result["rel_type"]})
     for result in results
 ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        query = """
        MATCH (n {id: $node_id})-[r]-(m)
        RETURN n, TYPE(r) AS rel_type, m
        """

        results = await self.query(query, dict(node_id=node_id))

        return [
            (result["n"]["id"], result["m"]["id"], {"relationship_name": result["rel_type"]})
            for result in results
        ]
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 473-473: Consider using '{"node_id": node_id}' instead of a call to 'dict'.

(R1735)

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 468 to 478, the code incorrectly tries to subscript the relationship
object with result["r"][1], which is not valid. Modify the Cypher query to
explicitly return the relationship type using the type() function, e.g., RETURN
n, r, m, type(r) AS rel_type, and then update the code to use result["rel_type"]
instead of result["r"][1] to correctly extract the relationship type.

726-733: ⚠️ Potential issue

remove_connection_to_successors_of uses dynamic labels incorrectly

MATCH (node:{id})<-[r:{edge_label}]-(successor) is malformed:

  1. Back-ticks are placed around the text {id} instead of the variable.
  2. edge_label is interpolated directly, risking Cypher injection.

Prefer property based matching (consistent with the rest of the adapter):

UNWIND $node_ids AS nid
-MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
+MATCH (successor)-[r]->(node {id: nid})
+WHERE type(r) = $edge_label
 DELETE r

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 726 to 733, the Cypher query uses dynamic labels incorrectly by placing
back-ticks around the literal text `{id}` instead of the variable and directly
interpolating `edge_label`, which risks injection. To fix this, replace dynamic
label usage with property-based matching by removing back-ticks and
parameterizing both node labels and edge labels as properties in the query,
ensuring all variables are passed safely via parameters to prevent injection.

<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: vasilije <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Vasilije <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Hande <[email protected]>
Co-authored-by: Matea Pesic <[email protected]>
Co-authored-by: hajdul88 <[email protected]>
Co-authored-by: Daniel Molnar <[email protected]>
Co-authored-by: Diego Baptista Theuerkauf <[email protected]>
Co-authored-by: github-actions[bot] <[email protected]>