-
Notifications
You must be signed in to change notification settings - Fork 1k
feat: fixes and updates to MCP, retrievers, general fixes #840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
<!-- .github/pull_request_template.md --> ## Description Resolve issue with .venv being broken when using docker compose with Cognee ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Boris Arzentar <[email protected]>
… 1947 (#760) <!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Boris <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Igor Ilic <[email protected]>
<!-- .github/pull_request_template.md --> ## Description Add support for UV and for Poetry package management ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description Switch typing from str to UUID for NetworkX node_id ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description Add both sse and stdio support for Cognee MCP ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
…83] (#782) <!-- .github/pull_request_template.md --> ## Description Add log handling options for cognee exceptions ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description Fix issue with failing versions gh actions ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Vasilije <[email protected]>
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. Co-authored-by: Vasilije <[email protected]>
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Hande <[email protected]> Co-authored-by: Vasilije <[email protected]>
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Hande <[email protected]> Co-authored-by: Vasilije <[email protected]>
<!-- .github/pull_request_template.md --> ## Description This PR adds support for the Memgraph graph database following the [graph database integration guide](https://docs.cognee.ai/contributing/adding-providers/graph-db/graph-database-integration): - Implemented `MemgraphAdapter` for interfacing with Memgraph. - Updated `get_graph_engine.py` to return MemgraphAdapter when appropriate. - Added a test script:` test_memgraph.py.` - Created a dedicated test workflow: `.github/workflows/test_memgraph.yml.` ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Vasilije <[email protected]> Co-authored-by: Boris <[email protected]>
<!-- .github/pull_request_template.md --> ## Description refactor: Handle boto3 s3fs dependencies better ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description Update LanceDB and rewrite data points to run async ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Boris <[email protected]> Co-authored-by: Boris Arzentar <[email protected]>
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description As discussed with @hande-k and Lazar, I've created a short demo to illustrate how to get the pagerank rankings from the knowledge graph given the nx engine. This is a POC, and a first of step towards solving #643 . Please let me know what you think, and how to proceed from here. :) ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Boris <[email protected]> Co-authored-by: Hande <[email protected]> Co-authored-by: Vasilije <[email protected]>
<!-- .github/pull_request_template.md --> ## Description Added tools to check current cognify and codify status ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
This reverts commit c058219.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description Add ability to map column values from relational databases to graph ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: vasilije <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Vasilije <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Hande <[email protected]> Co-authored-by: Matea Pesic <[email protected]> Co-authored-by: hajdul88 <[email protected]> Co-authored-by: Daniel Molnar <[email protected]> Co-authored-by: Diego Baptista Theuerkauf <[email protected]>
## Description /api/v1/responses In this PR manages function calls - search - cognify - prune Next steps - codify ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Signed-off-by: Diego B Theuerkauf <[email protected]> Co-authored-by: Hande <[email protected]> Co-authored-by: Vasilije <[email protected]> Co-authored-by: Diego Baptista Theuerkauf <[email protected]> Co-authored-by: Boris <[email protected]> Co-authored-by: Boris <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Boris <[email protected]> Co-authored-by: Vasilije <[email protected]>
<!-- .github/pull_request_template.md --> ## Description Fixes Anthropic bug as reported by the user #812 ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Igor Ilic <[email protected]>
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
…exist case <!-- .github/pull_request_template.md --> ## Description Fixes pipeline run status migration ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
<!-- .github/pull_request_template.md --> ## Description Fixes graph completion limit ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Please make sure all the checkboxes are checked:
|
|
Caution Review failedThe pull request is closed. WalkthroughThis update introduces broad enhancements and refactoring across the codebase. Major changes include: new OpenAI-compatible response APIs, expanded support for graph/vector databases (e.g., Memgraph), improved pipeline and dataset handling, refined error handling, and new example scripts. The frontend and documentation are updated for clarity and new features, while several tests and legacy files are removed or adjusted. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API
participant ResponsesRouter
participant Dispatcher
participant ToolFunction
participant DB/Engine
Client->>API: POST /api/v1/responses (input, tools)
API->>ResponsesRouter: Validate and parse request
ResponsesRouter->>OpenAI API: Call model with input/tools
OpenAI API-->>ResponsesRouter: Response (may include function call)
alt Function call present
ResponsesRouter->>Dispatcher: dispatch_function(tool_call)
Dispatcher->>ToolFunction: handle_search/cognify/prune(...)
ToolFunction->>DB/Engine: Execute async operation
DB/Engine-->>ToolFunction: Result
ToolFunction-->>Dispatcher: Output/result
Dispatcher-->>ResponsesRouter: Tool call output
end
ResponsesRouter-->>Client: Structured response (status, tool_calls, usage)
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 30th. To opt out, configure ✨ Finishing Touches
🧪 Generate Unit Tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 17116131 | Triggered | Generic Password | 3b07f3c | examples/database_examples/neo4j_example.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 42
🔭 Outside diff range comments (6)
cognee/exceptions/exceptions.py (1)
38-44: 🛠️ Refactor suggestionUpdate child exception classes to use new parameters
The child exception classes (
ServiceError,InvalidValueError,InvalidAttributeError, etc.) don't forward the newlogandlog_levelparameters to the parent constructor, limiting their logging flexibility.def __init__( self, message: str = "Service is unavailable.", name: str = "ServiceError", status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, + log=True, + log_level="ERROR", ): - super().__init__(message, name, status_code) + super().__init__(message, name, status_code, log, log_level)Apply similar updates to other child exception classes to ensure consistent behavior.
cognee/modules/observability/get_observe.py (1)
1-12: 🛠️ Refactor suggestionGood centralization of observability decorator, but add handling for all Observer enum values
This centralized function for retrieving observability decorators helps eliminate duplicate conditional import logic across the codebase, which is a good practice.
However, the function only explicitly handles
Observer.LANGFUSEwithout providing fallback behavior for other enum values likeObserver.LLMLITEandObserver.LANGSMITH. Consider adding explicit handling for all possible values:from cognee.base_config import get_base_config from .observers import Observer def get_observe(): monitoring = get_base_config().monitoring_tool if monitoring == Observer.LANGFUSE: from langfuse.decorators import observe return observe + elif monitoring == Observer.LLMLITE: + # Import and return LLMLITE observe decorator + # For example: + # from llmlite.decorators import observe + # return observe + elif monitoring == Observer.LANGSMITH: + # Import and return LANGSMITH observe decorator + # For example: + # from langsmith.decorators import observe + # return observe + else: + # Return a no-op decorator as fallback + def noop_observe(name=None, **kwargs): + def decorator(func): + return func + return decorator + return noop_observecognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)
170-191:⚠️ Potential issueUpdate convertToSearchTypeOutput function for new search types
The
convertToSearchTypeOutputfunction still handles "INSIGHTS", "SUMMARIES", and "CHUNKS" cases, but none of these match the current search options in the dropdown ("GRAPH_COMPLETION" and "RAG_COMPLETION").The function needs to be updated to handle the new search types:
function convertToSearchTypeOutput(systemMessages: any[], searchType: string): string { if (systemMessages.length > 0 && typeof(systemMessages[0]) === "string") { return systemMessages[0]; } switch (searchType) { - case 'INSIGHTS': + case 'GRAPH_COMPLETION': return systemMessages.map((message: InsightMessage) => { const [node1, relationship, node2] = message; if (node1.name && node2.name) { return `${node1.name} ${relationship.relationship_name} ${node2.name}.`; } return ''; }).join('\n'); - case 'SUMMARIES': + case 'RAG_COMPLETION': return systemMessages.map((message: { text: string }) => message.text).join('\n'); case 'CHUNKS': return systemMessages.map((message: { text: string }) => message.text).join('\n'); default: return ""; } }cognee/infrastructure/databases/graph/networkx/adapter.py (1)
267-279:⚠️ Potential issueMixed identifier types in edge-removal helpers
The public API now demands
list[UUID], but deeper in the codeself.graph.has_edge()is called with theedge_labelas key (OK) while the stored edge endpoints were converted tostrinadd_edges(). This mismatch silently leaves “string” nodes dangling.Either:
- Stop casting UUIDs to
strinadd_edges, or- Cast incoming
node_idback tostrhere.Failing to normalise will corrupt the graph.
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (2)
85-103:⚠️ Potential issueCollection may not exist –
create_data_pointsshould ensure it
create_data_pointsfetches the collection first:collection = await self.get_collection(collection_name)If the collection is missing, a
CollectionNotFoundErroris raised.
All external callers (e.g.index_data_points) rely on this method and do not create the collection beforehand, leading to runtime errors.Add an existence check or simply call
await self.create_collection(collection_name)at the start:+await self.create_collection(collection_name) collection = await self.get_collection(collection_name)
90-98: 🛠️ Refactor suggestionInefficient
index()lookup turns O(n²)Inside
convert_to_weaviate_data_points:vector = data_vectors[data_points.index(data_point)]
list.indexis O(n). For large batches this becomes quadratic.
Iterate withenumerateinstead:-data_points = [convert_to_weaviate_data_points(dp) for dp in data_points] +data_points = [ + DataObject( + uuid=dp.id, + properties={**dp.model_dump(), "uuid": str(dp.id), "id": None}, + vector=vec, + ) + for dp, vec in zip(data_points, data_vectors) +]
♻️ Duplicate comments (2)
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
624-636: Interface-mismatch recurrence
get_node(s)repeats theUUIDparameter divergence noted earlier. Please align with whichever identifier strategy you settle on.cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)
81-98: Dynamic label & property placeholders render invalid Cypher
ON CREATE SET n:node.label(and the similarON MATCH) tries to usenode.labelas a literal label. This suffers from the same problem as above and will break batch insertion.Additionally, because you overwrite the list
nodeslater (nodes = [ {...} for node in nodes ]) the outer reference is lost – consider renaming the temporary variable to avoid confusion.
🧹 Nitpick comments (60)
cognee/modules/engine/models/__init__.py (1)
6-6: Consider addressing the unused import linter warning.The import for
ColumnValueis correctly added to make it part of the package's public API, but it's flagged as unused by the static analysis tool.To silence the linter warning while maintaining intended behavior, consider one of these approaches:
- from .ColumnValue import ColumnValue + from .ColumnValue import ColumnValue # noqa: F401 - Exported as part of public APIOr define an
__all__list:from .Entity import Entity from .EntityType import EntityType from .TableRow import TableRow from .TableType import TableType from .node_set import NodeSet from .ColumnValue import ColumnValue + __all__ = ["Entity", "EntityType", "TableRow", "TableType", "NodeSet", "ColumnValue"]🧰 Tools
🪛 Ruff (0.11.9)
6-6:
.ColumnValue.ColumnValueimported but unused; consider removing, adding to__all__, or using a redundant alias(F401)
cognee/modules/pipelines/operations/__init__.py (1)
1-1: Consider addressing the unused import linter warning.The import for
log_pipeline_run_initiatedmakes this function part of the package's public API, consistent with other similar functions, but it's flagged as unused by the static analysis tool.To silence the linter warning while maintaining intended behavior, consider one of these approaches:
- from .log_pipeline_run_initiated import log_pipeline_run_initiated + from .log_pipeline_run_initiated import log_pipeline_run_initiated # noqa: F401 - Exported as part of public APIOr define an
__all__list:from .log_pipeline_run_initiated import log_pipeline_run_initiated from .log_pipeline_run_start import log_pipeline_run_start from .log_pipeline_run_complete import log_pipeline_run_complete from .log_pipeline_run_error import log_pipeline_run_error from .pipeline import cognee_pipeline + __all__ = [ + "log_pipeline_run_initiated", + "log_pipeline_run_start", + "log_pipeline_run_complete", + "log_pipeline_run_error", + "cognee_pipeline" + ]🧰 Tools
🪛 Ruff (0.11.9)
1-1:
.log_pipeline_run_initiated.log_pipeline_run_initiatedimported but unused; consider removing, adding to__all__, or using a redundant alias(F401)
cognee/modules/data/methods/__init__.py (1)
10-10: Add imported function to__all__listYou've imported
get_unique_dataset_idto make it part of the package's public API, which is good for module organization. However, the static analyzer has flagged this as potentially unused. To explicitly mark it as part of the public API and silence the linter warning, consider adding an__all__list to the file.# Create from .create_dataset import create_dataset # Get from .get_dataset import get_dataset from .get_datasets import get_datasets from .get_datasets_by_name import get_datasets_by_name from .get_dataset_data import get_dataset_data from .get_data import get_data from .get_unique_dataset_id import get_unique_dataset_id +__all__ = [ + "create_dataset", + "get_dataset", + "get_datasets", + "get_datasets_by_name", + "get_dataset_data", + "get_data", + "get_unique_dataset_id", + "delete_dataset", + "delete_data", +] # Delete from .delete_dataset import delete_dataset from .delete_data import delete_data🧰 Tools
🪛 Ruff (0.11.9)
10-10:
.get_unique_dataset_id.get_unique_dataset_idimported but unused; consider removing, adding to__all__, or using a redundant alias(F401)
cognee/api/v1/add/add.py (1)
2-6: Reordered importsThe import statements have been reordered without changing functionality. While this is fine, consider following a consistent import ordering convention across the codebase (e.g., standard library imports first, third-party imports second, local imports third) and documenting this in your coding guidelines.
cognee-frontend/src/utils/fetch.ts (1)
4-4: Using localhost instead of IP address is more readableChanging from '127.0.0.1' to 'localhost' improves readability while maintaining the same functionality.
Consider using an environment variable for the API base URL instead of hardcoding it, which would make it easier to configure for different environments (development, testing, production):
- return global.fetch('http://localhost:8000/api' + url, { + const baseUrl = process.env.NEXT_PUBLIC_API_BASE_URL || 'http://localhost:8000/api'; + return global.fetch(baseUrl + url, {cognee/infrastructure/databases/graph/graph_db_interface.py (1)
88-88: Consider adding explicit error handling for commit failuresWhile logging level has been changed to
debug, note that unlike the previous error blocks, there's no explicit handling of commit failures (such as retry logic or transaction management). Consider whether additional error handling would be appropriate here.try: await session.commit() except Exception as e: logger.debug(f"Error committing session: {e}") + # Consider adding retry logic or additional error handling here + # await session.rollback() # Ensure clean state if neededcognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)
19-28: Consider handling empty dataset nameThe code doesn't check if
dataset.nameis empty before callinggetExplorationGraphUrl. If an empty name is provided, this could result in unexpected errors.const exploreData = useCallback(() => { + if (!dataset.name) { + setError(new Error('Dataset name is required')); + return; + } getExplorationGraphUrl(dataset) .then((graphHtml) => { setError(null); setGraphHtml(graphHtml); }) .catch((error) => { setError(error); }); }, [dataset]);cognee/api/v1/config/config.py (1)
158-158: Inconsistent exception handling pattern in config methodsWhile removing the
message=keyword argument makes this line use standard Python exception syntax, it creates inconsistency with other similar methods in this file. Other methods likeset_llm_config,set_relational_db_config, etc. useInvalidAttributeErrorwith amessageparameter.Consider standardizing the exception handling approach across all similar methods either by:
-raise AttributeError(f"'{key}' is not a valid attribute of the config.") +raise InvalidAttributeError(message=f"'{key}' is not a valid attribute of the config.")Or updating all other methods to use standard exceptions without keyword arguments for consistency.
cognee/modules/data/methods/get_unique_dataset_id.py (1)
5-6: Consider removing async keyword for synchronous functionThis function is marked as async but contains no await statements. Since UUID generation is a synchronous operation, the async keyword is unnecessary and may mislead callers into using await with this function.
-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID: +def get_unique_dataset_id(dataset_name: str, user: User) -> UUID: return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")cognee/base_config.py (1)
11-11: Consider using a more specific type annotationThe type annotation for monitoring_tool is
object, which is very general. Consider usingObserveras the type to provide better type checking and code completion.- monitoring_tool: object = Observer.LANGFUSE + monitoring_tool: Observer = Observer.LANGFUSEcognee/modules/observability/observers.py (1)
1-9: Consider adding enum validation methodFor added robustness, you might want to include a static method to validate string values against the enum. This would be helpful when processing configuration from external sources.
class Observer(str, Enum): """Monitoring tools""" LANGFUSE = "langfuse" LLMLITE = "llmlite" LANGSMITH = "langsmith" + + @classmethod + def is_valid(cls, value: str) -> bool: + """Check if a string value is a valid observer type""" + return value in [item.value for item in cls]cognee/modules/engine/models/ColumnValue.py (1)
5-7: Consider adding field type annotations and documentationThe string fields would benefit from more detailed type annotations like Field() with descriptions to better document their purpose.
class ColumnValue(DataPoint): - name: str - description: str - properties: str + name: str = Field(description="The name of the column") + description: str = Field(description="Description of the column's purpose") + properties: str = Field(description="The properties/values of the column used for indexing")cognee/tests/test_relational_db_migration.py (1)
115-193: Consider making expected counts more maintainableWith multiple hard-coded count values throughout the test, future schema changes might require updates in multiple places. Consider defining these as constants at the top of the file to improve maintainability.
import json import pathlib import os from cognee.infrastructure.databases.graph import get_graph_engine from cognee.infrastructure.databases.relational import ( get_migration_relational_engine, create_db_and_tables as create_relational_db_and_tables, ) from cognee.infrastructure.databases.vector.pgvector import ( create_db_and_tables as create_pgvector_db_and_tables, ) from cognee.tasks.ingestion import migrate_relational_database from cognee.modules.search.types import SearchType import cognee +# Expected counts for graph elements +EXPECTED_DISTINCT_NODE_COUNT = 12 +EXPECTED_DISTINCT_EDGE_COUNT = 15 + +# Provider-specific expected counts +SQLITE_EXPECTED_NODE_COUNT = 543 +SQLITE_EXPECTED_EDGE_COUNT = 1317 +POSTGRES_EXPECTED_NODE_COUNT = 522 +POSTGRES_EXPECTED_EDGE_COUNT = 961Then use these constants in the assertions.
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2)
108-110: Good error handling enhancementAdding error handling to restore the input text if the fetch request fails is a great UX improvement, ensuring users don't lose their message on network failures.
Consider enhancing this further with visual feedback to the user when a fetch error occurs:
.catch(() => { setInputValue(inputValue); + // Add a toast/notification to inform the user about the error + // e.g., showErrorNotification("Failed to send message. Please try again."); });
149-152: Fix typo in Stack alignment attributeThere's a typo in the Stack component's align prop:
align="end/"should likely bealign="end".-<Stack orientation="horizontal" align="end/" gap="2"> +<Stack orientation="horizontal" align="end" gap="2">alembic/versions/1d0bb7fede17_add_pipeline_run_status.py (1)
13-14: Remove unused importsThe static analysis tool correctly identified that
PipelineRunandPipelineRunStatusare imported but not used in this file.- from cognee.modules.pipelines.models.PipelineRun import PipelineRun, PipelineRunStatus + from cognee.modules.pipelines.models.PipelineRun import PipelineRunStatusOr remove both imports if
PipelineRunStatusisn't used either:- from cognee.modules.pipelines.models.PipelineRun import PipelineRun, PipelineRunStatus🧰 Tools
🪛 Ruff (0.11.9)
13-13:
cognee.modules.pipelines.models.PipelineRun.PipelineRunimported but unusedRemove unused import
(F401)
13-13:
cognee.modules.pipelines.models.PipelineRun.PipelineRunStatusimported but unusedRemove unused import
(F401)
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)
6-22: Clean implementation of pipeline run initiation logging.This function effectively creates and persists a new pipeline run record with the
DATASET_PROCESSING_INITIATEDstatus. The implementation follows good practices:
- Properly typed parameters
- Appropriate UUID generation
- Correct use of async database session
- Clean error handling with proper session commit
One improvement suggestion would be to add docstrings to clearly document the function's purpose, parameters, and return value.
async def log_pipeline_run_initiated(pipeline_id: str, pipeline_name: str, dataset_id: UUID): + """ + Create and persist a new pipeline run record with DATASET_PROCESSING_INITIATED status. + + Args: + pipeline_id: Unique identifier for the pipeline + pipeline_name: Name of the pipeline to run + dataset_id: UUID of the dataset being processed + + Returns: + PipelineRun: The created pipeline run record + """ pipeline_run = PipelineRun(cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1)
37-39: Add explanation for the expected result calculationThe hardcoded expected result (4586471424) is not immediately obvious. Consider adding a comment explaining how this value is calculated from the pipeline tasks.
- final_result = 4586471424 + # Expected calculation: ((5 + 7) * 2)^7 = (12 * 2)^7 = 24^7 = 4586471424 + final_result = 4586471424examples/database_examples/kuzu_example.py (1)
1-6: Remove unused importThe
osmodule is imported but not used in the code.-import os import pathlib import asyncio import cognee from cognee.modules.search.types import SearchType🧰 Tools
🪛 Ruff (0.11.9)
1-1:
osimported but unusedRemove unused import:
os(F401)
cognee/tasks/temporal_awareness/index_graphiti_objects.py (1)
63-66: Consider more robust edge data accessAccessing the relationship name at a fixed index position (edge[2]) makes the code brittle to changes in the underlying data structure. Consider using a more descriptive approach to access this data.
- edge_types = Counter( - edge[2] # The edge key (relationship name) is at index 2 - for edge in edges_data - ) + # Access relationship name (at index 2) from edge data + edge_types = Counter( + relationship_name # More descriptive variable name + for _, _, relationship_name, *_ in edges_data + )examples/database_examples/falkordb_example.py (3)
1-1: Remove unused importThe
osmodule is imported but not used in the code. Consider removing this import to maintain clean dependencies.-import os import pathlib import asyncio import cognee from cognee.modules.search.types import SearchType🧰 Tools
🪛 Ruff (0.11.9)
1-1:
osimported but unusedRemove unused import:
os(F401)
20-30: Add more guidance on directory configurationThe example sets up data directories relative to the script location. Consider adding more explanation about how users should adapt these paths for their own environment, especially for production use cases.
# Set up data directories for storing documents and system files -# You should adjust these paths to your needs +# NOTE: These paths are relative to the example script location. +# For production use, you should: +# - Use absolute paths +# - Ensure the directories are persistent and have appropriate permissions +# - Consider environment-specific configurations current_dir = pathlib.Path(__file__).parent data_directory_path = str(current_dir / "data_storage") cognee.config.data_root_directory(data_directory_path)
83-85: Clarify cleanup commentThe commented-out cleanup code might confuse users. Consider adding a note explaining when it would be appropriate to uncomment and use these lines.
# Clean up (optional) +# Uncomment the following lines if you want to remove all data after running the example +# Note: This will delete all datasets and system data created by Cognee # await cognee.prune.prune_data() # await cognee.prune.prune_system(metadata=True)cognee/tasks/ingestion/migrate_relational_database.py (1)
111-117: Improve ColumnValue node properties for better searchabilityThe current properties field is a simple space-separated string, which might not be optimal for semantic searches. Consider using a more structured format like JSON for better searchability and clarity.
column_node = ColumnValue( id=uuid5(NAMESPACE_OID, name=column_node_id), name=column_node_id, - properties=f"{key} {value} {table_name}", + properties=f"{{\"column\": \"{key}\", \"value\": \"{value}\", \"table\": \"{table_name}\"}}", - description=f"Column name={key} and value={value} from column from table={table_name}", + description=f"Column '{key}' with value '{value}' from table '{table_name}'", )examples/database_examples/milvus_example.py (3)
29-29: Use consistent pathlib approach instead of os.pathSince you're already using pathlib for directory path management, consider using it consistently throughout the code rather than mixing with os.path.
-local_milvus_db_path = os.path.join(cognee_directory_path, "databases", "milvus.db") +local_milvus_db_path = str(pathlib.Path(cognee_directory_path) / "databases" / "milvus.db")
46-53: Enhance sample text with Cognee integration detailsThe sample text describes Milvus but doesn't mention how it integrates with Cognee specifically. Consider adding information about the integration to make the example more informative.
# Add sample text to the dataset sample_text = """Milvus is an open-source vector database built to power AI applications. It is designed for storing, indexing, and querying large-scale vector datasets. Milvus implements efficient approximate nearest neighbor search algorithms. It features advanced indexing techniques like HNSW, IVF, PQ, and more. Milvus supports hybrid searches combining vector similarity with scalar filtering. -The system can be deployed standalone, in clusters, or through a cloud service.""" +The system can be deployed standalone, in clusters, or through a cloud service. +When integrated with Cognee, Milvus provides fast vector similarity search capabilities +that enable semantic search, knowledge retrieval, and AI-powered insights generation."""
83-85: Clarify cleanup commentThe commented-out cleanup code might confuse users. Consider adding a note explaining when it would be appropriate to uncomment and use these lines.
# Clean up (optional) +# Uncomment the following lines if you want to remove all data after running the example +# Note: This will delete all datasets and system data created by Cognee # await cognee.prune.prune_data() # await cognee.prune.prune_system(metadata=True)notebooks/cognee_openai_compatable_demo.ipynb (3)
28-31: Fix URL redirection by using the correct endpoint.The log shows a 307 redirect from
/api/v1/responsesto/api/v1/responses/(with trailing slash). Using the correct URL directly would save an HTTP request.-client = OpenAI(api_key="COGNEE_API_KEY", base_url="http://localhost:8000/api/v1/") +client = OpenAI(api_key="COGNEE_API_KEY", base_url="http://localhost:8000/api/v1/") client.responses.create( model="cognee-v1", input="Cognify: Natural language processing (NLP) is an interdisciplinary subfield of computer science and information retrieval.", )Note: While the URL in the client initialization is correct, the OpenAI client is still making the request without a trailing slash. This appears to be an internal implementation detail of the OpenAI client rather than an issue with your code.
1-109: Add markdown cells to improve notebook documentation.The notebook lacks descriptive markdown cells that would help users understand what each cell is demonstrating. Adding markdown cells between code cells would significantly improve clarity.
Consider adding:
- An introductory markdown cell at the beginning explaining the purpose of the notebook
- A markdown cell before each code example explaining what it demonstrates
- A markdown cell after each output explaining the structure of the response
Example structure:
# Cognee OpenAI-Compatible API Demo This notebook demonstrates how to use Cognee's OpenAI-compatible API to perform various operations. ## Setup and Cognify Example The following cell demonstrates how to initialize an OpenAI client pointed at a Cognee server and run a cognify operation. [Code cell 1] ## Search Example The following cell demonstrates how to perform a search operation using the same API interface. [Code cell 2]
55-62: Extract client initialization to avoid code duplication.The OpenAI client initialization is duplicated in both cells. Consider extracting this to a reusable cell at the beginning of the notebook.
You could create a new first cell that initializes the client, then use that client instance in subsequent cells:
import os from openai import OpenAI # Get API key from environment variable with a fallback for demo purposes api_key = os.environ.get("COGNEE_API_KEY", "demo_api_key") # Initialize client that will be reused throughout the notebook client = OpenAI(api_key=api_key, base_url="http://localhost:8000/api/v1/")Then in subsequent cells, you would just use the existing
clientvariable instead of reinitializing it.examples/database_examples/chromadb_example.py (2)
1-1: Remove unused import.The
osmodule is imported but never used in this example. Consider removing it for cleaner code.-import os import pathlib import asyncio import cognee🧰 Tools
🪛 Ruff (0.11.9)
1-1:
osimported but unusedRemove unused import:
os(F401)
19-26: Add configuration flexibility for different environments.The ChromaDB URL is hardcoded which limits configuration flexibility. Consider adding a comment explaining how to adapt this for different environments.
# Configure ChromaDB as the vector database provider cognee.config.set_vector_db_config( { - "vector_db_url": "http://localhost:8000", # Default ChromaDB server URL + "vector_db_url": "http://localhost:8000", # Default local ChromaDB server URL "vector_db_key": "", # ChromaDB doesn't require an API key by default "vector_db_provider": "chromadb", # Specify ChromaDB as provider } ) + +# Note: For production environments, you might want to: +# 1. Load the URL from environment variables +# 2. Use a different port or hostname +# 3. Add authentication if using a managed ChromaDB instancecognee/modules/pipelines/operations/run_tasks.py (2)
23-25: Update docstring to document the new context parameter.You've added a new
contextparameter, but there's no docstring explaining its purpose, expected structure, or usage. This would be helpful for developers using this API.async def run_tasks_with_telemetry( tasks: list[Task], data, user: User, pipeline_name: str, context: dict = None ): + """ + Run a list of tasks with telemetry tracking. + + Args: + tasks: List of Task objects to execute + data: The data to process + user: The user running the tasks + pipeline_name: Name of the pipeline for telemetry and logging + context: Optional dictionary containing contextual information to be passed to tasks + that support receiving context + + Yields: + Results from the executed tasks + """ config = get_current_settings()
71-78: Update docstring to document the new context parameter.Similar to the
run_tasks_with_telemetryfunction, this function also needs a docstring update to include information about the newcontextparameter.async def run_tasks( tasks: list[Task], dataset_id: UUID = uuid4(), data: Any = None, user: User = None, pipeline_name: str = "unknown_pipeline", context: dict = None, ): + """ + Run a list of tasks with pipeline run logging. + + Args: + tasks: List of Task objects to execute + dataset_id: UUID for the dataset being processed + data: The data to process + user: The user running the tasks (defaults to the system default user if None) + pipeline_name: Name of the pipeline for logging and identification + context: Optional dictionary containing contextual information to be passed to tasks + that support receiving context + + Yields: + Pipeline run status objects + """ pipeline_id = uuid5(NAMESPACE_OID, pipeline_name)🧰 Tools
🪛 Ruff (0.11.9)
73-73: Do not perform function call
uuid4in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable(B008)
examples/database_examples/pgvector_example.py (1)
1-1: Unused import.The 'os' module is imported but not used in this file. Unlike the Qdrant example which uses os.getenv(), this example uses hardcoded credentials.
-import os import pathlib import asyncio import cognee from cognee.modules.search.types import SearchType🧰 Tools
🪛 Ruff (0.11.9)
1-1:
osimported but unusedRemove unused import:
os(F401)
cognee/api/v1/responses/routers/get_responses_router.py (2)
36-42: Client can be reused instead of re-instantiated per request
openai.AsyncOpenAIcreation is cheap but not free. Re-creating it on every request causes unnecessary overhead and UDP socket exhaustion under load. Cache it at module-level or store it inapp.state.- def _get_model_client(): + _client: Optional[openai.AsyncOpenAI] = None + + def _get_model_client(): """ Get appropriate client based on model name """ - llm_config = get_llm_config() - return openai.AsyncOpenAI(api_key=llm_config.llm_api_key) + nonlocal _client + if _client is None: + llm_config = get_llm_config() + _client = openai.AsyncOpenAI(api_key=llm_config.llm_api_key) + return _client
72-75:Depends(...)false-positive from Ruff B008 – suppress or refactor
FastAPI relies onDependsas a sentinel object, not a function call. Add# noqa: B008or configure Ruff to ignore FastAPI-specific patterns to keep CI green.🧰 Tools
🪛 Ruff (0.11.9)
74-74: Do not perform function call
Dependsin argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable(B008)
examples/database_examples/weaviate_example.py (1)
42-45: Bulk pruning in examples – add a disclaimer
await cognee.prune.prune_data()andprune_system(metadata=True)irreversibly delete user data. Consider adding a loud comment or runtime prompt so newcomers don’t copy-paste this into production by accident.cognee/tests/test_memgraph.py (1)
35-37: Relative path may break when tests are executed from project root
explanation_file_pathis built viaos.path.join(pathlib.Path(__file__).parent, "test_data/…"), buttest_datais a directory, not a file. UsePath/.joinpath()and call.resolve()to ensure cross-platform correctness:explanation_file_path = ( pathlib.Path(__file__).parent / "test_data" / "Natural_language_processing.txt" ).resolve()cognee/modules/pipelines/operations/run_tasks_base.py (1)
35-37: Simplify conditional context appending logicThe current approach of checking a condition and then appending is straightforward but could be more concise.
- if has_context: - args.append(context) + args.extend([context] if has_context else [])Alternatively, for even more clarity, you could use a guard clause approach:
if has_context and context is not None: args.append(context)cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
66-73: Use contextlib.suppress for cleaner error handlingThe current try-except-pass pattern can be simplified using
contextlib.suppress.+from contextlib import suppress # Later in the code: - try: - await memory_fragment.project_graph_from_db( - graph_engine, - node_properties_to_project=properties_to_project, - edge_properties_to_project=["relationship_name"], - ) - except EntityNotFoundError: - pass + with suppress(EntityNotFoundError): + await memory_fragment.project_graph_from_db( + graph_engine, + node_properties_to_project=properties_to_project, + edge_properties_to_project=["relationship_name"], + )🧰 Tools
🪛 Ruff (0.11.9)
66-73: Use
contextlib.suppress(EntityNotFoundError)instead oftry-except-passReplace with
contextlib.suppress(EntityNotFoundError)(SIM105)
examples/data/car_and_tech_companies.txt (1)
17-17: Fix grammatical error in text descriptionThere's a grammatical error in the text - the plural determiner "these" doesn't agree with the singular noun "manufacturer".
-Each of these car manufacturer contributes to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence. +Each of these car manufacturers contributes to Germany's reputation as a leader in the global automotive industry, showcasing a blend of innovation, performance, and design excellence.🧰 Tools
🪛 LanguageTool
[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...(THIS_NNS)
[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
Dockerfile (2)
8-8: Consider enabling bytecode compilation for productionThe commented ENV for bytecode compilation could improve runtime performance. Consider enabling it for production builds.
-# ENV UV_COMPILE_BYTECODE=1 +ENV UV_COMPILE_BYTECODE=1
50-52: Clean up commented linesThese lines appear to be from the transition to the current build process and are no longer needed.
-# COPY --from=uv /app/.venv /app/.venv -# COPY --from=uv /root/.local /root/.localcognee/api/v1/responses/dispatch_function.py (1)
58-68: Consider using Enum validation directlyInstead of validating against string lists, consider using the SearchType enum's built-in validation capabilities.
- valid_search_types = ( - search_tool["parameters"]["properties"]["search_type"]["enum"] - if search_tool - else ["INSIGHTS", "CODE", "GRAPH_COMPLETION", "SEMANTIC", "NATURAL_LANGUAGE"] - ) - - if search_type_str not in valid_search_types: - logger.warning(f"Invalid search_type: {search_type_str}, defaulting to GRAPH_COMPLETION") - search_type_str = "GRAPH_COMPLETION" - - query_type = SearchType[search_type_str] + try: + query_type = SearchType[search_type_str] + except (KeyError, ValueError): + logger.warning(f"Invalid search_type: {search_type_str}, defaulting to GRAPH_COMPLETION") + query_type = SearchType.GRAPH_COMPLETIONcognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)
181-196: Ensure client is closed in all execution pathsWhile there's a
finallyblock that closes the client, it appears there's also a separate client close in the happy path. This could be redundant or confusing.try: client = self.get_qdrant_client() results = await client.search( collection_name=collection_name, query_vector=models.NamedVector( name="text", vector=query_vector if query_vector is not None else (await self.embed_data([query_text]))[0], ), limit=limit if limit > 0 else None, with_vectors=with_vector, ) - await client.close() return [cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
241-252: I/O efficiency &with_vectorflag are ignored
- You call
closest_items.all()inside the loop (241) — this runs the query result materialisation on every iteration. Move it outside:- for vector in closest_items.all(): + for vector in closest_items:Since
session.execute()already returns aResult, you can iterate directly.
- The
with_vectorparameter is accepted by the signature but never honoured. Either forward it in theselect()(i.e. addPGVectorDataPoint.c.vectorwhenwith_vector is True) or drop the argument to avoid API confusion.cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (2)
176-183: Early exit forlimit<=0is good – redundantNonebranch can be removed
Because you return early, the later ternary (limit if limit > 0 else None) is unreachable. Trim it to keep intent crystal-clear.
220-232: Consolidate duplicate “collection not found” handling
You handle missing collections in three separate blocks (187-192, 219-223, 226-230). This bloats the method and risks divergence. Refactor to:
- A single existence check at the top (
if not has_collection: return [])- A single
except CollectionNotExistExceptionhandler.This keeps the happy-path tight and the error path obvious.
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
218-230: Potential O(N²) duplication in neighbour aggregation
predecessors + successorsmay contain the same neighbour twice when a node has both in- and out-edges to the target. Consider de-duplicating:return list({n["id"]: n for n in (predecessors + successors)}.values())cognee-mcp/src/server.py (2)
162-166: Exception chaining improves debuggabilityStatic analysis (B904) flags re-raising bare exceptions. Add context while preserving the traceback:
- except Exception as e: - logger.error("Cognify process failed.") - raise ValueError(f"Failed to cognify: {str(e)}") + except Exception as e: + logger.error("Cognify process failed.") + raise ValueError(f"Failed to cognify: {e}") from e🧰 Tools
🪛 Ruff (0.11.9)
166-166: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
221-235: Background task error leaks are swallowed
codify_tasklogs failures but the outercreate_taskignores the task handle, so unhandled exceptions will be dumped toasynciodefault handler and lost for the caller. Consider storing the task and adding:task.add_done_callback(lambda t: logger.error(t.exception()) if t.exception() else None)or gathering tasks inside a supervisor coroutine.
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2)
151-166:limit=0repurposed as “return all” but negative /Noneare uncheckedEdge-cases:
- Passing
limit=None(allowed bybatch_search) bubbles into the query unchanged.- Negative limits are not rejected.
Add validation:
if limit is None or limit <= 0: limit = await collection.count_rows()
205-211: Serial deletes are O(N) round-tripsDeleting one ID at a time will hammer the DB for large lists. LanceDB supports SQL-like predicates; you can delete in one shot:
- for data_point_id in data_point_ids: - await collection.delete(f"id = '{data_point_id}'") + ids_tuple = tuple(data_point_ids) + await collection.delete(f"id IN {ids_tuple}")Huge latency reduction and atomicity.
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (2)
430-445: Mutable default argument violates best-practice
def serialize_properties(self, properties=dict()):creates a single shared dict across calls.-def serialize_properties(self, properties=dict()): +def serialize_properties(self, properties: dict | None = None): serialized_properties = {} - for property_key, property_value in properties.items(): + for property_key, property_value in (properties or {}).items():for the implementation logic once the default is fixed
🧰 Tools
🪛 Ruff (0.11.9)
430-430: Do not use mutable data structures for argument defaults
Replace with
None; initialize within function(B006)
594-604: Hard-coded labels ‘Node’ & ‘EDGE’ break on heterogeneous graphsMetric queries assume every vertex has label
Nodeand every relationshipEDGE, which contradicts earlier dynamic labelling (type(node).__name__/ arbitraryrelationship_name). This will lead to 0-row results and inaccurate metrics.Consider:
- Removing label filters altogether.
- Dynamically building label strings via
CALL db.labels()/CALL db.relationshipTypes()or using the helper methods you already wrote (get_node_labels_string,get_relationship_labels_string).cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)
128-133: Race-condition possibility – ensure collection actually exists
create_data_pointscallscreate_collectionand immediately callsget_collection.
If two coroutines create the same collection concurrently the second call may still raiseCollectionNotFoundError. Consider retrying once or makingcreate_collectionidempotent (it currently is) but wait for collection creation to finish before proceeding.
317-321: Guard against mixed return types more robustly
list_collectionshandles two variants (object.nameordict["name"]). Unexpected types will raiseAttributeError/KeyError.-return [ - collection.name if hasattr(collection, "name") else collection["name"] - for collection in collections -] +names = [] +for coll in collections: + try: + names.append(coll.name) + except AttributeError: + names.append(coll["name"]) +return namesNot critical, but prevents hidden errors when the Chroma client changes its return type again.
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)
37-47: Repeatedawait client.connect()incurs overhead
get_clientcallsawait self.client.connect()on every invocation. The underlying SDK is idempotent but performs an HTTP round-trip.
Cache aself._connectedflag or callconnect()once in__init__to avoid unnecessary latency.cognee/api/v1/responses/routers/default_tools.py (1)
38-59: Cognify tool definition is complete but could use required parameters.The cognify tool definition is properly structured with appropriate parameter types and descriptions. However, consider specifying which parameters, if any, should be required, similar to how "search_query" is required for the search tool.
}, + "required": [], },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (11)
.github/actions/cognee_setup/action.ymlis excluded by!**/*.yml.github/workflows/db_examples_tests.ymlis excluded by!**/*.yml.github/workflows/python_version_tests.ymlis excluded by!**/*.yml.github/workflows/test_memgraph.ymlis excluded by!**/*.yml.github/workflows/test_suites.ymlis excluded by!**/*.ymlassets/graph_visualization.pngis excluded by!**/*.png,!**/*.pngcognee-mcp/pyproject.tomlis excluded by!**/*.tomlcognee-mcp/uv.lockis excluded by!**/*.lock,!**/*.lockpoetry.lockis excluded by!**/*.lock,!**/*.lockpyproject.tomlis excluded by!**/*.tomluv.lockis excluded by!**/*.lock,!**/*.lock
📒 Files selected for processing (104)
CONTRIBUTING.md(1 hunks)Dockerfile(1 hunks)README.md(1 hunks)alembic/versions/1d0bb7fede17_add_pipeline_run_status.py(1 hunks)alembic/versions/482cd6517ce4_add_default_user.py(1 hunks)assets/graph_visualization.html(0 hunks)cognee-frontend/src/app/page.tsx(3 hunks)cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx(1 hunks)cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx(1 hunks)cognee-frontend/src/app/wizard/WizardPage.tsx(1 hunks)cognee-frontend/src/modules/datasets/cognifyDataset.ts(1 hunks)cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts(1 hunks)cognee-frontend/src/modules/ingestion/DataView/DataView.tsx(4 hunks)cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx(1 hunks)cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx(2 hunks)cognee-frontend/src/utils/fetch.ts(1 hunks)cognee-mcp/src/server.py(3 hunks)cognee/api/client.py(2 hunks)cognee/api/v1/add/add.py(1 hunks)cognee/api/v1/cognify/code_graph_pipeline.py(3 hunks)cognee/api/v1/cognify/cognify.py(1 hunks)cognee/api/v1/config/config.py(1 hunks)cognee/api/v1/datasets/datasets.py(1 hunks)cognee/api/v1/responses/__init__.py(1 hunks)cognee/api/v1/responses/default_tools.py(1 hunks)cognee/api/v1/responses/dispatch_function.py(1 hunks)cognee/api/v1/responses/models.py(1 hunks)cognee/api/v1/responses/routers/__init__.py(1 hunks)cognee/api/v1/responses/routers/default_tools.py(1 hunks)cognee/api/v1/responses/routers/get_responses_router.py(1 hunks)cognee/base_config.py(1 hunks)cognee/exceptions/exceptions.py(1 hunks)cognee/infrastructure/databases/graph/get_graph_engine.py(1 hunks)cognee/infrastructure/databases/graph/graph_db_interface.py(2 hunks)cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py(1 hunks)cognee/infrastructure/databases/graph/networkx/adapter.py(7 hunks)cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py(2 hunks)cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py(8 hunks)cognee/infrastructure/databases/vector/exceptions/exceptions.py(1 hunks)cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py(8 hunks)cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py(9 hunks)cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py(5 hunks)cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py(6 hunks)cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py(9 hunks)cognee/infrastructure/llm/anthropic/adapter.py(1 hunks)cognee/infrastructure/llm/gemini/adapter.py(1 hunks)cognee/infrastructure/llm/openai/adapter.py(2 hunks)cognee/modules/data/methods/__init__.py(1 hunks)cognee/modules/data/methods/create_dataset.py(2 hunks)cognee/modules/data/methods/get_unique_dataset_id.py(1 hunks)cognee/modules/engine/models/ColumnValue.py(1 hunks)cognee/modules/engine/models/__init__.py(1 hunks)cognee/modules/graph/cognee_graph/CogneeGraph.py(1 hunks)cognee/modules/observability/get_observe.py(1 hunks)cognee/modules/observability/observers.py(1 hunks)cognee/modules/pipelines/models/PipelineRun.py(1 hunks)cognee/modules/pipelines/operations/__init__.py(1 hunks)cognee/modules/pipelines/operations/get_pipeline_status.py(2 hunks)cognee/modules/pipelines/operations/log_pipeline_run_initiated.py(1 hunks)cognee/modules/pipelines/operations/pipeline.py(3 hunks)cognee/modules/pipelines/operations/run_tasks.py(4 hunks)cognee/modules/pipelines/operations/run_tasks_base.py(4 hunks)cognee/modules/retrieval/exceptions/__init__.py(1 hunks)cognee/modules/retrieval/exceptions/exceptions.py(0 hunks)cognee/modules/retrieval/graph_completion_retriever.py(1 hunks)cognee/modules/retrieval/utils/brute_force_triplet_search.py(4 hunks)cognee/modules/settings/get_settings.py(2 hunks)cognee/modules/visualization/cognee_network_visualization.py(1 hunks)cognee/shared/data_models.py(0 hunks)cognee/shared/logging_utils.py(1 hunks)cognee/tasks/ingestion/ingest_data.py(1 hunks)cognee/tasks/ingestion/migrate_relational_database.py(2 hunks)cognee/tasks/temporal_awareness/index_graphiti_objects.py(2 hunks)cognee/tests/integration/run_toy_tasks/conftest.py(0 hunks)cognee/tests/test_memgraph.py(1 hunks)cognee/tests/test_neo4j.py(1 hunks)cognee/tests/test_relational_db_migration.py(3 hunks)cognee/tests/test_weaviate.py(1 hunks)cognee/tests/unit/modules/pipelines/run_tasks_test.py(1 hunks)cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py(1 hunks)cognee/tests/unit/modules/retrieval/chunks_retriever_test.py(4 hunks)cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py(1 hunks)cognee/tests/unit/modules/retrieval/summaries_retriever_test.py(1 hunks)cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py(0 hunks)entrypoint.sh(3 hunks)examples/data/car_and_tech_companies.txt(1 hunks)examples/database_examples/chromadb_example.py(1 hunks)examples/database_examples/falkordb_example.py(1 hunks)examples/database_examples/kuzu_example.py(1 hunks)examples/database_examples/milvus_example.py(1 hunks)examples/database_examples/neo4j_example.py(1 hunks)examples/database_examples/pgvector_example.py(1 hunks)examples/database_examples/qdrant_example.py(1 hunks)examples/database_examples/weaviate_example.py(1 hunks)examples/python/graphiti_example.py(2 hunks)notebooks/cognee_demo.ipynb(3 hunks)notebooks/cognee_graphiti_demo.ipynb(4 hunks)notebooks/cognee_llama_index.ipynb(2 hunks)notebooks/cognee_openai_compatable_demo.ipynb(1 hunks)notebooks/cognee_simple_demo.ipynb(7 hunks)notebooks/github_graph_visualization.html(0 hunks)notebooks/graphrag_vs_rag.ipynb(7 hunks)notebooks/hr_demo.ipynb(0 hunks)notebooks/llama_index_cognee_integration.ipynb(5 hunks)
💤 Files with no reviewable changes (7)
- cognee/modules/retrieval/exceptions/exceptions.py
- cognee/shared/data_models.py
- cognee/tests/integration/run_toy_tasks/conftest.py
- assets/graph_visualization.html
- notebooks/github_graph_visualization.html
- cognee/tests/unit/modules/retrieval/utils/brute_force_triplet_search_test.py
- notebooks/hr_demo.ipynb
🧰 Additional context used
🧬 Code Graph Analysis (30)
cognee/tests/test_weaviate.py (1)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
get_vector_engine(5-6)
cognee/api/v1/responses/__init__.py (1)
cognee/api/v1/responses/routers/get_responses_router.py (1)
get_responses_router(25-149)
cognee/modules/data/methods/__init__.py (1)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
get_unique_dataset_id(5-6)
cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)
cognee/modules/retrieval/summaries_retriever.py (1)
SummariesRetriever(9-33)
cognee/modules/engine/models/__init__.py (1)
cognee/modules/engine/models/ColumnValue.py (1)
ColumnValue(4-9)
cognee/modules/pipelines/operations/__init__.py (1)
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)
log_pipeline_run_initiated(6-22)
cognee/infrastructure/llm/gemini/adapter.py (4)
cognee/shared/logging_utils.py (1)
get_logger(137-158)cognee/modules/observability/get_observe.py (1)
get_observe(5-11)cognee/exceptions/exceptions.py (1)
InvalidValueError(47-54)cognee/infrastructure/llm/rate_limiter.py (2)
rate_limit_async(220-243)sleep_and_retry_async(331-376)
alembic/versions/482cd6517ce4_add_default_user.py (1)
cognee/modules/users/methods/create_default_user.py (1)
create_default_user(5-19)
cognee/api/v1/datasets/datasets.py (1)
cognee/modules/pipelines/operations/get_pipeline_status.py (1)
get_pipeline_status(8-35)
cognee/api/v1/responses/routers/__init__.py (1)
cognee/api/v1/responses/routers/get_responses_router.py (1)
get_responses_router(25-149)
cognee/api/client.py (1)
cognee/api/v1/responses/routers/get_responses_router.py (1)
get_responses_router(25-149)
cognee/infrastructure/llm/openai/adapter.py (1)
cognee/modules/observability/get_observe.py (1)
get_observe(5-11)
cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1)
cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)
Explorer(15-61)
cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (1)
test_run_tasks(42-43)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
cognee/modules/users/models/User.py (1)
User(12-39)
cognee/modules/engine/models/ColumnValue.py (1)
cognee/infrastructure/engine/models/DataPoint.py (1)
DataPoint(16-96)
cognee/base_config.py (1)
cognee/modules/observability/observers.py (1)
Observer(4-9)
cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)
cognee-frontend/src/utils/fetch.ts (1)
fetch(3-12)
cognee/modules/pipelines/operations/get_pipeline_status.py (1)
cognee/modules/pipelines/models/PipelineRun.py (1)
PipelineRun(15-27)
cognee/modules/observability/get_observe.py (3)
cognee/base_config.py (1)
get_base_config(29-30)cognee/modules/observability/observers.py (1)
Observer(4-9)cognee/api/v1/config/config.py (1)
monitoring_tool(37-39)
cognee/tasks/ingestion/ingest_data.py (1)
cognee/modules/data/methods/create_dataset.py (1)
create_dataset(11-33)
cognee/tests/test_neo4j.py (1)
cognee/modules/users/methods/get_default_user.py (1)
get_default_user(12-37)
examples/database_examples/qdrant_example.py (2)
cognee/modules/search/types/SearchType.py (1)
SearchType(4-13)cognee/api/v1/config/config.py (4)
config(15-194)set_vector_db_config(161-172)data_root_directory(32-34)system_root_directory(17-29)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)
MemgraphAdapter(20-690)
cognee/modules/data/methods/create_dataset.py (2)
cognee/modules/data/methods/get_unique_dataset_id.py (1)
get_unique_dataset_id(5-6)cognee/modules/users/models/User.py (1)
User(12-39)
cognee/tasks/ingestion/migrate_relational_database.py (3)
cognee/modules/engine/models/TableRow.py (1)
TableRow(6-12)cognee/modules/engine/models/TableType.py (1)
TableType(4-8)cognee/modules/engine/models/ColumnValue.py (1)
ColumnValue(4-9)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
CollectionNotFoundError(5-14)cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
get_async_session(34-40)
cognee/exceptions/exceptions.py (1)
cognee/shared/logging_utils.py (4)
error(127-128)warning(124-125)info(121-122)debug(133-134)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (5)
cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (4)
get_collection(75-80)has_collection(51-53)create_data_points(82-132)delete_data_points(218-226)cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (5)
get_collection(121-126)has_collection(111-113)get_connection(99-106)create_data_points(128-144)delete_data_points(300-304)cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (3)
has_collection(74-78)create_data_points(99-130)delete_data_points(259-262)cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
CollectionNotFoundError(5-14)cognee/infrastructure/engine/models/DataPoint.py (1)
DataPoint(16-96)
cognee/infrastructure/databases/graph/networkx/adapter.py (4)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (7)
has_node(66-75)get_edges(264-275)extract_node(121-124)extract_nodes(126-136)get_neighbors(381-383)get_node(385-392)get_nodes(394-402)cognee/infrastructure/databases/graph/kuzu/adapter.py (7)
has_node(167-171)get_edges(439-475)extract_node(284-304)extract_nodes(306-325)get_neighbors(479-481)get_node(483-502)get_nodes(504-521)cognee/infrastructure/databases/graph/graph_db_interface.py (4)
get_edges(177-179)get_neighbors(182-184)get_node(125-127)get_nodes(130-132)cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (2)
extract_node(235-238)extract_nodes(240-241)
🪛 Ruff (0.11.9)
cognee/modules/data/methods/__init__.py
10-10: .get_unique_dataset_id.get_unique_dataset_id imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
cognee/modules/retrieval/exceptions/__init__.py
7-7: .exceptions.SearchTypeNotSupported imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
7-7: .exceptions.CypherSearchError imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
cognee/modules/engine/models/__init__.py
6-6: .ColumnValue.ColumnValue imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
cognee/modules/pipelines/operations/__init__.py
1-1: .log_pipeline_run_initiated.log_pipeline_run_initiated imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
alembic/versions/482cd6517ce4_add_default_user.py
24-27: Use contextlib.suppress(Exception) instead of try-except-pass
Replace with contextlib.suppress(Exception)
(SIM105)
alembic/versions/1d0bb7fede17_add_pipeline_run_status.py
13-13: cognee.modules.pipelines.models.PipelineRun.PipelineRun imported but unused
Remove unused import
(F401)
13-13: cognee.modules.pipelines.models.PipelineRun.PipelineRunStatus imported but unused
Remove unused import
(F401)
cognee/api/v1/responses/routers/get_responses_router.py
74-74: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
examples/database_examples/chromadb_example.py
1-1: os imported but unused
Remove unused import: os
(F401)
cognee/modules/pipelines/operations/run_tasks_base.py
32-32: Use key in dict instead of key in dict.keys()
Remove .keys()
(SIM118)
examples/database_examples/kuzu_example.py
1-1: os imported but unused
Remove unused import: os
(F401)
examples/database_examples/falkordb_example.py
1-1: os imported but unused
Remove unused import: os
(F401)
cognee/modules/retrieval/utils/brute_force_triplet_search.py
66-73: Use contextlib.suppress(EntityNotFoundError) instead of try-except-pass
Replace with contextlib.suppress(EntityNotFoundError)
(SIM105)
examples/database_examples/pgvector_example.py
1-1: os imported but unused
Remove unused import: os
(F401)
cognee-mcp/src/server.py
166-166: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py
430-430: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🪛 LanguageTool
examples/data/car_and_tech_companies.txt
[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...
(ENGLISH_WORD_REPEAT_RULE)
[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...
(A_VARIETY_OF)
[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...
(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...
(THIS_NNS)
[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...
(ENGLISH_WORD_REPEAT_RULE)
[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...
(IN_RECENT_STYLE)
[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...
(ENGLISH_WORD_REPEAT_RULE)
🪛 Biome (1.9.4)
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx
[error] 36-36: Do not shadow the global "DataView" property.
Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.
(lint/suspicious/noShadowRestrictedNames)
🔇 Additional comments (154)
CONTRIBUTING.md (1)
100-100: Clean and concise wording improvement.The removal of "make sure to" makes the instruction more direct and concise without changing the meaning, improving readability.
README.md (1)
133-139: Great addition of cognee UI documentation.The new section introduces users to the cognee UI feature with a clear description and visual representation, enhancing the user's understanding of available interfaces.
cognee/tests/unit/modules/retrieval/chunks_retriever_test.py (2)
19-19: Improved test isolation with specific directory paths.Replacing generic test directory names with more specific ones (e.g.,
test_rag_context→test_chunks_context_simple) creates better isolation between test environments and makes the test purpose clearer.Also applies to: 23-23, 76-76, 80-80, 165-165, 169-169
193-198: Enhanced async test execution.Consolidating all three test method calls within a single async
main()function improves the structure and efficiency of the test execution by running all tests in a single event loop.cognee/exceptions/exceptions.py (2)
15-16: Expanding exception flexibility with logging controlThe addition of
logandlog_levelparameters to theCogneeApiErrorconstructor enables more flexible logging behavior, which is a valuable improvement.
23-30: Well-implemented conditional loggingGood implementation of conditional logging with support for different log levels. This enhancement allows for more granular control over exception logging.
cognee/modules/visualization/cognee_network_visualization.py (1)
24-24: Color mapping for new "ColumnValue" node typeGood addition of color mapping for the "ColumnValue" node type, which maintains consistent visual representation in the network visualization.
cognee/tests/unit/modules/pipelines/run_tasks_test.py (1)
53-54: Added script execution capabilityAdding the
if __name__ == "__main__"guard is a good practice that enables direct execution of the test script, improving developer workflow.cognee/modules/pipelines/models/PipelineRun.py (1)
9-9: Well-integrated pipeline status addition.The new status
DATASET_PROCESSING_INITIATEDis a logical addition that enhances the pipeline execution workflow by adding a separate initiation step before processing starts. This follows the established naming convention and integrates well with the existing statuses.cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1)
130-130: Updated parameter name fromlimittotop_kGood update to align with the parameter naming changes in the
SummariesRetrieverimplementation. This ensures the test uses the correct parameter name which has been standardized across retrieval components.cognee/modules/settings/get_settings.py (1)
2-2: Made LLM endpoint and API version optionalGood change to make
endpointandapi_versionfields optional in theLLMConfigmodel. This provides more flexibility for different LLM provider configurations where these fields might not be required.Also applies to: 24-25
cognee/tests/test_weaviate.py (1)
85-85: LGTM: Correctly updated to await the now-async list_all methodThis change properly awaits the
list_all()method, aligning with the refactoring of the Weaviate adapter to be fully asynchronous.examples/python/graphiti_example.py (1)
17-17: LGTM: Properly integrated user context into the task pipelineThe changes correctly implement user context propagation through the task execution pipeline, which aligns with the broader modifications in the codebase.
Also applies to: 31-33, 41-41
cognee/api/v1/responses/routers/__init__.py (1)
1-3: Good implementation of module exportsThis file follows Python best practices by explicitly defining the public API through the
__all__list, making the module's interface clear and maintainable.cognee/infrastructure/databases/graph/graph_db_interface.py (2)
81-81: Ensure consistency in error logging approachesThis is another instance of changing log level from
errortodebug. While the code handles the exception correctly, verify this change is consistent with overall error handling strategy across the codebase.
61-61:Details
❓ Verification inconclusive
Consider the visibility impact of changing log level from error to debug
Changing this log level from
errortodebugwill significantly reduce visibility of relationship addition issues. While the code properly handles the exception with rollback, ensure these errors are truly expected in normal operation and not indicative of actual problems that should be monitored.
🏁 Script executed:
#!/bin/bash # Check how often these errors occur in logs if available rg "Error adding relationship:" -A 1 --glob "*.log" # Also check if there are error handling patterns in other database adapters rg "Error adding relationship:|Error committing session:" -A 2 --glob "*.py"Length of output: 1221
Re-assess logging level for database errors
These exception handlers for both relationship additions and session commits are now logged at DEBUG, which may hide genuine issues from normal monitoring. Please confirm whether these failures are expected in routine operation—if not, consider raising the level to WARN or ERROR to ensure visibility.
Locations to review:
- cognee/infrastructure/databases/graph/graph_db_interface.py:
• Line ~61 –logger.debug(f"Error adding relationship: {e}")
• Line ~?? –logger.debug(f"Error committing session: {e}")cognee/api/v1/responses/__init__.py (1)
1-3: Good module structure for package-level exportsThis file properly establishes a clean public API for the responses package by re-exporting the
get_responses_routerfunction. This follows Python best practices for package organization.cognee-frontend/src/ui/Partials/Explorer/Explorer.tsx (1)
10-10:Details
✅ Verification successful
Verify that all callsites have been updated for the dataset prop change
The
datasetprop type has been changed from{ id: string }to{ name: string }. While this component passes the dataset object directly togetExplorationGraphUrl()without accessing specific properties, verify that all components usingExplorerhave been updated to provide a dataset with anameproperty instead of anidproperty.
🏁 Script executed:
#!/bin/bash # Search for usages of the Explorer component to check if they've all been updated rg -A 5 -B 5 "<Explorer\s+dataset=" --type tsxLength of output: 76
Let's retry searching for
<Explorer>usages without the unsupported--type tsxflag:
🏁 Script executed:
#!/bin/bash # Search all TSX files for <Explorer dataset= ... /> rg -g '*.tsx' -A5 -B5 '<Explorer[^>]*dataset='Length of output: 1736
To confirm that both callsites now supply a
datasetobject with anamefield (and no longer rely onid), let’s inspect their prop definitions and usages:
🏁 Script executed:
#!/bin/bash echo "=== ExploreStep.tsx ===" sed -n '1,200p' cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx echo echo "=== DatasetsView.tsx ===" sed -n '1,200p' cognee-frontend/src/modules/ingestion/DatasetsView/DatasetsView.tsxLength of output: 4213
All
<Explorer>callsites now use adatasetwith anamefieldBoth instances of
<Explorer>have been updated to supply an object containingnamerather thanid, satisfying the new prop type:
ExploreStep.tsx
- Defines
ExploreStepPropsasdataset: { name: string }- Passes
datasetdirectly to<Explorer>DatasetsView.tsx
- Holds a state of type
{ id: string; name: string }- Passes
dataset!to<Explorer>(extraidproperty is structurally compatible)No further updates are required.
cognee/tests/test_neo4j.py (2)
77-85: Good troubleshooting approach for unstable test.Commenting out the natural language search test with a clear explanation improves test stability for environments with weaker LLM models. This is a pragmatic solution to prevent CI failures.
For the longer term, consider implementing a conditional test that can detect LLM capabilities and skip this test automatically when using weaker models, rather than commenting it out entirely.
90-90: Correctly updated assertion to match removed test.The assertion has been properly updated to expect 6 history entries instead of 8, accounting for the removed natural language search test. This ensures the test remains accurate.
cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1)
3-3:Details
✅ Verification successful
Parameter type change aligns with codebase refactoring.
The function parameter has been updated from
{ id: string }to{ name: string }, which aligns with the broader change across the frontend to use dataset names instead of IDs for identification. This is a consistent pattern being applied throughout the codebase.Let's verify that all components calling this function have been updated to pass the correct parameter:
🏁 Script executed:
#!/bin/bash # Find all occurrences of getExplorationGraphUrl being called with dataset parameters rg "getExplorationGraphUrl\(" -A 3 -B 3Length of output: 1265
🏁 Script executed:
#!/bin/bash rg "dataset" -n cognee-frontend/src/ui/Partials/Explorer/Explorer.tsxLength of output: 244
Verified dataset parameter type updated consistently
I’ve confirmed that
getExplorationGraphUrlis defined to acceptdataset: { name: string }in
cognee-frontend/src/modules/exploration/getExplorationGraphUrl.tsand that its sole call site in
cognee-frontend/src/ui/Partials/Explorer/Explorer.tsxis passing the samedatasetshape. No further updates are needed.cognee/infrastructure/llm/gemini/adapter.py (2)
2-7: Import reorganization follows best practices.The imports have been properly organized, separating standard library imports, third-party packages, and internal modules with blank lines. Adding explicit imports for
BaseModeland type annotations improves code readability.
17-17: Centralized observability implementation improves maintainability.Replacing conditional import logic with a centralized
get_observe()function follows the DRY principle and improves code maintainability. This change is part of a broader effort to standardize observability across the codebase.cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py (1)
157-162: Improved async test execution with consolidated event loop.Consolidating all test executions into a single async
main()function is a best practice for async code. This approach:
- Avoids creating multiple event loops
- Improves test execution efficiency
- Provides a clear sequential execution flow
- Aligns with patterns in other test modules
cognee/api/v1/cognify/cognify.py (1)
37-39: Explicit pipeline naming enhances observability.Adding the
pipeline_name="cognify_pipeline"parameter improves pipeline tracking and management without changing the external behavior of the function. This change supports enhanced pipeline status tracking and logging introduced in this PR.cognee/api/client.py (2)
17-17: Good addition of OpenAI-compatible responses router importThe import statement for
get_responses_routerfrom the responses module follows the established pattern of importing routers in this file.
171-172: Well-integrated OpenAI-compatible responses APIThe responses router is properly mounted at
/api/v1/responseswith appropriate tags, following the established pattern of router registration in this application. This addition enables OpenAI-compatible response generation with function call support.cognee/infrastructure/llm/anthropic/adapter.py (1)
21-21: Good migration to async Anthropic clientSwitching from
anthropic.Anthropic()toanthropic.AsyncAnthropic()aligns with the existing async methods in this class and the broader effort to adopt asynchronous patterns for external service clients throughout the codebase. This change will improve performance by enabling better concurrency.cognee/tasks/ingestion/ingest_data.py (1)
107-107: Correct update to match create_dataset signatureThis change correctly updates the call to pass the full
Userobject instead of justuser.id, aligning with the updated signature of thecreate_datasetfunction as shown in the relevant code snippets.cognee/api/v1/datasets/datasets.py (1)
31-31: Good improvement in specifying the pipeline name explicitlyAdding the pipeline name parameter ensures that status retrieval is scoped to the "cognify_pipeline" specifically, aligning with the updated
get_pipeline_statusfunction that now filters pipeline runs by name. This change improves the precision of pipeline status tracking.cognee-frontend/src/app/wizard/CognifyStep/CognifyStep.tsx (1)
10-10:Details
❌ Incorrect review comment
Property type change from id to name
The dataset prop type has been changed from
{ id: string }to{ name: string }. This appears to be part of a coordinated update across the frontend codebase to standardize dataset identification by name rather than ID.Verify that the
cognifyDatasetfunction and all related components have been updated to handle this change consistently:
🏁 Script executed:
#!/bin/bash # Check if cognifyDataset function has been updated to work with 'name' property echo "Checking cognifyDataset implementation:" rg -A 10 "function cognifyDataset" --type ts --type tsx src/modules/datasets/Length of output: 196
It looks like ripgrep doesn’t recognize the built-in “tsx” type, so let’s re-run the search using file globs:
🏁 Script executed:
#!/bin/bash # Re-check cognifyDataset implementation in both .ts and .tsx files echo "Searching for cognifyDataset definition:" rg -A 10 "function cognifyDataset" -g '*.ts' -g '*.tsx' src/modules/datasets/Length of output: 232
It looks like the path was off—let’s search the entire repo for where
cognifyDatasetis defined and used:#!/bin/bash echo "Locating function definition of cognifyDataset:" rg -n "function cognifyDataset" -g '*.ts' -g '*.tsx' echo echo "Locating all calls to cognifyDataset:" rg -n "cognifyDataset\(" -g '*.ts' -g '*.tsx'
Let’s do a repo-wide search for every occurrence of
cognifyDataset, both its definition and usage:#!/bin/bash echo "Searching for cognifyDataset occurrences in .ts/.tsx files:" rg -n "cognifyDataset" -g '*.ts' -g '*.tsx' .
Let’s search the entire repo for any references to
cognifyDatasetand confirm all instances ofdataset.idwere replaced withdataset.name:#!/bin/bash echo "1) All occurrences of cognifyDataset:" rg -n "cognifyDataset" . echo echo "2) Any remaining uses of dataset.id:" rg -n "dataset\s*\.\s*id" -g '*.ts' -g '*.tsx' . echo echo "3) Confirm uses of dataset.name:" rg -n "dataset\s*\.\s*name" -g '*.ts' -g '*.tsx' .
Ignore
cognifyDatasetverification
I was unable to locate any definition or usage ofcognifyDatasetin the repository. It looks like this function doesn’t exist, so the original request to verify its update can be disregarded.Likely an incorrect or invalid review comment.
cognee/modules/data/methods/get_unique_dataset_id.py (1)
1-6: LGTM - Deterministic ID generation approachThe implementation correctly uses UUID v5 with NAMESPACE_OID to generate a deterministic, unique dataset ID based on the combination of dataset name and user ID. This ensures consistent IDs for the same dataset-user combination.
cognee/base_config.py (1)
5-6: LGTM - Updated import for Observer enumThe import was correctly updated to use the new Observer enum from the observability module, replacing the deprecated MonitoringTool import.
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
131-135: LGTM - Updated to use standardized search interfaceThe code correctly updates to use the new
searchmethod with explicit keyword arguments, replacing the deprecatedget_distance_from_collection_elementsmethod. This aligns with the broader refactoring of vector database adapters mentioned in the summary.cognee/modules/observability/observers.py (1)
4-9: LGTM - Well-structured Observer enumThe Observer enum is well-defined as a string enum with appropriate values for different monitoring tools. This approach provides type safety while maintaining string compatibility, which is excellent for configuration values.
cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)
3-3: Function interface improved for flexibilityThe update to accept a dataset object with optional properties instead of separate parameters is a good improvement. It allows more flexibility in how datasets are identified.
Also applies to: 10-10
cognee/modules/engine/models/ColumnValue.py (2)
4-9: New ColumnValue class implementation looks goodThe ColumnValue class correctly inherits from DataPoint and defines the necessary attributes for representing column data in the graph.
9-9: Indexing only on properties seems appropriateThe metadata configuration indexes only the "properties" field, which aligns with the likely use case of searching column values in the graph.
cognee/infrastructure/llm/openai/adapter.py (1)
5-5: Import refactoring improves code organizationThe changes simplify imports and centralize how the observe decorator is obtained, which is a good practice for maintainability.
The refactoring aligns with similar changes in other LLM adapters, promoting consistency across the codebase.
Also applies to: 18-20
cognee/tests/test_relational_db_migration.py (2)
115-118: Test assertions updated to match enhanced graph modelingThe increased expected counts for distinct nodes and edges correctly reflect the architectural changes where individual column data is now migrated as separate ColumnValue nodes.
161-162: Database-specific node and edge counts properly updatedThe significantly higher node and edge counts for both SQLite and PostgreSQL providers accurately reflect the expanded graph representation resulting from the migration enhancements.
Also applies to: 192-193
cognee-frontend/src/app/page.tsx (2)
46-46: Updated notification message to improve user flowThe notification message now prompts users to run "Cognify" after data is successfully added, providing clearer guidance on the next step in the workflow.
106-106: Added cognify capability to DataView componentThis change properly passes the new
onCognifycallback to the DataView component, enabling users to trigger cognification directly from the data view.cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (2)
72-72: Fixed apostrophe in commentSimple comment correction from curly apostrophe to straight apostrophe for consistency.
333-334: Restricted schema list for table deletionChanged from dynamically fetching all schemas to using a fixed list
["public", "public_staging"]. This is a safer approach as it restricts table deletion to only specific schemas, preventing accidental deletion of tables in other schemas.cognee/modules/pipelines/operations/get_pipeline_status.py (2)
8-8: Added pipeline_name parameter to function signatureThe function now accepts a pipeline name parameter, allowing for more specific pipeline status queries. This change aligns with the broader enhancement of pipeline management in the codebase.
23-23: Added filter by pipeline nameThis change implements the filtering by pipeline name, restricting results to runs of the specified pipeline. This is consistent with the PipelineRun model which has a pipeline_name column.
notebooks/cognee_llama_index.ipynb (2)
15-17: Standardized notebook cell formatUpdated the code cell format to use a list of strings for the "source" field, which is the standard Jupyter notebook format. This change improves compatibility with notebook tooling.
123-124: Reordered cell metadata and outputsStandardized the notebook JSON structure by reordering metadata and outputs fields. These formatting changes don't affect code execution or logic but ensure consistent notebook structure.
notebooks/cognee_demo.ipynb (3)
470-470: Import correction for Task classThe import statement has been corrected to use the lowercase module name, following Python's conventional naming patterns where module names are typically lowercase.
508-508: Function signature update with user parameterThe call to
run_tasksnow correctly includes theuserparameter, aligning with changes in the task execution pipeline that now require user context for operations. This ensures proper authorization and user-specific data handling.
532-536: Improved user retrieval logicThe user retrieval logic has been refined to follow a two-step process: first fetch the default user, then retrieve the complete user object by ID. This approach ensures the full user object with all necessary attributes is available for dataset operations.
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)
81-82: Improved UX with immediate input clearingThe change to clear the input field immediately after submission provides better feedback to the user, rather than waiting for the response to come back.
entrypoint.sh (5)
16-16: Migration command updated to use direct alembic invocationChanging from Poetry-based alembic invocation to direct command is consistent with the Dockerfile refactoring mentioned in the PR summary.
25-25: Added helpful confirmation messageThe added confirmation message improves observability of the container startup process.
27-27: More generic startup messageChanged from the specific "Starting Gunicorn" to the more generic "Starting server..." which is appropriate for the entrypoint script.
36-37: Simplified debugpy invocationThe debugpy invocation has been simplified by removing the
python -mprefix. This works because debugpy is now directly available in the path.
38-42: Removed exec from server commandsRemoving the
execcommand allows the shell script to continue running after the server starts, which might be necessary for additional steps after server startup.Note: This change means the container will not receive termination signals directly. Ensure that there's a proper signal handling mechanism or container health checks to manage graceful shutdown.
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)
1-4: Imports look appropriate and complete.These imports correctly provide the necessary components for the function: UUID generation, database engine access, and pipeline run models.
cognee/api/v1/cognify/code_graph_pipeline.py (4)
5-6: Import reorganization improves clarity.The imports were reorganized with the addition of
get_observefrom the observability module. This follows the project's import organization pattern.
14-16: Updated imports align with code functionality.The import changes reflect removing unused imports and adding the necessary
get_unique_dataset_idfunction, which aligns with the changes in dataset ID generation logic later in the file.
25-25: Simplified observability setup.The conditional import of
observefromlangfuse.decoratorswas replaced by a direct assignment using the centralizedget_observe()function. This simplifies the code and ensures consistent observability behavior across the codebase.
68-68: Improved dataset ID generation with user context.The synchronous generation of a fixed UUID is replaced with an asynchronous call to
get_unique_dataset_idthat properly incorporates user context. This makes dataset IDs more user-specific and maintainable.cognee/infrastructure/databases/vector/exceptions/exceptions.py (1)
9-14: Enhanced exception class with improved configuration options.The changes to
CollectionNotFoundErrorare beneficial:
- Fixed the default name parameter from "DatabaseNotCreatedError" to the correct "CollectionNotFoundError"
- Added configurable logging parameters (
logandlog_level) for better error handling flexibility- These changes align with good practices for error reporting and logging
The improvements support standardized error handling across vector database adapters, allowing for more fine-grained control of error visibility.
notebooks/cognee_graphiti_demo.ipynb (8)
18-18: Fixed typo in markdown text.Corrected "libaries" to "libraries" in the markdown cell, improving documentation quality.
27-29: Added essential async support.Added
import asynciowhich is necessary for executing the async functions in the notebook. This is a good addition as the notebook contains async code.
42-43: Important imports for user handling.Added imports for
get_llm_clientandget_default_user, which align with the updated user-aware workflow implemented later in the notebook.
131-131: Simplified pruning code.Simplified the
prune_datacall by removing an inline comment. This improves code readability.
135-137: Added user context initialization.Added code to initialize the default user, which is now required for the pipeline execution. This ensures proper user context throughout the pipeline process.
143-143: Minor formatting adjustment.Small formatting change with no functional impact.
145-145: Updated pipeline execution with user context.Modified
run_tasksto include the user parameter, aligning with updates to the pipeline execution system that now requires user context for proper operation.
148-148: Simplified result printing.Simplified the pipeline result printing by removing string formatting and directly printing the result object. This is cleaner and will show more information about the pipeline status.
cognee/modules/data/methods/create_dataset.py (3)
7-11: Cleaner dependency organization with user object accessThe code now imports and uses the User model directly, which is more type-safe and maintainable than working with just the UUID. This is a good architectural choice that makes the function's dependencies more explicit.
11-12: Improved function signature by using the User modelChanging from accepting a raw UUID to a full User object is a good design decision. It better enforces type safety and makes the function's requirements more explicit. The function correctly extracts the owner_id from the user.
24-26: Properly refactored dataset ID generation logicThe implementation now correctly uses the specialized
get_unique_dataset_idhelper function to generate dataset IDs consistently. The extracted functionality makes the code more maintainable and ensures consistent ID generation across the codebase.cognee/tests/unit/modules/pipelines/run_tasks_with_context_test.py (4)
10-13: Good test setup with proper data cleanupThe test properly cleans up data and system metadata before running, which prevents test interference from previous runs and ensures reproducible test results.
14-22: Well-structured task definitions with different context usage patternsThe test defines tasks with different parameter patterns:
task_1uses both input and contexttask_2uses only inputtask_3uses both input and contextThis properly tests the pipeline's ability to pass context to tasks that need it while handling tasks that don't require context.
26-35: Clear pipeline setup with context parameterThe test properly sets up the pipeline with tasks, input data, user, and context, demonstrating the intended usage pattern of the
run_tasks_basefunction.
42-47: Well-structured test wrapper and executionThe synchronous wrapper and main guard allow for both automated test runs and manual execution, which is good for development and testing.
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (4)
33-33: Good interface update with Promise return typeThe
onCognifyproperty is correctly defined as returning a Promise, which makes it clear that this is an asynchronous operation.
60-70: Well-implemented button state managementThe implementation correctly:
- Uses the
useBooleanhook to manage button state- Disables the button while the operation is in progress
- Re-enables the button regardless of success or failure using
finallyThis prevents multiple submissions and provides good user feedback.
80-84: Well-integrated Cognify button in the UIThe Cognify button is properly placed alongside existing controls and uses the appropriate UI component (
CTAButton) for a primary action.
36-36:Details
❌ Incorrect review comment
Avoid shadowing the global DataView object
The component name is shadowing the global
DataViewproperty, which could lead to confusion.Consider renaming this component to something more specific like
DatasetVieworCogneeDataViewto avoid potential conflicts with the globalDataViewconstructor.
🏁 Script executed:
#!/bin/bash # Check if DataView is used as a global property in the codebase rg "new DataView\(" --glob "*.{ts,tsx,js,jsx}"Length of output: 48
No global
DataViewconflict detected
TheDataViewReact component is declared and exported within its own module scope, so it doesn’t override or interfere with the built-inDataViewconstructor elsewhere. No occurrences ofnew DataView(...)or references to the global API were found in the codebase. You can safely keep the current name or rename it for clarity at your discretion.Likely an incorrect or invalid review comment.
🧰 Tools
🪛 Biome (1.9.4)
[error] 36-36: Do not shadow the global "DataView" property.
Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.
(lint/suspicious/noShadowRestrictedNames)
examples/database_examples/kuzu_example.py (4)
8-34: Well-structured example with clear configurationThe example clearly demonstrates how to:
- Configure KuzuDB as the graph database provider
- Set up data directories with proper paths
- Document each step with helpful comments
This makes it easy for users to understand the configuration process.
39-54: Good dataset creation and sample data ingestionThe example demonstrates dataset creation and data ingestion with realistic sample text about KuzuDB, which helps users understand the context and provides a meaningful example.
57-78: Comprehensive search examplesThe example thoroughly demonstrates three different search types:
- Insights search
- Chunks search with dataset filtering
- Graph completion search
Each search is properly formatted and the results are clearly printed, which provides users with a complete guide to using search functionality.
79-81: Good practice showing optional cleanupThe example includes commented-out cleanup code, which is useful for users who want to run the example multiple times without data accumulation while making it clear that cleanup is optional.
cognee/tasks/temporal_awareness/index_graphiti_objects.py (2)
23-32: Consistent approach to graph engine queries with explicit parametersThe code now consistently passes empty parameter dictionaries (
params={}) to all graph engine queries, aligning with the updated graph adapter interfaces mentioned in the PR summary. This makes the code more maintainable and consistent across the codebase.
34-40: Updated graph data retrieval interfaceThe change from
get_model_independent_graph_data()toget_graph_data()and the updated node iteration pattern reflect the refactored graph database adapters mentioned in the PR summary. The new approach of iterating directly over node ID-data pairs is cleaner than the previous implementation.examples/database_examples/falkordb_example.py (1)
29-39:⚠️ Potential issueUpdate FalkorDB configuration comment
Line 29 creates a Milvus database path, but the example is for FalkorDB. This appears to be a copy-paste error from the Milvus example file.
-local_milvus_db_path = os.path.join(cognee_directory_path, "databases", "milvus.db") # Configure Milvus as the vector database provider cognee.config.set_vector_db_config( { - "vector_db_url": local_milvus_db_path, # Enter Milvus Endpoint if exist + "vector_db_url": "", # Enter FalkorDB vector endpoint if needed "vector_db_key": "", # Enter Token - "vector_db_provider": "milvus", # Specify Milvus as provider + "vector_db_provider": "falkordb", # Specify FalkorDB as provider } )Likely an incorrect or invalid review comment.
cognee/tasks/ingestion/migrate_relational_database.py (2)
98-133: Column data migration enhances graph representation capabilitiesThe addition of column-level nodes provides a more granular representation of relational data in the graph, enabling more detailed queries and insights. The implementation correctly excludes primary and foreign keys to avoid redundancy.
16-16: Function signature updated with clear optional parameterThe addition of the optional
migrate_column_dataparameter with a sensible default (True) allows users to control the granularity of the migration while maintaining backward compatibility.examples/database_examples/milvus_example.py (1)
8-18: Well-structured example demonstrates clear process flowThe function documentation and example flow are clear, showing how to set up, process data, and perform searches with Milvus. This structure makes it easy for users to understand the key steps in using Cognee with Milvus.
examples/database_examples/neo4j_example.py (1)
1-95: Well-structured example with good documentation.The example is well-organized, follows a logical flow, and has clear documentation. It successfully demonstrates the integration between Cognee and Neo4j with appropriate comments explaining each step.
examples/database_examples/chromadb_example.py (1)
1-88: Well-structured example with good documentation.The example is well-organized, follows a logical flow, and has clear documentation. It successfully demonstrates the integration between Cognee and ChromaDB with appropriate comments explaining each step.
🧰 Tools
🪛 Ruff (0.11.9)
1-1:
osimported but unusedRemove unused import:
os(F401)
cognee/modules/pipelines/operations/run_tasks.py (2)
87-93: Good use of explicit keyword arguments.Good job using explicit keyword arguments for clarity when calling
run_tasks_with_telemetry. This makes the code more readable and less prone to errors when parameters are reordered or added.
23-42: Well-implemented backward-compatible parameter addition.The addition of the optional
contextparameter with a default value ofNoneis a good example of how to extend functionality while maintaining backward compatibility with existing code.examples/database_examples/pgvector_example.py (2)
1-100: Well-structured example for PGVector integration.This example script provides a comprehensive demonstration of using Cognee with PostgreSQL and PGVector, with the same well-organized structure as other database examples. It appropriately configures both vector and relational database settings, which is necessary for PGVector.
🧰 Tools
🪛 Ruff (0.11.9)
1-1:
osimported but unusedRemove unused import:
os(F401)
94-95: Clean-up code commented out.Similar to the Qdrant example, the cleanup calls are commented out. This is appropriate for an example script as it allows users to inspect the results after running.
cognee/tests/test_memgraph.py (1)
88-92: Magic number in history length assertion
assert len(history) == 8is brittle – the expected number of history entries is an implementation detail that may change. Either calculate the expected value dynamically (e.g.,expected = len(SEARCH_TYPES_USED) * entries_per_search) or drop the count assertion and verify properties that matter (non-empty, chronological order, etc.).cognee/modules/pipelines/operations/run_tasks_base.py (1)
66-82: LGTM - Context forwarding is properly implementedThe function signature update and context forwarding mechanism are correctly implemented. The context is passed through the pipeline execution, maintaining backward compatibility with tasks that don't expect a context parameter.
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
173-174: LGTM - Improved error handling for collection not found caseAdding specific error handling for the
CollectionNotFoundErrorcase is a good improvement. It makes the function more robust by gracefully handling the case when a collection doesn't exist.examples/data/car_and_tech_companies.txt (1)
1-37: Well-structured sample data for testingThe sample data is well-structured and provides a good representation of two different domains (automotive and technology) for testing knowledge graph functionality. The paragraph format with numbered lists is clear and consistent across both text samples.
🧰 Tools
🪛 LanguageTool
[duplication] ~2-~2: Possible typo: you repeated a word.
Context: text_1 = """ 1. Audi Audi is known for its modern designs and adv...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~5-~5: Possible typo: you repeated a word.
Context: ...ns to high-performance sports cars. 2. BMW BMW, short for Bayerische Motoren Werke, is...(ENGLISH_WORD_REPEAT_RULE)
[style] ~6-~6: Consider using a more concise synonym.
Context: ... reflects that commitment. BMW produces a variety of cars that combine luxury with sporty pe...(A_VARIETY_OF)
[duplication] ~8-~8: Possible typo: you repeated a word.
Context: ...ine luxury with sporty performance. 3. Mercedes-Benz Mercedes-Benz is synonymous with luxury and quality. ...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~11-~11: Possible typo: you repeated a word.
Context: ... catering to a wide range of needs. 4. Porsche Porsche is a name that stands for high-performa...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~14-~14: Possible typo: you repeated a word.
Context: ...o value both performance and style. 5. Volkswagen Volkswagen, which means "people's car" in German, ...(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~17-~17: The plural determiner ‘these’ does not agree with the singular noun ‘car’.
Context: ...nce practicality with quality. Each of these car manufacturer contributes to Germany's r...(THIS_NNS)
[uncategorized] ~17-~17: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...cality with quality. Each of these car manufacturer contributes to Germany's reputation as ...(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...design excellence. """ text_2 = """ 1. Apple Apple is renowned for its innovative consumer...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ... in shaping the internet landscape. 3. Microsoft Microsoft Corporation has been a dominant force i...(ENGLISH_WORD_REPEAT_RULE)
[style] ~28-~28: Consider using a synonym to be more concise.
Context: ...n both business and personal computing. In recent years, Microsoft has expanded into cloud comp...(IN_RECENT_STYLE)
[uncategorized] ~31-~31: You might be missing the article “the” here.
Context: ...or innovation continues to reshape both retail and technology sectors. 5. Meta Meta, ...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[duplication] ~33-~33: Possible typo: you repeated a word.
Context: ...both retail and technology sectors. 5. Meta Meta, originally known as Facebook, revoluti...(ENGLISH_WORD_REPEAT_RULE)
Dockerfile (6)
1-2: Good choice of base image for modern Python developmentUsing a pre-installed
uvimage with Python 3.12 is a great improvement over the previous setup. This leverages newer Python features and a more efficient dependency management tool.
19-27: Well-structured dependency installationGood approach combining all system dependencies in a single RUN command with proper cleanup of apt lists to reduce image size.
29-34: Effective use of Docker layer cachingCopying just the configuration files first is a good practice for leveraging Docker's layer caching mechanism.
33-34: Comprehensive dependency management with uvUsing
uv syncwith specific extras and flags shows a thoughtful approach to dependency management.
36-38: Good sequencing of file copyingCopying Alembic migrations before application code is a smart approach that allows for better layer caching.
56-60: Properly configured runtime environmentGood configuration of PATH and PYTHONPATH to ensure the application and its dependencies are correctly accessible.
cognee/modules/pipelines/operations/pipeline.py (3)
3-3: Clean import organizationThe imports are well organized and the addition of new modules like
get_unique_dataset_idandlog_pipeline_run_initiatedshows good modularization.Also applies to: 5-9, 16-16
64-92: Improved dataset handling with user-scoped IDsThe refactoring to use user-scoped dataset IDs is a significant improvement over the previous implementation. The code properly handles existing datasets and creates new ones with unique IDs when needed.
148-154: Enhanced pipeline status checkingGood addition of early returns to prevent reprocessing of datasets that are already being processed or have been completed. This will improve efficiency and prevent duplicate work.
cognee/api/v1/responses/dispatch_function.py (4)
19-45: Well-structured function dispatch implementationThe
dispatch_functionis cleanly implemented with proper error handling and logging. It correctly handles both dictionary and object inputs for broader compatibility.
47-84: Comprehensive search handling with proper validationThe
handle_searchfunction includes thorough validation of inputs, defaults for optional parameters, and proper error handling. Good approach to extracting schema information from tool definitions.
87-101: Clear cognify handling with conditional responseThe
handle_cognifyfunction is well-implemented with a clear conditional response based on whether text was provided. Good separation of concerns between adding text and running cognify.
104-107: Simple and effective prune handlerThe
handle_prunefunction is straightforward and returns a clear success message.cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (4)
4-8: Improved import organizationBetter organization of imports with the logging utility first, followed by core components and exceptions.
100-125: Enhanced error handling for data point creationGood improvement in error handling by specifically catching
UnexpectedResponseand raising a more specificCollectionNotFoundErrorwhen appropriate.
155-179: Robust search method implementation with validationThe search method has been significantly improved with:
- Parameter validation to ensure at least one query type is provided
- Early returns for invalid limits
- Collection existence checking
- Proper embedding of text queries when needed
These changes make the method more robust and efficient.
208-215: Consistent error handling patternThe error handling pattern for
UnexpectedResponseis consistently implemented here, similar to thecreate_data_pointsmethod. This provides a unified approach to handling collection not found errors.cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
16-21: Import grouping reads better – thanks!
Splitting utility, exception, and embedding-engine imports onto their own lines improves scan-ability and avoids very long lines.
186-188: Good use ofCollectionNotFoundErrorwith log-level control
Raising the domain-specific error instead ofKeyError/RuntimeErrormakes the contract clearer for callers and allows the service to downgrade noisy “missing collection” events to DEBUG – nice touch.cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (3)
122-125: Great: translate Milvus-specific exception to domain error
CatchingCollectionNotExistExceptionand re-raisingCollectionNotFoundErrorkeeps the adapter boundary clean and avoids leaking backend-specific types.
148-166:retrieve()now acceptsUUIDbut string interpolation still relies onstr()cast
You join the IDs withf'"{id}"', which implicitly callsUUID.__str__; fine. However returning raw Milvus rows breaks parity with other adapters whereretrievereturns aList[ScoredResult]. Consider normalising here or documenting the deviation.
251-251: Changed signature is a breaking change – ripple review recommended
delete_data_pointsnow expectsList[UUID]instead ofList[str]. Verify callers were updated; otherwise a silentTypeErroror string/UUID mismatch will surface at runtime.cognee/infrastructure/databases/graph/networkx/adapter.py (1)
45-47:Details
❌ Incorrect review comment
Method signature now violates the parent interface
GraphDBInterface.has_node()is declared with astrargument, but the override was changed toUUID. Besides breaking the Liskov-substitution principle, this will trip static type-checkers and may crash at runtime when other code (e.g., Neo4j/Kùzu adapters or service code) calls the interface withstrIDs.Two safe options:
-async def has_node(self, node_id: UUID) -> bool: +async def has_node(self, node_id: str | UUID) -> bool:or refactor all adapters + the interface to accept
UUID. Mixing the two types is a hidden time-bomb.
🏁 Script executed:
#!/bin/bash # Locate the GraphDBInterface declaration rg -n "class GraphDBInterface" -C 5 # Inspect the has_node method signature in that interface rg -n "def has_node" -C 5Length of output: 5302
🏁 Script executed:
#!/bin/bash # Verify whether GraphDBInterface declares has_node and its signature rg -n "has_node" -C 5 cognee/infrastructure/databases/graph/graph_db_interface.py # Show the interface block around line 95 where methods are defined sed -n '90,115p' cognee/infrastructure/databases/graph/graph_db_interface.pyLength of output: 902
Ignore incorrect override warning
TheGraphDBInterfacedoes not declare ahas_nodemethod at all, so thenetworkxadapter’s signature isn’t overriding anything in the interface. No change is needed here.Likely an incorrect or invalid review comment.
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
76-82:get_collectionawaits a possibly synchronous method
AsyncConnection.open_table()in LanceDB is synchronous (it returns anAsyncTableimmediately).
await connection.open_table(...)works only if the library changed this toasync def. Double-check the version; otherwise drop theawait:- return await connection.open_table(collection_name) + return connection.open_table(collection_name)Failure will raise
TypeError: object AsyncTable can’t be used in 'await' expression.cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
121-127: Asynchronous contract mismatch
client.get_collectionin chromadb-python returns a plain object, not a coroutine; awaiting it may raiseTypeErrorin newer versions. Guard withinspect.isawaitableor removeawaitonce verified.-return await client.get_collection(collection_name) +return client.get_collection(collection_name)notebooks/llama_index_cognee_integration.ipynb (4)
6-8: Notebook source formatting standardized to array structure.The notebook cell's source format has been converted from a single string to an array format for consistency across the project. This is a structural change in the notebook JSON and doesn't affect functionality.
62-64: Code cell source formatting standardized.Similar to the markdown cells, this code cell's source has been converted to array format for consistent notebook structure.
199-201: Standardized remaining markdown cell source formatting.The remaining markdown cells have been updated to use consistent array-based source formatting, maintaining the notebook's JSON structure uniformity.
Also applies to: 219-221
282-283: Updated Python kernel version specification.The Python kernel metadata has been updated to explicitly specify the Python version as "3.12.9" instead of just "python", which helps ensure reproducibility by clearly documenting the expected Python version.
cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (3)
5-5: Updated dataset prop type to use name instead of id.The
datasetprop type has been changed from{ id: string }to{ name: string }, aligning with a broader refactoring across the frontend to standardize on using dataset name as the identifier.
11-11: Simplified Explorer component usage.Removed the non-null assertion operator (
!) since it's unnecessary when passing the dataset prop directly. This improves type safety and readability.
4-6:Details
❓ Verification inconclusive
Verify dataset name property usage in Explorer component.
The Explorer component now expects to receive a dataset with a
nameproperty instead ofid. Based on the relevant code snippet fromExplorer.tsx, you should verify that thegetExplorationGraphUrlfunction has been updated to work with dataset name rather than id.Also applies to: 8-12
🏁 Script executed:
#!/bin/bash # Check if getExplorationGraphUrl has been updated to work with dataset.name grep -r "getExplorationGraphUrl" --include="*.ts" --include="*.tsx" src/modules/exploration/Length of output: 150
It looks like the
src/modules/exploration/path doesn’t exist in this branch. Let’s search the entire repo forgetExplorationGraphUrl:#!/bin/bash # Search for getExplorationGraphUrl across the entire codebase grep -R "getExplorationGraphUrl" -n .
Let’s search the entire repo for
getExplorationGraphUrlto locate its definition/import and verify it’s usingdataset.name:#!/bin/bash # Search for getExplorationGraphUrl across the entire codebase grep -R "getExplorationGraphUrl" -n .
Please verify
getExplorationGraphUrlusesdataset.nameinstead ofdataset.id.
- In
ExploreStep.tsx, the prop now reads:interface ExploreStepProps { dataset: { name: string }; }- Locate the definition or import of
getExplorationGraphUrl(e.g. inExplorer.tsxor your URL-builder module) and ensure it has been updated to acceptdataset.namerather thandataset.id.- Update any import paths or calls if the function has moved or been renamed.
notebooks/cognee_simple_demo.ipynb (3)
18-18: Updated cognee package version.The cognee package has been upgraded from version 0.1.36 to 0.1.39. This ensures compatibility with the latest backend improvements including the new OpenAI-compatible responses API and enhanced dataset handling.
13-19: Standardized notebook cell metadata.All code cells now explicitly include
"execution_count": nulland"outputs": []fields, standardizing the notebook metadata format for better consistency across the project's notebooks.Also applies to: 33-41, 53-60, 72-80, 92-98, 102-108, 112-118, 130-143
1-175: Ensure compatibility with new API features.The notebook has been updated to use cognee 0.1.39, which introduces new features like OpenAI-compatible responses and improved dataset handling. While the notebook code itself hasn't changed functionally, you should verify that all existing code works with the new package version, especially regarding async dataset handling.
notebooks/graphrag_vs_rag.ipynb (4)
56-56: Package version update looks appropriate.The notebook is updated to use cognee 0.1.39, which aligns with the new APIs being introduced in this PR. This ensures the notebook will work with the latest version of the package.
152-153: Correct update to the new API import path and method signature.The import statement has been properly updated from
cognee.modules.search.typestocognee.api.v1.search, and the search method now uses keyword arguments (query_type,query_text) instead of positional arguments, which follows the new API convention.
173-173: Consistent use of updated search API.The RAG completion search is correctly updated to use the new keyword argument pattern and the appropriate SearchType enum value. This maintains consistency with the changes seen in the GraphRAG search call above.
202-202: Insights search syntax properly updated.The insights search call has been updated to match the new API pattern with keyword arguments. All search calls in this notebook now consistently use the new method signature.
examples/database_examples/qdrant_example.py (10)
1-6: Appropriate imports for asynchronous operation.The imports cover all necessary modules for asynchronous operation with the Cognee package, including the updated SearchType import from the new API path.
8-19: Well-documented main function with clear purpose.The main function includes a thorough docstring explaining the example's purpose and steps. This follows good documentation practices and helps users understand how to use Cognee with Qdrant.
20-32: Secure credential handling and proper database configuration.The code properly retrieves credentials from environment variables and configures Cognee to use Qdrant as the vector database provider. This is a secure approach to handling credentials rather than hardcoding them.
34-41: Appropriate path handling for data storage.Using pathlib for path manipulation is a good practice. The code correctly sets up relative paths based on the script location, which makes the example more portable.
43-45: Optional data cleanup for fresh start.Including the prune operations is helpful for ensuring a clean environment when running the example. The comment clearly indicates that this step is optional.
47-59: Clear dataset creation and sample data addition.The sample data is well-formed and representative of Qdrant's features. The example demonstrates how to add text data to a named dataset, which is a fundamental operation in Cognee.
61-62: Proper knowledge extraction step.The example correctly demonstrates how to process the added document using the cognify method with the dataset name.
64-86: Comprehensive demonstration of different search types.The example showcases three different search types (INSIGHTS, CHUNKS, GRAPH_COMPLETION) which is valuable for users to understand the versatility of Cognee's search capabilities. Each search includes proper result handling and output formatting.
87-89: Optional cleanup steps are commented out.It's good practice to include these cleanup steps but leave them commented out, allowing users to decide whether they want to retain the data for further experimentation.
92-93: Standard Python script entry point with proper asyncio handling.The script correctly uses
asyncio.run()to execute the async main function when the script is run directly.cognee/api/v1/responses/routers/default_tools.py (2)
1-37: Well-structured definition of the search tool.The search tool is properly defined with clear parameter types and descriptions. The search_type enum includes the appropriate options, and the required parameter is correctly specified. This follows OpenAI's function calling format for tool definitions.
60-86: Prune tool definition is well-defined with appropriate constraints.The prune tool definition includes appropriate constraints for its parameters, such as the enum values for prune_strategy and min/max values for min_confidence. The default value for prune_strategy is also helpful.
| root_logger.setLevel(log_level) | ||
|
|
||
| if log_level > logging.WARNING: | ||
| if log_level > logging.DEBUG: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Broadened SQLAlchemy warning suppression.
Changed from suppressing SQLAlchemy warnings only when log_level > WARNING to suppressing them when log_level > DEBUG. This means warnings will now be suppressed at INFO and WARNING levels where they were previously visible.
Consider adding a comment explaining the rationale behind this change, as it affects the visibility of potential SQLAlchemy issues during debugging.
🤖 Prompt for AI Agents
In cognee/shared/logging_utils.py at line 315, add a comment explaining why
SQLAlchemy warnings are now suppressed for log levels greater than DEBUG instead
of only greater than WARNING. This comment should clarify the intention behind
broadening the suppression to include INFO and WARNING levels, helping future
readers understand the impact on visibility of SQLAlchemy warnings during
debugging.
| """ | ||
|
|
||
| from .exceptions import SearchTypeNotSupported, CypherSearchError, CollectionDistancesNotFoundError | ||
| from .exceptions import SearchTypeNotSupported, CypherSearchError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Address unused imports
Static analysis indicates that the imported exceptions (SearchTypeNotSupported and CypherSearchError) are not being used within this file.
Either:
- Add them to an
__all__list to make them available to importers:
from .exceptions import SearchTypeNotSupported, CypherSearchError
+__all__ = ["SearchTypeNotSupported", "CypherSearchError"]- Or re-export them more explicitly:
-from .exceptions import SearchTypeNotSupported, CypherSearchError
+from .exceptions import SearchTypeNotSupported as SearchTypeNotSupported
+from .exceptions import CypherSearchError as CypherSearchError- Or import with
__all__if defined in the exceptions module:
-from .exceptions import SearchTypeNotSupported, CypherSearchError
+from .exceptions import SearchTypeNotSupported, CypherSearchError, __all__📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from .exceptions import SearchTypeNotSupported, CypherSearchError | |
| from .exceptions import SearchTypeNotSupported, CypherSearchError | |
| __all__ = ["SearchTypeNotSupported", "CypherSearchError"] |
🧰 Tools
🪛 Ruff (0.11.9)
7-7: .exceptions.SearchTypeNotSupported imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
7-7: .exceptions.CypherSearchError imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
🤖 Prompt for AI Agents
In cognee/modules/retrieval/exceptions/__init__.py at line 7, the imported
exceptions SearchTypeNotSupported and CypherSearchError are not used in the
file. To fix this, explicitly re-export these exceptions by adding an __all__
list including their names or by using a direct re-export statement. This will
make the imports available to other modules without causing unused import
warnings.
| } = useBoolean(false); | ||
|
|
||
| const dataset = { id: 'main' }; | ||
| const dataset = { name: 'main' }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Property name change is consistent with broader refactoring.
Changing the property from id to name aligns with the broader refactoring across the codebase as mentioned in the AI summary.
Let's verify that all usages of dataset.id have been updated to use dataset.name instead:
🏁 Script executed:
#!/bin/bash
# Check for any remaining instances of dataset.id that might have been missed during refactoring
echo "Searching for potential missed refactoring instances:"
rg "dataset\.id" --type=typescript --type=tsxLength of output: 199
The rg error indicates “typescript” isn’t a recognized file type. Let’s broaden the search to all .ts/.tsx files:
🏁 Script executed:
#!/bin/bash
echo "Searching for any remaining `dataset.id` usages in .ts/.tsx files:"
rg "dataset\.id" -g '*.ts' -g '*.tsx'Length of output: 1298
Incomplete refactoring: remaining dataset.id references need updating
Several files still reference dataset.id instead of dataset.name. Please update each occurrence to use the new name property (and adjust any related variable or API-parameter names), for example:
• cognee-frontend/src/modules/ingestion/useDatasets.ts
- status: statuses[dataset.id]
- datasets.filter((dataset) => dataset.id !== datasetId)
• cognee-frontend/src/modules/ingestion/addData.ts
- if (dataset.id) {
formData.append('datasetId', dataset.id);
}
• cognee-frontend/src/modules/ingestion/DatasetsView/DatasetsView.tsx
- <DatasetItem key={dataset.id} onClick={() => onDatasetClick(dataset)}>
• cognee-frontend/src/modules/datasets/deleteDataset.ts
- fetch(
/v1/datasets/${dataset.id})
• cognee-frontend/src/modules/datasets/getDatasetData.ts
- fetch(
/v1/datasets/${dataset.id}/data)
• cognee-frontend/src/modules/datasets/cognifyDataset.ts
- datasets: [dataset.id || dataset.name]
• cognee-frontend/src/app/page.tsx
- .then(() => setSelectedDataset(dataset.id));
- const dataset = datasets.find((dataset) => dataset.id === selectedDataset);
Example diff for one case:
- status: statuses[dataset.id]
+ status: statuses[dataset.name]Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee-frontend/src/app/wizard/WizardPage.tsx at line 32, the dataset object
uses the property name 'name' instead of 'id' as part of a broader refactoring.
However, there are still many references to 'dataset.id' across the codebase
that need to be updated to 'dataset.name' to maintain consistency. Review all
occurrences of 'dataset.id' in the project files, especially those listed in the
comment, and replace them with 'dataset.name'. Also, update any related variable
names, API parameters, and usages accordingly to reflect this change.
| try: | ||
| await_only(create_default_user()) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve exception handling with contextlib.suppress and add logging
While suppressing exceptions during default user creation may be intentional (perhaps to handle cases where the user already exists), the current implementation silently swallows all exceptions without any logging. This could mask important errors.
Apply this refactoring:
- try:
- await_only(create_default_user())
- except Exception:
- pass
+ import contextlib
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+ with contextlib.suppress(Exception) as e:
+ await_only(create_default_user())
+ logger.info("Default user created successfully")
+ if e:
+ logger.warning(f"Failed to create default user: {e}")Alternatively, for a simpler fix that addresses just the static analysis warning:
- try:
- await_only(create_default_user())
- except Exception:
- pass
+ from contextlib import suppress
+
+ with suppress(Exception):
+ await_only(create_default_user())Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.11.9)
24-27: Use contextlib.suppress(Exception) instead of try-except-pass
Replace with contextlib.suppress(Exception)
(SIM105)
🤖 Prompt for AI Agents
In alembic/versions/482cd6517ce4_add_default_user.py around lines 24 to 27,
replace the try-except block that silently passes on all exceptions with
contextlib.suppress to explicitly suppress expected exceptions, and add logging
to record any suppressed exceptions or unexpected errors. This improves clarity
and ensures important errors are not silently ignored.
| triplets = await self.get_triplets(query) | ||
| except EntityNotFoundError: | ||
| return "" | ||
| triplets = await self.get_triplets(query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify that exception propagation is handled appropriately by callers
The removal of the try-except block around get_triplets changes the error handling behavior - EntityNotFoundError will now propagate to callers instead of being caught and returning an empty string.
Make sure all callers of get_context are prepared to handle these exceptions. Run this script to identify callers that might need updates:
🏁 Script executed:
#!/bin/bash
# Find all places where get_context is called
echo "Finding direct callers of get_context method:"
rg --context=2 "\.get_context\(" --type py
echo -e "\nChecking if any of these callers have try-except blocks:"
rg --context=5 "try:.*\.get_context\(" --type pyLength of output: 15045
Preserve error‐handling around get_triplets or update callers of get_context
The removal of the try/except in GraphCompletionRetriever.get_context lets EntityNotFoundError (and DatabaseNotCreatedError) bubble up to every caller. A scan of the codebase found numerous direct callers—none of which wrap get_context in a try/except—so this change will break existing tests (e.g., empty‐graph scenarios expecting "") and may crash downstream consumers.
Call‐sites requiring attention:
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_test.py
- cognee/modules/retrieval/EntityCompletionRetriever.py
- cognee/modules/retrieval/cypher_search_retriever.py
- cognee/modules/retrieval/summaries_retriever.py
- cognee/modules/retrieval/natural_language_retriever.py
- cognee/modules/retrieval/code_retriever.py
- cognee/modules/retrieval/chunks_retriever.py
- cognee/modules/retrieval/completion_retriever.py
- cognee/modules/retrieval/insights_retriever.py
- cognee/eval_framework/answer_generation/answer_generation_executor.py
- cognee/api/v1/cognify/routers/get_code_pipeline_router.py
Suggested fixes:
- Re-add a
try/except EntityNotFoundErroraroundget_tripletsinGraphCompletionRetriever.get_context(returning""on failure), or - Wrap each caller of
get_contextin its owntry/exceptto handle the now-propagated exceptions.
🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_retriever.py around lines 77 to 78,
the removal of the try-except block around the call to get_triplets causes
EntityNotFoundError and DatabaseNotCreatedError exceptions to propagate to all
callers of get_context, which currently do not handle these exceptions and may
break existing functionality. To fix this, reintroduce a try-except block around
the get_triplets call inside get_context that catches EntityNotFoundError and
returns an empty string, preserving the original error handling behavior and
preventing exceptions from breaking downstream callers.
| async def add_node(self, node: DataPoint): | ||
| serialized_properties = self.serialize_properties(node.model_dump()) | ||
|
|
||
| query = """ | ||
| MERGE (node {id: $node_id}) | ||
| ON CREATE SET node:$node_label, node += $properties, node.updated_at = timestamp() | ||
| ON MATCH SET node:$node_label, node += $properties, node.updated_at = timestamp() | ||
| RETURN ID(node) AS internal_id,node.id AS nodeId | ||
| """ | ||
|
|
||
| params = { | ||
| "node_id": str(node.id), | ||
| "node_label": type(node).__name__, | ||
| "properties": serialized_properties, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Cypher label cannot be parameterized – query will fail at runtime
ON CREATE/ON MATCH SET node:$node_label attempts to pass a label through a Cypher parameter.
Cypher (both Neo4j & Memgraph) does not allow labels to be provided as parameters – the query engine interprets $node_label as a literal string, not as a label. The MERGE therefore creates/updates a node with no label at all, or raises a syntax error (depending on server version).
Replace the pattern with explicit string interpolation (after validating / whitelisting), or use APOC procedures if available.
- MERGE (node {id: $node_id})
- ON CREATE SET node:$node_label, node += $properties, node.updated_at = timestamp()
- ON MATCH SET node:$node_label, node += $properties, node.updated_at = timestamp()
+MERGE (node {id: $node_id})
+SET node:`{type(node).__name__}`
+SET node += $properties,
+ node.updated_at = timestamp()Be sure to validate type(node).__name__ against an allow-list to avoid Cypher-injection.
Same issue exists in add_nodes.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 64 to 78, the Cypher query incorrectly uses a parameter for the node
label, which Cypher does not support and causes runtime errors. To fix this,
replace the parameterized label with explicit string interpolation of the label
after validating it against a whitelist to prevent Cypher injection. Apply the
same fix to the add_nodes method to ensure labels are safely and correctly
included in the query.
| UNWIND $edges AS edge | ||
| MATCH (a)-[r]->(b) | ||
| WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name | ||
| RETURN edge.from_node AS from_node, edge.to_node AS to_node, edge.relationship_name AS relationship_name, count(r) > 0 AS edge_exists | ||
| """ | ||
|
|
||
| try: | ||
| params = { | ||
| "edges": [ | ||
| { | ||
| "from_node": str(edge[0]), | ||
| "to_node": str(edge[1]), | ||
| "relationship_name": edge[2], | ||
| } | ||
| for edge in edges | ||
| ], | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic error – comparing property IDs to internal IDs
id(a) / id(b) returns the internal graph ID (an integer), yet you pass the string UUIDs as parameters.
This causes every comparison to evaluate to FALSE, so edge_exists will never be true.
Change the MATCH to use the same id property you use everywhere else:
-WHERE id(a) = edge.from_node AND id(b) = edge.to_node
+WHERE a.id = edge.from_node AND b.id = edge.to_node📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| UNWIND $edges AS edge | |
| MATCH (a)-[r]->(b) | |
| WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name | |
| RETURN edge.from_node AS from_node, edge.to_node AS to_node, edge.relationship_name AS relationship_name, count(r) > 0 AS edge_exists | |
| """ | |
| try: | |
| params = { | |
| "edges": [ | |
| { | |
| "from_node": str(edge[0]), | |
| "to_node": str(edge[1]), | |
| "relationship_name": edge[2], | |
| } | |
| for edge in edges | |
| ], | |
| } | |
| UNWIND $edges AS edge | |
| MATCH (a)-[r]->(b) | |
| - WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name | |
| + WHERE a.id = edge.from_node AND b.id = edge.to_node AND type(r) = edge.relationship_name | |
| RETURN edge.from_node AS from_node, edge.to_node AS to_node, edge.relationship_name AS relationship_name, count(r) > 0 AS edge_exists | |
| """ | |
| try: | |
| params = { | |
| "edges": [ | |
| { | |
| "from_node": str(edge[0]), | |
| "to_node": str(edge[1]), | |
| "relationship_name": edge[2], | |
| } | |
| for edge in edges | |
| ], | |
| } |
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 155 to 171, the MATCH clause incorrectly compares internal graph IDs
(id(a), id(b)) to string UUIDs passed as parameters, causing the comparison to
always fail. Update the MATCH condition to compare the node property 'id' (or
the appropriate UUID property) instead of using id(a) and id(b), ensuring the
property used matches the UUID strings in the parameters for correct edge
existence checking.
| query = f""" | ||
| UNWIND $node_ids AS id | ||
| MATCH (node:`{id}`)<-[r:{edge_label}]-(successor) | ||
| DELETE r; | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Label/string interpolation misuse
MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
uses {id} inside back-ticks, producing something like node:1234`` which is invalid.
Use property matching instead (or APOC):
-MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
+MATCH (node {id: nid})<-[r:`{edge_label}`]-(successor)Again, whitelist edge_label.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 414 to 418, the query incorrectly uses string interpolation for the node
label with `{id}` inside backticks, resulting in invalid Cypher syntax like
`node:`1234``. Instead, modify the query to match nodes by a property, for
example using `MATCH (node) WHERE node.id = $id` to safely match nodes by their
ID property. Also, ensure that the `edge_label` is whitelisted before
interpolation to prevent injection risks.
| query = dedent( | ||
| f"""\ | ||
| MATCH (from_node {{id: $from_node}}), | ||
| (to_node {{id: $to_node}}) | ||
| MERGE (from_node)-[r:{relationship_name}]->(to_node) | ||
| ON CREATE SET r += $properties, r.updated_at = timestamp() | ||
| ON MATCH SET r += $properties, r.updated_at = timestamp() | ||
| RETURN r | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Un-sanitised f-string may allow Cypher-injection
MERGE (from_node)-[r:{relationship_name}]->(to_node) interpolates relationship_name directly into the query.
If user input reaches this path, an attacker can inject arbitrary Cypher.
Validate relationship_name against a whitelist (re.fullmatch(r"[A-Z_][A-Z0-9_]*", …)) or use APOC’s apoc.create.relationship.
🏁 Script executed:
#!/bin/bash
# Find all occurrences of `relationship_name` to trace its origin and check for sanitization
rg -n 'relationship_name' cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.pyLength of output: 960
🏁 Script executed:
#!/bin/bash
# Display lines 150-250 to inspect the function signature and usage of relationship_name
sed -n '150,250p' cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.pyLength of output: 3490
🏁 Script executed:
#!/bin/bash
# Search for .add_edge usages
rg -n "add_edge\(" -n cogneeLength of output: 1872
Secure relationship_name interpolation in add_edge` to prevent Cypher injection
The current add_edge implementation uses an f-string to inject relationship_name directly into the Cypher pattern, which can be exploited if an attacker controls that value. We need to either strictly validate the name against a whitelist or switch to a parameterized APOC call.
• File cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py, add_edge (lines 188–196)
• Vulnerable line:
MERGE (from_node)-[r:{relationship_name}]->(to_node)Suggested fixes:
- Add a whitelist check before building the query:
+ import re
+ # ensure only valid relationship names are allowed
+ if not re.fullmatch(r"[A-Z_][A-Z0-9_]*", relationship_name):
+ raise ValueError(f"Invalid relationship name: {relationship_name}")- Replace the dynamic MERGE with an APOC call:
- query = dedent(
- f"""\
- MATCH (from_node {{id: $from_node}}),
- (to_node {{id: $to_node}})
- MERGE (from_node)-[r:{relationship_name}]->(to_node)
- ON CREATE SET r += $properties, r.updated_at = timestamp()
- ON MATCH SET r += $properties, r.updated_at = timestamp()
- RETURN r
- """
- )
+ query = dedent(
+ """\
+ MATCH (from_node {id: $from_node}), (to_node {id: $to_node})
+ CALL apoc.create.relationship(
+ from_node,
+ $relationship_name,
+ $properties,
+ to_node
+ ) YIELD rel AS r
+ SET r.updated_at = timestamp()
+ RETURN r
+ """
+ )This ensures relationship_name cannot inject arbitrary Cypher and uses a parameterized APOC procedure instead.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around
lines 188 to 196, the f-string directly interpolates relationship_name into the
Cypher query, risking Cypher injection. Fix this by validating relationship_name
against a strict whitelist regex pattern like r"[A-Z_][A-Z0-9_]*" before using
it in the query, or refactor the query to use the APOC procedure
apoc.create.relationship with relationship_name as a parameter to safely create
the relationship without direct string interpolation.
| DEFAULT_TOOLS = [ | ||
| { | ||
| "type": "function", | ||
| "name": "search", | ||
| "description": "Search for information within the knowledge graph", | ||
| "parameters": { | ||
| "type": "object", | ||
| "properties": { | ||
| "search_query": { | ||
| "type": "string", | ||
| "description": "The query to search for in the knowledge graph", | ||
| }, | ||
| "search_type": { | ||
| "type": "string", | ||
| "description": "Type of search to perform", | ||
| "enum": [ | ||
| "INSIGHTS", | ||
| "CODE", | ||
| "GRAPH_COMPLETION", | ||
| "SEMANTIC", | ||
| "NATURAL_LANGUAGE", | ||
| ], | ||
| }, | ||
| "top_k": { | ||
| "type": "integer", | ||
| "description": "Maximum number of results to return", | ||
| "default": 10, | ||
| }, | ||
| "datasets": { | ||
| "type": "array", | ||
| "items": {"type": "string"}, | ||
| "description": "Optional list of dataset names to search within", | ||
| }, | ||
| }, | ||
| "required": ["search_query"], | ||
| }, | ||
| }, | ||
| { | ||
| "type": "function", | ||
| "name": "cognify", | ||
| "description": "Convert text into a knowledge graph or process all added content", | ||
| "parameters": { | ||
| "type": "object", | ||
| "properties": { | ||
| "text": { | ||
| "type": "string", | ||
| "description": "Text content to be converted into a knowledge graph", | ||
| }, | ||
| "graph_model_name": { | ||
| "type": "string", | ||
| "description": "Name of the graph model to use", | ||
| }, | ||
| "graph_model_file": { | ||
| "type": "string", | ||
| "description": "Path to a custom graph model file", | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| { | ||
| "type": "function", | ||
| "name": "prune", | ||
| "description": "Remove unnecessary or outdated information from the knowledge graph", | ||
| "parameters": { | ||
| "type": "object", | ||
| "properties": { | ||
| "prune_strategy": { | ||
| "type": "string", | ||
| "enum": ["light", "moderate", "aggressive"], | ||
| "description": "Strategy for pruning the knowledge graph", | ||
| "default": "moderate", | ||
| }, | ||
| "min_confidence": { | ||
| "type": "number", | ||
| "description": "Minimum confidence score to retain (0-1)", | ||
| "minimum": 0, | ||
| "maximum": 1, | ||
| }, | ||
| "older_than": { | ||
| "type": "string", | ||
| "description": "ISO date string - prune nodes older than this date", | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify that search_type enum values match the actual SearchType implementation.
The search_type enum values in the tool definition should match the actual implementation in the SearchType enum. The current set of values (INSIGHTS, CODE, GRAPH_COMPLETION, SEMANTIC, NATURAL_LANGUAGE) seems to be missing some values that are in the SearchType enum (e.g., CHUNKS, RAG_COMPLETION, GRAPH_SUMMARY_COMPLETION, CYPHER, SUMMARIES).
🏁 Script executed:
#!/bin/bash
# Check if all SearchType enum values are included in the DEFAULT_TOOLS search_type enum
# First, get all SearchType values from the enum definition
echo "SearchType enum values:"
rg -A 10 "class SearchType" --type py | grep -o '"[A-Z_]\+"' | sort
# Then compare with what's in the default_tools.py
echo -e "\nSearch type enum values in DEFAULT_TOOLS:"
rg -A 10 '"search_type":' cognee/api/v1/responses/routers/default_tools.py | grep -o '"[A-Z_]\+"' | sortLength of output: 529
Align search_type enum in DEFAULT_TOOLS with the actual SearchType values
The search_type enum in cognee/api/v1/responses/routers/default_tools.py currently lists:
- INSIGHTS
- CODE
- GRAPH_COMPLETION
- SEMANTIC
- NATURAL_LANGUAGE
But your SearchType enum defines:
- CHUNKS
- CODE
- CYPHER
- GRAPH_COMPLETION
- GRAPH_SUMMARY_COMPLETION
- INSIGHTS
- NATURAL_LANGUAGE
- RAG_COMPLETION
- SUMMARIES
And SEMANTIC is not part of SearchType. Please update the tool definition so that search_type.enum matches exactly the SearchType values.
• File: cognee/api/v1/responses/routers/default_tools.py
• Location: inside the parameters → properties → search_type block of the search tool
Suggested replacement for the enum array:
- "enum": [
- "INSIGHTS",
- "CODE",
- "GRAPH_COMPLETION",
- "SEMANTIC",
- "NATURAL_LANGUAGE",
- ],
+ "enum": [
+ "CHUNKS",
+ "CODE",
+ "CYPHER",
+ "GRAPH_COMPLETION",
+ "GRAPH_SUMMARY_COMPLETION",
+ "INSIGHTS",
+ "NATURAL_LANGUAGE",
+ "RAG_COMPLETION",
+ "SUMMARIES",
+ ],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| DEFAULT_TOOLS = [ | |
| { | |
| "type": "function", | |
| "name": "search", | |
| "description": "Search for information within the knowledge graph", | |
| "parameters": { | |
| "type": "object", | |
| "properties": { | |
| "search_query": { | |
| "type": "string", | |
| "description": "The query to search for in the knowledge graph", | |
| }, | |
| "search_type": { | |
| "type": "string", | |
| "description": "Type of search to perform", | |
| "enum": [ | |
| "INSIGHTS", | |
| "CODE", | |
| "GRAPH_COMPLETION", | |
| "SEMANTIC", | |
| "NATURAL_LANGUAGE", | |
| ], | |
| }, | |
| "top_k": { | |
| "type": "integer", | |
| "description": "Maximum number of results to return", | |
| "default": 10, | |
| }, | |
| "datasets": { | |
| "type": "array", | |
| "items": {"type": "string"}, | |
| "description": "Optional list of dataset names to search within", | |
| }, | |
| }, | |
| "required": ["search_query"], | |
| }, | |
| }, | |
| { | |
| "type": "function", | |
| "name": "cognify", | |
| "description": "Convert text into a knowledge graph or process all added content", | |
| "parameters": { | |
| "type": "object", | |
| "properties": { | |
| "text": { | |
| "type": "string", | |
| "description": "Text content to be converted into a knowledge graph", | |
| }, | |
| "graph_model_name": { | |
| "type": "string", | |
| "description": "Name of the graph model to use", | |
| }, | |
| "graph_model_file": { | |
| "type": "string", | |
| "description": "Path to a custom graph model file", | |
| }, | |
| }, | |
| }, | |
| }, | |
| { | |
| "type": "function", | |
| "name": "prune", | |
| "description": "Remove unnecessary or outdated information from the knowledge graph", | |
| "parameters": { | |
| "type": "object", | |
| "properties": { | |
| "prune_strategy": { | |
| "type": "string", | |
| "enum": ["light", "moderate", "aggressive"], | |
| "description": "Strategy for pruning the knowledge graph", | |
| "default": "moderate", | |
| }, | |
| "min_confidence": { | |
| "type": "number", | |
| "description": "Minimum confidence score to retain (0-1)", | |
| "minimum": 0, | |
| "maximum": 1, | |
| }, | |
| "older_than": { | |
| "type": "string", | |
| "description": "ISO date string - prune nodes older than this date", | |
| }, | |
| }, | |
| }, | |
| }, | |
| ] | |
| DEFAULT_TOOLS = [ | |
| { | |
| "type": "function", | |
| "name": "search", | |
| "description": "Search for information within the knowledge graph", | |
| "parameters": { | |
| "type": "object", | |
| "properties": { | |
| "search_query": { | |
| "type": "string", | |
| "description": "The query to search for in the knowledge graph", | |
| }, | |
| "search_type": { | |
| "type": "string", | |
| "description": "Type of search to perform", | |
| "enum": [ | |
| "CHUNKS", | |
| "CODE", | |
| "CYPHER", | |
| "GRAPH_COMPLETION", | |
| "GRAPH_SUMMARY_COMPLETION", | |
| "INSIGHTS", | |
| "NATURAL_LANGUAGE", | |
| "RAG_COMPLETION", | |
| "SUMMARIES", | |
| ], | |
| }, | |
| }, | |
| }, | |
| }, | |
| { | |
| "type": "function", | |
| "name": "cognify", | |
| "description": "Convert text into a knowledge graph or process all added content", | |
| "parameters": { | |
| "type": "object", | |
| "properties": { | |
| "text": { | |
| "type": "string", | |
| "description": "Text content to be converted into a knowledge graph", | |
| }, | |
| "graph_model_name": { | |
| "type": "string", | |
| "description": "Name of the graph model to use", | |
| }, | |
| "graph_model_file": { | |
| "type": "string", | |
| "description": "Path to a custom graph model file", | |
| }, | |
| }, | |
| }, | |
| }, | |
| { | |
| "type": "function", | |
| "name": "prune", | |
| "description": "Remove unnecessary or outdated information from the knowledge graph", | |
| "parameters": { | |
| "type": "object", | |
| "properties": { | |
| "prune_strategy": { | |
| "type": "string", | |
| "enum": ["light", "moderate", "aggressive"], | |
| "description": "Strategy for pruning the knowledge graph", | |
| "default": "moderate", | |
| }, | |
| "min_confidence": { | |
| "type": "number", | |
| "description": "Minimum confidence score to retain (0-1)", | |
| "minimum": 0, | |
| "maximum": 1, | |
| }, | |
| "older_than": { | |
| "type": "string", | |
| "description": "ISO date string - prune nodes older than this date", | |
| }, | |
| }, | |
| }, | |
| }, | |
| ] |
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/default_tools.py within lines 1 to 86, update
the enum values for the "search_type" property inside the "search" tool's
parameters to exactly match the SearchType enum values defined elsewhere.
Replace the current enum list ["INSIGHTS", "CODE", "GRAPH_COMPLETION",
"SEMANTIC", "NATURAL_LANGUAGE"] with ["CHUNKS", "CODE", "CYPHER",
"GRAPH_COMPLETION", "GRAPH_SUMMARY_COMPLETION", "INSIGHTS", "NATURAL_LANGUAGE",
"RAG_COMPLETION", "SUMMARIES"] and remove "SEMANTIC" since it is not part of the
SearchType enum.
<!-- .github/pull_request_template.md --> ## Description Adds modal parallel evaluation for retriever development ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
cognee/eval_framework/metrics_dashboard.py (1)
131-170: Docstring is now stale – update it to reflect the new return value
create_dashboardused to return a file path, but it now returns the full HTML string. The docstring still says “Create and save the dashboard with all visualizations.” without mentioning the HTML return, which can mislead downstream users and static-type checkers.@@ - """Create and save the dashboard with all visualizations.""" + """ + Create the dashboard, write it to `output_file`, and return the + HTML string. + + Returns + ------- + str + Complete dashboard HTML that was written to disk. + """cognee/eval_framework/eval_config.py (1)
45-47: Document the newinstance_filterfieldThe new optional list is great, but nothing in the class docstring (nor inline comment) explains its purpose or expected values. A brief description will help other contributors understand how to use it and avoid passing malformed data.
- instance_filter: Optional[List[str]] = None +# instance_filter +# --------------- +# Optional list of instance IDs/names that should be evaluated. When +# provided, only these instances will be sent to the corpus builder. + instance_filter: Optional[List[str]] = None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
Dockerfile_modal(1 hunks)cognee/eval_framework/eval_config.py(3 hunks)cognee/eval_framework/metrics_dashboard.py(1 hunks)cognee/eval_framework/modal_run_eval.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
cognee/eval_framework/modal_run_eval.py
5-5: hashlib imported but unused
Remove unused import: hashlib
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: End-to-End Tests / Run Telemetry Test
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Deletion Test
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: End-to-End Tests / Server Start Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Basic Tests / Run Basic Graph Tests
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: Publish Cognee Docker image
🔇 Additional comments (3)
Dockerfile_modal (3)
21-21: Good optimization: Consolidated file copying.Including README.md in the initial COPY command streamlines the Docker build process by reducing the number of layers and eliminating a redundant copy operation that was likely present in the previous version.
25-26: Good practice: Disabled Poetry virtualenvs in Docker.Disabling Poetry's virtual environment creation is the recommended approach when running in a Docker container, as containers already provide isolation. This eliminates unnecessary nesting of environments and potential path-related issues.
27-27:Details
❓ Verification inconclusive
Good optimization: Explicit dependency installation.
Specifying only the required extras (neo4j and qdrant) instead of installing all extras makes the Dockerfile more explicit and reduces the image size. The
--no-rootflag is appropriate since the application code is copied in a later step.
🏁 Script executed:
#!/bin/bash # Verify that only neo4j and qdrant extras are needed for the modal environment grep -r "RUN_MODE=modal" cognee/Length of output: 32
🏁 Script executed:
#!/bin/bash # Locate Dockerfile_modal anywhere in the repo echo "Finding Dockerfile_modal paths:" file_paths=$(fd Dockerfile_modal || true) echo "$file_paths" # If found, show context around the install line if [ -n "$file_paths" ]; then for f in $file_paths; do echo -e "\n--- Context in $f (lines 1–60) ---" sed -n '1,60p' "$f" done else echo "No Dockerfile_modal found via fd." fi # Search for RUN_MODE=modal usage globally echo -e "\n--- Searching for RUN_MODE=modal ---" grep -R "RUN_MODE=modal" -n . || echo "No occurrences of RUN_MODE=modal" # Search for any 'modal' references to see how the modal environment is triggered echo -e "\n--- Searching for 'modal' keyword occurrences ---" grep -R "modal" -n . || echo "No 'modal' references found"Length of output: 2298
🏁 Script executed:
#!/bin/bash # 1. Display the extras defined in pyproject.toml echo "=== tool.poetry.extras in pyproject.toml ===" sed -n '/^\[tool\.poetry\.extras\]/,/^\[/{p}' pyproject.toml || echo "Error: pyproject.toml not found" # 2. Search code for direct usage of neo4j and qdrant echo -e "\n=== Searching for neo4j usage in cognee/ ===" grep -R "neo4j" -n cognee/ || echo "No neo4j references found" echo -e "\n=== Searching for qdrant usage in cognee/ ===" grep -R "qdrant" -n cognee/ || echo "No qdrant references found" # 3. Gather all extras keys and check for imports of each echo -e "\n=== Checking for other extras usage ===" extras=$(sed -n '/^\[tool\.poetry\.extras\]/,/^\[/{/=/p}' pyproject.toml \ | sed -E 's/^\s*"?([a-zA-Z0-9_-]+)"?\s*=.*/\1/') || extras="" for ex in $extras; do echo -e "\n-- Extra key: $ex" grep -R "import $ex" -n cognee/ && continue grep -R "$ex\." -n cognee/ || echo "No import matches for $ex" doneLength of output: 7059
🏁 Script executed:
#!/bin/bash # Locate pyproject.toml and poetry.lock in the repo echo "=== Locating pyproject.toml ===" fd pyproject.toml || echo "pyproject.toml not found" echo -e "\n=== Locating poetry.lock ===" fd poetry.lock || echo "poetry.lock not found" # If pyproject.toml is found, display the '[tool.poetry.extras]' section for file in $(fd pyproject.toml); do echo -e "\n--- Extras in $file ---" sed -n '/^\[tool\.poetry\.extras\]/,/^\[/{p}' "$file" || echo "Failed to read extras" doneLength of output: 203
Verify Poetry extras in Dockerfile_modal
RUN poetry install --extras neo4j --extras qdrant --no-root
- Ensure that this command includes all extras defined under
[tool.poetry.extras]in yourpyproject.tomlthat are required by the modal environment.- If there are additional extras used by
cognee/eval_framework/modal_run_eval.pyor other modal‐related code, add them here to avoid missing dependencies.
| import hashlib | ||
| import json | ||
| from cognee.shared.logging_utils import get_logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused hashlib import
hashlib is imported but never referenced, and Ruff flags this (F401).
-import hashlib📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import hashlib | |
| import json | |
| from cognee.shared.logging_utils import get_logger | |
| import json | |
| from cognee.shared.logging_utils import get_logger |
🧰 Tools
🪛 Ruff (0.11.9)
5-5: hashlib imported but unused
Remove unused import: hashlib
(F401)
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_run_eval.py around lines 5 to 7, the hashlib
module is imported but not used anywhere in the code, causing an unused import
warning. Remove the line importing hashlib to clean up the code and resolve the
F401 warning flagged by Ruff.
| if eval_params.get("dashboard"): | ||
| logger.info("Generating dashboard...") | ||
| html_output = create_dashboard( | ||
| metrics_path=eval_params["metrics_path"], | ||
| aggregate_metrics_path=eval_params["aggregate_metrics_path"], | ||
| output_file=eval_params["dashboard_path"], | ||
| benchmark=eval_params["benchmark"], | ||
| ) | ||
| return None | ||
|
|
||
| return read_and_combine_metrics(eval_params) | ||
| with open("/data/" + html_filename, "w") as f: | ||
| f.write(html_output) | ||
| vol.commit() | ||
|
|
||
| logger.info("Evaluation set finished...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
html_output may be undefined when dashboard is False
html_output is created only inside the if eval_params.get("dashboard"): block, but the write/commit logic that uses it is unconditionally executed right after the block.
For runs where dashboard=False, this will raise UnboundLocalError.
- if eval_params.get("dashboard"):
- logger.info("Generating dashboard...")
- html_output = create_dashboard(
- metrics_path=eval_params["metrics_path"],
- aggregate_metrics_path=eval_params["aggregate_metrics_path"],
- output_file=eval_params["dashboard_path"],
- benchmark=eval_params["benchmark"],
- )
-
- with open("/data/" + html_filename, "w") as f:
- f.write(html_output)
- vol.commit()
+ if eval_params.get("dashboard"):
+ logger.info("Generating dashboard...")
+ html_output = create_dashboard(
+ metrics_path=eval_params["metrics_path"],
+ aggregate_metrics_path=eval_params["aggregate_metrics_path"],
+ output_file=eval_params["dashboard_path"],
+ benchmark=eval_params["benchmark"],
+ )
+
+ with open("/data/" + html_filename, "w") as f:
+ f.write(html_output)
+ vol.commit()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if eval_params.get("dashboard"): | |
| logger.info("Generating dashboard...") | |
| html_output = create_dashboard( | |
| metrics_path=eval_params["metrics_path"], | |
| aggregate_metrics_path=eval_params["aggregate_metrics_path"], | |
| output_file=eval_params["dashboard_path"], | |
| benchmark=eval_params["benchmark"], | |
| ) | |
| return None | |
| return read_and_combine_metrics(eval_params) | |
| with open("/data/" + html_filename, "w") as f: | |
| f.write(html_output) | |
| vol.commit() | |
| logger.info("Evaluation set finished...") | |
| if eval_params.get("dashboard"): | |
| logger.info("Generating dashboard...") | |
| html_output = create_dashboard( | |
| metrics_path=eval_params["metrics_path"], | |
| aggregate_metrics_path=eval_params["aggregate_metrics_path"], | |
| output_file=eval_params["dashboard_path"], | |
| benchmark=eval_params["benchmark"], | |
| ) | |
| with open("/data/" + html_filename, "w") as f: | |
| f.write(html_output) | |
| vol.commit() | |
| logger.info("Evaluation set finished...") |
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_run_eval.py around lines 84 to 97, the variable
html_output is only defined inside the if block checking
eval_params.get("dashboard"), but it is used unconditionally afterward, causing
an UnboundLocalError when dashboard is False. To fix this, move the file writing
and vol.commit() calls inside the if block so they only execute when html_output
is defined, or add a conditional check before using html_output to ensure it is
defined.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> - set the parallel option to None in Fastembed's embedding function ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py (1)
47-47:Details
❓ Verification inconclusive
Improved parallelism handling for text embeddings.
Changing the
parallelparameter from0toNonelikely allows the fastembed library to use its default parallelism strategy rather than explicitly disabling it. This should improve embedding generation performance, especially for larger batches of text, by allowing the library to optimize resource allocation based on the system capabilities.Consider adding a brief comment explaining this performance optimization choice for future maintainers:
embeddings = self.embedding_model.embed( text, batch_size=len(text), - parallel=None, + parallel=None, # Use default parallelism strategy for optimal performance )
🏁 Script executed:
#!/bin/bash # Check fastembed documentation for parallel parameter rg -A 5 -B 5 "parallel" --glob "*fastembed*"Length of output: 193
Manual verification required: Confirm
parallel=Nonebehavior in fastembedI wasn’t able to find any references to the
parallelparameter in the codebase or fastembed docs via ripgrep, so please double-check that passingparallel=Noneindeed defers to fastembed’s default parallelism (rather than disabling it) before relying on it for performance.Suggested update once confirmed:
embeddings = self.embedding_model.embed( text, batch_size=len(text), - parallel=None, + parallel=None, # Use fastembed’s default parallelism for optimal performance )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: End-to-End Tests / S3 Bucket Test
- GitHub Check: End-to-End Tests / Deletion Test
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: Basic Tests / Run Basic Graph Tests
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: Basic Tests / Run Simple Examples
- GitHub Check: Basic Tests / Run Integration Tests
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Publish Cognee Docker image
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Igor Ilic <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
cognee/shared/logging_utils.py (1)
326-326: Add comment explaining the SQLAlchemy warning suppression changeChanged from suppressing SQLAlchemy warnings only when log_level > WARNING to suppressing them when log_level > DEBUG. Consider adding a comment explaining why SQLAlchemy warnings are now suppressed at INFO and WARNING levels.
- if log_level > logging.DEBUG: + # Suppress SQLAlchemy warnings at INFO level and above to reduce noise in logs + # while still showing them at DEBUG level for detailed troubleshooting + if log_level > logging.DEBUG: import warnings from sqlalchemy.exc import SAWarning
🧹 Nitpick comments (2)
cognee/shared/logging_utils.py (1)
7-12: Clean up unused importThe
importlib.metadataimport is not being used in this file and should be removed.import os import sys import threading import logging import structlog import traceback import platform from datetime import datetime from pathlib import Path -import importlib.metadata from cognee import __version__ as cognee_version🧰 Tools
🪛 Ruff (0.11.9)
10-10:
importlib.metadataimported but unusedRemove unused import:
importlib.metadata(F401)
cognee/version.py (1)
7-24: Great implementation of version detection logicThe function provides a robust way to determine the package version in both development and installed environments.
Consider simplifying the nested
withstatements:def get_cognee_version() -> str: """Returns either the version of installed cognee package or the one found in nearby pyproject.toml""" with suppress(FileNotFoundError, StopIteration): - with open( - os.path.join(Path(__file__).parent.parent, "pyproject.toml"), encoding="utf-8" - ) as pyproject_toml: + with open(os.path.join(Path(__file__).parent.parent, "pyproject.toml"), + encoding="utf-8") as pyproject_toml: version = ( next(line for line in pyproject_toml if line.startswith("version")) .split("=")[1] .strip("'\"\n ") ) # Mark the version as a local Cognee library by appending "-dev" return f"{version}-dev"🧰 Tools
🪛 Ruff (0.11.9)
10-13: Use a single
withstatement with multiple contexts instead of nestedwithstatementsCombine
withstatements(SIM117)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
cognee/__init__.py(1 hunks)cognee/shared/logging_utils.py(4 hunks)cognee/version.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
cognee/__init__.py (1)
cognee/version.py (1)
get_cognee_version(7-24)
🪛 Ruff (0.11.9)
cognee/version.py
10-13: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
cognee/shared/logging_utils.py
10-10: importlib.metadata imported but unused
Remove unused import: importlib.metadata
(F401)
🔇 Additional comments (3)
cognee/__init__.py (1)
1-5: Good addition of version tracking!Adding a module-level
__version__variable is a good practice for Python packages. The comment about circular imports is helpful and explains why the version extraction needs to be at the top of the file.cognee/shared/logging_utils.py (2)
43-48: Good addition of version tracking constantsAdding these version constants enables consistent tracking across logs, which is valuable for debugging and support.
340-351: Excellent enhancement to logging initializationAdding system and version information to the logs is very valuable for troubleshooting and understanding the environment where issues occur.
<!-- .github/pull_request_template.md --> ## Description Adds dashboard application to parallel modal evals to enable fast retriever development/evaluation ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: lxobr <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (5)
cognee/eval_framework/modal_eval_dashboard.py (5)
73-75: Add protection against division by zero.While it's unlikely in this context, it's good practice to add protection against division by zero.
- "avg_EM": round(total_em / num_q, 4), - "avg_F1": round(total_f1 / num_q, 4), - "avg_correctness": round(total_corr / num_q, 4), + "avg_EM": round(total_em / num_q, 4) if num_q > 0 else 0, + "avg_F1": round(total_f1 / num_q, 4) if num_q > 0 else 0, + "avg_correctness": round(total_corr / num_q, 4) if num_q > 0 else 0,
84-91: Consider adding interactive visualizations for better insights.The current dashboard shows only tabular data. Consider enhancing it with interactive charts for better visualization of the metrics.
You could add bar charts or line charts to compare metrics across different benchmarks:
import plotly.express as px # After creating the DataFrame if not df.empty: # Create visualizations st.subheader("Metrics Visualization") fig = px.bar( df, x="file", y=["avg_EM", "avg_F1", "avg_correctness"], barmode="group", title="Average Metrics by File" ) st.plotly_chart(fig, use_container_width=True) # Original tabular display st.subheader("Results by benchmark") # Rest of the code...
11-11: Consider configuring volume size and persistence options.The current volume configuration doesn't specify size limits or persistence options, which might lead to unexpected behavior in production.
-metrics_volume = modal.Volume.from_name("evaluation_dashboard_results", create_if_missing=True) +# Configure volume with appropriate size and persistence settings +metrics_volume = modal.Volume.from_name( + "evaluation_dashboard_results", + create_if_missing=True, + size_mb=1024, # Adjust based on expected data volume + persistent=True # Ensure data persists between app deployments +)Note: Verify the Modal API documentation for the correct parameters as they may have changed since my last update.
41-41: Use subprocess.run instead of Popen for better process management.Using
subprocess.Popenwithout managing the process can lead to zombie processes if the parent process exits unexpectedly.- subprocess.Popen(cmd, shell=True) + # Using run() with shell=True is sufficient for this use case + # since Modal will manage the container lifecycle + subprocess.run(cmd, shell=True)
1-7: Organize imports according to PEP 8 conventions.The imports could be organized more clearly by grouping them into standard library imports, third-party imports, and local imports.
-import os -import json -import pandas as pd -import subprocess -import modal -import streamlit as st +# Standard library imports +import json +import os +import subprocess + +# Third-party imports +import modal +import pandas as pd +import streamlit as st
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/eval_framework/modal_eval_dashboard.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
- GitHub Check: End-to-End Tests / Deduplication Test
- GitHub Check: Basic Tests / Run Unit Tests
- GitHub Check: Publish Cognee Docker image
🔇 Additional comments (1)
cognee/eval_framework/modal_eval_dashboard.py (1)
36-40:Details
❓ Verification inconclusive
Security concern: CORS and XSRF protection disabled.
Your Streamlit configuration disables both CORS and XSRF protection, which could expose the dashboard to cross-site attacks. While this might be convenient for development, it's not recommended for production environments.
Consider enabling these protections if the dashboard will be accessible outside a secure network:
🏁 Script executed:
#!/bin/bash # Check if this is a development or staging environment grep -r "development\|staging" . --include="*.py" --include="*.yaml" --include="*.json" | head -n 10Length of output: 567
Security hardening: re-enable CORS and XSRF protection for production
Disabling both CORS and XSRF protection:
streamlit run /root/serve_dashboard.py \ --server.port 8000 \ --server.enableCORS=false \ --server.enableXsrfProtection=falsemay be fine for local development, but if this dashboard is ever exposed beyond a locked-down network, you’ll want to turn these protections back on.
• Confirm whether this service is strictly for local/dev use only or if it will run in staging/production.
• If it must remain open for external users, gate these flags behind an environment check (e.g.ENV != "production").
• Example pattern:- streamlit run /root/serve_dashboard.py \ - --server.enableCORS=false \ - --server.enableXsrfProtection=false + if [ "$ENV" = "production" ]; then + streamlit run /root/serve_dashboard.py \ + --server.enableCORS=true \ + --server.enableXsrfProtection=true + else + streamlit run /root/serve_dashboard.py \ + --server.enableCORS=false \ + --server.enableXsrfProtection=false + fiEnsure you document and enforce the correct
ENVvalue in your deployment pipeline.
| total_corr = sum(q["metrics"]["correctness"]["score"] for q in items) | ||
| records.append( | ||
| { | ||
| "file": parts[0].upper() + "_____" + parts[2], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent potential IndexError in filename parsing.
If parts has fewer than 3 elements, accessing parts[2] would raise an IndexError.
- "file": parts[0].upper() + "_____" + parts[2],
+ "file": parts[0].upper() + "_____" + (parts[2] if len(parts) > 2 else ""),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "file": parts[0].upper() + "_____" + parts[2], | |
| - "file": parts[0].upper() + "_____" + parts[2], | |
| + "file": parts[0].upper() + "_____" + (parts[2] if len(parts) > 2 else ""), |
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_eval_dashboard.py at line 70, the code accesses
parts[2] without checking if parts has at least 3 elements, which can cause an
IndexError. Add a condition to verify the length of parts before accessing
parts[2], and handle cases where parts has fewer than 3 elements safely, such as
by providing a default value or skipping the operation.
| total_em = sum(q["metrics"]["EM"]["score"] for q in items) | ||
| total_f1 = sum(q["metrics"]["f1"]["score"] for q in items) | ||
| total_corr = sum(q["metrics"]["correctness"]["score"] for q in items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add exception handling for JSON structure inconsistencies.
The code assumes a specific structure in the JSON files and would raise errors if the expected fields are missing.
- total_em = sum(q["metrics"]["EM"]["score"] for q in items)
- total_f1 = sum(q["metrics"]["f1"]["score"] for q in items)
- total_corr = sum(q["metrics"]["correctness"]["score"] for q in items)
+ try:
+ total_em = sum(q.get("metrics", {}).get("EM", {}).get("score", 0) for q in items)
+ total_f1 = sum(q.get("metrics", {}).get("f1", {}).get("score", 0) for q in items)
+ total_corr = sum(q.get("metrics", {}).get("correctness", {}).get("score", 0) for q in items)
+ except Exception as e:
+ st.warning(f"Error processing metrics in {filename}: {str(e)}")
+ continue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| total_em = sum(q["metrics"]["EM"]["score"] for q in items) | |
| total_f1 = sum(q["metrics"]["f1"]["score"] for q in items) | |
| total_corr = sum(q["metrics"]["correctness"]["score"] for q in items) | |
| try: | |
| total_em = sum(q.get("metrics", {}).get("EM", {}).get("score", 0) for q in items) | |
| total_f1 = sum(q.get("metrics", {}).get("f1", {}).get("score", 0) for q in items) | |
| total_corr = sum(q.get("metrics", {}).get("correctness", {}).get("score", 0) for q in items) | |
| except Exception as e: | |
| st.warning(f"Error processing metrics in {filename}: {str(e)}") | |
| continue |
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_eval_dashboard.py around lines 65 to 67, the code
sums scores assuming all JSON items have the expected nested keys, which can
cause errors if keys are missing. Add exception handling such as try-except
blocks or use dict.get() with default values to safely access nested keys and
handle missing fields gracefully without raising exceptions.
| if not filename.endswith(".json"): | ||
| continue | ||
| base = filename.rsplit(".", 1)[0] | ||
| parts = base.split("_") | ||
| benchmark = parts[1] if len(parts) >= 3 else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add robust error handling for filename parsing.
The current filename parsing logic assumes a specific format with underscores as separators. This could be fragile if filenames don't follow the expected pattern.
- base = filename.rsplit(".", 1)[0]
- parts = base.split("_")
- benchmark = parts[1] if len(parts) >= 3 else ""
+ try:
+ base = filename.rsplit(".", 1)[0]
+ parts = base.split("_")
+ benchmark = parts[1] if len(parts) >= 3 else ""
+ except (IndexError, ValueError) as e:
+ st.warning(f"Skipping file {filename} due to unexpected format: {str(e)}")
+ continue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not filename.endswith(".json"): | |
| continue | |
| base = filename.rsplit(".", 1)[0] | |
| parts = base.split("_") | |
| benchmark = parts[1] if len(parts) >= 3 else "" | |
| if not filename.endswith(".json"): | |
| continue | |
| try: | |
| base = filename.rsplit(".", 1)[0] | |
| parts = base.split("_") | |
| benchmark = parts[1] if len(parts) >= 3 else "" | |
| except (IndexError, ValueError) as e: | |
| st.warning(f"Skipping file {filename} due to unexpected format: {str(e)}") | |
| continue |
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_eval_dashboard.py around lines 55 to 59, the
filename parsing assumes filenames have at least three underscore-separated
parts, which can cause errors if the format is unexpected. Add robust error
handling by checking the length of parts before accessing parts[1], and handle
cases where the filename format is invalid, such as by skipping those files or
logging a warning, to prevent crashes or incorrect behavior.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.