fix: Mcp small updates #831
Conversation
Please make sure all the checkboxes are checked:
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the CodeRabbit configuration file. You can disable this status message in the CodeRabbit configuration.

Walkthrough

This update introduces a comprehensive set of changes across the Cognee codebase. Major highlights include the addition of an OpenAI-compatible responses API with function-calling support, a new Memgraph graph database adapter, and refactored MCP server tooling. The pipeline and dataset handling logic is enhanced with context passing, unique dataset ID generation, and improved pipeline status tracking. Numerous vector database adapters are updated for better error handling and type safety. The frontend receives updates to dataset and cognify workflows, and new example scripts and tests are added for various database backends. Several obsolete files, visualizations, and tests are removed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant FastAPI
    participant ResponsesRouter
    participant OpenAI Model
    participant DispatchFunction
    participant ToolHandler
    Client->>FastAPI: POST /api/v1/responses (input, tools, etc.)
    FastAPI->>ResponsesRouter: Handle request
    ResponsesRouter->>OpenAI Model: Call with input, tools
    OpenAI Model-->>ResponsesRouter: Response (may include tool calls)
    loop For each tool call in response
        ResponsesRouter->>DispatchFunction: dispatch_function(tool_call)
        DispatchFunction->>ToolHandler: handle_search/handle_cognify/handle_prune
        ToolHandler-->>DispatchFunction: Tool call result
        DispatchFunction-->>ResponsesRouter: Tool call output
    end
    ResponsesRouter-->>FastAPI: Structured response (id, model, tool_calls, usage)
    FastAPI-->>Client: API response
```
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant DatasetManager
    participant Pipeline
    User->>Frontend: Add data, select dataset, click "Cognify"
    Frontend->>API: Request to cognify dataset
    API->>DatasetManager: Find or create dataset (by name/user)
    DatasetManager->>Pipeline: Start cognify pipeline (with unique dataset ID)
    Pipeline-->>API: Pipeline status updated/logged
    API-->>Frontend: Cognify started/status
    Frontend-->>User: Show notification, update UI
```
✅ GitGuardian: There are no secrets present in this pull request anymore. If these secrets were true positives and are still valid, we highly recommend revoking them.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
🛑 Comments failed to post (49)
cognee/modules/retrieval/exceptions/__init__.py (1)
7-7: 💡 Verification agent
🧩 Analysis chain
Removed CollectionDistancesNotFoundError from imports.
The removal of this exception from imports suggests it's no longer needed in the codebase. Ensure all code that previously used this exception has been properly updated to handle these error cases differently.
The static analysis flags that imported exceptions are unused within this file. Consider adding them to `__all__` for proper re-export:

```diff
-from .exceptions import SearchTypeNotSupported, CypherSearchError
+from .exceptions import SearchTypeNotSupported, CypherSearchError
+
+__all__ = ["SearchTypeNotSupported", "CypherSearchError"]
```
🏁 Script executed:
```bash
#!/bin/bash
# Check if CollectionDistancesNotFoundError is still referenced anywhere in the codebase
echo "Checking for any remaining references to CollectionDistancesNotFoundError..."
rg -l "CollectionDistancesNotFoundError" --type py
```

Length of output: 213
To ensure the exception was actually removed and isn’t still defined elsewhere (which could break exports), let’s locate its definition:
🏁 Script executed:
```bash
#!/bin/bash
# Locate the definition of CollectionDistancesNotFoundError
echo "Searching for the definition of CollectionDistancesNotFoundError..."
rg -n "class CollectionDistancesNotFoundError" --type py
```

Length of output: 203
Confirm safe removal of CollectionDistancesNotFoundError and add `__all__` for exports
- No references or definitions of `CollectionDistancesNotFoundError` were found in the codebase—its removal is safe.
- To make your module's public API explicit, add an `__all__` declaration.

File: cognee/modules/retrieval/exceptions/__init__.py

```diff
 from .exceptions import SearchTypeNotSupported, CypherSearchError
+__all__ = ["SearchTypeNotSupported", "CypherSearchError"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
from .exceptions import SearchTypeNotSupported, CypherSearchError

__all__ = ["SearchTypeNotSupported", "CypherSearchError"]
```

🧰 Tools
🪛 Ruff (0.8.2)
7-7: `.exceptions.SearchTypeNotSupported` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

7-7: `.exceptions.CypherSearchError` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
🤖 Prompt for AI Agents
In cognee/modules/retrieval/exceptions/__init__.py at line 7, confirm that CollectionDistancesNotFoundError is fully removed since it is no longer referenced or defined anywhere in the codebase. Then, add an __all__ list to explicitly declare the public API by including the currently imported exceptions SearchTypeNotSupported and CypherSearchError to ensure proper re-export and clarity of the module's exports.

cognee/modules/settings/get_settings.py (1)
2-2: 💡 Verification agent
🧩 Analysis chain
Made LLM endpoint and API version optional.
Good improvement making the `endpoint` and `api_version` fields optional. This increases flexibility when integrating with different LLM backends that might not require these fields.

Let's verify that the code handling these fields properly handles the case when they're None:
Also applies to: 24-25
🏁 Script executed:
```bash
#!/bin/bash
# Check how these optional fields are used throughout the codebase
echo "Checking usage of llm_endpoint and llm_api_version..."
rg "llm_endpoint" --type py -A 2 -B 2
rg "llm_api_version" --type py -A 2 -B 2
```

Length of output: 5608
🏁 Script executed:
```bash
#!/bin/bash
# Display OpenAIAdapter and GenericAPIAdapter implementations
rg -n "class OpenAIAdapter" -A 20 -B 5 cognee/infrastructure/llm/get_llm_client.py
rg -n "class GenericAPIAdapter" -A 20 -B 5 cognee/infrastructure/llm/get_llm_client.py
```

Length of output: 173
The previous search didn't locate any class definitions—let's inspect the top of `get_llm_client.py` to see how adapters are defined or imported.
🏁 Script executed:
```bash
#!/bin/bash
# Show the first 100 lines of get_llm_client.py to locate adapter definitions
sed -n '1,100p' cognee/infrastructure/llm/get_llm_client.py
```

Length of output: 3260
Inspecting adapter constructors for optional `endpoint` and `api_version` handling:
🏁 Script executed:
```bash
#!/bin/bash
# Inspect adapter __init__ signatures and conditional logic for endpoint/api_version
for file in \
  cognee/infrastructure/llm/openai/adapter.py \
  cognee/infrastructure/llm/generic_llm_api/adapter.py \
  cognee/infrastructure/llm/gemini/adapter.py; do
  echo "==> $file"
  rg -n -A5 -B5 "class .+Adapter" "$file"
  rg -n -A5 -B5 "def __init__" "$file"
done
```

Length of output: 2815
Ensure adapter constructors properly accept and handle optional `endpoint` and `api_version`

While the config now allows `llm_endpoint` and `llm_api_version` to be omitted, the adapter constructors still require them as non-optional parameters. This will fail at runtime if either is `None`. Please update the signatures and guard their usage:

• cognee/infrastructure/llm/openai/adapter.py
  - Change

    ```python
    def __init__(self, api_key: str, endpoint: str, api_version: str, model: str, …):
    ```

    to

    ```python
    def __init__(self, api_key: str, endpoint: Optional[str] = None, api_version: Optional[str] = None, model: str, …):
    ```

  - Inside `__init__`, only pass `endpoint`/`api_version` into the OpenAI client if they're not `None`.

• cognee/infrastructure/llm/generic_llm_api/adapter.py
  - Change

    ```python
    def __init__(self, endpoint, api_key: str, model: str, name: str, max_tokens: int):
    ```

    to

    ```python
    def __init__(self, endpoint: Optional[str] = None, api_key: str, model: str, name: str, max_tokens: int):
    ```

  - Guard any requests that rely on `self.endpoint` being set.

• (Optionally) cognee/infrastructure/llm/get_llm_client.py
  - Before instantiating adapters, default missing fields or raise a clear error if a provider truly requires them.

These changes will ensure omitting `endpoint` or `api_version` does not cause unexpected crashes.
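A minimal sketch of the suggested pattern, assuming the adapter wraps a client that takes keyword arguments. Names and attributes here are illustrative, not the actual cognee signatures; the optional parameters are placed after the required ones so the signature stays valid Python, and the optional fields are only forwarded when set:

```python
from typing import Any, Dict, Optional


class OpenAIAdapterSketch:
    """Illustrative only: shows optional endpoint/api_version handling."""

    def __init__(
        self,
        api_key: str,
        model: str,
        endpoint: Optional[str] = None,
        api_version: Optional[str] = None,
    ) -> None:
        self.model = model

        # Only forward optional settings that are actually present, so backends
        # that do not need an endpoint or API version never receive None values.
        client_kwargs: Dict[str, Any] = {"api_key": api_key}
        if endpoint is not None:
            client_kwargs["base_url"] = endpoint
        if api_version is not None:
            client_kwargs["api_version"] = api_version

        # Pass client_kwargs to the real client constructor here.
        self._client_kwargs = client_kwargs
```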
🤖 Prompt for AI Agents

In cognee/infrastructure/llm/openai/adapter.py around the adapter __init__ method, update the endpoint and api_version parameters to be Optional[str] with default None, and modify the constructor logic to only pass these parameters to the OpenAI client if they are not None. Similarly, in cognee/infrastructure/llm/generic_llm_api/adapter.py, change the __init__ signature to make endpoint Optional[str] with default None and add guards around any usage of self.endpoint to handle the None case safely. Optionally, in cognee/infrastructure/llm/get_llm_client.py, add logic before adapter instantiation to provide default values or raise clear errors if required fields are missing. This will prevent runtime errors when these fields are omitted.

README.md (1)
137-137:
⚠️ Potential issue

Remove stray closing anchor tag.

The `<img>` element on line 137 ends with `</a>` without a matching opening `<a>` tag, causing malformed HTML in the README. Remove the extraneous `</a>` or wrap the image in a proper link if that was the intention.

🤖 Prompt for AI Agents
In README.md at line 137, there is a stray closing anchor tag </a> after the <img> element without a matching opening <a> tag, causing malformed HTML. Remove the extraneous </a> tag or, if the image should be a link, wrap the <img> element inside a proper <a> tag with a valid href attribute.

cognee/modules/pipelines/models/PipelineRun.py (1)
8-12:
⚠️ Potential issue

Add a database migration for the new enum value.

Introducing `DATASET_PROCESSING_INITIATED` to a PostgreSQL/SQLAlchemy `Enum` column requires an Alembic migration to update the existing type, e.g.:

```sql
ALTER TYPE pipeline_runstatus ADD VALUE 'DATASET_PROCESSING_INITIATED';
```

Without this migration, deployments will break due to an unknown enum value. Please add the corresponding migration script.
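A hedged sketch of what such an Alembic migration could look like; the revision identifiers are placeholders, and the enum type name is taken from the SQL above:

```python
"""Add DATASET_PROCESSING_INITIATED to the pipeline_runstatus enum."""

from alembic import op

# Revision identifiers (placeholders).
revision = "add_dataset_processing_initiated"
down_revision = "previous_revision_id"
branch_labels = None
depends_on = None


def upgrade() -> None:
    # ALTER TYPE ... ADD VALUE cannot run inside a transaction block on older
    # PostgreSQL versions, so execute it in an autocommit block.
    with op.get_context().autocommit_block():
        op.execute(
            "ALTER TYPE pipeline_runstatus "
            "ADD VALUE IF NOT EXISTS 'DATASET_PROCESSING_INITIATED'"
        )


def downgrade() -> None:
    # PostgreSQL cannot drop a value from an enum type; leave it in place.
    pass
```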
🤖 Prompt for AI Agents
In cognee/modules/pipelines/models/PipelineRun.py around lines 8 to 12, you added a new enum value DATASET_PROCESSING_INITIATED to the PipelineRunStatus enum. To prevent deployment failures, create an Alembic migration script that alters the corresponding PostgreSQL enum type by adding this new value using the SQL command ALTER TYPE pipeline_runstatus ADD VALUE 'DATASET_PROCESSING_INITIATED';. This migration must be applied before deploying the updated code.

cognee/api/v1/config/config.py (1)
158-158: 🛠️ Refactor suggestion
Inconsistent exception class usage.
The `set_graph_db_config` method now raises a native `AttributeError` while similar methods in this file (`set_llm_config`, `set_relational_db_config`, etc.) use the custom `InvalidAttributeError` class with the `message` parameter.

For consistency, consider modifying line 158 to match the pattern used in other methods:

```diff
- raise AttributeError(f"'{key}' is not a valid attribute of the config.")
+ raise InvalidAttributeError(
+     message=f"'{key}' is not a valid attribute of the config."
+ )
```

📝 Committable suggestion
```python
raise InvalidAttributeError(
    message=f"'{key}' is not a valid attribute of the config."
)
```

🤖 Prompt for AI Agents
In cognee/api/v1/config/config.py at line 158, replace the native AttributeError with the custom InvalidAttributeError class used in other similar methods. Change the raise statement to use InvalidAttributeError and pass the error message via the message parameter to maintain consistency across the file.

cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts (1)
3-3: 💡 Verification agent
🧩 Analysis chain
Parameter type change doesn't seem to be used in the function
The parameter type has been updated from `{ id: string }` to `{ name: string }`, but it's not clear how this parameter is used in the fetch request. There's no reference to `dataset.name` in the request URL or body.
🏁 Script executed:
```bash
#!/bin/bash
# Check if the visualization endpoint uses any dataset identifier in the backend
rg -A 10 -B 5 "/v1/visualize" --type py
```

Length of output: 981
🏁 Script executed:
```bash
#!/bin/bash
# Show the contents of getExplorationGraphUrl.ts
sed -n '1,200p' cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts

# Check if dataset.name is referenced anywhere in this file
rg "dataset\.name" cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts || echo "No occurrences of dataset.name"

# Find all call sites of getExplorationGraphUrl
rg -n "getExplorationGraphUrl" -t ts
```

Length of output: 1140
Include the dataset identifier in the fetch URL
The `dataset` parameter (now typed as `{ name: string }`) is never used—`fetch('/v1/visualize')` always hits the same endpoint regardless of which dataset is passed in. Either:

- Remove the unused `dataset` parameter, or
- Update the call to include `dataset.name` (or `dataset.id` if you revert to using the ID) in the URL or request body.

Locations to update:

- cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts
  - Signature and first `fetch` call
- Any related tests or call sites assuming that the dataset is scoped

Example fix:

```ts
export default function getExplorationGraphUrl(dataset: { name: string }) {
  const url = `/v1/visualize?datasetName=${encodeURIComponent(dataset.name)}`;
  return fetch(url)
  …
}
```

🤖 Prompt for AI Agents
In cognee-frontend/src/modules/exploration/getExplorationGraphUrl.ts at line 3, the dataset parameter typed as { name: string } is not used in the fetch request URL, causing the function to always call the same endpoint regardless of the dataset. To fix this, update the fetch URL to include the dataset.name as a query parameter, for example by constructing the URL as `/v1/visualize?datasetName=${encodeURIComponent(dataset.name)}` before calling fetch. Also, review and update any related tests or call sites to reflect this change in how the dataset is scoped.

cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)
3-3: 🛠️ Refactor suggestion
Add validation for missing dataset identifiers
The function now accepts optional `id` and `name` properties, using a fallback mechanism to support both identifier types. However, if neither property is present, this could lead to `undefined` being included in the datasets array.

```diff
-export default function cognifyDataset(dataset: { id?: string, name?: string }) {
+export default function cognifyDataset(dataset: { id?: string, name?: string }) {
+  if (!dataset.id && !dataset.name) {
+    return Promise.reject(new Error('Dataset must have either id or name property'));
+  }
+
   return fetch('/v1/cognify', {
     method: 'POST',
     headers: {
       'Content-Type': 'application/json',
     },
     body: JSON.stringify({
       datasets: [dataset.id || dataset.name],
     }),
   }).then((response) => response.json());
 }
```

Also applies to: 10-10
🤖 Prompt for AI Agents
In cognee-frontend/src/modules/datasets/cognifyDataset.ts at lines 3 and 10, the function accepts optional id and name properties but does not handle the case where both are missing, which can cause undefined values in the datasets array. Add validation to check if either id or name is present before including the dataset in the array, and handle the case where both are missing by either throwing an error or skipping the dataset to prevent undefined entries.

cognee/base_config.py (1)
5-11: 💡 Verification agent
🧩 Analysis chain
Updated to use new Observer enum
The default monitoring tool configuration now uses the new centralized `Observer` enum instead of the removed `MonitoringTool` enum. This is consistent with the observability refactoring.
🏁 Script executed:
```bash
#!/bin/bash
# Verify that all references to MonitoringTool have been updated to Observer
echo "Checking for any remaining references to MonitoringTool..."
rg -p "MonitoringTool" --type py

echo "\nChecking for imports of get_observe function..."
rg -p "from cognee.modules.observability.get_observe import get_observe" --type py
```

Length of output: 690
Ensure all observability references are updated to use the new `Observer` enum

The change in `cognee/base_config.py` to default to `Observer.LANGFUSE` is correct, but there are still lingering imports of the old `get_observe` function that need to be replaced with the centralized `Observer` enum interface.

Files requiring updates:
- cognee/api/v1/cognify/code_graph_pipeline.py (line 6)
- cognee/infrastructure/llm/openai/adapter.py (line 18)
- cognee/infrastructure/llm/gemini/adapter.py (line 7)
For each, replace the outdated import and usages, for example:
```diff
-from cognee.modules.observability.get_observe import get_observe
+from cognee.modules.observability.observers import Observer
```

Then update any calls like `get_observe(...)` to use the appropriate `Observer` enum member (e.g. `Observer.LANGFUSE`) or the new observer-instantiation API.

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/code_graph_pipeline.py at line 6, cognee/infrastructure/llm/openai/adapter.py at line 18, and cognee/infrastructure/llm/gemini/adapter.py at line 7, replace the outdated import of get_observe with an import of the new centralized Observer enum from cognee.modules.observability.observers. Then update all usages of get_observe(...) to use the appropriate Observer enum member directly, such as Observer.LANGFUSE, or adapt to the new observer-instantiation API as required by the refactoring.

cognee-frontend/src/app/page.tsx (1)
63-69: 🛠️ Refactor suggestion
Consider adding null check for dataset lookup
The function has a potential null reference exception. When accessing `dataset!.id` and `dataset!.name`, there's an assumption that `dataset` exists, but if `selectedDataset` doesn't match any dataset ID, this will cause a runtime error.

```diff
 const onCognify = useCallback(() => {
   const dataset = datasets.find((dataset) => dataset.id === selectedDataset);

+  if (!dataset) {
+    showNotification("No dataset selected for cognification.", 5000);
+    return;
+  }
+
   return onDatasetCognify({
-    id: dataset!.id,
-    name: dataset!.name,
+    id: dataset.id,
+    name: dataset.name,
   });
-}, [datasets, onDatasetCognify, selectedDataset]);
+}, [datasets, onDatasetCognify, selectedDataset, showNotification]);
```

📝 Committable suggestion
```ts
const onCognify = useCallback(() => {
  const dataset = datasets.find((dataset) => dataset.id === selectedDataset);

  if (!dataset) {
    showNotification("No dataset selected for cognification.", 5000);
    return;
  }

  return onDatasetCognify({
    id: dataset.id,
    name: dataset.name,
  });
}, [datasets, onDatasetCognify, selectedDataset, showNotification]);
```

🤖 Prompt for AI Agents
In cognee-frontend/src/app/page.tsx around lines 63 to 69, the onCognify function assumes the dataset exists without checking, which can cause a runtime error if no matching dataset is found. Add a null check after finding the dataset to verify it is not undefined before accessing its properties. If the dataset is null, handle the case gracefully, such as returning early or showing an error, to prevent exceptions.

entrypoint.sh (1)
36-37: 💡 Verification agent
🧩 Analysis chain
Removal of `exec` from process launching commands

Removing `exec` from these commands means the shell script process continues running after starting the server. In Docker environments, this can affect signal handling as the shell process (not Gunicorn) becomes PID 1.

Verify this change won't affect how the container handles termination signals:
Also applies to: 38-39, 41-42
🏁 Script executed:
```bash
#!/bin/bash
# Check if this is intended to run in Docker/container environment
grep -r "ENTRYPOINT" --include="Dockerfile*" .
```

Length of output: 203
Ensure proper signal handling by restoring `exec` in the entrypoint

Without `exec`, the shell remains PID 1 in your Docker containers and won't forward SIGTERM/SIGINT to Gunicorn/Uvicorn. Add `exec` back before your debugpy/Gunicorn launch commands:

Locations to update in entrypoint.sh:
- Lines 36–37
- Lines 38–39
- Lines 41–42
Suggested diff for each block (example for lines 36–37):
```diff
- debugpy --wait-for-client --listen 0.0.0.0:5678 -m gunicorn \
-     -w 3 -k uvicorn.workers.UvicornWorker -t 30000 \
-     --bind=0.0.0.0:8000 --log-level debug --reload cognee.api.client:app
+ exec debugpy --wait-for-client --listen 0.0.0.0:5678 -m gunicorn \
+     -w 3 -k uvicorn.workers.UvicornWorker -t 30000 \
+     --bind=0.0.0.0:8000 --log-level debug --reload cognee.api.client:app
```

Apply the same prefix in the other invocation blocks so that Gunicorn/Uvicorn replaces the shell process and receives container signals directly.
🤖 Prompt for AI Agents
In entrypoint.sh around lines 36 to 37, the launch command for debugpy and Gunicorn is missing the exec prefix, causing the shell script to remain as PID 1 and not forward termination signals properly in Docker. To fix this, add exec before the debugpy command so that Gunicorn/Uvicorn replaces the shell process and receives signals directly. Apply the same exec prefix to the other similar command blocks at lines 38-39 and 41-42.

cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (1)
36-36: 💡 Verification agent
🧩 Analysis chain
Rename component to avoid shadowing global DataView interface
This component shadows the global `DataView` interface, which could lead to confusion.
🏁 Script executed:
```bash
#!/bin/bash
# Check for DataView in browser globals
cat << EOF | node -
console.log("Global DataView exists:", typeof DataView !== 'undefined');
console.log("DataView description:", DataView ? DataView.toString() : "not available");
EOF
```

Length of output: 103
Rename `DataView` component to avoid shadowing the global `DataView`

The browser provides a native `DataView` constructor (seen via `typeof DataView === 'function'`), so keeping the same name for your React component can lead to confusion or unintentional references.

Please update the component and its file (and all imports/usages) accordingly. For example:
- Rename in cognee-frontend/src/modules/ingestion/DataView/DataView.tsx:

  ```diff
  - export default function DataView(props: DataViewProps) {
  + export default function DataViewComponent(props: DataViewProps) {
      // …
    }
  ```

- Rename the file to DataViewComponent.tsx (or a preferred alternative)
- Update all imports/exports/JSX tags to match the new name
📝 Committable suggestion
```diff
-export default function DataView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
+export default function DataViewComponent({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
```

🧰 Tools
🪛 Biome (1.9.4)
[error] 36-36: Do not shadow the global "DataView" property.
Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.
(lint/suspicious/noShadowRestrictedNames)
🤖 Prompt for AI Agents
In cognee-frontend/src/modules/ingestion/DataView/DataView.tsx at line 36, rename the React component from DataView to a different name such as DataViewComponent to avoid shadowing the global DataView interface. Also rename the file to DataViewComponent.tsx and update all imports, exports, and JSX usages throughout the codebase to use the new component name consistently.

examples/database_examples/neo4j_example.py (4)
62-64: 🛠️ Refactor suggestion
Add error handling for asynchronous operations
The script lacks error handling for asynchronous operations, which could lead to unhandled exceptions and unclear error messages when something goes wrong.
```diff
- # Process the added document to extract knowledge
- await cognee.cognify([dataset_name])
+ # Process the added document to extract knowledge
+ try:
+     await cognee.cognify([dataset_name])
+ except Exception as e:
+     print(f"Error processing dataset: {e}")
+     raise
```

📝 Committable suggestion

```python
# Process the added document to extract knowledge
try:
    await cognee.cognify([dataset_name])
except Exception as e:
    print(f"Error processing dataset: {e}")
    raise
```

🤖 Prompt for AI Agents
In examples/database_examples/neo4j_example.py around lines 62 to 64, the asynchronous call to cognee.cognify lacks error handling, which may cause unhandled exceptions. Wrap the await cognee.cognify([dataset_name]) call in a try-except block to catch potential exceptions, and log or handle the errors appropriately to ensure clear error messages and prevent the script from crashing unexpectedly.
67-70: 🛠️ Refactor suggestion
Add error handling for search operations
Search operations may fail due to various reasons (like connectivity issues or malformed queries) but lack proper error handling.
```diff
- insights_results = await cognee.search(query_type=SearchType.INSIGHTS, query_text="Neo4j")
- print("\nInsights about Neo4j:")
- for result in insights_results:
-     print(f"- {result}")
+ try:
+     insights_results = await cognee.search(query_type=SearchType.INSIGHTS, query_text="Neo4j")
+     print("\nInsights about Neo4j:")
+     for result in insights_results:
+         print(f"- {result}")
+ except Exception as e:
+     print(f"Error searching for insights: {e}")
```

📝 Committable suggestion

```python
try:
    insights_results = await cognee.search(query_type=SearchType.INSIGHTS, query_text="Neo4j")
    print("\nInsights about Neo4j:")
    for result in insights_results:
        print(f"- {result}")
except Exception as e:
    print(f"Error searching for insights: {e}")
```

🤖 Prompt for AI Agents
In examples/database_examples/neo4j_example.py around lines 67 to 70, the search operation lacks error handling which can cause the program to crash on failures like connectivity issues or malformed queries. Wrap the search call and subsequent processing in a try-except block to catch exceptions, log or print an appropriate error message, and prevent the program from crashing unexpectedly.
44-46: 🛠️ Refactor suggestion
Make data pruning truly optional
These operations are marked as "optional" in the comment but are always executed, which could lead to unexpected data loss if users run the example against a database containing important data.
```diff
- # Clean any existing data (optional)
- await cognee.prune.prune_data()
- await cognee.prune.prune_system(metadata=True)
+ # Clean any existing data (optional)
+ # Set to True to clean existing data before running the example
+ should_clean_data = False
+ if should_clean_data:
+     await cognee.prune.prune_data()
+     await cognee.prune.prune_system(metadata=True)
```

📝 Committable suggestion

```python
# Clean any existing data (optional)
# Set to True to clean existing data before running the example
should_clean_data = False
if should_clean_data:
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
```

🤖 Prompt for AI Agents
In examples/database_examples/neo4j_example.py around lines 44 to 46, the data pruning calls are always executed despite being labeled as optional, risking unintended data loss. Modify the code to execute these pruning operations only if a specific flag or parameter is set, such as adding a conditional check or a command-line argument that controls whether pruning should occur, ensuring the example does not delete data unless explicitly requested by the user.
20-23: 🛠️ Refactor suggestion
Consider validating environment variables
The script retrieves Neo4j credentials from environment variables without validating their existence. If these variables are missing, the database connection will fail later in execution.
```diff
- neo4j_url = os.getenv("GRAPH_DATABASE_URL")
- neo4j_user = os.getenv("GRAPH_DATABASE_USERNAME")
- neo4j_pass = os.getenv("GRAPH_DATABASE_PASSWORD")
+ neo4j_url = os.getenv("GRAPH_DATABASE_URL")
+ neo4j_user = os.getenv("GRAPH_DATABASE_USERNAME")
+ neo4j_pass = os.getenv("GRAPH_DATABASE_PASSWORD")
+
+ # Validate required environment variables
+ if not all([neo4j_url, neo4j_user, neo4j_pass]):
+     missing = [var for var, val in
+                zip(["GRAPH_DATABASE_URL", "GRAPH_DATABASE_USERNAME", "GRAPH_DATABASE_PASSWORD"],
+                    [neo4j_url, neo4j_user, neo4j_pass]) if not val]
+     raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
```

📝 Committable suggestion

```python
# Set up Neo4j credentials in .env file and get the values from environment variables
neo4j_url = os.getenv("GRAPH_DATABASE_URL")
neo4j_user = os.getenv("GRAPH_DATABASE_USERNAME")
neo4j_pass = os.getenv("GRAPH_DATABASE_PASSWORD")

# Validate required environment variables
if not all([neo4j_url, neo4j_user, neo4j_pass]):
    missing = [var for var, val in zip(
        ["GRAPH_DATABASE_URL", "GRAPH_DATABASE_USERNAME", "GRAPH_DATABASE_PASSWORD"],
        [neo4j_url, neo4j_user, neo4j_pass]
    ) if not val]
    raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
```

🤖 Prompt for AI Agents
In examples/database_examples/neo4j_example.py around lines 20 to 23, the Neo4j credentials are fetched from environment variables without checking if they are set. Add validation to verify that neo4j_url, neo4j_user, and neo4j_pass are not None or empty after retrieval. If any are missing, raise an informative error or exit gracefully to prevent connection failures later in the script.

cognee/tasks/temporal_awareness/index_graphiti_objects.py (2)
34-34: 💡 Verification agent
🧩 Analysis chain
Updated method for retrieving graph data
The function now uses `get_graph_data()` instead of `get_model_independent_graph_data()`. This change simplifies the API, but ensure all callers are updated accordingly.
🏁 Script executed:
```bash
#!/bin/bash
# Check for any remaining calls to get_model_independent_graph_data()
rg "get_model_independent_graph_data\(\)" --type py
```

Length of output: 311
Action Required: Update remaining
`get_model_independent_graph_data()` calls

The search shows two adapters still invoking the old method. Please update them to use `get_graph_data()` for consistency:
- cognee/infrastructure/databases/graph/neo4j_driver/adapter.py
- cognee/infrastructure/databases/graph/kuzu/adapter.py
Suggested change in each file:
```diff
- nodes, edges = await self.get_model_independent_graph_data()
+ nodes, edges = await self.get_graph_data()
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/neo4j_driver/adapter.py and cognee/infrastructure/databases/graph/kuzu/adapter.py, locate all calls to get_model_independent_graph_data() and replace them with get_graph_data() to maintain consistency with the updated API. Ensure that any parameters or usage align with the new method's signature and behavior.
64-65: 💡 Verification agent
🧩 Analysis chain
Updated edge data structure access
The edge data structure access has been updated to use a different index for retrieving the edge relationship name.
🏁 Script executed:
```bash
#!/bin/bash
# Check the structure of edges_data to verify the correct index is being used
# Look at where get_graph_data is defined to understand the returned data structure
rg -A 5 "def get_graph_data" --type py
```

Length of output: 3583
🏁 Script executed:
```bash
#!/bin/bash
# Locate and display the code around the edge indexing in the target file
FILE_PATH=$(fd -I -t f -p "index_graphiti_objects.py")
echo "File path: $FILE_PATH"
sed -n '1,200p' "$FILE_PATH"
```

Length of output: 3898
Fix edge relationship_name extraction for NetworkX adapter
The current code unconditionally uses `edge[2]` to get the relationship name, which only works for DB adapters (Kuzu/Neo4j/Memgraph) where the tuple is `(src, tgt, relationship_name, props)`. NetworkX's `edges(data=True, keys=True)` yields `(src, tgt, key, props)`, so `edge[2]` is the internal key, not the actual `relationship_name` stored in `props`.

Please update in cognee/tasks/temporal_awareness/index_graphiti_objects.py (around lines 64–65):

- Replace:

  ```python
  edge_types = Counter(
      edge[2]  # The edge key (relationship name) is at index 2
      for edge in edges_data
  )
  ```

- With something like:

  ```python
  edge_types = Counter(
      # DB adapters return a string at edge[2], otherwise fall back to props['relationship_name']
      edge[2] if isinstance(edge[2], str) else edge[3]["relationship_name"]
      for edge in edges_data
  )
  ```

This ensures both tuple shapes yield the correct `relationship_name`.

🤖 Prompt for AI Agents
In cognee/tasks/temporal_awareness/index_graphiti_objects.py around lines 64 to 65, the code currently extracts the edge relationship name using edge[2], which works only for DB adapters but not for NetworkX where edge[2] is an internal key and the actual relationship name is in edge[3]["relationship_name"]. Update the code to check if edge[2] is a string and use it directly; otherwise, extract the relationship name from edge[3]["relationship_name"]. This will handle both tuple formats correctly.

examples/database_examples/weaviate_example.py (3)
42-44: 🛠️ Refactor suggestion
Make data pruning truly optional
These operations are marked as "optional" in the comment but are always executed, which could lead to unexpected data loss if users run the example against a database containing important data.
- # Clean any existing data (optional) - await cognee.prune.prune_data() - await cognee.prune.prune_system(metadata=True) + # Clean any existing data (optional) + # Set to True to clean existing data before running the example + should_clean_data = False + if should_clean_data: + await cognee.prune.prune_data() + await cognee.prune.prune_system(metadata=True)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.# Clean any existing data (optional) # Set to True to clean existing data before running the example should_clean_data = False if should_clean_data: await cognee.prune.prune_data() await cognee.prune.prune_system(metadata=True)🤖 Prompt for AI Agents
In examples/database_examples/weaviate_example.py around lines 42 to 44, the data pruning calls are always executed despite being marked as optional, risking unintended data loss. Modify the code to conditionally execute these pruning operations based on a user-configurable flag or parameter, such as a boolean variable named 'enable_pruning'. This way, pruning only occurs when explicitly enabled, preserving existing data by default.
20-23: 🛠️ Refactor suggestion
Consider validating environment variables
The script retrieves Weaviate credentials from environment variables without validating their existence. If these variables are missing, the database connection will fail later.
- weaviate_url = os.getenv("VECTOR_DB_URL") - weaviate_key = os.getenv("VECTOR_DB_KEY") + weaviate_url = os.getenv("VECTOR_DB_URL") + weaviate_key = os.getenv("VECTOR_DB_KEY") + + # Validate required environment variables + if not weaviate_url: + raise ValueError("Missing required environment variable: VECTOR_DB_URL")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.# Set up Weaviate credentials in .env file and get the values from environment variables weaviate_url = os.getenv("VECTOR_DB_URL") weaviate_key = os.getenv("VECTOR_DB_KEY") # Validate required environment variables if not weaviate_url: raise ValueError("Missing required environment variable: VECTOR_DB_URL")🤖 Prompt for AI Agents
In examples/database_examples/weaviate_example.py around lines 20 to 23, the environment variables VECTOR_DB_URL and VECTOR_DB_KEY are retrieved without validation. Add checks to verify these variables are not None or empty after retrieval, and raise a clear error or exit with a message if they are missing to prevent connection failures later.
57-84: 🛠️ Refactor suggestion
Add error handling for core operations
The script lacks error handling for asynchronous operations (cognify and search), which could lead to unclear failure modes.
Add try-except blocks around key operations:
- # Process the added document to extract knowledge - await cognee.cognify([dataset_name]) + # Process the added document to extract knowledge + try: + await cognee.cognify([dataset_name]) + except Exception as e: + print(f"Error processing dataset: {e}") + raiseSimilar error handling should be added to each search operation.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.# Add the sample text to the dataset await cognee.add([sample_text], dataset_name) # Process the added document to extract knowledge try: await cognee.cognify([dataset_name]) except Exception as e: print(f"Error processing dataset: {e}") raise # Now let's perform some searches # 1. Search for insights related to "Weaviate" insights_results = await cognee.search(query_type=SearchType.INSIGHTS, query_text="Weaviate") print("\nInsights about Weaviate:") for result in insights_results: print(f"- {result}") # 2. Search for text chunks related to "vector search" chunks_results = await cognee.search( query_type=SearchType.CHUNKS, query_text="vector search", datasets=[dataset_name] ) print("\nChunks about vector search:") for result in chunks_results: print(f"- {result}") # 3. Get graph completion related to databases graph_completion_results = await cognee.search( query_type=SearchType.GRAPH_COMPLETION, query_text="database" ) print("\nGraph completion for databases:") for result in graph_completion_results: print(f"- {result}")🤖 Prompt for AI Agents
In examples/database_examples/weaviate_example.py between lines 57 and 84, the asynchronous calls to cognee.cognify and cognee.search lack error handling, which can cause unclear failures. Wrap each await call to cognee.cognify and each cognee.search invocation in try-except blocks to catch exceptions, log or print meaningful error messages, and prevent the script from crashing silently.

examples/database_examples/chromadb_example.py (2)
37-39: 🛠️ Refactor suggestion
Make data pruning truly optional
These operations are marked as "optional" in the comment but are always executed, which could lead to unexpected data loss if users run the example against a database containing important data.
- # Clean any existing data (optional) - await cognee.prune.prune_data() - await cognee.prune.prune_system(metadata=True) + # Clean any existing data (optional) + # Set to True to clean existing data before running the example + should_clean_data = False + if should_clean_data: + await cognee.prune.prune_data() + await cognee.prune.prune_system(metadata=True)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.# Clean any existing data (optional) # Set to True to clean existing data before running the example should_clean_data = False if should_clean_data: await cognee.prune.prune_data() await cognee.prune.prune_system(metadata=True)🤖 Prompt for AI Agents
In examples/database_examples/chromadb_example.py around lines 37 to 39, the data pruning calls are always executed despite being marked as optional, risking unintended data loss. Modify the code to conditionally execute these pruning operations based on a user-configurable flag or parameter, so pruning only occurs when explicitly requested. Update the comment to clarify this behavior.
52-79: 🛠️ Refactor suggestion
Add error handling for core operations
The script lacks error handling for asynchronous operations (add, cognify, and search), which could lead to unclear failure modes and difficult debugging.
- # Add the sample text to the dataset - await cognee.add([sample_text], dataset_name) + # Add the sample text to the dataset + try: + await cognee.add([sample_text], dataset_name) + except Exception as e: + print(f"Error adding sample text: {e}") + raiseSimilar error handling should be added to the cognify and search operations.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.# Add the sample text to the dataset try: await cognee.add([sample_text], dataset_name) except Exception as e: print(f"Error adding sample text: {e}") raise # Process the added document to extract knowledge await cognee.cognify([dataset_name]) # Now let's perform some searches # 1. Search for insights related to "ChromaDB" insights_results = await cognee.search(query_type=SearchType.INSIGHTS, query_text="ChromaDB") print("\nInsights about ChromaDB:") for result in insights_results: print(f"- {result}") # 2. Search for text chunks related to "vector search" chunks_results = await cognee.search( query_type=SearchType.CHUNKS, query_text="vector search", datasets=[dataset_name] ) print("\nChunks about vector search:") for result in chunks_results: print(f"- {result}") # 3. Get graph completion related to databases graph_completion_results = await cognee.search( query_type=SearchType.GRAPH_COMPLETION, query_text="database" ) print("\nGraph completion for databases:") for result in graph_completion_results: print(f"- {result}")🤖 Prompt for AI Agents
In examples/database_examples/chromadb_example.py around lines 52 to 79, the asynchronous operations add, cognify, and search lack error handling, which can cause unclear failures. Wrap each await call for add, cognify, and each search operation in try-except blocks to catch exceptions. Log or print meaningful error messages in the except blocks to aid debugging and ensure the script handles failures gracefully.

examples/database_examples/milvus_example.py (1)
29-37: 💡 Verification agent
🧩 Analysis chain
Verify that a local file-path is really accepted by the Milvus adapter
vector_db_urlis set to<…>/milvus.db, but Milvus typically expects an HTTP / gRPC endpoint (tcp://host:port). If the Cognee Milvus adapter falls back to “local-disk SQlite” for demo purposes that’s fine; otherwise this will fail at connection time.Run:
…to confirm that a path value is supported. Update the example or add an explanatory comment if the adapter really needs a network URL.
🏁 Script executed:
#!/bin/bash # Search whether the Milvus adapter treats file paths specially rg -n '"milvus"' --type py | rg 'vector_db_url'Length of output: 47
🏁 Script executed:
#!/bin/bash # Locate the setter for vector DB configuration rg -n 'def set_vector_db_config' --type py # Find all occurrences of vector_db_url in Python source rg -n 'vector_db_url' --type py # Find all occurrences of vector_db_provider in Python source rg -n 'vector_db_provider' --type py # Search for any Milvus adapter class definitions rg -n 'class .*Milvus' --type py # Case-insensitive search for "milvus" to catch any usage rg -n -i 'milvus' --type pyLength of output: 13041
🏁 Script executed:
#!/bin/bash # Inspect MilvusAdapter implementation around initialization and client creation sed -n '1,200p' cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py # Search for any fallback logic for local file paths or SQLite usage rg -n 'sqlite' -n 'file' cognee/infrastructure/databases/vector/milvus/MilvusAdapter.pyLength of output: 7218
Clarify Milvus URL Format for
vector_db_urlThe Milvus adapter passes
vector_db_urldirectly topymilvus.MilvusClient(uri=…)and does not fall back to a local SQLite file. Using a filesystem path will result in a connection error at runtime. Please update the example to use a proper Milvus endpoint (e.g. gRPC URL) or document that this must be replaced.• File:
examples/database_examples/milvus_example.py
Line 34:- "vector_db_url": local_milvus_db_path, # Enter Milvus Endpoint if exist + "vector_db_url": "tcp://localhost:19530", # Milvus gRPC endpoint (replace as needed)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.local_milvus_db_path = os.path.join(cognee_directory_path, "databases", "milvus.db") # Configure Milvus as the vector database provider cognee.config.set_vector_db_config( { - "vector_db_url": local_milvus_db_path, # Enter Milvus Endpoint if exist + "vector_db_url": "tcp://localhost:19530", # Milvus gRPC endpoint (replace as needed) "vector_db_key": "", # Enter Token "vector_db_provider": "milvus", # Specify Milvus as provider }🤖 Prompt for AI Agents
In examples/database_examples/milvus_example.py around lines 29 to 37, the vector_db_url is set to a local file path which is incorrect because the Milvus adapter expects a network endpoint URL like a gRPC address. Update the example to use a proper Milvus endpoint URL (e.g., tcp://host:port) instead of a file path, or add a clear comment explaining that users must replace the placeholder with their actual Milvus server URL to avoid connection errors.

cognee/tasks/ingestion/migrate_relational_database.py (2)
110-118: 🛠️ Refactor suggestion
Potential duplicate `ColumnValue` nodes across rows

`column_node_id = f"{table_name}:{key}:{value}"` is identical for every row that shares the same value, so multiple rows referencing `"status:active"` will clash.
Either:
- include the row’s PK in the ID, or
- de-duplicate intentionally and add many-to-one edges.
Clarify the intended semantics and update the ID strategy accordingly; a sketch of both options follows below.
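A small sketch of the two options, using illustrative names rather than the actual helpers in migrate_relational_database.py:

```python
def column_value_node_id(table_name: str, key: str, value, row_pk=None) -> str:
    """Build an ID for a ColumnValue node.

    With row_pk supplied, every row gets its own node (option 1). Without it,
    rows sharing the same value intentionally collapse onto one shared node,
    which should then be linked from each row with a many-to-one edge (option 2).
    """
    if row_pk is not None:
        return f"{table_name}:{row_pk}:{key}:{value}"
    return f"{table_name}:{key}:{value}"
```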
🤖 Prompt for AI Agents
In cognee/tasks/ingestion/migrate_relational_database.py around lines 110 to 118, the current ID generation for ColumnValue nodes uses only table name, column key, and value, causing collisions for identical values across different rows. To fix this, modify the column_node_id to include the row's primary key to ensure uniqueness per row, or if intentional deduplication is desired, implement logic to create many-to-one edges from rows to shared ColumnValue nodes. Clarify the intended behavior and update the ID generation and node mapping accordingly.
98-109:
⚠️ Potential issue

Logical bugs in column-migration filter

- `if key is primary_key_col` uses the `is` operator, which checks identity, not equality.
- The foreign-key exclusion list appends `fk["ref_column"]`, but the actual FK column in this table is `fk["column"]`.

As written, FK columns are still turned into `ColumnValue` nodes, while unrelated columns in the referenced table are skipped.

```diff
-foreign_keys.append(fk["ref_column"])
+foreign_keys.append(fk["column"])

-if key is primary_key_col or key in foreign_keys:
+if key == primary_key_col or key in foreign_keys:
```

These fixes prevent spurious `ColumnValue` nodes and ensure correct filtering.

📝 Committable suggestion
```diff
 # Migrate data stored in columns of table rows
 if migrate_column_data:
     # Get foreign key columns to filter them out from column migration
     foreign_keys = []
     for fk in details.get("foreign_keys", []):
-        foreign_keys.append(fk["ref_column"])
+        foreign_keys.append(fk["column"])

     for key, value in row_properties.items():
         # Skip mapping primary key information to itself and mapping of foreign key information (as it will be mapped bellow)
-        if key is primary_key_col or key in foreign_keys:
+        if key == primary_key_col or key in foreign_keys:
             continue
```

🤖 Prompt for AI Agents
In cognee/tasks/ingestion/migrate_relational_database.py around lines 98 to 109, replace the identity check 'if key is primary_key_col' with an equality check 'if key == primary_key_col' to correctly compare column names. Also, change the foreign key columns list to append 'fk["column"]' instead of 'fk["ref_column"]' so that the filter excludes the actual foreign key columns in the current table, preventing incorrect inclusion of referenced table columns.

cognee/api/v1/responses/routers/default_tools.py (1)
16-23:
⚠️ Potential issue
`search_type` enum is out of sync with `SearchType` – will cause validation errors

`SEMANTIC` is not in `cognee.modules.search.types.SearchType`, while valid options such as `CHUNKS`, `RAG_COMPLETION`, `GRAPH_SUMMARY_COMPLETION`, and `CYPHER` are missing. Down-stream validation (e.g., Pydantic/OpenAPI) will reject legitimate calls.

```diff
-"INSIGHTS", "CODE", "GRAPH_COMPLETION", "SEMANTIC", "NATURAL_LANGUAGE"
+"INSIGHTS", "CHUNKS", "CODE", "GRAPH_COMPLETION", "GRAPH_SUMMARY_COMPLETION",
+"RAG_COMPLETION", "CYPHER", "NATURAL_LANGUAGE"
```

Synchronise the list or, better, import the enum values programmatically to avoid future drift.
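One way to do that, sketched with a stand-in enum (in cognee the import would presumably be `from cognee.modules.search.types import SearchType`; the members listed here are only those mentioned in this review, not necessarily the full set):

```python
from enum import Enum


class SearchType(str, Enum):
    # Stand-in for cognee.modules.search.types.SearchType.
    INSIGHTS = "INSIGHTS"
    CHUNKS = "CHUNKS"
    CODE = "CODE"
    GRAPH_COMPLETION = "GRAPH_COMPLETION"
    GRAPH_SUMMARY_COMPLETION = "GRAPH_SUMMARY_COMPLETION"
    RAG_COMPLETION = "RAG_COMPLETION"
    CYPHER = "CYPHER"
    NATURAL_LANGUAGE = "NATURAL_LANGUAGE"


# Derive the tool schema's enum from the source of truth so the two cannot drift.
SEARCH_TYPE_VALUES = [member.value for member in SearchType]

SEARCH_TOOL_PARAMETERS = {
    "type": "object",
    "properties": {
        "search_type": {"type": "string", "enum": SEARCH_TYPE_VALUES},
    },
}
```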
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/default_tools.py around lines 16 to 23, the "enum" list for "search_type" is not synchronized with the SearchType enum in cognee.modules.search.types. Replace the hardcoded enum list with a dynamic import of the SearchType enum values to ensure they always match and prevent validation errors. This can be done by importing SearchType and programmatically extracting its members for the enum definition.

cognee/api/v1/responses/routers/get_responses_router.py (6)
140-144:
⚠️ Potential issue

Token usage mapping mismatch

The returned dict uses `prompt_tokens`/`completion_tokens`. Your conversion expects `input_tokens`/`output_tokens`. Adjust field names to avoid zeroed statistics.

```diff
- prompt_tokens=usage.get("input_tokens", 0),
- completion_tokens=usage.get("output_tokens", 0),
+ prompt_tokens=usage.get("prompt_tokens", 0),
+ completion_tokens=usage.get("completion_tokens", 0),
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.usage=ChatUsage( prompt_tokens=usage.get("prompt_tokens", 0), completion_tokens=usage.get("completion_tokens", 0), total_tokens=usage.get("total_tokens", 0), ),🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 140 to 144, the code incorrectly maps usage dictionary keys 'input_tokens' and 'output_tokens' to ChatUsage fields prompt_tokens and completion_tokens. Update the keys to 'prompt_tokens' and 'completion_tokens' respectively to match the returned dictionary and avoid zeroed token statistics.
46-49:
⚠️ Potential issue

Mutable default argument – potential cross-request bleed-through

`tools: Optional[List[Dict[str, Any]]] = DEFAULT_TOOLS` stores a single list instance that will be shared and possibly mutated across requests. Default to `None` and assign inside the function:

```diff
- tools: Optional[List[Dict[str, Any]]] = DEFAULT_TOOLS,
+ tools: Optional[List[Dict[str, Any]]] = None,
  ...
- tools = tools or DEFAULT_TOOLS
+ tools = tools or DEFAULT_TOOLS
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.async def get_responses_router( obj_id: str, query: str, - tools: Optional[List[Dict[str, Any]]] = DEFAULT_TOOLS, + tools: Optional[List[Dict[str, Any]]] = None, tool_choice: Any = "auto", temperature: float = 1.0, ) -> Dict[str, Any]: tools = tools or DEFAULT_TOOLS ...🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 46 to 49, the function parameter 'tools' uses a mutable default argument DEFAULT_TOOLS, which can cause shared state issues across requests. Change the default value of 'tools' to None, then inside the function check if 'tools' is None and if so, assign it to DEFAULT_TOOLS. This prevents unintended mutations of the default list across different calls.
54-56:
⚠️ Potential issue

Model name is being silently overwritten

`model = "gpt-4o"` ignores the caller's requested model. Either honour `request.model` or reject unsupported models with a 400.

```diff
- model = "gpt-4o"
+ if model not in {"gpt-4o"}:  # extend as back-ends grow
+     raise HTTPException(status_code=400, detail=f"Unsupported model: {model}")
```

Silently switching models is surprising for API consumers.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.# TODO: Support other models (e.g. cognee-v1-openai-gpt-3.5-turbo, etc.) if model not in {"gpt-4o"}: # extend as back-ends grow raise HTTPException(status_code=400, detail=f"Unsupported model: {model}")🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 54 to 56, the code overwrites the model variable with a hardcoded value "gpt-4o", ignoring the model requested by the caller. To fix this, modify the code to check if the requested model in request.model is supported; if it is, use it, otherwise return a 400 error response rejecting unsupported models. This ensures the API respects the caller's input and avoids silently switching models.
91-109:
⚠️ Potential issueParsing tool calls from the wrong field
The OpenAI Responses API places tool/function calls in `tool_calls`, not in the top-level `output` list. This loop will never run in production.

```diff
-    output = response.get("output", [])
+    output = response.get("tool_calls", [])
```

Verify against the `openai>=1.24.0` docs.

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    # Use the response ID from the API or generate a new one
    response_id = response.get("id", f"resp_{uuid.uuid4().hex}")

    # Check if there are function tool calls in the output
    output = response.get("tool_calls", [])
    processed_tool_calls = []

    # Process any function tool calls from the output
    for item in output:
        if isinstance(item, dict) and item.get("type") == "function_call":
            # This is a tool call from the new format
            function_name = item.get("name", "")
            arguments_str = item.get("arguments", "{}")
            call_id = item.get("call_id", f"call_{uuid.uuid4().hex}")

            # Create a format the dispatcher can handle
            tool_call = {
                "id": call_id,
```
In cognee/api/v1/responses/routers/get_responses_router.py around lines 91 to 109, the code incorrectly attempts to parse function tool calls from the "output" field, but according to openai>=1.24.0 docs, tool calls are located in the "tool_calls" field. Update the code to extract and iterate over the "tool_calls" field from the response instead of "output" to correctly process function tool calls.
36-42: 🛠️ Refactor suggestion
Cache the OpenAI client instead of rebuilding it per request
`_get_model_client()` instantiates `openai.AsyncOpenAI` for every incoming call.
Because the client object is stateless and thread-safe, create it once (e.g. at module import or via `functools.cache`) to avoid repeated HTTP-session/bootstrap overhead.

```diff
-from cognee.infrastructure.llm.config import get_llm_config
+from functools import cache
+from cognee.infrastructure.llm.config import get_llm_config
 ...
-def _get_model_client():
+@cache
+def _get_model_client() -> openai.AsyncOpenAI:
```

This shaves latency from each request.
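For reference, a minimal self-contained sketch of the cached factory, assuming the LLM config exposes `llm_api_key` and `llm_endpoint` attributes (these names are illustrative, not confirmed from the PR):

```python
from functools import cache

import openai

from cognee.infrastructure.llm.config import get_llm_config


@cache
def _get_model_client() -> openai.AsyncOpenAI:
    """Build the AsyncOpenAI client once; every later call returns the same instance."""
    config = get_llm_config()
    # Attribute names below are assumptions for illustration only.
    return openai.AsyncOpenAI(
        api_key=config.llm_api_key,
        base_url=getattr(config, "llm_endpoint", None),
    )
```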
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 36 to 42, the _get_model_client function creates a new openai.AsyncOpenAI client on every call, causing unnecessary overhead. To fix this, cache the client instance by either creating it once at module import or by decorating the function with functools.cache so that the same client is reused across calls, reducing latency and resource usage.
100-113: 🛠️ Refactor suggestion
Arguments string is never parsed – JSON stays a string
`arguments_str` is forwarded untouched to both `dispatch_function` and the response body.
Convert it to a dict to guarantee type-safe handling and prevent double-serialization mistakes:

```diff
-import json
 ...
-            arguments_str = item.get("arguments", "{}")
+            import json
+            arguments_dict = json.loads(item.get("arguments", "{}"))
 ...
-                "function": {"name": function_name, "arguments": arguments_str},
+                "function": {"name": function_name, "arguments": arguments_dict},
 ...
-                function=FunctionCall(name=function_name, arguments=arguments_str),
+                function=FunctionCall(name=function_name, arguments=json.dumps(arguments_dict)),
```

Committable suggestion skipped: line range outside the PR's diff.
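As a rough illustration of the parse-once idea, here is a small hypothetical helper (not part of the PR) that returns both the dict for dispatching and a normalized JSON string for the response body:

```python
import json
from typing import Any, Dict, Tuple


def parse_tool_arguments(raw_arguments: str) -> Tuple[Dict[str, Any], str]:
    """Parse the arguments exactly once and re-serialize only where a string is required."""
    arguments: Dict[str, Any] = json.loads(raw_arguments or "{}")
    # Re-dumping the parsed dict guarantees the response body is never double-encoded.
    return arguments, json.dumps(arguments)
```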
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 100 to 113, the arguments string is currently passed as a raw JSON string without parsing, which risks type errors and double serialization. Fix this by parsing the arguments_str from JSON string to a dictionary using a JSON parsing method before assigning it to the tool_call's function arguments. This ensures type-safe handling and prevents serialization issues downstream.

cognee/modules/pipelines/operations/pipeline.py (1)
63-92: 💡 Verification agent
🧩 Analysis chain
Consider persisting new dataset instances to the database.
The improved dataset handling logic now supports matching by name or ID and creating new instances for unmatched datasets. However, these new instances aren't saved to the database, only created in memory.
```diff
-            dataset_instances.append(
-                Dataset(
-                    id=await get_unique_dataset_id(dataset_name=dataset_name, user=user),
-                    name=dataset_name,
-                    owner_id=user.id,
-                )
-            )
+            # Create and persist new dataset instance
+            new_dataset = Dataset(
+                id=await get_unique_dataset_id(dataset_name=dataset_name, user=user),
+                name=dataset_name,
+                owner_id=user.id,
+            )
+            # Save to database
+            db_engine = get_relational_engine()
+            async with db_engine.get_async_session() as session:
+                session.add(new_dataset)
+                await session.commit()
+            dataset_instances.append(new_dataset)
```

Let's verify if these datasets should be persisted:
🏁 Script executed:
```bash
#!/bin/bash
# Check if new datasets are persisted elsewhere in the codebase
rg -A 5 "get_unique_dataset_id" --type py
```

Length of output: 5150
🏁 Script executed:
```bash
#!/bin/bash
# Display the first 200 lines of pipeline.py for context
sed -n '1,200p' cognee/modules/pipelines/operations/pipeline.py
```

Length of output: 6323
🏁 Script executed:
```bash
#!/bin/bash
# Locate get_datasets function implementation
rg -R "async def get_datasets" -A 30 --type py
```

Length of output: 453
🏁 Script executed:
```bash
#!/bin/bash
# Locate get_datasets implementation
rg "async def get_datasets" -A 30 --type py
```

Length of output: 8883
🏁 Script executed:
```bash
#!/bin/bash
# Display implementation of create_dataset to understand persistence logic
sed -n '1,200p' cognee/modules/data/methods/create_dataset.py
```

Length of output: 1189
Persist new datasets in the database in `cognee_pipeline`

Newly created `Dataset` instances in `pipeline.py` are only in-memory—they need to be saved via the existing persistence logic (`create_dataset`). Please update the block at cognee/modules/pipelines/operations/pipeline.py (lines ~80–92) to use `create_dataset` instead of raw `Dataset(...)`, for example:

```diff
 from cognee.modules.data.methods.get_unique_dataset_id import get_unique_dataset_id
+from cognee.modules.data.methods.create_dataset import create_dataset
+from cognee.infrastructure.databases.relational import get_relational_engine

     for dataset_name in datasets:
         is_dataset_found = False
         for existing_dataset in existing_datasets:
             if (
                 existing_dataset.name == dataset_name
                 or str(existing_dataset.id) == dataset_name
             ):
                 dataset_instances.append(existing_dataset)
                 is_dataset_found = True
                 break

         if not is_dataset_found:
-            dataset_instances.append(
-                Dataset(
-                    id=await get_unique_dataset_id(dataset_name=dataset_name, user=user),
-                    name=dataset_name,
-                    owner_id=user.id,
-                )
-            )
+            # Create and persist new dataset if missing
+            db_engine = get_relational_engine()
+            async with db_engine.get_async_session() as session:
+                new_dataset = await create_dataset(dataset_name=dataset_name, user=user, session=session)
+            dataset_instances.append(new_dataset)
```
- This leverages the existing `create_dataset` method, which commits only when a new record is required.
- Imports: `create_dataset` from `cognee.modules.data.methods.create_dataset`, `get_relational_engine` from `cognee.infrastructure.databases.relational`

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
 # add these imports at the top of cognee/modules/pipelines/operations/pipeline.py
 from cognee.modules.data.methods.get_unique_dataset_id import get_unique_dataset_id
+from cognee.modules.data.methods.create_dataset import create_dataset
+from cognee.infrastructure.databases.relational import get_relational_engine

     if not datasets:
         # Get datasets from database if none sent.
         datasets = existing_datasets
     else:
         # If dataset is already in database, use it, otherwise create a new instance.
         dataset_instances = []

         for dataset_name in datasets:
             is_dataset_found = False
             for existing_dataset in existing_datasets:
                 if (
                     existing_dataset.name == dataset_name
                     or str(existing_dataset.id) == dataset_name
                 ):
                     dataset_instances.append(existing_dataset)
                     is_dataset_found = True
                     break

             if not is_dataset_found:
-                dataset_instances.append(
-                    Dataset(
-                        id=await get_unique_dataset_id(dataset_name=dataset_name, user=user),
-                        name=dataset_name,
-                        owner_id=user.id,
-                    )
-                )
+                # Create and persist new dataset if missing
+                db_engine = get_relational_engine()
+                async with db_engine.get_async_session() as session:
+                    new_dataset = await create_dataset(
+                        dataset_name=dataset_name,
+                        user=user,
+                        session=session,
+                    )
+                dataset_instances.append(new_dataset)

         datasets = dataset_instances
```
In cognee/modules/pipelines/operations/pipeline.py around lines 63 to 92, the new Dataset instances created in memory are not persisted to the database. To fix this, replace the direct Dataset(...) instantiation with calls to the existing create_dataset function, which handles persistence and commits only when needed. Import create_dataset from cognee.modules.data.methods.create_dataset and get_relational_engine from cognee.infrastructure.databases.relational, then use create_dataset with the appropriate parameters to save new datasets instead of just creating them in memory.

cognee/api/v1/responses/dispatch_function.py (4)
87-101: 🛠️ Refactor suggestion
Add error handling for cognify operations.
The function processes both `add` and `cognify` operations but lacks error handling for potential failures.

```diff
-    if text:
-        await add(data=text, user=user)
-
-    await cognify(user=user, ontology_file_path=ontology_file_path if ontology_file_path else None)
-
-    return (
-        "Text successfully converted into knowledge graph."
-        if text
-        else "Knowledge graph successfully updated with new information."
-    )
+    try:
+        if text:
+            await add(data=text, user=user)
+
+        await cognify(user=user, ontology_file_path=ontology_file_path if ontology_file_path else None)
+
+        return (
+            "Text successfully converted into knowledge graph."
+            if text
+            else "Knowledge graph successfully updated with new information."
+        )
+    except Exception as e:
+        logger.error(f"Error during cognify operation: {e}")
+        return f"Error processing knowledge graph: {str(e)}"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
async def handle_cognify(arguments: Dict[str, Any], user) -> str:
    """Handle cognify function call"""
    text = arguments.get("text")
    ontology_file_path = arguments.get("ontology_file_path")

    try:
        if text:
            await add(data=text, user=user)

        await cognify(
            user=user,
            ontology_file_path=ontology_file_path if ontology_file_path else None,
        )

        return (
            "Text successfully converted into knowledge graph."
            if text
            else "Knowledge graph successfully updated with new information."
        )
    except Exception as e:
        logger.error(f"Error during cognify operation: {e}")
        return f"Error processing knowledge graph: {str(e)}"
```
In cognee/api/v1/responses/dispatch_function.py around lines 87 to 101, the handle_cognify function lacks error handling for the asynchronous add and cognify calls. Wrap the calls to add and cognify in try-except blocks to catch exceptions, log or handle errors appropriately, and ensure the function returns meaningful error messages or statuses if these operations fail.
104-107: 🛠️ Refactor suggestion
Add error handling for prune operation.
The function calls `prune()` but doesn't handle potential errors that might occur during execution.

```diff
-async def handle_prune(arguments: Dict[str, Any], user) -> str:
-    """Handle prune function call"""
-    await prune()
-    return "Memory has been pruned successfully."
+async def handle_prune(arguments: Dict[str, Any], user) -> str:
+    """Handle prune function call"""
+    try:
+        await prune()
+        return "Memory has been pruned successfully."
+    except Exception as e:
+        logger.error(f"Error during prune operation: {e}")
+        return f"Error pruning memory: {str(e)}"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
async def handle_prune(arguments: Dict[str, Any], user) -> str:
    """Handle prune function call"""
    try:
        await prune()
        return "Memory has been pruned successfully."
    except Exception as e:
        logger.error(f"Error during prune operation: {e}")
        return f"Error pruning memory: {str(e)}"
```
In cognee/api/v1/responses/dispatch_function.py around lines 104 to 107, the handle_prune function calls the async prune() without any error handling, which can cause unhandled exceptions. Wrap the await prune() call in a try-except block to catch potential exceptions, and return an appropriate error message or re-raise the error after logging it. This will ensure the function handles failures gracefully.
19-45: 🛠️ Refactor suggestion
Add error handling for JSON parsing failures.
The function parses JSON arguments but doesn't handle potential JSON parsing errors, which could occur with malformed input.
```diff
-    arguments = json.loads(arguments_str)
+    try:
+        arguments = json.loads(arguments_str)
+    except json.JSONDecodeError as e:
+        logger.error(f"Failed to parse arguments: {e}")
+        return f"Error: Invalid function arguments format - {str(e)}"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
async def dispatch_function(tool_call: Union[ToolCall, Dict[str, Any]]) -> str:
    """
    Dispatches a function call to the appropriate Cognee function.
    """
    if isinstance(tool_call, dict):
        function_data = tool_call.get("function", {})
        function_name = function_data.get("name", "")
        arguments_str = function_data.get("arguments", "{}")
    else:
        function_name = tool_call.function.name
        arguments_str = tool_call.function.arguments

    try:
        arguments = json.loads(arguments_str)
    except json.JSONDecodeError as e:
        logger.error(f"Failed to parse arguments: {e}")
        return f"Error: Invalid function arguments format - {str(e)}"

    logger.info(f"Dispatching function: {function_name} with args: {arguments}")

    user = await get_default_user()

    if function_name == "search":
        return await handle_search(arguments, user)
    elif function_name == "cognify":
        return await handle_cognify(arguments, user)
    elif function_name == "prune":
        return await handle_prune(arguments, user)
    else:
        return f"Error: Unknown function {function_name}"
```
In cognee/api/v1/responses/dispatch_function.py around lines 19 to 45, the code parses JSON arguments without handling JSONDecodeError exceptions. To fix this, wrap the json.loads call in a try-except block catching JSONDecodeError, and return or raise a clear error message indicating malformed JSON input to prevent unhandled exceptions.
47-84: 🛠️ Refactor suggestion
Handle potential errors in search execution.
While the parameter validation is thorough, there's no try-except block to handle potential errors in the search operation itself.
```diff
-    results = await search(
-        query_text=query,
-        query_type=query_type,
-        datasets=datasets,
-        user=user,
-        system_prompt_path=system_prompt_path,
-        top_k=top_k if isinstance(top_k, int) else 10,
-    )
-
-    return results
+    try:
+        results = await search(
+            query_text=query,
+            query_type=query_type,
+            datasets=datasets,
+            user=user,
+            system_prompt_path=system_prompt_path,
+            top_k=top_k if isinstance(top_k, int) else 10,
+        )
+        return results
+    except Exception as e:
+        logger.error(f"Error during search operation: {e}")
+        return f"Error executing search: {str(e)}"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
async def handle_search(arguments: Dict[str, Any], user) -> list:
    """Handle search function call"""
    search_tool = next((tool for tool in DEFAULT_TOOLS if tool["name"] == "search"), None)
    required_params = (
        search_tool["parameters"].get("required", []) if search_tool else ["search_query"]
    )

    query = arguments.get("search_query")
    if not query and "search_query" in required_params:
        return "Error: Missing required 'search_query' parameter"

    search_type_str = arguments.get("search_type", "GRAPH_COMPLETION")
    valid_search_types = (
        search_tool["parameters"]["properties"]["search_type"]["enum"]
        if search_tool
        else ["INSIGHTS", "CODE", "GRAPH_COMPLETION", "SEMANTIC", "NATURAL_LANGUAGE"]
    )

    if search_type_str not in valid_search_types:
        logger.warning(f"Invalid search_type: {search_type_str}, defaulting to GRAPH_COMPLETION")
        search_type_str = "GRAPH_COMPLETION"

    query_type = SearchType[search_type_str]

    top_k = arguments.get("top_k")
    datasets = arguments.get("datasets")
    system_prompt_path = arguments.get("system_prompt_path", "answer_simple_question.txt")

    try:
        results = await search(
            query_text=query,
            query_type=query_type,
            datasets=datasets,
            user=user,
            system_prompt_path=system_prompt_path,
            top_k=top_k if isinstance(top_k, int) else 10,
        )
        return results
    except Exception as e:
        logger.error(f"Error during search operation: {e}")
        return f"Error executing search: {str(e)}"
```
In cognee/api/v1/responses/dispatch_function.py around lines 47 to 84, the handle_search function lacks error handling for the asynchronous search call. Wrap the search invocation and its related logic inside a try-except block to catch any exceptions that may occur during the search execution. In the except block, log the error appropriately and return a meaningful error message or response to ensure the function handles failures gracefully.

cognee/infrastructure/databases/graph/networkx/adapter.py (1)
45-47: 🛠️ Refactor suggestion
Type-hint transition to `UUID` is incomplete – API now mixes `UUID` and `str`.

`has_node` & many new signatures correctly use `UUID`, but methods such as `has_edge`, `add_edge`, `add_edges`, etc. still use `str` parameters.
Down-stream callers now need to cast back-and-forth, which is error-prone and defeats the goal of stronger typing.

Action items:
- Update remaining public methods to accept `UUID` (or generic `Hashable`) consistently.
- Add mypy/pyright to CI to prevent future drift.
```diff
-async def has_edge(self, from_node: str, to_node: str, edge_label: str) -> bool:
+async def has_edge(self, from_node: UUID, to_node: UUID, edge_label: str) -> bool:
```

Repeat for `add_edge`, `add_edges`, etc.

Also applies to: 62-64
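A minimal sketch of what consistently typed signatures could look like; the exact parameter names and the edge-tuple shape for `add_edges` are assumptions, not taken from the adapter:

```python
from typing import Any, Dict, List, Optional, Tuple
from uuid import UUID


class NetworkXAdapterSignatures:
    """Illustrative signatures only; method bodies are elided."""

    async def has_edge(self, from_node: UUID, to_node: UUID, edge_label: str) -> bool: ...

    async def add_edge(
        self,
        from_node: UUID,
        to_node: UUID,
        relationship_name: str,
        edge_properties: Optional[Dict[str, Any]] = None,
    ) -> None: ...

    async def add_edges(
        self,
        # (from_node, to_node, relationship_name, properties) – assumed tuple shape
        edges: List[Tuple[UUID, UUID, str, Dict[str, Any]]],
    ) -> None: ...
```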
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/networkx/adapter.py around lines 45 to 47 and 62 to 64, the type hints for methods like has_edge, add_edge, and add_edges still use str instead of UUID, causing inconsistent typing and forcing callers to cast types. Update these method signatures to accept UUID or a generic Hashable type consistently to align with has_node and improve type safety. Additionally, integrate mypy or pyright into the CI pipeline to enforce consistent type usage and prevent future type hint drift.

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
121-127: 🛠️ Refactor suggestion
Avoid double round-trips when retrieving a collection
`get_collection()` first calls `has_collection()`, which itself performs `list_collections()`.
Immediately afterwards we call `client.get_collection()`. For every lookup this results in two HTTP calls:

1. `list_collections` (to check existence)
2. `get_collection`

When the caller only needs the collection handler, we can skip the first call and rely on the Chroma client to raise an error – or catch `NotFound` once – eliminating ~50 % of the latency per call.

```diff
-async def get_collection(self, collection_name: str) -> AsyncHttpClient:
-    if not await self.has_collection(collection_name):
-        raise CollectionNotFoundError(f"Collection '{collection_name}' not found!")
-
-    client = await self.get_connection()
-    return await client.get_collection(collection_name)
+async def get_collection(self, collection_name: str):
+    client = await self.get_connection()
+    try:
+        return await client.get_collection(collection_name)
+    except Exception as exc:  # replace with the specific Chroma *NotFound once available
+        raise CollectionNotFoundError(
+            f"Collection '{collection_name}' not found!"
+        ) from exc
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    async def get_collection(self, collection_name: str):
        client = await self.get_connection()
        try:
            return await client.get_collection(collection_name)
        except Exception as exc:  # replace with the specific Chroma *NotFound once available
            raise CollectionNotFoundError(
                f"Collection '{collection_name}' not found!"
            ) from exc
```
In cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py around lines 121 to 127, the get_collection method currently calls has_collection which triggers a list_collections HTTP request before calling client.get_collection, causing two HTTP calls per lookup. To fix this, remove the call to has_collection and directly call client.get_collection, then handle the potential NotFound error from the client to raise CollectionNotFoundError. This eliminates the redundant list_collections call and reduces latency by half.

cognee-mcp/src/server.py (1)
87-93:
⚠️ Potential issue

Background task fails immediately: wrong keyword argument
`cognify_task()` expects `data`, but we pass `text`.
At runtime this raises `TypeError: cognify_task() got an unexpected keyword argument 'text'`, and the background job never starts.

```diff
-    asyncio.create_task(
-        cognify_task(
-            text=data,
-            graph_model_file=graph_model_file,
-            graph_model_name=graph_model_name,
-        )
-    )
+    asyncio.create_task(
+        cognify_task(
+            data=data,
+            graph_model_file=graph_model_file,
+            graph_model_name=graph_model_name,
+        )
+    )
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
     asyncio.create_task(
         cognify_task(
-            text=data,
+            data=data,
             graph_model_file=graph_model_file,
             graph_model_name=graph_model_name,
         )
     )
```
In cognee-mcp/src/server.py around lines 87 to 93, the call to cognify_task uses the keyword argument 'text', but the function expects 'data'. To fix this, rename the keyword argument from 'text' to 'data' in the asyncio.create_task call so it matches the function signature and prevents the TypeError.

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1)
101-103:
⚠️ Potential issue

Creating data points fails if the collection does not exist
`create_data_points()` goes directly to `get_collection()`, which raises `CollectionNotFoundError` when the collection is missing.
Other adapters (e.g., `ChromaDBAdapter`) first call `create_collection()` to guarantee existence.

```diff
-        collection = await self.get_collection(collection_name)
+        await self.create_collection(collection_name)
+        collection = await self.get_collection(collection_name)
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        await self.create_collection(collection_name)
        collection = await self.get_collection(collection_name)

        try:
```
In cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py around lines 101 to 103, the create_data_points() method calls get_collection() directly, which raises CollectionNotFoundError if the collection does not exist. To fix this, modify create_data_points() to first check if the collection exists and call create_collection() to create it if missing before calling get_collection(). This ensures the collection is always present before adding data points.

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
205-211: 🛠️ Refactor suggestion
Row-by-row deletes are very slow
`delete_data_points` issues one `collection.delete` per ID, which yields `O(n)` separate commits and network calls.

If LanceDB supports a single `DELETE WHERE id IN [...]`, you can slash latency and lock contention:

```diff
-        # Delete one at a time to avoid commit conflicts
-        for data_point_id in data_point_ids:
-            await collection.delete(f"id = '{data_point_id}'")
+        # Bulk delete in one shot
+        if data_point_ids:
+            ids_tuple = tuple(map(str, data_point_ids))
+            await collection.delete(f"id IN {ids_tuple}")
```

Test on a staging dataset – on SSD the difference is multiple orders of magnitude for 10k+ IDs.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    async def delete_data_points(self, collection_name: str, data_point_ids: list[str]):
        collection = await self.get_collection(collection_name)
        # Bulk delete in one shot
        if data_point_ids:
            ids_tuple = tuple(map(str, data_point_ids))
            await collection.delete(f"id IN {ids_tuple}")
```
In cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py around lines 205 to 211, the delete_data_points method deletes data points one by one, causing slow performance due to multiple commits and network calls. To fix this, modify the method to perform a single delete operation using a query with "id IN [...]" that includes all data_point_ids at once, reducing latency and lock contention significantly.

cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (7)
120-124:
⚠️ Potential issue

Malformed `DELETE` pattern – extra colon & braces

```cypher
MATCH (node: {{id: $node_id}}) DETACH DELETE node
```
- Double braces render invalid syntax.
- The colon implies a label before the map pattern.
Correct form:
```diff
-MATCH (node: {{id: $node_id}}) DETACH DELETE node
+MATCH (node {id: $node_id}) DETACH DELETE node
```
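For illustration, a minimal sketch of the corrected delete in the adapter, assuming it exposes an async `query(cypher, params)` method and a `delete_node` wrapper (both names are assumptions):

```python
async def delete_node(self, node_id: str) -> None:
    # Plain property-map match, no label, no doubled braces.
    query = """
    MATCH (node {id: $node_id})
    DETACH DELETE node
    """
    await self.query(query, {"node_id": node_id})
```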
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 120 to 124, the Cypher query syntax is incorrect due to double braces and misuse of the colon before the property map. Replace the query string with correct Cypher syntax by removing the colon and double braces, using parentheses with a property map like MATCH (node {id: $node_id}) DETACH DELETE node to properly match the node by its id property.
82-88:
⚠️ Potential issue

Same parameterised-label issue in bulk node insertion
`ON CREATE SET n:node.label` repeats the forbidden pattern.

You can UNWIND and then `SET n:` via `CASE` or APOC, or run a separate UNWIND per label group (a sketch of the per-label-group variant follows below).

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 82 to 88, the Cypher query uses `ON CREATE SET n:node.label` which incorrectly tries to parameterize labels. To fix this, avoid setting labels directly from parameters in the query. Instead, restructure the query to either use a CASE statement or APOC procedures to conditionally set labels, or group nodes by label and run separate UNWIND queries per label group to apply labels correctly without parameterization.
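A minimal sketch of the per-label-group approach, assuming the adapter exposes an async `query(cypher, params)` method and that each node dict carries `id`, `label`, and `properties` keys (both assumptions for illustration):

```python
from collections import defaultdict
from typing import Any, Dict, List


async def add_nodes_grouped_by_label(adapter, nodes: List[Dict[str, Any]]) -> None:
    """Run one UNWIND per label group so the label never has to be parameterised."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for node in nodes:
        groups[node["label"]].append(node)

    for label, group in groups.items():
        # The label is interpolated into the query text, so validate it strictly first.
        if not label.isidentifier():
            raise ValueError(f"Unsafe label: {label}")
        query = f"""
        UNWIND $nodes AS node
        MERGE (n:`{label}` {{id: node.id}})
        ON CREATE SET n += node.properties, n.updated_at = timestamp()
        ON MATCH SET n += node.properties, n.updated_at = timestamp()
        """
        await adapter.query(query, {"nodes": group})
```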
415-418:
⚠️ Potential issue
`{id}` / `{edge_label}` will raise `NameError` in f-string

```cypher
MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
```
`id` is undefined in Python scope, so the adapter crashes before hitting the DB.

Remove the `f` prefix and restructure as parameterised Cypher:

```diff
-        query = f"""
+        query = """
         UNWIND $node_ids AS nid
         MATCH (node)<-[r]-(successor)
         WHERE node.id = nid AND type(r) = $edge_label
         DELETE r
         """
```
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 415 to 418, the Cypher query uses f-string placeholders `{id}` and `{edge_label}` which cause a NameError because these variables are not defined in Python. To fix this, remove the f-string formatting and rewrite the query to use parameterized Cypher syntax, passing `node_ids` and `edge_label` as parameters to safely inject values without Python variable interpolation.
430-434: 🛠️ Refactor suggestion
Mutable default argument triggers Ruff B006
`serialize_properties(self, properties=dict())` shares a single dict between calls.

```diff
-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties: Optional[dict] = None):
         serialized_properties = {}
-        for property_key, property_value in properties.items():
+        for property_key, property_value in (properties or {}).items():
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    def serialize_properties(self, properties: Optional[dict] = None):
        serialized_properties = {}

        for property_key, property_value in (properties or {}).items():
            if isinstance(property_value, UUID):
                ...
```
🪛 Ruff (0.8.2)
430-430: Do not use mutable data structures for argument defaults
Replace with
None; initialize within function(B006)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 430 to 434, the method serialize_properties uses a mutable default argument properties=dict(), which can cause unexpected shared state between calls. Change the default value to None and inside the method, initialize properties to an empty dictionary if it is None to avoid state bleed.
67-72:
⚠️ Potential issue

Cypher does not allow parameterised labels – query will throw
`node:$node_label` attempts to bind a label from a parameter, but Neo4j/Memgraph disallow this for security & plan-cache reasons.

Replace with string interpolation (with validated input) or use `SET node:` after the merge:

```diff
-        MERGE (node {id: $node_id})
-        ON CREATE SET node:$node_label, node += $properties, node.updated_at = timestamp()
-        ON MATCH SET node:$node_label, node += $properties, node.updated_at = timestamp()
+        MERGE (node {id: $node_id})
+        SET node += $properties,
+            node.updated_at = timestamp()
+        WITH node
+        CALL apoc.create.addLabels(id(node), [$node_label]) YIELD node  // or string-interpolate safely
+        RETURN ID(node) AS internal_id, node.id AS nodeId
```

Failing to correct this will raise `SyntaxError: Parameters cannot be used for labels`.

Committable suggestion skipped: line range outside the PR's diff.
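A rough sketch of the string-interpolation alternative for a single node; the method shape and `query` helper are assumptions, and the label must be whitelist-validated before it is interpolated:

```python
import re


async def add_node(self, node_id: str, node_label: str, properties: dict) -> None:
    # Labels cannot be parameterised, so validate against a strict pattern first.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", node_label):
        raise ValueError(f"Invalid label: {node_label}")

    query = f"""
    MERGE (node {{id: $node_id}})
    SET node:`{node_label}`,
        node += $properties,
        node.updated_at = timestamp()
    RETURN node.id AS nodeId
    """
    await self.query(query, {"node_id": node_id, "properties": properties})
```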
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 67 to 72, the Cypher query incorrectly uses a parameter for a label with node:$node_label, which is not allowed and causes a syntax error. To fix this, remove the parameterized label from the MERGE clause and instead apply the label using a separate SET statement after the MERGE, ensuring the label is inserted via validated string interpolation to avoid injection risks.
247-257:
⚠️ Potential issueRelationship tuple indexing fragile
result["r"][1]assumes the second item of the relationship tuple is always the relationship type.
In Neo4j the returnedRelationshipis an object, not a tuple, so indexing yieldsTypeError.Use
type(r)in Cypher orresult["r"].type:- (result["n"]["id"], result["m"]["id"], {"relationship_name": result["r"][1]}) + ( + result["n"]["id"], + result["m"]["id"], + {"relationship_name": result["r"].type} + )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.async def get_edges(self, node_id: str): query = """ MATCH (n {id: $node_id})-[r]-(m) RETURN n, r, m """ results = await self.query(query, dict(node_id=node_id)) return [ ( result["n"]["id"], result["m"]["id"], {"relationship_name": result["r"].type} ) for result in results ]🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 247 to 257, the code incorrectly accesses the relationship type using result["r"][1], which causes a TypeError because the relationship is an object, not a tuple. Fix this by replacing result["r"][1] with result["r"].type or by modifying the Cypher query to return type(r) and accessing that instead, ensuring the relationship type is retrieved correctly without indexing errors.
401-405:
⚠️ Potential issue
MATCH (node {id: nid})inside UNWIND is invalidInside a UNWIND you must either use
WHERE node.id = nidor APOC.
Also the entire string is wrapped in an f-string but no Python variables are interpolated – drop thefto avoid accidental formatting.- query = f""" + query = """ UNWIND $node_ids AS nid - MATCH (node {id: nid})-[r]->(predecessor) - WHERE type(r) = $edge_label + MATCH (node)-[r]->(predecessor) + WHERE node.id = nid AND type(r) = $edge_label DELETE r """📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        query = """
        UNWIND $node_ids AS nid
        MATCH (node)-[r]->(predecessor)
        WHERE node.id = nid AND type(r) = $edge_label
        DELETE r
        """
```
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 401 to 405, the Cypher query uses an invalid pattern MATCH (node {id: nid}) inside UNWIND. Replace this with MATCH (node) WHERE node.id = nid to fix the syntax. Also, remove the unnecessary f-string prefix since no Python variables are interpolated in the query string to prevent accidental formatting.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.